
[v4] NUMA: Enable adding NUMA node implicitly

Message ID 20171026080245.GA26955@localhost.localdomain (mailing list archive)
State New, archived

Commit Message

Eduardo Habkost Oct. 26, 2017, 8:02 a.m. UTC
Hi,

Sorry for taking so long to review it:

On Mon, Oct 23, 2017 at 09:33:42AM +0800, Dou Liyang wrote:
> Linux and Windows need an ACPI SRAT table to make memory hotplug work properly;
> however, QEMU currently doesn't create an SRAT table if no NUMA options are
> present on the CLI.
> 
> This breaks both Linux and Windows guests under certain conditions:
>  * Windows: won't enable memory hotplug at all without an SRAT table
>  * Linux: if QEMU is started with all initial memory below 4GB and no SRAT table
>    present, the guest kernel will use nommu DMA ops, which breaks 32-bit hw drivers
>    when memory is hotplugged and the guest tries to use it with those drivers.
> 
> Fix the above issues by automatically creating a NUMA node when QEMU is started
> with memory hotplug enabled but without '-numa' options on the CLI.
> (PS: the NUMA node is auto-created only for new machine types, so as not to
> break migration.)
> 
> This provides an SRAT table to guests without explicit -numa options on the CLI
> and allows:
>  * Windows: to enable memory hotplug
>  * Linux: to switch to SWIOTLB DMA ops, bouncing DMA transfers through 32-bit
>    addressable buffers that legacy drivers/hw can handle.
> 
> [Rewritten by Igor]
> 
> Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Marcel Apfelbaum <marcel@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Thomas Huth <thuth@redhat.com>
> Cc: Alistair Francis <alistair23@gmail.com>
> Cc: f4bug@amsat.org
> Cc: Takao Indoh <indou.takao@jp.fujitsu.com>
> Cc: Izumi Taku <izumi.taku@jp.fujitsu.com>
> ---
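A minimal sketch of the rule described in the commit message above, in C. It assumes a reduced, hypothetical ToyMachineClass that carries only the auto_enable_numa_with_memhp flag (the name used in the patch); the toy_* machine-type helpers are illustrative only and model the common pattern of an older machine type inheriting the newer defaults and then switching a feature off so migration keeps working.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MachineClass, reduced to the new flag. */
typedef struct {
    bool auto_enable_numa_with_memhp;
} ToyMachineClass;

/* Latest (hypothetical) machine type: opts in to the new behaviour. */
static void toy_latest_machine_options(ToyMachineClass *mc)
{
    mc->auto_enable_numa_with_memhp = true;
}

/* Older (hypothetical) machine type: inherits the newer defaults, then
 * turns the flag off so guests keep seeing the same configuration as
 * before and migration from older QEMU versions is not broken. */
static void toy_older_machine_options(ToyMachineClass *mc)
{
    toy_latest_machine_options(mc);
    mc->auto_enable_numa_with_memhp = false;
}

/* The rule from the commit message: hotplug slots are present, no
 * NUMA node was given on the CLI, and the machine type opted in. */
static bool toy_should_add_implicit_node(const ToyMachineClass *mc,
                                         uint64_t ram_slots, int nb_nodes)
{
    assert(mc != NULL);
    return ram_slots > 0 && nb_nodes == 0 &&
           mc->auto_enable_numa_with_memhp;
}
```

Only the latest machine type with hotplug slots and no -numa nodes gets the implicit node; the older type never does.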
[...]
> diff --git a/numa.c b/numa.c
> index 100a67f..ba8d813 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -423,12 +423,32 @@ void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
>      nodes[i].node_mem = size - usedmem;
>  }
>  
> -void parse_numa_opts(MachineState *ms)
> +void parse_numa_opts(MachineState *ms, uint64_t ram_slots)
>  {
>      int i;
>      MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    QemuOptsList *numa_opts = qemu_find_opts("numa");
>  
> -    if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, ms, NULL)) {
> +    /*
> +     * If memory hotplug is enabled (slots > 0) but without '-numa'
> +     * options explicitly on CLI, guests will break.
> +     *
> +     *   Windows: won't enable memory hotplug without SRAT table at all
> +     *
> +     *   Linux: if QEMU is started with initial memory all below 4Gb
> +     *   and no SRAT table present, guest kernel will use nommu DMA ops,
> +     *   which breaks 32bit hw drivers when memory is hotplugged and
> +     *   guest tries to use it with those drivers.
> +     *
> +     * Enable NUMA implicitly by adding a new NUMA node automatically.
> +     */
> +    if (ram_slots > 0 && QTAILQ_EMPTY(&numa_opts->head)) {
> +        if (mc->auto_enable_numa_with_memhp) {

If you move the code after qemu_opts_foreach(), you could just
check if nb_numa_nodes is 0 instead of peeking at
numa_opts->head.
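To illustrate the ordering point, here is a toy model (hypothetical toy_* names, not QEMU code). The counter only counts node entries, so its value is final only once the whole option loop has run; checking it afterwards does not depend on how the option list is stored, and in this toy a non-empty list holding no node entry still counts as zero nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counter mirroring QEMU's nb_numa_nodes. */
static int toy_nb_numa_nodes;

/* Toy handler for one -numa option: only "node" entries define a
 * node; other option types (such as "dist") leave the count alone. */
static void toy_parse_numa(const char *type)
{
    assert(type != NULL);
    if (strcmp(type, "node") == 0) {
        toy_nb_numa_nodes++;
    }
}

/* Parse every option first, then decide.  The count is final only
 * after the loop, so the check has to come after it. */
static bool toy_parse_then_decide(const char *const *opts, int n_opts,
                                  bool memhp_enabled)
{
    toy_nb_numa_nodes = 0;
    for (int i = 0; i < n_opts; i++) {
        toy_parse_numa(opts[i]);
    }
    return memhp_enabled && toy_nb_numa_nodes == 0;
}
```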


> +            qemu_opts_parse_noisily(numa_opts, "node", true);
> +        }
> +    }

Calling qemu_opts_parse*() has additional user-visible side
effects (it can make -writeconfig include the new option,
depending on the initialization ordering).  Affecting QemuOpts
depending on the machine type breaks the separation between
machine configuration and machine initialization, so I would
like to avoid it.
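A toy contrast of the two approaches, assuming a hypothetical ToyOptsList that stands in for a QemuOptsList whose contents a -writeconfig style dump would print. Injecting a synthetic option may leak into that dump, while updating runtime state directly leaves the user's configuration exactly as written. None of these names are QEMU APIs.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_OPTS 8

/* Hypothetical stand-in for a QemuOptsList: whatever is stored here
 * is what a -writeconfig style dump would print. */
typedef struct {
    const char *entries[TOY_MAX_OPTS];
    int count;
} ToyOptsList;

/* Hypothetical stand-in for runtime NUMA state. */
typedef struct {
    int nb_nodes;
} ToyNumaState;

/* Approach avoided in review: inject a synthetic option and let the
 * normal parser consume it; the entry stays in the configuration. */
static void toy_add_node_via_opts(ToyOptsList *opts, ToyNumaState *st)
{
    assert(opts->count < TOY_MAX_OPTS);
    opts->entries[opts->count++] = "node";
    st->nb_nodes++;
}

/* Approach suggested in review: update runtime state only. */
static void toy_add_node_directly(ToyNumaState *st)
{
    st->nb_nodes++;
}

/* How many entries of a given kind a config dump would emit. */
static int toy_dump_count(const ToyOptsList *opts, const char *what)
{
    int n = 0;
    for (int i = 0; i < opts->count; i++) {
        if (strcmp(opts->entries[i], what) == 0) {
            n++;
        }
    }
    return n;
}
```

Both paths end with one node in the runtime state, but only the first one changes what the dump would print.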

We could simply call parse_numa_node() (after making it increment
nb_numa_nodes automatically).

e.g.:



> +
> +    if (qemu_opts_foreach(numa_opts, parse_numa, ms, NULL)) {
>          exit(1);
>      }
>  
> diff --git a/vl.c b/vl.c
> index 0723835..516d0c9 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4677,8 +4677,6 @@ int main(int argc, char **argv, char **envp)
>      default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
>      default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
>  
> -    parse_numa_opts(current_machine);
> -
>      if (qemu_opts_foreach(qemu_find_opts("mon"),
>                            mon_init_func, NULL, NULL)) {
>          exit(1);
> @@ -4728,6 +4726,7 @@ int main(int argc, char **argv, char **envp)
>      current_machine->boot_order = boot_order;
>      current_machine->cpu_model = cpu_model;
>  
> +    parse_numa_opts(current_machine, ram_slots);

Why did you add a ram_slots argument if it's already present at
current_machine->ram_slots?

>  
>      /* parse features once if machine provides default cpu_type */
>      if (machine_class->default_cpu_type) {
> -- 
> 2.5.5
> 
> 
> 
>

Comments

Dou Liyang Oct. 27, 2017, 3:02 a.m. UTC | #1
Dear Eduardo,

At 10/26/2017 04:02 PM, Eduardo Habkost wrote:
> Hi,
>
> Sorry for taking so long to review it:

No problem. It's my honor!

>
> On Mon, Oct 23, 2017 at 09:33:42AM +0800, Dou Liyang wrote:

[...]

>> +     */
>> +    if (ram_slots > 0 && QTAILQ_EMPTY(&numa_opts->head)) {
>> +        if (mc->auto_enable_numa_with_memhp) {
>
> If you move the code after qemu_opts_foreach(), you could just
> check if nb_numa_nodes is 0 instead of peeking at
> numa_opts->head.
>
>
>> +            qemu_opts_parse_noisily(numa_opts, "node", true);
>> +        }
>> +    }
>
> Calling qemu_opts_parse*() has additional user-visible side
> effects (it can make -writeconfig include the new option,
> depending on the initialization ordering).  Affecting QemuOpts
> depending on the machine type breaks the separation between
> machine configuration and machine initialization, so I would
> like to avoid it.
>
Yes, indeed. Thank you so much for the kind explanation!

> We could simply call parse_numa_node() (after making it increment
> nb_numa_nodes automatically).
>

Yes, I will use it in the next version.

> e.g.:
>
>
> diff --git a/numa.c b/numa.c
> index 8d78d959f6..da18e42ce7 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -216,6 +216,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>      }
>      numa_info[nodenr].present = true;
>      max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
> +    nb_numa_nodes++;
>  }
>
>  static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
> @@ -282,7 +283,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
>          if (err) {
>              goto end;
>          }
> -        nb_numa_nodes++;
>          break;
>      case NUMA_OPTIONS_TYPE_DIST:
>          parse_numa_distance(&object->u.dist, &err);
> @@ -433,6 +433,26 @@ void parse_numa_opts(MachineState *ms)
>          exit(1);
>      }
>
> +    /*
> +     * If memory hotplug is enabled (slots > 0) but without '-numa'
> +     * options explicitly on CLI, guests will break.
> +     *
> +     *   Windows: won't enable memory hotplug without SRAT table at all
> +     *
> +     *   Linux: if QEMU is started with initial memory all below 4Gb
> +     *   and no SRAT table present, guest kernel will use nommu DMA ops,
> +     *   which breaks 32bit hw drivers when memory is hotplugged and
> +     *   guest tries to use it with those drivers.
> +     *
> +     * Enable NUMA implicitly by adding a new NUMA node automatically.
> +     */
> +    if (ms->ram_slots > 0 && nb_numa_nodes == 0 &&
> +        mc->auto_enable_numa_with_memhp) {
> +        NumaNodeOptions node = { };
> +        parse_numa_node(ms, &node, &error_abort);
> +    }
> +
> +
>      assert(max_numa_nodeid <= MAX_NODES);
>
>      /* No support for sparse NUMA node IDs yet: */
>
>> +
>> +    if (qemu_opts_foreach(numa_opts, parse_numa, ms, NULL)) {
>>          exit(1);
>>      }
>>
>> diff --git a/vl.c b/vl.c
>> index 0723835..516d0c9 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -4677,8 +4677,6 @@ int main(int argc, char **argv, char **envp)
>>      default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
>>      default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
>>
>> -    parse_numa_opts(current_machine);
>> -
>>      if (qemu_opts_foreach(qemu_find_opts("mon"),
>>                            mon_init_func, NULL, NULL)) {
>>          exit(1);
>> @@ -4728,6 +4726,7 @@ int main(int argc, char **argv, char **envp)
>>      current_machine->boot_order = boot_order;
>>      current_machine->cpu_model = cpu_model;
>>
>> +    parse_numa_opts(current_machine, ram_slots);
>
> Why did you add a ram_slots argument if it's already present at
> current_machine->ram_slots?

Oops, I forgot to update it after moving the call below the setup of
current_machine. I will remove the redundant ram_slots argument.

Thanks,
	dou
>
>>
>>      /* parse features once if machine provides default cpu_type */
>>      if (machine_class->default_cpu_type) {
>> --
>> 2.5.5
>>
>>
>>
>>
>

Patch

diff --git a/numa.c b/numa.c
index 8d78d959f6..da18e42ce7 100644
--- a/numa.c
+++ b/numa.c
@@ -216,6 +216,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
     }
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
+    nb_numa_nodes++;
 }
 
 static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
@@ -282,7 +283,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
         if (err) {
             goto end;
         }
-        nb_numa_nodes++;
         break;
     case NUMA_OPTIONS_TYPE_DIST:
         parse_numa_distance(&object->u.dist, &err);
@@ -433,6 +433,26 @@ void parse_numa_opts(MachineState *ms)
         exit(1);
     }
 
+    /*
+     * If memory hotplug is enabled (slots > 0) but without '-numa'
+     * options explicitly on CLI, guests will break.
+     *
+     *   Windows: won't enable memory hotplug without SRAT table at all
+     *
+     *   Linux: if QEMU is started with initial memory all below 4Gb
+     *   and no SRAT table present, guest kernel will use nommu DMA ops,
+     *   which breaks 32bit hw drivers when memory is hotplugged and
+     *   guest tries to use it with those drivers.
+     *
+     * Enable NUMA implicitly by adding a new NUMA node automatically.
+     */
+    if (ms->ram_slots > 0 && nb_numa_nodes == 0 &&
+        mc->auto_enable_numa_with_memhp) {
+        NumaNodeOptions node = { };
+        parse_numa_node(ms, &node, &error_abort);
+    }
+
+
     assert(max_numa_nodeid <= MAX_NODES);
 
     /* No support for sparse NUMA node IDs yet: */
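A self-contained model of the control flow in the patch above, under stated assumptions: all toy_* names and TOY_MAX_NODES are hypothetical stand-ins, and a node given without an explicit nodeid is assumed to take the next free index (the current node count), which is how the empty NumaNodeOptions in the patch is expected to end up as node 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_NODES 128 /* assumption: stand-in for QEMU's MAX_NODES */

/* Hypothetical, reduced machine state for this sketch. */
typedef struct {
    uint64_t ram_slots;               /* like MachineState::ram_slots */
    bool auto_enable_numa_with_memhp; /* like the MachineClass flag */
} ToyMachine;

static bool toy_present[TOY_MAX_NODES];
static int toy_nb_numa_nodes;
static int toy_max_numa_nodeid;

static void toy_reset(void)
{
    memset(toy_present, 0, sizeof(toy_present));
    toy_nb_numa_nodes = 0;
    toy_max_numa_nodeid = 0;
}

/*
 * Mirrors parse_numa_node() after the patch: the counter is bumped
 * here, so CLI nodes and the implicit node are counted the same way.
 * nodeid < 0 means "no nodeid given" and is assumed to default to
 * the next index, i.e. the current node count.
 */
static void toy_parse_numa_node(int nodeid)
{
    int nodenr = nodeid >= 0 ? nodeid : toy_nb_numa_nodes;

    assert(nodenr < TOY_MAX_NODES);
    toy_present[nodenr] = true;
    if (nodenr + 1 > toy_max_numa_nodeid) {
        toy_max_numa_nodeid = nodenr + 1;
    }
    toy_nb_numa_nodes++;
}

/* Mirrors parse_numa_opts(): CLI nodes first, then the implicit node,
 * reading ram_slots from the machine state rather than a parameter. */
static void toy_parse_numa_opts(const ToyMachine *ms,
                                const int *cli_nodeids, int n_cli)
{
    for (int i = 0; i < n_cli; i++) {
        toy_parse_numa_node(cli_nodeids[i]);
    }
    if (ms->ram_slots > 0 && toy_nb_numa_nodes == 0 &&
        ms->auto_enable_numa_with_memhp) {
        toy_parse_numa_node(-1); /* like an empty NumaNodeOptions */
    }
    assert(toy_max_numa_nodeid <= TOY_MAX_NODES);
}
```

With ram_slots = 4, the flag set and no CLI nodes, toy_parse_numa_opts() leaves exactly one node (node 0) present; with two CLI nodes it adds nothing extra, and with the flag cleared it adds no node at all.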