Message ID | 1342499490-7061-1-git-send-email-chegu_vinod@hp.com (mailing list archive)
---|---
State | New, archived
On Mon, Jul 16, 2012 at 09:31:30PM -0700, Chegu Vinod wrote:
> Changes since v3:
> - using bitmap_set() instead of set_bit() in the numa_add() routine.
> - removed the call to bitmap_zero() since bitmap_new() also zeroes the bitmap.
> - Rebased to the latest qemu.

Tested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

>
> Changes since v2:
> - Using "unsigned long *" for node_cpumask[].
> - Use bitmap_new() instead of g_malloc0() for allocation.
> - Don't rely on "max_cpus" since it may not be initialized
>   before the numa-related qemu options are parsed & processed.
>
> Note: Continuing to use a new constant for allocation of
>   the mask. (This constant is currently set to 255 since
>   with an 8-bit APIC ID, VCPUs can range from 0-254 in a
>   guest. The APIC ID 255 (0xFF) is reserved for broadcast.)
>
> Changes since v1:
>
> - Use bitmap functions that are already in qemu (instead
>   of the cpu_set_t macros from sched.h).
> - Added a check for endvalue >= max_cpus.
> - Fix to address the round-robin assignment when
>   CPUs are not explicitly specified.
> -----------------------------------------------
>
> v1:
> --
>
> The -numa option to qemu is used to create [fake] numa nodes
> and expose them to the guest OS instance.
>
> There are a couple of issues with the -numa option:
>
> a) The maximum number of VCPUs that can be specified for a guest
>    while using qemu's -numa option is 64. Due to a typecasting issue,
>    when the number of VCPUs is > 32 the VCPUs don't show up
>    under the specified [fake] numa nodes.
>
> b) KVM currently has support for 160 VCPUs per guest, while
>    qemu's -numa option only has support for up to 64 VCPUs
>    per guest.
>
> This patch addresses these two issues.
>
> Below are examples of (a) and (b):
>
> a) >32 VCPUs are specified with the -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> 71:01:01 \
> -net tap,ifname=tap0,script=no,downscript=no \
> -vnc :4
>
> ...
>
> Upstream qemu:
> --------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59
> node 2 size: 131072 MB
> node 3 cpus: 30
> node 3 size: 131072 MB
> node 4 cpus:
> node 4 size: 131072 MB
> node 5 cpus: 31
> node 5 size: 131072 MB
>
> With the patch applied:
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 131072 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 131072 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 131072 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 131072 MB
>
> b) >64 VCPUs specified with the -numa option:
>
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4
>
> ...
>
> Upstream qemu:
> --------------
>
> only 63 CPUs in NUMA mode supported.
> only 64 CPUs in NUMA mode supported.
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 75 76 77 78 79
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61
> node 2 size: 65536 MB
> node 3 cpus: 30 62
> node 3 size: 65536 MB
> node 4 cpus:
> node 4 size: 65536 MB
> node 5 cpus:
> node 5 size: 65536 MB
> node 6 cpus: 31 63
> node 6 size: 65536 MB
> node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69
> node 7 size: 65536 MB
>
> With the patch applied:
> -----------------------
>
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 65536 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 65536 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 65536 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 65536 MB
> node 6 cpus: 60 61 62 63 64 65 66 67 68 69
> node 6 size: 65536 MB
> node 7 cpus: 70 71 72 73 74 75 76 77 78 79
>
> Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>, Jim Hull <jim.hull@hp.com>, Craig Hada <craig.hada@hp.com>
> ---
>  cpus.c   |    3 ++-
>  hw/pc.c  |    3 ++-
>  sysemu.h |    3 ++-
>  vl.c     |   43 +++++++++++++++++++++----------------------
>  4 files changed, 27 insertions(+), 25 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index b182b3d..acccd08 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -36,6 +36,7 @@
>  #include "cpus.h"
>  #include "qtest.h"
>  #include "main-loop.h"
> +#include "bitmap.h"
>
>  #ifndef _WIN32
>  #include "compatfd.h"
> @@ -1145,7 +1146,7 @@ void set_numa_modes(void)
>
>      for (env = first_cpu; env != NULL; env = env->next_cpu) {
>          for (i = 0; i < nb_numa_nodes; i++) {
> -            if (node_cpumask[i] & (1 << env->cpu_index)) {
> +            if (test_bit(env->cpu_index, node_cpumask[i])) {
>                  env->numa_node = i;
>              }
>          }
> diff --git a/hw/pc.c b/hw/pc.c
> index c7e9ab3..2edcc07 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -48,6 +48,7 @@
>  #include "memory.h"
>  #include "exec-memory.h"
>  #include "arch_init.h"
> +#include "bitmap.h"
>
>  /* output Bochs bios info messages */
>  //#define DEBUG_BIOS
> @@ -639,7 +640,7 @@ static void *bochs_bios_init(void)
>      numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
>      for (i = 0; i < max_cpus; i++) {
>          for (j = 0; j < nb_numa_nodes; j++) {
> -            if (node_cpumask[j] & (1 << i)) {
> +            if (test_bit(i, node_cpumask[j])) {
>                  numa_fw_cfg[i + 1] = cpu_to_le64(j);
>                  break;
>              }
> diff --git a/sysemu.h b/sysemu.h
> index 6540c79..4669348 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -134,9 +134,10 @@ extern uint8_t qemu_extra_params_fw[2];
>  extern QEMUClock *rtc_clock;
>
>  #define MAX_NODES 64
> +#define MAX_CPUMASK_BITS 255
>  extern int nb_numa_nodes;
>  extern uint64_t node_mem[MAX_NODES];
> -extern uint64_t node_cpumask[MAX_NODES];
> +extern unsigned long *node_cpumask[MAX_NODES];
>
>  #define MAX_OPTION_ROMS 16
>  typedef struct QEMUOptionRom {
> diff --git a/vl.c b/vl.c
> index 46248b9..35a641b 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -28,6 +28,7 @@
>  #include <errno.h>
>  #include <sys/time.h>
>  #include <zlib.h>
> +#include "bitmap.h"
>
>  /* Needed early for CONFIG_BSD etc. */
>  #include "config-host.h"
> @@ -240,7 +241,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order
>
>  int nb_numa_nodes;
>  uint64_t node_mem[MAX_NODES];
> -uint64_t node_cpumask[MAX_NODES];
> +unsigned long *node_cpumask[MAX_NODES];
>
>  uint8_t qemu_uuid[16];
>
> @@ -951,6 +952,8 @@ static void numa_add(const char *optarg)
>      unsigned long long value, endvalue;
>      int nodenr;
>
> +    value = endvalue = 0ULL;
> +
>      optarg = get_opt_name(option, 128, optarg, ',') + 1;
>      if (!strcmp(option, "node")) {
>          if (get_param_value(option, 128, "nodeid", optarg) == 0) {
> @@ -970,27 +973,22 @@ static void numa_add(const char *optarg)
>              }
>              node_mem[nodenr] = sval;
>          }
> -        if (get_param_value(option, 128, "cpus", optarg) == 0) {
> -            node_cpumask[nodenr] = 0;
> -        } else {
> +        if (get_param_value(option, 128, "cpus", optarg) != 0) {
>              value = strtoull(option, &endptr, 10);
> -            if (value >= 64) {
> -                value = 63;
> -                fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
> +            if (*endptr == '-') {
> +                endvalue = strtoull(endptr+1, &endptr, 10);
>              } else {
> -                if (*endptr == '-') {
> -                    endvalue = strtoull(endptr+1, &endptr, 10);
> -                    if (endvalue >= 63) {
> -                        endvalue = 62;
> -                        fprintf(stderr,
> -                            "only 63 CPUs in NUMA mode supported.\n");
> -                    }
> -                    value = (2ULL << endvalue) - (1ULL << value);
> -                } else {
> -                    value = 1ULL << value;
> -                }
> +                endvalue = value;
> +            }
> +
> +            if (!(endvalue < MAX_CPUMASK_BITS)) {
> +                endvalue = MAX_CPUMASK_BITS - 1;
> +                fprintf(stderr,
> +                    "A max of %d CPUs are supported in a guest\n",
> +                    MAX_CPUMASK_BITS);
>              }
> -            node_cpumask[nodenr] = value;
> +
> +            bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
>          }
>          nb_numa_nodes++;
>      }
> @@ -2330,7 +2328,7 @@ int main(int argc, char **argv, char **envp)
>
>      for (i = 0; i < MAX_NODES; i++) {
>          node_mem[i] = 0;
> -        node_cpumask[i] = 0;
> +        node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
>      }
>
>      nb_numa_nodes = 0;
> @@ -3468,8 +3466,9 @@ int main(int argc, char **argv, char **envp)
>      }
>
>      for (i = 0; i < nb_numa_nodes; i++) {
> -        if (node_cpumask[i] != 0)
> +        if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
>              break;
> +        }
>      }
>      /* assigning the VCPUs round-robin is easier to implement, guest OSes
>       * must cope with this anyway, because there are BIOSes out there in
> @@ -3477,7 +3476,7 @@ int main(int argc, char **argv, char **envp)
>       */
>      if (i == nb_numa_nodes) {
>          for (i = 0; i < max_cpus; i++) {
> -            node_cpumask[i % nb_numa_nodes] |= 1 << i;
> +            set_bit(i, node_cpumask[i % nb_numa_nodes]);
>          }
>      }
>  }
> --
> 1.7.1
>
Thanks Eduardo!

Hi Anthony, if you are OK with this patch, could you please pull these changes into upstream, or suggest whom I should talk to to get these changes in?

Thanks!
Vinod

-----Original Message-----
From: Eduardo Habkost [mailto:ehabkost@redhat.com]
Sent: Wednesday, July 18, 2012 10:15 AM
To: Vinod, Chegu
Cc: qemu-devel@nongnu.org; aliguori@us.ibm.com; kvm@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option

On Mon, Jul 16, 2012 at 09:31:30PM -0700, Chegu Vinod wrote:
> Changes since v3:
> - using bitmap_set() instead of set_bit() in the numa_add() routine.
> - removed the call to bitmap_zero() since bitmap_new() also zeroes the bitmap.
> - Rebased to the latest qemu.

Tested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

> [...]
Ping? Can anybody help Vinod here? How can we get the attention of some maintainer, to get this pulled in?

On Wed, Jul 18, 2012 at 06:26:52PM +0000, Vinod, Chegu wrote:
> Thanks Eduardo!
>
> Hi Anthony, if you are OK with this patch, could you please pull these changes into upstream, or suggest whom I should talk to to get these changes in?
>
> Thanks!
> Vinod
>
> [...]

--
Eduardo
Thanks, applied.

On Tue, Jul 17, 2012 at 4:31 AM, Chegu Vinod <chegu_vinod@hp.com> wrote:
> Changes since v3:
> - using bitmap_set() instead of set_bit() in the numa_add() routine.
> - removed the call to bitmap_zero() since bitmap_new() also zeroes the bitmap.
> - Rebased to the latest qemu.
>
> [...]
*/ > #include "config-host.h" > @@ -240,7 +241,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order > > int nb_numa_nodes; > uint64_t node_mem[MAX_NODES]; > -uint64_t node_cpumask[MAX_NODES]; > +unsigned long *node_cpumask[MAX_NODES]; > > uint8_t qemu_uuid[16]; > > @@ -951,6 +952,8 @@ static void numa_add(const char *optarg) > unsigned long long value, endvalue; > int nodenr; > > + value = endvalue = 0ULL; > + > optarg = get_opt_name(option, 128, optarg, ',') + 1; > if (!strcmp(option, "node")) { > if (get_param_value(option, 128, "nodeid", optarg) == 0) { > @@ -970,27 +973,22 @@ static void numa_add(const char *optarg) > } > node_mem[nodenr] = sval; > } > - if (get_param_value(option, 128, "cpus", optarg) == 0) { > - node_cpumask[nodenr] = 0; > - } else { > + if (get_param_value(option, 128, "cpus", optarg) != 0) { > value = strtoull(option, &endptr, 10); > - if (value >= 64) { > - value = 63; > - fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n"); > + if (*endptr == '-') { > + endvalue = strtoull(endptr+1, &endptr, 10); > } else { > - if (*endptr == '-') { > - endvalue = strtoull(endptr+1, &endptr, 10); > - if (endvalue >= 63) { > - endvalue = 62; > - fprintf(stderr, > - "only 63 CPUs in NUMA mode supported.\n"); > - } > - value = (2ULL << endvalue) - (1ULL << value); > - } else { > - value = 1ULL << value; > - } > + endvalue = value; > + } > + > + if (!(endvalue < MAX_CPUMASK_BITS)) { > + endvalue = MAX_CPUMASK_BITS - 1; > + fprintf(stderr, > + "A max of %d CPUs are supported in a guest\n", > + MAX_CPUMASK_BITS); > } > - node_cpumask[nodenr] = value; > + > + bitmap_set(node_cpumask[nodenr], value, endvalue-value+1); > } > nb_numa_nodes++; > } > @@ -2330,7 +2328,7 @@ int main(int argc, char **argv, char **envp) > > for (i = 0; i < MAX_NODES; i++) { > node_mem[i] = 0; > - node_cpumask[i] = 0; > + node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS); > } > > nb_numa_nodes = 0; > @@ -3468,8 +3466,9 @@ int main(int argc, char **argv, 
char **envp) > } > > for (i = 0; i < nb_numa_nodes; i++) { > - if (node_cpumask[i] != 0) > + if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) { > break; > + } > } > /* assigning the VCPUs round-robin is easier to implement, guest OSes > * must cope with this anyway, because there are BIOSes out there in > @@ -3477,7 +3476,7 @@ int main(int argc, char **argv, char **envp) > */ > if (i == nb_numa_nodes) { > for (i = 0; i < max_cpus; i++) { > - node_cpumask[i % nb_numa_nodes] |= 1 << i; > + set_bit(i, node_cpumask[i % nb_numa_nodes]); > } > } > } > -- > 1.7.1