Message ID | 20160406172349.25877.14008.stgit@Solace.fritz.box (mailing list archive) |
---|---|
State | New, archived |

On 06/04/16 19:23, Dario Faggioli wrote:
> In fact, credit2 uses CPU topology to decide how to arrange
> its internal runqueues. Before this change, only 'one runqueue
> per socket' was allowed. However, experiments have shown that,
> for instance, having one runqueue per physical core improves
> performance, especially in case hyperthreading is available.
>
> In general, it makes sense to allow users to pick one runqueue
> arrangement at boot time, so that:
>  - more experiments can be easily performed to even better
>    assess and improve performance;
>  - one can select the best configuration for his specific
>    use case and/or hardware.
>
> This patch enables the above.
>
> Note that, for correctly arranging runqueues to be per-core,
> just checking cpu_to_core() on the host CPUs is not enough.
> In fact, cores (and hyperthreads) on different sockets, can
> have the same core (and thread) IDs! We, therefore, need to
> check whether the full topology of two CPUs matches, for
> them to be put in the same runqueue.
>
> Note also that the default (although not functional) for
> credit2, since now, has been per-socket runqueue. This patch
> leaves things that way, to avoid mixing policy and technical
> changes.
>
> Finally, it would be a nice feature to be able to select
> a particular runqueue arrangement, even when creating a
> Credit2 cpupool. This is left as future work.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>

With the one comment below addressed:

Reviewed-by: Juergen Gross <jgross@suse.com>

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Uma Sharma <uma.sharma523@gmail.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
> Changes from v1:
>  * fix bug in parameter parsing, and start using strcmp()
>    for that, as requested during review.
> ---
>  docs/misc/xen-command-line.markdown | 19 +++++++++
>  xen/common/sched_credit2.c          | 76 +++++++++++++++++++++++++++++++++--
>  2 files changed, 90 insertions(+), 5 deletions(-)
>

...

> @@ -2006,7 +2067,10 @@ cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>          BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
>                 cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
>
> -        if ( cpu_to_socket(cpumask_first(&rqd->active)) == cpu_to_socket(cpu) )
> +        if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
> +             (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
> +             (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) ||
> +             (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu)) )
>              break;
>      }
>
> @@ -2170,6 +2234,8 @@ csched2_init(struct scheduler *ops)
>      printk(" load_window_shift: %d\n", opt_load_window_shift);
>      printk(" underload_balance_tolerance: %d\n", opt_underload_balance_tolerance);
>      printk(" overload_balance_tolerance: %d\n", opt_overload_balance_tolerance);
> +    printk(" runqueues arrangement: per-%s\n",
> +           opt_runqueue == OPT_RUNQUEUE_CORE ? "core" : "socket");

I asked this before: shouldn't the options "node" and "all" be
respected here, too?

Juergen

On 07/04/16 06:04, Juergen Gross wrote:
> On 06/04/16 19:23, Dario Faggioli wrote:
>> In fact, credit2 uses CPU topology to decide how to arrange
>> its internal runqueues. Before this change, only 'one runqueue
>> per socket' was allowed. However, experiments have shown that,
>> for instance, having one runqueue per physical core improves
>> performance, especially in case hyperthreading is available.
>>
>> In general, it makes sense to allow users to pick one runqueue
>> arrangement at boot time, so that:
>>  - more experiments can be easily performed to even better
>>    assess and improve performance;
>>  - one can select the best configuration for his specific
>>    use case and/or hardware.
>>
>> This patch enables the above.
>>
>> Note that, for correctly arranging runqueues to be per-core,
>> just checking cpu_to_core() on the host CPUs is not enough.
>> In fact, cores (and hyperthreads) on different sockets, can
>> have the same core (and thread) IDs! We, therefore, need to
>> check whether the full topology of two CPUs matches, for
>> them to be put in the same runqueue.
>>
>> Note also that the default (although not functional) for
>> credit2, since now, has been per-socket runqueue. This patch
>> leaves things that way, to avoid mixing policy and technical
>> changes.
>>
>> Finally, it would be a nice feature to be able to select
>> a particular runqueue arrangement, even when creating a
>> Credit2 cpupool. This is left as future work.
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>
>
> With the one comment below addressed:
>
> Reviewed-by: Juergen Gross <jgross@suse.com>
>
>> ---
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> Cc: Uma Sharma <uma.sharma523@gmail.com>
>> Cc: Juergen Gross <jgross@suse.com>
>> ---
>> Changes from v1:
>>  * fix bug in parameter parsing, and start using strcmp()
>>    for that, as requested during review.
>> ---
>>  docs/misc/xen-command-line.markdown | 19 +++++++++
>>  xen/common/sched_credit2.c          | 76 +++++++++++++++++++++++++++++++++--
>>  2 files changed, 90 insertions(+), 5 deletions(-)
>>
>
> ...
>
>> @@ -2006,7 +2067,10 @@ cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>>          BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
>>                 cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
>>
>> -        if ( cpu_to_socket(cpumask_first(&rqd->active)) == cpu_to_socket(cpu) )
>> +        if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
>> +             (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
>> +             (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) ||
>> +             (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu)) )
>>              break;
>>      }
>>
>> @@ -2170,6 +2234,8 @@ csched2_init(struct scheduler *ops)
>>      printk(" load_window_shift: %d\n", opt_load_window_shift);
>>      printk(" underload_balance_tolerance: %d\n", opt_underload_balance_tolerance);
>>      printk(" overload_balance_tolerance: %d\n", opt_overload_balance_tolerance);
>> +    printk(" runqueues arrangement: per-%s\n",
>> +           opt_runqueue == OPT_RUNQUEUE_CORE ? "core" : "socket");
>
> I asked this before: shouldn't the options "node" and "all" be
> respected here, too?

Dario, would it make sense to put the string names ("core", "socket",
&c) in an array, then have both parse_credit2_runqueue() iterate over
the array to find the appropriate numeric value, and have this use the
array to convert from the numeric value to a string?

 -George

On Thu, 2016-04-07 at 16:04 +0100, George Dunlap wrote:
> On 07/04/16 06:04, Juergen Gross wrote:
> > On 06/04/16 19:23, Dario Faggioli wrote:
> > > @@ -2170,6 +2234,8 @@ csched2_init(struct scheduler *ops)
> > >      printk(" load_window_shift: %d\n", opt_load_window_shift);
> > >      printk(" underload_balance_tolerance: %d\n",
> > >             opt_underload_balance_tolerance);
> > >      printk(" overload_balance_tolerance: %d\n",
> > >             opt_overload_balance_tolerance);
> > > +    printk(" runqueues arrangement: per-%s\n",
> > > +           opt_runqueue == OPT_RUNQUEUE_CORE ? "core" : "socket");
> > I asked this before: shouldn't the options "node" and "all" be
> > respected here, too?
> Dario, would it make sense to put the string names ("core", "socket",
> &c) in an array, then have both parse_credit2_runqueue() iterate over
> the array to find the appropriate numeric value, and have this use the
> array to convert from the numeric value to a string?
>
Ok, I'll do that.

Even if I do, though, I can't get rid of the OPT_RUNQUEUE_CORE, etc.,
symbols, as I need to figure out what the numeric value found during
parsing actually means in cpu_to_runqueue().

I know you're not mentioning this, but I felt like I better make this
clear, in case one would expect for those to go away too.

In any case, you'll see this in the patch.

Thanks and Regards,
Dario

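George's suggestion can be sketched as one name table shared by the parser and by the boot-time message. This is only a minimal standalone illustration, not the code from any follow-up patch: `struct runqueue_opt`, `runqueue_opts[]`, `runqueue_opt_from_name()` and `runqueue_opt_name()` are hypothetical names, and only the `OPT_RUNQUEUE_*` values are taken from the patch under review.

```c
#include <stddef.h>
#include <string.h>

/* Numeric values as defined in the patch under review. */
#define OPT_RUNQUEUE_CORE   1
#define OPT_RUNQUEUE_SOCKET 2
#define OPT_RUNQUEUE_NODE   3
#define OPT_RUNQUEUE_ALL    4

/* Hypothetical table pairing each option string with its numeric value. */
struct runqueue_opt {
    const char *name;
    int value;
};

static const struct runqueue_opt runqueue_opts[] = {
    { "core",   OPT_RUNQUEUE_CORE   },
    { "socket", OPT_RUNQUEUE_SOCKET },
    { "node",   OPT_RUNQUEUE_NODE   },
    { "all",    OPT_RUNQUEUE_ALL    },
};

#define NR_RUNQUEUE_OPTS (sizeof(runqueue_opts) / sizeof(runqueue_opts[0]))

/* String -> value, for the parser. Returns -1 if the name is unknown. */
static int runqueue_opt_from_name(const char *s)
{
    size_t i;

    for ( i = 0; i < NR_RUNQUEUE_OPTS; i++ )
        if ( !strcmp(s, runqueue_opts[i].name) )
            return runqueue_opts[i].value;

    return -1;
}

/* Value -> string, for the boot message. Returns "unknown" on a bad value. */
static const char *runqueue_opt_name(int value)
{
    size_t i;

    for ( i = 0; i < NR_RUNQUEUE_OPTS; i++ )
        if ( runqueue_opts[i].value == value )
            return runqueue_opts[i].name;

    return "unknown";
}
```

With a table like this, the printk in csched2_init() could print the result of the value-to-string lookup for all four arrangements instead of choosing between "core" and "socket" only. The `OPT_RUNQUEUE_*` symbols stay, as Dario notes, because cpu_to_runqueue() still compares against them.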
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index ca77e3b..0047f94 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -469,6 +469,25 @@ combination with the `low_crashinfo` command line option.
 ### credit2\_load\_window\_shift
 > `= <integer>`
 
+### credit2\_runqueue
+> `= core | socket | node | all`
+
+> Default: `socket`
+
+Specify how host CPUs are arranged in runqueues. Runqueues are kept
+balanced with respect to the load generated by the vCPUs running on
+them. Smaller runqueues (as in with `core`) means more accurate load
+balancing (for instance, it will deal better with hyperthreading),
+but also more overhead.
+
+Available alternatives, with their meaning, are:
+* `core`: one runqueue per each physical core of the host;
+* `socket`: one runqueue per each physical socket (which often,
+            but not always, matches a NUMA node) of the host;
+* `node`: one runqueue per each NUMA node of the host;
+* `all`: just one runqueue shared by all the logical pCPUs of
+         the host
+
 ### dbgp
 > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index a61a45a..20f8d35 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -81,10 +81,6 @@
  * Credits are "reset" when the next vcpu in the runqueue is less than
  * or equal to zero. At that point, everyone's credits are "clipped"
  * to a small value, and a fixed credit is added to everyone.
- *
- * The plan is for all cores that share an L2 will share the same
- * runqueue. At the moment, there is one global runqueue for all
- * cores.
  */
 
 /*
@@ -193,6 +189,55 @@ static int __read_mostly opt_overload_balance_tolerance = -3;
 integer_param("credit2_balance_over", opt_overload_balance_tolerance);
 
 /*
+ * Runqueue organization.
+ *
+ * The various cpus are to be assigned each one to a runqueue, and we
+ * want that to happen basing on topology. At the moment, it is possible
+ * to choose to arrange runqueues to be:
+ *
+ * - per-core: meaning that there will be one runqueue per each physical
+ *             core of the host. This will happen if the opt_runqueue
+ *             parameter is set to 'core';
+ *
+ * - per-node: meaning that there will be one runqueue per each physical
+ *             NUMA node of the host. This will happen if the opt_runqueue
+ *             parameter is set to 'node';
+ *
+ * - per-socket: meaning that there will be one runqueue per each physical
+ *               socket (AKA package, which often, but not always, also
+ *               matches a NUMA node) of the host; This will happen if
+ *               the opt_runqueue parameter is set to 'socket';
+ *
+ * - global: meaning that there will be only one runqueue to which all the
+ *           (logical) processors of the host belongs. This will happen if
+ *           the opt_runqueue parameter is set to 'all'.
+ *
+ * Depending on the value of opt_runqueue, therefore, cpus that are part of
+ * either the same physical core, or of the same physical socket, will be
+ * put together to form runqueues.
+ */
+#define OPT_RUNQUEUE_CORE   1
+#define OPT_RUNQUEUE_SOCKET 2
+#define OPT_RUNQUEUE_NODE   3
+#define OPT_RUNQUEUE_ALL    4
+static int __read_mostly opt_runqueue = OPT_RUNQUEUE_SOCKET;
+
+static void parse_credit2_runqueue(const char *s)
+{
+    if ( !strcmp(s, "core") )
+        opt_runqueue = OPT_RUNQUEUE_CORE;
+    else if ( !strcmp(s, "socket") )
+        opt_runqueue = OPT_RUNQUEUE_SOCKET;
+    else if ( !strcmp(s, "node") )
+        opt_runqueue = OPT_RUNQUEUE_NODE;
+    else if ( !strcmp(s, "all") )
+        opt_runqueue = OPT_RUNQUEUE_ALL;
+    else
+        printk("WARNING, unrecognized value of credit2_runqueue option!\n");
+}
+custom_param("credit2_runqueue", parse_credit2_runqueue);
+
+/*
  * Per-runqueue data
  */
 struct csched2_runqueue_data {
@@ -1974,6 +2019,22 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
     cpumask_clear_cpu(rqi, &prv->active_queues);
 }
 
+static inline bool_t same_node(unsigned int cpua, unsigned int cpub)
+{
+    return cpu_to_node(cpua) == cpu_to_node(cpub);
+}
+
+static inline bool_t same_socket(unsigned int cpua, unsigned int cpub)
+{
+    return cpu_to_socket(cpua) == cpu_to_socket(cpub);
+}
+
+static inline bool_t same_core(unsigned int cpua, unsigned int cpub)
+{
+    return same_socket(cpua, cpub) &&
+           cpu_to_core(cpua) == cpu_to_core(cpub);
+}
+
 static unsigned int
 cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
 {
@@ -2006,7 +2067,10 @@ cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
         BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
                cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
 
-        if ( cpu_to_socket(cpumask_first(&rqd->active)) == cpu_to_socket(cpu) )
+        if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
+             (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
+             (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) ||
+             (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu)) )
             break;
     }
 
@@ -2170,6 +2234,8 @@ csched2_init(struct scheduler *ops)
     printk(" load_window_shift: %d\n", opt_load_window_shift);
     printk(" underload_balance_tolerance: %d\n", opt_underload_balance_tolerance);
     printk(" overload_balance_tolerance: %d\n", opt_overload_balance_tolerance);
+    printk(" runqueues arrangement: per-%s\n",
+           opt_runqueue == OPT_RUNQUEUE_CORE ? "core" : "socket");
 
     if ( opt_load_window_shift < LOADAVG_WINDOW_SHIFT_MIN )
     {
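
The matching rule in the cpu_to_runqueue() hunk, and the reason same_core() also checks the socket, can be tried outside of Xen with a made-up topology table. The sketch below assumes a host with one NUMA node, two sockets, two cores per socket and two threads per core, where core IDs restart from 0 on each socket. `struct cpu_topo`, `topo[]`, `enum rq_policy`, `same_core_id_only()`, `same_runqueue()` and `count_runqueues()` are illustrative names, not Xen interfaces.

```c
#include <stdbool.h>

#define NR_CPUS 8

/* Illustrative stand-ins for the four credit2_runqueue choices. */
enum rq_policy { RQ_CORE, RQ_SOCKET, RQ_NODE, RQ_ALL };

/* Hypothetical per-CPU topology record (Xen uses its own lookup helpers). */
struct cpu_topo {
    unsigned int node, socket, core;
};

/* Assumed host: 1 node, 2 sockets, 2 cores per socket, 2 threads per core. */
static const struct cpu_topo topo[NR_CPUS] = {
    { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 1 }, { 0, 0, 1 },
    { 0, 1, 0 }, { 0, 1, 0 }, { 0, 1, 1 }, { 0, 1, 1 },
};

/* Naive check: core ID alone, which repeats across sockets. */
static bool same_core_id_only(unsigned int a, unsigned int b)
{
    return topo[a].core == topo[b].core;
}

/* Would CPUs a and b share a runqueue under policy p? */
static bool same_runqueue(enum rq_policy p, unsigned int a, unsigned int b)
{
    switch ( p )
    {
    case RQ_ALL:
        return true;
    case RQ_NODE:
        return topo[a].node == topo[b].node;
    case RQ_SOCKET:
        return topo[a].socket == topo[b].socket;
    case RQ_CORE:
        /* Socket must match too, mirroring same_core() in the patch. */
        return topo[a].socket == topo[b].socket &&
               topo[a].core == topo[b].core;
    }
    return false;
}

/*
 * Put each CPU in the first runqueue whose first member matches it,
 * opening a new runqueue otherwise. Returns the number of runqueues.
 */
static unsigned int count_runqueues(enum rq_policy p)
{
    unsigned int first_cpu[NR_CPUS]; /* first CPU placed in each runqueue */
    unsigned int nr_rq = 0;

    for ( unsigned int cpu = 0; cpu < NR_CPUS; cpu++ )
    {
        unsigned int rq;

        for ( rq = 0; rq < nr_rq; rq++ )
            if ( same_runqueue(p, first_cpu[rq], cpu) )
                break;

        if ( rq == nr_rq )
            first_cpu[nr_rq++] = cpu;
    }

    return nr_rq;
}
```

On this assumed table, CPU 0 and CPU 4 have the same core ID but sit on different sockets, so same_core_id_only() would wrongly treat them as one core. With the socket check in place, count_runqueues() gives 4 runqueues for per-core, 2 for per-socket, and 1 each for per-node and all.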