
[RFC,44/49] xen: round up max vcpus to scheduling granularity

Message ID 20190329150934.17694-45-jgross@suse.com (mailing list archive)
State Superseded
Series xen: add core scheduling support

Commit Message

Jürgen Groß March 29, 2019, 3:09 p.m. UTC
Make sure the number of vcpus is always a multiple of the scheduling
granularity. Note that we don't support a scheduling granularity above
one on ARM.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/dom0_build.c | 1 +
 xen/common/domain.c       | 1 +
 xen/common/domctl.c       | 1 +
 xen/include/xen/sched.h   | 5 +++++
 4 files changed, 8 insertions(+)
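
To illustrate the rounding the patch below applies to the requested vcpu
count, here is a minimal standalone sketch. DIV_ROUND_UP is written out
locally to mirror Xen's macro, and the granularity value is made up for
the example; in the real code sched_granularity is set by the scheduler.

/* Standalone sketch of the rounding applied to max_vcpus in this patch.
 * DIV_ROUND_UP mirrors Xen's macro; sched_granularity is a made-up value
 * here (in Xen it is established at boot). */
#include <stdio.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static unsigned int sched_granularity = 2;  /* e.g. core scheduling, 2-thread cores */

static unsigned int sched_max_vcpus(unsigned int n_vcpus)
{
    return DIV_ROUND_UP(n_vcpus, sched_granularity) * sched_granularity;
}

int main(void)
{
    printf("%u -> %u\n", 3u, sched_max_vcpus(3));  /* 3 -> 4 */
    printf("%u -> %u\n", 4u, sched_max_vcpus(4));  /* 4 -> 4 */
    return 0;
}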

Comments

Andrew Cooper April 1, 2019, 8:50 a.m. UTC | #1
On 29/03/2019 15:09, Juergen Gross wrote:
> Make sure the number of vcpus is always a multiple of the scheduling
> granularity. Note that we don't support a scheduling granularity above
> one on ARM.

I'm afraid that I don't think this is a clever move.  In turn, this
brings into question the approach to idle handling.

Firstly, with a proposed socket granularity, this would be 128 on some
systems which exist today.  Furthermore, consider the case where
cpupool0 has a granularity of 1, and a second pool has a granularity of
2.  A domain can be created with an odd number of vcpus and operate in
pool0 fine, but can't now be moved to pool1.

If at all possible, I think it would be better to try and reuse the idle
cpus for holes like this.  Seeing as you've been playing with this code
a lot, what is your assessment?

~Andrew
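
As a standalone illustration of the cpupool concern above (not code from
this series, where the granularity is still a single global value), a
hypothetical per-pool granularity check would accept an odd vcpu count in
a granularity-1 pool but reject it in a granularity-2 pool:

/* Hypothetical per-pool check, for illustration only. */
#include <stdbool.h>
#include <stdio.h>

static bool fits_pool(unsigned int nr_vcpus, unsigned int pool_granularity)
{
    return (nr_vcpus % pool_granularity) == 0;
}

int main(void)
{
    unsigned int nr_vcpus = 3;                               /* odd vcpu count */

    printf("pool0 (gran 1): %d\n", fits_pool(nr_vcpus, 1));  /* 1: fits */
    printf("pool1 (gran 2): %d\n", fits_pool(nr_vcpus, 2));  /* 0: move refused */
    return 0;
}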
Jürgen Groß April 1, 2019, 9:47 a.m. UTC | #2
On 01/04/2019 10:50, Andrew Cooper wrote:
> On 29/03/2019 15:09, Juergen Gross wrote:
>> Make sure the number of vcpus is always a multiple of the scheduling
>> granularity. Note that we don't support a scheduling granularity above
>> one on ARM.
> 
> I'm afraid that I don't think this is a clever move.  In turn, this
> brings into question the approach to idle handling.
> 
> Firstly, with a proposed socket granularity, this would be 128 on some
> systems which exist today.  Furthermore, consider the case where
> cpupool0 has a granularity of 1, and a second pool has a granularity of
> 2.  A domain can be created with an odd number of vcpus and operate in
> pool0 fine, but can't now be moved to pool1.

For now granularity is the same for all pools, but I plan to enhance
that in future.

The answer to that problem might be either to allow for later addition
of dummy vcpus (e.g. by sizing only the vcpu pointer array to the needed
number), or to really disallow moving such a domain between pools.

> If at all possible, I think it would be better to try and reuse the idle
> cpus for holes like this.  Seeing as you've been playing with this code
> a lot, what is your assessment?

This would be rather complicated. I'd either need to switch vcpus
dynamically in schedule items, or I'd need to special case the idle
vcpus in _lots_ of places.


Juergen
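
One reading of the pointer-array idea above, as a hedged standalone sketch
with made-up stub types (not code from this series): size the vcpu pointer
array for the worst-case granularity, create only the requested vcpus, and
leave the trailing slots NULL so dummy vcpus could be added on a later pool
move.

/* Illustrative sketch only -- stub types, not Xen code. */
#include <stdlib.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct vcpu;                     /* opaque stand-in for Xen's struct vcpu */

struct domain_stub {
    unsigned int max_vcpus;      /* vcpus actually created */
    struct vcpu **vcpu;          /* array sized for the rounded-up count */
};

int alloc_vcpu_array(struct domain_stub *d, unsigned int requested,
                     unsigned int max_granularity)
{
    unsigned int slots = DIV_ROUND_UP(requested, max_granularity) *
                         max_granularity;

    /* Only the pointer array is over-sized; the extra slots stay NULL
     * until (if ever) dummy vcpus are plugged in. */
    d->vcpu = calloc(slots, sizeof(*d->vcpu));
    if ( d->vcpu == NULL )
        return -1;

    d->max_vcpus = requested;
    return 0;
}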
Jürgen Groß April 2, 2019, 7:49 a.m. UTC | #3
On 01/04/2019 11:47, Juergen Gross wrote:
> On 01/04/2019 10:50, Andrew Cooper wrote:
>> On 29/03/2019 15:09, Juergen Gross wrote:
>>> Make sure the number of vcpus is always a multiple of the scheduling
>>> granularity. Note that we don't support a scheduling granularity above
>>> one on ARM.
>>
>> I'm afraid that I don't think this is a clever move.  In turn, this
>> brings into question the approach to idle handling.
>>
>> Firstly, with a proposed socket granularity, this would be 128 on some
>> systems which exist today.  Furthermore, consider the case where
>> cpupool0 has a granularity of 1, and a second pool has a granularity of
>> 2.  A domain can be created with an odd number of vcpus and operate in
>> pool0 fine, but can't now be moved to pool1.
> 
> For now granularity is the same for all pools, but I plan to enhance
> that in future.
> 
> The answer to that problem might be either to allow for later addition
> of dummy vcpus (e.g. by sizing only the vcpu pointer array to the needed
> number), or to really disallow moving such a domain between pools.
> 
>> If at all possible, I think it would be better to try and reuse the idle
>> cpus for holes like this.  Seeing as you've been playing with this code
>> a lot, what is your assessment?
> 
> This would be rather complicated. I'd either need to switch vcpus
> dynamically in schedule items, or I'd need to special case the idle
> vcpus in _lots_ of places.

I have thought more about this and maybe I have found a way to make that
less intrusive than I thought in the beginning.

I'll give it a try...


Juergen

Patch

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 77b5646424..76a81dd4a9 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -258,6 +258,7 @@  unsigned int __init dom0_max_vcpus(void)
         max_vcpus = opt_dom0_max_vcpus_min;
     if ( opt_dom0_max_vcpus_max < max_vcpus )
         max_vcpus = opt_dom0_max_vcpus_max;
+    max_vcpus = sched_max_vcpus(max_vcpus);
     limit = dom0_pvh ? HVM_MAX_VCPUS : MAX_VIRT_CPUS;
     if ( max_vcpus > limit )
         max_vcpus = limit;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index b448d20d40..d338a2204c 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -290,6 +290,7 @@  static int sanitise_domain_config(struct xen_domctl_createdomain *config)
         return -EINVAL;
     }
 
+    config->max_vcpus = sched_max_vcpus(config->max_vcpus);
     if ( config->max_vcpus < 1 )
     {
         dprintk(XENLOG_INFO, "No vCPUS\n");
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index ccde1ba706..80837a2a5e 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -542,6 +542,7 @@  long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     {
         unsigned int i, max = op->u.max_vcpus.max;
 
+        max = sched_max_vcpus(max);
         ret = -EINVAL;
         if ( (d == current->domain) || /* no domain_pause() */
              (max != d->max_vcpus) )   /* max_vcpus set up in createdomain */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 52a1abfca9..314a453a60 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -490,6 +490,11 @@  extern struct vcpu *idle_vcpu[NR_CPUS];
 
 extern unsigned int sched_granularity;
 
+static inline unsigned int sched_max_vcpus(unsigned int n_vcpus)
+{
+    return DIV_ROUND_UP(n_vcpus, sched_granularity) * sched_granularity;
+}
+
 static inline bool is_system_domain(const struct domain *d)
 {
     return d->domain_id >= DOMID_FIRST_RESERVED;