[23/60] xen/sched: make credit2 scheduler vcpu agnostic.

Message ID	20190528103313.1343-24-jgross@suse.com (mailing list archive)
State	New, archived
Headers
Return-Path: <xen-devel-bounces@lists.xenproject.org>
From: Juergen Gross <jgross@suse.com>
To: xen-devel@lists.xenproject.org
Date: Tue, 28 May 2019 12:32:36 +0200
Message-Id: <20190528103313.1343-24-jgross@suse.com>
In-Reply-To: <20190528103313.1343-1-jgross@suse.com>
References: <20190528103313.1343-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH 23/60] xen/sched: make credit2 scheduler vcpu agnostic.
Precedence: list
Cc: Juergen Gross <jgross@suse.com>, George Dunlap <george.dunlap@eu.citrix.com>, Dario Faggioli <dfaggioli@suse.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: xen-devel-bounces@lists.xenproject.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>
Series	xen: add core scheduling support

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ef29a3d874..0f57135e81 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -45,7 +45,7 @@
 #define TRC_CSCHED2_SCHED_TASKLET TRC_SCHED_CLASS_EVT(CSCHED2, 8)
 #define TRC_CSCHED2_UPDATE_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 9)
 #define TRC_CSCHED2_RUNQ_ASSIGN TRC_SCHED_CLASS_EVT(CSCHED2, 10)
-#define TRC_CSCHED2_UPDATE_VCPU_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 11)
+#define TRC_CSCHED2_UPDATE_UNIT_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 11)
 #define TRC_CSCHED2_UPDATE_RUNQ_LOAD TRC_SCHED_CLASS_EVT(CSCHED2, 12)
 #define TRC_CSCHED2_TICKLE_NEW TRC_SCHED_CLASS_EVT(CSCHED2, 13)
 #define TRC_CSCHED2_RUNQ_MAX_WEIGHT TRC_SCHED_CLASS_EVT(CSCHED2, 14)
@@ -74,13 +74,13 @@
  * Design:
  *
  * VMs "burn" credits based on their weight; higher weight means
- * credits burn more slowly. The highest weight vcpu burns credits at
+ * credits burn more slowly. The highest weight unit burns credits at
  * a rate of 1 credit per nanosecond. Others burn proportionally
  * more.
  *
- * vcpus are inserted into the runqueue by credit order.
+ * units are inserted into the runqueue by credit order.
  *
- * Credits are "reset" when the next vcpu in the runqueue is less than
+ * Credits are "reset" when the next unit in the runqueue is less than
  * or equal to zero. At that point, everyone's credits are "clipped"
  * to a small value, and a fixed credit is added to everyone.
  */
@@ -95,33 +95,33 @@
  *   be given a cap of 25%; a domain that must not use more than 1+1/2 of
  *   physical CPU time, will be given a cap of 150%;
  *
- * - caps are per-domain (not per-vCPU). If a domain has only 1 vCPU, and
- *   a 40% cap, that one vCPU will use 40% of one pCPU. If a somain has 4
- *   vCPUs, and a 200% cap, the equivalent of 100% time on 2 pCPUs will be
- *   split among the v vCPUs. How much each of the vCPUs will actually get,
+ * - caps are per-domain (not per-unit). If a domain has only 1 unit, and
+ *   a 40% cap, that one unit will use 40% of one pCPU. If a somain has 4
+ *   units, and a 200% cap, the equivalent of 100% time on 2 pCPUs will be
+ *   split among the v units. How much each of the units will actually get,
  *   during any given interval of time, is unspecified (as it depends on
  *   various aspects: workload, system load, etc.). For instance, it is
- *   possible that, during a given time interval, 2 vCPUs use 100% each,
+ *   possible that, during a given time interval, 2 units use 100% each,
  *   and the other two use nothing; while during another time interval,
- *   two vCPUs use 80%, one uses 10% and the other 30%; or that each use
+ *   two units use 80%, one uses 10% and the other 30%; or that each use
  *   50% (and so on and so forth).
  *
  * For implementing this, we use the following approach:
  *
  * - each domain is given a 'budget', an each domain has a timer, which
  *   replenishes the domain's budget periodically. The budget is the amount
- *   of time the vCPUs of the domain can use every 'period';
+ *   of time the units of the domain can use every 'period';
  *
  * - the period is CSCHED2_BDGT_REPL_PERIOD, and is the same for all domains
  *   (but each domain has its own timer; so the all are periodic by the same
  *   period, but replenishment of the budgets of the various domains, at
  *   periods boundaries, are not synchronous);
  *
- * - when vCPUs run, they consume budget. When they don't run, they don't
- *   consume budget. If there is no budget left for the domain, no vCPU of
- *   that domain can run. If a vCPU tries to run and finds that there is no
+ * - when units run, they consume budget. When they don't run, they don't
+ *   consume budget. If there is no budget left for the domain, no unit of
+ *   that domain can run. If an unit tries to run and finds that there is no
  *   budget, it blocks.
- *   At whatever time a vCPU wants to run, it must check the domain's budget,
+ *   At whatever time an unit wants to run, it must check the domain's budget,
  *   and if there is some, it can use it.
  *
  * - budget is replenished to the top of the capacity for the domain once
@@ -129,39 +129,39 @@
  *   though, the budget after a replenishment will always be at most equal
  *   to the total capacify of the domain ('tot_budget');
  *
- * - when a budget replenishment occurs, if there are vCPUs that had been
+ * - when a budget replenishment occurs, if there are units that had been
  *   blocked because of lack of budget, they'll be unblocked, and they will
  *   (potentially) be able to run again.
  *
  * Finally, some even more implementation related detail:
  *
- * - budget is stored in a domain-wide pool. vCPUs of the domain that want
+ * - budget is stored in a domain-wide pool. Items of the domain that want
  *   to run go to such pool, and grub some. When they do so, the amount
  *   they grabbed is _immediately_ removed from the pool. This happens in
- *   vcpu_grab_budget();
+ *   unit_grab_budget();
  *
- * - when vCPUs stop running, if they've not consumed all the budget they
+ * - when units stop running, if they've not consumed all the budget they
  *   took, the leftover is put back in the pool. This happens in
- *   vcpu_return_budget();
+ *   unit_return_budget();
  *
- * - the above means that a vCPU can find out that there is no budget and
+ * - the above means that an unit can find out that there is no budget and
  *   block, not only if the cap has actually been reached (for this period),
- *   but also if some other vCPUs, in order to run, have grabbed a certain
+ *   but also if some other units, in order to run, have grabbed a certain
  *   quota of budget, no matter whether they've already used it all or not.
- *   A vCPU blocking because (any form of) lack of budget is said to be
- *   "parked", and such blocking happens in park_vcpu();
+ *   An unit blocking because (any form of) lack of budget is said to be
+ *   "parked", and such blocking happens in park_unit();
  *
- * - when a vCPU stops running, and puts back some budget in the domain pool,
+ * - when an unit stops running, and puts back some budget in the domain pool,
  *   we need to check whether there is someone which has been parked and that
- *   can be unparked. This happens in unpark_parked_vcpus(), called from
+ *   can be unparked. This happens in unpark_parked_units(), called from
  *   csched2_context_saved();
  *
  * - of course, unparking happens also as a consequence of the domain's budget
  *   being replenished by the periodic timer. This also occurs by means of
  *   calling csched2_context_saved() (but from replenish_domain_budget());
  *
- * - parked vCPUs of a domain are kept in a (per-domain) list, called
- *   'parked_vcpus'). Manipulation of the list and of the domain-wide budget
+ * - parked units of a domain are kept in a (per-domain) list, called
+ *   'parked_units'). Manipulation of the list and of the domain-wide budget
  *   pool, must occur only when holding the 'budget_lock'.
  */

@@ -174,9 +174,9 @@
  *     pcpu_schedule_lock() / unit_schedule_lock() (and friends),
  *   * a cpu may (try to) take a "remote" runqueue lock, e.g., for
  *     load balancing;
- *  + serializes runqueue operations (removing and inserting vcpus);
+ *  + serializes runqueue operations (removing and inserting units);
  *  + protects runqueue-wide data in csched2_runqueue_data;
- *  + protects vcpu parameters in csched2_unit for the vcpu in the
+ *  + protects unit parameters in csched2_unit for the unit in the
  *    runqueue.
  *
  * - Private scheduler lock
@@ -190,8 +190,8 @@
  *  + it is per-domain;
  *  + protects, in domains that have an utilization cap;
  *   * manipulation of the total budget of the domain (as it is shared
- *     among all vCPUs of the domain),
- *   * manipulation of the list of vCPUs that are blocked waiting for
+ *     among all units of the domain),
+ *   * manipulation of the list of units that are blocked waiting for
  *     some budget to be available.
  *
  * - Type:
@@ -228,9 +228,9 @@
  */
 #define CSCHED2_CREDIT_INIT MILLISECS(10)
 /*
- * Amount of credit the idle vcpus have. It never changes, as idle
- * vcpus does not consume credits, and it must be lower than whatever
- * amount of credit 'regular' vcpu would end up with.
+ * Amount of credit the idle units have. It never changes, as idle
+ * units does not consume credits, and it must be lower than whatever
+ * amount of credit 'regular' unit would end up with.
  */
 #define CSCHED2_IDLE_CREDIT (-(1U<<30))
 /*
@@ -243,9 +243,9 @@
  * MIN_TIMER.
  */
 #define CSCHED2_MIGRATE_RESIST ((opt_migrate_resist)*MICROSECS(1))
-/* How much to "compensate" a vcpu for L2 migration. */
+/* How much to "compensate" an unit for L2 migration. */
 #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
-/* How tolerant we should be when peeking at runtime of vcpus on other cpus */
+/* How tolerant we should be when peeking at runtime of units on other cpus */
 #define CSCHED2_RATELIMIT_TICKLE_TOLERANCE MICROSECS(50)
 /* Reset: Value below which credit will be reset. */
 #define CSCHED2_CREDIT_RESET 0
@@ -258,7 +258,7 @@
  * Flags
  */
 /*
- * CSFLAG_scheduled: Is this vcpu either running on, or context-switching off,
+ * CSFLAG_scheduled: Is this unit either running on, or context-switching off,
  * a physical cpu?
  *  + Accessed only with runqueue lock held
  *  + Set when chosen as next in csched2_schedule().
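The budget accounting described in the design comment in the hunks above (a domain-wide pool, units grabbing a quota when they start running, returning the leftover when they stop, and parking when nothing is left) can be sketched as a small standalone program fragment. This is only an illustration, not code from sched_credit2.c: the `toy_*` types and functions are hypothetical names, the 'budget_lock' is left out, and both the "take at most one quota per grab" policy and the way an overrun is charged back to the pool are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t toy_time_t;   /* stand-in for s_time_t, in nanoseconds */

struct toy_dom {              /* hypothetical, loosely after csched2_dom */
    toy_time_t tot_budget;    /* budget granted per replenishment period */
    toy_time_t budget;        /* what is currently left in the pool */
};

struct toy_unit {             /* hypothetical, loosely after csched2_unit */
    struct toy_dom *dom;
    toy_time_t quota;         /* most the unit may take in one grab */
    toy_time_t budget;        /* budget currently held by the unit */
    bool parked;              /* blocked for lack of budget */
};

/* Take budget out of the pool immediately; park if the pool is empty. */
static bool toy_grab_budget(struct toy_unit *u)
{
    struct toy_dom *d = u->dom;
    toy_time_t take;

    assert(u->budget == 0);
    if ( d->budget <= 0 )
    {
        u->parked = true;
        return false;
    }
    take = d->budget < u->quota ? d->budget : u->quota;
    d->budget -= take;
    u->budget = take;
    u->parked = false;
    return true;
}

/* The unit ran for 'ran' ns: put the leftover (or the overrun) back. */
static void toy_return_budget(struct toy_unit *u, toy_time_t ran)
{
    u->dom->budget += u->budget - ran;
    u->budget = 0;
}

/* Periodic replenishment: never end up with more than tot_budget. */
static void toy_replenish(struct toy_dom *d)
{
    d->budget += d->tot_budget;
    if ( d->budget > d->tot_budget )
        d->budget = d->tot_budget;
}
```

With a pool of 10000 and a quota of 6000, a third unit is parked once two others have grabbed their share, even though the cap has not actually been consumed yet; it can run again as soon as one of them returns its leftover.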
@@ -280,21 +280,21 @@
 #define __CSFLAG_delayed_runq_add 2
 #define CSFLAG_delayed_runq_add (1U<<__CSFLAG_delayed_runq_add)
 /*
- * CSFLAG_runq_migrate_request: This vcpu is being migrated as a result of a
+ * CSFLAG_runq_migrate_request: This unit is being migrated as a result of a
  * credit2-initiated runq migrate request; migrate it to the runqueue indicated
- * in the svc struct.
+ * in the svc struct.
  */
 #define __CSFLAG_runq_migrate_request 3
 #define CSFLAG_runq_migrate_request (1U<<__CSFLAG_runq_migrate_request)
 /*
- * CSFLAG_vcpu_yield: this vcpu was running, and has called vcpu_yield(). The
+ * CSFLAG_unit_yield: this unit was running, and has called vcpu_yield(). The
  * scheduler is invoked to see if we can give the cpu to someone else, and
- * get back to the yielding vcpu in a while.
+ * get back to the yielding unit in a while.
  */
-#define __CSFLAG_vcpu_yield 4
-#define CSFLAG_vcpu_yield (1U<<__CSFLAG_vcpu_yield)
+#define __CSFLAG_unit_yield 4
+#define CSFLAG_unit_yield (1U<<__CSFLAG_unit_yield)
 /*
- * CSFLAGS_pinned: this vcpu is currently 'pinned', i.e., has its hard
+ * CSFLAGS_pinned: this unit is currently 'pinned', i.e., has its hard
  * affinity set to one and only 1 cpu (and, hence, can only run there).
  */
 #define __CSFLAG_pinned 5
@@ -306,7 +306,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
 /*
  * Load tracking and load balancing
  *
- * Load history of runqueues and vcpus is accounted for by using an
+ * Load history of runqueues and units is accounted for by using an
  * exponential weighted moving average algorithm. However, instead of using
  * fractions,we shift everything to left by the number of bits we want to
  * use for representing the fractional part (Q-format).
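The Q-format trick mentioned in the comment above can be tried in isolation. The sketch below restates the update expression that update_svc_load() uses further down in this patch, avgload + ((delta * (load << P)) >> W) - ((delta * avgload) >> W), as a free-standing function. The name `toy_update_avgload`, the `toy_time_t` typedef and the choice of passing P and W as arguments are assumptions made for illustration; the real function reads them from the scheduler's private data and also handles tracing and negative deltas.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t toy_time_t;   /* stand-in for s_time_t */

/*
 * avgload is kept in Q-format with P fractional bits, so a load of 1.0
 * is stored as (1 << P). The averaging window is 2^W time units long,
 * and delta is the time elapsed since the previous update.
 */
static toy_time_t toy_update_avgload(toy_time_t avgload, toy_time_t load,
                                     toy_time_t delta,
                                     unsigned int P, unsigned int W)
{
    /* More than a whole window has passed: the old history is irrelevant. */
    if ( delta > ((toy_time_t)1 << W) )
        return load << P;

    /* Blend the new load in, weighted by the fraction delta / 2^W. */
    return avgload +
           ((delta * (load << P)) >> W) -
           ((delta * avgload) >> W);
}
```

With P = 18 and W = 20, an entity that starts from an average of 0 and is runnable for half a window ends up at 1 << 17 (0.5 in Q18); after another half window of being runnable the average is 3 << 16 (0.75).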
@@ -326,7 +326,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
  *
  * where W is the length of the window, P the multiplier for transitiong into
  * Q-format fixed point arithmetic and load is the instantaneous load of a
- * runqueue, which basically is the number of runnable vcpus there are on the
+ * runqueue, which basically is the number of runnable units there are on the
  * runqueue (for the meaning of the other terms, look at the doc comment to
  * update_runq_load()).
  *
@@ -338,7 +338,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
  * The maximum possible value for the average load, which we want to store in
  * s_time_t type variables (i.e., we have 63 bits available) is load*P. This
  * means that, with P 18 bits wide, load can occupy 45 bits. This in turn
- * means we can have 2^45 vcpus in each runqueue, before overflow occurs!
+ * means we can have 2^45 units in each runqueue, before overflow occurs!
  *
  * However, it can happen that, at step j+1, if:
  *
@@ -354,13 +354,13 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
  *
  * 2^(63 - 30 - 18) = 2^15 = 32768
  *
- * So 32768 is the maximum number of vcpus the we can have in a runqueue,
+ * So 32768 is the maximum number of units the we can have in a runqueue,
  * at any given time, and still not have problems with the load tracking
  * calculations... and this is more than fine.
  *
  * As a matter of fact, since we are using microseconds granularity, we have
  * W=2^20. So, still with 18 fractional bits and a 1 second long window, there
- * may be 2^25 = 33554432 vcpus in a runq before we have to start thinking
+ * may be 2^25 = 33554432 units in a runq before we have to start thinking
  * about overflow.
  */

@@ -468,7 +468,7 @@ struct csched2_runqueue_data {
     struct list_head runq; /* Ordered list of runnable vms */
     int id; /* ID of this runqueue (-1 if invalid) */

-    int load; /* Instantaneous load (num of non-idle vcpus) */
+    int load; /* Instantaneous load (num of non-idle units) */
     s_time_t load_last_update; /* Last time average was updated */
     s_time_t avgload; /* Decaying queue load */
     s_time_t b_avgload; /* Decaying queue load modified by balancing */
@@ -478,8 +478,8 @@ struct csched2_runqueue_data {
               tickled, /* Have been asked to go through schedule */
               idle; /* Currently idle pcpus */

-    struct list_head svc; /* List of all vcpus assigned to the runqueue */
-    unsigned int max_weight; /* Max weight of the vcpus in this runqueue */
+    struct list_head svc; /* List of all units assigned to the runqueue */
+    unsigned int max_weight; /* Max weight of the units in this runqueue */
     unsigned int pick_bias; /* Last picked pcpu. Start from it next time */
 };

@@ -509,20 +509,20 @@ struct csched2_pcpu {
 };

 /*
- * Virtual CPU
+ * Schedule Item
  */
 struct csched2_unit {
     struct csched2_dom *sdom; /* Up-pointer to domain */
-    struct vcpu *vcpu; /* Up-pointer, to vcpu */
+    struct sched_unit *unit; /* Up-pointer, to schedule unit */
     struct csched2_runqueue_data *rqd; /* Up-pointer to the runqueue */

     int credit; /* Current amount of credit */
-    unsigned int weight; /* Weight of this vcpu */
+    unsigned int weight; /* Weight of this unit */
     unsigned int residual; /* Reminder of div(max_weight/weight) */
     unsigned flags; /* Status flags (16 bits would be ok, */
     s_time_t budget; /* Current budget (if domains has cap) */
                      /* but clear_bit() does not like that) */
-    s_time_t budget_quota; /* Budget to which vCPU is entitled */
+    s_time_t budget_quota; /* Budget to which unit is entitled */

     s_time_t start_time; /* Time we were scheduled (for credit) */

@@ -531,7 +531,7 @@ struct csched2_unit {
     s_time_t avgload; /* Decaying queue load */

     struct list_head runq_elem; /* On the runqueue (rqd->runq) */
-    struct list_head parked_elem; /* On the parked_vcpus list */
+    struct list_head parked_elem; /* On the parked_units list */
     struct list_head rqd_elem; /* On csched2_runqueue_data's svc list */
     struct csched2_runqueue_data *migrate_rqd; /* Pre-determined migr. target */
     int tickled_cpu; /* Cpu that will pick us (-1 if none) */
@@ -549,12 +549,12 @@ struct csched2_dom {

     struct timer repl_timer; /* Timer for periodic replenishment of budget */
     s_time_t next_repl; /* Time at which next replenishment occurs */
-    struct list_head parked_vcpus; /* List of CPUs waiting for budget */
+    struct list_head parked_units; /* List of CPUs waiting for budget */

     struct list_head sdom_elem; /* On csched2_runqueue_data's sdom list */
     uint16_t weight; /* User specified weight */
     uint16_t cap; /* User specified cap */
-    uint16_t nr_vcpus; /* Number of vcpus of this domain */
+    uint16_t nr_units; /* Number of units of this domain */
 };

 /*
@@ -593,7 +593,7 @@ static inline struct csched2_runqueue_data *c2rqd(const struct scheduler *ops,
     return &csched2_priv(ops)->rqd[c2r(cpu)];
 }

-/* Does the domain of this vCPU have a cap? */
+/* Does the domain of this unit have a cap? */
 static inline bool has_cap(const struct csched2_unit *svc)
 {
     return svc->budget != STIME_MAX;
@@ -611,24 +611,24 @@ static inline bool has_cap(const struct csched2_unit *svc)
  * smt_idle mask.
  *
  * Once we have such a mask, it is easy to implement a policy that, either:
- * - uses fully idle cores first: it is enough to try to schedule the vcpus
+ * - uses fully idle cores first: it is enough to try to schedule the units
  *   on pcpus from smt_idle mask first. This is what happens if
  *   sched_smt_power_savings was not set at boot (default), and it maximizes
  *   true parallelism, and hence performance;
- * - uses already busy cores first: it is enough to try to schedule the vcpus
+ * - uses already busy cores first: it is enough to try to schedule the units
  *   on pcpus that are idle, but are not in smt_idle. This is what happens if
  *   sched_smt_power_savings is set at boot, and it allows as more cores as
  *   possible to stay in low power states, minimizing power consumption.
  *
  * This logic is entirely implemented in runq_tickle(), and that is enough.
- * In fact, in this scheduler, placement of a vcpu on one of the pcpus of a
+ * In fact, in this scheduler, placement of an unit on one of the pcpus of a
  * runq, _always_ happens by means of tickling:
- * - when a vcpu wakes up, it calls csched2_unit_wake(), which calls
+ * - when an unit wakes up, it calls csched2_unit_wake(), which calls
  *   runq_tickle();
  * - when a migration is initiated in schedule.c, we call csched2_res_pick(),
  *   csched2_unit_migrate() (which calls migrate()) and csched2_unit_wake().
  *   csched2_res_pick() looks for the least loaded runq and return just any
- *   of its processors. Then, csched2_unit_migrate() just moves the vcpu to
+ *   of its processors. Then, csched2_unit_migrate() just moves the unit to
  *   the chosen runq, and it is again runq_tickle(), called by
  *   csched2_unit_wake() that actually decides what pcpu to use within the
  *   chosen runq;
@@ -643,7 +643,7 @@ static inline bool has_cap(const struct csched2_unit *svc)
  *
  * NB that rqd->smt_idle is different than rqd->idle. rqd->idle
  * records pcpus that at are merely idle (i.e., at the moment do not
- * have a vcpu running on them). But you have to manually filter out
+ * have an unit running on them). But you have to manually filter out
  * which pcpus have been tickled in order to find cores that are not
  * going to be busy soon. Filtering out tickled cpus pairwise is a
  * lot of extra pain; so for rqd->smt_idle, we explicitly make so that
@@ -690,24 +690,24 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
  */
 static int get_fallback_cpu(struct csched2_unit *svc)
 {
-    struct vcpu *v = svc->vcpu;
+    struct sched_unit *unit = svc->unit;
     unsigned int bs;

     SCHED_STAT_CRANK(need_fallback_cpu);

     for_each_affinity_balance_step( bs )
     {
-        int cpu = v->processor;
+        int cpu = sched_unit_cpu(unit);

-        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v->sched_unit) )
+        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(unit) )
             continue;

-        affinity_balance_cpumask(v->sched_unit, bs, cpumask_scratch_cpu(cpu));
+        affinity_balance_cpumask(unit, bs, cpumask_scratch_cpu(cpu));
         cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                    cpupool_domain_cpumask(v->domain));
+                    cpupool_domain_cpumask(unit->domain));

         /*
-         * This is cases 1 or 3 (depending on bs): if v->processor is (still)
+         * This is cases 1 or 3 (depending on bs): if processor is (still)
          * in our affinity, go for it, for cache betterness.
          */
         if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
@@ -729,7 +729,7 @@ static int get_fallback_cpu(struct csched2_unit *svc)
          * We may well pick any valid pcpu from our soft-affinity, outside
          * of our current runqueue, but we decide not to. In fact, changing
          * runqueue is slow, affects load distribution, and is a source of
-         * overhead for the vcpus running on the other runqueue (we need the
+         * overhead for the units running on the other runqueue (we need the
          * lock). So, better do that as a consequence of a well informed
          * decision (or if we really don't have any other chance, as we will,
          * at step 5, if we get to there).
@@ -761,7 +761,7 @@ static int get_fallback_cpu(struct csched2_unit *svc)
      * We can't be here. But if that somehow happen (in non-debug builds),
      * at least return something which both online and in our hard-affinity.
      */
-    return cpumask_any(cpumask_scratch_cpu(v->processor));
+    return cpumask_any(cpumask_scratch_cpu(sched_unit_cpu(unit)));
 }

 /*
@@ -790,7 +790,7 @@ static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct c
  * Runqueue related code.
  */

-static inline int vcpu_on_runq(struct csched2_unit *svc)
+static inline int unit_on_runq(struct csched2_unit *svc)
 {
     return !list_empty(&svc->runq_elem);
 }
@@ -948,17 +948,17 @@ _runq_assign(struct csched2_unit *svc, struct csched2_runqueue_data *rqd)

     update_max_weight(svc->rqd, svc->weight, 0);

-    /* Expected new load based on adding this vcpu */
+    /* Expected new load based on adding this unit */
     rqd->b_avgload += svc->avgload;

     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned rqi:16;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.rqi=rqd->id;
         __trace_var(TRC_CSCHED2_RUNQ_ASSIGN, 1,
                     sizeof(d),
@@ -968,13 +968,13 @@ _runq_assign(struct csched2_unit *svc, struct csched2_runqueue_data *rqd)
 }

 static void
-runq_assign(const struct scheduler *ops, struct vcpu *vc)
+runq_assign(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct csched2_unit *svc = vc->sched_unit->priv;
+    struct csched2_unit *svc = unit->priv;

     ASSERT(svc->rqd == NULL);

-    _runq_assign(svc, c2rqd(ops, vc->processor));
+    _runq_assign(svc, c2rqd(ops, sched_unit_cpu(unit)));
 }

 static void
@@ -982,24 +982,24 @@ _runq_deassign(struct csched2_unit *svc)
 {
     struct csched2_runqueue_data *rqd = svc->rqd;

-    ASSERT(!vcpu_on_runq(svc));
+    ASSERT(!unit_on_runq(svc));
     ASSERT(!(svc->flags & CSFLAG_scheduled));

     list_del_init(&svc->rqd_elem);
     update_max_weight(rqd, 0, svc->weight);

-    /* Expected new load based on removing this vcpu */
+    /* Expected new load based on removing this unit */
     rqd->b_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);

     svc->rqd = NULL;
 }

 static void
-runq_deassign(const struct scheduler *ops, struct vcpu *vc)
+runq_deassign(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct csched2_unit *svc = vc->sched_unit->priv;
+    struct csched2_unit *svc = unit->priv;

-    ASSERT(svc->rqd == c2rqd(ops, vc->processor));
+    ASSERT(svc->rqd == c2rqd(ops, sched_unit_cpu(unit)));

     _runq_deassign(svc);
 }
@@ -1202,15 +1202,15 @@ update_svc_load(const struct scheduler *ops,
                 struct csched2_unit *svc, int change, s_time_t now)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    s_time_t delta, vcpu_load;
+    s_time_t delta, unit_load;
     unsigned int P, W;

     if ( change == -1 )
-        vcpu_load = 1;
+        unit_load = 1;
     else if ( change == 1 )
-        vcpu_load = 0;
+        unit_load = 0;
     else
-        vcpu_load = vcpu_runnable(svc->vcpu);
+        unit_load = unit_runnable(svc->unit);

     W = prv->load_window_shift;
     P = prv->load_precision_shift;
@@ -1218,7 +1218,7 @@ update_svc_load(const struct scheduler *ops,

     if ( svc->load_last_update + (1ULL << W) < now )
     {
-        svc->avgload = vcpu_load << P;
+        svc->avgload = unit_load << P;
     }
     else
     {
@@ -1231,7 +1231,7 @@ update_svc_load(const struct scheduler *ops,
         }

         svc->avgload = svc->avgload +
-                       ((delta * (vcpu_load << P)) >> W) -
+                       ((delta * (unit_load << P)) >> W) -
                        ((delta * svc->avgload) >> W);
     }
     svc->load_last_update = now;
@@ -1243,14 +1243,14 @@ update_svc_load(const struct scheduler *ops,
     {
         struct {
             uint64_t v_avgload;
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned shift;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.v_avgload = svc->avgload;
         d.shift = P;
-        __trace_var(TRC_CSCHED2_UPDATE_VCPU_LOAD, 1,
+        __trace_var(TRC_CSCHED2_UPDATE_UNIT_LOAD, 1,
                     sizeof(d),
                     (unsigned char *)&d);
     }
@@ -1272,18 +1272,18 @@ static void
 runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
 {
     struct list_head *iter;
-    unsigned int cpu = svc->vcpu->processor;
+    unsigned int cpu = sched_unit_cpu(svc->unit);
     struct list_head * runq = &c2rqd(ops, cpu)->runq;
     int pos = 0;

     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));

-    ASSERT(!vcpu_on_runq(svc));
-    ASSERT(c2r(cpu) == c2r(svc->vcpu->processor));
+    ASSERT(!unit_on_runq(svc));
+    ASSERT(c2r(cpu) == c2r(sched_unit_cpu(svc->unit)));

     ASSERT(&svc->rqd->runq == runq);
-    ASSERT(!is_idle_vcpu(svc->vcpu));
-    ASSERT(!svc->vcpu->sched_unit->is_running);
+    ASSERT(!is_idle_unit(svc->unit));
+    ASSERT(!svc->unit->is_running);
     ASSERT(!(svc->flags & CSFLAG_scheduled));

     list_for_each( iter, runq )
@@ -1300,11 +1300,11 @@ runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned pos;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.pos = pos;
         __trace_var(TRC_CSCHED2_RUNQ_POS, 1,
                     sizeof(d),
@@ -1314,7 +1314,7 @@ runq_insert(const struct scheduler *ops, struct csched2_unit *svc)

 static inline void runq_remove(struct csched2_unit *svc)
 {
-    ASSERT(vcpu_on_runq(svc));
+    ASSERT(unit_on_runq(svc));
     list_del_init(&svc->runq_elem);
 }

@@ -1340,8 +1340,8 @@ static inline bool is_preemptable(const struct csched2_unit *svc,
     if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
         return true;

-    ASSERT(svc->vcpu->sched_unit->is_running);
-    return now - svc->vcpu->sched_unit->state_entry_time >
+    ASSERT(svc->unit->is_running);
+    return now - svc->unit->state_entry_time >
            ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
 }

@@ -1369,17 +1369,17 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,

     /*
      * We are dealing with cpus that are marked non-idle (i.e., that are not
-     * in rqd->idle). However, some of them may be running their idle vcpu,
+     * in rqd->idle). However, some of them may be running their idle unit,
      * if taking care of tasklets. In that case, we want to leave it alone.
      */
-    if ( unlikely(is_idle_vcpu(cur->vcpu) ||
+    if ( unlikely(is_idle_unit(cur->unit) ||
          !is_preemptable(cur, now, MICROSECS(prv->ratelimit_us))) )
         return -1;

     burn_credits(rqd, cur, now);

     score = new->credit - cur->credit;
-    if ( new->vcpu->processor != cpu )
+    if ( sched_unit_cpu(new->unit) != cpu )
         score -= CSCHED2_MIGRATE_RESIST;

     /*
@@ -1390,21 +1390,21 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
      */
     if ( score > 0 )
     {
-        if ( cpumask_test_cpu(cpu, new->vcpu->sched_unit->cpu_soft_affinity) )
+        if ( cpumask_test_cpu(cpu, new->unit->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;

-        if ( !cpumask_test_cpu(cpu, cur->vcpu->sched_unit->cpu_soft_affinity) )
+        if ( !cpumask_test_cpu(cpu, cur->unit->cpu_soft_affinity) )
             score += CSCHED2_CREDIT_INIT;
     }

     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             int credit, score;
         } d;
-        d.dom = cur->vcpu->domain->domain_id;
-        d.vcpu = cur->vcpu->vcpu_id;
+        d.dom = cur->unit->domain->domain_id;
+        d.unit = cur->unit->unit_id;
         d.credit = cur->credit;
         d.score = score;
         __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
@@ -1416,14 +1416,14 @@ static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
 }

 /*
- * Check what processor it is best to 'wake', for picking up a vcpu that has
+ * Check what processor it is best to 'wake', for picking up an unit that has
  * just been put (back) in the runqueue. Logic is as follows:
  *  1. if there are idle processors in the runq, wake one of them;
- *  2. if there aren't idle processor, check the one were the vcpu was
+ *  2. if there aren't idle processor, check the one were the unit was
  *     running before to see if we can preempt what's running there now
  *     (and hence doing just one migration);
- *  3. last stand: check all processors and see if the vcpu is in right
- *     of preempting any of the other vcpus running on them (this requires
+ *  3. last stand: check all processors and see if the unit is in right
+ *     of preempting any of the other units running on them (this requires
  *     two migrations, and that's indeed why it is left as the last stand).
  *
  * Note that when we say 'idle processors' what we really mean is (pretty
@@ -1436,10 +1436,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
 {
     int i, ipid = -1;
     s_time_t max = 0;
-    struct sched_unit *unit = new->vcpu->sched_unit;
-    unsigned int bs, cpu = new->vcpu->processor;
+    struct sched_unit *unit = new->unit;
+    unsigned int bs, cpu = sched_unit_cpu(unit);
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
-    cpumask_t *online = cpupool_domain_cpumask(new->vcpu->domain);
+    cpumask_t *online = cpupool_domain_cpumask(unit->domain);
     cpumask_t mask;

     ASSERT(new->rqd == rqd);
@@ -1447,13 +1447,13 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned processor;
             int credit;
         } d;
-        d.dom = new->vcpu->domain->domain_id;
-        d.vcpu = new->vcpu->vcpu_id;
-        d.processor = new->vcpu->processor;
+        d.dom = unit->domain->domain_id;
+        d.unit = unit->unit_id;
+        d.processor = cpu;
         d.credit = new->credit;
         __trace_var(TRC_CSCHED2_TICKLE_NEW, 1,
                     sizeof(d),
@@ -1461,11 +1461,11 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now)
     }

     /*
-     * Exclusive pinning is when a vcpu has hard-affinity with only one
-     * cpu, and there is no other vcpu that has hard-affinity with that
+     * Exclusive pinning is when an unit has hard-affinity with only one
+     * cpu, and there is no other unit that has hard-affinity with that
      * same cpu. This is infrequent, but if it happens, is for achieving
      * the most possible determinism, and least possible overhead for
-     * the vcpus in question.
+     * the units in question.
      *
      * Try to identify the vast majority of these situations, and deal
      * with them quickly.
@@ -1532,7 +1532,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now) /* * Note that, if we are here, it means we have done the hard-affinity * balancing step of the loop, and hence what we have in cpumask_scratch - * is what we put there for last, i.e., new's vcpu_hard_affinity & online + * is what we put there for last, i.e., new's unit_hard_affinity & online * which is exactly what we need for the next part of the function. */ @@ -1543,7 +1543,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now) * * For deciding which cpu to tickle, we use tickle_score(), which will * factor in both new's soft-affinity, and the soft-affinity of the - * vcpu running on each cpu that we consider. + * unit running on each cpu that we consider. */ cpumask_andnot(&mask, &rqd->active, &rqd->idle); cpumask_andnot(&mask, &mask, &rqd->tickled); @@ -1588,7 +1588,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now) return; } - ASSERT(!is_idle_vcpu(curr_on_cpu(ipid)->vcpu)); + ASSERT(!is_idle_unit(curr_on_cpu(ipid))); SCHED_STAT_CRANK(tickled_busy_cpu); tickle: BUG_ON(ipid == -1); @@ -1623,16 +1623,16 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now, /* * Under normal circumstances, snext->credit should never be less - * than -CSCHED2_MIN_TIMER. However, under some circumstances, a - * vcpu with low credits may be allowed to run long enough that + * than -CSCHED2_MIN_TIMER. However, under some circumstances, an + * unit with low credits may be allowed to run long enough that * its credits are actually less than -CSCHED2_CREDIT_INIT. - * (Instances have been observed, for example, where a vcpu with + * (Instances have been observed, for example, where an unit with * 200us of credit was allowed to run for 11ms, giving it -10.8ms * of credit. Thus it was still negative even after the reset.) 
      *
      * If this is the case for snext, we simply want to keep moving
      * everyone up until it is in the black again. This fair because
-     * none of the other vcpus want to run at the moment.
+     * none of the other units want to run at the moment.
      *
      * Rather than looping, however, we just calculate a multiplier,
      * avoiding an integer division and multiplication in the common
@@ -1649,16 +1649,16 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
         int start_credit;

         svc = list_entry(iter, struct csched2_unit, rqd_elem);
-        svc_cpu = svc->vcpu->processor;
+        svc_cpu = sched_unit_cpu(svc->unit);

-        ASSERT(!is_idle_vcpu(svc->vcpu));
+        ASSERT(!is_idle_unit(svc->unit));
         ASSERT(svc->rqd == rqd);

         /*
          * If svc is running, it is our responsibility to make sure, here,
          * that the credit it has spent so far get accounted.
          */
-        if ( svc->vcpu == curr_on_cpu(svc_cpu)->vcpu )
+        if ( svc->unit == curr_on_cpu(svc_cpu) )
         {
             burn_credits(rqd, svc, now);
             /*
@@ -1689,12 +1689,12 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned unit:16, dom:16;
                 int credit_start, credit_end;
                 unsigned multiplier;
             } d;
-            d.dom = svc->vcpu->domain->domain_id;
-            d.vcpu = svc->vcpu->vcpu_id;
+            d.dom = svc->unit->domain->domain_id;
+            d.unit = svc->unit->unit_id;
             d.credit_start = start_credit;
             d.credit_end = svc->credit;
             d.multiplier = m;
@@ -1714,9 +1714,9 @@ void burn_credits(struct csched2_runqueue_data *rqd,
 {
     s_time_t delta;

-    ASSERT(svc == csched2_unit(curr_on_cpu(svc->vcpu->processor)));
+    ASSERT(svc == csched2_unit(curr_on_cpu(sched_unit_cpu(svc->unit))));

-    if ( unlikely(is_idle_vcpu(svc->vcpu)) )
+    if ( unlikely(is_idle_unit(svc->unit)) )
     {
         ASSERT(svc->credit == CSCHED2_IDLE_CREDIT);
         return;
@@ -1745,12 +1745,12 @@ void burn_credits(struct csched2_runqueue_data *rqd,
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             int credit, budget;
             int delta;
         } d;
-        d.dom = svc->vcpu->domain->domain_id;
-        d.vcpu = svc->vcpu->vcpu_id;
+        d.dom = svc->unit->domain->domain_id;
+        d.unit = svc->unit->unit_id;
         d.credit = svc->credit;
         d.budget = has_cap(svc) ? svc->budget : INT_MIN;
         d.delta = delta;
@@ -1764,39 +1764,39 @@ void burn_credits(struct csched2_runqueue_data *rqd,
  * Budget-related code.
  */

-static void park_vcpu(struct csched2_unit *svc)
+static void park_unit(struct csched2_unit *svc)
 {
-    struct vcpu *v = svc->vcpu;
+    struct sched_unit *unit = svc->unit;

     ASSERT(spin_is_locked(&svc->sdom->budget_lock));

     /*
-     * It was impossible to find budget for this vCPU, so it has to be
+     * It was impossible to find budget for this unit, so it has to be
      * "parked". This implies it is not runnable, so we mark it as such in
-     * its pause_flags. If the vCPU is currently scheduled (which means we
+     * its pause_flags. If the unit is currently scheduled (which means we
      * are here after being called from within csched_schedule()), flagging
      * is enough, as we'll choose someone else, and then context_saved()
      * will take care of updating the load properly.
      *
-     * If, OTOH, the vCPU is sitting in the runqueue (which means we are here
+     * If, OTOH, the unit is sitting in the runqueue (which means we are here
      * after being called from within runq_candidate()), we must go all the
      * way down to taking it out of there, and updating the load accordingly.
      *
-     * In both cases, we also add it to the list of parked vCPUs of the domain.
+     * In both cases, we also add it to the list of parked units of the domain.
*/ - __set_bit(_VPF_parked, &v->pause_flags); - if ( vcpu_on_runq(svc) ) + sched_set_pause_flags(unit, _VPF_parked); + if ( unit_on_runq(svc) ) { runq_remove(svc); update_load(svc->sdom->dom->cpupool->sched, svc->rqd, svc, -1, NOW()); } - list_add(&svc->parked_elem, &svc->sdom->parked_vcpus); + list_add(&svc->parked_elem, &svc->sdom->parked_units); } -static bool vcpu_grab_budget(struct csched2_unit *svc) +static bool unit_grab_budget(struct csched2_unit *svc) { struct csched2_dom *sdom = svc->sdom; - unsigned int cpu = svc->vcpu->processor; + unsigned int cpu = sched_unit_cpu(svc->unit); ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock)); @@ -1808,9 +1808,9 @@ static bool vcpu_grab_budget(struct csched2_unit *svc) /* * Here, svc->budget is <= 0 (as, if it was > 0, we'd have taken the if - * above!). That basically means the vCPU has overrun a bit --because of + * above!). That basically means the unit has overrun a bit --because of * various reasons-- and we want to take that into account. With the +=, - * we are actually subtracting the amount of budget the vCPU has + * we are actually subtracting the amount of budget the unit has * overconsumed, from the total domain budget. 
*/ sdom->budget += svc->budget; @@ -1831,7 +1831,7 @@ static bool vcpu_grab_budget(struct csched2_unit *svc) else { svc->budget = 0; - park_vcpu(svc); + park_unit(svc); } spin_unlock(&sdom->budget_lock); @@ -1840,10 +1840,10 @@ static bool vcpu_grab_budget(struct csched2_unit *svc) } static void -vcpu_return_budget(struct csched2_unit *svc, struct list_head *parked) +unit_return_budget(struct csched2_unit *svc, struct list_head *parked) { struct csched2_dom *sdom = svc->sdom; - unsigned int cpu = svc->vcpu->processor; + unsigned int cpu = sched_unit_cpu(svc->unit); ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock)); ASSERT(list_empty(parked)); @@ -1852,7 +1852,7 @@ vcpu_return_budget(struct csched2_unit *svc, struct list_head *parked) spin_lock(&sdom->budget_lock); /* - * The vCPU is stopping running (e.g., because it's blocking, or it has + * The unit is stopping running (e.g., because it's blocking, or it has * been preempted). If it hasn't consumed all the budget it got when, * starting to run, put that remaining amount back in the domain's budget * pool. @@ -1861,58 +1861,58 @@ vcpu_return_budget(struct csched2_unit *svc, struct list_head *parked) svc->budget = 0; /* - * Making budget available again to the domain means that parked vCPUs - * may be unparked and run. They are, if any, in the domain's parked_vcpus + * Making budget available again to the domain means that parked units + * may be unparked and run. They are, if any, in the domain's parked_units * list, so we want to go through that and unpark them (so they can try * to get some budget). * * Touching the list requires the budget_lock, which we hold. Let's * therefore put everyone in that list in another, temporary list, which - * then the caller will traverse, unparking the vCPUs it finds there. + * then the caller will traverse, unparking the units it finds there. 
* * In fact, we can't do the actual unparking here, because that requires - * taking the runqueue lock of the vCPUs being unparked, and we can't + * taking the runqueue lock of the units being unparked, and we can't * take any runqueue locks while we hold a budget_lock. */ if ( sdom->budget > 0 ) - list_splice_init(&sdom->parked_vcpus, parked); + list_splice_init(&sdom->parked_units, parked); spin_unlock(&sdom->budget_lock); } static void -unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus) +unpark_parked_units(const struct scheduler *ops, struct list_head *units) { struct csched2_unit *svc, *tmp; spinlock_t *lock; - list_for_each_entry_safe(svc, tmp, vcpus, parked_elem) + list_for_each_entry_safe ( svc, tmp, units, parked_elem ) { unsigned long flags; s_time_t now; - lock = unit_schedule_lock_irqsave(svc->vcpu->sched_unit, &flags); + lock = unit_schedule_lock_irqsave(svc->unit, &flags); - __clear_bit(_VPF_parked, &svc->vcpu->pause_flags); + sched_clear_pause_flags(svc->unit, _VPF_parked); if ( unlikely(svc->flags & CSFLAG_scheduled) ) { /* * We end here if a budget replenishment arrived between * csched2_schedule() (and, in particular, after a call to - * vcpu_grab_budget() that returned false), and + * unit_grab_budget() that returned false), and * context_saved(). By setting __CSFLAG_delayed_runq_add, - * we tell context_saved() to put the vCPU back in the + * we tell context_saved() to put the unit back in the * runqueue, from where it will compete with the others * for the newly replenished budget. 
*/ ASSERT( svc->rqd != NULL ); - ASSERT( c2rqd(ops, svc->vcpu->processor) == svc->rqd ); + ASSERT( c2rqd(ops, sched_unit_cpu(svc->unit)) == svc->rqd ); __set_bit(__CSFLAG_delayed_runq_add, &svc->flags); } - else if ( vcpu_runnable(svc->vcpu) ) + else if ( unit_runnable(svc->unit) ) { /* - * The vCPU should go back to the runqueue, and compete for + * The unit should go back to the runqueue, and compete for * the newly replenished budget, but only if it is actually * runnable (and was therefore offline only because of the * lack of budget). @@ -1924,7 +1924,7 @@ unpark_parked_vcpus(const struct scheduler *ops, struct list_head *vcpus) } list_del_init(&svc->parked_elem); - unit_schedule_unlock_irqrestore(lock, flags, svc->vcpu->sched_unit); + unit_schedule_unlock_irqrestore(lock, flags, svc->unit); } } @@ -1954,7 +1954,7 @@ static void replenish_domain_budget(void* data) * * Even in cases of overrun or delay, however, we expect that in 99% of * cases, doing just one replenishment will be good enough for being able - * to unpark the vCPUs that are waiting for some budget. + * to unpark the units that are waiting for some budget. */ do_replenish(sdom); @@ -1974,7 +1974,7 @@ static void replenish_domain_budget(void* data) } /* * 2) if we overrun by more than tot_budget, then budget+tot_budget is - * still < 0, which means that we can't unpark the vCPUs. Let's bail, + * still < 0, which means that we can't unpark the units. Let's bail, * and wait for future replenishments. */ if ( unlikely(sdom->budget <= 0) ) @@ -1988,14 +1988,14 @@ static void replenish_domain_budget(void* data) /* * As above, let's prepare the temporary list, out of the domain's - * parked_vcpus list, now that we hold the budget_lock. Then, drop such + * parked_units list, now that we hold the budget_lock. Then, drop such * lock, and pass the list to the unparking function. 
*/ - list_splice_init(&sdom->parked_vcpus, &parked); + list_splice_init(&sdom->parked_units, &parked); spin_unlock_irqrestore(&sdom->budget_lock, flags); - unpark_parked_vcpus(sdom->dom->cpupool->sched, &parked); + unpark_parked_units(sdom->dom->cpupool->sched, &parked); out: set_timer(&sdom->repl_timer, sdom->next_repl); @@ -2003,37 +2003,36 @@ static void replenish_domain_budget(void* data) #ifndef NDEBUG static inline void -csched2_vcpu_check(struct vcpu *vc) +csched2_unit_check(struct sched_unit *unit) { - struct csched2_unit * const svc = csched2_unit(vc->sched_unit); + struct csched2_unit * const svc = csched2_unit(unit); struct csched2_dom * const sdom = svc->sdom; - BUG_ON( svc->vcpu != vc ); - BUG_ON( sdom != csched2_dom(vc->domain) ); + BUG_ON( svc->unit != unit ); + BUG_ON( sdom != csched2_dom(unit->domain) ); if ( sdom ) { - BUG_ON( is_idle_vcpu(vc) ); - BUG_ON( sdom->dom != vc->domain ); + BUG_ON( is_idle_unit(unit) ); + BUG_ON( sdom->dom != unit->domain ); } else { - BUG_ON( !is_idle_vcpu(vc) ); + BUG_ON( !is_idle_unit(unit) ); } SCHED_STAT_CRANK(unit_check); } -#define CSCHED2_VCPU_CHECK(_vc) (csched2_vcpu_check(_vc)) +#define CSCHED2_UNIT_CHECK(unit) (csched2_unit_check(unit)) #else -#define CSCHED2_VCPU_CHECK(_vc) +#define CSCHED2_UNIT_CHECK(unit) #endif static void * csched2_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit, void *dd) { - struct vcpu *vc = unit->vcpu; struct csched2_unit *svc; - /* Allocate per-VCPU info */ + /* Allocate per-UNIT info */ svc = xzalloc(struct csched2_unit); if ( svc == NULL ) return NULL; @@ -2042,10 +2041,10 @@ csched2_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit, INIT_LIST_HEAD(&svc->runq_elem); svc->sdom = dd; - svc->vcpu = vc; + svc->unit = unit; svc->flags = 0U; - if ( ! is_idle_vcpu(vc) ) + if ( ! 
is_idle_unit(unit) ) { ASSERT(svc->sdom != NULL); svc->credit = CSCHED2_CREDIT_INIT; @@ -2074,19 +2073,18 @@ csched2_alloc_vdata(const struct scheduler *ops, struct sched_unit *unit, static void csched2_unit_sleep(const struct scheduler *ops, struct sched_unit *unit) { - struct vcpu *vc = unit->vcpu; struct csched2_unit * const svc = csched2_unit(unit); - ASSERT(!is_idle_vcpu(vc)); + ASSERT(!is_idle_unit(unit)); SCHED_STAT_CRANK(unit_sleep); - if ( curr_on_cpu(vc->processor) == unit ) + if ( curr_on_cpu(sched_unit_cpu(unit)) == unit ) { - tickle_cpu(vc->processor, svc->rqd); + tickle_cpu(sched_unit_cpu(unit), svc->rqd); } - else if ( vcpu_on_runq(svc) ) + else if ( unit_on_runq(svc) ) { - ASSERT(svc->rqd == c2rqd(ops, vc->processor)); + ASSERT(svc->rqd == c2rqd(ops, sched_unit_cpu(unit))); update_load(ops, svc->rqd, svc, -1, NOW()); runq_remove(svc); } @@ -2097,14 +2095,13 @@ csched2_unit_sleep(const struct scheduler *ops, struct sched_unit *unit) static void csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit) { - struct vcpu *vc = unit->vcpu; struct csched2_unit * const svc = csched2_unit(unit); - unsigned int cpu = vc->processor; + unsigned int cpu = sched_unit_cpu(unit); s_time_t now; ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock)); - ASSERT(!is_idle_vcpu(vc)); + ASSERT(!is_idle_unit(unit)); if ( unlikely(curr_on_cpu(cpu) == unit) ) { @@ -2112,18 +2109,18 @@ csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit) goto out; } - if ( unlikely(vcpu_on_runq(svc)) ) + if ( unlikely(unit_on_runq(svc)) ) { SCHED_STAT_CRANK(unit_wake_onrunq); goto out; } - if ( likely(vcpu_runnable(vc)) ) + if ( likely(unit_runnable(unit)) ) SCHED_STAT_CRANK(unit_wake_runnable); else SCHED_STAT_CRANK(unit_wake_not_runnable); - /* If the context hasn't been saved for this vcpu yet, we can't put it on + /* If the context hasn't been saved for this unit yet, we can't put it on * another runqueue. 
Instead, we set a flag so that it will be put on the runqueue * after the context has been saved. */ if ( unlikely(svc->flags & CSFLAG_scheduled) ) @@ -2134,15 +2131,15 @@ csched2_unit_wake(const struct scheduler *ops, struct sched_unit *unit) /* Add into the new runqueue if necessary */ if ( svc->rqd == NULL ) - runq_assign(ops, vc); + runq_assign(ops, unit); else - ASSERT(c2rqd(ops, vc->processor) == svc->rqd ); + ASSERT(c2rqd(ops, sched_unit_cpu(unit)) == svc->rqd ); now = NOW(); update_load(ops, svc->rqd, svc, 1, now); - - /* Put the VCPU on the runq */ + + /* Put the UNIT on the runq */ runq_insert(ops, svc); runq_tickle(ops, svc, now); @@ -2155,49 +2152,48 @@ csched2_unit_yield(const struct scheduler *ops, struct sched_unit *unit) { struct csched2_unit * const svc = csched2_unit(unit); - __set_bit(__CSFLAG_vcpu_yield, &svc->flags); + __set_bit(__CSFLAG_unit_yield, &svc->flags); } static void csched2_context_saved(const struct scheduler *ops, struct sched_unit *unit) { - struct vcpu *vc = unit->vcpu; struct csched2_unit * const svc = csched2_unit(unit); spinlock_t *lock = unit_schedule_lock_irq(unit); s_time_t now = NOW(); LIST_HEAD(were_parked); - BUG_ON( !is_idle_vcpu(vc) && svc->rqd != c2rqd(ops, vc->processor)); - ASSERT(is_idle_vcpu(vc) || svc->rqd == c2rqd(ops, vc->processor)); + BUG_ON( !is_idle_unit(unit) && svc->rqd != c2rqd(ops, sched_unit_cpu(unit))); + ASSERT(is_idle_unit(unit) || svc->rqd == c2rqd(ops, sched_unit_cpu(unit))); - /* This vcpu is now eligible to be put on the runqueue again */ + /* This unit is now eligible to be put on the runqueue again */ __clear_bit(__CSFLAG_scheduled, &svc->flags); if ( unlikely(has_cap(svc) && svc->budget > 0) ) - vcpu_return_budget(svc, &were_parked); + unit_return_budget(svc, &were_parked); /* If someone wants it on the runqueue, put it there. */ /* * NB: We can get rid of CSFLAG_scheduled by checking for - * vc->is_running and vcpu_on_runq(svc) here. However, + * vc->is_running and unit_on_runq(svc) here. 
However, * since we're accessing the flags cacheline anyway, * it seems a bit pointless; especially as we have plenty of * bits free. */ if ( __test_and_clear_bit(__CSFLAG_delayed_runq_add, &svc->flags) - && likely(vcpu_runnable(vc)) ) + && likely(unit_runnable(unit)) ) { - ASSERT(!vcpu_on_runq(svc)); + ASSERT(!unit_on_runq(svc)); runq_insert(ops, svc); runq_tickle(ops, svc, now); } - else if ( !is_idle_vcpu(vc) ) + else if ( !is_idle_unit(unit) ) update_load(ops, svc->rqd, svc, -1, now); unit_schedule_unlock_irq(lock, unit); - unpark_parked_vcpus(ops, &were_parked); + unpark_parked_units(ops, &were_parked); } #define MAX_LOAD (STIME_MAX) @@ -2205,9 +2201,8 @@ static struct sched_resource * csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit) { struct csched2_private *prv = csched2_priv(ops); - struct vcpu *vc = unit->vcpu; int i, min_rqi = -1, min_s_rqi = -1; - unsigned int new_cpu, cpu = vc->processor; + unsigned int new_cpu, cpu = sched_unit_cpu(unit); struct csched2_unit *svc = csched2_unit(unit); s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD; bool has_soft; @@ -2245,7 +2240,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit) } cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(vc->domain)); + cpupool_domain_cpumask(unit->domain)); /* * First check to see if we're here because someone else suggested a place @@ -2356,7 +2351,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit) * We have soft affinity, and we have a candidate runq, so go for it. * * Note that, to obtain the soft-affinity mask, we "just" put what we - * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is + * have in cpumask_scratch in && with unit->cpu_soft_affinity. 
This is * ok because: * - we know that unit->cpu_hard_affinity and ->cpu_soft_affinity have * a non-empty intersection (because has_soft is true); @@ -2379,7 +2374,7 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit) * any suitable runq. But we did find one when considering hard * affinity, so go for it. * - * cpumask_scratch already has vc->cpu_hard_affinity & + * cpumask_scratch already has unit->cpu_hard_affinity & * cpupool_domain_cpumask() in it, so it's enough that we filter * with the cpus of the runq. */ @@ -2410,11 +2405,11 @@ csched2_res_pick(const struct scheduler *ops, struct sched_unit *unit) { struct { uint64_t b_avgload; - unsigned vcpu:16, dom:16; + unsigned unit:16, dom:16; unsigned rq_id:16, new_cpu:16; } d; - d.dom = vc->domain->domain_id; - d.vcpu = vc->vcpu_id; + d.dom = unit->domain->domain_id; + d.unit = unit->unit_id; d.rq_id = min_rqi; d.b_avgload = min_avgload; d.new_cpu = new_cpu; @@ -2433,10 +2428,10 @@ typedef struct { struct csched2_unit * best_push_svc, *best_pull_svc; /* NB: Read by consider() */ struct csched2_runqueue_data *lrqd; - struct csched2_runqueue_data *orqd; + struct csched2_runqueue_data *orqd; } balance_state_t; -static void consider(balance_state_t *st, +static void consider(balance_state_t *st, struct csched2_unit *push_svc, struct csched2_unit *pull_svc) { @@ -2475,17 +2470,17 @@ static void migrate(const struct scheduler *ops, struct csched2_runqueue_data *trqd, s_time_t now) { - int cpu = svc->vcpu->processor; - struct sched_unit *unit = svc->vcpu->sched_unit; + struct sched_unit *unit = svc->unit; + int cpu = sched_unit_cpu(unit); if ( unlikely(tb_init_done) ) { struct { - unsigned vcpu:16, dom:16; + unsigned unit:16, dom:16; unsigned rqi:16, trqi:16; } d; - d.dom = svc->vcpu->domain->domain_id; - d.vcpu = svc->vcpu->vcpu_id; + d.dom = unit->domain->domain_id; + d.unit = unit->unit_id; d.rqi = svc->rqd->id; d.trqi = trqd->id; __trace_var(TRC_CSCHED2_MIGRATE, 1, @@ -2497,7 +2492,7 @@ static 
void migrate(const struct scheduler *ops, { /* It's running; mark it to migrate. */ svc->migrate_rqd = trqd; - __set_bit(_VPF_migrating, &svc->vcpu->pause_flags); + sched_set_pause_flags(unit, _VPF_migrating); __set_bit(__CSFLAG_runq_migrate_request, &svc->flags); SCHED_STAT_CRANK(migrate_requested); tickle_cpu(cpu, svc->rqd); @@ -2506,7 +2501,7 @@ static void migrate(const struct scheduler *ops, { int on_runq = 0; /* It's not running; just move it */ - if ( vcpu_on_runq(svc) ) + if ( unit_on_runq(svc) ) { runq_remove(svc); update_load(ops, svc->rqd, NULL, -1, now); @@ -2515,14 +2510,14 @@ static void migrate(const struct scheduler *ops, _runq_deassign(svc); cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(svc->vcpu->domain)); + cpupool_domain_cpumask(unit->domain)); cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), &trqd->active); - svc->vcpu->processor = cpumask_cycle(trqd->pick_bias, - cpumask_scratch_cpu(cpu)); - svc->vcpu->sched_unit->res = get_sched_res(svc->vcpu->processor); - trqd->pick_bias = svc->vcpu->processor; - ASSERT(svc->vcpu->processor < nr_cpu_ids); + sched_set_res(unit, + get_sched_res(cpumask_cycle(trqd->pick_bias, + cpumask_scratch_cpu(cpu)))); + trqd->pick_bias = sched_unit_cpu(unit); + ASSERT(sched_unit_cpu(unit) < nr_cpu_ids); _runq_assign(svc, trqd); if ( on_runq ) @@ -2542,14 +2537,14 @@ static void migrate(const struct scheduler *ops, * - svc is not already flagged to migrate, * - if svc is allowed to run on at least one of the pcpus of rqd. 
*/ -static bool vcpu_is_migrateable(struct csched2_unit *svc, +static bool unit_is_migrateable(struct csched2_unit *svc, struct csched2_runqueue_data *rqd) { - struct vcpu *v = svc->vcpu; - int cpu = svc->vcpu->processor; + struct sched_unit *unit = svc->unit; + int cpu = sched_unit_cpu(unit); - cpumask_and(cpumask_scratch_cpu(cpu), v->sched_unit->cpu_hard_affinity, - cpupool_domain_cpumask(v->domain)); + cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, + cpupool_domain_cpumask(unit->domain)); return !(svc->flags & CSFLAG_runq_migrate_request) && cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active); @@ -2586,7 +2581,7 @@ retry: for_each_cpu(i, &prv->active_queues) { s_time_t delta; - + st.orqd = prv->rqd + i; if ( st.orqd == st.lrqd @@ -2594,7 +2589,7 @@ retry: continue; update_runq_load(ops, st.orqd, 0, now); - + delta = st.lrqd->b_avgload - st.orqd->b_avgload; if ( delta < 0 ) delta = -delta; @@ -2617,7 +2612,7 @@ retry: s_time_t load_max; int cpus_max; - + load_max = st.lrqd->b_avgload; if ( st.orqd->b_avgload > load_max ) load_max = st.orqd->b_avgload; @@ -2656,7 +2651,7 @@ retry: opt_overload_balance_tolerance)) ) goto out; } - + /* Try to grab the other runqueue lock; if it's been taken in the * meantime, try the process over again. 
This can't deadlock * because if it doesn't get any other rqd locks, it will simply @@ -2696,17 +2691,17 @@ retry: update_svc_load(ops, push_svc, 0, now); - if ( !vcpu_is_migrateable(push_svc, st.orqd) ) + if ( !unit_is_migrateable(push_svc, st.orqd) ) continue; list_for_each( pull_iter, &st.orqd->svc ) { struct csched2_unit * pull_svc = list_entry(pull_iter, struct csched2_unit, rqd_elem); - + if ( !inner_load_updated ) update_svc_load(ops, pull_svc, 0, now); - - if ( !vcpu_is_migrateable(pull_svc, st.lrqd) ) + + if ( !unit_is_migrateable(pull_svc, st.lrqd) ) continue; consider(&st, push_svc, pull_svc); @@ -2721,8 +2716,8 @@ retry: list_for_each( pull_iter, &st.orqd->svc ) { struct csched2_unit * pull_svc = list_entry(pull_iter, struct csched2_unit, rqd_elem); - - if ( !vcpu_is_migrateable(pull_svc, st.lrqd) ) + + if ( !unit_is_migrateable(pull_svc, st.lrqd) ) continue; /* Consider pull only */ @@ -2745,8 +2740,7 @@ static void csched2_unit_migrate( const struct scheduler *ops, struct sched_unit *unit, unsigned int new_cpu) { - struct vcpu *vc = unit->vcpu; - struct domain *d = vc->domain; + struct domain *d = unit->domain; struct csched2_unit * const svc = csched2_unit(unit); struct csched2_runqueue_data *trqd; s_time_t now = NOW(); @@ -2758,25 +2752,24 @@ csched2_unit_migrate( * cpupool. * * And since there indeed is the chance that it is not part of it, all - * we must do is remove _and_ unassign the vCPU from any runqueue, as + * we must do is remove _and_ unassign the unit from any runqueue, as * well as updating v->processor with the target, so that the suspend * process can continue. * * It will then be during resume that a new, meaningful, value for * v->processor will be chosen, and during actual domain unpause that - * the vCPU will be assigned to and added to the proper runqueue. + * the unit will be assigned to and added to the proper runqueue. 
*/ if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) ) { ASSERT(system_state == SYS_STATE_suspend); - if ( vcpu_on_runq(svc) ) + if ( unit_on_runq(svc) ) { runq_remove(svc); update_load(ops, svc->rqd, NULL, -1, now); } _runq_deassign(svc); - vc->processor = new_cpu; - unit->res = get_sched_res(new_cpu); + sched_set_res(unit, get_sched_res(new_cpu)); return; } @@ -2790,17 +2783,14 @@ csched2_unit_migrate( * Do the actual movement toward new_cpu, and update vc->processor. * If we are changing runqueue, migrate() takes care of everything. * If we are not changing runqueue, we need to update vc->processor - * here. In fact, if, for instance, we are here because the vcpu's + * here. In fact, if, for instance, we are here because the unit's * hard affinity changed, we don't want to risk leaving vc->processor * pointing to a pcpu where we can't run any longer. */ if ( trqd != svc->rqd ) migrate(ops, svc, trqd, now); else - { - vc->processor = new_cpu; - unit->res = get_sched_res(new_cpu); - } + sched_set_res(unit, get_sched_res(new_cpu)); } static int @@ -2812,18 +2802,18 @@ csched2_dom_cntl( struct csched2_dom * const sdom = csched2_dom(d); struct csched2_private *prv = csched2_priv(ops); unsigned long flags; - struct vcpu *v; + struct sched_unit *unit; int rc = 0; /* * Locking: * - we must take the private lock for accessing the weights of the - * vcpus of d, and/or the cap; + * units of d, and/or the cap; * - in the putinfo case, we also need the runqueue lock(s), for * updating the max waight of the runqueue(s). * If changing the cap, we also need the budget_lock, for updating * the value of the domain budget pool (and the runqueue lock, - * for adjusting the parameters and rescheduling any vCPU that is + * for adjusting the parameters and rescheduling any unit that is * running at the time of the change). 
      */
     switch ( op->cmd )
@@ -2845,18 +2835,18 @@ csched2_dom_cntl(
 
             sdom->weight = op->u.credit2.weight;
 
-            /* Update weights for vcpus, and max_weight for runqueues on which they reside */
-            for_each_vcpu ( d, v )
+            /* Update weights for units, and max_weight for runqueues on which they reside */
+            for_each_sched_unit ( d, unit )
             {
-                struct csched2_unit *svc = csched2_unit(v->sched_unit);
-                spinlock_t *lock = unit_schedule_lock(svc->vcpu->sched_unit);
+                struct csched2_unit *svc = csched2_unit(unit);
+                spinlock_t *lock = unit_schedule_lock(unit);
 
-                ASSERT(svc->rqd == c2rqd(ops, svc->vcpu->processor));
+                ASSERT(svc->rqd == c2rqd(ops, sched_unit_cpu(unit)));
 
                 svc->weight = sdom->weight;
                 update_max_weight(svc->rqd, svc->weight, old_weight);
 
-                unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+                unit_schedule_unlock(lock, unit);
             }
         }
         /* Cap */
@@ -2865,8 +2855,8 @@ csched2_dom_cntl(
             struct csched2_unit *svc;
             spinlock_t *lock;
 
-            /* Cap is only valid if it's below 100 * nr_of_vCPUS */
-            if ( op->u.credit2.cap > 100 * sdom->nr_vcpus )
+            /* Cap is only valid if it's below 100 * nr_of_units */
+            if ( op->u.credit2.cap > 100 * sdom->nr_units )
             {
                 rc = -EINVAL;
                 write_unlock_irqrestore(&prv->lock, flags);
@@ -2879,23 +2869,23 @@ csched2_dom_cntl(
             spin_unlock(&sdom->budget_lock);
 
             /*
-             * When trying to get some budget and run, each vCPU will grab
-             * from the pool 1/N (with N = nr of vCPUs of the domain) of
-             * the total budget. Roughly speaking, this means each vCPU will
+             * When trying to get some budget and run, each unit will grab
+             * from the pool 1/N (with N = nr of units of the domain) of
+             * the total budget. Roughly speaking, this means each unit will
              * have at least one chance to run during every period.
              */
-            for_each_vcpu ( d, v )
+            for_each_sched_unit ( d, unit )
             {
-                svc = csched2_unit(v->sched_unit);
-                lock = unit_schedule_lock(svc->vcpu->sched_unit);
+                svc = csched2_unit(unit);
+                lock = unit_schedule_lock(unit);
                 /*
                  * Too small quotas would in theory cause a lot of overhead,
                  * which then won't happen because, in csched2_runtime(),
                  * CSCHED2_MIN_TIMER is what would be used anyway.
                  */
-                svc->budget_quota = max(sdom->tot_budget / sdom->nr_vcpus,
+                svc->budget_quota = max(sdom->tot_budget / sdom->nr_units,
                                         CSCHED2_MIN_TIMER);
-                unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+                unit_schedule_unlock(lock, unit);
             }
 
             if ( sdom->cap == 0 )
@@ -2905,7 +2895,7 @@ csched2_dom_cntl(
                  * and queue its first replenishment event.
                  *
                  * Since cap is currently disabled for this domain, we
-                 * know no vCPU is messing with the domain's budget, and
+                 * know no unit is messing with the domain's budget, and
                  * the replenishment timer is still off.
                  * For these reasons, it is safe to do the following without
                  * taking the budget_lock.
@@ -2915,42 +2905,42 @@ csched2_dom_cntl(
                 set_timer(&sdom->repl_timer, sdom->next_repl);
 
                 /*
-                 * Now, let's enable budget accounting for all the vCPUs.
+                 * Now, let's enable budget accounting for all the units.
                  * For making sure that they will start to honour the domain's
                  * cap, we set their budget to 0.
                  * This way, as soon as they will try to run, they will have
                  * to get some budget.
                  *
-                 * For the vCPUs that are already running, we trigger the
+                 * For the units that are already running, we trigger the
                  * scheduler on their pCPU. When, as a consequence of this,
                  * csched2_schedule() will run, it will figure out there is
-                 * no budget, and the vCPU will try to get some (and be parked,
+                 * no budget, and the unit will try to get some (and be parked,
                  * if there's none, and we'll switch to someone else).
                  */
-                for_each_vcpu ( d, v )
+                for_each_sched_unit ( d, unit )
                 {
-                    svc = csched2_unit(v->sched_unit);
-                    lock = unit_schedule_lock(svc->vcpu->sched_unit);
-                    if ( v->sched_unit->is_running )
+                    svc = csched2_unit(unit);
+                    lock = unit_schedule_lock(unit);
+                    if ( unit->is_running )
                     {
-                        unsigned int cpu = v->processor;
+                        unsigned int cpu = sched_unit_cpu(unit);
                         struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
 
-                        ASSERT(curr_on_cpu(cpu)->vcpu == v);
+                        ASSERT(curr_on_cpu(cpu) == unit);
 
                         /*
-                         * We are triggering a reschedule on the vCPU's
+                         * We are triggering a reschedule on the unit's
                          * pCPU. That will run burn_credits() and, since
-                         * the vCPU is capped now, it would charge all the
+                         * the unit is capped now, it would charge all the
                          * execution time of this last round as budget as
-                         * well. That will make the vCPU budget go negative,
+                         * well. That will make the unit budget go negative,
                          * potentially by a large amount, and it's unfair.
                          *
                          * To avoid that, call burn_credit() here, to do the
                          * accounting of this current running instance now,
                          * with budgetting still disabled. This does not
                          * prevent some small amount of budget being charged
-                         * to the vCPU (i.e., the amount of time it runs from
+                         * to the unit (i.e., the amount of time it runs from
                          * now, to when scheduling happens). The budget will
                          * also go below 0, but a lot less than how it would
                          * if we don't do this.
@@ -2961,7 +2951,7 @@ csched2_dom_cntl(
                         cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
                     }
                     svc->budget = 0;
-                    unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+                    unit_schedule_unlock(lock, unit);
                 }
             }
 
@@ -2973,30 +2963,30 @@ csched2_dom_cntl(
 
             stop_timer(&sdom->repl_timer);
 
-            /* Disable budget accounting for all the vCPUs. */
-            for_each_vcpu ( d, v )
+            /* Disable budget accounting for all the units. */
+            for_each_sched_unit ( d, unit )
             {
-                struct csched2_unit *svc = csched2_unit(v->sched_unit);
-                spinlock_t *lock = unit_schedule_lock(svc->vcpu->sched_unit);
+                struct csched2_unit *svc = csched2_unit(unit);
+                spinlock_t *lock = unit_schedule_lock(unit);
 
                 svc->budget = STIME_MAX;
                 svc->budget_quota = 0;
 
-                unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+                unit_schedule_unlock(lock, unit);
             }
             sdom->cap = 0;
             /*
              * We are disabling the cap for this domain, which may have
-             * vCPUs waiting for a replenishment, so we unpark them all.
+             * units waiting for a replenishment, so we unpark them all.
              * Note that, since we have already disabled budget accounting
-             * for all the vCPUs of the domain, no currently running vCPU
-             * will be added to the parked vCPUs list any longer.
+             * for all the units of the domain, no currently running unit
+             * will be added to the parked units list any longer.
              */
             spin_lock(&sdom->budget_lock);
-            list_splice_init(&sdom->parked_vcpus, &parked);
+            list_splice_init(&sdom->parked_units, &parked);
             spin_unlock(&sdom->budget_lock);
 
-            unpark_parked_vcpus(ops, &parked);
+            unpark_parked_units(ops, &parked);
         }
         write_unlock_irqrestore(&prv->lock, flags);
         break;
@@ -3073,12 +3063,12 @@ csched2_alloc_domdata(const struct scheduler *ops, struct domain *dom)
     sdom->dom = dom;
     sdom->weight = CSCHED2_DEFAULT_WEIGHT;
     sdom->cap = 0U;
-    sdom->nr_vcpus = 0;
+    sdom->nr_units = 0;
     init_timer(&sdom->repl_timer, replenish_domain_budget, sdom,
                cpumask_any(cpupool_domain_cpumask(dom)));
     spin_lock_init(&sdom->budget_lock);
-    INIT_LIST_HEAD(&sdom->parked_vcpus);
+    INIT_LIST_HEAD(&sdom->parked_units);
 
     write_lock_irqsave(&prv->lock, flags);
 
@@ -3112,34 +3102,32 @@ csched2_free_domdata(const struct scheduler *ops, void *data)
 static void
 csched2_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit *svc = unit->priv;
     struct csched2_dom * const sdom = svc->sdom;
     spinlock_t *lock;
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_unit(unit));
     ASSERT(list_empty(&svc->runq_elem));
 
     /* csched2_res_pick() expects the pcpu lock to be held */
     lock = unit_schedule_lock_irq(unit);
 
-    unit->res = csched2_res_pick(ops, unit);
-    vc->processor = unit->res->processor;
+    sched_set_res(unit, csched2_res_pick(ops, unit));
 
     spin_unlock_irq(lock);
 
     lock = unit_schedule_lock_irq(unit);
 
-    /* Add vcpu to runqueue of initial processor */
-    runq_assign(ops, vc);
+    /* Add unit to runqueue of initial processor */
+    runq_assign(ops, unit);
 
     unit_schedule_unlock_irq(lock, unit);
 
-    sdom->nr_vcpus++;
+    sdom->nr_units++;
 
     SCHED_STAT_CRANK(unit_insert);
 
-    CSCHED2_VCPU_CHECK(vc);
+    CSCHED2_UNIT_CHECK(unit);
 }
 
 static void
@@ -3153,11 +3141,10 @@ csched2_free_vdata(const struct scheduler *ops, void *priv)
 static void
 csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
 {
-    struct vcpu *vc = unit->vcpu;
     struct csched2_unit * const svc = csched2_unit(unit);
     spinlock_t *lock;
 
-    ASSERT(!is_idle_vcpu(vc));
+    ASSERT(!is_idle_unit(unit));
     ASSERT(list_empty(&svc->runq_elem));
 
     SCHED_STAT_CRANK(unit_remove);
@@ -3165,14 +3152,14 @@ csched2_unit_remove(const struct scheduler *ops, struct sched_unit *unit)
     /* Remove from runqueue */
     lock = unit_schedule_lock_irq(unit);
 
-    runq_deassign(ops, vc);
+    runq_deassign(ops, unit);
 
     unit_schedule_unlock_irq(lock, unit);
 
-    svc->sdom->nr_vcpus--;
+    svc->sdom->nr_units--;
 }
 
-/* How long should we let this vcpu run for? */
+/* How long should we let this unit run for? */
 static s_time_t
 csched2_runtime(const struct scheduler *ops, int cpu,
                 struct csched2_unit *snext, s_time_t now)
@@ -3187,7 +3174,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
      * If we're idle, just stay so. Others (or external events)
      * will poke us when necessary.
      */
-    if ( is_idle_vcpu(snext->vcpu) )
+    if ( is_idle_unit(snext->unit) )
         return -1;
 
     /* General algorithm:
@@ -3204,8 +3191,8 @@ csched2_runtime(const struct scheduler *ops, int cpu,
     if ( prv->ratelimit_us )
     {
         s_time_t ratelimit_min = MICROSECS(prv->ratelimit_us);
-        if ( snext->vcpu->sched_unit->is_running )
-            ratelimit_min = snext->vcpu->sched_unit->state_entry_time +
+        if ( snext->unit->is_running )
+            ratelimit_min = snext->unit->state_entry_time +
                             MICROSECS(prv->ratelimit_us) - now;
         if ( ratelimit_min > min_time )
             min_time = ratelimit_min;
@@ -3222,7 +3209,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
     {
         struct csched2_unit *swait = runq_elem(runq->next);
 
-        if ( ! is_idle_vcpu(swait->vcpu)
+        if ( ! is_idle_unit(swait->unit)
              && swait->credit > 0 )
         {
             rt_credit = snext->credit - swait->credit;
@@ -3236,7 +3223,7 @@ csched2_runtime(const struct scheduler *ops, int cpu,
      *
      * FIXME: See if we can eliminate this conversion if we know time
      * will be outside (MIN,MAX). Probably requires pre-calculating
-     * credit values of MIN,MAX per vcpu, since each vcpu burns credit
+     * credit values of MIN,MAX per unit, since each unit burns credit
      * at a different rate.
      */
     if ( rt_credit > 0 )
@@ -3284,36 +3271,35 @@ runq_candidate(struct csched2_runqueue_data *rqd,
 
     *skipped = 0;
 
-    if ( unlikely(is_idle_vcpu(scurr->vcpu)) )
+    if ( unlikely(is_idle_unit(scurr->unit)) )
     {
         snext = scurr;
         goto check_runq;
     }
 
-    yield = __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+    yield = __test_and_clear_bit(__CSFLAG_unit_yield, &scurr->flags);
 
     /*
-     * Return the current vcpu if it has executed for less than ratelimit.
-     * Adjuststment for the selected vcpu's credit and decision
+     * Return the current unit if it has executed for less than ratelimit.
+     * Adjuststment for the selected unit's credit and decision
      * for how long it will run will be taken in csched2_runtime.
      *
      * Note that, if scurr is yielding, we don't let rate limiting kick in.
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !yield && prv->ratelimit_us && vcpu_runnable(scurr->vcpu) &&
-         (now - scurr->vcpu->sched_unit->state_entry_time) <
-          MICROSECS(prv->ratelimit_us) )
+    if ( !yield && prv->ratelimit_us && unit_runnable(scurr->unit) &&
+         (now - scurr->unit->state_entry_time) < MICROSECS(prv->ratelimit_us) )
     {
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned unit:16, dom:16;
                 unsigned runtime;
             } d;
-            d.dom = scurr->vcpu->domain->domain_id;
-            d.vcpu = scurr->vcpu->vcpu_id;
-            d.runtime = now - scurr->vcpu->sched_unit->state_entry_time;
+            d.dom = scurr->unit->domain->domain_id;
+            d.unit = scurr->unit->unit_id;
+            d.runtime = now - scurr->unit->state_entry_time;
             __trace_var(TRC_CSCHED2_RATELIMIT, 1,
                         sizeof(d),
                         (unsigned char *)&d);
@@ -3322,13 +3308,13 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     }
 
     /* If scurr has a soft-affinity, let's check whether cpu is part of it */
-    if ( has_soft_affinity(scurr->vcpu->sched_unit) )
+    if ( has_soft_affinity(scurr->unit) )
     {
-        affinity_balance_cpumask(scurr->vcpu->sched_unit, BALANCE_SOFT_AFFINITY,
+        affinity_balance_cpumask(scurr->unit, BALANCE_SOFT_AFFINITY,
                                  cpumask_scratch);
         if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) )
         {
-            cpumask_t *online = cpupool_domain_cpumask(scurr->vcpu->domain);
+            cpumask_t *online = cpupool_domain_cpumask(scurr->unit->domain);
 
             /* Ok, is any of the pcpus in scurr soft-affinity idle? */
             cpumask_and(cpumask_scratch, cpumask_scratch, &rqd->idle);
@@ -3356,10 +3342,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      *
      * Of course, we also default to idle also if scurr is not runnable.
      */
-    if ( vcpu_runnable(scurr->vcpu) && !soft_aff_preempt )
+    if ( unit_runnable(scurr->unit) && !soft_aff_preempt )
         snext = scurr;
     else
-        snext = csched2_unit(idle_vcpu[cpu]->sched_unit);
+        snext = csched2_unit(sched_idle_unit(cpu));
 
  check_runq:
     list_for_each_safe( iter, temp, &rqd->runq )
@@ -3369,24 +3355,24 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         if ( unlikely(tb_init_done) )
         {
             struct {
-                unsigned vcpu:16, dom:16;
+                unsigned unit:16, dom:16;
             } d;
-            d.dom = svc->vcpu->domain->domain_id;
-            d.vcpu = svc->vcpu->vcpu_id;
+            d.dom = svc->unit->domain->domain_id;
+            d.unit = svc->unit->unit_id;
             __trace_var(TRC_CSCHED2_RUNQ_CAND_CHECK, 1,
                         sizeof(d),
                         (unsigned char *)&d);
         }
 
-        /* Only consider vcpus that are allowed to run on this processor. */
-        if ( !cpumask_test_cpu(cpu, svc->vcpu->sched_unit->cpu_hard_affinity) )
+        /* Only consider units that are allowed to run on this processor. */
+        if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )
         {
             (*skipped)++;
             continue;
         }
 
         /*
-         * If a vcpu is meant to be picked up by another processor, and such
+         * If an unit is meant to be picked up by another processor, and such
          * processor has not scheduled yet, leave it in the runqueue for him.
          */
         if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
@@ -3401,7 +3387,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * If this is on a different processor, don't pull it unless
          * its credit is at least CSCHED2_MIGRATE_RESIST higher.
          */
-        if ( svc->vcpu->processor != cpu
+        if ( sched_unit_cpu(svc->unit) != cpu
              && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
         {
             (*skipped)++;
@@ -3416,7 +3402,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * some budget, then choose it.
          */
         if ( (yield || svc->credit > snext->credit) &&
-             (!has_cap(svc) || vcpu_grab_budget(svc)) )
+             (!has_cap(svc) || unit_grab_budget(svc)) )
             snext = svc;
 
         /* In any case, if we got this far, break. */
@@ -3426,12 +3412,12 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     if ( unlikely(tb_init_done) )
     {
         struct {
-            unsigned vcpu:16, dom:16;
+            unsigned unit:16, dom:16;
             unsigned tickled_cpu, skipped;
             int credit;
         } d;
-        d.dom = snext->vcpu->domain->domain_id;
-        d.vcpu = snext->vcpu->vcpu_id;
+        d.dom = snext->unit->domain->domain_id;
+        d.unit = snext->unit->unit_id;
         d.credit = snext->credit;
         d.tickled_cpu = snext->tickled_cpu;
         d.skipped = *skipped;
@@ -3463,14 +3449,15 @@ csched2_schedule(
 {
     const int cpu = smp_processor_id();
     struct csched2_runqueue_data *rqd;
-    struct csched2_unit * const scurr = csched2_unit(current->sched_unit);
+    struct sched_unit *currunit = current->sched_unit;
+    struct csched2_unit * const scurr = csched2_unit(currunit);
     struct csched2_unit *snext = NULL;
-    unsigned int skipped_vcpus = 0;
+    unsigned int skipped_units = 0;
     struct task_slice ret;
     bool tickled;
 
     SCHED_STAT_CRANK(schedule);
-    CSCHED2_VCPU_CHECK(current);
+    CSCHED2_UNIT_CHECK(currunit);
 
     BUG_ON(!cpumask_test_cpu(cpu, &csched2_priv(ops)->initialized));
 
@@ -3479,7 +3466,7 @@ csched2_schedule(
 
     ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock));
 
-    BUG_ON(!is_idle_vcpu(scurr->vcpu) && scurr->rqd != rqd);
+    BUG_ON(!is_idle_unit(currunit) && scurr->rqd != rqd);
 
     /* Clear "tickled" bit now that we've been scheduled */
     tickled = cpumask_test_cpu(cpu, &rqd->tickled);
@@ -3499,7 +3486,7 @@ csched2_schedule(
         d.cpu = cpu;
         d.rq_id = c2r(cpu);
         d.tasklet = tasklet_work_scheduled;
-        d.idle = is_idle_vcpu(current);
+        d.idle = is_idle_unit(currunit);
         d.smt_idle = cpumask_test_cpu(cpu, &rqd->smt_idle);
         d.tickled = tickled;
         __trace_var(TRC_CSCHED2_SCHEDULE, 1,
@@ -3513,55 +3500,55 @@ csched2_schedule(
     /*
      * Below 0, means that we are capped and we have overrun our budget.
      * Let's try to get some more but, if we fail (e.g., because of the
-     * other running vcpus), we will be parked.
+     * other running units), we will be parked.
      */
     if ( unlikely(scurr->budget <= 0) )
-        vcpu_grab_budget(scurr);
+        unit_grab_budget(scurr);
 
     /*
-     * Select next runnable local VCPU (ie top of local runq).
+     * Select next runnable local UNIT (ie top of local runq).
      *
-     * If the current vcpu is runnable, and has higher credit than
+     * If the current unit is runnable, and has higher credit than
      * the next guy on the queue (or there is noone else), we want to
      * run him again.
      *
-     * If there's tasklet work to do, we want to chose the idle vcpu
+     * If there's tasklet work to do, we want to chose the idle unit
      * for this processor, and mark the current for delayed runqueue
      * add.
      *
-     * If the current vcpu is runnable, and there's another runnable
+     * If the current unit is runnable, and there's another runnable
      * candidate, we want to mark current for delayed runqueue add,
      * and remove the next guy from the queue.
      *
-     * If the current vcpu is not runnable, we want to chose the idle
-     * vcpu for this processor.
+     * If the current unit is not runnable, we want to chose the idle
+     * unit for this processor.
      */
     if ( tasklet_work_scheduled )
     {
-        __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+        __clear_bit(__CSFLAG_unit_yield, &scurr->flags);
         trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
-        snext = csched2_unit(idle_vcpu[cpu]->sched_unit);
+        snext = csched2_unit(sched_idle_unit(cpu));
     }
     else
-        snext = runq_candidate(rqd, scurr, cpu, now, &skipped_vcpus);
+        snext = runq_candidate(rqd, scurr, cpu, now, &skipped_units);
 
-    /* If switching from a non-idle runnable vcpu, put it
+    /* If switching from a non-idle runnable unit, put it
      * back on the runqueue. */
     if ( snext != scurr
-         && !is_idle_vcpu(scurr->vcpu)
-         && vcpu_runnable(current) )
+         && !is_idle_unit(currunit)
+         && unit_runnable(currunit) )
         __set_bit(__CSFLAG_delayed_runq_add, &scurr->flags);
 
     ret.migrated = 0;
 
     /* Accounting for non-idle tasks */
-    if ( !is_idle_vcpu(snext->vcpu) )
+    if ( !is_idle_unit(snext->unit) )
     {
         /* If switching, remove this from the runqueue and mark it scheduled */
         if ( snext != scurr )
         {
             ASSERT(snext->rqd == rqd);
-            ASSERT(!snext->vcpu->sched_unit->is_running);
+            ASSERT(!snext->unit->is_running);
 
             runq_remove(snext);
             __set_bit(__CSFLAG_scheduled, &snext->flags);
@@ -3576,19 +3563,19 @@ csched2_schedule(
 
         /*
          * The reset condition is "has a scheduler epoch come to an end?".
-         * The way this is enforced is checking whether the vcpu at the top
+         * The way this is enforced is checking whether the unit at the top
          * of the runqueue has negative credits. This means the epochs have
          * variable length, as in one epoch expores when:
-         *  1) the vcpu at the top of the runqueue has executed for
+         *  1) the unit at the top of the runqueue has executed for
          *     around 10 ms (with default parameters);
-         *  2) no other vcpu with higher credits wants to run.
+         *  2) no other unit with higher credits wants to run.
          *
          * Here, where we want to check for reset, we need to make sure the
-         * proper vcpu is being used. In fact, runqueue_candidate() may have
-         * not returned the first vcpu in the runqueue, for various reasons
+         * proper unit is being used. In fact, runqueue_candidate() may have
+         * not returned the first unit in the runqueue, for various reasons
          * (e.g., affinity). Only trigger a reset when it does.
          */
-        if ( skipped_vcpus == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
+        if ( skipped_units == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
         {
             reset_credit(ops, cpu, now, snext);
             balance_load(ops, cpu, now);
@@ -3598,11 +3585,10 @@ csched2_schedule(
         snext->tickled_cpu = -1;
 
         /* Safe because lock for old processor is held */
-        if ( snext->vcpu->processor != cpu )
+        if ( sched_unit_cpu(snext->unit) != cpu )
         {
             snext->credit += CSCHED2_MIGRATE_COMPENSATION;
-            snext->vcpu->processor = cpu;
-            snext->vcpu->sched_unit->res = get_sched_res(cpu);
+            sched_set_res(snext->unit, get_sched_res(cpu));
             SCHED_STAT_CRANK(migrated);
             ret.migrated = 1;
         }
@@ -3636,20 +3622,20 @@ csched2_schedule(
      * Return task to run next...
      */
     ret.time = csched2_runtime(ops, cpu, snext, now);
-    ret.task = snext->vcpu->sched_unit;
+    ret.task = snext->unit;
 
-    CSCHED2_VCPU_CHECK(ret.task->vcpu);
+    CSCHED2_UNIT_CHECK(ret.task);
     return ret;
 }
 
 static void
-csched2_dump_vcpu(struct csched2_private *prv, struct csched2_unit *svc)
+csched2_dump_unit(struct csched2_private *prv, struct csched2_unit *svc)
 {
     printk("[%i.%i] flags=%x cpu=%i",
-            svc->vcpu->domain->domain_id,
-            svc->vcpu->vcpu_id,
+            svc->unit->domain->domain_id,
+            svc->unit->unit_id,
             svc->flags,
-            svc->vcpu->processor);
+            sched_unit_cpu(svc->unit));
 
     printk(" credit=%" PRIi32" [w=%u]", svc->credit, svc->weight);
 
@@ -3674,12 +3660,12 @@ dump_pcpu(const struct scheduler *ops, int cpu)
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_sibling_mask, cpu)),
            nr_cpu_ids, cpumask_bits(per_cpu(cpu_core_mask, cpu)));
 
-    /* current VCPU (nothing to say if that's the idle vcpu) */
+    /* current UNIT (nothing to say if that's the idle unit) */
     svc = csched2_unit(curr_on_cpu(cpu));
-    if ( svc && !is_idle_vcpu(svc->vcpu) )
+    if ( svc && !is_idle_unit(svc->unit) )
     {
         printk("\trun: ");
-        csched2_dump_vcpu(prv, svc);
+        csched2_dump_unit(prv, svc);
     }
 }
 
@@ -3736,7 +3722,7 @@ csched2_dump(const struct scheduler *ops)
     list_for_each( iter_sdom, &prv->sdom )
     {
         struct csched2_dom *sdom;
-        struct vcpu *v;
+        struct sched_unit *unit;
 
         sdom = list_entry(iter_sdom, struct csched2_dom, sdom_elem);
 
@@ -3744,19 +3730,19 @@ csched2_dump(const struct scheduler *ops)
                sdom->dom->domain_id,
                sdom->weight,
                sdom->cap,
-               sdom->nr_vcpus);
+               sdom->nr_units);
 
-        for_each_vcpu( sdom->dom, v )
+        for_each_sched_unit ( sdom->dom, unit )
         {
-            struct csched2_unit * const svc = csched2_unit(v->sched_unit);
+            struct csched2_unit * const svc = csched2_unit(unit);
             spinlock_t *lock;
 
-            lock = unit_schedule_lock(svc->vcpu->sched_unit);
+            lock = unit_schedule_lock(unit);
 
             printk("\t%3d: ", ++loop);
-            csched2_dump_vcpu(prv, svc);
+            csched2_dump_unit(prv, svc);
 
-            unit_schedule_unlock(lock, svc->vcpu->sched_unit);
+            unit_schedule_unlock(lock, unit);
         }
     }
 
@@ -3782,7 +3768,7 @@ csched2_dump(const struct scheduler *ops)
             if ( svc )
             {
                 printk("\t%3d: ", loop++);
-                csched2_dump_vcpu(prv, svc);
+                csched2_dump_unit(prv, svc);
             }
         }
         spin_unlock(&rqd->lock);
@@ -3882,7 +3868,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     struct sched_resource *sd = get_sched_res(cpu);
     unsigned rqi;
 
-    ASSERT(pdata && svc && is_idle_vcpu(svc->vcpu));
+    ASSERT(pdata && svc && is_idle_unit(svc->unit));
 
     /*
      * We own one runqueue lock already (from schedule_cpu_switch()). This
@@ -3895,7 +3881,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     ASSERT(!local_irq_is_enabled());
     write_lock(&prv->lock);
 
-    idle_vcpu[cpu]->sched_unit->priv = vdata;
+    sched_idle_unit(cpu)->priv = vdata;
 
     rqi = init_pdata(prv, pdata, cpu);
 
@@ -3937,7 +3923,7 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
      */
     ASSERT(spc && spc->runq_id != -1);
     ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
-    
+
     /* Find the old runqueue and remove this cpu from it */
     rqd = prv->rqd + spc->runq_id;
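
For readers following the cap handling in csched2_dom_cntl(), the two calculations that now depend on nr_units rather than nr_vcpus can be sketched in isolation. This is only an illustrative, standalone sketch and not code from the patch: the sketch_* names and SKETCH_MIN_TIMER are hypothetical, and the 500us floor is an assumed stand-in for CSCHED2_MIN_TIMER. It also assumes nr_units > 0, as in the patch, where the quota is only computed while iterating over existing units.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t s_time_t;   /* time in nanoseconds, mirroring Xen's s_time_t */

/* Hypothetical stand-in for CSCHED2_MIN_TIMER; the value is an assumption. */
#define SKETCH_MIN_TIMER ((s_time_t)500000)   /* 500us expressed in ns */

/*
 * The cap is a percentage of one pCPU, so a domain cannot be granted more
 * than 100% per scheduling unit. Hypothetical helper, not in the patch.
 */
static bool sketch_cap_is_valid(unsigned int cap, unsigned int nr_units)
{
    return cap <= 100u * nr_units;
}

/*
 * Each unit grabs 1/N of the domain's total budget (N = number of units),
 * but never less than the minimum timer granularity.
 * Precondition (assumed): nr_units > 0.
 */
static s_time_t sketch_budget_quota(s_time_t tot_budget, unsigned int nr_units)
{
    s_time_t share = tot_budget / (s_time_t)nr_units;

    return share > SKETCH_MIN_TIMER ? share : SKETCH_MIN_TIMER;
}
```

One consequence worth keeping in mind, assuming later patches in the series let a unit hold more than one vcpu: the upper bound on a valid cap and the per-unit quota both scale with the number of units, not the number of vcpus.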
