
[v2,11/11] xen: credit2: implement true SMT support

Message ID CAFLBxZb72-NWjbwP+a7EryJkUfRqKbtuDhyXcVMCmj7cuJQyWA@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

George Dunlap July 19, 2016, 9:39 a.m. UTC
On Mon, Jul 18, 2016 at 6:24 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Mon, 2016-07-18 at 17:48 +0100, George Dunlap wrote:
>> On 15/07/16 15:50, Dario Faggioli wrote:
>> >
>> > +/*
>> > + * If all the siblings of cpu (including cpu itself) are in
>> > idlers,
>> > + * set all their bits in mask.
>> > + *
>> > + * In order to properly take into account tickling, idlers needs
>> > to be
>> > + * set qeual to something like:
>> *equal (I can fix this on check-in)
>>
> Oops!
>
>> > + *
>> > + *  rqd->idle & (~rqd->tickled)
>> > + *
>> > + * This is because cpus that have been tickled will very likely
>> > pick up some
>> > + * work as soon as the manage to schedule, and hence we should
>> > really consider
>> > + * them as busy.
>> OK this is something that slightly confused me when I was reviewing
>> the patch the first time: that rqd->idle is *all* pcpus which are
>> currently idle (and thus we need to & (~tickled) when using it), but
>> rqd->smt_idle is meant to be maintained as *non-tickled* idle pcpus.
>>
> Short answer is, "yes, this recap of yours is correct".
>
> In fact, the difference between idle and smt_idle is that the former is
> valid instantaneously, while the latter is tracking a state.
>
> IOW, if, at any given time, I want to know what pcpus are idle, I check
> rqd->idle. If I want to know what are idle and also are not (or are
> unlikely) just about to pick up work, I can check
> rqd->idle&(~rqd->tickled)
>
> Let's now consider smt_idle and assume that, at time t siblings pcpus 2
> and 3 are idle (as in, their bit is 1 in rqd->idle). If I'd be basing
> smt_idle just on that, I could at this point set the bit of the core in
> smt_idle. This in turn means that work will likely be sent to either 2
> or 3 (depending on all the other factors that influence this). Let's
> assume we select 2. But if either of them --although being idle--
> has actually been tickled already, we may have taken a suboptimal
> decision. In fact, if 3 was tickled, both 2 and 3 will pick up work,
> and if there is another core (say, made up of siblings pcpus 6 and 7)
> which is truly fully idle, we would better have chosen a pcpu from
> there. If 2 was the one that was tickled, that's even worse, because I
> most likely have 2 work items, and am tickling only 1 pcpu!
>
> So, again, yes, basically this means that I need smt_idle to be
> representative of the set of non-tickled idle pcpus.
>
>> Are you planning at some point to have a follow-up patch which
>> changes rqd->idle to be non-tickled idle pcpus as well?  Unless I
>> missed something it looks like at the moment the only times rqd->idle
>> is acted upon is after &~-ing out rqd->tickled anyway.
>>
> I am indeed, but I was planning to do that after this round of changes
> (this series, plus soft-affinity, plus caps, which I have in my queue).
>
> It's, after all, an optimization, and hence I think it is fine to leave
> it to when things will be proven to be working. :-)
>
> If you're saying that this discrepancy between rqd->idle's and
> rqd->smt_idle's semantic is, at minimum, unideal, I do agree... but I
> think, for now at least, it's worth living with it.

I hadn't actually said anything, but you know me well enough to guess
what I'm thinking. :-)  I am somewhat torn between disliking the
inconsistency and, as you say, the fact that this is a distinct
improvement; it would seem a bit petty to insist that you either
wait or produce a patch to change idle at the same time.

But I do think that the difference needs to be called out a bit
better.  What about folding in something like the attached patch?

  -George
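
The distinction discussed above (all idle pcpus versus fully idle-and-untickled cores) can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the Xen code: mask_t is a hypothetical 64-bit stand-in for cpumask_t, siblings are assumed to be numbered in pairs (2k, 2k+1), and the helper names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: bit n represents pcpu n (max 64). */
typedef uint64_t mask_t;

/* Assumption: two threads per core, numbered (2k, 2k+1). */
static mask_t sibling_mask(unsigned int cpu)
{
    assert(cpu < 64);
    return (mask_t)3 << (cpu & ~1u);
}

/* Idle pcpus that are not about to pick up work: idle & ~tickled. */
static mask_t untickled_idlers(mask_t idle, mask_t tickled)
{
    return idle & ~tickled;
}

/*
 * Return mask with the bits of cpu's core added, but only if every
 * sibling of cpu (including cpu itself) is present in idlers.
 */
static mask_t smt_idle_set(unsigned int cpu, mask_t idlers, mask_t mask)
{
    mask_t sibs = sibling_mask(cpu);

    if ((idlers & sibs) == sibs)
        mask |= sibs;
    return mask;
}
```

With pcpus 2, 3, 6 and 7 idle and pcpu 3 already tickled, passing the raw idle mask would mark core 2/3 as fully idle, while passing idle & ~tickled leaves only core 6/7, which is the choice Dario's example argues for.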

Comments

Dario Faggioli July 19, 2016, 9:57 a.m. UTC | #1
On Tue, 2016-07-19 at 10:39 +0100, George Dunlap wrote:
> On Mon, Jul 18, 2016 at 6:24 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > 
> > If you're saying that this discrepancy between rqd->idle's and
> > rqd->smt_idle's semantic is, at minimum, unideal, I do agree...
> > but I think, for now at least, it's worth living with it.
> I hadn't actually said anything, but you know me well enough to guess
> what I'm thinking. :-)  
>
Hehe. :-)

> I am somewhat torn between disliking the
> inconsistency and, as you say, the fact that this is a distinct
> improvement; it would seem a bit petty to insist that you either
> wait or produce a patch to change idle at the same time.
> 
If we go ahead, I sign up for double checking and, if possible, fixing
the inconsistency.

> But I do think that the difference needs to be called out a bit
> better.  
>
Yes, I was about to reply again saying "perhaps we should add a comment
about this".

> What about folding in something like the attached patch?
> 
I'd be totally fine with this.

Thanks and Regards,
Dario
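
Since smt_idle tracks a state rather than an instantaneous value, it has to be refreshed whenever a pcpu's idle or tickled bit changes. A minimal sketch of that bookkeeping follows; the struct, the function names and the paired sibling numbering (2k, 2k+1) are all assumptions made for the illustration and do not come from the Xen sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: bit n represents pcpu n (max 64). */
typedef uint64_t mask_t;

/* Hypothetical, reduced model of the three runqueue masks. */
struct rq_model {
    mask_t idle;      /* pcpus with nothing running on them */
    mask_t tickled;   /* pcpus asked to go through schedule */
    mask_t smt_idle;  /* cores whose threads are all idle and untickled */
};

/* Assumption: two threads per core, numbered (2k, 2k+1). */
static mask_t sibling_mask(unsigned int cpu)
{
    assert(cpu < 64);
    return (mask_t)3 << (cpu & ~1u);
}

/* Recompute the smt_idle bits of cpu's core from idle and tickled. */
static void smt_idle_refresh(struct rq_model *rq, unsigned int cpu)
{
    mask_t sibs = sibling_mask(cpu);
    mask_t avail = rq->idle & ~rq->tickled;

    if ((avail & sibs) == sibs)
        rq->smt_idle |= sibs;
    else
        rq->smt_idle &= ~sibs;
}

/* Change cpu's idle bit, then keep smt_idle consistent. */
static void set_idle(struct rq_model *rq, unsigned int cpu, int on)
{
    mask_t bit = (mask_t)1 << cpu;

    rq->idle = on ? (rq->idle | bit) : (rq->idle & ~bit);
    smt_idle_refresh(rq, cpu);
}

/* Change cpu's tickled bit, then keep smt_idle consistent. */
static void set_tickled(struct rq_model *rq, unsigned int cpu, int on)
{
    mask_t bit = (mask_t)1 << cpu;

    rq->tickled = on ? (rq->tickled | bit) : (rq->tickled & ~bit);
    smt_idle_refresh(rq, cpu);
}
```

In this model, a core enters smt_idle only once both of its threads are idle, and it drops out again as soon as either thread is tickled or picks up work.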

Patch

commit fd8fe6d8526cc9d6abe510aae7a654d1b72d4305
Author: George Dunlap <george.dunlap@citrix.com>
Commit: George Dunlap <george.dunlap@citrix.com>

    George's mods

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 6ccc6f0..3e1720c 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -353,8 +353,8 @@  struct csched2_runqueue_data {
     struct list_head svc;  /* List of all vcpus assigned to this runqueue */
     unsigned int max_weight;
 
-    cpumask_t idle,        /* Currently idle */
-        smt_idle,          /* Fully idle cores (as in all the siblings are idle) */
+    cpumask_t idle,        /* Currently idle pcpus */
+        smt_idle,          /* Fully idle-and-untickled cores (see below) */
         tickled;           /* Have been asked to go through schedule */
     int load;              /* Instantaneous load: Length of queue  + num non-idle threads */
     s_time_t load_last_update;  /* Last time average was updated */
@@ -454,17 +454,20 @@  struct csched2_dom {
  */
 
 /*
- * If all the siblings of cpu (including cpu itself) are in idlers,
- * set all their bits in mask.
- *
- * In order to properly take into account tickling, idlers needs to be
- * set qeual to something like:
- *
- *  rqd->idle & (~rqd->tickled)
- *
- * This is because cpus that have been tickled will very likely pick up some
- * work as soon as the manage to schedule, and hence we should really consider
- * them as busy.
+ * If all the siblings of cpu (including cpu itself) are both idle and
+ * untickled, set all their bits in mask.
+ *
+ * NB that rqd->smt_idle is different than rqd->idle.  rqd->idle
+ * records pcpus that are merely idle (i.e., at the moment do not
+ * have a vcpu running on them).  But you have to manually filter out
+ * which pcpus have been tickled in order to find cores that are not
+ * going to be busy soon.  Filtering out tickled cpus pairwise is a
+ * lot of extra pain; so for rqd->smt_idle, we explicitly make it so that
+ * the bits of a pcpu are set only if all the threads on its core are
+ * both idle *and* untickled.
+ *
+ * This means changing the mask when either rqd->idle or rqd->tickled
+ * changes.
  */
 static inline
 void smt_idle_mask_set(unsigned int cpu, const cpumask_t *idlers,