diff mbox

[08/24] xen: tracing: add trace records for schedule and rate-limiting.

Message ID 147145430952.25877.10937525630692282780.stgit@Solace.fritz.box (mailing list archive)
State New, archived
Headers show

Commit Message

Dario Faggioli Aug. 17, 2016, 5:18 p.m. UTC
As far as {csched, csched2, rt}_schedule() are concerned,
an "empty" event, would already make it easier to read and
understand a trace.

But while there, add a few useful information, like
if the cpu that is going through the scheduler has
been tickled or not, if it is currently idle, etc
(they vary, on a per-scheduler basis).

For Credit1 and Credit2, add a record about when
rate-limiting kicks in too.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Meng Xu <mengxu@cis.upenn.edu>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit.c  |    7 +++++++
 xen/common/sched_credit2.c |   38 +++++++++++++++++++++++++++++++++++++-
 xen/common/sched_rt.c      |   15 +++++++++++++++
 3 files changed, 59 insertions(+), 1 deletion(-)

Comments

Meng Xu Aug. 18, 2016, 12:57 a.m. UTC | #1
On Wed, Aug 17, 2016 at 1:18 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
>
> As far as {csched, csched2, rt}_schedule() are concerned,
> an "empty" event, would already make it easier to read and
> understand a trace.
>
> But while there, add a few useful information, like
> if the cpu that is going through the scheduler has
> been tickled or not, if it is currently idle, etc
> (they vary, on a per-scheduler basis).
>
> For Credit1 and Credit2, add a record about when
> rate-limiting kicks in too.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit.c  |    7 +++++++
>  xen/common/sched_credit2.c |   38 +++++++++++++++++++++++++++++++++++++-
>  xen/common/sched_rt.c      |   15 +++++++++++++++
>  3 files changed, 59 insertions(+), 1 deletion(-)
>


> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 41c61a7..903dbd8 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -160,6 +160,7 @@
>  #define TRC_RTDS_BUDGET_BURN      TRC_SCHED_CLASS_EVT(RTDS, 3)
>  #define TRC_RTDS_BUDGET_REPLENISH TRC_SCHED_CLASS_EVT(RTDS, 4)
>  #define TRC_RTDS_SCHED_TASKLET    TRC_SCHED_CLASS_EVT(RTDS, 5)
> +#define TRC_RTDS_SCHEDULE         TRC_SCHED_CLASS_EVT(RTDS, 6)
>
>  static void repl_timer_handler(void *data);
>
> @@ -1035,6 +1036,20 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>      struct rt_vcpu *snext = NULL;
>      struct task_slice ret = { .migrated = 0 };
>
> +    /* TRACE */
> +    {
> +        struct __packed {
> +            unsigned cpu:17, tasklet:8, tickled:4, idle:4;
> +        } d;
> +        d.cpu = cpu;
> +        d.tasklet = tasklet_work_scheduled;
> +        d.tickled = cpumask_test_cpu(cpu, &prv->tickled);
> +        d.idle = is_idle_vcpu(current);
> +        __trace_var(TRC_RTDS_SCHEDULE, 1,
> +                    sizeof(d),
> +                    (unsigned char *)&d);
> +    }

I have two questions here:
1) IMHO, the trace should be wrapped by the if (
unlikely(tb_init_done) ) {} statement as done in sched_credit2.c.
Otherwise, we always enable the tracing which may hurt the
performance, I think.

2) Why does the cpu field here use 17 bits instead of 16 bits as used
in credit2?
This may be a typo I guess (since you are trying to align the
structure to 32 bits I guess ;-) )?
In addition, I'm wondering if uint16 is better than unsigned? I'm not
that confident if unsigned type will always have 16 bits on all types
of machines?

Thanks,

Meng

------------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
Dario Faggioli Aug. 18, 2016, 9:41 a.m. UTC | #2
On Wed, 2016-08-17 at 20:57 -0400, Meng Xu wrote:
> On Wed, Aug 17, 2016 at 1:18 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > 
> > diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> > index 41c61a7..903dbd8 100644
> > --- a/xen/common/sched_rt.c
> > +++ b/xen/common/sched_rt.c
> > 
> > @@ -1035,6 +1036,20 @@ rt_schedule(const struct scheduler *ops,
> > s_time_t now, bool_t tasklet_work_sched
> >      struct rt_vcpu *snext = NULL;
> >      struct task_slice ret = { .migrated = 0 };
> > 
> > +    /* TRACE */
> > +    {
> > +        struct __packed {
> > +            unsigned cpu:17, tasklet:8, tickled:4, idle:4;
> > +        } d;
> > +        d.cpu = cpu;
> > +        d.tasklet = tasklet_work_scheduled;
> > +        d.tickled = cpumask_test_cpu(cpu, &prv->tickled);
> > +        d.idle = is_idle_vcpu(current);
> > +        __trace_var(TRC_RTDS_SCHEDULE, 1,
> > +                    sizeof(d),
> > +                    (unsigned char *)&d);
> > +    }
>

> 1) IMHO, the trace should be wrapped by the if (
> unlikely(tb_init_done) ) {} statement as done in sched_credit2.c.
> Otherwise, we always enable the tracing which may hurt the
> performance, I think.
> 
You're right. Actually, in order to follow suite from sched_rt.c, I
think using trace_var() instead of __trace_var() is what we want.

Then, I think it will be a good thing, at some point, to convert all
these /*TRACE*/ blocks to extrapolate the tb_init_done check and make
it guard the trace record marshalling, like I did a few patches ago for
Credit2.... but that's another patch.

This is a cut-&-paste mistake, good job noticing it. :-)

> 2) Why does the cpu field here use 17 bits instead of 16 bits as used
> in credit2?
> This may be a typo I guess (since you are trying to align the
> structure to 32 bits I guess ;-) )?
>
Wow... 17.. I must have been drunk when writing that! :-O

> In addition, I'm wondering if uint16 is better than unsigned? I'm not
> that confident if unsigned type will always have 16 bits on all types
> of machines?
> 
Yeah, well, TBH, all this bitfields and packing, etc., is something I
truly hate. But I think in this case unsigned is fine.

Thanks and Regards,
Dario
George Dunlap Sept. 20, 2016, 1:50 p.m. UTC | #3
On 17/08/16 18:18, Dario Faggioli wrote:
> As far as {csched, csched2, rt}_schedule() are concerned,
> an "empty" event, would already make it easier to read and
> understand a trace.
> 
> But while there, add a few useful information, like
> if the cpu that is going through the scheduler has
> been tickled or not, if it is currently idle, etc
> (they vary, on a per-scheduler basis).
> 
> For Credit1 and Credit2, add a record about when
> rate-limiting kicks in too.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit.c  |    7 +++++++
>  xen/common/sched_credit2.c |   38 +++++++++++++++++++++++++++++++++++++-
>  xen/common/sched_rt.c      |   15 +++++++++++++++
>  3 files changed, 59 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index ca04732..f9d3ac9 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -134,6 +134,8 @@
>  #define TRC_CSCHED_TICKLE        TRC_SCHED_CLASS_EVT(CSCHED, 6)
>  #define TRC_CSCHED_BOOST_START   TRC_SCHED_CLASS_EVT(CSCHED, 7)
>  #define TRC_CSCHED_BOOST_END     TRC_SCHED_CLASS_EVT(CSCHED, 8)
> +#define TRC_CSCHED_SCHEDULE      TRC_SCHED_CLASS_EVT(CSCHED, 9)
> +#define TRC_CSCHED_RATELIMIT     TRC_SCHED_CLASS_EVT(CSCHED, 10)
>  
>  
>  /*
> @@ -1743,6 +1745,9 @@ csched_schedule(
>      SCHED_STAT_CRANK(schedule);
>      CSCHED_VCPU_CHECK(current);
>  
> +    TRACE_3D(TRC_CSCHED_SCHEDULE, cpu, tasklet_work_scheduled,
> +             is_idle_vcpu(current));

Sorry to be annoying, but you're using two full words here for two bits
of information, and scheduling isn't exactly a once-every-few-seconds
phenomenon. :-)  Would you mind packing this in a bit?

> +
>      runtime = now - current->runstate.state_entry_time;
>      if ( runtime < 0 ) /* Does this ever happen? */
>          runtime = 0;
> @@ -1792,6 +1797,8 @@ csched_schedule(
>          snext->start_time += now;
>          perfc_incr(delay_ms);
>          tslice = MICROSECS(prv->ratelimit_us) - runtime;
> +        TRACE_3D(TRC_CSCHED_RATELIMIT, scurr->vcpu->domain->domain_id,
> +                 scurr->vcpu->vcpu_id, runtime);

Same for this one, if you don't mind -- this one is less important
probably, but since you essentially have the code in credit2, it seems
like a pretty straightforward exercise to copy-and-paste it.

The credit2 traces look good -- thanks!

 -George
diff mbox

Patch

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index ca04732..f9d3ac9 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -134,6 +134,8 @@ 
 #define TRC_CSCHED_TICKLE        TRC_SCHED_CLASS_EVT(CSCHED, 6)
 #define TRC_CSCHED_BOOST_START   TRC_SCHED_CLASS_EVT(CSCHED, 7)
 #define TRC_CSCHED_BOOST_END     TRC_SCHED_CLASS_EVT(CSCHED, 8)
+#define TRC_CSCHED_SCHEDULE      TRC_SCHED_CLASS_EVT(CSCHED, 9)
+#define TRC_CSCHED_RATELIMIT     TRC_SCHED_CLASS_EVT(CSCHED, 10)
 
 
 /*
@@ -1743,6 +1745,9 @@  csched_schedule(
     SCHED_STAT_CRANK(schedule);
     CSCHED_VCPU_CHECK(current);
 
+    TRACE_3D(TRC_CSCHED_SCHEDULE, cpu, tasklet_work_scheduled,
+             is_idle_vcpu(current));
+
     runtime = now - current->runstate.state_entry_time;
     if ( runtime < 0 ) /* Does this ever happen? */
         runtime = 0;
@@ -1792,6 +1797,8 @@  csched_schedule(
         snext->start_time += now;
         perfc_incr(delay_ms);
         tslice = MICROSECS(prv->ratelimit_us) - runtime;
+        TRACE_3D(TRC_CSCHED_RATELIMIT, scurr->vcpu->domain->domain_id,
+                 scurr->vcpu->vcpu_id, runtime);
         ret.migrated = 0;
         goto out;
     }
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index c8e0ee7..164296b 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -55,6 +55,8 @@ 
 #define TRC_CSCHED2_LOAD_BALANCE     TRC_SCHED_CLASS_EVT(CSCHED2, 17)
 #define TRC_CSCHED2_PICKED_CPU       TRC_SCHED_CLASS_EVT(CSCHED2, 19)
 #define TRC_CSCHED2_RUNQ_CANDIDATE   TRC_SCHED_CLASS_EVT(CSCHED2, 20)
+#define TRC_CSCHED2_SCHEDULE         TRC_SCHED_CLASS_EVT(CSCHED2, 21)
+#define TRC_CSCHED2_RATELIMIT        TRC_SCHED_CLASS_EVT(CSCHED2, 22)
 
 /*
  * WARNING: This is still in an experimental phase.  Status and work can be found at the
@@ -2293,7 +2295,22 @@  runq_candidate(struct csched2_runqueue_data *rqd,
          vcpu_runnable(scurr->vcpu) &&
          (now - scurr->vcpu->runstate.state_entry_time) <
           MICROSECS(prv->ratelimit_us) )
+    {
+        if ( unlikely(tb_init_done) )
+        {
+            struct {
+                unsigned vcpu:16, dom:16;
+                unsigned runtime;
+            } d;
+            d.dom = scurr->vcpu->domain->domain_id;
+            d.vcpu = scurr->vcpu->vcpu_id;
+            d.runtime = now - scurr->vcpu->runstate.state_entry_time;
+            __trace_var(TRC_CSCHED2_RATELIMIT, 1,
+                        sizeof(d),
+                        (unsigned char *)&d);
+        }
         return scurr;
+    }
 
     /* Default to current if runnable, idle otherwise */
     if ( vcpu_runnable(scurr->vcpu) )
@@ -2383,6 +2400,7 @@  csched2_schedule(
     struct csched2_vcpu *snext = NULL;
     unsigned int snext_pos = 0;
     struct task_slice ret;
+    bool_t tickled;
 
     SCHED_STAT_CRANK(schedule);
     CSCHED2_VCPU_CHECK(current);
@@ -2397,13 +2415,31 @@  csched2_schedule(
     BUG_ON(!is_idle_vcpu(scurr->vcpu) && scurr->rqd != rqd);
 
     /* Clear "tickled" bit now that we've been scheduled */
-    if ( cpumask_test_cpu(cpu, &rqd->tickled) )
+    tickled = cpumask_test_cpu(cpu, &rqd->tickled);
+    if ( tickled )
     {
         __cpumask_clear_cpu(cpu, &rqd->tickled);
         cpumask_andnot(cpumask_scratch, &rqd->idle, &rqd->tickled);
         smt_idle_mask_set(cpu, cpumask_scratch, &rqd->smt_idle);
     }
 
+    if ( unlikely(tb_init_done) )
+    {
+        struct {
+            unsigned cpu:16, rq_id:16;
+            unsigned tasklet:8, idle:8, smt_idle:8, tickled:8;
+        } d;
+        d.cpu = cpu;
+        d.rq_id = c2r(ops, cpu);
+        d.tasklet = tasklet_work_scheduled;
+        d.idle = is_idle_vcpu(current);
+        d.smt_idle = cpumask_test_cpu(cpu, &rqd->smt_idle);
+        d.tickled = tickled;
+        __trace_var(TRC_CSCHED2_SCHEDULE, 1,
+                    sizeof(d),
+                    (unsigned char *)&d);
+    }
+
     /* Update credits */
     burn_credits(rqd, scurr, now);
 
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 41c61a7..903dbd8 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -160,6 +160,7 @@ 
 #define TRC_RTDS_BUDGET_BURN      TRC_SCHED_CLASS_EVT(RTDS, 3)
 #define TRC_RTDS_BUDGET_REPLENISH TRC_SCHED_CLASS_EVT(RTDS, 4)
 #define TRC_RTDS_SCHED_TASKLET    TRC_SCHED_CLASS_EVT(RTDS, 5)
+#define TRC_RTDS_SCHEDULE         TRC_SCHED_CLASS_EVT(RTDS, 6)
 
 static void repl_timer_handler(void *data);
 
@@ -1035,6 +1036,20 @@  rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     struct rt_vcpu *snext = NULL;
     struct task_slice ret = { .migrated = 0 };
 
+    /* TRACE */
+    {
+        struct __packed {
+            unsigned cpu:17, tasklet:8, tickled:4, idle:4;
+        } d;
+        d.cpu = cpu;
+        d.tasklet = tasklet_work_scheduled;
+        d.tickled = cpumask_test_cpu(cpu, &prv->tickled);
+        d.idle = is_idle_vcpu(current);
+        __trace_var(TRC_RTDS_SCHEDULE, 1,
+                    sizeof(d),
+                    (unsigned char *)&d);
+    }
+
     /* clear ticked bit now that we've been scheduled */
     cpumask_clear_cpu(cpu, &prv->tickled);