diff mbox series

[1/2] xenalyze: Handle start-of-day ->RUNNING transitions

Message ID 20230327161326.48851-1-george.dunlap@cloud.com (mailing list archive)
State Superseded
Headers show
Series [1/2] xenalyze: Handle start-of-day ->RUNNING transitions | expand

Commit Message

George Dunlap March 27, 2023, 4:13 p.m. UTC
A recent xentrace highlighted an unhandled corner case in the vcpu
"start-of-day" logic, if the trace starts after the last running ->
non-running transition, but before the first non-running -> running
transition.  Because start-of-day wasn't handled, vcpu_next_update()
was expecting p->current to be NULL, and tripping out with the
following error message when it wasn't:

vcpu_next_update: FATAL: p->current not NULL! (d32768dv$p, runstate RUNSTATE_INIT)

where 32768 is the DEFAULT_DOMAIN, and $p is the pcpu number.

Instead of calling vcpu_start() piecemeal throughout
sched_runstate_process(), call it at the top of the function if the
vcpu in question is still in RUNSTATE_INIT, so that we can handle all
the cases in one place.

Sketch out at the top of the function all cases which we need to
handle, and what to do in those cases.  Some transitions tell us where
v is running; some transitions tell us about what is (or is not)
running on p; some transitions tell us neither.

If a transition tells us where v is now running, update its state;
otherwise leave it in INIT, in order to avoid having to deal with TSC
skew on start-up.

If a transition tells us what is or is not running on p, update
p->current (either to v or NULL).  Otherwise leave it alone.

If neither, do nothing.

Reifying those rules:

- If we're continuing to run, set v to RUNNING, and use p->first_tsc
  as the runstate time.

- If we're starting to run, set v to RUNNING, and use ri->tsc as the
  runstate time.

- If v is being deschedled, leave v in the INIT state to avoid dealing
  with TSC skew; but set p->current to NULL so that whatever is
  scheduled next won't trigger the assert in vcpu_next_update().

- If a vcpu is waking up (switching from one non-runnable state to
  another non-runnable state), leave v in INIT, and p in whatever
  state it's in (which may be the default domain, or some other vcpu
  which has already run).

While here, fix the comment above vcpu_start; it's called when the
vcpu state is INIT, not when current is the default domain.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
---
CC: Anthony Perard <anthony.perard@cloud.com>
CC: Wei Liu <wl@xenproject.org>
---
 tools/xentrace/xenalyze.c | 159 ++++++++++++++++++++++++--------------
 1 file changed, 101 insertions(+), 58 deletions(-)

Comments

Andrew Cooper April 27, 2023, 8:28 p.m. UTC | #1
On 27/03/2023 5:13 pm, George Dunlap wrote:
> A recent xentrace highlighted an unhandled corner case in the vcpu
> "start-of-day" logic, if the trace starts after the last running ->
> non-running transition, but before the first non-running -> running
> transition.  Because start-of-day wasn't handled, vcpu_next_update()
> was expecting p->current to be NULL, and tripping out with the
> following error message when it wasn't:
>
> vcpu_next_update: FATAL: p->current not NULL! (d32768dv$p, runstate RUNSTATE_INIT)
>
> where 32768 is the DEFAULT_DOMAIN, and $p is the pcpu number.
>
> Instead of calling vcpu_start() piecemeal throughout
> sched_runstate_process(), call it at the top of the function if the
> vcpu in question is still in RUNSTATE_INIT, so that we can handle all
> the cases in one place.
>
> Sketch out at the top of the function all cases which we need to
> handle, and what to do in those cases.  Some transitions tell us where
> v is running; some transitions tell us about what is (or is not)
> running on p; some transitions tell us neither.
>
> If a transition tells us where v is now running, update its state;
> otherwise leave it in INIT, in order to avoid having to deal with TSC
> skew on start-up.
>
> If a transition tells us what is or is not running on p, update
> p->current (either to v or NULL).  Otherwise leave it alone.
>
> If neither, do nothing.
>
> Reifying those rules:
>
> - If we're continuing to run, set v to RUNNING, and use p->first_tsc
>   as the runstate time.
>
> - If we're starting to run, set v to RUNNING, and use ri->tsc as the
>   runstate time.
>
> - If v is being deschedled, leave v in the INIT state to avoid dealing
>   with TSC skew; but set p->current to NULL so that whatever is
>   scheduled next won't trigger the assert in vcpu_next_update().
>
> - If a vcpu is waking up (switching from one non-runnable state to
>   another non-runnable state), leave v in INIT, and p in whatever
>   state it's in (which may be the default domain, or some other vcpu
>   which has already run).
>
> While here, fix the comment above vcpu_start; it's called when the
> vcpu state is INIT, not when current is the default domain.
>
> Signed-off-by: George Dunlap <george.dunlap@cloud.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD May 4, 2023, 1:23 p.m. UTC | #2
On Mon, Mar 27, 2023 at 05:13:25PM +0100, George Dunlap wrote:
> A recent xentrace highlighted an unhandled corner case in the vcpu
> "start-of-day" logic, if the trace starts after the last running ->
> non-running transition, but before the first non-running -> running
> transition.  Because start-of-day wasn't handled, vcpu_next_update()
> was expecting p->current to be NULL, and tripping out with the
> following error message when it wasn't:
> 
> vcpu_next_update: FATAL: p->current not NULL! (d32768dv$p, runstate RUNSTATE_INIT)
> 
> where 32768 is the DEFAULT_DOMAIN, and $p is the pcpu number.
> 
> Instead of calling vcpu_start() piecemeal throughout
> sched_runstate_process(), call it at the top of the function if the
> vcpu in question is still in RUNSTATE_INIT, so that we can handle all
> the cases in one place.
> 
> Sketch out at the top of the function all cases which we need to
> handle, and what to do in those cases.  Some transitions tell us where
> v is running; some transitions tell us about what is (or is not)
> running on p; some transitions tell us neither.
> 
> If a transition tells us where v is now running, update its state;
> otherwise leave it in INIT, in order to avoid having to deal with TSC
> skew on start-up.
> 
> If a transition tells us what is or is not running on p, update
> p->current (either to v or NULL).  Otherwise leave it alone.
> 
> If neither, do nothing.
> 
> Reifying those rules:
> 
> - If we're continuing to run, set v to RUNNING, and use p->first_tsc
>   as the runstate time.
> 
> - If we're starting to run, set v to RUNNING, and use ri->tsc as the
>   runstate time.
> 
> - If v is being deschedled, leave v in the INIT state to avoid dealing
>   with TSC skew; but set p->current to NULL so that whatever is
>   scheduled next won't trigger the assert in vcpu_next_update().
> 
> - If a vcpu is waking up (switching from one non-runnable state to
>   another non-runnable state), leave v in INIT, and p in whatever
>   state it's in (which may be the default domain, or some other vcpu
>   which has already run).
> 
> While here, fix the comment above vcpu_start; it's called when the
> vcpu state is INIT, not when current is the default domain.
> 
> Signed-off-by: George Dunlap <george.dunlap@cloud.com>

Acked-by: Anthony PERARD <anthony.perard@citrix.com>
diff mbox series

Patch

diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 5de167031e..9f6d9763ac 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -6885,39 +6885,86 @@  void vcpu_next_update(struct pcpu_info *p, struct vcpu_data *next, tsc_t tsc)
     p->lost_record.seen_valid_schedule = 1;
 }
 
-/* If current is the default domain, we're fixing up from something
- * like start-of-day.  Update what we can. */
-void vcpu_start(struct pcpu_info *p, struct vcpu_data *v) {
-    /* If vcpus are created, or first show up, in a "dead zone", this will
-     * fail. */
-    if( !p->current || p->current->d->did != DEFAULT_DOMAIN) {
-        fprintf(stderr, "Strange, p->current not default domain!\n");
-        error(ERR_FILE, NULL);
-        return;
-    }
+/* 
+ * If the vcpu in question is in state INIT, we're fixing up from something
+ * like start-of-day.  Update what we can.
+ */
+void vcpu_start(struct pcpu_info *p, struct vcpu_data *v,
+                int old_runstate, int new_runstate, tsc_t ri_tsc) {
+    tsc_t tsc;
+
+    /*
+     * 
+     * Cases:
+     * running -> running:
+     *  v -> running, using p->first_tsc
+     * {runnable, blocked} -> running:
+     *  v -> running, using ri->tsc
+     * running -> {runnable, blocked}:
+     *  Leave v INIT, but clear p->current in case another vcpu is scheduled
+     * blocked -> runnable:
+     *  Leave INIT, and also leave p->current, since we still don't know who's scheduled here
+     */
+
+    /* 
+     * NB that a vcpu won't come out of INIT until it starts running somewhere.  
+     * If this event is pcpu that has already seen a scheduling event, p->current
+     * should be null; if this is the first scheduling event on this pcpu, 
+     * p->current should be the default domain.
+     */
+    if( old_runstate == RUNSTATE_RUNNING ) {
+        if ( !p->current ||  p->current->d->did != DEFAULT_DOMAIN) {
+            fprintf(stderr, "Strange, p->current not default domain!\n");
+            error(ERR_FILE, NULL);
+            return;
 
-    if(!p->first_tsc) {
-        fprintf(stderr, "Strange, p%d first_tsc 0!\n", p->pid);
-        error(ERR_FILE, NULL);
+        }
+        
+        if(!p->first_tsc) {
+            fprintf(stderr, "Strange, p%d first_tsc 0!\n", p->pid);
+            error(ERR_FILE, NULL);
+        }
+
+        if(p->first_tsc <= p->current->runstate.tsc) {
+            fprintf(stderr, "Strange, first_tsc %llx < default_domain runstate tsc %llx!\n",
+                    p->first_tsc,
+                    p->current->runstate.tsc);
+            error(ERR_FILE, NULL);
+        }
+    
+        /* Change default domain to 'queued' */
+        runstate_update(p->current, RUNSTATE_QUEUED, p->first_tsc);
+
+        /* 
+         * Set current to NULL, so that if another vcpu (not in INIT)
+         * is scheduled here, we don't trip over the check in
+         * vcpu_next_update()
+         */
+        p->current = NULL;
     }
 
-    if(p->first_tsc <= p->current->runstate.tsc) {
-        fprintf(stderr, "Strange, first_tsc %llx < default_domain runstate tsc %llx!\n",
-                p->first_tsc,
-                p->current->runstate.tsc);
-        error(ERR_FILE, NULL);
+    /* TSC skew at start-of-day is hard to deal with.  Don't
+     * bring a vcpu out of INIT until it's seen to be actually
+     * running somewhere. */
+    if ( new_runstate != RUNSTATE_RUNNING ) {
+        fprintf(warn, "First schedule for d%dv%d doesn't take us into a running state; leaving INIT\n",
+                v->d->did, v->vid);
+
+        return;
     }
 
-    /* Change default domain to 'queued' */
-    runstate_update(p->current, RUNSTATE_QUEUED, p->first_tsc);
+    tsc = ri_tsc;
+    if ( old_runstate == RUNSTATE_RUNNING ) {
+        /* FIXME: Copy over data from the default domain this interval */
+        fprintf(warn, "Using first_tsc for d%dv%d (%lld cycles)\n",
+                v->d->did, v->vid, p->last_tsc - p->first_tsc);
 
-    /* FIXME: Copy over data from the default domain this interval */
-    fprintf(warn, "Using first_tsc for d%dv%d (%lld cycles)\n",
-            v->d->did, v->vid, p->last_tsc - p->first_tsc);
+        tsc = p->first_tsc;
+    }
 
     /* Simulate the time since the first tsc */
-    runstate_update(v, RUNSTATE_RUNNING, p->first_tsc);
-    p->time.tsc = p->first_tsc;
+    runstate_update(v, RUNSTATE_RUNNING, tsc);
+    p->time.tsc = tsc;
     p->current = v;
     pcpu_string_draw(p);
     v->p = p;
@@ -7021,6 +7068,13 @@  void sched_runstate_process(struct pcpu_info *p)
     last_oldstate = v->runstate.last_oldstate;
     v->runstate.last_oldstate.wrong = RUNSTATE_INIT;
 
+    /* Handle all "start-of-day" issues in one place.  This can be
+     * done before any of the other tracks or sanity checks. */
+    if ( v->runstate.state == RUNSTATE_INIT ) {
+        vcpu_start(p, v, sevt.old_runstate, sevt.new_runstate, ri->tsc);
+        return;
+    }
+
     /* Close vmexits when the putative reason for blocking / &c stops.
      * This way, we don't account cpu contention to some other overhead. */
     if(sevt.new_runstate == RUNSTATE_RUNNABLE
@@ -7190,32 +7244,27 @@  update:
      * or stopping actually running on a physical cpu. */
     if ( type == CONTINUE )
     {
-        if( v->runstate.state == RUNSTATE_INIT ) {
-            /* Start-of-day; account first tsc -> now to v */
-            vcpu_start(p, v);
-        } else {
-            /* Continue running.  First, do some sanity checks */
-            if ( v->runstate.state == RUNSTATE_LOST ) {
-                fprintf(warn, "WARNING: continue with d%dv%d in RUNSTATE_LOST.  Resetting current.\n",
-                        v->d->did, v->vid);
-                if ( p->current )
-                    vcpu_prev_update(p, p->current, ri->tsc, RUNSTATE_LOST);
-                vcpu_next_update(p, v, ri->tsc);
-            }
-            else if( v->runstate.state != RUNSTATE_RUNNING ) {
-                /* This should never happen. */
-                fprintf(warn, "FATAL: sevt.old_runstate running, but d%dv%d runstate %s!\n",
-                        v->d->did, v->vid, runstate_name[v->runstate.state]);
-                error(ERR_FILE, NULL);
-            } else if ( v->p != p ) {
-                fprintf(warn, "FATAL: continue on p%d, but d%dv%d p%d!\n",
-                        p->pid, v->d->did, v->vid,
-                        v->p ? v->p->pid : -1);
-                error(ERR_FILE, NULL);
-            }
-
-            runstate_update(v, RUNSTATE_RUNNING, ri->tsc);
+        /* Continue running.  First, do some sanity checks */
+        if ( v->runstate.state == RUNSTATE_LOST ) {
+            fprintf(warn, "WARNING: continue with d%dv%d in RUNSTATE_LOST.  Resetting current.\n",
+                    v->d->did, v->vid);
+            if ( p->current )
+                vcpu_prev_update(p, p->current, ri->tsc, RUNSTATE_LOST);
+            vcpu_next_update(p, v, ri->tsc);
+        }
+        else if( v->runstate.state != RUNSTATE_RUNNING ) {
+            /* This should never happen. */
+            fprintf(warn, "FATAL: sevt.old_runstate running, but d%dv%d runstate %s!\n",
+                    v->d->did, v->vid, runstate_name[v->runstate.state]);
+            error(ERR_FILE, NULL);
+        } else if ( v->p != p ) {
+            fprintf(warn, "FATAL: continue on p%d, but d%dv%d p%d!\n",
+                    p->pid, v->d->did, v->vid,
+                    v->p ? v->p->pid : -1);
+            error(ERR_FILE, NULL);
         }
+
+        runstate_update(v, RUNSTATE_RUNNING, ri->tsc);
     }
     else if ( sevt.old_runstate == RUNSTATE_RUNNING
               || v->runstate.state == RUNSTATE_RUNNING )
@@ -7232,10 +7281,7 @@  update:
          *   # (should never happen)
          */
         if( sevt.old_runstate == RUNSTATE_RUNNING ) {
-            if( v->runstate.state == RUNSTATE_INIT ) {
-                /* Start-of-day; account first tsc -> now to v */
-                vcpu_start(p, v);
-            } else if( v->runstate.state != RUNSTATE_RUNNING
+            if( v->runstate.state != RUNSTATE_RUNNING
                        && v->runstate.state != RUNSTATE_LOST ) {
                 /* This should never happen. */
                 fprintf(warn, "FATAL: sevt.old_runstate running, but d%dv%d runstate %s!\n",
@@ -7264,11 +7310,8 @@  update:
 
         vcpu_next_update(p, v, ri->tsc);
     }
-    else if ( v->runstate.state != RUNSTATE_INIT )
+    else
     {
-        /* TSC skew at start-of-day is hard to deal with.  Don't
-         * bring a vcpu out of INIT until it's seen to be actually
-         * running somewhere. */
         runstate_update(v, sevt.new_runstate, ri->tsc);
     }