[1/3] libxc/xc_domain_resume: Update comment.
diff mbox

Message ID 1453756017-8747-2-git-send-email-konrad.wilk@oracle.com
State New, archived
Headers show

Commit Message

Konrad Rzeszutek Wilk Jan. 25, 2016, 9:06 p.m. UTC
To hopefully clarify what it meant.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxc/xc_resume.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Ian Campbell Jan. 26, 2016, 4:19 p.m. UTC | #1
On Mon, 2016-01-25 at 16:06 -0500, Konrad Rzeszutek Wilk wrote:
> To hopefully clarify what it meant.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  tools/libxc/xc_resume.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
> index 87d4324..19ba2a3 100644
> --- a/tools/libxc/xc_resume.c
> +++ b/tools/libxc/xc_resume.c
> @@ -248,9 +248,12 @@ out:
>  /*
>   * Resume execution of a domain after suspend shutdown.
>   * This can happen in one of two ways:
> - *  1. Resume with special return code.
> - *  2. Reset guest environment so it believes it is resumed in a new
> + *  1. (fast=1) Resume with special return code (1) that the guest
> + *     gets from SCHEDOP_shutdown:SHUTDOWN_suspend.

"SCHEDOP_shutdown(SHUTDOWN_suspend)" looks more like the function call
which this in effect is.

I think I'd say "Resume the guest without resetting the domain environment.
The guests's call to SCHEDOP_shutdown(SHUTDOWN_suspend) will return 1".

(assuming that is true re resetting)

> + *
> + *  2. (fast=0) Reset guest environment so it believes it is resumed in a new
>   *     domain context.

with the above I would suggesting adding "The guests's call to
SCHEDOP_shutdown(SHUTDOWN_suspend) will return 0".

> + *
>   * (2) should be used only for guests which cannot handle the special
>   * new return code. (1) is always safe (but slower).

Is this correct? I'd have said (2) was always safe but slow?

And I would invert the first, that is to say that (1) should be used in
preference with guests which support it.

Ian.
Ian Jackson Jan. 26, 2016, 4:22 p.m. UTC | #2
Ian Campbell writes ("Re: [PATCH 1/3] libxc/xc_domain_resume: Update comment."):
> On Mon, 2016-01-25 at 16:06 -0500, Konrad Rzeszutek Wilk wrote:
> > To hopefully clarify what it meant.
...
> > + *  1. (fast=1) Resume with special return code (1) that the guest
> > + *     gets from SCHEDOP_shutdown:SHUTDOWN_suspend.
> 
> "SCHEDOP_shutdown(SHUTDOWN_suspend)" looks more like the function call
> which this in effect is.
> 
> I think I'd say "Resume the guest without resetting the domain environment.
> The guests's call to SCHEDOP_shutdown(SHUTDOWN_suspend) will return 1".
> 
> (assuming that is true re resetting)

I'm not sure that `will return 1' is correct.  IIRC there is some
... unpleasantness here, with something effectively corrupting the
guest state in a way that the guest is supposed to expect and
cooperate with.

I haven't investigated the details recently.  I do remember it being
fiddly.

Ian.
Ian Campbell Jan. 26, 2016, 4:36 p.m. UTC | #3
On Tue, 2016-01-26 at 16:22 +0000, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH 1/3] libxc/xc_domain_resume: Update
> comment."):
> > On Mon, 2016-01-25 at 16:06 -0500, Konrad Rzeszutek Wilk wrote:
> > > To hopefully clarify what it meant.
> ...
> > > + *  1. (fast=1) Resume with special return code (1) that the guest
> > > + *     gets from SCHEDOP_shutdown:SHUTDOWN_suspend.
> > 
> > "SCHEDOP_shutdown(SHUTDOWN_suspend)" looks more like the function call
> > which this in effect is.
> > 
> > I think I'd say "Resume the guest without resetting the domain
> > environment.
> > The guests's call to SCHEDOP_shutdown(SHUTDOWN_suspend) will return 1".
> > 
> > (assuming that is true re resetting)
> 
> I'm not sure that `will return 1' is correct.  IIRC there is some
> ... unpleasantness here, with something effectively corrupting the
> guest state in a way that the guest is supposed to expect and
> cooperate with.

The tools arrange for the hypercall to return 1, which the guest is indeed
expected to expect and cooperate, as with any PV interface call it makes.

They do this by intimate knowledge of the hypercall ABI (i.e. which
register is the return value) and one could certainly argue it ought to be
arranged in a less horrific way, but I think to characterise it as
"corrupting" is probably going to far.


> 
> I haven't investigated the details recently.  I do remember it being
> fiddly.
> 
> Ian.
Ian Jackson Jan. 26, 2016, 4:52 p.m. UTC | #4
Ian Campbell writes ("Re: [PATCH 1/3] libxc/xc_domain_resume: Update comment."):
> On Tue, 2016-01-26 at 16:22 +0000, Ian Jackson wrote:
> > I'm not sure that `will return 1' is correct.  IIRC there is some
> > ... unpleasantness here, with something effectively corrupting the
> > guest state in a way that the guest is supposed to expect and
> > cooperate with.
> 
> The tools arrange for the hypercall to return 1, which the guest is indeed
> expected to expect and cooperate, as with any PV interface call it makes.
> 
> They do this by intimate knowledge of the hypercall ABI (i.e. which
> register is the return value) and one could certainly argue it ought to be
> arranged in a less horrific way, but I think to characterise it as
> "corrupting" is probably going to far.

Ian C had a conversation about this in person.  We think (ie, I am now
convinced) that provided that this xc resume call is only made when
the guest is suspended, that the worst outcome will indeed be that the
guest experiences the hypercall returning 1, and then finding itself
in a state it's not expecting.  The guest will hopefully crash due
to the unexpected return value but is in any case likely to implode
soon due to event channel misconfiguration etc.

Only if the `resume' is attempted with the guest running, would the
guest's %eax actually be `corrupted' in this sense.

Ian.
Konrad Rzeszutek Wilk Jan. 26, 2016, 7:37 p.m. UTC | #5
On Tue, Jan 26, 2016 at 04:52:06PM +0000, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH 1/3] libxc/xc_domain_resume: Update comment."):
> > On Tue, 2016-01-26 at 16:22 +0000, Ian Jackson wrote:
> > > I'm not sure that `will return 1' is correct.  IIRC there is some
> > > ... unpleasantness here, with something effectively corrupting the
> > > guest state in a way that the guest is supposed to expect and
> > > cooperate with.
> > 
> > The tools arrange for the hypercall to return 1, which the guest is indeed
> > expected to expect and cooperate, as with any PV interface call it makes.
> > 
> > They do this by intimate knowledge of the hypercall ABI (i.e. which
> > register is the return value) and one could certainly argue it ought to be
> > arranged in a less horrific way, but I think to characterise it as
> > "corrupting" is probably going to far.
> 
> Ian C had a conversation about this in person.  We think (ie, I am now
> convinced) that provided that this xc resume call is only made when
> the guest is suspended, that the worst outcome will indeed be that the
> guest experiences the hypercall returning 1, and then finding itself
> in a state it's not expecting.  The guest will hopefully crash due
> to the unexpected return value but is in any case likely to implode
> soon due to event channel misconfiguration etc.

<nods>
> 
> Only if the `resume' is attempted with the guest running, would the
> guest's %eax actually be `corrupted' in this sense.

Right. That code doing the set_vcpu manipulations is quite .. ahhh
interesting!

I will update the comment to be more clear.
> 
> Ian.
Konrad Rzeszutek Wilk Jan. 26, 2016, 7:47 p.m. UTC | #6
On Tue, Jan 26, 2016 at 04:19:54PM +0000, Ian Campbell wrote:
> On Mon, 2016-01-25 at 16:06 -0500, Konrad Rzeszutek Wilk wrote:
> > To hopefully clarify what it meant.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  tools/libxc/xc_resume.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
> > index 87d4324..19ba2a3 100644
> > --- a/tools/libxc/xc_resume.c
> > +++ b/tools/libxc/xc_resume.c
> > @@ -248,9 +248,12 @@ out:
> >  /*
> >   * Resume execution of a domain after suspend shutdown.
> >   * This can happen in one of two ways:
> > - *  1. Resume with special return code.
> > - *  2. Reset guest environment so it believes it is resumed in a new
> > + *  1. (fast=1) Resume with special return code (1) that the guest
> > + *     gets from SCHEDOP_shutdown:SHUTDOWN_suspend.
> 
> "SCHEDOP_shutdown(SHUTDOWN_suspend)" looks more like the function call
> which this in effect is.
> 
> I think I'd say "Resume the guest without resetting the domain environment.
> The guests's call to SCHEDOP_shutdown(SHUTDOWN_suspend) will return 1".
> 
> (assuming that is true re resetting)
> 
> > + *
> > + *  2. (fast=0) Reset guest environment so it believes it is resumed in a new
> >   *     domain context.
> 
> with the above I would suggesting adding "The guests's call to
> SCHEDOP_shutdown(SHUTDOWN_suspend) will return 0".
> 
> > + *
> >   * (2) should be used only for guests which cannot handle the special
> >   * new return code. (1) is always safe (but slower).
> 
> Is this correct? I'd have said (2) was always safe but slow?

That does not sound right. It should have said that fast=1 
would be fast but not safe. And 2) (fast=0) is safe but slower.

Let me resend this - with it hopefully being more clear.
> 
> And I would invert the first, that is to say that (1) should be used in
> preference with guests which support it.

Reading the 1) I am bit perplexed. It says "safe" but what it does is far
from safe - it manipulates the vCPU eax register to be 1. Granted it does
it on a "paused" vCPU and once the vCPU resume it can read it.

I guess in the olden days this was considered safe. Along with driving
without seatbelts.

> 
> Ian.
Ian Campbell Jan. 27, 2016, 9:52 a.m. UTC | #7
On Tue, 2016-01-26 at 14:47 -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 26, 2016 at 04:19:54PM +0000, Ian Campbell wrote:
> > On Mon, 2016-01-25 at 16:06 -0500, Konrad Rzeszutek Wilk wrote:
> > > To hopefully clarify what it meant.
> > > 
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > ---
> > >  tools/libxc/xc_resume.c | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
> > > index 87d4324..19ba2a3 100644
> > > --- a/tools/libxc/xc_resume.c
> > > +++ b/tools/libxc/xc_resume.c
> > > @@ -248,9 +248,12 @@ out:
> > >  /*
> > >   * Resume execution of a domain after suspend shutdown.
> > >   * This can happen in one of two ways:
> > > - *  1. Resume with special return code.
> > > - *  2. Reset guest environment so it believes it is resumed in a new
> > > + *  1. (fast=1) Resume with special return code (1) that the guest
> > > + *     gets from SCHEDOP_shutdown:SHUTDOWN_suspend.
> > 
> > "SCHEDOP_shutdown(SHUTDOWN_suspend)" looks more like the function call
> > which this in effect is.
> > 
> > I think I'd say "Resume the guest without resetting the domain
> > environment.
> > The guests's call to SCHEDOP_shutdown(SHUTDOWN_suspend) will return 1".
> > 
> > (assuming that is true re resetting)
> > 
> > > + *
> > > + *  2. (fast=0) Reset guest environment so it believes it is resumed
> > > in a new
> > >   *     domain context.
> > 
> > with the above I would suggesting adding "The guests's call to
> > SCHEDOP_shutdown(SHUTDOWN_suspend) will return 0".
> > 
> > > + *
> > >   * (2) should be used only for guests which cannot handle the
> > > special
> > >   * new return code. (1) is always safe (but slower).
> > 
> > Is this correct? I'd have said (2) was always safe but slow?
> 
> That does not sound right. It should have said that fast=1 
> would be fast but not safe. And 2) (fast=0) is safe but slower.
> 
> Let me resend this - with it hopefully being more clear.
> > 
> > And I would invert the first, that is to say that (1) should be used in
> > preference with guests which support it.
> 
> Reading the 1) I am bit perplexed. It says "safe" but what it does is far
> from safe - it manipulates the vCPU eax register to be 1. Granted it does
> it on a "paused" vCPU and once the vCPU resume it can read it.

I think "(1) is always safe (but slower)" should have always read "(2) is
always safe (but slower)". (1) has always been "fast and loose" and
requires guest side support.

Ian.

Patch
diff mbox

diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index 87d4324..19ba2a3 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -248,9 +248,12 @@  out:
 /*
  * Resume execution of a domain after suspend shutdown.
  * This can happen in one of two ways:
- *  1. Resume with special return code.
- *  2. Reset guest environment so it believes it is resumed in a new
+ *  1. (fast=1) Resume with special return code (1) that the guest
+ *     gets from SCHEDOP_shutdown:SHUTDOWN_suspend.
+ *
+ *  2. (fast=0) Reset guest environment so it believes it is resumed in a new
  *     domain context.
+ *
  * (2) should be used only for guests which cannot handle the special
  * new return code. (1) is always safe (but slower).
  */