diff mbox

[3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result

Message ID 1460653660-6654-4-git-send-email-ian.jackson@eu.citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ian Jackson April 14, 2016, 5:07 p.m. UTC
This is my attempt at understanding the situation, from reading
descriptions provided on list in the context of toolstack patches
which were attempting to work around the anomaly.

The multiple `xxx' entries reflect 1. my lack of complete understanding
2. API defects which I think I have identified.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
CC: Juergen Gross <jgross@suse.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/include/public/sysctl.h |   28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

Comments

Andrew Cooper April 14, 2016, 5:24 p.m. UTC | #1
On 14/04/16 18:07, Ian Jackson wrote:
> This is my attempt at understanding the situation, from reading
> descriptions provided on list in the context of toolstack patches
> which were attempting to work around the anomaly.
>
> The multiple `xxx' entries reflect 1. my lack of complete understanding
> 2. API defects which I think I have identified.
>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> CC: Juergen Gross <jgross@suse.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  xen/include/public/sysctl.h |   28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
> index 0849908..cfccf38 100644
> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -560,6 +560,34 @@ struct xen_sysctl_cpupool_op {
>  typedef struct xen_sysctl_cpupool_op xen_sysctl_cpupool_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpupool_op_t);
>  
> +/*
> + * cpupool operations may return EBUSY if the operation cannot be
> + * executed right now because of another cpupool operation which is
> + * still in progress.  In this case, EBUSY means that the failed
> + * operation had no effect.
> + *
> + * Some operations including at least RMCPU (xxx which others?) may
> + * also return EBUSY because a guest has temporarily pinned one of its
> + * vcpus to the pcpu in question.  It is the pious hope (xxx) of the
> + * author of this comment that this can only occur for domains which
> + * have been granted some kind of hardware privilege (eg passthrough).

Any VM can be given any arbitrary pinning in its xl configuration file. 
Any arbitrary pinning can be applied at runtime via `xl vcpu-pin ...`

(To the best of my knowledge) A VM cannot choose pinning of its own
accord.  (i.e. the host admin has to choose the pinning.)

~Andrew

> + *
> + * In this case the operation may have been partially carried out and
> + * the pcpu is left in an anomalous state.  In this state the pcpu may
> + * be used by some not readily predictable subset of the vcpus
> + * (domains) whose vcpus are in the old cpupool.  (xxx is this true?)
> + *
> + * This can be detected by seeing whether the pcpu can be added to a
> + * different cpupool.  (xxx this is a silly interface; the situation
> + * should be reported by a different errno value, at least.)  If the
> + * pcpu can't be added to a different cpupool for this reason,
> + * attempts to do so will returning (xxx what errno value?)
> + *
> + * The anomalous situation can be recovered by adding the pcpu back to
> + * the cpupool it came from (xxx this permits a buggy or malicious
> + * guest to prevent the cpu ever being removed from its cpupool).
> + */ 
> +
>  #define ARINC653_MAX_DOMAINS_PER_SCHEDULE   64
>  /*
>   * This structure is used to pass a new ARINC653 schedule from a
Ian Jackson April 14, 2016, 5:56 p.m. UTC | #2
Andrew Cooper writes ("Re: [Xen-devel] [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"):
> On 14/04/16 18:07, Ian Jackson wrote:
> > +/*
> > + * cpupool operations may return EBUSY if the operation cannot be
> > + * executed right now because of another cpupool operation which is
> > + * still in progress.  In this case, EBUSY means that the failed
> > + * operation had no effect.
> > + *
> > + * Some operations including at least RMCPU (xxx which others?) may
> > + * also return EBUSY because a guest has temporarily pinned one of its
> > + * vcpus to the pcpu in question.  It is the pious hope (xxx) of the
> > + * author of this comment that this can only occur for domains which
> > + * have been granted some kind of hardware privilege (eg passthrough).
> 
> Any VM can be given any arbitrary pinning in its xl configuration file. 
> Any arbitrary pinning can be applied at runtime via `xl vcpu-pin ...`

Does that produce EBUSY as well ?

The reuse of the same error number for all of

  "the existing configuration (eg toolstack-selected vcpu pinning)
   means that the operation does not make sense"

  "there is some lock contention and trying again may help"

  "a semantically conflicting, or nearly-semantically-conflicting,
   operation is currently in progress"

  "the guest has done a temporary pin which prevents this operation"

is very unfortunate.  How is a toolstack to know what to do ?

> (To the best of my knowledge) A VM cannot choose pinning of its own
> accord.  (i.e. the host admin has to choose the pinning.)

AIUI, that is not (now) true.

Ian.
Dario Faggioli April 14, 2016, 8:22 p.m. UTC | #3
On Thu, 2016-04-14 at 18:56 +0100, Ian Jackson wrote:
> Andrew Cooper writes ("Re: [Xen-devel] [PATCH 3/3] xen: Document
> XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"):
> > 
> > On 14/04/16 18:07, Ian Jackson wrote:
> > > 
> > > +/*
> > > + * cpupool operations may return EBUSY if the operation cannot
> > > be
> > > + * executed right now because of another cpupool operation which
> > > is
> > > + * still in progress.  In this case, EBUSY means that the failed
> > > + * operation had no effect.
> > > + *
> > > + * Some operations including at least RMCPU (xxx which others?)
> > > may
> > > + * also return EBUSY because a guest has temporarily pinned one
> > > of its
> > > + * vcpus to the pcpu in question.  It is the pious hope (xxx) of
> > > the
> > > + * author of this comment that this can only occur for domains
> > > which
> > > + * have been granted some kind of hardware privilege (eg
> > > passthrough).
> > Any VM can be given any arbitrary pinning in its xl configuration
> > file. 
> > Any arbitrary pinning can be applied at runtime via `xl vcpu-pin
> > ...`
> Does that produce EBUSY as well ?
> 
It can, after Juergen series, but I think in this case (setting
affinity), the situation is still acceptable. In fact:

> The reuse of the same error number for all of
> 
>   "the existing configuration (eg toolstack-selected vcpu pinning)
>    means that the operation does not make sense"
> 
This return -EINVAL.

>   "there is some lock contention and trying again may help"
> 
This can't happen in this case (and reason is just that setting the
affinity of a vcpu is different and less problematic than removing a
cpu from a cpupool).

>   "a semantically conflicting, or nearly-semantically-conflicting,
>    operation is currently in progress"
> 
I'm not sure what this means exactly, but I think that --depending on
what it exactly means-- it either can't happen or fall into the -EINVAL
case.

>   "the guest has done a temporary pin which prevents this operation"
> 
This (because of the series) returns -EBUSY.

> is very unfortunate.  How is a toolstack to know what to do ?
> 
Yeah, I agree, but again, I think in this case it's possible for
toolstack to tell.

From a quick check, we do not, in libxl, output any specific error
message in case we get -EBUSY... but I can send a patch to that effect
pretty quickly, if that's deemed necessary.

> > (To the best of my knowledge) A VM cannot choose pinning of its own
> > accord.  (i.e. the host admin has to choose the pinning.)
> AIUI, that is not (now) true.
> 
Yes, now a guest can call the new SCHEDOP_pin_override hypercall (and
Juergen is pushing a series to Linux for it to be able to do that... as
that was the purpose of the while thing!).

However, as said in another email, there's already a check like this in
place, in the implementation of such an hypercall:

        ret = -EPERM;
        if ( !is_hardware_domain(current->domain) )
            break;

which I think satisfies Ian's (legitimate) concern?

Regards,
Dario
Jürgen Groß April 15, 2016, 5:35 a.m. UTC | #4
On 14/04/16 19:07, Ian Jackson wrote:
> This is my attempt at understanding the situation, from reading
> descriptions provided on list in the context of toolstack patches
> which were attempting to work around the anomaly.
> 
> The multiple `xxx' entries reflect 1. my lack of complete understanding
> 2. API defects which I think I have identified.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> CC: Juergen Gross <jgross@suse.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  xen/include/public/sysctl.h |   28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
> index 0849908..cfccf38 100644
> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -560,6 +560,34 @@ struct xen_sysctl_cpupool_op {
>  typedef struct xen_sysctl_cpupool_op xen_sysctl_cpupool_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpupool_op_t);
>  
> +/*
> + * cpupool operations may return EBUSY if the operation cannot be
> + * executed right now because of another cpupool operation which is
> + * still in progress.  In this case, EBUSY means that the failed
> + * operation had no effect.
> + *
> + * Some operations including at least RMCPU (xxx which others?) may

Only this one.

> + * also return EBUSY because a guest has temporarily pinned one of its
> + * vcpus to the pcpu in question.  It is the pious hope (xxx) of the
> + * author of this comment that this can only occur for domains which
> + * have been granted some kind of hardware privilege (eg passthrough).

Only the hardware domain is allowed to do this.

> + * In this case the operation may have been partially carried out and
> + * the pcpu is left in an anomalous state.  In this state the pcpu may
> + * be used by some not readily predictable subset of the vcpus
> + * (domains) whose vcpus are in the old cpupool.  (xxx is this true?)

Yes.

> + *
> + * This can be detected by seeing whether the pcpu can be added to a
> + * different cpupool.  (xxx this is a silly interface; the situation
> + * should be reported by a different errno value, at least.)  If the

I'm open for suggestions. :-)

> + * pcpu can't be added to a different cpupool for this reason,
> + * attempts to do so will returning (xxx what errno value?)

You might have guessed: EBUSY

> + *
> + * The anomalous situation can be recovered by adding the pcpu back to
> + * the cpupool it came from (xxx this permits a buggy or malicious
> + * guest to prevent the cpu ever being removed from its cpupool).

Only the hardware domain, which has other means to inject problems.


Juergen
Dario Faggioli April 15, 2016, 7:42 a.m. UTC | #5
On Fri, 2016-04-15 at 07:35 +0200, Juergen Gross wrote:
> On 14/04/16 19:07, Ian Jackson wrote:
> > 
> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -560,6 +560,34 @@ struct xen_sysctl_cpupool_op {
> > 
> > + * In this case the operation may have been partially carried out
> > and
> > + * the pcpu is left in an anomalous state.  In this state the pcpu
> > may
> > + * be used by some not readily predictable subset of the vcpus
> > + * (domains) whose vcpus are in the old cpupool.  (xxx is this
> > true?)
> Yes.
> 
Well, I think it's a little less bad than that. the pcpu is no longer a
valid member of its cpupool. This means the scheduler will not run
vcpus of the domains of the pool (it's domains that are in cpupools)
any longer. And even the vcpus that are temporarily stuck on the pcpu
(i.e., the ones that are actually preventing the operation to succeed)
will rather quickly get away from it.

So, I'd say rather that "In this state the pcpu will, potentially after
a short transitory, just not be used by any vcpu"

> > + *
> > + * This can be detected by seeing whether the pcpu can be added to
> > a
> > + * different cpupool.  (xxx this is a silly interface; the
> > situation
> > + * should be reported by a different errno value, at least.)  If
> > the
> I'm open for suggestions. :-)
> 
Well, changing the system behavior with anything that involves retry
loops is indeed non-trivial (or, at least, I don't out of the top of my
head have an alternative).

However, a different return value for the super special case of
temporary pinning override could maybe be selected. I'm having a look,
and although I don't find one that could be seen as a perfect fit (that
would be EBUSY, which is taken!), what about one of these:

  EEXIST        17      /* File exists */
  ENOTEMPTY     39      /* Directory not empty */
  EROFS         30      /* Read-only file system */
  ENOSPC        28      /* No space left on device */

After all, as ugly as the resulting error message would be:
 - if's a very rare special case
 - we need to document this very rare special case anyway
 - we can log more info about the actual error

> > + * pcpu can't be added to a different cpupool for this reason,
> > + * attempts to do so will returning (xxx what errno value?)
> You might have guessed: EBUSY
> 
And here too, if the pcpu is not in cpupool_free_cpus, maybe we can go
and check whether it is in any of the existing cpupool's cpu_valid. If
it is, then EBUSY is ok. If not, it means it's in the inconsistent
state, and we can pick another (same as above?) error value, use it and
document the situation.

> > + *
> > + * The anomalous situation can be recovered by adding the pcpu
> > back to
> > + * the cpupool it came from (xxx this permits a buggy or malicious
> > + * guest to prevent the cpu ever being removed from its cpupool).
> Only the hardware domain, which has other means to inject problems.
> 
Yep, indeed.

Regards,
Dario
Ian Jackson April 15, 2016, 10:20 a.m. UTC | #6
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"):
> On Thu, 2016-04-14 at 18:56 +0100, Ian Jackson wrote:
> > is very unfortunate.  How is a toolstack to know what to do ?
> > 
> Yeah, I agree, but again, I think in this case it's possible for
> toolstack to tell.

Thanks to both you and Juergen for the explanations.  I'm glad to see
that things weren't as bad as I feared - I admit that I didn't spend
the time to completely follow what all the relevant bits of code did.

Would either of you care to provide a version of my documentation patch
which answers the questions that my text answers ?  Or shall we commit
my version and you can edit it in-tree :-).

>         ret = -EPERM;
>         if ( !is_hardware_domain(current->domain) )
>             break;
> 
> which I think satisfies Ian's (legitimate) concern?

That addresses the security concern, yes.  (Is a driver domain a
`hardware domain' in this sense?)

All I need now is a recipe for the tools to tell what has happened and
then I can make xl or libxl at least print comprehensible and correct
error messages...

Thanks,
Ian.
Jürgen Groß April 15, 2016, 10:43 a.m. UTC | #7
On 15/04/16 12:20, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [Xen-devel] [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"):
>> On Thu, 2016-04-14 at 18:56 +0100, Ian Jackson wrote:
>>> is very unfortunate.  How is a toolstack to know what to do ?
>>>
>> Yeah, I agree, but again, I think in this case it's possible for
>> toolstack to tell.
> 
> Thanks to both you and Juergen for the explanations.  I'm glad to see
> that things weren't as bad as I feared - I admit that I didn't spend
> the time to completely follow what all the relevant bits of code did.
> 
> Would either of you care to provide a version of my documentation patch
> which answers the questions that my text answers ?  Or shall we commit
> my version and you can edit it in-tree :-).

I can provide an updated patch.

>>         ret = -EPERM;
>>         if ( !is_hardware_domain(current->domain) )
>>             break;
>>
>> which I think satisfies Ian's (legitimate) concern?
> 
> That addresses the security concern, yes.  (Is a driver domain a
> `hardware domain' in this sense?)

The hardware domain is the one domain having direct access to the
native hardware. This is dom0 in most cases. There can be only one
hardware domain.

> All I need now is a recipe for the tools to tell what has happened and
> then I can make xl or libxl at least print comprehensible and correct
> error messages...

So this boils down to finding an appropriate ESOMETHING replacement
for the EBUSY case introduced by the temporary pinning.

I think ENOTEMPTY or EADDRINUSE would fit best.

The EBUSY returns of not successful repair attempts (trying to assign a
cpu to another cpupool) should be changed to e.g. EADDRNOTAVAIL?


Juergen
Dario Faggioli April 15, 2016, 10:58 a.m. UTC | #8
On Fri, 2016-04-15 at 12:43 +0200, Juergen Gross wrote:
> On 15/04/16 12:20, Ian Jackson wrote:
> > 
> > Would either of you care to provide a version of my documentation
> > patch
> > which answers the questions that my text answers ?  Or shall we
> > commit
> > my version and you can edit it in-tree :-).
> I can provide an updated patch.
> 
Great.

> > All I need now is a recipe for the tools to tell what has happened
> > and
> > then I can make xl or libxl at least print comprehensible and
> > correct
> > error messages...
> So this boils down to finding an appropriate ESOMETHING replacement
> for the EBUSY case introduced by the temporary pinning.
> 
Exactly.

> I think ENOTEMPTY or EADDRINUSE would fit best.
>
I like the latter better.
> 
> The EBUSY returns of not successful repair attempts (trying to assign
> a
> cpu to another cpupool) should be changed to e.g. EADDRNOTAVAIL?
> 
I'd go for EADDRINUSE and EADDRNOTAVAIL so the two error values are
similarly *wrong* hinting an addressing issue, which is more consistent
(and would come handy when documenting) than having one pointing at the
filesystem and the other at the address space.

Are you going to do the patch for this yourself as well?

Regards,
Dario
Jürgen Groß April 15, 2016, 11:37 a.m. UTC | #9
On 15/04/16 12:58, Dario Faggioli wrote:
> On Fri, 2016-04-15 at 12:43 +0200, Juergen Gross wrote:
>> On 15/04/16 12:20, Ian Jackson wrote:
>>>
>>> Would either of you care to provide a version of my documentation
>>> patch
>>> which answers the questions that my text answers ?  Or shall we
>>> commit
>>> my version and you can edit it in-tree :-).
>> I can provide an updated patch.
>>
> Great.
> 
>>> All I need now is a recipe for the tools to tell what has happened
>>> and
>>> then I can make xl or libxl at least print comprehensible and
>>> correct
>>> error messages...
>> So this boils down to finding an appropriate ESOMETHING replacement
>> for the EBUSY case introduced by the temporary pinning.
>>
> Exactly.
> 
>> I think ENOTEMPTY or EADDRINUSE would fit best.
>>
> I like the latter better.
>>
>> The EBUSY returns of not successful repair attempts (trying to assign
>> a
>> cpu to another cpupool) should be changed to e.g. EADDRNOTAVAIL?
>>
> I'd go for EADDRINUSE and EADDRNOTAVAIL so the two error values are
> similarly *wrong* hinting an addressing issue, which is more consistent
> (and would come handy when documenting) than having one pointing at the
> filesystem and the other at the address space.
> 
> Are you going to do the patch for this yourself as well?

Sure.


Juergen
Jürgen Groß April 15, 2016, 1:20 p.m. UTC | #10
On 15/04/16 12:58, Dario Faggioli wrote:
> On Fri, 2016-04-15 at 12:43 +0200, Juergen Gross wrote:
>> On 15/04/16 12:20, Ian Jackson wrote:
>>>
>>> Would either of you care to provide a version of my documentation
>>> patch
>>> which answers the questions that my text answers ?  Or shall we
>>> commit
>>> my version and you can edit it in-tree :-).
>> I can provide an updated patch.
>>
> Great.
> 
>>> All I need now is a recipe for the tools to tell what has happened
>>> and
>>> then I can make xl or libxl at least print comprehensible and
>>> correct
>>> error messages...
>> So this boils down to finding an appropriate ESOMETHING replacement
>> for the EBUSY case introduced by the temporary pinning.
>>
> Exactly.
> 
>> I think ENOTEMPTY or EADDRINUSE would fit best.
>>
> I like the latter better.
>>
>> The EBUSY returns of not successful repair attempts (trying to assign
>> a
>> cpu to another cpupool) should be changed to e.g. EADDRNOTAVAIL?
>>
> I'd go for EADDRINUSE and EADDRNOTAVAIL so the two error values are
> similarly *wrong* hinting an addressing issue, which is more consistent
> (and would come handy when documenting) than having one pointing at the
> filesystem and the other at the address space.

Just saw there are still two cases left where EBUSY will be returned:

- when trying to remove the last cpu from a cpupool with active domains
  (EBUSY seems appropriate)
- when trying to move a cpu which is not available in the source cpupool
  or free cpus (I'd suggest ENODEV in this case)


Juergen
Dario Faggioli April 15, 2016, 1:33 p.m. UTC | #11
On Fri, 2016-04-15 at 15:20 +0200, Juergen Gross wrote:
> On 15/04/16 12:58, Dario Faggioli wrote:
> > 
> > On Fri, 2016-04-15 at 12:43 +0200, Juergen Gross wrote:
> > > 
> > > The EBUSY returns of not successful repair attempts (trying to
> > > assign
> > > a
> > > cpu to another cpupool) should be changed to e.g. EADDRNOTAVAIL?
> > > 
> > I'd go for EADDRINUSE and EADDRNOTAVAIL so the two error values are
> > similarly *wrong* hinting an addressing issue, which is more
> > consistent
> > (and would come handy when documenting) than having one pointing at
> > the
> > filesystem and the other at the address space.
> Just saw there are still two cases left where EBUSY will be returned:
> 
I know...

> - when trying to remove the last cpu from a cpupool with active
> domains
>   (EBUSY seems appropriate)
>
Indeed.

> - when trying to move a cpu which is not available in the source
> cpupool
>   or free cpus (I'd suggest ENODEV in this case)
> 
Well, in practice, I don't actually think this is too big of a deal. In
fact, I don't think that tools can or need to differentiate their
behavior too much between this and, for instance, the above mentioned
failure.

_However_, if while you're there you're up for making this case
returning ENODEV, it would indeed look like it's more descriptive of
the actual problem, and hence better.

Regards,
Dario
Ian Jackson April 15, 2016, 2:11 p.m. UTC | #12
Juergen Gross writes ("Re: [Xen-devel] [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"):
> Just saw there are still two cases left where EBUSY will be returned:
> 
> - when trying to remove the last cpu from a cpupool with active domains
>   (EBUSY seems appropriate)

The problem is that EBUSY here might mean "this operation can't be
done because it amounts to a request for a configuration which
violates an invariant" or "this operation can't be done for temporary
reasons and you should retry it".

Ian.
Ian Jackson April 15, 2016, 2:12 p.m. UTC | #13
Dario Faggioli writes ("Re: [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"):
> However, a different return value for the super special case of
> temporary pinning override could maybe be selected. I'm having a look,
> and although I don't find one that could be seen as a perfect fit (that
> would be EBUSY, which is taken!), what about one of these:
> 
>   EEXIST        17      /* File exists */
>   ENOTEMPTY     39      /* Directory not empty */
>   EROFS         30      /* Read-only file system */
>   ENOSPC        28      /* No space left on device */

Does the hypervisor currently use EAGAIN for anything ?

Ian.
Konrad Rzeszutek Wilk April 15, 2016, 2:17 p.m. UTC | #14
On Fri, Apr 15, 2016 at 03:12:56PM +0100, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"):
> > However, a different return value for the super special case of
> > temporary pinning override could maybe be selected. I'm having a look,
> > and although I don't find one that could be seen as a perfect fit (that
> > would be EBUSY, which is taken!), what about one of these:
> > 
> >   EEXIST        17      /* File exists */
> >   ENOTEMPTY     39      /* Directory not empty */
> >   EROFS         30      /* Read-only file system */
> >   ENOSPC        28      /* No space left on device */
> 
> Does the hypervisor currently use EAGAIN for anything ?

Yes. XEN_SYSCTL_pm_op, XSM, HVMOP_get_mem_type, MAP_PIRQ_TYPE_MULTI_MSI (ah
scratch that, it makes it -EINVAL).
> 
> Ian.
Jürgen Groß April 15, 2016, 2:34 p.m. UTC | #15
On 15/04/16 16:12, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"):
>> However, a different return value for the super special case of
>> temporary pinning override could maybe be selected. I'm having a look,
>> and although I don't find one that could be seen as a perfect fit (that
>> would be EBUSY, which is taken!), what about one of these:
>>
>>   EEXIST        17      /* File exists */
>>   ENOTEMPTY     39      /* Directory not empty */
>>   EROFS         30      /* Read-only file system */
>>   ENOSPC        28      /* No space left on device */
> 
> Does the hypervisor currently use EAGAIN for anything ?

Yes. Especially XEN_SYSCTL_CPUPOOL_OP_RMCPU can return it already.
Before you are asking: Jan didn't want it to return EAGAIN in the
temporary pinning case as the hypervisor couldn't guarantee that
the situation will be resolved (opposed to the other usage of
EAGAIN).


Juergen
Jürgen Groß April 15, 2016, 2:39 p.m. UTC | #16
On 15/04/16 16:11, Ian Jackson wrote:
> Juergen Gross writes ("Re: [Xen-devel] [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"):
>> Just saw there are still two cases left where EBUSY will be returned:
>>
>> - when trying to remove the last cpu from a cpupool with active domains
>>   (EBUSY seems appropriate)
> 
> The problem is that EBUSY here might mean "this operation can't be
> done because it amounts to a request for a configuration which
> violates an invariant" or "this operation can't be done for temporary
> reasons and you should retry it".

I'm not sure I understand your comment correctly: do you see a problem
with EBUSY being returned in the two cases I mentioned or just in the
one case you cited?


Juergen
Ian Jackson April 15, 2016, 2:44 p.m. UTC | #17
Juergen Gross writes ("Re: [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"):
> On 15/04/16 16:12, Ian Jackson wrote:
> > Does the hypervisor currently use EAGAIN for anything ?
> 
> Yes. Especially XEN_SYSCTL_CPUPOOL_OP_RMCPU can return it already.

That is a good reason not to use it.

> Before you are asking: Jan didn't want it to return EAGAIN in the
> temporary pinning case as the hypervisor couldn't guarantee that
> the situation will be resolved (opposed to the other usage of
> EAGAIN).

Hrm.  I note that ENOLCK is in the list in errno.h but not used
anywhere.  EISCONN is another possibility...

Ian.
diff mbox

Patch

diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 0849908..cfccf38 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -560,6 +560,34 @@  struct xen_sysctl_cpupool_op {
 typedef struct xen_sysctl_cpupool_op xen_sysctl_cpupool_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpupool_op_t);
 
+/*
+ * cpupool operations may return EBUSY if the operation cannot be
+ * executed right now because of another cpupool operation which is
+ * still in progress.  In this case, EBUSY means that the failed
+ * operation had no effect.
+ *
+ * Some operations including at least RMCPU (xxx which others?) may
+ * also return EBUSY because a guest has temporarily pinned one of its
+ * vcpus to the pcpu in question.  It is the pious hope (xxx) of the
+ * author of this comment that this can only occur for domains which
+ * have been granted some kind of hardware privilege (eg passthrough).
+ *
+ * In this case the operation may have been partially carried out and
+ * the pcpu is left in an anomalous state.  In this state the pcpu may
+ * be used by some not readily predictable subset of the vcpus
+ * (domains) whose vcpus are in the old cpupool.  (xxx is this true?)
+ *
+ * This can be detected by seeing whether the pcpu can be added to a
+ * different cpupool.  (xxx this is a silly interface; the situation
+ * should be reported by a different errno value, at least.)  If the
+ * pcpu can't be added to a different cpupool for this reason,
+ * attempts to do so will returning (xxx what errno value?)
+ *
+ * The anomalous situation can be recovered by adding the pcpu back to
+ * the cpupool it came from (xxx this permits a buggy or malicious
+ * guest to prevent the cpu ever being removed from its cpupool).
+ */ 
+
 #define ARINC653_MAX_DOMAINS_PER_SCHEDULE   64
 /*
  * This structure is used to pass a new ARINC653 schedule from a