Message ID | 1460653660-6654-4-git-send-email-ian.jackson@eu.citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 14/04/16 18:07, Ian Jackson wrote: > This is my attempt at understanding the situation, from reading > descriptions provided on list in the context of toolstack patches > which were attempting to work around the anomaly. > > The multiple `xxx' entries reflect 1. my lack of complete understanding > 2. API defects which I think I have identified. > > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> > Cc: Wei Liu <wei.liu2@citrix.com> > CC: Dario Faggioli <dario.faggioli@citrix.com> > CC: Juergen Gross <jgross@suse.com> > CC: George Dunlap <george.dunlap@eu.citrix.com> > CC: Jan Beulich <jbeulich@suse.com> > CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > --- > xen/include/public/sysctl.h | 28 ++++++++++++++++++++++++++++ > 1 file changed, 28 insertions(+) > > diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h > index 0849908..cfccf38 100644 > --- a/xen/include/public/sysctl.h > +++ b/xen/include/public/sysctl.h > @@ -560,6 +560,34 @@ struct xen_sysctl_cpupool_op { > typedef struct xen_sysctl_cpupool_op xen_sysctl_cpupool_op_t; > DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpupool_op_t); > > +/* > + * cpupool operations may return EBUSY if the operation cannot be > + * executed right now because of another cpupool operation which is > + * still in progress. In this case, EBUSY means that the failed > + * operation had no effect. > + * > + * Some operations including at least RMCPU (xxx which others?) may > + * also return EBUSY because a guest has temporarily pinned one of its > + * vcpus to the pcpu in question. It is the pious hope (xxx) of the > + * author of this comment that this can only occur for domains which > + * have been granted some kind of hardware privilege (eg passthrough). Any VM can be given any arbitrary pinning in its xl configuration file. Any arbitrary pinning can be applied at runtime via `xl vcpu-pin ...` (To the best of my knowledge) A VM cannot choose pinning of its own accord. (i.e. the host admin has to choose the pinning.) ~Andrew > + * > + * In this case the operation may have been partially carried out and > + * the pcpu is left in an anomalous state. In this state the pcpu may > + * be used by some not readily predictable subset of the vcpus > + * (domains) whose vcpus are in the old cpupool. (xxx is this true?) > + * > + * This can be detected by seeing whether the pcpu can be added to a > + * different cpupool. (xxx this is a silly interface; the situation > + * should be reported by a different errno value, at least.) If the > + * pcpu can't be added to a different cpupool for this reason, > + * attempts to do so will returning (xxx what errno value?) > + * > + * The anomalous situation can be recovered by adding the pcpu back to > + * the cpupool it came from (xxx this permits a buggy or malicious > + * guest to prevent the cpu ever being removed from its cpupool). > + */ > + > #define ARINC653_MAX_DOMAINS_PER_SCHEDULE 64 > /* > * This structure is used to pass a new ARINC653 schedule from a
Andrew Cooper writes ("Re: [Xen-devel] [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"): > On 14/04/16 18:07, Ian Jackson wrote: > > +/* > > + * cpupool operations may return EBUSY if the operation cannot be > > + * executed right now because of another cpupool operation which is > > + * still in progress. In this case, EBUSY means that the failed > > + * operation had no effect. > > + * > > + * Some operations including at least RMCPU (xxx which others?) may > > + * also return EBUSY because a guest has temporarily pinned one of its > > + * vcpus to the pcpu in question. It is the pious hope (xxx) of the > > + * author of this comment that this can only occur for domains which > > + * have been granted some kind of hardware privilege (eg passthrough). > > Any VM can be given any arbitrary pinning in its xl configuration file. > Any arbitrary pinning can be applied at runtime via `xl vcpu-pin ...` Does that produce EBUSY as well ? The reuse of the same error number for all of "the existing configuration (eg toolstack-selected vcpu pinning) means that the operation does not make sense" "there is some lock contention and trying again may help" "a semantically conflicting, or nearly-semantically-conflicting, operation is currently in progress" "the guest has done a temporary pin which prevents this operation" is very unfortunate. How is a toolstack to know what to do ? > (To the best of my knowledge) A VM cannot choose pinning of its own > accord. (i.e. the host admin has to choose the pinning.) AIUI, that is not (now) true. Ian.
On Thu, 2016-04-14 at 18:56 +0100, Ian Jackson wrote: > Andrew Cooper writes ("Re: [Xen-devel] [PATCH 3/3] xen: Document > XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"): > > > > On 14/04/16 18:07, Ian Jackson wrote: > > > > > > +/* > > > + * cpupool operations may return EBUSY if the operation cannot > > > be > > > + * executed right now because of another cpupool operation which > > > is > > > + * still in progress. In this case, EBUSY means that the failed > > > + * operation had no effect. > > > + * > > > + * Some operations including at least RMCPU (xxx which others?) > > > may > > > + * also return EBUSY because a guest has temporarily pinned one > > > of its > > > + * vcpus to the pcpu in question. It is the pious hope (xxx) of > > > the > > > + * author of this comment that this can only occur for domains > > > which > > > + * have been granted some kind of hardware privilege (eg > > > passthrough). > > Any VM can be given any arbitrary pinning in its xl configuration > > file. > > Any arbitrary pinning can be applied at runtime via `xl vcpu-pin > > ...` > Does that produce EBUSY as well ? > It can, after Juergen series, but I think in this case (setting affinity), the situation is still acceptable. In fact: > The reuse of the same error number for all of > > "the existing configuration (eg toolstack-selected vcpu pinning) > means that the operation does not make sense" > This return -EINVAL. > "there is some lock contention and trying again may help" > This can't happen in this case (and reason is just that setting the affinity of a vcpu is different and less problematic than removing a cpu from a cpupool). > "a semantically conflicting, or nearly-semantically-conflicting, > operation is currently in progress" > I'm not sure what this means exactly, but I think that --depending on what it exactly means-- it either can't happen or fall into the -EINVAL case. > "the guest has done a temporary pin which prevents this operation" > This (because of the series) returns -EBUSY. > is very unfortunate. How is a toolstack to know what to do ? > Yeah, I agree, but again, I think in this case it's possible for toolstack to tell. From a quick check, we do not, in libxl, output any specific error message in case we get -EBUSY... but I can send a patch to that effect pretty quickly, if that's deemed necessary. > > (To the best of my knowledge) A VM cannot choose pinning of its own > > accord. (i.e. the host admin has to choose the pinning.) > AIUI, that is not (now) true. > Yes, now a guest can call the new SCHEDOP_pin_override hypercall (and Juergen is pushing a series to Linux for it to be able to do that... as that was the purpose of the while thing!). However, as said in another email, there's already a check like this in place, in the implementation of such an hypercall: ret = -EPERM; if ( !is_hardware_domain(current->domain) ) break; which I think satisfies Ian's (legitimate) concern? Regards, Dario
On 14/04/16 19:07, Ian Jackson wrote: > This is my attempt at understanding the situation, from reading > descriptions provided on list in the context of toolstack patches > which were attempting to work around the anomaly. > > The multiple `xxx' entries reflect 1. my lack of complete understanding > 2. API defects which I think I have identified. > > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> > Cc: Wei Liu <wei.liu2@citrix.com> > CC: Dario Faggioli <dario.faggioli@citrix.com> > CC: Juergen Gross <jgross@suse.com> > CC: George Dunlap <george.dunlap@eu.citrix.com> > CC: Jan Beulich <jbeulich@suse.com> > CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > --- > xen/include/public/sysctl.h | 28 ++++++++++++++++++++++++++++ > 1 file changed, 28 insertions(+) > > diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h > index 0849908..cfccf38 100644 > --- a/xen/include/public/sysctl.h > +++ b/xen/include/public/sysctl.h > @@ -560,6 +560,34 @@ struct xen_sysctl_cpupool_op { > typedef struct xen_sysctl_cpupool_op xen_sysctl_cpupool_op_t; > DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpupool_op_t); > > +/* > + * cpupool operations may return EBUSY if the operation cannot be > + * executed right now because of another cpupool operation which is > + * still in progress. In this case, EBUSY means that the failed > + * operation had no effect. > + * > + * Some operations including at least RMCPU (xxx which others?) may Only this one. > + * also return EBUSY because a guest has temporarily pinned one of its > + * vcpus to the pcpu in question. It is the pious hope (xxx) of the > + * author of this comment that this can only occur for domains which > + * have been granted some kind of hardware privilege (eg passthrough). Only the hardware domain is allowed to do this. > + * In this case the operation may have been partially carried out and > + * the pcpu is left in an anomalous state. In this state the pcpu may > + * be used by some not readily predictable subset of the vcpus > + * (domains) whose vcpus are in the old cpupool. (xxx is this true?) Yes. > + * > + * This can be detected by seeing whether the pcpu can be added to a > + * different cpupool. (xxx this is a silly interface; the situation > + * should be reported by a different errno value, at least.) If the I'm open for suggestions. :-) > + * pcpu can't be added to a different cpupool for this reason, > + * attempts to do so will returning (xxx what errno value?) You might have guessed: EBUSY > + * > + * The anomalous situation can be recovered by adding the pcpu back to > + * the cpupool it came from (xxx this permits a buggy or malicious > + * guest to prevent the cpu ever being removed from its cpupool). Only the hardware domain, which has other means to inject problems. Juergen
On Fri, 2016-04-15 at 07:35 +0200, Juergen Gross wrote: > On 14/04/16 19:07, Ian Jackson wrote: > > > > --- a/xen/include/public/sysctl.h > > +++ b/xen/include/public/sysctl.h > > @@ -560,6 +560,34 @@ struct xen_sysctl_cpupool_op { > > > > + * In this case the operation may have been partially carried out > > and > > + * the pcpu is left in an anomalous state. In this state the pcpu > > may > > + * be used by some not readily predictable subset of the vcpus > > + * (domains) whose vcpus are in the old cpupool. (xxx is this > > true?) > Yes. > Well, I think it's a little less bad than that. the pcpu is no longer a valid member of its cpupool. This means the scheduler will not run vcpus of the domains of the pool (it's domains that are in cpupools) any longer. And even the vcpus that are temporarily stuck on the pcpu (i.e., the ones that are actually preventing the operation to succeed) will rather quickly get away from it. So, I'd say rather that "In this state the pcpu will, potentially after a short transitory, just not be used by any vcpu" > > + * > > + * This can be detected by seeing whether the pcpu can be added to > > a > > + * different cpupool. (xxx this is a silly interface; the > > situation > > + * should be reported by a different errno value, at least.) If > > the > I'm open for suggestions. :-) > Well, changing the system behavior with anything that involves retry loops is indeed non-trivial (or, at least, I don't out of the top of my head have an alternative). However, a different return value for the super special case of temporary pinning override could maybe be selected. I'm having a look, and although I don't find one that could be seen as a perfect fit (that would be EBUSY, which is taken!), what about one of these: EEXIST 17 /* File exists */ ENOTEMPTY 39 /* Directory not empty */ EROFS 30 /* Read-only file system */ ENOSPC 28 /* No space left on device */ After all, as ugly as the resulting error message would be: - if's a very rare special case - we need to document this very rare special case anyway - we can log more info about the actual error > > + * pcpu can't be added to a different cpupool for this reason, > > + * attempts to do so will returning (xxx what errno value?) > You might have guessed: EBUSY > And here too, if the pcpu is not in cpupool_free_cpus, maybe we can go and check whether it is in any of the existing cpupool's cpu_valid. If it is, then EBUSY is ok. If not, it means it's in the inconsistent state, and we can pick another (same as above?) error value, use it and document the situation. > > + * > > + * The anomalous situation can be recovered by adding the pcpu > > back to > > + * the cpupool it came from (xxx this permits a buggy or malicious > > + * guest to prevent the cpu ever being removed from its cpupool). > Only the hardware domain, which has other means to inject problems. > Yep, indeed. Regards, Dario
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"): > On Thu, 2016-04-14 at 18:56 +0100, Ian Jackson wrote: > > is very unfortunate. How is a toolstack to know what to do ? > > > Yeah, I agree, but again, I think in this case it's possible for > toolstack to tell. Thanks to both you and Juergen for the explanations. I'm glad to see that things weren't as bad as I feared - I admit that I didn't spend the time to completely follow what all the relevant bits of code did. Would either of you care to provide a version of my documentation patch which answers the questions that my text answers ? Or shall we commit my version and you can edit it in-tree :-). > ret = -EPERM; > if ( !is_hardware_domain(current->domain) ) > break; > > which I think satisfies Ian's (legitimate) concern? That addresses the security concern, yes. (Is a driver domain a `hardware domain' in this sense?) All I need now is a recipe for the tools to tell what has happened and then I can make xl or libxl at least print comprehensible and correct error messages... Thanks, Ian.
On 15/04/16 12:20, Ian Jackson wrote: > Dario Faggioli writes ("Re: [Xen-devel] [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"): >> On Thu, 2016-04-14 at 18:56 +0100, Ian Jackson wrote: >>> is very unfortunate. How is a toolstack to know what to do ? >>> >> Yeah, I agree, but again, I think in this case it's possible for >> toolstack to tell. > > Thanks to both you and Juergen for the explanations. I'm glad to see > that things weren't as bad as I feared - I admit that I didn't spend > the time to completely follow what all the relevant bits of code did. > > Would either of you care to provide a version of my documentation patch > which answers the questions that my text answers ? Or shall we commit > my version and you can edit it in-tree :-). I can provide an updated patch. >> ret = -EPERM; >> if ( !is_hardware_domain(current->domain) ) >> break; >> >> which I think satisfies Ian's (legitimate) concern? > > That addresses the security concern, yes. (Is a driver domain a > `hardware domain' in this sense?) The hardware domain is the one domain having direct access to the native hardware. This is dom0 in most cases. There can be only one hardware domain. > All I need now is a recipe for the tools to tell what has happened and > then I can make xl or libxl at least print comprehensible and correct > error messages... So this boils down to finding an appropriate ESOMETHING replacement for the EBUSY case introduced by the temporary pinning. I think ENOTEMPTY or EADDRINUSE would fit best. The EBUSY returns of not successful repair attempts (trying to assign a cpu to another cpupool) should be changed to e.g. EADDRNOTAVAIL? Juergen
On Fri, 2016-04-15 at 12:43 +0200, Juergen Gross wrote: > On 15/04/16 12:20, Ian Jackson wrote: > > > > Would either of you care to provide a version of my documentation > > patch > > which answers the questions that my text answers ? Or shall we > > commit > > my version and you can edit it in-tree :-). > I can provide an updated patch. > Great. > > All I need now is a recipe for the tools to tell what has happened > > and > > then I can make xl or libxl at least print comprehensible and > > correct > > error messages... > So this boils down to finding an appropriate ESOMETHING replacement > for the EBUSY case introduced by the temporary pinning. > Exactly. > I think ENOTEMPTY or EADDRINUSE would fit best. > I like the latter better. > > The EBUSY returns of not successful repair attempts (trying to assign > a > cpu to another cpupool) should be changed to e.g. EADDRNOTAVAIL? > I'd go for EADDRINUSE and EADDRNOTAVAIL so the two error values are similarly *wrong* hinting an addressing issue, which is more consistent (and would come handy when documenting) than having one pointing at the filesystem and the other at the address space. Are you going to do the patch for this yourself as well? Regards, Dario
On 15/04/16 12:58, Dario Faggioli wrote: > On Fri, 2016-04-15 at 12:43 +0200, Juergen Gross wrote: >> On 15/04/16 12:20, Ian Jackson wrote: >>> >>> Would either of you care to provide a version of my documentation >>> patch >>> which answers the questions that my text answers ? Or shall we >>> commit >>> my version and you can edit it in-tree :-). >> I can provide an updated patch. >> > Great. > >>> All I need now is a recipe for the tools to tell what has happened >>> and >>> then I can make xl or libxl at least print comprehensible and >>> correct >>> error messages... >> So this boils down to finding an appropriate ESOMETHING replacement >> for the EBUSY case introduced by the temporary pinning. >> > Exactly. > >> I think ENOTEMPTY or EADDRINUSE would fit best. >> > I like the latter better. >> >> The EBUSY returns of not successful repair attempts (trying to assign >> a >> cpu to another cpupool) should be changed to e.g. EADDRNOTAVAIL? >> > I'd go for EADDRINUSE and EADDRNOTAVAIL so the two error values are > similarly *wrong* hinting an addressing issue, which is more consistent > (and would come handy when documenting) than having one pointing at the > filesystem and the other at the address space. > > Are you going to do the patch for this yourself as well? Sure. Juergen
On 15/04/16 12:58, Dario Faggioli wrote: > On Fri, 2016-04-15 at 12:43 +0200, Juergen Gross wrote: >> On 15/04/16 12:20, Ian Jackson wrote: >>> >>> Would either of you care to provide a version of my documentation >>> patch >>> which answers the questions that my text answers ? Or shall we >>> commit >>> my version and you can edit it in-tree :-). >> I can provide an updated patch. >> > Great. > >>> All I need now is a recipe for the tools to tell what has happened >>> and >>> then I can make xl or libxl at least print comprehensible and >>> correct >>> error messages... >> So this boils down to finding an appropriate ESOMETHING replacement >> for the EBUSY case introduced by the temporary pinning. >> > Exactly. > >> I think ENOTEMPTY or EADDRINUSE would fit best. >> > I like the latter better. >> >> The EBUSY returns of not successful repair attempts (trying to assign >> a >> cpu to another cpupool) should be changed to e.g. EADDRNOTAVAIL? >> > I'd go for EADDRINUSE and EADDRNOTAVAIL so the two error values are > similarly *wrong* hinting an addressing issue, which is more consistent > (and would come handy when documenting) than having one pointing at the > filesystem and the other at the address space. Just saw there are still two cases left where EBUSY will be returned: - when trying to remove the last cpu from a cpupool with active domains (EBUSY seems appropriate) - when trying to move a cpu which is not available in the source cpupool or free cpus (I'd suggest ENODEV in this case) Juergen
On Fri, 2016-04-15 at 15:20 +0200, Juergen Gross wrote: > On 15/04/16 12:58, Dario Faggioli wrote: > > > > On Fri, 2016-04-15 at 12:43 +0200, Juergen Gross wrote: > > > > > > The EBUSY returns of not successful repair attempts (trying to > > > assign > > > a > > > cpu to another cpupool) should be changed to e.g. EADDRNOTAVAIL? > > > > > I'd go for EADDRINUSE and EADDRNOTAVAIL so the two error values are > > similarly *wrong* hinting an addressing issue, which is more > > consistent > > (and would come handy when documenting) than having one pointing at > > the > > filesystem and the other at the address space. > Just saw there are still two cases left where EBUSY will be returned: > I know... > - when trying to remove the last cpu from a cpupool with active > domains > (EBUSY seems appropriate) > Indeed. > - when trying to move a cpu which is not available in the source > cpupool > or free cpus (I'd suggest ENODEV in this case) > Well, in practice, I don't actually think this is too big of a deal. In fact, I don't think that tools can or need to differentiate their behavior too much between this and, for instance, the above mentioned failure. _However_, if while you're there you're up for making this case returning ENODEV, it would indeed look like it's more descriptive of the actual problem, and hence better. Regards, Dario
Juergen Gross writes ("Re: [Xen-devel] [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"): > Just saw there are still two cases left where EBUSY will be returned: > > - when trying to remove the last cpu from a cpupool with active domains > (EBUSY seems appropriate) The problem is that EBUSY here might mean "this operation can't be done because it amounts to a request for a configuration which violates an invariant" or "this operation can't be done for temporary reasons and you should retry it". Ian.
Dario Faggioli writes ("Re: [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"): > However, a different return value for the super special case of > temporary pinning override could maybe be selected. I'm having a look, > and although I don't find one that could be seen as a perfect fit (that > would be EBUSY, which is taken!), what about one of these: > > EEXIST 17 /* File exists */ > ENOTEMPTY 39 /* Directory not empty */ > EROFS 30 /* Read-only file system */ > ENOSPC 28 /* No space left on device */ Does the hypervisor currently use EAGAIN for anything ? Ian.
On Fri, Apr 15, 2016 at 03:12:56PM +0100, Ian Jackson wrote: > Dario Faggioli writes ("Re: [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"): > > However, a different return value for the super special case of > > temporary pinning override could maybe be selected. I'm having a look, > > and although I don't find one that could be seen as a perfect fit (that > > would be EBUSY, which is taken!), what about one of these: > > > > EEXIST 17 /* File exists */ > > ENOTEMPTY 39 /* Directory not empty */ > > EROFS 30 /* Read-only file system */ > > ENOSPC 28 /* No space left on device */ > > Does the hypervisor currently use EAGAIN for anything ? Yes. XEN_SYSCTL_pm_op, XSM, HVMOP_get_mem_type, MAP_PIRQ_TYPE_MULTI_MSI (ah scratch that, it makes it -EINVAL). > > Ian.
On 15/04/16 16:12, Ian Jackson wrote: > Dario Faggioli writes ("Re: [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"): >> However, a different return value for the super special case of >> temporary pinning override could maybe be selected. I'm having a look, >> and although I don't find one that could be seen as a perfect fit (that >> would be EBUSY, which is taken!), what about one of these: >> >> EEXIST 17 /* File exists */ >> ENOTEMPTY 39 /* Directory not empty */ >> EROFS 30 /* Read-only file system */ >> ENOSPC 28 /* No space left on device */ > > Does the hypervisor currently use EAGAIN for anything ? Yes. Especially XEN_SYSCTL_CPUPOOL_OP_RMCPU can return it already. Before you are asking: Jan didn't want it to return EAGAIN in the temporary pinning case as the hypervisor couldn't guarantee that the situation will be resolved (opposed to the other usage of EAGAIN). Juergen
On 15/04/16 16:11, Ian Jackson wrote: > Juergen Gross writes ("Re: [Xen-devel] [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"): >> Just saw there are still two cases left where EBUSY will be returned: >> >> - when trying to remove the last cpu from a cpupool with active domains >> (EBUSY seems appropriate) > > The problem is that EBUSY here might mean "this operation can't be > done because it amounts to a request for a configuration which > violates an invariant" or "this operation can't be done for temporary > reasons and you should retry it". I'm not sure I understand your comment correctly: do you see a problem with EBUSY being returned in the two cases I mentioned or just in the one case you cited? Juergen
Juergen Gross writes ("Re: [PATCH 3/3] xen: Document XEN_SYSCTL_CPUPOOL_OP_RMCPU anomalous EBUSY result"): > On 15/04/16 16:12, Ian Jackson wrote: > > Does the hypervisor currently use EAGAIN for anything ? > > Yes. Especially XEN_SYSCTL_CPUPOOL_OP_RMCPU can return it already. That is a good reason not to use it. > Before you are asking: Jan didn't want it to return EAGAIN in the > temporary pinning case as the hypervisor couldn't guarantee that > the situation will be resolved (opposed to the other usage of > EAGAIN). Hrm. I note that ENOLCK is in the list in errno.h but not used anywhere. EISCONN is another possibility... Ian.
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h index 0849908..cfccf38 100644 --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -560,6 +560,34 @@ struct xen_sysctl_cpupool_op { typedef struct xen_sysctl_cpupool_op xen_sysctl_cpupool_op_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cpupool_op_t); +/* + * cpupool operations may return EBUSY if the operation cannot be + * executed right now because of another cpupool operation which is + * still in progress. In this case, EBUSY means that the failed + * operation had no effect. + * + * Some operations including at least RMCPU (xxx which others?) may + * also return EBUSY because a guest has temporarily pinned one of its + * vcpus to the pcpu in question. It is the pious hope (xxx) of the + * author of this comment that this can only occur for domains which + * have been granted some kind of hardware privilege (eg passthrough). + * + * In this case the operation may have been partially carried out and + * the pcpu is left in an anomalous state. In this state the pcpu may + * be used by some not readily predictable subset of the vcpus + * (domains) whose vcpus are in the old cpupool. (xxx is this true?) + * + * This can be detected by seeing whether the pcpu can be added to a + * different cpupool. (xxx this is a silly interface; the situation + * should be reported by a different errno value, at least.) If the + * pcpu can't be added to a different cpupool for this reason, + * attempts to do so will returning (xxx what errno value?) + * + * The anomalous situation can be recovered by adding the pcpu back to + * the cpupool it came from (xxx this permits a buggy or malicious + * guest to prevent the cpu ever being removed from its cpupool). + */ + #define ARINC653_MAX_DOMAINS_PER_SCHEDULE 64 /* * This structure is used to pass a new ARINC653 schedule from a
This is my attempt at understanding the situation, from reading descriptions provided on list in the context of toolstack patches which were attempting to work around the anomaly. The multiple `xxx' entries reflect 1. my lack of complete understanding 2. API defects which I think I have identified. Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> CC: Dario Faggioli <dario.faggioli@citrix.com> CC: Juergen Gross <jgross@suse.com> CC: George Dunlap <george.dunlap@eu.citrix.com> CC: Jan Beulich <jbeulich@suse.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- xen/include/public/sysctl.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)