diff mbox series

[v4,1/3] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_disable_fifo, ...

Message ID 20201124191751.11472-2-paul@xen.org (mailing list archive)
State New, archived
Headers show
Series evtchn: Introduce a per-guest knob to control FIFO ABI | expand

Commit Message

Paul Durrant Nov. 24, 2020, 7:17 p.m. UTC
From: Paul Durrant <pdurrant@amazon.com>

...to control the visibility of the FIFO event channel operations
(EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
the guest.

These operations were added to the public header in commit d2d50c2f308f
("evtchn: add FIFO-based event channel ABI") and the first implementation
appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
that, a guest issuing those operations would receive a return value of
-ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
running on an older (pre-4.4) Xen would fall back to using the 2-level event
channel interface upon seeing this return value.

Unfortunately the uncontrolable appearance of these new operations in Xen 4.4
onwards has implications for hibernation of some Linux guests. During resume
from hibernation, there are two kernels involved: the "boot" kernel and the
"resume" kernel. The guest boot kernel may default to use FIFO operations and
instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
other hand, the resume kernel keeps assuming 2-level, because it was hibernated
on a version of Xen that did not support the FIFO operations.

To maintain compatibility it is necessary to make Xen behave as it did
before the new operations were added and hence the code in this patch ensures
that, if XEN_DOMCTL_CDF_disable_fifo is set, the FIFO event channel operations
will again result in -ENOSYS being returned to the guest.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: Eslam Elnikety <elnikety@amazon.com>
---
Cc: Christian Lindig <christian.lindig@citrix.com>
Cc: David Scott <dave@recoil.org>
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Wei Liu <wl@xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>

v4:
 - New in v4
---
 tools/ocaml/libs/xc/xenctrl.ml  |  1 +
 tools/ocaml/libs/xc/xenctrl.mli |  1 +
 xen/common/domain.c             |  2 +-
 xen/common/event_channel.c      | 17 +++++++++++++++++
 xen/include/public/domctl.h     |  4 +++-
 5 files changed, 23 insertions(+), 2 deletions(-)

Comments

Jan Beulich Nov. 25, 2020, 9:20 a.m. UTC | #1
On 24.11.2020 20:17, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> ...to control the visibility of the FIFO event channel operations
> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
> the guest.
> 
> These operations were added to the public header in commit d2d50c2f308f
> ("evtchn: add FIFO-based event channel ABI") and the first implementation
> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
> that, a guest issuing those operations would receive a return value of
> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
> running on an older (pre-4.4) Xen would fall back to using the 2-level event
> channel interface upon seeing this return value.
> 
> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4
> onwards has implications for hibernation of some Linux guests. During resume
> from hibernation, there are two kernels involved: the "boot" kernel and the
> "resume" kernel. The guest boot kernel may default to use FIFO operations and
> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
> other hand, the resume kernel keeps assuming 2-level, because it was hibernated
> on a version of Xen that did not support the FIFO operations.

And the alternative of the boot kernel issuing EVTCHNOP_reset has
other unwanted consequences. Maybe worth mentioning here, as
otherwise this would look like the obvious way to return to 2-level
mode?

Also, why can't the boot kernel be instructed to avoid engaging
FIFO mode?

> To maintain compatibility it is necessary to make Xen behave as it did
> before the new operations were added and hence the code in this patch ensures
> that, if XEN_DOMCTL_CDF_disable_fifo is set, the FIFO event channel operations
> will again result in -ENOSYS being returned to the guest.

Are there indeed dependencies on the precise return value anywhere?
If so, the generally inappropriate use (do_event_channel_op()'s
default case really would also need switching) would want a brief
comment, so it'll be understood by readers that this isn't code to
derive other code from. If not, -EPERM or -EACCES perhaps?

Also, now that we gain a runtime control, do we perhaps also want a
build time one?

> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> Signed-off-by: Eslam Elnikety <elnikety@amazon.com>

Are this order as well as the From: tag above correct? Or
alternatively, are there actually any pieces left at all from
Eslam's earlier patch?

> v4:
>  - New in v4

(Just as an aside: That's quite interesting for a previously
standalone patch. I guess that patch was really split, considering
you've retained Eslam's S-o-b? But perhaps there are different ways
to look at things ...)

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -70,9 +70,11 @@ struct xen_domctl_createdomain {
>  #define XEN_DOMCTL_CDF_iommu          (1U<<_XEN_DOMCTL_CDF_iommu)
>  #define _XEN_DOMCTL_CDF_nested_virt   6
>  #define XEN_DOMCTL_CDF_nested_virt    (1U << _XEN_DOMCTL_CDF_nested_virt)
> +#define _XEN_DOMCTL_CDF_disable_fifo  7
> +#define XEN_DOMCTL_CDF_disable_fifo   (1U << _XEN_DOMCTL_CDF_disable_fifo)

Despite getting longish, I think this needs "evtchn" somewhere in
the name. To keep size bounded, maybe XEN_DOMCTL_CDF_no_fifo_evtchn?

>  /* Max XEN_DOMCTL_CDF_* constant.  Used for ABI checking. */
> -#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_nested_virt
> +#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_disable_fifo

While not directly related to this patch, I'm puzzled by the
presence of this constant: I've not been able to find any use of
it. In particular you did have a need to modify
sanitise_domain_config().

Jan
Jan Beulich Nov. 25, 2020, 9:36 a.m. UTC | #2
On 25.11.2020 10:20, Jan Beulich wrote:
> On 24.11.2020 20:17, Paul Durrant wrote:
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -70,9 +70,11 @@ struct xen_domctl_createdomain {
>>  #define XEN_DOMCTL_CDF_iommu          (1U<<_XEN_DOMCTL_CDF_iommu)
>>  #define _XEN_DOMCTL_CDF_nested_virt   6
>>  #define XEN_DOMCTL_CDF_nested_virt    (1U << _XEN_DOMCTL_CDF_nested_virt)
>> +#define _XEN_DOMCTL_CDF_disable_fifo  7
>> +#define XEN_DOMCTL_CDF_disable_fifo   (1U << _XEN_DOMCTL_CDF_disable_fifo)
> 
> Despite getting longish, I think this needs "evtchn" somewhere in
> the name. To keep size bounded, maybe XEN_DOMCTL_CDF_no_fifo_evtchn?
> 
>>  /* Max XEN_DOMCTL_CDF_* constant.  Used for ABI checking. */
>> -#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_nested_virt
>> +#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_disable_fifo
> 
> While not directly related to this patch, I'm puzzled by the
> presence of this constant: I've not been able to find any use of
> it. In particular you did have a need to modify
> sanitise_domain_config().

So it was you to introduce this, right away without any user, in
7fb0e134f8c6 ("tools/ocaml: abi: Use formal conversion and check
in more places"). The only reference is from what I regard as a
comment (I don't speak any ocaml, so I may be wrong). Could you
clarify why we need to maintain this constant?

Thanks, Jan
Durrant, Paul Nov. 25, 2020, 11:01 a.m. UTC | #3
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 25 November 2020 09:36
> To: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Durrant, Paul <pdurrant@amazon.co.uk>; Elnikety, Eslam <elnikety@amazon.com>; Christian Lindig
> <christian.lindig@citrix.com>; David Scott <dave@recoil.org>; Ian Jackson <iwj@xenproject.org>; Wei
> Liu <wl@xen.org>; George Dunlap <george.dunlap@citrix.com>; Julien Grall <julien@xen.org>; Stefano
> Stabellini <sstabellini@kernel.org>; xen-devel@lists.xenproject.org; Paul Durrant <paul@xen.org>
> Subject: RE: [EXTERNAL] [PATCH v4 1/3] domctl: introduce a new domain create flag,
> XEN_DOMCTL_CDF_disable_fifo, ...
> 
> CAUTION: This email originated from outside of the organization. Do not click links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On 25.11.2020 10:20, Jan Beulich wrote:
> > On 24.11.2020 20:17, Paul Durrant wrote:
> >> --- a/xen/include/public/domctl.h
> >> +++ b/xen/include/public/domctl.h
> >> @@ -70,9 +70,11 @@ struct xen_domctl_createdomain {
> >>  #define XEN_DOMCTL_CDF_iommu          (1U<<_XEN_DOMCTL_CDF_iommu)
> >>  #define _XEN_DOMCTL_CDF_nested_virt   6
> >>  #define XEN_DOMCTL_CDF_nested_virt    (1U << _XEN_DOMCTL_CDF_nested_virt)
> >> +#define _XEN_DOMCTL_CDF_disable_fifo  7
> >> +#define XEN_DOMCTL_CDF_disable_fifo   (1U << _XEN_DOMCTL_CDF_disable_fifo)
> >
> > Despite getting longish, I think this needs "evtchn" somewhere in
> > the name. To keep size bounded, maybe XEN_DOMCTL_CDF_no_fifo_evtchn?
> >

I'm ok with that name; I'll send a v5.

> >>  /* Max XEN_DOMCTL_CDF_* constant.  Used for ABI checking. */
> >> -#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_nested_virt
> >> +#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_disable_fifo
> >
> > While not directly related to this patch, I'm puzzled by the
> > presence of this constant: I've not been able to find any use of
> > it. In particular you did have a need to modify
> > sanitise_domain_config().
> 
> So it was you to introduce this, right away without any user, in
> 7fb0e134f8c6 ("tools/ocaml: abi: Use formal conversion and check
> in more places"). The only reference is from what I regard as a
> comment (I don't speak any ocaml, so I may be wrong). Could you
> clarify why we need to maintain this constant?
> 

I can't remember the exact sequence of events but it became apparent at some point that the ocaml bindings were out of sync and they rely on a list of domain create flags where the number has to match the bit-shift value in domctl.h (among other things). Thus there is an auto-generated header called "xenctrl_abi_check.h" which is included by xenctrl_stubs.c. This header is generated from xenctrl.ml by the perl script "abi-check" and it relies the XEN_DOMCTL_CDF_MAX constant to form part of the checks it generates.

As an example, here is the generated header with this patch applied:

// found ocaml type x86_arch_emulation_flags at xenctrl.ml:38
BUILD_BUG_ON( XEN_X86_EMU_LAPIC              != (1u << 0)  );
BUILD_BUG_ON( XEN_X86_EMU_HPET               != (1u << 1)  );
BUILD_BUG_ON( XEN_X86_EMU_PM                 != (1u << 2)  );
BUILD_BUG_ON( XEN_X86_EMU_RTC                != (1u << 3)  );
BUILD_BUG_ON( XEN_X86_EMU_IOAPIC             != (1u << 4)  );
BUILD_BUG_ON( XEN_X86_EMU_PIC                != (1u << 5)  );
BUILD_BUG_ON( XEN_X86_EMU_VGA                != (1u << 6)  );
BUILD_BUG_ON( XEN_X86_EMU_IOMMU              != (1u << 7)  );
BUILD_BUG_ON( XEN_X86_EMU_PIT                != (1u << 8)  );
BUILD_BUG_ON( XEN_X86_EMU_USE_PIRQ           != (1u << 9)  );
BUILD_BUG_ON( XEN_X86_EMU_VPCI               != (1u << 10) );
BUILD_BUG_ON( XEN_X86_EMU_ALL                != (1u << 11)-1u );
// found ocaml type domain_create_flag at xenctrl.ml:60
BUILD_BUG_ON( XEN_DOMCTL_CDF_hvm             != (1u << 0)  );
BUILD_BUG_ON( XEN_DOMCTL_CDF_hap             != (1u << 1)  );
BUILD_BUG_ON( XEN_DOMCTL_CDF_s3_integrity    != (1u << 2)  );
BUILD_BUG_ON( XEN_DOMCTL_CDF_oos_off         != (1u << 3)  );
BUILD_BUG_ON( XEN_DOMCTL_CDF_xs_domain       != (1u << 4)  );
BUILD_BUG_ON( XEN_DOMCTL_CDF_iommu           != (1u << 5)  );
BUILD_BUG_ON( XEN_DOMCTL_CDF_nested_virt     != (1u << 6)  );
BUILD_BUG_ON( XEN_DOMCTL_CDF_disable_fifo    != (1u << 7)  );
BUILD_BUG_ON( XEN_DOMCTL_CDF_MAX             != (1u << 7)  );
// found ocaml type domain_create_iommu_opts at xenctrl.ml:70
BUILD_BUG_ON( XEN_DOMCTL_IOMMU_no_sharept    != (1u << 0)  );
BUILD_BUG_ON( XEN_DOMCTL_IOMMU_MAX           != (1u << 0)  );
// found ocaml type physinfo_cap_flag at xenctrl.ml:113
BUILD_BUG_ON( XEN_SYSCTL_PHYSCAP_hvm         != (1u << 0)  );
BUILD_BUG_ON( XEN_SYSCTL_PHYSCAP_pv          != (1u << 1)  );
BUILD_BUG_ON( XEN_SYSCTL_PHYSCAP_directio    != (1u << 2)  );
BUILD_BUG_ON( XEN_SYSCTL_PHYSCAP_hap         != (1u << 3)  );
BUILD_BUG_ON( XEN_SYSCTL_PHYSCAP_shadow      != (1u << 4)  );
BUILD_BUG_ON( XEN_SYSCTL_PHYSCAP_iommu_hap_pt_share != (1u << 5)  );
BUILD_BUG_ON( XEN_SYSCTL_PHYSCAP_MAX         != (1u << 5)  );

  Paul
Paul Durrant Nov. 25, 2020, 11:10 a.m. UTC | #4
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 25 November 2020 09:20
> To: Paul Durrant <paul@xen.org>
> Cc: Paul Durrant <pdurrant@amazon.com>; Eslam Elnikety <elnikety@amazon.com>; Christian Lindig
> <christian.lindig@citrix.com>; David Scott <dave@recoil.org>; Ian Jackson <iwj@xenproject.org>; Wei
> Liu <wl@xen.org>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>;
> Julien Grall <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v4 1/3] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_disable_fifo,
> ...
> 
> On 24.11.2020 20:17, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > ...to control the visibility of the FIFO event channel operations
> > (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
> > the guest.
> >
> > These operations were added to the public header in commit d2d50c2f308f
> > ("evtchn: add FIFO-based event channel ABI") and the first implementation
> > appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
> > EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
> > ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
> > that, a guest issuing those operations would receive a return value of
> > -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
> > running on an older (pre-4.4) Xen would fall back to using the 2-level event
> > channel interface upon seeing this return value.
> >
> > Unfortunately the uncontrolable appearance of these new operations in Xen 4.4
> > onwards has implications for hibernation of some Linux guests. During resume
> > from hibernation, there are two kernels involved: the "boot" kernel and the
> > "resume" kernel. The guest boot kernel may default to use FIFO operations and
> > instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
> > other hand, the resume kernel keeps assuming 2-level, because it was hibernated
> > on a version of Xen that did not support the FIFO operations.
> 
> And the alternative of the boot kernel issuing EVTCHNOP_reset has
> other unwanted consequences. Maybe worth mentioning here, as
> otherwise this would look like the obvious way to return to 2-level
> mode?
> 
> Also, why can't the boot kernel be instructed to avoid engaging
> FIFO mode?
> 

Both of those are, of course, viable alternatives if the guest can be modified. The problem we need to work around is guest that are already out there and cannot be updated.

> > To maintain compatibility it is necessary to make Xen behave as it did
> > before the new operations were added and hence the code in this patch ensures
> > that, if XEN_DOMCTL_CDF_disable_fifo is set, the FIFO event channel operations
> > will again result in -ENOSYS being returned to the guest.
> 
> Are there indeed dependencies on the precise return value anywhere?
> If so, the generally inappropriate use (do_event_channel_op()'s
> default case really would also need switching) would want a brief
> comment, so it'll be understood by readers that this isn't code to
> derive other code from. If not, -EPERM or -EACCES perhaps?
> 

The patch, as stated, is reverting behaviour and so the -ENOSYS really needs to stay since it is essentially ABI now. I am not aware of guest code that will, in fact, die if it sees -EPERM or -EACCES instead but there may be such code. The only safe thing to do is to make things look like the used to.

> Also, now that we gain a runtime control, do we perhaps also want a
> build time one?

Yes, a Kconfig flag to compile out the code seems like a worthy addition. I'll spin up a follow-up patch as soon as I get time.

  Paul

> 
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> > Signed-off-by: Eslam Elnikety <elnikety@amazon.com>
> 
> Are this order as well as the From: tag above correct? Or
> alternatively, are there actually any pieces left at all from
> Eslam's earlier patch?
> 
> > v4:
> >  - New in v4
> 
> (Just as an aside: That's quite interesting for a previously
> standalone patch. I guess that patch was really split, considering
> you've retained Eslam's S-o-b? But perhaps there are different ways
> to look at things ...)
> 
> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -70,9 +70,11 @@ struct xen_domctl_createdomain {
> >  #define XEN_DOMCTL_CDF_iommu          (1U<<_XEN_DOMCTL_CDF_iommu)
> >  #define _XEN_DOMCTL_CDF_nested_virt   6
> >  #define XEN_DOMCTL_CDF_nested_virt    (1U << _XEN_DOMCTL_CDF_nested_virt)
> > +#define _XEN_DOMCTL_CDF_disable_fifo  7
> > +#define XEN_DOMCTL_CDF_disable_fifo   (1U << _XEN_DOMCTL_CDF_disable_fifo)
> 
> Despite getting longish, I think this needs "evtchn" somewhere in
> the name. To keep size bounded, maybe XEN_DOMCTL_CDF_no_fifo_evtchn?
> 
> >  /* Max XEN_DOMCTL_CDF_* constant.  Used for ABI checking. */
> > -#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_nested_virt
> > +#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_disable_fifo
> 
> While not directly related to this patch, I'm puzzled by the
> presence of this constant: I've not been able to find any use of
> it. In particular you did have a need to modify
> sanitise_domain_config().
> 
> Jan
Andrew Cooper Nov. 25, 2020, 11:30 a.m. UTC | #5
On 24/11/2020 19:17, Paul Durrant wrote:
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 666aeb71bf1b..70701c59d053 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -70,9 +70,11 @@ struct xen_domctl_createdomain {
>  #define XEN_DOMCTL_CDF_iommu          (1U<<_XEN_DOMCTL_CDF_iommu)
>  #define _XEN_DOMCTL_CDF_nested_virt   6
>  #define XEN_DOMCTL_CDF_nested_virt    (1U << _XEN_DOMCTL_CDF_nested_virt)
> +#define _XEN_DOMCTL_CDF_disable_fifo  7
> +#define XEN_DOMCTL_CDF_disable_fifo   (1U << _XEN_DOMCTL_CDF_disable_fifo)

The sense is backwards.  It should be a "permit the use of FIFO"
control.  If the code had been written this way to begin with, the bug
you found wouldn't have existed.

Given that there is not currently a way to disable FIFO, you can
probably do without an enumeration of whether the hypervisor supports it
or not.

~Andrew
Durrant, Paul Nov. 25, 2020, 11:49 a.m. UTC | #6
> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: 25 November 2020 11:31
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Durrant, Paul <pdurrant@amazon.co.uk>; Elnikety, Eslam <elnikety@amazon.com>; Christian Lindig
> <christian.lindig@citrix.com>; David Scott <dave@recoil.org>; Ian Jackson <iwj@xenproject.org>; Wei
> Liu <wl@xen.org>; George Dunlap <george.dunlap@citrix.com>; Jan Beulich <jbeulich@suse.com>; Julien
> Grall <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>
> Subject: RE: [EXTERNAL] [PATCH v4 1/3] domctl: introduce a new domain create flag,
> XEN_DOMCTL_CDF_disable_fifo, ...
> 
> CAUTION: This email originated from outside of the organization. Do not click links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On 24/11/2020 19:17, Paul Durrant wrote:
> > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> > index 666aeb71bf1b..70701c59d053 100644
> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -70,9 +70,11 @@ struct xen_domctl_createdomain {
> >  #define XEN_DOMCTL_CDF_iommu          (1U<<_XEN_DOMCTL_CDF_iommu)
> >  #define _XEN_DOMCTL_CDF_nested_virt   6
> >  #define XEN_DOMCTL_CDF_nested_virt    (1U << _XEN_DOMCTL_CDF_nested_virt)
> > +#define _XEN_DOMCTL_CDF_disable_fifo  7
> > +#define XEN_DOMCTL_CDF_disable_fifo   (1U << _XEN_DOMCTL_CDF_disable_fifo)
> 
> The sense is backwards.  It should be a "permit the use of FIFO"
> control.  If the code had been written this way to begin with, the bug
> you found wouldn't have existed.
> 
> Given that there is not currently a way to disable FIFO, you can
> probably do without an enumeration of whether the hypervisor supports it
> or not.
> 

Ok, I can reverse the sense.

I found another one that we ought to control in a similar way... the per-cpu evtchn upcalls. AFAIK only the Windows PV drivers make use of it (and I can arrange to squash that with a registry flag) but it really falls into the same category as FIFO... so maybe we need a separate bit-field for these sorts of thing?

  Paul

> ~Andrew
Jan Beulich Nov. 25, 2020, 11:51 a.m. UTC | #7
On 25.11.2020 12:10, Paul Durrant wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 25 November 2020 09:20
>>
>> On 24.11.2020 20:17, Paul Durrant wrote:
>>> From: Paul Durrant <pdurrant@amazon.com>
>>>
>>> ...to control the visibility of the FIFO event channel operations
>>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
>>> the guest.
>>>
>>> These operations were added to the public header in commit d2d50c2f308f
>>> ("evtchn: add FIFO-based event channel ABI") and the first implementation
>>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
>>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
>>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
>>> that, a guest issuing those operations would receive a return value of
>>> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
>>> running on an older (pre-4.4) Xen would fall back to using the 2-level event
>>> channel interface upon seeing this return value.
>>>
>>> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4
>>> onwards has implications for hibernation of some Linux guests. During resume
>>> from hibernation, there are two kernels involved: the "boot" kernel and the
>>> "resume" kernel. The guest boot kernel may default to use FIFO operations and
>>> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
>>> other hand, the resume kernel keeps assuming 2-level, because it was hibernated
>>> on a version of Xen that did not support the FIFO operations.
>>
>> And the alternative of the boot kernel issuing EVTCHNOP_reset has
>> other unwanted consequences. Maybe worth mentioning here, as
>> otherwise this would look like the obvious way to return to 2-level
>> mode?
>>
>> Also, why can't the boot kernel be instructed to avoid engaging
>> FIFO mode?
>>
> 
> Both of those are, of course, viable alternatives if the guest can be
> modified. The problem we need to work around is guest that are already
> out there and cannot be updated.

Making use of EVTCHNOP_reset indeed would require a change to the
kernel. But Linux has a command line option to suppress use of
FIFO event channels, so I can't see why the boot kernel couldn't
be passed this flag without any modification to the binary.

>>> To maintain compatibility it is necessary to make Xen behave as it did
>>> before the new operations were added and hence the code in this patch ensures
>>> that, if XEN_DOMCTL_CDF_disable_fifo is set, the FIFO event channel operations
>>> will again result in -ENOSYS being returned to the guest.
>>
>> Are there indeed dependencies on the precise return value anywhere?
>> If so, the generally inappropriate use (do_event_channel_op()'s
>> default case really would also need switching) would want a brief
>> comment, so it'll be understood by readers that this isn't code to
>> derive other code from. If not, -EPERM or -EACCES perhaps?
>>
> 
> The patch, as stated, is reverting behaviour and so the -ENOSYS really
> needs to stay since it is essentially ABI now. I am not aware of guest
> code that will, in fact, die if it sees -EPERM or -EACCES instead but
> there may be such code. The only safe thing to do is to make things
> look like the used to.

I don't think specific error codes can be considered "ABI". Not
the least because, if there are multiple causes for an error, it
ought to be undefined which error gets returned. A guest not
falling back to 2-level on _any_ error here is basically setting
itself up for eventual failure because of e.g. getting back
-ENOMEM. Or someone deciding to add an XSM check to the function.

As said, I'm of the opinion that the other -ENOSYS ought to be
replaced as well, which of course would be precluded if this was
considered "ABI".

Jan
Durrant, Paul Nov. 25, 2020, 12:10 p.m. UTC | #8
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 25 November 2020 11:51
> To: paul@xen.org
> Cc: Durrant, Paul <pdurrant@amazon.co.uk>; Elnikety, Eslam <elnikety@amazon.com>; 'Christian Lindig'
> <christian.lindig@citrix.com>; 'David Scott' <dave@recoil.org>; 'Ian Jackson' <iwj@xenproject.org>;
> 'Wei Liu' <wl@xen.org>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap'
> <george.dunlap@citrix.com>; 'Julien Grall' <julien@xen.org>; 'Stefano Stabellini'
> <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> Subject: RE: [EXTERNAL] [PATCH v4 1/3] domctl: introduce a new domain create flag,
> XEN_DOMCTL_CDF_disable_fifo, ...
> 
> CAUTION: This email originated from outside of the organization. Do not click links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On 25.11.2020 12:10, Paul Durrant wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 25 November 2020 09:20
> >>
> >> On 24.11.2020 20:17, Paul Durrant wrote:
> >>> From: Paul Durrant <pdurrant@amazon.com>
> >>>
> >>> ...to control the visibility of the FIFO event channel operations
> >>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
> >>> the guest.
> >>>
> >>> These operations were added to the public header in commit d2d50c2f308f
> >>> ("evtchn: add FIFO-based event channel ABI") and the first implementation
> >>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
> >>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
> >>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
> >>> that, a guest issuing those operations would receive a return value of
> >>> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
> >>> running on an older (pre-4.4) Xen would fall back to using the 2-level event
> >>> channel interface upon seeing this return value.
> >>>
> >>> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4
> >>> onwards has implications for hibernation of some Linux guests. During resume
> >>> from hibernation, there are two kernels involved: the "boot" kernel and the
> >>> "resume" kernel. The guest boot kernel may default to use FIFO operations and
> >>> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
> >>> other hand, the resume kernel keeps assuming 2-level, because it was hibernated
> >>> on a version of Xen that did not support the FIFO operations.
> >>
> >> And the alternative of the boot kernel issuing EVTCHNOP_reset has
> >> other unwanted consequences. Maybe worth mentioning here, as
> >> otherwise this would look like the obvious way to return to 2-level
> >> mode?
> >>
> >> Also, why can't the boot kernel be instructed to avoid engaging
> >> FIFO mode?
> >>
> >
> > Both of those are, of course, viable alternatives if the guest can be
> > modified. The problem we need to work around is guest that are already
> > out there and cannot be updated.
> 
> Making use of EVTCHNOP_reset indeed would require a change to the
> kernel. But Linux has a command line option to suppress use of
> FIFO event channels, so I can't see why the boot kernel couldn't
> be passed this flag without any modification to the binary.
> 

I'm sure that was considered and found not to be feasible in our use-case. (Likely the command line is as much baked into the guest image as the kernel itself).

> >>> To maintain compatibility it is necessary to make Xen behave as it did
> >>> before the new operations were added and hence the code in this patch ensures
> >>> that, if XEN_DOMCTL_CDF_disable_fifo is set, the FIFO event channel operations
> >>> will again result in -ENOSYS being returned to the guest.
> >>
> >> Are there indeed dependencies on the precise return value anywhere?
> >> If so, the generally inappropriate use (do_event_channel_op()'s
> >> default case really would also need switching) would want a brief
> >> comment, so it'll be understood by readers that this isn't code to
> >> derive other code from. If not, -EPERM or -EACCES perhaps?
> >>
> >
> > The patch, as stated, is reverting behaviour and so the -ENOSYS really
> > needs to stay since it is essentially ABI now. I am not aware of guest
> > code that will, in fact, die if it sees -EPERM or -EACCES instead but
> > there may be such code. The only safe thing to do is to make things
> > look like the used to.
> 
> I don't think specific error codes can be considered "ABI". Not
> the least because, if there are multiple causes for an error, it
> ought to be undefined which error gets returned. A guest not
> falling back to 2-level on _any_ error here is basically setting
> itself up for eventual failure because of e.g. getting back
> -ENOMEM. Or someone deciding to add an XSM check to the function.
> 

I'm not disagreeing that depending on -ENOSYS is a bad idea but, before FIFO came along, that's what the guest would see and that is what it ought to see again if we want to truly obscure the interface (which is the stated aim here).

> As said, I'm of the opinion that the other -ENOSYS ought to be
> replaced as well, which of course would be precluded if this was
> considered "ABI".
> 

Indeed.

  Paul

> Jan
diff mbox series

Patch

diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index e878699b0a1a..9ccad9aece8c 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -65,6 +65,7 @@  type domain_create_flag =
 	| CDF_XS_DOMAIN
 	| CDF_IOMMU
 	| CDF_NESTED_VIRT
+	| CDF_DISABLE_FIFO
 
 type domain_create_iommu_opts =
 	| IOMMU_NO_SHAREPT
diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli
index e64907df8e7e..8bb0f9e14b8e 100644
--- a/tools/ocaml/libs/xc/xenctrl.mli
+++ b/tools/ocaml/libs/xc/xenctrl.mli
@@ -58,6 +58,7 @@  type domain_create_flag =
   | CDF_XS_DOMAIN
   | CDF_IOMMU
   | CDF_NESTED_VIRT
+  | CDF_DISABLE_FIFO
 
 type domain_create_iommu_opts =
   | IOMMU_NO_SHAREPT
diff --git a/xen/common/domain.c b/xen/common/domain.c
index f748806a450b..75aed7fd5b01 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -307,7 +307,7 @@  static int sanitise_domain_config(struct xen_domctl_createdomain *config)
          ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
            XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
            XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
-           XEN_DOMCTL_CDF_nested_virt) )
+           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_disable_fifo) )
     {
         dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
         return -EINVAL;
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 54b2e2550e93..6a96ccf56c3a 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1193,10 +1193,27 @@  static long evtchn_set_priority(const struct evtchn_set_priority *set_priority)
     return ret;
 }
 
+static bool is_fifo_op(int cmd)
+{
+    switch ( cmd )
+    {
+    case EVTCHNOP_init_control:
+    case EVTCHNOP_expand_array:
+    case EVTCHNOP_set_priority:
+        return true;
+    default:
+        return false;
+    }
+}
+
 long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     long rc;
 
+    if ( (current->domain->options & XEN_DOMCTL_CDF_disable_fifo) &&
+         is_fifo_op(cmd) )
+        return -ENOSYS;
+
     switch ( cmd )
     {
     case EVTCHNOP_alloc_unbound: {
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 666aeb71bf1b..70701c59d053 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -70,9 +70,11 @@  struct xen_domctl_createdomain {
 #define XEN_DOMCTL_CDF_iommu          (1U<<_XEN_DOMCTL_CDF_iommu)
 #define _XEN_DOMCTL_CDF_nested_virt   6
 #define XEN_DOMCTL_CDF_nested_virt    (1U << _XEN_DOMCTL_CDF_nested_virt)
+#define _XEN_DOMCTL_CDF_disable_fifo  7
+#define XEN_DOMCTL_CDF_disable_fifo   (1U << _XEN_DOMCTL_CDF_disable_fifo)
 
 /* Max XEN_DOMCTL_CDF_* constant.  Used for ABI checking. */
-#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_nested_virt
+#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_disable_fifo
 
     uint32_t flags;