[RFC,1/1] xsm: allows system domains to allocate evtchn

Message ID	20220328203622.30961-2-dpsmith@apertussolutions.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <xen-devel-bounces@lists.xenproject.org> Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org> From: "Daniel P. Smith" <dpsmith@apertussolutions.com> To: "Daniel P. Smith" <dpsmith@apertussolutions.com>, xen-devel@lists.xenproject.org Cc: scott.davis@starlab.io, Andrew Cooper <andrew.cooper3@citrix.com>, George Dunlap <george.dunlap@citrix.com>, Jan Beulich <jbeulich@suse.com>, Julien Grall <julien@xen.org>, Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>, Daniel De Graaf <dgdegra@tycho.nsa.gov> Subject: [RFC PATCH 1/1] xsm: allows system domains to allocate evtchn Date: Mon, 28 Mar 2022 16:36:22 -0400 Message-Id: <20220328203622.30961-2-dpsmith@apertussolutions.com> In-Reply-To: <20220328203622.30961-1-dpsmith@apertussolutions.com> References: <20220328203622.30961-1-dpsmith@apertussolutions.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	allow system domains to allocate event channels \| expand [RFC,0/1] allow system domains to allocate event channels [RFC,1/1] xsm: allows system domains to allocate evtchn

Daniel P. Smith March 28, 2022, 8:36 p.m. UTC

During domain construction under dom0less and hyperlaunch it is necessary to
allocate at least the event channel for xenstore and potentially the event
channel for the core console. When dom0less and hyperlaunch are doing their
construction logic they are executing under the idle domain context. The idle
domain is not a privileged domain, it is not the target domain, and as a result
under the current default XSM policy is not allowed to allocate the event
channel.

This patch only addresses the event channel situation by adjust the default XSM
policy for xsm_evtchn_unbound to explicitly allow system domains to be able to
make the allocation call.

Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
 xen/common/event_channel.c | 4 ++--
 xen/include/xsm/dummy.h    | 8 ++++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

Stefano Stabellini March 28, 2022, 11:21 p.m. UTC | #1

On Mon, 28 Mar 2022, Daniel P. Smith wrote:
> During domain construction under dom0less and hyperlaunch it is necessary to
> allocate at least the event channel for xenstore and potentially the event
> channel for the core console. When dom0less and hyperlaunch are doing their
> construction logic they are executing under the idle domain context. The idle
> domain is not a privileged domain, it is not the target domain, and as a result
> under the current default XSM policy is not allowed to allocate the event
> channel.
> 
> This patch only addresses the event channel situation by adjust the default XSM
> policy for xsm_evtchn_unbound to explicitly allow system domains to be able to
> make the allocation call.
> 
> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
>  xen/common/event_channel.c | 4 ++--
>  xen/include/xsm/dummy.h    | 8 ++++++--
>  2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
> index ffb042a241..c9c3876ee9 100644
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -306,7 +306,7 @@ static int evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
>          ERROR_EXIT_DOM(port, d);
>      chn = evtchn_from_port(d, port);
>  
> -    rc = xsm_evtchn_unbound(XSM_TARGET, d, chn, alloc->remote_dom);
> +    rc = xsm_evtchn_unbound(XSM_OTHER, d, chn, alloc->remote_dom);
>      if ( rc )
>          goto out;
>  
> @@ -1366,7 +1366,7 @@ int alloc_unbound_xen_event_channel(
>          goto out;
>      chn = evtchn_from_port(ld, port);
>  
> -    rc = xsm_evtchn_unbound(XSM_TARGET, ld, chn, remote_domid);
> +    rc = xsm_evtchn_unbound(XSM_OTHER, ld, chn, remote_domid);
>      if ( rc )
>          goto out;
>  
> diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
> index 58afc1d589..bd31ce43f9 100644
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -294,8 +294,12 @@ static XSM_INLINE int cf_check xsm_claim_pages(XSM_DEFAULT_ARG struct domain *d)
>  static XSM_INLINE int cf_check xsm_evtchn_unbound(
>      XSM_DEFAULT_ARG struct domain *d, struct evtchn *chn, domid_t id2)
>  {
> -    XSM_ASSERT_ACTION(XSM_TARGET);
> -    return xsm_default_action(action, current->domain, d);
> +    XSM_ASSERT_ACTION(XSM_OTHER);
> +
> +    if ( is_system_domain(current->domain) )
> +        return xsm_default_action(XSM_HOOK, current->domain, d);
> +    else
> +        return xsm_default_action(XSM_TARGET, current->domain, d);
>  }
>  
>  static XSM_INLINE int cf_check xsm_evtchn_interdomain(
> -- 
> 2.20.1
>

Jan Beulich March 29, 2022, 6:43 a.m. UTC | #2

On 28.03.2022 22:36, Daniel P. Smith wrote:
> During domain construction under dom0less and hyperlaunch it is necessary to
> allocate at least the event channel for xenstore and potentially the event
> channel for the core console. When dom0less and hyperlaunch are doing their
> construction logic they are executing under the idle domain context. The idle
> domain is not a privileged domain, it is not the target domain, and as a result
> under the current default XSM policy is not allowed to allocate the event
> channel.

I appreciate the change is only needed there right now, but it feels
inconsistent. _If_ it is to remain that way, at least a comment needs
to be put in xsm_evtchn_unbound() making clear why this is a special
case, and hence clarifying to people what the approximate conditions
are to have such also put elsewhere. But imo it would be better to
make the adjustment right in xsm_default_action(), without touching
event_channel.c at all. Iirc altering xsm_default_action() was
discussed before, but I don't recall particular reasons speaking
against that approach.

> This patch only addresses the event channel situation by adjust the default XSM
> policy for xsm_evtchn_unbound to explicitly allow system domains to be able to
> make the allocation call.

Indeed I'm having trouble seeing how your change would work for SILO
mode, albeit Stefano having tested this would make me assume he did
so in SILO mode, as that's the default on Arm iirc. Afaict
silo_mode_dom_check() should return false in the described situation.

Similarly I don't see how things would work transparently with a
Flask policy in place. Regardless of you mentioning the restriction,
I think this wants resolving before the patch can go in.

Jan

Roger Pau Monné March 29, 2022, 7:29 a.m. UTC | #3

On Mon, Mar 28, 2022 at 04:36:22PM -0400, Daniel P. Smith wrote:
> During domain construction under dom0less and hyperlaunch it is necessary to
> allocate at least the event channel for xenstore and potentially the event
> channel for the core console. When dom0less and hyperlaunch are doing their
> construction logic they are executing under the idle domain context. The idle
> domain is not a privileged domain, it is not the target domain, and as a result
> under the current default XSM policy is not allowed to allocate the event
> channel.

I've not been following the discussion around this patch, but I would
assume such privileges are only required for init code when no other
domains are running?

Since it's only at that point where the idle domain context needs to
allocate event channels would it make sense to temporary elevate it's
privileges by setting d->is_privileged while doing the domain creation?

That way we wouldn't need to grant those permissions for the lifetime
of the host when they are only needed for initialization code.

Another option would be switching to the initial vCPU of the domain
being created, but that's likely to be more complex, or even create a
short lived system domain with is_privileged set just for the purpose
of building other domains.

Overall I'm not sure it's worth giving those extra privileges to the
idle domain when they are just need for a known and bounded period of
time.

Thanks, Roger.

Daniel P. Smith March 29, 2022, 6:57 p.m. UTC | #4

On 3/29/22 02:43, Jan Beulich wrote:
> On 28.03.2022 22:36, Daniel P. Smith wrote:
>> During domain construction under dom0less and hyperlaunch it is necessary to
>> allocate at least the event channel for xenstore and potentially the event
>> channel for the core console. When dom0less and hyperlaunch are doing their
>> construction logic they are executing under the idle domain context. The idle
>> domain is not a privileged domain, it is not the target domain, and as a result
>> under the current default XSM policy is not allowed to allocate the event
>> channel.
> 
> I appreciate the change is only needed there right now, but it feels
> inconsistent. _If_ it is to remain that way, at least a comment needs
> to be put in xsm_evtchn_unbound() making clear why this is a special
> case, and hence clarifying to people what the approximate conditions
> are to have such also put elsewhere. But imo it would be better to
> make the adjustment right in xsm_default_action(), without touching
> event_channel.c at all. Iirc altering xsm_default_action() was
> discussed before, but I don't recall particular reasons speaking
> against that approach.

By inconsistent, I take it you mean this is first place within an XSM
hook where an access decision is based on the current domain being a
system domain? I do agree and would add a comment to the change in the
XSM hook in a non-RFC version of the patch.

As to moving the check down into xsm_default_action(), the concern I
have with doing so is that this would then make every XSM check succeed
if `current->domain` is a system domain. Doing so would require a review
of every function which has an XSM hoook to evaluate every invocation of
those functions that,
  1. is there ever a time when current->domain may be a system domain
  2. if so,
    a. is the invocation on behalf of the system domain
    b. or is the invocation on behalf of a non-system domain

If there is any instance of 2b, then an inadvertent privilege escalation
can occur on that path. For evtchn_alloc_unbound() I verified the only
place, besides the new hyperlaunch calls, it is invoked is in the evtchn
hypercall handler, where current should be pointing at the domain that
made the hypercall.

>> This patch only addresses the event channel situation by adjust the default XSM
>> policy for xsm_evtchn_unbound to explicitly allow system domains to be able to
>> make the allocation call.
> 
> Indeed I'm having trouble seeing how your change would work for SILO
> mode, albeit Stefano having tested this would make me assume he did
> so in SILO mode, as that's the default on Arm iirc. Afaict
> silo_mode_dom_check() should return false in the described situation.

Correct, this patch only addressed the default policy. If an equivalent
change for SILO is desired, then it would be placed in
silo_evtchn_unbound() and not in silo_mode_dom_check() for the same
reasons I would be hesitant to place it in xsm_default_action().

> Similarly I don't see how things would work transparently with a
> Flask policy in place. Regardless of you mentioning the restriction,
> I think this wants resolving before the patch can go in.

To enable the equivalent in flask would require no hypervisor code
changes. Instead that would be handled by adding the necessary rules to
the appropriate flask policy file(s).


v/r,
dps

Julien Grall March 29, 2022, 9:57 p.m. UTC | #5

Hi Daniel,

On 29/03/2022 19:57, Daniel P. Smith wrote:
> On 3/29/22 02:43, Jan Beulich wrote:
>> On 28.03.2022 22:36, Daniel P. Smith wrote:
>>> During domain construction under dom0less and hyperlaunch it is necessary to
>>> allocate at least the event channel for xenstore and potentially the event
>>> channel for the core console. When dom0less and hyperlaunch are doing their
>>> construction logic they are executing under the idle domain context. The idle
>>> domain is not a privileged domain, it is not the target domain, and as a result
>>> under the current default XSM policy is not allowed to allocate the event
>>> channel.
>>
>> I appreciate the change is only needed there right now, but it feels
>> inconsistent. _If_ it is to remain that way, at least a comment needs
>> to be put in xsm_evtchn_unbound() making clear why this is a special
>> case, and hence clarifying to people what the approximate conditions
>> are to have such also put elsewhere. But imo it would be better to
>> make the adjustment right in xsm_default_action(), without touching
>> event_channel.c at all. Iirc altering xsm_default_action() was
>> discussed before, but I don't recall particular reasons speaking
>> against that approach.
> 
> By inconsistent, I take it you mean this is first place within an XSM
> hook where an access decision is based on the current domain being a
> system domain? I do agree and would add a comment to the change in the
> XSM hook in a non-RFC version of the patch.
> 
> As to moving the check down into xsm_default_action(), the concern I
> have with doing so is that this would then make every XSM check succeed
> if `current->domain` is a system domain. Doing so would require a review
> of every function which has an XSM hoook to evaluate every invocation of
> those functions that,
>    1. is there ever a time when current->domain may be a system domain
>    2. if so,
>      a. is the invocation on behalf of the system domain
>      b. or is the invocation on behalf of a non-system domain
> 
> If there is any instance of 2b, then an inadvertent privilege escalation
> can occur on that path. For evtchn_alloc_unbound() I verified the only
> place, besides the new hyperlaunch calls, it is invoked is in the evtchn
> hypercall handler, where current should be pointing at the domain that
> made the hypercall.
Auditing existing calls is somewhat easy. The trouble are for new calls. 
I would say they are unlikely, but we would need to rely on the 
reviewers to spot any misuse. So this is a bit risky.

I am also a bit worry that we would end up to convert a lot of 
XSM_TARGET to XSM_HOOK (Note I have Live-Update in mind). This would 
make more difficult to figure what would the XSM calls allows without 
looking at the helper.

I quite like the proposal from Roger. If we define two helpers (e.g. 
xsm_{enable, disable}_build_domain()), we could elevate the privilege 
for the idle domain for a short period of time (this could be restricted 
to when the dummy policy is used).

Cheers,

Daniel P. Smith March 29, 2022, 11:12 p.m. UTC | #6

On 3/29/22 03:29, Roger Pau Monné wrote:
> On Mon, Mar 28, 2022 at 04:36:22PM -0400, Daniel P. Smith wrote:
>> During domain construction under dom0less and hyperlaunch it is necessary to
>> allocate at least the event channel for xenstore and potentially the event
>> channel for the core console. When dom0less and hyperlaunch are doing their
>> construction logic they are executing under the idle domain context. The idle
>> domain is not a privileged domain, it is not the target domain, and as a result
>> under the current default XSM policy is not allowed to allocate the event
>> channel.
> 
> I've not been following the discussion around this patch, but I would
> assume such privileges are only required for init code when no other
> domains are running?

At this time, correct.

> Since it's only at that point where the idle domain context needs to
> allocate event channels would it make sense to temporary elevate it's
> privileges by setting d->is_privileged while doing the domain creation?

This is initially what I did but it seemed like there was some
reluctance. As I was looking to formalize/abstract this in XSM instead
of doing direct manipulations, I realized I could achieve it in the hook
which would allow the hyperlaunch and dom0less code work without having
to ensure priv escalation is properly handled.

> That way we wouldn't need to grant those permissions for the lifetime
> of the host when they are only needed for initialization code.

Correct, which is why I adjusted the effective default policy only on
the check instead of in xsm_default_action() as Jan has suggested.
Outside of a code fault, all other times that evtchn_alloc_unbound() is
called `current->domain` should be pointing at the caller of the hypercall.

This works as an interim solution with minimal impact as it is all
internal to XSM and can easily be evolved. My concern is that exposing a
function call to provide priv escalation for the idle domain as an
interim solution for dom0less and hyperlaunch will have more impactful
code churn in both of these when a longer term approach is adopted.

> Another option would be switching to the initial vCPU of the domain
> being created, but that's likely to be more complex, or even create a
> short lived system domain with is_privileged set just for the purpose
> of building other domains.

Longer term I would like to explore doing this in general. Some initial
thinking is the fact that hypervisor has a few contexts, relative to
external entities, under which it is executing. When it is handling
internal house keeping (e.g. scheduler and security server), when it is
interacting with guest domains, when it is interacting with hardware
(e.g. vpci), and now when it is processing boot material to construct
domains. It  has been mentioned that today in Xen if one of these
contexts acting with external entities is corrupted, it can interfere
with operations occurring in the other contexts. In the past the have
advocated and been working to split these contexts using hard L0/L1
separation. As noted in other discussions, some architectures are
gaining hardware features that can be used in hard L0/L1 partitioning
but also could be used in a more "soft" partitioning more a kin to
Nested Kernel[1] and Dune[2]. Again just some initial thoughts.

> Overall I'm not sure it's worth giving those extra privileges to the
> idle domain when they are just need for a known and bounded period of
> time.

IMHO that is a slight over simplification. Setting is_privileged to the
idle domain while it is processing domain construction data from outside
the hypervisor means that during that bounded period the idle domain is
complete unrestricted and may invoke any XSM protected call. Contrast
this with only granting the idle domain the ability to allocate event
channels between domains at any time with the only codified usage is
during init/setup. While I am unsure how, theoretically malformed
construction data could expose a logic flaw to do some very unsavory
allocations without any guards. Whereas during runtime if the idle
domain was tricked into establishing an event channel between two
domains, it would only serve to provide a covert channel between the two
domains. Neither is desirable but IMHO I find the former a little more
concerning than the latter.

With that said, I am not completely against doing the priv escalation if
overall this is the direction that is preferred. If so, I would prefer
to provide a pair of static inlines under XSM name space to provide a
consistent implementation and be able to easily locate the places where
it is applied if/when a longer term approach is implemented.

v/r,
dps

[1]
https://nathandautenhahn.com/downloads/publications/asplos200-dautenhahn.pdf
[2]
https://web.stanford.edu/group/mast/cgi-bin/drupal/system/files/2012.dune_.osdi_.pdf

Jan Beulich March 30, 2022, 6:30 a.m. UTC | #7

On 29.03.2022 20:57, Daniel P. Smith wrote:
> On 3/29/22 02:43, Jan Beulich wrote:
>> On 28.03.2022 22:36, Daniel P. Smith wrote:
>>> During domain construction under dom0less and hyperlaunch it is necessary to
>>> allocate at least the event channel for xenstore and potentially the event
>>> channel for the core console. When dom0less and hyperlaunch are doing their
>>> construction logic they are executing under the idle domain context. The idle
>>> domain is not a privileged domain, it is not the target domain, and as a result
>>> under the current default XSM policy is not allowed to allocate the event
>>> channel.
>>
>> I appreciate the change is only needed there right now, but it feels
>> inconsistent. _If_ it is to remain that way, at least a comment needs
>> to be put in xsm_evtchn_unbound() making clear why this is a special
>> case, and hence clarifying to people what the approximate conditions
>> are to have such also put elsewhere. But imo it would be better to
>> make the adjustment right in xsm_default_action(), without touching
>> event_channel.c at all. Iirc altering xsm_default_action() was
>> discussed before, but I don't recall particular reasons speaking
>> against that approach.
> 
> By inconsistent, I take it you mean this is first place within an XSM
> hook where an access decision is based on the current domain being a
> system domain?

Well - yes and no. Even if further instances appeared, overall state
would still end up inconsistent.

> I do agree and would add a comment to the change in the
> XSM hook in a non-RFC version of the patch.
> 
> As to moving the check down into xsm_default_action(), the concern I
> have with doing so is that this would then make every XSM check succeed
> if `current->domain` is a system domain. Doing so would require a review
> of every function which has an XSM hoook to evaluate every invocation of
> those functions that,
>   1. is there ever a time when current->domain may be a system domain
>   2. if so,
>     a. is the invocation on behalf of the system domain
>     b. or is the invocation on behalf of a non-system domain
> 
> If there is any instance of 2b, then an inadvertent privilege escalation
> can occur on that path. For evtchn_alloc_unbound() I verified the only
> place, besides the new hyperlaunch calls, it is invoked is in the evtchn
> hypercall handler, where current should be pointing at the domain that
> made the hypercall.

Such an audit shouldn't be overly difficult, as the majority of XSM hook
invocations sit clearly visible on hypercall paths, where it is clear
that current->domain is not a system one.

>>> This patch only addresses the event channel situation by adjust the default XSM
>>> policy for xsm_evtchn_unbound to explicitly allow system domains to be able to
>>> make the allocation call.
>>
>> Indeed I'm having trouble seeing how your change would work for SILO
>> mode, albeit Stefano having tested this would make me assume he did
>> so in SILO mode, as that's the default on Arm iirc. Afaict
>> silo_mode_dom_check() should return false in the described situation.
> 
> Correct, this patch only addressed the default policy. If an equivalent
> change for SILO is desired, then it would be placed in
> silo_evtchn_unbound() and not in silo_mode_dom_check() for the same
> reasons I would be hesitant to place it in xsm_default_action().
> 
>> Similarly I don't see how things would work transparently with a
>> Flask policy in place. Regardless of you mentioning the restriction,
>> I think this wants resolving before the patch can go in.
> 
> To enable the equivalent in flask would require no hypervisor code
> changes. Instead that would be handled by adding the necessary rules to
> the appropriate flask policy file(s).

Of course this can be dealt with by adjusting policy file(s), but until
people do so they'd end up with a perceived regression and/or unexplained
difference in behavior from running in dummy (or SILO, once also taken
care of) mode.

Jan

Roger Pau Monné March 30, 2022, 9:40 a.m. UTC | #8

On Tue, Mar 29, 2022 at 07:12:56PM -0400, Daniel P. Smith wrote:
> On 3/29/22 03:29, Roger Pau Monné wrote:
> > On Mon, Mar 28, 2022 at 04:36:22PM -0400, Daniel P. Smith wrote:
> >> During domain construction under dom0less and hyperlaunch it is necessary to
> >> allocate at least the event channel for xenstore and potentially the event
> >> channel for the core console. When dom0less and hyperlaunch are doing their
> >> construction logic they are executing under the idle domain context. The idle
> >> domain is not a privileged domain, it is not the target domain, and as a result
> >> under the current default XSM policy is not allowed to allocate the event
> >> channel.
> > 
> > I've not been following the discussion around this patch, but I would
> > assume such privileges are only required for init code when no other
> > domains are running?
> 
> At this time, correct.
> 
> > Since it's only at that point where the idle domain context needs to
> > allocate event channels would it make sense to temporary elevate it's
> > privileges by setting d->is_privileged while doing the domain creation?
> 
> This is initially what I did but it seemed like there was some
> reluctance. As I was looking to formalize/abstract this in XSM instead
> of doing direct manipulations, I realized I could achieve it in the hook
> which would allow the hyperlaunch and dom0less code work without having
> to ensure priv escalation is properly handled.
> 
> > That way we wouldn't need to grant those permissions for the lifetime
> > of the host when they are only needed for initialization code.
> 
> Correct, which is why I adjusted the effective default policy only on
> the check instead of in xsm_default_action() as Jan has suggested.
> Outside of a code fault, all other times that evtchn_alloc_unbound() is
> called `current->domain` should be pointing at the caller of the hypercall.
> 
> This works as an interim solution with minimal impact as it is all
> internal to XSM and can easily be evolved. My concern is that exposing a
> function call to provide priv escalation for the idle domain as an
> interim solution for dom0less and hyperlaunch will have more impactful
> code churn in both of these when a longer term approach is adopted.
> 
> > Another option would be switching to the initial vCPU of the domain
> > being created, but that's likely to be more complex, or even create a
> > short lived system domain with is_privileged set just for the purpose
> > of building other domains.
> 
> Longer term I would like to explore doing this in general. Some initial
> thinking is the fact that hypervisor has a few contexts, relative to
> external entities, under which it is executing. When it is handling
> internal house keeping (e.g. scheduler and security server), when it is
> interacting with guest domains, when it is interacting with hardware
> (e.g. vpci), and now when it is processing boot material to construct
> domains. It  has been mentioned that today in Xen if one of these
> contexts acting with external entities is corrupted, it can interfere
> with operations occurring in the other contexts. In the past the have
> advocated and been working to split these contexts using hard L0/L1
> separation. As noted in other discussions, some architectures are
> gaining hardware features that can be used in hard L0/L1 partitioning
> but also could be used in a more "soft" partitioning more a kin to
> Nested Kernel[1] and Dune[2]. Again just some initial thoughts.
> 
> > Overall I'm not sure it's worth giving those extra privileges to the
> > idle domain when they are just need for a known and bounded period of
> > time.
> 
> IMHO that is a slight over simplification. Setting is_privileged to the
> idle domain while it is processing domain construction data from outside
> the hypervisor means that during that bounded period the idle domain is
> complete unrestricted and may invoke any XSM protected call.

The domain builder code executed in the idle domain context can make
direct calls to any functions that are otherwise protected behind an
XSM check on the hypercall paths, so I don't see much difference.  The
code executed by the idle domain in order to do domain creation is
already part of the trusted code base (ie: it's hypervisor code)
likewise with the data used as input.

> Contrast
> this with only granting the idle domain the ability to allocate event
> channels between domains at any time with the only codified usage is
> during init/setup. While I am unsure how, theoretically malformed
> construction data could expose a logic flaw to do some very unsavory
> allocations without any guards.

It's kind of like that already, it's just that in other instances the
calls done by the domain builder in idle domain context bypass the
hypercall XSM checks.

This might be giving you a false sense of security, but what's done in
the idle domain context in order to do domain creation would strictly
speaking require the idle domain to be a fully privileged entity, it's
just that we mostly bypass the XSM checks by calling functions
directly instead of using the hypercall entry paths.

> Whereas during runtime if the idle
> domain was tricked into establishing an event channel between two
> domains, it would only serve to provide a covert channel between the two
> domains. Neither is desirable but IMHO I find the former a little more
> concerning than the latter.
> 
> With that said, I am not completely against doing the priv escalation if
> overall this is the direction that is preferred. If so, I would prefer
> to provide a pair of static inlines under XSM name space to provide a
> consistent implementation and be able to easily locate the places where
> it is applied if/when a longer term approach is implemented.

I think those helpers must be __init, and we need to assert that by
the time domains are started the idle domain is no longer
privileged.

From my PoV increasing the privileges of the idle domain just for the
time it acts as a domain builder is merely formalizing what is already
a fact: if the domain building was executed outside of Xen it would
require the context domain to be privileged.

Thanks, Roger.

Jason Andryuk March 30, 2022, 12:30 p.m. UTC | #9

On Wed, Mar 30, 2022 at 2:30 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 29.03.2022 20:57, Daniel P. Smith wrote:
> > On 3/29/22 02:43, Jan Beulich wrote:
> >> Similarly I don't see how things would work transparently with a
> >> Flask policy in place. Regardless of you mentioning the restriction,
> >> I think this wants resolving before the patch can go in.
> >
> > To enable the equivalent in flask would require no hypervisor code
> > changes. Instead that would be handled by adding the necessary rules to
> > the appropriate flask policy file(s).
>
> Of course this can be dealt with by adjusting policy file(s), but until
> people do so they'd end up with a perceived regression and/or unexplained
> difference in behavior from running in dummy (or SILO, once also taken
> care of) mode.

This need to change the Flask policy is the crux of my dislike for
making Xen-internal operations go through XSM checks.  It is wrong,
IMO, to require the separate policy to grant xen_t permissions for
proper operation.  I prefer restructuring the code so Xen itself
doesn't have to go through XSM checks since that way Xen itself always
runs properly regardless of the policy.

This is more based on unmap_domain_pirq having an XSM check for an
internal operation.  dom0less, hyperlaunch, & Live Update may all be
niche use cases where requiring a policy change is reasonable.

Regards,
Jason

Daniel P. Smith March 30, 2022, 1:05 p.m. UTC | #10

Hey Julien,

On 3/29/22 17:57, Julien Grall wrote:
> Hi Daniel,
> 
> On 29/03/2022 19:57, Daniel P. Smith wrote:
>> On 3/29/22 02:43, Jan Beulich wrote:
>>> On 28.03.2022 22:36, Daniel P. Smith wrote:
>>>> During domain construction under dom0less and hyperlaunch it is
>>>> necessary to
>>>> allocate at least the event channel for xenstore and potentially the
>>>> event
>>>> channel for the core console. When dom0less and hyperlaunch are
>>>> doing their
>>>> construction logic they are executing under the idle domain context.
>>>> The idle
>>>> domain is not a privileged domain, it is not the target domain, and
>>>> as a result
>>>> under the current default XSM policy is not allowed to allocate the
>>>> event
>>>> channel.
>>>
>>> I appreciate the change is only needed there right now, but it feels
>>> inconsistent. _If_ it is to remain that way, at least a comment needs
>>> to be put in xsm_evtchn_unbound() making clear why this is a special
>>> case, and hence clarifying to people what the approximate conditions
>>> are to have such also put elsewhere. But imo it would be better to
>>> make the adjustment right in xsm_default_action(), without touching
>>> event_channel.c at all. Iirc altering xsm_default_action() was
>>> discussed before, but I don't recall particular reasons speaking
>>> against that approach.
>>
>> By inconsistent, I take it you mean this is first place within an XSM
>> hook where an access decision is based on the current domain being a
>> system domain? I do agree and would add a comment to the change in the
>> XSM hook in a non-RFC version of the patch.
>>
>> As to moving the check down into xsm_default_action(), the concern I
>> have with doing so is that this would then make every XSM check succeed
>> if `current->domain` is a system domain. Doing so would require a review
>> of every function which has an XSM hoook to evaluate every invocation of
>> those functions that,
>>    1. is there ever a time when current->domain may be a system domain
>>    2. if so,
>>      a. is the invocation on behalf of the system domain
>>      b. or is the invocation on behalf of a non-system domain
>>
>> If there is any instance of 2b, then an inadvertent privilege escalation
>> can occur on that path. For evtchn_alloc_unbound() I verified the only
>> place, besides the new hyperlaunch calls, it is invoked is in the evtchn
>> hypercall handler, where current should be pointing at the domain that
>> made the hypercall.
> Auditing existing calls is somewhat easy. The trouble are for new calls.
> I would say they are unlikely, but we would need to rely on the
> reviewers to spot any misuse. So this is a bit risky.
> 
> I am also a bit worry that we would end up to convert a lot of
> XSM_TARGET to XSM_HOOK (Note I have Live-Update in mind). This would
> make more difficult to figure what would the XSM calls allows without
> looking at the helper.

This approach was not mean to be the long term solution but to deal with
the immediate need as I agree doing this long term would make the
default policy very nuanced.

> I quite like the proposal from Roger. If we define two helpers (e.g.
> xsm_{enable, disable}_build_domain()), we could elevate the privilege
> for the idle domain for a short period of time (this could be restricted
> to when the dummy policy is used).

This is where I was initially going but I am hesitant to change the XSM
API in what might be temporary API calls which have a good chance will
be displaced. That being said, and it does seem like there is more in
favor of it, if the priv escalation is the overall preferred approach I
would still agree and prefer it be done in the XSM API so any usage is
more easily tracked.

v/r,
dps

Daniel P. Smith March 30, 2022, 1:42 p.m. UTC | #11

On 3/30/22 05:40, Roger Pau Monné wrote:
> On Tue, Mar 29, 2022 at 07:12:56PM -0400, Daniel P. Smith wrote:
>> On 3/29/22 03:29, Roger Pau Monné wrote:
>>> On Mon, Mar 28, 2022 at 04:36:22PM -0400, Daniel P. Smith wrote:
>>>> During domain construction under dom0less and hyperlaunch it is necessary to
>>>> allocate at least the event channel for xenstore and potentially the event
>>>> channel for the core console. When dom0less and hyperlaunch are doing their
>>>> construction logic they are executing under the idle domain context. The idle
>>>> domain is not a privileged domain, it is not the target domain, and as a result
>>>> under the current default XSM policy is not allowed to allocate the event
>>>> channel.
>>>
>>> I've not been following the discussion around this patch, but I would
>>> assume such privileges are only required for init code when no other
>>> domains are running?
>>
>> At this time, correct.
>>
>>> Since it's only at that point where the idle domain context needs to
>>> allocate event channels would it make sense to temporary elevate it's
>>> privileges by setting d->is_privileged while doing the domain creation?
>>
>> This is initially what I did but it seemed like there was some
>> reluctance. As I was looking to formalize/abstract this in XSM instead
>> of doing direct manipulations, I realized I could achieve it in the hook
>> which would allow the hyperlaunch and dom0less code work without having
>> to ensure priv escalation is properly handled.
>>
>>> That way we wouldn't need to grant those permissions for the lifetime
>>> of the host when they are only needed for initialization code.
>>
>> Correct, which is why I adjusted the effective default policy only on
>> the check instead of in xsm_default_action() as Jan has suggested.
>> Outside of a code fault, all other times that evtchn_alloc_unbound() is
>> called `current->domain` should be pointing at the caller of the hypercall.
>>
>> This works as an interim solution with minimal impact as it is all
>> internal to XSM and can easily be evolved. My concern is that exposing a
>> function call to provide priv escalation for the idle domain as an
>> interim solution for dom0less and hyperlaunch will have more impactful
>> code churn in both of these when a longer term approach is adopted.
>>
>>> Another option would be switching to the initial vCPU of the domain
>>> being created, but that's likely to be more complex, or even create a
>>> short lived system domain with is_privileged set just for the purpose
>>> of building other domains.
>>
>> Longer term I would like to explore doing this in general. Some initial
>> thinking is the fact that hypervisor has a few contexts, relative to
>> external entities, under which it is executing. When it is handling
>> internal house keeping (e.g. scheduler and security server), when it is
>> interacting with guest domains, when it is interacting with hardware
>> (e.g. vpci), and now when it is processing boot material to construct
>> domains. It  has been mentioned that today in Xen if one of these
>> contexts acting with external entities is corrupted, it can interfere
>> with operations occurring in the other contexts. In the past the have
>> advocated and been working to split these contexts using hard L0/L1
>> separation. As noted in other discussions, some architectures are
>> gaining hardware features that can be used in hard L0/L1 partitioning
>> but also could be used in a more "soft" partitioning more a kin to
>> Nested Kernel[1] and Dune[2]. Again just some initial thoughts.
>>
>>> Overall I'm not sure it's worth giving those extra privileges to the
>>> idle domain when they are just need for a known and bounded period of
>>> time.
>>
>> IMHO that is a slight over simplification. Setting is_privileged to the
>> idle domain while it is processing domain construction data from outside
>> the hypervisor means that during that bounded period the idle domain is
>> complete unrestricted and may invoke any XSM protected call.
> 
> The domain builder code executed in the idle domain context can make
> direct calls to any functions that are otherwise protected behind an
> XSM check on the hypercall paths, so I don't see much difference.  The
> code executed by the idle domain in order to do domain creation is
> already part of the trusted code base (ie: it's hypervisor code)
> likewise with the data used as input.

I am only referring to what legit code paths exist, not what a full
control exploit could achieve at this time. My comments below was
discussing what might want to be considered down the road.

>> Contrast
>> this with only granting the idle domain the ability to allocate event
>> channels between domains at any time with the only codified usage is
>> during init/setup. While I am unsure how, theoretically malformed
>> construction data could expose a logic flaw to do some very unsavory
>> allocations without any guards.
> 
> It's kind of like that already, it's just that in other instances the
> calls done by the domain builder in idle domain context bypass the
> hypercall XSM checks.

Not on legitimate code paths, which is what I am focused on since the
primary vector that I was considering here is data attacks which are
about steering execution down legitimate paths as opposed to attacks
like ROP that allows runtime construction of execution paths.

> This might be giving you a false sense of security, but what's done in
> the idle domain context in order to do domain creation would strictly
> speaking require the idle domain to be a fully privileged entity, it's
> just that we mostly bypass the XSM checks by calling functions
> directly instead of using the hypercall entry paths.

Not at all as long as it is understood this is just about
least-privilege with the concern being around data attacks.

To be clear, I am not saying either solution is wrong and standing
firmly for one or the other. I am just trying to ensure that we consider
the applicable security aspects and thus make an informed decision.

>> Whereas during runtime if the idle
>> domain was tricked into establishing an event channel between two
>> domains, it would only serve to provide a covert channel between the two
>> domains. Neither is desirable but IMHO I find the former a little more
>> concerning than the latter.
>>
>> With that said, I am not completely against doing the priv escalation if
>> overall this is the direction that is preferred. If so, I would prefer
>> to provide a pair of static inlines under XSM name space to provide a
>> consistent implementation and be able to easily locate the places where
>> it is applied if/when a longer term approach is implemented.
> 
> I think those helpers must be __init, and we need to assert that by
> the time domains are started the idle domain is no longer
> privileged.

That is fine, I am not sure if there is a difference between static
inline functions that are only invoked from __init code (and I would
think would thus not exist anywhere else in the binary) over__init
functions. Although I have not been keeping Live Update in mind but I
think this is where the controlled privileged builder context is really
needed that dom0less, hyperlaunch, and Live Update can all utilize.

Good point regarding the assert. This was a concern I forgot to mention
with using priv escalation. Once a legitimate path exists to do such an
action, what locations should there be checks/assertions it is has not
occurred.

> From my PoV increasing the privileges of the idle domain just for the
> time it acts as a domain builder is merely formalizing what is already
> a fact: if the domain building was executed outside of Xen it would
> require the context domain to be privileged.

Except that it is now in guest space and can only reach resources
through the controlled hypercall interface. Any data that is sent is
considered trustworthy because the domain builder is trusted.

I will send out my original priv escalation patch later today.

v/r,
dps

Daniel P. Smith March 30, 2022, 1:52 p.m. UTC | #12

On 3/30/22 02:30, Jan Beulich wrote:
> On 29.03.2022 20:57, Daniel P. Smith wrote:
>> On 3/29/22 02:43, Jan Beulich wrote:
>>> On 28.03.2022 22:36, Daniel P. Smith wrote:
>>>> During domain construction under dom0less and hyperlaunch it is necessary to
>>>> allocate at least the event channel for xenstore and potentially the event
>>>> channel for the core console. When dom0less and hyperlaunch are doing their
>>>> construction logic they are executing under the idle domain context. The idle
>>>> domain is not a privileged domain, it is not the target domain, and as a result
>>>> under the current default XSM policy is not allowed to allocate the event
>>>> channel.
>>>
>>> I appreciate the change is only needed there right now, but it feels
>>> inconsistent. _If_ it is to remain that way, at least a comment needs
>>> to be put in xsm_evtchn_unbound() making clear why this is a special
>>> case, and hence clarifying to people what the approximate conditions
>>> are to have such also put elsewhere. But imo it would be better to
>>> make the adjustment right in xsm_default_action(), without touching
>>> event_channel.c at all. Iirc altering xsm_default_action() was
>>> discussed before, but I don't recall particular reasons speaking
>>> against that approach.
>>
>> By inconsistent, I take it you mean this is first place within an XSM
>> hook where an access decision is based on the current domain being a
>> system domain?
> 
> Well - yes and no. Even if further instances appeared, overall state
> would still end up inconsistent.

Okay, I think I get what you mean, which was brought up by Julien as
well. Which is that the default policy will become very nuanced over
whether certain xsm checks grants access to the idle domain or not. Yes,
I agree if this would not be good if this approach was used as the long
term solution.

>> I do agree and would add a comment to the change in the
>> XSM hook in a non-RFC version of the patch.
>>
>> As to moving the check down into xsm_default_action(), the concern I
>> have with doing so is that this would then make every XSM check succeed
>> if `current->domain` is a system domain. Doing so would require a review
>> of every function which has an XSM hoook to evaluate every invocation of
>> those functions that,
>>   1. is there ever a time when current->domain may be a system domain
>>   2. if so,
>>     a. is the invocation on behalf of the system domain
>>     b. or is the invocation on behalf of a non-system domain
>>
>> If there is any instance of 2b, then an inadvertent privilege escalation
>> can occur on that path. For evtchn_alloc_unbound() I verified the only
>> place, besides the new hyperlaunch calls, it is invoked is in the evtchn
>> hypercall handler, where current should be pointing at the domain that
>> made the hypercall.
> 
> Such an audit shouldn't be overly difficult, as the majority of XSM hook
> invocations sit clearly visible on hypercall paths, where it is clear
> that current->domain is not a system one.

Agreed but this would be asking someone to dedicate the time to the task.

>>>> This patch only addresses the event channel situation by adjust the default XSM
>>>> policy for xsm_evtchn_unbound to explicitly allow system domains to be able to
>>>> make the allocation call.
>>>
>>> Indeed I'm having trouble seeing how your change would work for SILO
>>> mode, albeit Stefano having tested this would make me assume he did
>>> so in SILO mode, as that's the default on Arm iirc. Afaict
>>> silo_mode_dom_check() should return false in the described situation.
>>
>> Correct, this patch only addressed the default policy. If an equivalent
>> change for SILO is desired, then it would be placed in
>> silo_evtchn_unbound() and not in silo_mode_dom_check() for the same
>> reasons I would be hesitant to place it in xsm_default_action().
>>
>>> Similarly I don't see how things would work transparently with a
>>> Flask policy in place. Regardless of you mentioning the restriction,
>>> I think this wants resolving before the patch can go in.
>>
>> To enable the equivalent in flask would require no hypervisor code
>> changes. Instead that would be handled by adding the necessary rules to
>> the appropriate flask policy file(s).
> 
> Of course this can be dealt with by adjusting policy file(s), but until
> people do so they'd end up with a perceived regression and/or unexplained
> difference in behavior from running in dummy (or SILO, once also taken
> care of) mode.

By the vary nature of running flask is to have a different behavior than
the dummy policy. What would need to be adjusted is the example policy
which does attempt to reflect the dummy policy. IMHO if a group is
writing these parts of their policy themselves, it is part of the policy
maintenance they have accepted with having a custom policy.

v/r,
dps

Daniel P. Smith March 30, 2022, 2:04 p.m. UTC | #13

On 3/30/22 08:30, Jason Andryuk wrote:
> On Wed, Mar 30, 2022 at 2:30 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 29.03.2022 20:57, Daniel P. Smith wrote:
>>> On 3/29/22 02:43, Jan Beulich wrote:
>>>> Similarly I don't see how things would work transparently with a
>>>> Flask policy in place. Regardless of you mentioning the restriction,
>>>> I think this wants resolving before the patch can go in.
>>>
>>> To enable the equivalent in flask would require no hypervisor code
>>> changes. Instead that would be handled by adding the necessary rules to
>>> the appropriate flask policy file(s).
>>
>> Of course this can be dealt with by adjusting policy file(s), but until
>> people do so they'd end up with a perceived regression and/or unexplained
>> difference in behavior from running in dummy (or SILO, once also taken
>> care of) mode.
> 
> This need to change the Flask policy is the crux of my dislike for
> making Xen-internal operations go through XSM checks.  It is wrong,
> IMO, to require the separate policy to grant xen_t permissions for
> proper operation.  I prefer restructuring the code so Xen itself
> doesn't have to go through XSM checks since that way Xen itself always
> runs properly regardless of the policy.
> 
> This is more based on unmap_domain_pirq having an XSM check for an
> internal operation.  dom0less, hyperlaunch, & Live Update may all be
> niche use cases where requiring a policy change is reasonable.

I will continue to agree with the base logic that today any least
privilege separation imposed is merely artificial in the face of any
attack that gains execution control over hypervisor space. What I will
disagree with is using that as a justification to create tight couplings
between the internals of the hypervisor that have a potential of causing
more work in the future when people are looking to use for instance's
Arms upcoming capability pointers as a real separation enforcement
mechanisms to de-privilege portions of the hypervisor.

While on principle it is justified to object to having policy statements
that present a facade, is it not better to look longer term than object
to some thing on principle based in the now?

v/r,
dps

Roger Pau Monné March 30, 2022, 3 p.m. UTC | #14

On Wed, Mar 30, 2022 at 09:42:18AM -0400, Daniel P. Smith wrote:
> On 3/30/22 05:40, Roger Pau Monné wrote:
> > On Tue, Mar 29, 2022 at 07:12:56PM -0400, Daniel P. Smith wrote:
> >> On 3/29/22 03:29, Roger Pau Monné wrote:
> >>> On Mon, Mar 28, 2022 at 04:36:22PM -0400, Daniel P. Smith wrote:
> >>>> During domain construction under dom0less and hyperlaunch it is necessary to
> >>>> allocate at least the event channel for xenstore and potentially the event
> >>>> channel for the core console. When dom0less and hyperlaunch are doing their
> >>>> construction logic they are executing under the idle domain context. The idle
> >>>> domain is not a privileged domain, it is not the target domain, and as a result
> >>>> under the current default XSM policy is not allowed to allocate the event
> >>>> channel.
> >>>
> >>> I've not been following the discussion around this patch, but I would
> >>> assume such privileges are only required for init code when no other
> >>> domains are running?
> >>
> >> At this time, correct.
> >>
> >>> Since it's only at that point where the idle domain context needs to
> >>> allocate event channels would it make sense to temporary elevate it's
> >>> privileges by setting d->is_privileged while doing the domain creation?
> >>
> >> This is initially what I did but it seemed like there was some
> >> reluctance. As I was looking to formalize/abstract this in XSM instead
> >> of doing direct manipulations, I realized I could achieve it in the hook
> >> which would allow the hyperlaunch and dom0less code work without having
> >> to ensure priv escalation is properly handled.
> >>
> >>> That way we wouldn't need to grant those permissions for the lifetime
> >>> of the host when they are only needed for initialization code.
> >>
> >> Correct, which is why I adjusted the effective default policy only on
> >> the check instead of in xsm_default_action() as Jan has suggested.
> >> Outside of a code fault, all other times that evtchn_alloc_unbound() is
> >> called `current->domain` should be pointing at the caller of the hypercall.
> >>
> >> This works as an interim solution with minimal impact as it is all
> >> internal to XSM and can easily be evolved. My concern is that exposing a
> >> function call to provide priv escalation for the idle domain as an
> >> interim solution for dom0less and hyperlaunch will have more impactful
> >> code churn in both of these when a longer term approach is adopted.
> >>
> >>> Another option would be switching to the initial vCPU of the domain
> >>> being created, but that's likely to be more complex, or even create a
> >>> short lived system domain with is_privileged set just for the purpose
> >>> of building other domains.
> >>
> >> Longer term I would like to explore doing this in general. Some initial
> >> thinking is the fact that hypervisor has a few contexts, relative to
> >> external entities, under which it is executing. When it is handling
> >> internal house keeping (e.g. scheduler and security server), when it is
> >> interacting with guest domains, when it is interacting with hardware
> >> (e.g. vpci), and now when it is processing boot material to construct
> >> domains. It  has been mentioned that today in Xen if one of these
> >> contexts acting with external entities is corrupted, it can interfere
> >> with operations occurring in the other contexts. In the past the have
> >> advocated and been working to split these contexts using hard L0/L1
> >> separation. As noted in other discussions, some architectures are
> >> gaining hardware features that can be used in hard L0/L1 partitioning
> >> but also could be used in a more "soft" partitioning more a kin to
> >> Nested Kernel[1] and Dune[2]. Again just some initial thoughts.
> >>
> >>> Overall I'm not sure it's worth giving those extra privileges to the
> >>> idle domain when they are just need for a known and bounded period of
> >>> time.
> >>
> >> IMHO that is a slight over simplification. Setting is_privileged to the
> >> idle domain while it is processing domain construction data from outside
> >> the hypervisor means that during that bounded period the idle domain is
> >> complete unrestricted and may invoke any XSM protected call.
> > 
> > The domain builder code executed in the idle domain context can make
> > direct calls to any functions that are otherwise protected behind an
> > XSM check on the hypercall paths, so I don't see much difference.  The
> > code executed by the idle domain in order to do domain creation is
> > already part of the trusted code base (ie: it's hypervisor code)
> > likewise with the data used as input.
> 
> I am only referring to what legit code paths exist, not what a full
> control exploit could achieve at this time. My comments below was
> discussing what might want to be considered down the road.

Maybe I wasn't very clear on that paragraph, what I meant to say is
that the domain builder code does already bypass XSM checks that would
otherwise fail, just by calling functions that are behind the XSM
checks.

For example the domain builder will likely call
alloc_domheap_pages(target,...); which would otherwise be protected by
xsm_memory_adjust_reservation(XSM_TARGET, current, target) when
populating the domain physmap, so here you are basically bypassing an
XSM check already.

> >> Contrast
> >> this with only granting the idle domain the ability to allocate event
> >> channels between domains at any time with the only codified usage is
> >> during init/setup. While I am unsure how, theoretically malformed
> >> construction data could expose a logic flaw to do some very unsavory
> >> allocations without any guards.
> > 
> > It's kind of like that already, it's just that in other instances the
> > calls done by the domain builder in idle domain context bypass the
> > hypercall XSM checks.
> 
> Not on legitimate code paths, which is what I am focused on since the
> primary vector that I was considering here is data attacks which are
> about steering execution down legitimate paths as opposed to attacks
> like ROP that allows runtime construction of execution paths.

The internal domain builder does bypass XSM checks by calling
functions that on hypercall paths are otherwise protected by an XSM
check, see my comment above about alloc_domheap_pages() for example.

> > This might be giving you a false sense of security, but what's done in
> > the idle domain context in order to do domain creation would strictly
> > speaking require the idle domain to be a fully privileged entity, it's
> > just that we mostly bypass the XSM checks by calling functions
> > directly instead of using the hypercall entry paths.
> 
> Not at all as long as it is understood this is just about
> least-privilege with the concern being around data attacks.
> 
> To be clear, I am not saying either solution is wrong and standing
> firmly for one or the other. I am just trying to ensure that we consider
> the applicable security aspects and thus make an informed decision.
> 
> >> Whereas during runtime if the idle
> >> domain was tricked into establishing an event channel between two
> >> domains, it would only serve to provide a covert channel between the two
> >> domains. Neither is desirable but IMHO I find the former a little more
> >> concerning than the latter.
> >>
> >> With that said, I am not completely against doing the priv escalation if
> >> overall this is the direction that is preferred. If so, I would prefer
> >> to provide a pair of static inlines under XSM name space to provide a
> >> consistent implementation and be able to easily locate the places where
> >> it is applied if/when a longer term approach is implemented.
> > 
> > I think those helpers must be __init, and we need to assert that by
> > the time domains are started the idle domain is no longer
> > privileged.
> 
> That is fine, I am not sure if there is a difference between static
> inline functions that are only invoked from __init code (and I would
> think would thus not exist anywhere else in the binary) over__init
> functions.

__init functions are freed after boot, so any attempt to call them
after boot will result in a page fault. This is what we want here, as
after boot the permissions of the idle domain shouldn't ever be
increased.

> Although I have not been keeping Live Update in mind but I
> think this is where the controlled privileged builder context is really
> needed that dom0less, hyperlaunch, and Live Update can all utilize.
> 
> Good point regarding the assert. This was a concern I forgot to mention
> with using priv escalation. Once a legitimate path exists to do such an
> action, what locations should there be checks/assertions it is has not
> occurred.
> 
> > From my PoV increasing the privileges of the idle domain just for the
> > time it acts as a domain builder is merely formalizing what is already
> > a fact: if the domain building was executed outside of Xen it would
> > require the context domain to be privileged.
> 
> Except that it is now in guest space and can only reach resources
> through the controlled hypercall interface. Any data that is sent is
> considered trustworthy because the domain builder is trusted.

So if it is trusted then there's no issue in formalizing this by
setting its context to is_privileged = true.  Ideally we might want to
do domain build on a separate system domain just for the purpose of
executing that code, but given the system is not live yet (no
domains are scheduled in parallel with the idle domain) I don't see an
issue with re-using the idle domain for that purpose.

> I will send out my original priv escalation patch later today.

Thanks! Will give it a look (likely tomorrow).

Roger.

Jason Andryuk March 30, 2022, 3:15 p.m. UTC | #15

On Wed, Mar 30, 2022 at 10:04 AM Daniel P. Smith
<dpsmith@apertussolutions.com> wrote:
>
> On 3/30/22 08:30, Jason Andryuk wrote:
> > On Wed, Mar 30, 2022 at 2:30 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 29.03.2022 20:57, Daniel P. Smith wrote:
> >>> On 3/29/22 02:43, Jan Beulich wrote:
> >>>> Similarly I don't see how things would work transparently with a
> >>>> Flask policy in place. Regardless of you mentioning the restriction,
> >>>> I think this wants resolving before the patch can go in.
> >>>
> >>> To enable the equivalent in flask would require no hypervisor code
> >>> changes. Instead that would be handled by adding the necessary rules to
> >>> the appropriate flask policy file(s).
> >>
> >> Of course this can be dealt with by adjusting policy file(s), but until
> >> people do so they'd end up with a perceived regression and/or unexplained
> >> difference in behavior from running in dummy (or SILO, once also taken
> >> care of) mode.
> >
> > This need to change the Flask policy is the crux of my dislike for
> > making Xen-internal operations go through XSM checks.  It is wrong,
> > IMO, to require the separate policy to grant xen_t permissions for
> > proper operation.  I prefer restructuring the code so Xen itself
> > doesn't have to go through XSM checks since that way Xen itself always
> > runs properly regardless of the policy.
> >
> > This is more based on unmap_domain_pirq having an XSM check for an
> > internal operation.  dom0less, hyperlaunch, & Live Update may all be
> > niche use cases where requiring a policy change is reasonable.
>
> I will continue to agree with the base logic that today any least
> privilege separation imposed is merely artificial in the face of any
> attack that gains execution control over hypervisor space. What I will
> disagree with is using that as a justification to create tight couplings
> between the internals of the hypervisor that have a potential of causing
> more work in the future when people are looking to use for instance's
> Arms upcoming capability pointers as a real separation enforcement
> mechanisms to de-privilege portions of the hypervisor.
>
> While on principle it is justified to object to having policy statements
> that present a facade, is it not better to look longer term than object
> to some thing on principle based in the now?

Your claims seem to be speculative about something that doesn't exist,
so I can't evaluate them.

Do you envision that this future Xen would have multiple xen_*_t types
requiring explicit Flask policy rules?

Regards,
Jason

Daniel P. Smith March 30, 2022, 4:23 p.m. UTC | #16

On 3/30/22 11:15, Jason Andryuk wrote:
> On Wed, Mar 30, 2022 at 10:04 AM Daniel P. Smith
> <dpsmith@apertussolutions.com> wrote:
>>
>> On 3/30/22 08:30, Jason Andryuk wrote:
>>> On Wed, Mar 30, 2022 at 2:30 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 29.03.2022 20:57, Daniel P. Smith wrote:
>>>>> On 3/29/22 02:43, Jan Beulich wrote:
>>>>>> Similarly I don't see how things would work transparently with a
>>>>>> Flask policy in place. Regardless of you mentioning the restriction,
>>>>>> I think this wants resolving before the patch can go in.
>>>>>
>>>>> To enable the equivalent in flask would require no hypervisor code
>>>>> changes. Instead that would be handled by adding the necessary rules to
>>>>> the appropriate flask policy file(s).
>>>>
>>>> Of course this can be dealt with by adjusting policy file(s), but until
>>>> people do so they'd end up with a perceived regression and/or unexplained
>>>> difference in behavior from running in dummy (or SILO, once also taken
>>>> care of) mode.
>>>
>>> This need to change the Flask policy is the crux of my dislike for
>>> making Xen-internal operations go through XSM checks.  It is wrong,
>>> IMO, to require the separate policy to grant xen_t permissions for
>>> proper operation.  I prefer restructuring the code so Xen itself
>>> doesn't have to go through XSM checks since that way Xen itself always
>>> runs properly regardless of the policy.
>>>
>>> This is more based on unmap_domain_pirq having an XSM check for an
>>> internal operation.  dom0less, hyperlaunch, & Live Update may all be
>>> niche use cases where requiring a policy change is reasonable.
>>
>> I will continue to agree with the base logic that today any least
>> privilege separation imposed is merely artificial in the face of any
>> attack that gains execution control over hypervisor space. What I will
>> disagree with is using that as a justification to create tight couplings
>> between the internals of the hypervisor that have a potential of causing
>> more work in the future when people are looking to use for instance's
>> Arms upcoming capability pointers as a real separation enforcement
>> mechanisms to de-privilege portions of the hypervisor.
>>
>> While on principle it is justified to object to having policy statements
>> that present a facade, is it not better to look longer term than object
>> to some thing on principle based in the now?
> 
> Your claims seem to be speculative about something that doesn't exist,
> so I can't evaluate them.

They exists, they are available in OpenPOWER and Arm CHERI is in
evaluation now.

> Do you envision that this future Xen would have multiple xen_*_t types
> requiring explicit Flask policy rules?

Right now I would say no for two reasons, first flask comes from the
mind set of controlling what hypervisor interfaces a guest may have
access and second is that I am not certain whether hypervisor internal
contexts should be configurable.

v/r,
dps

Daniel P. Smith March 30, 2022, 4:28 p.m. UTC | #17

On 3/30/22 11:15, Jason Andryuk wrote:
> On Wed, Mar 30, 2022 at 10:04 AM Daniel P. Smith
> <dpsmith@apertussolutions.com> wrote:
>>
>> On 3/30/22 08:30, Jason Andryuk wrote:
>>> On Wed, Mar 30, 2022 at 2:30 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 29.03.2022 20:57, Daniel P. Smith wrote:
>>>>> On 3/29/22 02:43, Jan Beulich wrote:
>>>>>> Similarly I don't see how things would work transparently with a
>>>>>> Flask policy in place. Regardless of you mentioning the restriction,
>>>>>> I think this wants resolving before the patch can go in.
>>>>>
>>>>> To enable the equivalent in flask would require no hypervisor code
>>>>> changes. Instead that would be handled by adding the necessary rules to
>>>>> the appropriate flask policy file(s).
>>>>
>>>> Of course this can be dealt with by adjusting policy file(s), but until
>>>> people do so they'd end up with a perceived regression and/or unexplained
>>>> difference in behavior from running in dummy (or SILO, once also taken
>>>> care of) mode.
>>>
>>> This need to change the Flask policy is the crux of my dislike for
>>> making Xen-internal operations go through XSM checks.  It is wrong,
>>> IMO, to require the separate policy to grant xen_t permissions for
>>> proper operation.  I prefer restructuring the code so Xen itself
>>> doesn't have to go through XSM checks since that way Xen itself always
>>> runs properly regardless of the policy.
>>>
>>> This is more based on unmap_domain_pirq having an XSM check for an
>>> internal operation.  dom0less, hyperlaunch, & Live Update may all be
>>> niche use cases where requiring a policy change is reasonable.
>>
>> I will continue to agree with the base logic that today any least
>> privilege separation imposed is merely artificial in the face of any
>> attack that gains execution control over hypervisor space. What I will
>> disagree with is using that as a justification to create tight couplings
>> between the internals of the hypervisor that have a potential of causing
>> more work in the future when people are looking to use for instance's
>> Arms upcoming capability pointers as a real separation enforcement
>> mechanisms to de-privilege portions of the hypervisor.
>>
>> While on principle it is justified to object to having policy statements
>> that present a facade, is it not better to look longer term than object
>> to some thing on principle based in the now?
> 
> Your claims seem to be speculative about something that doesn't exist,
> so I can't evaluate them.

Apologies, let me give you some references as well.

https://mrfunk.info/?page_id=5
https://www.platformsecuritysummit.com/2019/speaker/hunt/

v/r,
dps

Jason Andryuk March 30, 2022, 7:53 p.m. UTC | #18

On Wed, Mar 30, 2022 at 12:24 PM Daniel P. Smith
<dpsmith@apertussolutions.com> wrote:
>
> On 3/30/22 11:15, Jason Andryuk wrote:
> > On Wed, Mar 30, 2022 at 10:04 AM Daniel P. Smith
> > <dpsmith@apertussolutions.com> wrote:
> >>
> >> On 3/30/22 08:30, Jason Andryuk wrote:
> >>> On Wed, Mar 30, 2022 at 2:30 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 29.03.2022 20:57, Daniel P. Smith wrote:
> >>>>> On 3/29/22 02:43, Jan Beulich wrote:
> >>>>>> Similarly I don't see how things would work transparently with a
> >>>>>> Flask policy in place. Regardless of you mentioning the restriction,
> >>>>>> I think this wants resolving before the patch can go in.
> >>>>>
> >>>>> To enable the equivalent in flask would require no hypervisor code
> >>>>> changes. Instead that would be handled by adding the necessary rules to
> >>>>> the appropriate flask policy file(s).
> >>>>
> >>>> Of course this can be dealt with by adjusting policy file(s), but until
> >>>> people do so they'd end up with a perceived regression and/or unexplained
> >>>> difference in behavior from running in dummy (or SILO, once also taken
> >>>> care of) mode.
> >>>
> >>> This need to change the Flask policy is the crux of my dislike for
> >>> making Xen-internal operations go through XSM checks.  It is wrong,
> >>> IMO, to require the separate policy to grant xen_t permissions for
> >>> proper operation.  I prefer restructuring the code so Xen itself
> >>> doesn't have to go through XSM checks since that way Xen itself always
> >>> runs properly regardless of the policy.
> >>>
> >>> This is more based on unmap_domain_pirq having an XSM check for an
> >>> internal operation.  dom0less, hyperlaunch, & Live Update may all be
> >>> niche use cases where requiring a policy change is reasonable.
> >>
> >> I will continue to agree with the base logic that today any least
> >> privilege separation imposed is merely artificial in the face of any
> >> attack that gains execution control over hypervisor space. What I will
> >> disagree with is using that as a justification to create tight couplings
> >> between the internals of the hypervisor that have a potential of causing
> >> more work in the future when people are looking to use for instance's
> >> Arms upcoming capability pointers as a real separation enforcement
> >> mechanisms to de-privilege portions of the hypervisor.
> >>
> >> While on principle it is justified to object to having policy statements
> >> that present a facade, is it not better to look longer term than object
> >> to some thing on principle based in the now?
> >
> > Your claims seem to be speculative about something that doesn't exist,
> > so I can't evaluate them.
>
> They exists, they are available in OpenPOWER and Arm CHERI is in
> evaluation now.

Yes, but how will Xen use the hardware feature to internally
deprivilege itself?  What sort of division are you planning?

> > Do you envision that this future Xen would have multiple xen_*_t types
> > requiring explicit Flask policy rules?
>
> Right now I would say no for two reasons, first flask comes from the
> mind set of controlling what hypervisor interfaces a guest may have
> access and second is that I am not certain whether hypervisor internal
> contexts should be configurable.

Oh.  I expected multiple xen_*_t types to be a natural part of the Xen
deprivileging.

Regards,
Jason

[RFC,1/1] xsm: allows system domains to allocate evtchn

Commit Message

Comments

Patch