diff mbox series

Revert "evtchn: refuse EVTCHNOP_status for Xen-bound event channels"

Message ID 20240402170612.2477791-1-andrew.cooper3@citrix.com (mailing list archive)
State New
Headers show
Series Revert "evtchn: refuse EVTCHNOP_status for Xen-bound event channels" | expand

Commit Message

Andrew Cooper April 2, 2024, 5:06 p.m. UTC
The commit makes a claim without any kind of justification.

The claim is false, and the commit broke lsevtchn in dom0.  It is also quite
obvious from XSM_TARGET that it has broken device model stubdoms too.

Whether to return information about a xen-owned evtchn is a matter of policy,
and it's not acceptable to short circuit the XSM on the matter.

This reverts commit f60ab5337f968e2f10c639ab59db7afb0fe4f7c3.

Fixes: f60ab5337f96 ("evtchn: refuse EVTCHNOP_status for Xen-bound event channels")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Daniel Smith <dpsmith@apertussolutions.com>
---
 xen/common/event_channel.c | 6 ------
 1 file changed, 6 deletions(-)


base-commit: 7a09966e7b2823b70f6d56d0cf66c11124f4a3c1

Comments

Jan Beulich April 3, 2024, 6:16 a.m. UTC | #1
On 02.04.2024 19:06, Andrew Cooper wrote:
> The commit makes a claim without any kind of justification.

Well, what does "have no business" leave open?

> The claim is false, and the commit broke lsevtchn in dom0.

Or alternatively lsevtchn was doing something that was never meant to work
(from Xen's perspective).

>  It is also quite
> obvious from XSM_TARGET that it has broken device model stubdoms too.

Why would that be "obvious"? What business would a stubdom have to look at
Xen's side of an evtchn?

> Whether to return information about a xen-owned evtchn is a matter of policy,
> and it's not acceptable to short circuit the XSM on the matter.

I can certainly accept this as one possible view point. As in so many cases
I'm afraid I dislike you putting it as if it was the only possible one.

In summary: The supposed justification you claim is missing in the original
change is imo also missing here then: What business would any entity in the
system have to look at Xen's side of an event channel? Back at the time, 3
people agreed that it's "none".

Jan
Jan Beulich April 3, 2024, 6:52 a.m. UTC | #2
On 03.04.2024 08:16, Jan Beulich wrote:
> On 02.04.2024 19:06, Andrew Cooper wrote:
>> Whether to return information about a xen-owned evtchn is a matter of policy,
>> and it's not acceptable to short circuit the XSM on the matter.
> 
> I can certainly accept this as one possible view point. As in so many cases
> I'm afraid I dislike you putting it as if it was the only possible one.

Further to this: Is there even a way to express the same denial in XSM?
alloc_unbound_xen_event_channel() doesn't specifically "mark" such a
channel, and (yes, it could in principle be open-coded in Flask code)
consumer_is_xen() is private to event_channel.c. I also dare to question
whether in SILO mode status information like this should be available.

Jan
Daniel P. Smith April 3, 2024, 11:10 a.m. UTC | #3
On 4/3/24 02:16, Jan Beulich wrote:
> On 02.04.2024 19:06, Andrew Cooper wrote:
>> The commit makes a claim without any kind of justification.
> 
> Well, what does "have no business" leave open?

Why does it not have any business? Why should a domain that creates an 
event channel not be able to inquire about its status?

>> The claim is false, and the commit broke lsevtchn in dom0.
> 
> Or alternatively lsevtchn was doing something that was never meant to work
> (from Xen's perspective).

Again, you have not said why this is a problem. What concern does it 
create? Does it open the door for access elevation, resource 
deprivation, or some other malicious behaviors?

>>   It is also quite
>> obvious from XSM_TARGET that it has broken device model stubdoms too.
> 
> Why would that be "obvious"? What business would a stubdom have to look at
> Xen's side of an evtchn?

Again, you have not expressed why it shouldn't be able to do so.

>> Whether to return information about a xen-owned evtchn is a matter of policy,
>> and it's not acceptable to short circuit the XSM on the matter.
> 
> I can certainly accept this as one possible view point. As in so many cases
> I'm afraid I dislike you putting it as if it was the only possible one.

In fact, this commit is in violation of the XSM. It hard-codes a 
resource access check outside XSM, thus breaking the fine-grained access 
control of FLASK.

> In summary: The supposed justification you claim is missing in the original
> change is imo also missing here then: What business would any entity in the
> system have to look at Xen's side of an event channel? Back at the time, 3
> people agreed that it's "none".

As stated, you provided no reason or justification for "has no business" 
and by face value is an opinion that a few people agreed with. As for 
why, there could be a myriad number of reasons a domain may want to 
check the status of an interface it has with the hypervisor. From just 
logging its state for debug to throttling attempts at sending an event. 
So why, from a security/access control decision, does this access have 
to absolutely blocked, even from FLASK?

v/r,
dps
Daniel P. Smith April 3, 2024, 11:50 a.m. UTC | #4
On 4/3/24 02:52, Jan Beulich wrote:
> On 03.04.2024 08:16, Jan Beulich wrote:
>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>> Whether to return information about a xen-owned evtchn is a matter of policy,
>>> and it's not acceptable to short circuit the XSM on the matter.
>>
>> I can certainly accept this as one possible view point. As in so many cases
>> I'm afraid I dislike you putting it as if it was the only possible one.
> 
> Further to this: Is there even a way to express the same denial in XSM?
> alloc_unbound_xen_event_channel() doesn't specifically "mark" such a
> channel, and (yes, it could in principle be open-coded in Flask code)
> consumer_is_xen() is private to event_channel.c. I also dare to question
> whether in SILO mode status information like this should be available.

To build on the previous response: if the natural failure return value 
is -EACCESS in response to a domain resource access attempt, then the 
probability is extremely high that it should be implemented under a XSM 
hook and not hard-coded into the resource logic.

v/r,
dps
Jan Beulich April 3, 2024, 11:54 a.m. UTC | #5
On 03.04.2024 13:50, Daniel P. Smith wrote:
> On 4/3/24 02:52, Jan Beulich wrote:
>> On 03.04.2024 08:16, Jan Beulich wrote:
>>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>>> Whether to return information about a xen-owned evtchn is a matter of policy,
>>>> and it's not acceptable to short circuit the XSM on the matter.
>>>
>>> I can certainly accept this as one possible view point. As in so many cases
>>> I'm afraid I dislike you putting it as if it was the only possible one.
>>
>> Further to this: Is there even a way to express the same denial in XSM?
>> alloc_unbound_xen_event_channel() doesn't specifically "mark" such a
>> channel, and (yes, it could in principle be open-coded in Flask code)
>> consumer_is_xen() is private to event_channel.c. I also dare to question
>> whether in SILO mode status information like this should be available.
> 
> To build on the previous response: if the natural failure return value 
> is -EACCESS in response to a domain resource access attempt, then the 
> probability is extremely high that it should be implemented under a XSM 
> hook and not hard-coded into the resource logic.

Possibly. But first of all - could you answer the earlier question I raised?

Jan
Jan Beulich April 3, 2024, 12:05 p.m. UTC | #6
On 03.04.2024 13:10, Daniel P. Smith wrote:
> On 4/3/24 02:16, Jan Beulich wrote:
>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>> The commit makes a claim without any kind of justification.
>>
>> Well, what does "have no business" leave open?
> 
> Why does it not have any business? Why should a domain that creates an 
> event channel not be able to inquire about its status?

Event channels we talk about here are created via
alloc_unbound_xen_event_channel(). IOW it's not any domain creating them.
Once connected, the respective domain is of course fine to query its end
of the channel.

>>> The claim is false, and the commit broke lsevtchn in dom0.
>>
>> Or alternatively lsevtchn was doing something that was never meant to work
>> (from Xen's perspective).
> 
> Again, you have not said why this is a problem. What concern does it 
> create? Does it open the door for access elevation, resource 
> deprivation, or some other malicious behaviors?

It exposes information that perhaps better wouldn't be exposed. Imo if
Xen owned resource state is of interest, it would want exposing via
hypfs.

>>>   It is also quite
>>> obvious from XSM_TARGET that it has broken device model stubdoms too.
>>
>> Why would that be "obvious"? What business would a stubdom have to look at
>> Xen's side of an evtchn?
> 
> Again, you have not expressed why it shouldn't be able to do so.

See above - not its resource, nor its guest's.

>>> Whether to return information about a xen-owned evtchn is a matter of policy,
>>> and it's not acceptable to short circuit the XSM on the matter.
>>
>> I can certainly accept this as one possible view point. As in so many cases
>> I'm afraid I dislike you putting it as if it was the only possible one.
> 
> In fact, this commit is in violation of the XSM. It hard-codes a 
> resource access check outside XSM, thus breaking the fine-grained access 
> control of FLASK.

Perhaps; see below and see the question raised in the subsequent reply
to the patch.

>> In summary: The supposed justification you claim is missing in the original
>> change is imo also missing here then: What business would any entity in the
>> system have to look at Xen's side of an event channel? Back at the time, 3
>> people agreed that it's "none".
> 
> As stated, you provided no reason or justification for "has no business" 
> and by face value is an opinion that a few people agreed with. As for 
> why, there could be a myriad number of reasons a domain may want to 
> check the status of an interface it has with the hypervisor. From just 
> logging its state for debug to throttling attempts at sending an event. 
> So why, from a security/access control decision, does this access have 
> to absolutely blocked, even from FLASK?

I didn't say it absolutely needs to be blocked. I'm okay to become
convinced otherwise. But in the description complaining about lack of
reasons in the 3-4 year old change, just to then again not provide any
reasons looks "interesting" to me. (And no, just to take that example,
lsevtchn not working anymore on such channels is not on its own a
reason. As indicated, it may well be that conceptually it was never
supposed to be able to have access to this information. The latest not
anymore when hypfs was introduced.)

Jan
Daniel P. Smith April 3, 2024, 1:27 p.m. UTC | #7
On 4/3/24 08:05, Jan Beulich wrote:
> On 03.04.2024 13:10, Daniel P. Smith wrote:
>> On 4/3/24 02:16, Jan Beulich wrote:
>>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>>> The commit makes a claim without any kind of justification.
>>>
>>> Well, what does "have no business" leave open?
>>
>> Why does it not have any business? Why should a domain that creates an
>> event channel not be able to inquire about its status?
> 
> Event channels we talk about here are created via
> alloc_unbound_xen_event_channel(). IOW it's not any domain creating them.
> Once connected, the respective domain is of course fine to query its end
> of the channel.


I would disagree, for instance alloc_unbound_xen_event_channel() is used 
in response to XEN_DOMCTL_vuart_op:XEN_DOMCTL_VUART_OP_INIT and 
XEN_DOMCTL_VM_EVENT_OP_PAGING:XEN_VM_EVENT_ENABLE, which are hypercalls 
by a domain and not something initiated by the hypervisor.

>>>> The claim is false, and the commit broke lsevtchn in dom0.
>>>
>>> Or alternatively lsevtchn was doing something that was never meant to work
>>> (from Xen's perspective).
>>
>> Again, you have not said why this is a problem. What concern does it
>> create? Does it open the door for access elevation, resource
>> deprivation, or some other malicious behaviors?
> 
> It exposes information that perhaps better wouldn't be exposed. Imo if
> Xen owned resource state is of interest, it would want exposing via
> hypfs.

You didn't answer why, just again expressed your opinion that it is not 
better exposed. And I would have to wholly disagree with the sentiment 
that hypfs exposure is the deciding factor what is or is not worth 
exposing. This thinking is completely orthogonal to FLASK and 
fine-grained access control.

>>>>    It is also quite
>>>> obvious from XSM_TARGET that it has broken device model stubdoms too.
>>>
>>> Why would that be "obvious"? What business would a stubdom have to look at
>>> Xen's side of an evtchn?
>>
>> Again, you have not expressed why it shouldn't be able to do so.
> 
> See above - not its resource, nor its guest's.

It is a resource provided to a domain that the domain can send/raise an 
event to and a backing domain that can bind to it, ie. the two 
parameters that must be passed to the allocation call.

>>>> Whether to return information about a xen-owned evtchn is a matter of policy,
>>>> and it's not acceptable to short circuit the XSM on the matter.
>>>
>>> I can certainly accept this as one possible view point. As in so many cases
>>> I'm afraid I dislike you putting it as if it was the only possible one.
>>
>> In fact, this commit is in violation of the XSM. It hard-codes a
>> resource access check outside XSM, thus breaking the fine-grained access
>> control of FLASK.
> 
> Perhaps; see below and see the question raised in the subsequent reply
> to the patch.
> 
>>> In summary: The supposed justification you claim is missing in the original
>>> change is imo also missing here then: What business would any entity in the
>>> system have to look at Xen's side of an event channel? Back at the time, 3
>>> people agreed that it's "none".
>>
>> As stated, you provided no reason or justification for "has no business"
>> and by face value is an opinion that a few people agreed with. As for
>> why, there could be a myriad number of reasons a domain may want to
>> check the status of an interface it has with the hypervisor. From just
>> logging its state for debug to throttling attempts at sending an event.
>> So why, from a security/access control decision, does this access have
>> to absolutely blocked, even from FLASK?
> 
> I didn't say it absolutely needs to be blocked. I'm okay to become
> convinced otherwise. But in the description complaining about lack of
> reasons in the 3-4 year old change, just to then again not provide any
> reasons looks "interesting" to me. (And no, just to take that example,
> lsevtchn not working anymore on such channels is not on its own a
> reason. As indicated, it may well be that conceptually it was never
> supposed to be able to have access to this information. The latest not
> anymore when hypfs was introduced.)

This broke an existing behavior, whether that behavior is correct can 
always be questioned, does not justify leaving an incorrect 
implementation. And it is incorrect because as again you have not 
articulated why the lsevtchn behavior is wrong and thus whether this is 
the valid corrective action.

v/r,
dps
Daniel P. Smith April 3, 2024, 1:31 p.m. UTC | #8
On 4/3/24 07:54, Jan Beulich wrote:
> On 03.04.2024 13:50, Daniel P. Smith wrote:
>> On 4/3/24 02:52, Jan Beulich wrote:
>>> On 03.04.2024 08:16, Jan Beulich wrote:
>>>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>>>> Whether to return information about a xen-owned evtchn is a matter of policy,
>>>>> and it's not acceptable to short circuit the XSM on the matter.
>>>>
>>>> I can certainly accept this as one possible view point. As in so many cases
>>>> I'm afraid I dislike you putting it as if it was the only possible one.
>>>
>>> Further to this: Is there even a way to express the same denial in XSM?
>>> alloc_unbound_xen_event_channel() doesn't specifically "mark" such a
>>> channel, and (yes, it could in principle be open-coded in Flask code)
>>> consumer_is_xen() is private to event_channel.c. I also dare to question
>>> whether in SILO mode status information like this should be available.
>>
>> To build on the previous response: if the natural failure return value
>> is -EACCESS in response to a domain resource access attempt, then the
>> probability is extremely high that it should be implemented under a XSM
>> hook and not hard-coded into the resource logic.
> 
> Possibly. But first of all - could you answer the earlier question I raised?

Don't need to, this change subverts/violates the access control 
framework. If the desire is to make this access decision for the 
default/dummy policy, then codify it there. Otherwise I will be ack'ing 
this change since it is access control and falls under the purview of XSM.

v/r,
dps
Daniel P. Smith April 3, 2024, 1:35 p.m. UTC | #9
On 4/2/24 13:06, Andrew Cooper wrote:
> The commit makes a claim without any kind of justification.
> 
> The claim is false, and the commit broke lsevtchn in dom0.  It is also quite
> obvious from XSM_TARGET that it has broken device model stubdoms too.
> 
> Whether to return information about a xen-owned evtchn is a matter of policy,
> and it's not acceptable to short circuit the XSM on the matter.
> 
> This reverts commit f60ab5337f968e2f10c639ab59db7afb0fe4f7c3.
> 
> Fixes: f60ab5337f96 ("evtchn: refuse EVTCHNOP_status for Xen-bound event channels")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: George Dunlap <George.Dunlap@citrix.com>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien@xen.org>
> CC: Daniel Smith <dpsmith@apertussolutions.com>
> ---
>   xen/common/event_channel.c | 6 ------
>   1 file changed, 6 deletions(-)
> 
> diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
> index 20f586cf5ecd..ae6c2f902645 100644
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -1040,12 +1040,6 @@ int evtchn_status(evtchn_status_t *status)
>   
>       read_lock(&d->event_lock);
>   
> -    if ( consumer_is_xen(chn) )
> -    {
> -        rc = -EACCES;
> -        goto out;
> -    }
> -
>       rc = xsm_evtchn_status(XSM_TARGET, d, chn);
>       if ( rc )
>           goto out;
> 
> base-commit: 7a09966e7b2823b70f6d56d0cf66c11124f4a3c1

Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Jan Beulich April 4, 2024, 7:57 a.m. UTC | #10
On 03.04.2024 15:27, Daniel P. Smith wrote:
> On 4/3/24 08:05, Jan Beulich wrote:
>> On 03.04.2024 13:10, Daniel P. Smith wrote:
>>> On 4/3/24 02:16, Jan Beulich wrote:
>>>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>>>> The commit makes a claim without any kind of justification.
>>>>
>>>> Well, what does "have no business" leave open?
>>>
>>> Why does it not have any business? Why should a domain that creates an
>>> event channel not be able to inquire about its status?
>>
>> Event channels we talk about here are created via
>> alloc_unbound_xen_event_channel(). IOW it's not any domain creating them.
>> Once connected, the respective domain is of course fine to query its end
>> of the channel.
> 
> I would disagree, for instance alloc_unbound_xen_event_channel() is used 
> in response to XEN_DOMCTL_vuart_op:XEN_DOMCTL_VUART_OP_INIT and 
> XEN_DOMCTL_VM_EVENT_OP_PAGING:XEN_VM_EVENT_ENABLE, which are hypercalls 
> by a domain and not something initiated by the hypervisor.

Those ports, aiui, aren't supposed to be used by the caller for other
than connecting an inter-domain port to the other side.

>>>>> The claim is false, and the commit broke lsevtchn in dom0.
>>>>
>>>> Or alternatively lsevtchn was doing something that was never meant to work
>>>> (from Xen's perspective).
>>>
>>> Again, you have not said why this is a problem. What concern does it
>>> create? Does it open the door for access elevation, resource
>>> deprivation, or some other malicious behaviors?
>>
>> It exposes information that perhaps better wouldn't be exposed. Imo if
>> Xen owned resource state is of interest, it would want exposing via
>> hypfs.
> 
> You didn't answer why, just again expressed your opinion that it is not 
> better exposed.

I'm sorry, but "better wouldn't be exposed" includes the "why" part
already imo: Information should simply not be exposed unduly. For
every bit of exposed information, there ought to be a reason (and
then the right vehicle used for exposure).

>>>>>    It is also quite
>>>>> obvious from XSM_TARGET that it has broken device model stubdoms too.
>>>>
>>>> Why would that be "obvious"? What business would a stubdom have to look at
>>>> Xen's side of an evtchn?
>>>
>>> Again, you have not expressed why it shouldn't be able to do so.
>>
>> See above - not its resource, nor its guest's.
> 
> It is a resource provided to a domain that the domain can send/raise an 
> event to and a backing domain that can bind to it, ie. the two 
> parameters that must be passed to the allocation call.

I don't think so: As per above (my understanding may be wrong), it's only
the other side of the connection which is available for use by a domain.
Over night I was pretty close to admitting a mistake there, but upon re-
checking of the sources I could only find this view of mine supported.
Which doesn't mean I'm viewing things correctly; please point out my
mistake if there is any.

>>>>> Whether to return information about a xen-owned evtchn is a matter of policy,
>>>>> and it's not acceptable to short circuit the XSM on the matter.
>>>>
>>>> I can certainly accept this as one possible view point. As in so many cases
>>>> I'm afraid I dislike you putting it as if it was the only possible one.
>>>
>>> In fact, this commit is in violation of the XSM. It hard-codes a
>>> resource access check outside XSM, thus breaking the fine-grained access
>>> control of FLASK.
>>
>> Perhaps; see below and see the question raised in the subsequent reply
>> to the patch.
>>
>>>> In summary: The supposed justification you claim is missing in the original
>>>> change is imo also missing here then: What business would any entity in the
>>>> system have to look at Xen's side of an event channel? Back at the time, 3
>>>> people agreed that it's "none".
>>>
>>> As stated, you provided no reason or justification for "has no business"
>>> and by face value is an opinion that a few people agreed with. As for
>>> why, there could be a myriad number of reasons a domain may want to
>>> check the status of an interface it has with the hypervisor. From just
>>> logging its state for debug to throttling attempts at sending an event.
>>> So why, from a security/access control decision, does this access have
>>> to absolutely blocked, even from FLASK?
>>
>> I didn't say it absolutely needs to be blocked. I'm okay to become
>> convinced otherwise. But in the description complaining about lack of
>> reasons in the 3-4 year old change, just to then again not provide any
>> reasons looks "interesting" to me. (And no, just to take that example,
>> lsevtchn not working anymore on such channels is not on its own a
>> reason. As indicated, it may well be that conceptually it was never
>> supposed to be able to have access to this information. The latest not
>> anymore when hypfs was introduced.)
> 
> This broke an existing behavior, whether that behavior is correct can 
> always be questioned, does not justify leaving an incorrect 
> implementation. And it is incorrect because as again you have not 
> articulated why the lsevtchn behavior is wrong and thus whether this is 
> the valid corrective action.

Again - if lsevtchn is supposed to be able to access Xen-internal
resources, _that_ is what needs justifying. Otherwise my take is that
it is supposed to only access domain resources.

Jan
Jan Beulich April 4, 2024, 8:11 a.m. UTC | #11
On 03.04.2024 15:31, Daniel P. Smith wrote:
> On 4/3/24 07:54, Jan Beulich wrote:
>> On 03.04.2024 13:50, Daniel P. Smith wrote:
>>> On 4/3/24 02:52, Jan Beulich wrote:
>>>> On 03.04.2024 08:16, Jan Beulich wrote:
>>>>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>>>>> Whether to return information about a xen-owned evtchn is a matter of policy,
>>>>>> and it's not acceptable to short circuit the XSM on the matter.
>>>>>
>>>>> I can certainly accept this as one possible view point. As in so many cases
>>>>> I'm afraid I dislike you putting it as if it was the only possible one.
>>>>
>>>> Further to this: Is there even a way to express the same denial in XSM?
>>>> alloc_unbound_xen_event_channel() doesn't specifically "mark" such a
>>>> channel, and (yes, it could in principle be open-coded in Flask code)
>>>> consumer_is_xen() is private to event_channel.c. I also dare to question
>>>> whether in SILO mode status information like this should be available.
>>>
>>> To build on the previous response: if the natural failure return value
>>> is -EACCESS in response to a domain resource access attempt, then the
>>> probability is extremely high that it should be implemented under a XSM
>>> hook and not hard-coded into the resource logic.
>>
>> Possibly. But first of all - could you answer the earlier question I raised?
> 
> Don't need to, this change subverts/violates the access control 
> framework. If the desire is to make this access decision for the 
> default/dummy policy, then codify it there. Otherwise I will be ack'ing 
> this change since it is access control and falls under the purview of XSM.

If Xen internals like this are to be exposable (and controlled by XSM), why
would other Xen internals not similarly be (optionally) exposed?

Further, since above you referred to EACCES being what XSM is supposed to
control: xsm_default_action() used EPERM, and (presumably; too long ago)
EACCES was chosen here precisely to make it not look like an XSM surrogate.

Jan
Jan Beulich April 5, 2024, 5:59 a.m. UTC | #12
On 03.04.2024 15:27, Daniel P. Smith wrote:
> On 4/3/24 08:05, Jan Beulich wrote:
>> On 03.04.2024 13:10, Daniel P. Smith wrote:
>>> On 4/3/24 02:16, Jan Beulich wrote:
>>>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>>>>    It is also quite
>>>>> obvious from XSM_TARGET that it has broken device model stubdoms too.
>>>>
>>>> Why would that be "obvious"? What business would a stubdom have to look at
>>>> Xen's side of an evtchn?
>>>
>>> Again, you have not expressed why it shouldn't be able to do so.
>>
>> See above - not its resource, nor its guest's.
> 
> It is a resource provided to a domain that the domain can send/raise an 
> event to and a backing domain that can bind to it, ie. the two 
> parameters that must be passed to the allocation call.

Before writing this particular part of your reply, did you look as
evtchn_send()? Sending on such ports is similarly denied without
involving XSM. For a good reason, stated in the accompanying comment.
It is therefore simply inconsistent to allow any kind of other
operation on such ports. Hence the patch that Andrew now deems needs
reverting.

In fact I view these ports living in the guest's event channel space
as similarly inappropriate as the ioreq pages - until a few years
back - living in the guest's GFN space.

Jan
Jan Beulich May 14, 2024, 9:25 a.m. UTC | #13
On 03.04.2024 08:16, Jan Beulich wrote:
> On 02.04.2024 19:06, Andrew Cooper wrote:
>> The commit makes a claim without any kind of justification.
> 
> Well, what does "have no business" leave open?
> 
>> The claim is false, and the commit broke lsevtchn in dom0.
> 
> Or alternatively lsevtchn was doing something that was never meant to work
> (from Xen's perspective).
> 
>>  It is also quite
>> obvious from XSM_TARGET that it has broken device model stubdoms too.
> 
> Why would that be "obvious"? What business would a stubdom have to look at
> Xen's side of an evtchn?
> 
>> Whether to return information about a xen-owned evtchn is a matter of policy,
>> and it's not acceptable to short circuit the XSM on the matter.
> 
> I can certainly accept this as one possible view point. As in so many cases
> I'm afraid I dislike you putting it as if it was the only possible one.
> 
> In summary: The supposed justification you claim is missing in the original
> change is imo also missing here then: What business would any entity in the
> system have to look at Xen's side of an event channel? Back at the time, 3
> people agreed that it's "none".

You've never responded to this reply of mine, or its follow-up. You also
didn't chime in on the discussion Daniel and I were having. I consider my
objections unaddressed, and in fact I continue to consider the change to
be wrong. Therefore it was inappropriate for you to commit it; it needs
reverting asap. If you're not going to do so, I will.

Jan
Andrew Cooper May 14, 2024, 9:51 a.m. UTC | #14
On 14/05/2024 10:25 am, Jan Beulich wrote:
> On 03.04.2024 08:16, Jan Beulich wrote:
>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>> The commit makes a claim without any kind of justification.
>> Well, what does "have no business" leave open?
>>
>>> The claim is false, and the commit broke lsevtchn in dom0.
>> Or alternatively lsevtchn was doing something that was never meant to work
>> (from Xen's perspective).
>>
>>>  It is also quite
>>> obvious from XSM_TARGET that it has broken device model stubdoms too.
>> Why would that be "obvious"? What business would a stubdom have to look at
>> Xen's side of an evtchn?
>>
>>> Whether to return information about a xen-owned evtchn is a matter of policy,
>>> and it's not acceptable to short circuit the XSM on the matter.
>> I can certainly accept this as one possible view point. As in so many cases
>> I'm afraid I dislike you putting it as if it was the only possible one.
>>
>> In summary: The supposed justification you claim is missing in the original
>> change is imo also missing here then: What business would any entity in the
>> system have to look at Xen's side of an event channel? Back at the time, 3
>> people agreed that it's "none".
> You've never responded to this reply of mine, or its follow-up. You also
> didn't chime in on the discussion Daniel and I were having. I consider my
> objections unaddressed, and in fact I continue to consider the change to
> be wrong. Therefore it was inappropriate for you to commit it; it needs
> reverting asap. If you're not going to do so, I will.

You tried defending breaking a utility with "well it shouldn't exist then".

You don't have a leg to stand on, and two maintainers of relevant
subsystems here just got tired of bullshit being presented in place of
any credible argument for having done the change in the way you did.

The correct response was "Sorry I broke things.  Lets revert this for
now to unbreak, and I'll see about reworking it to not intentionally
subvert Xen's security mechanism".

As it stands, you're 2-1 outvoted, and wasted any sympathy I may have
had for the principle of the change based on the absurdity of your
arguments.

~Andrew
Jan Beulich May 14, 2024, 10:03 a.m. UTC | #15
On 14.05.2024 11:51, Andrew Cooper wrote:
> On 14/05/2024 10:25 am, Jan Beulich wrote:
>> On 03.04.2024 08:16, Jan Beulich wrote:
>>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>>> The commit makes a claim without any kind of justification.
>>> Well, what does "have no business" leave open?
>>>
>>>> The claim is false, and the commit broke lsevtchn in dom0.
>>> Or alternatively lsevtchn was doing something that was never meant to work
>>> (from Xen's perspective).
>>>
>>>>  It is also quite
>>>> obvious from XSM_TARGET that it has broken device model stubdoms too.
>>> Why would that be "obvious"? What business would a stubdom have to look at
>>> Xen's side of an evtchn?
>>>
>>>> Whether to return information about a xen-owned evtchn is a matter of policy,
>>>> and it's not acceptable to short circuit the XSM on the matter.
>>> I can certainly accept this as one possible view point. As in so many cases
>>> I'm afraid I dislike you putting it as if it was the only possible one.
>>>
>>> In summary: The supposed justification you claim is missing in the original
>>> change is imo also missing here then: What business would any entity in the
>>> system have to look at Xen's side of an event channel? Back at the time, 3
>>> people agreed that it's "none".
>> You've never responded to this reply of mine, or its follow-up. You also
>> didn't chime in on the discussion Daniel and I were having. I consider my
>> objections unaddressed, and in fact I continue to consider the change to
>> be wrong. Therefore it was inappropriate for you to commit it; it needs
>> reverting asap. If you're not going to do so, I will.
> 
> You tried defending breaking a utility with "well it shouldn't exist then".
> 
> You don't have a leg to stand on, and two maintainers of relevant
> subsystems here just got tired of bullshit being presented in place of
> any credible argument for having done the change in the way you did.

Please can you finally get into the habit of not sending rude replies?

> The correct response was "Sorry I broke things.  Lets revert this for
> now to unbreak, and I'll see about reworking it to not intentionally
> subvert Xen's security mechanism".

I'm sorry, but I didn't break things. I made things more consistent with
the earlier change, as pointed out before: With your revert,
evtchn_status() is now (again) inconsistent with e.g. evtchn_send(). If
you were serious about this being something that needs leaving to XSM,
you'd have adjusted such further uses of consumer_is_xen() as well. But
you aren't. You're merely insisting on lsevtchn needing to continue to
work in a way it should never have worked, with a patch to improve the
situation already pending.

Just to state a very basic principle here again: Xen-internal event
channels ought to either be fully under XSM control when it comes to
domains attempting to access them (in whichever way), or they should
truly be Xen-internal, with access uniformly prevented. To me the
former option simply makes very little sense.

> As it stands, you're 2-1 outvoted, and wasted any sympathy I may have
> had for the principle of the change based on the absurdity of your
> arguments.

No, pending objections are pending objections. Daniel's responses didn't
eliminate them.

As a separate aspect: I can't assume anymore that it is just coincidence
that you taking such a controversial action is at a time when I'm away.

Jan
Julien Grall May 14, 2024, 11:13 a.m. UTC | #16
Hi,

(+ Oleksii as the release manager)

Chiming into the discussion as there seems there is disagreement.

On 14/05/2024 11:03, Jan Beulich wrote:
> On 14.05.2024 11:51, Andrew Cooper wrote:
>> On 14/05/2024 10:25 am, Jan Beulich wrote:
>>> On 03.04.2024 08:16, Jan Beulich wrote:
>>>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>>>> The commit makes a claim without any kind of justification.
>>>> Well, what does "have no business" leave open?
>>>>
>>>>> The claim is false, and the commit broke lsevtchn in dom0.
>>>> Or alternatively lsevtchn was doing something that was never meant to work
>>>> (from Xen's perspective).
>>>>
>>>>>   It is also quite
>>>>> obvious from XSM_TARGET that it has broken device model stubdoms too.
>>>> Why would that be "obvious"? What business would a stubdom have to look at
>>>> Xen's side of an evtchn?
>>>>
>>>>> Whether to return information about a xen-owned evtchn is a matter of policy,
>>>>> and it's not acceptable to short circuit the XSM on the matter.
>>>> I can certainly accept this as one possible view point. As in so many cases
>>>> I'm afraid I dislike you putting it as if it was the only possible one.
>>>>
>>>> In summary: The supposed justification you claim is missing in the original
>>>> change is imo also missing here then: What business would any entity in the
>>>> system have to look at Xen's side of an event channel? Back at the time, 3
>>>> people agreed that it's "none".
>>> You've never responded to this reply of mine, or its follow-up. You also
>>> didn't chime in on the discussion Daniel and I were having. I consider my
>>> objections unaddressed, and in fact I continue to consider the change to
>>> be wrong. Therefore it was inappropriate for you to commit it; it needs
>>> reverting asap. If you're not going to do so, I will.
>>
>> You tried defending breaking a utility with "well it shouldn't exist then".
>>
>> You don't have a leg to stand on, and two maintainers of relevant
>> subsystems here just got tired of bullshit being presented in place of
>> any credible argument for having done the change in the way you did.
> 
> Please can you finally get into the habit of not sending rude replies?
> 
>> The correct response was "Sorry I broke things.  Lets revert this for
>> now to unbreak, and I'll see about reworking it to not intentionally
>> subvert Xen's security mechanism".
> 
> I'm sorry, but I didn't break things. I made things more consistent with
> the earlier change, as pointed out before: With your revert,
> evtchn_status() is now (again) inconsistent with e.g. evtchn_send(). If
> you were serious about this being something that needs leaving to XSM,
> you'd have adjusted such further uses of consumer_is_xen() as well. But
> you aren't. You're merely insisting on lsevtchn needing to continue to
> work in a way it should never have worked, with a patch to improve the
> situation already pending.
> 
> Just to state a very basic principle here again: Xen-internal event
> channels ought to either be fully under XSM control when it comes to
> domains attempting to access them (in whichever way), or they should
> truly be Xen-internal, with access uniformly prevented. To me the
> former option simply makes very little sense.

I agree we need consistency on how we handle security policy event 
channel. Although, I don't have a strong opinion on which way to go.

For the commit message, it is not entirely clear what "broke lseventch 
in dom0" really mean. Is it lsevtchn would not stop or it will just not 
display the event channel?

If the former, isn't a sign that the tool needs to be harden a bit more? 
If the latter, then I would argue that consistency for the XSM policy is 
more important than displaying the event channel for now (the patch was 
also committed 3 years ago...).

So I would vote for a revert and, if desired, replacing with a patch 
that would change the XSM policy consistently. Alternatively, the 
consistency should be a blocker for Xen 4.19.

> 
>> As it stands, you're 2-1 outvoted, and wasted any sympathy I may have
>> had for the principle of the change based on the absurdity of your
>> arguments.
> 
> No, pending objections are pending objections. Daniel's responses didn't
> eliminate them.

Indeed, this is rule 4 of the check-in policy:

4. There must be no "open" objections.

I don't view Jan's objections as unreasonable in particular for the 
consistency part.

> As a separate aspect: I can't assume anymore that it is just coincidence
> that you taking such a controversial action is at a time when I'm away.

Cheers,
Stefano Stabellini May 14, 2024, 9:35 p.m. UTC | #17
On Tue, 14 May 2024, Julien Grall wrote:
> Hi,
> 
> (+ Oleksii as the release manager)
> 
> Chiming into the discussion as there seems there is disagreement.
> 
> On 14/05/2024 11:03, Jan Beulich wrote:
> > On 14.05.2024 11:51, Andrew Cooper wrote:
> > > On 14/05/2024 10:25 am, Jan Beulich wrote:
> > > > On 03.04.2024 08:16, Jan Beulich wrote:
> > > > > On 02.04.2024 19:06, Andrew Cooper wrote:
> > > > > > The commit makes a claim without any kind of justification.
> > > > > Well, what does "have no business" leave open?
> > > > > 
> > > > > > The claim is false, and the commit broke lsevtchn in dom0.
> > > > > Or alternatively lsevtchn was doing something that was never meant to
> > > > > work
> > > > > (from Xen's perspective).
> > > > > 
> > > > > >   It is also quite
> > > > > > obvious from XSM_TARGET that it has broken device model stubdoms
> > > > > > too.
> > > > > Why would that be "obvious"? What business would a stubdom have to
> > > > > look at
> > > > > Xen's side of an evtchn?
> > > > > 
> > > > > > Whether to return information about a xen-owned evtchn is a matter
> > > > > > of policy,
> > > > > > and it's not acceptable to short circuit the XSM on the matter.
> > > > > I can certainly accept this as one possible view point. As in so many
> > > > > cases
> > > > > I'm afraid I dislike you putting it as if it was the only possible
> > > > > one.
> > > > > 
> > > > > In summary: The supposed justification you claim is missing in the
> > > > > original
> > > > > change is imo also missing here then: What business would any entity
> > > > > in the
> > > > > system have to look at Xen's side of an event channel? Back at the
> > > > > time, 3
> > > > > people agreed that it's "none".
> > > > You've never responded to this reply of mine, or its follow-up. You also
> > > > didn't chime in on the discussion Daniel and I were having. I consider
> > > > my
> > > > objections unaddressed, and in fact I continue to consider the change to
> > > > be wrong. Therefore it was inappropriate for you to commit it; it needs
> > > > reverting asap. If you're not going to do so, I will.
> > > 
> > > You tried defending breaking a utility with "well it shouldn't exist
> > > then".
> > > 
> > > You don't have a leg to stand on, and two maintainers of relevant
> > > subsystems here just got tired of bullshit being presented in place of
> > > any credible argument for having done the change in the way you did.
> > 
> > Please can you finally get into the habit of not sending rude replies?
> > 
> > > The correct response was "Sorry I broke things.  Lets revert this for
> > > now to unbreak, and I'll see about reworking it to not intentionally
> > > subvert Xen's security mechanism".
> > 
> > I'm sorry, but I didn't break things. I made things more consistent with
> > the earlier change, as pointed out before: With your revert,
> > evtchn_status() is now (again) inconsistent with e.g. evtchn_send(). If
> > you were serious about this being something that needs leaving to XSM,
> > you'd have adjusted such further uses of consumer_is_xen() as well. But
> > you aren't. You're merely insisting on lsevtchn needing to continue to
> > work in a way it should never have worked, with a patch to improve the
> > situation already pending.
> > 
> > Just to state a very basic principle here again: Xen-internal event
> > channels ought to either be fully under XSM control when it comes to
> > domains attempting to access them (in whichever way), or they should
> > truly be Xen-internal, with access uniformly prevented. To me the
> > former option simply makes very little sense.
> 
> I agree we need consistency on how we handle security policy event channel.
> Although, I don't have a strong opinion on which way to go.

Same here


> For the commit message, it is not entirely clear what "broke lseventch in
> dom0" really mean. Is it lsevtchn would not stop or it will just not display
> the event channel?
> 
> If the former, isn't a sign that the tool needs to be harden a bit more? If
> the latter, then I would argue that consistency for the XSM policy is more
> important than displaying the event channel for now (the patch was also
> committed 3 years ago...).

I realize 3 years have passed and it is a long time, but many
downstreams (including some which are widely used) don't rebase
regularly and we are still missing lots of tests from gitlab-ci. The
unfortunate result is that it can take years to realize there is a
breakage. We need more gitlab-ci (or OSSTest) tests.


> So I would vote for a revert and, if desired, replacing with a patch that
> would change the XSM policy consistently. Alternatively, the consistency
> should be a blocker for Xen 4.19.

I am convinced by Daniel's argument here:

https://marc.info/?l=xen-devel&m=171215093102694
https://marc.info/?l=xen-devel&m=171215073502479

I would ack Andrew's revert. If we decide to revert Andrew's revert, I
also think that we should make the alternative solution, whatever that
might be, a blocker for Xen 4.19.

My favorite alternative solition is Daniel's suggestion of adding a
check to the dummy XSM policy. I am not sure if this is the same thing
you mean with "change the XSM policy consistently".


> > > As it stands, you're 2-1 outvoted, and wasted any sympathy I may have
> > > had for the principle of the change based on the absurdity of your
> > > arguments.
> > 
> > No, pending objections are pending objections. Daniel's responses didn't
> > eliminate them.
> 
> Indeed, this is rule 4 of the check-in policy:
> 
> 4. There must be no "open" objections.
> 
> I don't view Jan's objections as unreasonable in particular for the
> consistency part.
Jan Beulich May 15, 2024, 7:33 a.m. UTC | #18
On 14.05.2024 23:35, Stefano Stabellini wrote:
> On Tue, 14 May 2024, Julien Grall wrote:
>> On 14/05/2024 11:03, Jan Beulich wrote:
>>> On 14.05.2024 11:51, Andrew Cooper wrote:
>>>> You tried defending breaking a utility with "well it shouldn't exist
>>>> then".
>>>>
>>>> You don't have a leg to stand on, and two maintainers of relevant
>>>> subsystems here just got tired of bullshit being presented in place of
>>>> any credible argument for having done the change in the way you did.
>>>
>>> Please can you finally get into the habit of not sending rude replies?
>>>
>>>> The correct response was "Sorry I broke things.  Lets revert this for
>>>> now to unbreak, and I'll see about reworking it to not intentionally
>>>> subvert Xen's security mechanism".
>>>
>>> I'm sorry, but I didn't break things. I made things more consistent with
>>> the earlier change, as pointed out before: With your revert,
>>> evtchn_status() is now (again) inconsistent with e.g. evtchn_send(). If
>>> you were serious about this being something that needs leaving to XSM,
>>> you'd have adjusted such further uses of consumer_is_xen() as well. But
>>> you aren't. You're merely insisting on lsevtchn needing to continue to
>>> work in a way it should never have worked, with a patch to improve the
>>> situation already pending.
>>>
>>> Just to state a very basic principle here again: Xen-internal event
>>> channels ought to either be fully under XSM control when it comes to
>>> domains attempting to access them (in whichever way), or they should
>>> truly be Xen-internal, with access uniformly prevented. To me the
>>> former option simply makes very little sense.
>>
>> I agree we need consistency on how we handle security policy event channel.
>> Although, I don't have a strong opinion on which way to go.
> 
> Same here
> 
> 
>> For the commit message, it is not entirely clear what "broke lseventch in
>> dom0" really mean. Is it lsevtchn would not stop or it will just not display
>> the event channel?
>>
>> If the former, isn't a sign that the tool needs to be harden a bit more? If
>> the latter, then I would argue that consistency for the XSM policy is more
>> important than displaying the event channel for now (the patch was also
>> committed 3 years ago...).
> 
> I realize 3 years have passed and it is a long time, but many
> downstreams (including some which are widely used) don't rebase
> regularly and we are still missing lots of tests from gitlab-ci. The
> unfortunate result is that it can take years to realize there is a
> breakage. We need more gitlab-ci (or OSSTest) tests.
> 
> 
>> So I would vote for a revert and, if desired, replacing with a patch that
>> would change the XSM policy consistently. Alternatively, the consistency
>> should be a blocker for Xen 4.19.
> 
> I am convinced by Daniel's argument here:
> 
> https://marc.info/?l=xen-devel&m=171215093102694

I particularly disagree with the "since it is access control and falls under
the purview of XSM" in there, without addressing my point regarding Xen-
internal resources. It is a fundamental hypervisor decision whether to leave
access to Xen-internal resources to XSM control. If that decision ended up
being "yes", then I agree XSM maintainers may ack a respective change. If
that decision as "no", though, acking would purely fall to REST for code
like what is being touched here.

Just to further clarify: If it was "yes" above, other Xen-internal resources
then also ought to be domain accessible based on XSM policy. I don't think
that's the case e.g. for Xen-private memory.

IOW I can't help the impression that both the patch and the ack were
provided looking at just the one special case, driven by the (perceived)
tool breakage (see below).

> https://marc.info/?l=xen-devel&m=171215073502479

In there he said in particular: "And it is incorrect because as again you
have not articulated why the lsevtchn behavior is wrong and thus whether
this is the valid corrective action." Daniel, did you even look at the code
when saying so? With the revert in place, lsevtchn is still going to fall
on the nose when XSM denies access to a particular channel. I didn't think
this needed calling out explicitly; the tool needs fixing.

> I would ack Andrew's revert. If we decide to revert Andrew's revert, I
> also think that we should make the alternative solution, whatever that
> might be, a blocker for Xen 4.19.
> 
> My favorite alternative solition is Daniel's suggestion of adding a
> check to the dummy XSM policy. I am not sure if this is the same thing
> you mean with "change the XSM policy consistently".

I don't think it would be, but I also don't know what exact check was
thought about. I think I was quite clear about evtchn_send()'s similar
code (there may be more). All of these want to behave the same: All
involving XSM, or all not doing so. This is the kind of thing where I
don't think any "majority" can trump technical aspects. If the verdict was
"XSM", then yes, my original patch would have moved us in the wrong
direction. But then a plain revert is insufficient, and the blaming in
there also should have been done at least differently, if at all.

Jan
Kelly Choi May 15, 2024, 10:49 a.m. UTC | #19
Hi all,

As Stefano has mentioned, we have the maintainers and committers call later
today.
Let's use this time to better collaborate and discuss any differences in
opinions we have. It will also give everyone a chance to explain their
viewpoints.

Andy, please can I remind you to keep the language clean and professional
in line with our standards as a community.

Many thanks,
Kelly Choi

Community Manager
Xen Project


On Tue, May 14, 2024 at 10:51 AM Andrew Cooper <andrew.cooper3@citrix.com>
wrote:

> On 14/05/2024 10:25 am, Jan Beulich wrote:
> > On 03.04.2024 08:16, Jan Beulich wrote:
> >> On 02.04.2024 19:06, Andrew Cooper wrote:
> >>> The commit makes a claim without any kind of justification.
> >> Well, what does "have no business" leave open?
> >>
> >>> The claim is false, and the commit broke lsevtchn in dom0.
> >> Or alternatively lsevtchn was doing something that was never meant to
> work
> >> (from Xen's perspective).
> >>
> >>>  It is also quite
> >>> obvious from XSM_TARGET that it has broken device model stubdoms too.
> >> Why would that be "obvious"? What business would a stubdom have to look
> at
> >> Xen's side of an evtchn?
> >>
> >>> Whether to return information about a xen-owned evtchn is a matter of
> policy,
> >>> and it's not acceptable to short circuit the XSM on the matter.
> >> I can certainly accept this as one possible view point. As in so many
> cases
> >> I'm afraid I dislike you putting it as if it was the only possible one.
> >>
> >> In summary: The supposed justification you claim is missing in the
> original
> >> change is imo also missing here then: What business would any entity in
> the
> >> system have to look at Xen's side of an event channel? Back at the
> time, 3
> >> people agreed that it's "none".
> > You've never responded to this reply of mine, or its follow-up. You also
> > didn't chime in on the discussion Daniel and I were having. I consider my
> > objections unaddressed, and in fact I continue to consider the change to
> > be wrong. Therefore it was inappropriate for you to commit it; it needs
> > reverting asap. If you're not going to do so, I will.
>
> You tried defending breaking a utility with "well it shouldn't exist then".
>
> You don't have a leg to stand on, and two maintainers of relevant
> subsystems here just got tired of bullshit being presented in place of
> any credible argument for having done the change in the way you did.
>
> The correct response was "Sorry I broke things.  Lets revert this for
> now to unbreak, and I'll see about reworking it to not intentionally
> subvert Xen's security mechanism".
>
> As it stands, you're 2-1 outvoted, and wasted any sympathy I may have
> had for the principle of the change based on the absurdity of your
> arguments.
>
> ~Andrew
>
George Dunlap May 15, 2024, 12:59 p.m. UTC | #20
On Tue, May 14, 2024 at 10:51 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 14/05/2024 10:25 am, Jan Beulich wrote:
> > On 03.04.2024 08:16, Jan Beulich wrote:
> >> On 02.04.2024 19:06, Andrew Cooper wrote:
> >>> The commit makes a claim without any kind of justification.
> >> Well, what does "have no business" leave open?
> >>
> >>> The claim is false, and the commit broke lsevtchn in dom0.
> >> Or alternatively lsevtchn was doing something that was never meant to work
> >> (from Xen's perspective).
> >>
> >>>  It is also quite
> >>> obvious from XSM_TARGET that it has broken device model stubdoms too.
> >> Why would that be "obvious"? What business would a stubdom have to look at
> >> Xen's side of an evtchn?
> >>
> >>> Whether to return information about a xen-owned evtchn is a matter of policy,
> >>> and it's not acceptable to short circuit the XSM on the matter.
> >> I can certainly accept this as one possible view point. As in so many cases
> >> I'm afraid I dislike you putting it as if it was the only possible one.
> >>
> >> In summary: The supposed justification you claim is missing in the original
> >> change is imo also missing here then: What business would any entity in the
> >> system have to look at Xen's side of an event channel? Back at the time, 3
> >> people agreed that it's "none".
> > You've never responded to this reply of mine, or its follow-up. You also
> > didn't chime in on the discussion Daniel and I were having. I consider my
> > objections unaddressed, and in fact I continue to consider the change to
> > be wrong. Therefore it was inappropriate for you to commit it; it needs
> > reverting asap. If you're not going to do so, I will.
>
> You tried defending breaking a utility with "well it shouldn't exist then".
>
> You don't have a leg to stand on, and two maintainers of relevant
> subsystems here just got tired of bullshit being presented in place of
> any credible argument for having done the change in the way you did.
>
> The correct response was "Sorry I broke things.  Lets revert this for
> now to unbreak, and I'll see about reworking it to not intentionally
> subvert Xen's security mechanism".
>
> As it stands, you're 2-1 outvoted, and wasted any sympathy I may have
> had for the principle of the change based on the absurdity of your
> arguments.

We have a simple process for dealing with this situation, Andy, which
you didn't follow.  You can't just go checking things in because you
feel strongly about it.

 -George
Jan Beulich May 16, 2024, 6:41 a.m. UTC | #21
On 14.05.2024 11:25, Jan Beulich wrote:
> On 03.04.2024 08:16, Jan Beulich wrote:
>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>> The commit makes a claim without any kind of justification.
>>
>> Well, what does "have no business" leave open?
>>
>>> The claim is false, and the commit broke lsevtchn in dom0.
>>
>> Or alternatively lsevtchn was doing something that was never meant to work
>> (from Xen's perspective).
>>
>>>  It is also quite
>>> obvious from XSM_TARGET that it has broken device model stubdoms too.
>>
>> Why would that be "obvious"? What business would a stubdom have to look at
>> Xen's side of an evtchn?
>>
>>> Whether to return information about a xen-owned evtchn is a matter of policy,
>>> and it's not acceptable to short circuit the XSM on the matter.
>>
>> I can certainly accept this as one possible view point. As in so many cases
>> I'm afraid I dislike you putting it as if it was the only possible one.
>>
>> In summary: The supposed justification you claim is missing in the original
>> change is imo also missing here then: What business would any entity in the
>> system have to look at Xen's side of an event channel? Back at the time, 3
>> people agreed that it's "none".
> 
> You've never responded to this reply of mine, or its follow-up. You also
> didn't chime in on the discussion Daniel and I were having. I consider my
> objections unaddressed, and in fact I continue to consider the change to
> be wrong. Therefore it was inappropriate for you to commit it; it needs
> reverting asap. If you're not going to do so, I will.

For the record: With Andrew having clarified that in his revert's
description:

"The claim is false; it broke lsevtchn in dom0, a debugging utility which
 absolutely does care about all of the domain's event channels."

"all of the domain's event channels" means to include event channels Xen
uses for its own, and merely puts in the domain's table for ease of
handling, I've agreed that - in the absence of any better debugging
means - the revert may stay in place. For context, my prior understanding
of the issue was that lsevtchn simply stops probing further channels when
getting back any kind of error from EVTCHNOP_status (which I continue to
think wants addressing there, by a future version of a patch that was
already sent). However, there are follow-on concerns I have:

1) In the discussion George claimed that exposing status information in
an uncontrolled manner is okay. I'm afraid I have to disagree, seeing
how a similar assumption by CPU designers has led to a flood of
vulnerabilities over the last 6+ years. Information exposure imo is never
okay, unless it can be _proven_ that absolutely nothing "useful" can be
inferred from it. (I'm having difficulty seeing how such a proof might
look like.)

2) Me pointing out that the XSM hook might similarly get in the way of
debugging, Andrew suggested that this is not an issue because any sensible
XSM policy used in such an environment would grant sufficient privilege to
Dom0. Yet that then still doesn't cover why DomU-s also can obtain status
for Xen-internal event channels. The debugging argument then becomes weak,
as in that case the XSM hook is possibly going to get in the way.

3) In the discussion Andrew further gave the impression that evtchn_send()
had no XSM check. Yet it has; the difference to evtchn_status() is that
the latter uses XSM_TARGET while the former uses XSM_HOOK. (Much like
evtchn_status() may indeed be useful for debugging, evtchn_send() may be
similarly useful to allow getting a stuck channel unstuck.)

In summary I continue to think that an outright revert was inappropriate.
DomU-s should continue to be denied status information on Xen-internal
event channels, unconditionally and independent of whether dummy, silo, or
Flask is in use.

Plus, as stated before, evtchn_send() would better remain in sync in this
regard with evtchn_status(). The situation is less clear for evtchn_close()
and evtchn_bind_vcpu(): Those indeed have no XSM checks while they do deny
operation on Xen-internal channels. It is likely more far fetched to
assume permitting them for debugging purposes might in fact help in rare
situations. Yet it may still be a matter of consistency to keep them in
sync with the other two. (Note that there's also evtchn_usable(), which
might then also need tweaking itself or at the use sites.)

FTR, it is going to be only then that I would consider the cumulative
result as eligible for backporting. For this purpose, at the risk of
being flamed again, it might still be easier to revert the revert and then
put in place a change meeting the above criteria. That could then be taken
for backport as is.

Jan
Oleksii K. May 16, 2024, 7:15 p.m. UTC | #22
On Tue, 2024-05-14 at 12:13 +0100, Julien Grall wrote:
> Hi,
> 
> (+ Oleksii as the release manager)
> 
> Chiming into the discussion as there seems there is disagreement.
> 
> On 14/05/2024 11:03, Jan Beulich wrote:
> > On 14.05.2024 11:51, Andrew Cooper wrote:
> > > On 14/05/2024 10:25 am, Jan Beulich wrote:
> > > > On 03.04.2024 08:16, Jan Beulich wrote:
> > > > > On 02.04.2024 19:06, Andrew Cooper wrote:
> > > > > > The commit makes a claim without any kind of justification.
> > > > > Well, what does "have no business" leave open?
> > > > > 
> > > > > > The claim is false, and the commit broke lsevtchn in dom0.
> > > > > Or alternatively lsevtchn was doing something that was never
> > > > > meant to work
> > > > > (from Xen's perspective).
> > > > > 
> > > > > >   It is also quite
> > > > > > obvious from XSM_TARGET that it has broken device model
> > > > > > stubdoms too.
> > > > > Why would that be "obvious"? What business would a stubdom
> > > > > have to look at
> > > > > Xen's side of an evtchn?
> > > > > 
> > > > > > Whether to return information about a xen-owned evtchn is a
> > > > > > matter of policy,
> > > > > > and it's not acceptable to short circuit the XSM on the
> > > > > > matter.
> > > > > I can certainly accept this as one possible view point. As in
> > > > > so many cases
> > > > > I'm afraid I dislike you putting it as if it was the only
> > > > > possible one.
> > > > > 
> > > > > In summary: The supposed justification you claim is missing
> > > > > in the original
> > > > > change is imo also missing here then: What business would any
> > > > > entity in the
> > > > > system have to look at Xen's side of an event channel? Back
> > > > > at the time, 3
> > > > > people agreed that it's "none".
> > > > You've never responded to this reply of mine, or its follow-up.
> > > > You also
> > > > didn't chime in on the discussion Daniel and I were having. I
> > > > consider my
> > > > objections unaddressed, and in fact I continue to consider the
> > > > change to
> > > > be wrong. Therefore it was inappropriate for you to commit it;
> > > > it needs
> > > > reverting asap. If you're not going to do so, I will.
> > > 
> > > You tried defending breaking a utility with "well it shouldn't
> > > exist then".
> > > 
> > > You don't have a leg to stand on, and two maintainers of relevant
> > > subsystems here just got tired of bullshit being presented in
> > > place of
> > > any credible argument for having done the change in the way you
> > > did.
> > 
> > Please can you finally get into the habit of not sending rude
> > replies?
> > 
> > > The correct response was "Sorry I broke things.  Lets revert this
> > > for
> > > now to unbreak, and I'll see about reworking it to not
> > > intentionally
> > > subvert Xen's security mechanism".
> > 
> > I'm sorry, but I didn't break things. I made things more consistent
> > with
> > the earlier change, as pointed out before: With your revert,
> > evtchn_status() is now (again) inconsistent with e.g.
> > evtchn_send(). If
> > you were serious about this being something that needs leaving to
> > XSM,
> > you'd have adjusted such further uses of consumer_is_xen() as well.
> > But
> > you aren't. You're merely insisting on lsevtchn needing to continue
> > to
> > work in a way it should never have worked, with a patch to improve
> > the
> > situation already pending.
> > 
> > Just to state a very basic principle here again: Xen-internal event
> > channels ought to either be fully under XSM control when it comes
> > to
> > domains attempting to access them (in whichever way), or they
> > should
> > truly be Xen-internal, with access uniformly prevented. To me the
> > former option simply makes very little sense.
> 
> I agree we need consistency on how we handle security policy event 
> channel. Although, I don't have a strong opinion on which way to go.
> 
> For the commit message, it is not entirely clear what "broke
> lseventch 
> in dom0" really mean. Is it lsevtchn would not stop or it will just
> not 
> display the event channel?
> 
> If the former, isn't a sign that the tool needs to be harden a bit
> more? 
> If the latter, then I would argue that consistency for the XSM policy
> is 
> more important than displaying the event channel for now (the patch
> was 
> also committed 3 years ago...).
> 
> So I would vote for a revert and, if desired, replacing with a patch 
> that would change the XSM policy consistently. Alternatively, the 
> consistency should be a blocker for Xen 4.19.
Sorry for the delayed response.

I am not deeply familiar with the technical details surrounding XSM,
but if I understand Daniel's point correctly, the original change
violates the access control framework. This suggests to me that the
revert should be merged.

However, I have a question: if we merge this revert, does it mean that
using channels a user ( domain ) will be able to get information about
certain events such as EVTCHNSTAT_unbound, EVTCHNSTAT_interdomain,
EVTCHNSTAT_pirq, EVTCHNSTAT_virq, and EVTCHNSTAT_IPI (based on the code
of lseventch.c)? Is this information really so critical that it cannot
be exposed for some time until a patch that changes the XSM policy
consistently is provided and merged?

If this information is indeed critical and should not be exposed, I
think we can consider Daniel's suggestion to add a check to the dummy
XSM policy as a solution.

~ Oleksii
> 
> > 
> > > As it stands, you're 2-1 outvoted, and wasted any sympathy I may
> > > have
> > > had for the principle of the change based on the absurdity of
> > > your
> > > arguments.
> > 
> > No, pending objections are pending objections. Daniel's responses
> > didn't
> > eliminate them.
> 
> Indeed, this is rule 4 of the check-in policy:
> 
> 4. There must be no "open" objections.
> 
> I don't view Jan's objections as unreasonable in particular for the 
> consistency part.
> 
> > As a separate aspect: I can't assume anymore that it is just
> > coincidence
> > that you taking such a controversial action is at a time when I'm
> > away.
> 
> Cheers,
>
Stefano Stabellini May 17, 2024, 1:21 a.m. UTC | #23
On Thu, 16 May 2024, Jan Beulich wrote:
> 1) In the discussion George claimed that exposing status information in
> an uncontrolled manner is okay. I'm afraid I have to disagree, seeing
> how a similar assumption by CPU designers has led to a flood of
> vulnerabilities over the last 6+ years. Information exposure imo is never
> okay, unless it can be _proven_ that absolutely nothing "useful" can be
> inferred from it. (I'm having difficulty seeing how such a proof might
> look like.)

Many would agree that it is better not to expose status information in
an uncontrolled manner. Anyway, let's focus on the actionable.


> 2) Me pointing out that the XSM hook might similarly get in the way of
> debugging, Andrew suggested that this is not an issue because any sensible
> XSM policy used in such an environment would grant sufficient privilege to
> Dom0. Yet that then still doesn't cover why DomU-s also can obtain status
> for Xen-internal event channels. The debugging argument then becomes weak,
> as in that case the XSM hook is possibly going to get in the way.
>
> 3) In the discussion Andrew further gave the impression that evtchn_send()
> had no XSM check. Yet it has; the difference to evtchn_status() is that
> the latter uses XSM_TARGET while the former uses XSM_HOOK. (Much like
> evtchn_status() may indeed be useful for debugging, evtchn_send() may be
> similarly useful to allow getting a stuck channel unstuck.)
> 
> In summary I continue to think that an outright revert was inappropriate.
> DomU-s should continue to be denied status information on Xen-internal
> event channels, unconditionally and independent of whether dummy, silo, or
> Flask is in use.

I think DomU-s should continue to be denied status information on
Xen-internal event channels *based on the default dummy, silo, or Flask
policy*. It is not up to us to decide the security policy, only to
enforce it and provide sensible defaults.

In any case, the XSM_TARGET check in evtchn_status seems to do what we
want?

evtchn_send uses XSM_HOOK, which is weaker, but it doesn't seem to be an
issue because (ignoring the consumer_is_xen check) there is a if(!lchn)
check that would fail on invalid event channels?


> Plus, as stated before, evtchn_send() would better remain in sync in this
> regard with evtchn_status(). The situation is less clear for evtchn_close()
> and evtchn_bind_vcpu(): Those indeed have no XSM checks while they do deny
> operation on Xen-internal channels. It is likely more far fetched to
> assume permitting them for debugging purposes might in fact help in rare
> situations. Yet it may still be a matter of consistency to keep them in
> sync with the other two. (Note that there's also evtchn_usable(), which
> might then also need tweaking itself or at the use sites.)
> 
> FTR, it is going to be only then that I would consider the cumulative
> result as eligible for backporting. For this purpose, at the risk of
> being flamed again, it might still be easier to revert the revert and then
> put in place a change meeting the above criteria. That could then be taken
> for backport as is.

I think we want to keep the revert because we need to unbreak lsevtchn.c.
Your point about consistency is valid and it would be good to go in that
direction but to me it is not the kind of thing that we would make
release-blocking.
Daniel P. Smith May 17, 2024, 1:22 a.m. UTC | #24
On 5/16/24 02:41, Jan Beulich wrote:
> On 14.05.2024 11:25, Jan Beulich wrote:
>> On 03.04.2024 08:16, Jan Beulich wrote:
>>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>>> The commit makes a claim without any kind of justification.
>>>
>>> Well, what does "have no business" leave open?
>>>
>>>> The claim is false, and the commit broke lsevtchn in dom0.
>>>
>>> Or alternatively lsevtchn was doing something that was never meant to work
>>> (from Xen's perspective).
>>>
>>>>   It is also quite
>>>> obvious from XSM_TARGET that it has broken device model stubdoms too.
>>>
>>> Why would that be "obvious"? What business would a stubdom have to look at
>>> Xen's side of an evtchn?
>>>
>>>> Whether to return information about a xen-owned evtchn is a matter of policy,
>>>> and it's not acceptable to short circuit the XSM on the matter.
>>>
>>> I can certainly accept this as one possible view point. As in so many cases
>>> I'm afraid I dislike you putting it as if it was the only possible one.
>>>
>>> In summary: The supposed justification you claim is missing in the original
>>> change is imo also missing here then: What business would any entity in the
>>> system have to look at Xen's side of an event channel? Back at the time, 3
>>> people agreed that it's "none".
>>
>> You've never responded to this reply of mine, or its follow-up. You also
>> didn't chime in on the discussion Daniel and I were having. I consider my
>> objections unaddressed, and in fact I continue to consider the change to
>> be wrong. Therefore it was inappropriate for you to commit it; it needs
>> reverting asap. If you're not going to do so, I will.
> 
> For the record: With Andrew having clarified that in his revert's
> description:
> 
> "The claim is false; it broke lsevtchn in dom0, a debugging utility which
>   absolutely does care about all of the domain's event channels."
> 
> "all of the domain's event channels" means to include event channels Xen
> uses for its own, and merely puts in the domain's table for ease of
> handling, I've agreed that - in the absence of any better debugging
> means - the revert may stay in place. For context, my prior understanding
> of the issue was that lsevtchn simply stops probing further channels when
> getting back any kind of error from EVTCHNOP_status (which I continue to
> think wants addressing there, by a future version of a patch that was
> already sent). However, there are follow-on concerns I have:
> 
> 1) In the discussion George claimed that exposing status information in
> an uncontrolled manner is okay. I'm afraid I have to disagree, seeing
> how a similar assumption by CPU designers has led to a flood of
> vulnerabilities over the last 6+ years. Information exposure imo is never
> okay, unless it can be _proven_ that absolutely nothing "useful" can be
> inferred from it. (I'm having difficulty seeing how such a proof might
> look like.)

Jan, I would agree with you that any resources exposed to the guest, 
even if that resource is status information, should be behind an XSM 
check. I would, however, have to disagree the position over proving 
absolutely nothing "useful" can be inferred. Prior to spectre becoming 
commonly aware, no one considered proving there needed to be protections 
against out-of-order instruction execution being used to bypass branch 
conditions. Predicting the future will more often than not fail, but 
with an XSM check in place, the dummy/default policy can just be updated 
with the general safe decision and others can use FLASK to provide fine 
grain access.

> 2) Me pointing out that the XSM hook might similarly get in the way of
> debugging, Andrew suggested that this is not an issue because any sensible
> XSM policy used in such an environment would grant sufficient privilege to
> Dom0. Yet that then still doesn't cover why DomU-s also can obtain status
> for Xen-internal event channels. The debugging argument then becomes weak,
> as in that case the XSM hook is possibly going to get in the way.

And this is where we have a fundamental difference. Event channels are 
XSM labeled objects that are always connected to a guest. If they were 
truly Xen-internal, then a guest would have no way to get 
access/reference them. Again, I never disagreed with whether the guest 
should or should not be able to access them. I did ask for a better 
explanation than, "has no business", which is a statement of opinion not 
of fact or reason. The point is these event channels are a resource 
attached to the guest and access control decisions to them should be 
made under XSM. Injecting a hard-coded access control in front of the 
XSM check subverted/invalidated rules in the existing FLASK policy.

> 3) In the discussion Andrew further gave the impression that evtchn_send()
> had no XSM check. Yet it has; the difference to evtchn_status() is that
> the latter uses XSM_TARGET while the former uses XSM_HOOK. (Much like
> evtchn_status() may indeed be useful for debugging, evtchn_send() may be
> similarly useful to allow getting a stuck channel unstuck.)

A more appropriate default might be XSM_OTHER with conditions with in 
the hook  as to whether the policy should be XSM_HOOK, XSM_TARGET or 
flat denial.

> In summary I continue to think that an outright revert was inappropriate.
> DomU-s should continue to be denied status information on Xen-internal
> event channels, unconditionally and independent of whether dummy, silo, or
> Flask is in use.

Any guest facing resources, regardless if the backing end is the 
hypervisor, should be protected with XSM. This allows the maintainers to 
codify what they believe is a sane policy in the dummy policy and the 
end user can use FLASK to enforce what they believe is acceptable.

> Plus, as stated before, evtchn_send() would better remain in sync in this
> regard with evtchn_status(). The situation is less clear for evtchn_close()
> and evtchn_bind_vcpu(): Those indeed have no XSM checks while they do deny
> operation on Xen-internal channels. It is likely more far fetched to
> assume permitting them for debugging purposes might in fact help in rare
> situations. Yet it may still be a matter of consistency to keep them in
> sync with the other two. (Note that there's also evtchn_usable(), which
> might then also need tweaking itself or at the use sites.)

Just because that is how it was, doesn't mean it was correct. I had a 
discussion with one of the original authors of FLASK before taking up 
the maintainership and he felt there were likely more XSM checks that 
should have been in place originally. He considered it a first order 
approximation of what should be protected.

v/r,
dps
Jan Beulich May 17, 2024, 7:01 a.m. UTC | #25
On 16.05.2024 21:15, Oleksii K. wrote:
> I am not deeply familiar with the technical details surrounding XSM,
> but if I understand Daniel's point correctly, the original change
> violates the access control framework. This suggests to me that the
> revert should be merged.
> 
> However, I have a question: if we merge this revert, does it mean that
> using channels a user ( domain ) will be able to get information about
> certain events such as EVTCHNSTAT_unbound, EVTCHNSTAT_interdomain,
> EVTCHNSTAT_pirq, EVTCHNSTAT_virq, and EVTCHNSTAT_IPI (based on the code
> of lseventch.c)? Is this information really so critical that it cannot
> be exposed for some time until a patch that changes the XSM policy
> consistently is provided and merged?
> 
> If this information is indeed critical and should not be exposed, I
> think we can consider Daniel's suggestion to add a check to the dummy
> XSM policy as a solution.

The question isn't whether it's critical. As pointed out elsewhere, my
view is that any exposure of information needs to come with a proof that
nothing undue can be derived from that information. I see Daniel has
responded there, so we'll continue the discussion there.

Jan
Jan Beulich May 17, 2024, 7:04 a.m. UTC | #26
On 17.05.2024 03:21, Stefano Stabellini wrote:
> On Thu, 16 May 2024, Jan Beulich wrote:
>> 1) In the discussion George claimed that exposing status information in
>> an uncontrolled manner is okay. I'm afraid I have to disagree, seeing
>> how a similar assumption by CPU designers has led to a flood of
>> vulnerabilities over the last 6+ years. Information exposure imo is never
>> okay, unless it can be _proven_ that absolutely nothing "useful" can be
>> inferred from it. (I'm having difficulty seeing how such a proof might
>> look like.)
> 
> Many would agree that it is better not to expose status information in
> an uncontrolled manner. Anyway, let's focus on the actionable.
> 
> 
>> 2) Me pointing out that the XSM hook might similarly get in the way of
>> debugging, Andrew suggested that this is not an issue because any sensible
>> XSM policy used in such an environment would grant sufficient privilege to
>> Dom0. Yet that then still doesn't cover why DomU-s also can obtain status
>> for Xen-internal event channels. The debugging argument then becomes weak,
>> as in that case the XSM hook is possibly going to get in the way.
>>
>> 3) In the discussion Andrew further gave the impression that evtchn_send()
>> had no XSM check. Yet it has; the difference to evtchn_status() is that
>> the latter uses XSM_TARGET while the former uses XSM_HOOK. (Much like
>> evtchn_status() may indeed be useful for debugging, evtchn_send() may be
>> similarly useful to allow getting a stuck channel unstuck.)
>>
>> In summary I continue to think that an outright revert was inappropriate.
>> DomU-s should continue to be denied status information on Xen-internal
>> event channels, unconditionally and independent of whether dummy, silo, or
>> Flask is in use.
> 
> I think DomU-s should continue to be denied status information on
> Xen-internal event channels *based on the default dummy, silo, or Flask
> policy*. It is not up to us to decide the security policy, only to
> enforce it and provide sensible defaults.
> 
> In any case, the XSM_TARGET check in evtchn_status seems to do what we
> want?

No. XSM_TARGET permits the "owning" (not really, but it's its table) domain
access. See xsm_default_action() in xsm/dummy.h.

Jan

> evtchn_send uses XSM_HOOK, which is weaker, but it doesn't seem to be an
> issue because (ignoring the consumer_is_xen check) there is a if(!lchn)
> check that would fail on invalid event channels?
Jan Beulich May 17, 2024, 7:24 a.m. UTC | #27
On 17.05.2024 03:22, Daniel P. Smith wrote:
> On 5/16/24 02:41, Jan Beulich wrote:
>> On 14.05.2024 11:25, Jan Beulich wrote:
>>> On 03.04.2024 08:16, Jan Beulich wrote:
>>>> On 02.04.2024 19:06, Andrew Cooper wrote:
>>>>> The commit makes a claim without any kind of justification.
>>>>
>>>> Well, what does "have no business" leave open?
>>>>
>>>>> The claim is false, and the commit broke lsevtchn in dom0.
>>>>
>>>> Or alternatively lsevtchn was doing something that was never meant to work
>>>> (from Xen's perspective).
>>>>
>>>>>   It is also quite
>>>>> obvious from XSM_TARGET that it has broken device model stubdoms too.
>>>>
>>>> Why would that be "obvious"? What business would a stubdom have to look at
>>>> Xen's side of an evtchn?
>>>>
>>>>> Whether to return information about a xen-owned evtchn is a matter of policy,
>>>>> and it's not acceptable to short circuit the XSM on the matter.
>>>>
>>>> I can certainly accept this as one possible view point. As in so many cases
>>>> I'm afraid I dislike you putting it as if it was the only possible one.
>>>>
>>>> In summary: The supposed justification you claim is missing in the original
>>>> change is imo also missing here then: What business would any entity in the
>>>> system have to look at Xen's side of an event channel? Back at the time, 3
>>>> people agreed that it's "none".
>>>
>>> You've never responded to this reply of mine, or its follow-up. You also
>>> didn't chime in on the discussion Daniel and I were having. I consider my
>>> objections unaddressed, and in fact I continue to consider the change to
>>> be wrong. Therefore it was inappropriate for you to commit it; it needs
>>> reverting asap. If you're not going to do so, I will.
>>
>> For the record: With Andrew having clarified that in his revert's
>> description:
>>
>> "The claim is false; it broke lsevtchn in dom0, a debugging utility which
>>   absolutely does care about all of the domain's event channels."
>>
>> "all of the domain's event channels" means to include event channels Xen
>> uses for its own, and merely puts in the domain's table for ease of
>> handling, I've agreed that - in the absence of any better debugging
>> means - the revert may stay in place. For context, my prior understanding
>> of the issue was that lsevtchn simply stops probing further channels when
>> getting back any kind of error from EVTCHNOP_status (which I continue to
>> think wants addressing there, by a future version of a patch that was
>> already sent). However, there are follow-on concerns I have:
>>
>> 1) In the discussion George claimed that exposing status information in
>> an uncontrolled manner is okay. I'm afraid I have to disagree, seeing
>> how a similar assumption by CPU designers has led to a flood of
>> vulnerabilities over the last 6+ years. Information exposure imo is never
>> okay, unless it can be _proven_ that absolutely nothing "useful" can be
>> inferred from it. (I'm having difficulty seeing how such a proof might
>> look like.)
> 
> Jan, I would agree with you that any resources exposed to the guest, 
> even if that resource is status information, should be behind an XSM 
> check. I would, however, have to disagree the position over proving 
> absolutely nothing "useful" can be inferred. Prior to spectre becoming 
> commonly aware, no one considered proving there needed to be protections 
> against out-of-order instruction execution being used to bypass branch 
> conditions.

Interesting perspective.

> Predicting the future will more often than not fail, but 
> with an XSM check in place, the dummy/default policy can just be updated 
> with the general safe decision and others can use FLASK to provide fine 
> grain access.

I have to admit I have difficulty seeing how one would adjust dummy to do
restrict things by default, at least as long as XSM_TARGET is used.

>> 2) Me pointing out that the XSM hook might similarly get in the way of
>> debugging, Andrew suggested that this is not an issue because any sensible
>> XSM policy used in such an environment would grant sufficient privilege to
>> Dom0. Yet that then still doesn't cover why DomU-s also can obtain status
>> for Xen-internal event channels. The debugging argument then becomes weak,
>> as in that case the XSM hook is possibly going to get in the way.
> 
> And this is where we have a fundamental difference. Event channels are 
> XSM labeled objects that are always connected to a guest. If they were 
> truly Xen-internal, then a guest would have no way to get 
> access/reference them.

And that's what I'd like to achieve. The fact that Xen puts these event
channels in the guest's table is a mere implementation detail. They could
as well be kept elsewhere, just that handling then would (likely) be more
cumbersome.

> Again, I never disagreed with whether the guest 
> should or should not be able to access them. I did ask for a better 
> explanation than, "has no business", which is a statement of opinion not 
> of fact or reason. The point is these event channels are a resource 
> attached to the guest

As per above - not really, they merely appear like that.

> and access control decisions to them should be 
> made under XSM. Injecting a hard-coded access control in front of the 
> XSM check subverted/invalidated rules in the existing FLASK policy.

I'd agree if these were truly guest resources. Elsewhere I pointed at
the equivalent in memory management: Xen-internal memory also isn't
protected by XSM checks. It's inaccessible for entirely different reasons.

>> 3) In the discussion Andrew further gave the impression that evtchn_send()
>> had no XSM check. Yet it has; the difference to evtchn_status() is that
>> the latter uses XSM_TARGET while the former uses XSM_HOOK. (Much like
>> evtchn_status() may indeed be useful for debugging, evtchn_send() may be
>> similarly useful to allow getting a stuck channel unstuck.)
> 
> A more appropriate default might be XSM_OTHER with conditions with in 
> the hook  as to whether the policy should be XSM_HOOK, XSM_TARGET or 
> flat denial.

That you mean for send, status, or both?

>> In summary I continue to think that an outright revert was inappropriate.
>> DomU-s should continue to be denied status information on Xen-internal
>> event channels, unconditionally and independent of whether dummy, silo, or
>> Flask is in use.
> 
> Any guest facing resources, regardless if the backing end is the 
> hypervisor, should be protected with XSM. This allows the maintainers to 
> codify what they believe is a sane policy in the dummy policy and the 
> end user can use FLASK to enforce what they believe is acceptable.

I continue to be under the impression that either I don't look at things
correctly, or you don't. What's "guest facing" here? Xen-internal channels
aren't guest facing aiui. Their remote end is, in Dom0 or a stubdom. The
Xen side of it isn't guest facing at all; it's merely a vehicle to ease
handling that they're put in the guest's table.

>> Plus, as stated before, evtchn_send() would better remain in sync in this
>> regard with evtchn_status(). The situation is less clear for evtchn_close()
>> and evtchn_bind_vcpu(): Those indeed have no XSM checks while they do deny
>> operation on Xen-internal channels. It is likely more far fetched to
>> assume permitting them for debugging purposes might in fact help in rare
>> situations. Yet it may still be a matter of consistency to keep them in
>> sync with the other two. (Note that there's also evtchn_usable(), which
>> might then also need tweaking itself or at the use sites.)
> 
> Just because that is how it was, doesn't mean it was correct. I had a 
> discussion with one of the original authors of FLASK before taking up 
> the maintainership and he felt there were likely more XSM checks that 
> should have been in place originally. He considered it a first order 
> approximation of what should be protected.

I'm afraid it's not really clear to me what to take from this. Are you
suggesting that further XSM checks may be wanted? If so, where, and who
would take care of adding them?

Jan
Stefano Stabellini May 17, 2024, 8:28 p.m. UTC | #28
On Fri, 17 May 2024, Jan Beulich wrote:
> On 17.05.2024 03:21, Stefano Stabellini wrote:
> > On Thu, 16 May 2024, Jan Beulich wrote:
> >> 1) In the discussion George claimed that exposing status information in
> >> an uncontrolled manner is okay. I'm afraid I have to disagree, seeing
> >> how a similar assumption by CPU designers has led to a flood of
> >> vulnerabilities over the last 6+ years. Information exposure imo is never
> >> okay, unless it can be _proven_ that absolutely nothing "useful" can be
> >> inferred from it. (I'm having difficulty seeing how such a proof might
> >> look like.)
> > 
> > Many would agree that it is better not to expose status information in
> > an uncontrolled manner. Anyway, let's focus on the actionable.
> > 
> > 
> >> 2) Me pointing out that the XSM hook might similarly get in the way of
> >> debugging, Andrew suggested that this is not an issue because any sensible
> >> XSM policy used in such an environment would grant sufficient privilege to
> >> Dom0. Yet that then still doesn't cover why DomU-s also can obtain status
> >> for Xen-internal event channels. The debugging argument then becomes weak,
> >> as in that case the XSM hook is possibly going to get in the way.
> >>
> >> 3) In the discussion Andrew further gave the impression that evtchn_send()
> >> had no XSM check. Yet it has; the difference to evtchn_status() is that
> >> the latter uses XSM_TARGET while the former uses XSM_HOOK. (Much like
> >> evtchn_status() may indeed be useful for debugging, evtchn_send() may be
> >> similarly useful to allow getting a stuck channel unstuck.)
> >>
> >> In summary I continue to think that an outright revert was inappropriate.
> >> DomU-s should continue to be denied status information on Xen-internal
> >> event channels, unconditionally and independent of whether dummy, silo, or
> >> Flask is in use.
> > 
> > I think DomU-s should continue to be denied status information on
> > Xen-internal event channels *based on the default dummy, silo, or Flask
> > policy*. It is not up to us to decide the security policy, only to
> > enforce it and provide sensible defaults.
> > 
> > In any case, the XSM_TARGET check in evtchn_status seems to do what we
> > want?
> 
> No. XSM_TARGET permits the "owning" (not really, but it's its table) domain
> access. See xsm_default_action() in xsm/dummy.h.

Sorry I still don't understand. Why is that a problem? It looks like the
wanted default behavior?
Jan Beulich May 21, 2024, 6:17 a.m. UTC | #29
On 17.05.2024 22:28, Stefano Stabellini wrote:
> On Fri, 17 May 2024, Jan Beulich wrote:
>> On 17.05.2024 03:21, Stefano Stabellini wrote:
>>> On Thu, 16 May 2024, Jan Beulich wrote:
>>>> 1) In the discussion George claimed that exposing status information in
>>>> an uncontrolled manner is okay. I'm afraid I have to disagree, seeing
>>>> how a similar assumption by CPU designers has led to a flood of
>>>> vulnerabilities over the last 6+ years. Information exposure imo is never
>>>> okay, unless it can be _proven_ that absolutely nothing "useful" can be
>>>> inferred from it. (I'm having difficulty seeing how such a proof might
>>>> look like.)
>>>
>>> Many would agree that it is better not to expose status information in
>>> an uncontrolled manner. Anyway, let's focus on the actionable.
>>>
>>>
>>>> 2) Me pointing out that the XSM hook might similarly get in the way of
>>>> debugging, Andrew suggested that this is not an issue because any sensible
>>>> XSM policy used in such an environment would grant sufficient privilege to
>>>> Dom0. Yet that then still doesn't cover why DomU-s also can obtain status
>>>> for Xen-internal event channels. The debugging argument then becomes weak,
>>>> as in that case the XSM hook is possibly going to get in the way.
>>>>
>>>> 3) In the discussion Andrew further gave the impression that evtchn_send()
>>>> had no XSM check. Yet it has; the difference to evtchn_status() is that
>>>> the latter uses XSM_TARGET while the former uses XSM_HOOK. (Much like
>>>> evtchn_status() may indeed be useful for debugging, evtchn_send() may be
>>>> similarly useful to allow getting a stuck channel unstuck.)
>>>>
>>>> In summary I continue to think that an outright revert was inappropriate.
>>>> DomU-s should continue to be denied status information on Xen-internal
>>>> event channels, unconditionally and independent of whether dummy, silo, or
>>>> Flask is in use.
>>>
>>> I think DomU-s should continue to be denied status information on
>>> Xen-internal event channels *based on the default dummy, silo, or Flask
>>> policy*. It is not up to us to decide the security policy, only to
>>> enforce it and provide sensible defaults.
>>>
>>> In any case, the XSM_TARGET check in evtchn_status seems to do what we
>>> want?
>>
>> No. XSM_TARGET permits the "owning" (not really, but it's its table) domain
>> access. See xsm_default_action() in xsm/dummy.h.
> 
> Sorry I still don't understand. Why is that a problem? It looks like the
> wanted default behavior?

For ordinary event channels - yes. But not for Xen-internal ones, at least
from my pov.

Jan
diff mbox series

Patch

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 20f586cf5ecd..ae6c2f902645 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1040,12 +1040,6 @@  int evtchn_status(evtchn_status_t *status)
 
     read_lock(&d->event_lock);
 
-    if ( consumer_is_xen(chn) )
-    {
-        rc = -EACCES;
-        goto out;
-    }
-
     rc = xsm_evtchn_status(XSM_TARGET, d, chn);
     if ( rc )
         goto out;