diff mbox series

[v2,01/12] xen/trace: Don't over-read trace objects

Message ID 20210920172529.24932-2-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series xen/trace: Fix leakage of uninitialised stack into the tracebuffer | expand

Commit Message

Andrew Cooper Sept. 20, 2021, 5:25 p.m. UTC
In the case that 'extra' isn't a multiple of uint32_t, the calculation rounds
the number of bytes up, causing later logic to read unrelated bytes beyond the
end of the object.

Also, asserting that the object is within TRACE_EXTRA_MAX, but truncating it
in release builds is rude.  Instead, reject any out-of-spec records, leaving
enough of a message to identify the faulty caller.

There is one buggy race record, TRC_RTDS_BUDGET_BURN.  As it must remain
__packed (as cur_budget is misaligned), change bool has_extratime to uint32_t
to compensate.

The new printk() can also be hit by HVMOP_xentrace, although no over-read is
possible.  This has no business being a hypercall in the first place, as it
can't be used outside of custom local debugging, so extend the uint32_t
requirement to HVMOP_xentrace too.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Dario Faggioli <dfaggioli@suse.com>
CC: Meng Xu <mengxu@cis.upenn.edu>

v2:
 * Adjust HVMOP_xentrace to avoid hitting the printk()
 * Fix TRC_RTDS_BUDGET_BURN
 * Adjust wording in __trace_var()
---
 xen/arch/x86/hvm/hvm.c |  5 +++--
 xen/common/sched/rt.c  | 24 +++++++++++++-----------
 xen/common/trace.c     | 21 ++++++++++-----------
 3 files changed, 26 insertions(+), 24 deletions(-)

Comments

Jan Beulich Sept. 21, 2021, 6:53 a.m. UTC | #1
On 20.09.2021 19:25, Andrew Cooper wrote:
> In the case that 'extra' isn't a multiple of uint32_t, the calculation rounds
> the number of bytes up, causing later logic to read unrelated bytes beyond the
> end of the object.
> 
> Also, asserting that the object is within TRACE_EXTRA_MAX, but truncating it
> in release builds is rude.  Instead, reject any out-of-spec records, leaving
> enough of a message to identify the faulty caller.
> 
> There is one buggy race record, TRC_RTDS_BUDGET_BURN.  As it must remain

Nit: I guess s/race/trace/ ?

> __packed (as cur_budget is misaligned), change bool has_extratime to uint32_t
> to compensate.
> 
> The new printk() can also be hit by HVMOP_xentrace, although no over-read is
> possible.  This has no business being a hypercall in the first place, as it
> can't be used outside of custom local debugging, so extend the uint32_t
> requirement to HVMOP_xentrace too.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Two remarks (plus further not directly related ones), nevertheless:

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -5063,8 +5063,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( copy_from_guest(&tr, arg, 1 ) )
>              return -EFAULT;
>  
> -        if ( tr.extra_bytes > sizeof(tr.extra)
> -             || (tr.event & ~((1u<<TRC_SUBCLS_SHIFT)-1)) )
> +        if ( tr.extra_bytes % sizeof(uint32_t) ||
> +             tr.extra_bytes > sizeof(tr.extra) ||
> +             tr.event >> TRC_SUBCLS_SHIFT )
>              return -EINVAL;

Despite this being a function that supposedly no-one is to really
use, you're breaking the interface here when really there would be a
way to be backwards compatible: Instead of failing, pad the data to
a 32-bit boundary. As the interface struct is large enough, this
would look to be as simple as a memset() plus aligning extra_bytes
upwards. Otherwise the deliberate breaking of potential existing
callers wants making explicit in the respective paragraph of the
description.

As an aside I think at the very least the case block wants enclosing
in "#ifdef CONFIG_TRACEBUFFER", such that builds without the support
would indicate so to callers (albeit that indication would then be
accompanied by a bogus log message in debug builds).

Seeing the adjacent HVMOP_get_time I also wonder who the intended
users of that one are.

> --- a/xen/common/sched/rt.c
> +++ b/xen/common/sched/rt.c
> @@ -968,18 +968,20 @@ burn_budget(const struct scheduler *ops, struct rt_unit *svc, s_time_t now)
>      /* TRACE */
>      {
>          struct __packed {
> -            unsigned unit:16, dom:16;
> +            uint16_t unit, dom;
>              uint64_t cur_budget;
> -            int delta;
> -            unsigned priority_level;
> -            bool has_extratime;
> -        } d;
> -        d.dom = svc->unit->domain->domain_id;
> -        d.unit = svc->unit->unit_id;
> -        d.cur_budget = (uint64_t) svc->cur_budget;
> -        d.delta = delta;
> -        d.priority_level = svc->priority_level;
> -        d.has_extratime = svc->flags & RTDS_extratime;
> +            uint32_t delta;

The original field was plain int, and aiui for a valid reason. I
don't see why you couldn't use int32_t here.

Feel free to retain the R-b if you decide to make the two suggested
changes which are directly related to your change here.

Jan
Andrew Cooper Sept. 21, 2021, 5:51 p.m. UTC | #2
On 21/09/2021 07:53, Jan Beulich wrote:
> On 20.09.2021 19:25, Andrew Cooper wrote:
>> In the case that 'extra' isn't a multiple of uint32_t, the calculation rounds
>> the number of bytes up, causing later logic to read unrelated bytes beyond the
>> end of the object.
>>
>> Also, asserting that the object is within TRACE_EXTRA_MAX, but truncating it
>> in release builds is rude.  Instead, reject any out-of-spec records, leaving
>> enough of a message to identify the faulty caller.
>>
>> There is one buggy race record, TRC_RTDS_BUDGET_BURN.  As it must remain
> Nit: I guess s/race/trace/ ?

Oops yes.

>
>> __packed (as cur_budget is misaligned), change bool has_extratime to uint32_t
>> to compensate.
>>
>> The new printk() can also be hit by HVMOP_xentrace, although no over-read is
>> possible.  This has no business being a hypercall in the first place, as it
>> can't be used outside of custom local debugging, so extend the uint32_t
>> requirement to HVMOP_xentrace too.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> Two remarks (plus further not directly related ones), nevertheless:
>
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -5063,8 +5063,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          if ( copy_from_guest(&tr, arg, 1 ) )
>>              return -EFAULT;
>>  
>> -        if ( tr.extra_bytes > sizeof(tr.extra)
>> -             || (tr.event & ~((1u<<TRC_SUBCLS_SHIFT)-1)) )
>> +        if ( tr.extra_bytes % sizeof(uint32_t) ||
>> +             tr.extra_bytes > sizeof(tr.extra) ||
>> +             tr.event >> TRC_SUBCLS_SHIFT )
>>              return -EINVAL;
> Despite this being a function that supposedly no-one is to really
> use, you're breaking the interface here when really there would be a
> way to be backwards compatible: Instead of failing, pad the data to
> a 32-bit boundary. As the interface struct is large enough, this
> would look to be as simple as a memset() plus aligning extra_bytes
> upwards. Otherwise the deliberate breaking of potential existing
> callers wants making explicit in the respective paragraph of the
> description.

It is discussed, along with a justification for why an ABI change is fine.

But I could do

tr.extra_bytes = ROUNDUP(tr.extra_bytes, sizeof(uint32_t));

if you'd really prefer.


> As an aside I think at the very least the case block wants enclosing
> in "#ifdef CONFIG_TRACEBUFFER", such that builds without the support
> would indicate so to callers (albeit that indication would then be
> accompanied by a bogus log message in debug builds).

That message really needs deleting.

As a better alternative,

if ( !IS_ENABLED(CONFIG_TRACEBUFFER) )
    return -EOPNOTSUPP;

The call to __trace_var() is safe in !CONFIG_TRACEBUFFER builds.

>
> Seeing the adjacent HVMOP_get_time I also wonder who the intended
> users of that one are.

There is a very large amount of junk in hvm_op(), and to a first
approximation, I would include HVMOP_get_time in that category.

But c/s b91391491c58ac40a935e10cf0703b87d8733c38 explains why it is
necessary.  This just goes to demonstrate how broken our time handling
is.  I'll add this to the pile of things needing fixing in ABI-v2.

>
>> --- a/xen/common/sched/rt.c
>> +++ b/xen/common/sched/rt.c
>> @@ -968,18 +968,20 @@ burn_budget(const struct scheduler *ops, struct rt_unit *svc, s_time_t now)
>>      /* TRACE */
>>      {
>>          struct __packed {
>> -            unsigned unit:16, dom:16;
>> +            uint16_t unit, dom;
>>              uint64_t cur_budget;
>> -            int delta;
>> -            unsigned priority_level;
>> -            bool has_extratime;
>> -        } d;
>> -        d.dom = svc->unit->domain->domain_id;
>> -        d.unit = svc->unit->unit_id;
>> -        d.cur_budget = (uint64_t) svc->cur_budget;
>> -        d.delta = delta;
>> -        d.priority_level = svc->priority_level;
>> -        d.has_extratime = svc->flags & RTDS_extratime;
>> +            uint32_t delta;
> The original field was plain int, and aiui for a valid reason. I
> don't see why you couldn't use int32_t here.

delta can't be negative, because there is a check earlier in the function.

What is a problem is the 63=>32 bit truncation, and uint32_t here is
half as bad as int32_t.

~Andrew
Jan Beulich Sept. 22, 2021, 7:01 a.m. UTC | #3
On 21.09.2021 19:51, Andrew Cooper wrote:
> On 21/09/2021 07:53, Jan Beulich wrote:
>> On 20.09.2021 19:25, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -5063,8 +5063,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>          if ( copy_from_guest(&tr, arg, 1 ) )
>>>              return -EFAULT;
>>>  
>>> -        if ( tr.extra_bytes > sizeof(tr.extra)
>>> -             || (tr.event & ~((1u<<TRC_SUBCLS_SHIFT)-1)) )
>>> +        if ( tr.extra_bytes % sizeof(uint32_t) ||
>>> +             tr.extra_bytes > sizeof(tr.extra) ||
>>> +             tr.event >> TRC_SUBCLS_SHIFT )
>>>              return -EINVAL;
>> Despite this being a function that supposedly no-one is to really
>> use, you're breaking the interface here when really there would be a
>> way to be backwards compatible: Instead of failing, pad the data to
>> a 32-bit boundary. As the interface struct is large enough, this
>> would look to be as simple as a memset() plus aligning extra_bytes
>> upwards. Otherwise the deliberate breaking of potential existing
>> callers wants making explicit in the respective paragraph of the
>> description.
> 
> It is discussed, along with a justification for why an ABI change is fine.

What you say is "This has no business being a hypercall in the first
place", yet to me that's not a justification to break an interface.
It is part of the ABI, so disallowing what was previously allowed
may break people's (questionable, yes) code.

> But I could do
> 
> tr.extra_bytes = ROUNDUP(tr.extra_bytes, sizeof(uint32_t));
> 
> if you'd really prefer.

I would, indeed, and as said ideally alongside clearing the excess
bytes in the buffer.

>> As an aside I think at the very least the case block wants enclosing
>> in "#ifdef CONFIG_TRACEBUFFER", such that builds without the support
>> would indicate so to callers (albeit that indication would then be
>> accompanied by a bogus log message in debug builds).
> 
> That message really needs deleting.
> 
> As a better alternative,
> 
> if ( !IS_ENABLED(CONFIG_TRACEBUFFER) )
>     return -EOPNOTSUPP;
> 
> The call to __trace_var() is safe in !CONFIG_TRACEBUFFER builds.

Fine with me (I'm inclined to say of course).

>> Seeing the adjacent HVMOP_get_time I also wonder who the intended
>> users of that one are.
> 
> There is a very large amount of junk in hvm_op(), and to a first
> approximation, I would include HVMOP_get_time in that category.
> 
> But c/s b91391491c58ac40a935e10cf0703b87d8733c38 explains why it is
> necessary.  This just goes to demonstrate how broken our time handling
> is.  I'll add this to the pile of things needing fixing in ABI-v2.

Urgh.

>>> --- a/xen/common/sched/rt.c
>>> +++ b/xen/common/sched/rt.c
>>> @@ -968,18 +968,20 @@ burn_budget(const struct scheduler *ops, struct rt_unit *svc, s_time_t now)
>>>      /* TRACE */
>>>      {
>>>          struct __packed {
>>> -            unsigned unit:16, dom:16;
>>> +            uint16_t unit, dom;
>>>              uint64_t cur_budget;
>>> -            int delta;
>>> -            unsigned priority_level;
>>> -            bool has_extratime;
>>> -        } d;
>>> -        d.dom = svc->unit->domain->domain_id;
>>> -        d.unit = svc->unit->unit_id;
>>> -        d.cur_budget = (uint64_t) svc->cur_budget;
>>> -        d.delta = delta;
>>> -        d.priority_level = svc->priority_level;
>>> -        d.has_extratime = svc->flags & RTDS_extratime;
>>> +            uint32_t delta;
>> The original field was plain int, and aiui for a valid reason. I
>> don't see why you couldn't use int32_t here.
> 
> delta can't be negative, because there is a check earlier in the function.

Oh, yes, didn't spot that.

> What is a problem is the 63=>32 bit truncation, and uint32_t here is
> half as bad as int32_t.

Agreed. Whether the truncation is an issue in practice is questionable,
as I wouldn't expect budget to be consumed in multiple-second individual
steps. But I didn't check whether this scheduler might allow a vCPU to
run for this long all in one go.

Jan
Andrew Cooper Sept. 22, 2021, 12:58 p.m. UTC | #4
On 22/09/2021 08:01, Jan Beulich wrote:
> On 21.09.2021 19:51, Andrew Cooper wrote:
>> On 21/09/2021 07:53, Jan Beulich wrote:
>>> On 20.09.2021 19:25, Andrew Cooper wrote:
>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>> @@ -5063,8 +5063,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>          if ( copy_from_guest(&tr, arg, 1 ) )
>>>>              return -EFAULT;
>>>>  
>>>> -        if ( tr.extra_bytes > sizeof(tr.extra)
>>>> -             || (tr.event & ~((1u<<TRC_SUBCLS_SHIFT)-1)) )
>>>> +        if ( tr.extra_bytes % sizeof(uint32_t) ||
>>>> +             tr.extra_bytes > sizeof(tr.extra) ||
>>>> +             tr.event >> TRC_SUBCLS_SHIFT )
>>>>              return -EINVAL;
>>> Despite this being a function that supposedly no-one is to really
>>> use, you're breaking the interface here when really there would be a
>>> way to be backwards compatible: Instead of failing, pad the data to
>>> a 32-bit boundary. As the interface struct is large enough, this
>>> would look to be as simple as a memset() plus aligning extra_bytes
>>> upwards. Otherwise the deliberate breaking of potential existing
>>> callers wants making explicit in the respective paragraph of the
>>> description.
>> It is discussed, along with a justification for why an ABI change is fine.
> What you say is "This has no business being a hypercall in the first
> place", yet to me that's not a justification to break an interface.

No, but "cannot be used outside of custom debugging" means there are no
users in practice, and therefore it really doesn't matter.

> It is part of the ABI, so disallowing what was previously allowed
> may break people's (questionable, yes) code.
>
>> But I could do
>>
>> tr.extra_bytes = ROUNDUP(tr.extra_bytes, sizeof(uint32_t));
>>
>> if you'd really prefer.
> I would, indeed, and as said ideally alongside clearing the excess
> bytes in the buffer.

Why?  The entire structure is copied out of guest memory, with a fixed size.

It's not Xen's fault/problem if the VM didn't initialise it correctly,
and an explicit ROUNDUP() here maintains the current behaviour.

>>>> --- a/xen/common/sched/rt.c
>>>> +++ b/xen/common/sched/rt.c
>>>> @@ -968,18 +968,20 @@ burn_budget(const struct scheduler *ops, struct rt_unit *svc, s_time_t now)
>>>>      /* TRACE */
>>>>      {
>>>>          struct __packed {
>>>> -            unsigned unit:16, dom:16;
>>>> +            uint16_t unit, dom;
>>>>              uint64_t cur_budget;
>>>> -            int delta;
>>>> -            unsigned priority_level;
>>>> -            bool has_extratime;
>>>> -        } d;
>>>> -        d.dom = svc->unit->domain->domain_id;
>>>> -        d.unit = svc->unit->unit_id;
>>>> -        d.cur_budget = (uint64_t) svc->cur_budget;
>>>> -        d.delta = delta;
>>>> -        d.priority_level = svc->priority_level;
>>>> -        d.has_extratime = svc->flags & RTDS_extratime;
>>>> +            uint32_t delta;
>>> The original field was plain int, and aiui for a valid reason. I
>>> don't see why you couldn't use int32_t here.
>> delta can't be negative, because there is a check earlier in the function.
> Oh, yes, didn't spot that.
>
>> What is a problem is the 63=>32 bit truncation, and uint32_t here is
>> half as bad as int32_t.
> Agreed. Whether the truncation is an issue in practice is questionable,
> as I wouldn't expect budget to be consumed in multiple-second individual
> steps. But I didn't check whether this scheduler might allow a vCPU to
> run for this long all in one go.

I expect it's marginal too.  Honestly, its not a bug I care to fix right
about now.  I could leave a /* TODO: truncation? */ in place so whomever
encounters weird behaviour from this trace record has a bit more help of
where to look?

~Andrew
Jan Beulich Sept. 22, 2021, 1:32 p.m. UTC | #5
On 22.09.2021 14:58, Andrew Cooper wrote:
> On 22/09/2021 08:01, Jan Beulich wrote:
>> On 21.09.2021 19:51, Andrew Cooper wrote:
>>> On 21/09/2021 07:53, Jan Beulich wrote:
>>>> On 20.09.2021 19:25, Andrew Cooper wrote:
>>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>>> @@ -5063,8 +5063,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>          if ( copy_from_guest(&tr, arg, 1 ) )
>>>>>              return -EFAULT;
>>>>>  
>>>>> -        if ( tr.extra_bytes > sizeof(tr.extra)
>>>>> -             || (tr.event & ~((1u<<TRC_SUBCLS_SHIFT)-1)) )
>>>>> +        if ( tr.extra_bytes % sizeof(uint32_t) ||
>>>>> +             tr.extra_bytes > sizeof(tr.extra) ||
>>>>> +             tr.event >> TRC_SUBCLS_SHIFT )
>>>>>              return -EINVAL;
>>>> Despite this being a function that supposedly no-one is to really
>>>> use, you're breaking the interface here when really there would be a
>>>> way to be backwards compatible: Instead of failing, pad the data to
>>>> a 32-bit boundary. As the interface struct is large enough, this
>>>> would look to be as simple as a memset() plus aligning extra_bytes
>>>> upwards. Otherwise the deliberate breaking of potential existing
>>>> callers wants making explicit in the respective paragraph of the
>>>> description.
>>> It is discussed, along with a justification for why an ABI change is fine.
>> What you say is "This has no business being a hypercall in the first
>> place", yet to me that's not a justification to break an interface.
> 
> No, but "cannot be used outside of custom debugging" means there are no
> users in practice, and therefore it really doesn't matter.
> 
>> It is part of the ABI, so disallowing what was previously allowed
>> may break people's (questionable, yes) code.
>>
>>> But I could do
>>>
>>> tr.extra_bytes = ROUNDUP(tr.extra_bytes, sizeof(uint32_t));
>>>
>>> if you'd really prefer.
>> I would, indeed, and as said ideally alongside clearing the excess
>> bytes in the buffer.
> 
> Why?  The entire structure is copied out of guest memory, with a fixed size.
> 
> It's not Xen's fault/problem if the VM didn't initialise it correctly,
> and an explicit ROUNDUP() here maintains the current behaviour.

What I don't understand is what you derive "didn't initialise it correctly"
from. The public header says nothing. The data field being of type uint8_t[]
may very well suggest that any size is fine. Propagating rubbish instead of
predictable values (zeroes) seems wrong to me.

Anyway - I'm not going to insist; I merely think we should be as forgiving
as possible in situations like this.

>>>>> --- a/xen/common/sched/rt.c
>>>>> +++ b/xen/common/sched/rt.c
>>>>> @@ -968,18 +968,20 @@ burn_budget(const struct scheduler *ops, struct rt_unit *svc, s_time_t now)
>>>>>      /* TRACE */
>>>>>      {
>>>>>          struct __packed {
>>>>> -            unsigned unit:16, dom:16;
>>>>> +            uint16_t unit, dom;
>>>>>              uint64_t cur_budget;
>>>>> -            int delta;
>>>>> -            unsigned priority_level;
>>>>> -            bool has_extratime;
>>>>> -        } d;
>>>>> -        d.dom = svc->unit->domain->domain_id;
>>>>> -        d.unit = svc->unit->unit_id;
>>>>> -        d.cur_budget = (uint64_t) svc->cur_budget;
>>>>> -        d.delta = delta;
>>>>> -        d.priority_level = svc->priority_level;
>>>>> -        d.has_extratime = svc->flags & RTDS_extratime;
>>>>> +            uint32_t delta;
>>>> The original field was plain int, and aiui for a valid reason. I
>>>> don't see why you couldn't use int32_t here.
>>> delta can't be negative, because there is a check earlier in the function.
>> Oh, yes, didn't spot that.
>>
>>> What is a problem is the 63=>32 bit truncation, and uint32_t here is
>>> half as bad as int32_t.
>> Agreed. Whether the truncation is an issue in practice is questionable,
>> as I wouldn't expect budget to be consumed in multiple-second individual
>> steps. But I didn't check whether this scheduler might allow a vCPU to
>> run for this long all in one go.
> 
> I expect it's marginal too.  Honestly, its not a bug I care to fix right
> about now.  I could leave a /* TODO: truncation? */ in place so whomever
> encounters weird behaviour from this trace record has a bit more help of
> where to look?

Would seem reasonable to me, but really needs to be answered by the
maintainers of this code.

Jan
Dario Faggioli Sept. 24, 2021, 2:35 p.m. UTC | #6
On Wed, 2021-09-22 at 13:58 +0100, Andrew Cooper wrote:
> On 22/09/2021 08:01, Jan Beulich wrote:
> 
> > 
> > Agreed. Whether the truncation is an issue in practice is
> > questionable,
> > as I wouldn't expect budget to be consumed in multiple-second
> > individual
> > steps. But I didn't check whether this scheduler might allow a vCPU
> > to
> > run for this long all in one go.
> 
> I expect it's marginal too.  
>
It is indeed.

> Honestly, its not a bug I care to fix right
> about now.  I could leave a /* TODO: truncation? */ in place so
> whomever
> encounters weird behaviour from this trace record has a bit more help
> of
> where to look?
> 
Sure, that's fine for me.

Thanks and Regards
Dario Faggioli Sept. 24, 2021, 2:51 p.m. UTC | #7
On Mon, 2021-09-20 at 18:25 +0100, Andrew Cooper wrote:

> There is one buggy race record, TRC_RTDS_BUDGET_BURN.  As it must
> remain
> __packed (as cur_budget is misaligned), change bool has_extratime to
> uint32_t
> to compensate.
> 
Mmm... maybe my understanding of data alignment inside structs is a bit
lacking, but what the actual issue here, and what would we need to do
to fix it (where, by fix, I mean us being able to get rid of the
`__packed`)?

If rearranging fields is not enough, we can think about making
priority_level and has_extratime smaller, or even combining them in
just one field and decode the information in xentrace.

Of course, I can send a patch for that myself, even as a followup of
this series when it's in, as soon as we agree about the best way
forward.

Thanks and Regards
Jan Beulich Sept. 27, 2021, 7:51 a.m. UTC | #8
On 24.09.2021 16:51, Dario Faggioli wrote:
> On Mon, 2021-09-20 at 18:25 +0100, Andrew Cooper wrote:
> 
>> There is one buggy race record, TRC_RTDS_BUDGET_BURN.  As it must
>> remain
>> __packed (as cur_budget is misaligned), change bool has_extratime to
>> uint32_t
>> to compensate.
>>
> Mmm... maybe my understanding of data alignment inside structs is a bit
> lacking, but what the actual issue here, and what would we need to do
> to fix it (where, by fix, I mean us being able to get rid of the
> `__packed`)?
> 
> If rearranging fields is not enough, we can think about making
> priority_level and has_extratime smaller, or even combining them in
> just one field and decode the information in xentrace.

I guess Andrew has tried to avoid re-arranging field order so that
the consumer side doesn't need to also change.

Jan
Dario Faggioli Sept. 30, 2021, 8:07 a.m. UTC | #9
On Mon, 2021-09-27 at 09:51 +0200, Jan Beulich wrote:
> On 24.09.2021 16:51, Dario Faggioli wrote:
> > On Mon, 2021-09-20 at 18:25 +0100, Andrew Cooper wrote:
> > 
> > > There is one buggy race record, TRC_RTDS_BUDGET_BURN.  As it must
> > > remain
> > > __packed (as cur_budget is misaligned), change bool has_extratime
> > > to
> > > uint32_t
> > > to compensate.
> > > 
> > Mmm... maybe my understanding of data alignment inside structs is a
> > bit
> > lacking, but what the actual issue here, and what would we need to
> > do
> > to fix it (where, by fix, I mean us being able to get rid of the
> > `__packed`)?
> > 
> > If rearranging fields is not enough, we can think about making
> > priority_level and has_extratime smaller, or even combining them in
> > just one field and decode the information in xentrace.
> 
> I guess Andrew has tried to avoid re-arranging field order so that
> the consumer side doesn't need to also change.
> 
Right, but is it really worth it, in this case?

Aren't we (very very likely) in control, by having them here in the
tree, of all the consumers? And is is this a stable ABI?

Also, both xentrace_format and xenalyze are being modified in this
series anyway...

Maybe there's still something I'm missing, but if we've getting rid of
those ugly bitfields and __packed attributes, it seems to me that doing
it completely --i.e., including in this case-- is worth the small
change in the tools.

Regards
Andrew Cooper Dec. 3, 2021, 4:29 p.m. UTC | #10
On 30/09/2021 09:07, Dario Faggioli wrote:
> On Mon, 2021-09-27 at 09:51 +0200, Jan Beulich wrote:
>> On 24.09.2021 16:51, Dario Faggioli wrote:
>>> On Mon, 2021-09-20 at 18:25 +0100, Andrew Cooper wrote:
>>>
>>>> There is one buggy race record, TRC_RTDS_BUDGET_BURN.  As it must
>>>> remain
>>>> __packed (as cur_budget is misaligned), change bool has_extratime
>>>> to
>>>> uint32_t
>>>> to compensate.
>>>>
>>> Mmm... maybe my understanding of data alignment inside structs is a
>>> bit
>>> lacking, but what the actual issue here, and what would we need to
>>> do
>>> to fix it (where, by fix, I mean us being able to get rid of the
>>> `__packed`)?
>>>
>>> If rearranging fields is not enough, we can think about making
>>> priority_level and has_extratime smaller, or even combining them in
>>> just one field and decode the information in xentrace.
>> I guess Andrew has tried to avoid re-arranging field order so that
>> the consumer side doesn't need to also change.
>>
> Right, but is it really worth it, in this case?
>
> Aren't we (very very likely) in control, by having them here in the
> tree, of all the consumers? And is is this a stable ABI?
>
> Also, both xentrace_format and xenalyze are being modified in this
> series anyway...
>
> Maybe there's still something I'm missing, but if we've getting rid of
> those ugly bitfields and __packed attributes, it seems to me that doing
> it completely --i.e., including in this case-- is worth the small
> change in the tools.

This patch needs backporting to stable trees.  We shouldn't be changing
the ABI, even if its stability is unclear.

Which means patch 2 needs altering to avoid ABI changes.  /sigh

~Andre
diff mbox series

Patch

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 7b48a1b925bb..09cf6330ad26 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5063,8 +5063,9 @@  long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&tr, arg, 1 ) )
             return -EFAULT;
 
-        if ( tr.extra_bytes > sizeof(tr.extra)
-             || (tr.event & ~((1u<<TRC_SUBCLS_SHIFT)-1)) )
+        if ( tr.extra_bytes % sizeof(uint32_t) ||
+             tr.extra_bytes > sizeof(tr.extra) ||
+             tr.event >> TRC_SUBCLS_SHIFT )
             return -EINVAL;
 
         /* Cycles will be taken at the vmexit and vmenter */
diff --git a/xen/common/sched/rt.c b/xen/common/sched/rt.c
index c24cd2ac3200..c58edca0de84 100644
--- a/xen/common/sched/rt.c
+++ b/xen/common/sched/rt.c
@@ -968,18 +968,20 @@  burn_budget(const struct scheduler *ops, struct rt_unit *svc, s_time_t now)
     /* TRACE */
     {
         struct __packed {
-            unsigned unit:16, dom:16;
+            uint16_t unit, dom;
             uint64_t cur_budget;
-            int delta;
-            unsigned priority_level;
-            bool has_extratime;
-        } d;
-        d.dom = svc->unit->domain->domain_id;
-        d.unit = svc->unit->unit_id;
-        d.cur_budget = (uint64_t) svc->cur_budget;
-        d.delta = delta;
-        d.priority_level = svc->priority_level;
-        d.has_extratime = svc->flags & RTDS_extratime;
+            uint32_t delta;
+            uint32_t priority_level;
+            uint32_t has_extratime;
+        } d = {
+            .unit            = svc->unit->unit_id,
+            .dom             = svc->unit->domain->domain_id,
+            .cur_budget      = svc->cur_budget,
+            .delta           = delta,
+            .priority_level  = svc->priority_level,
+            .has_extratime   = !!(svc->flags & RTDS_extratime),
+        };
+
         trace_var(TRC_RTDS_BUDGET_BURN, 1,
                   sizeof(d),
                   (unsigned char *) &d);
diff --git a/xen/common/trace.c b/xen/common/trace.c
index a2a389a1c7c3..4297ff505fb9 100644
--- a/xen/common/trace.c
+++ b/xen/common/trace.c
@@ -686,22 +686,21 @@  void __trace_var(u32 event, bool_t cycles, unsigned int extra,
     unsigned long flags;
     u32 bytes_to_tail, bytes_to_wrap;
     unsigned int rec_size, total_size;
-    unsigned int extra_word;
     bool_t started_below_highwater;
 
     if( !tb_init_done )
         return;
 
-    /* Convert byte count into word count, rounding up */
-    extra_word = (extra / sizeof(u32));
-    if ( (extra % sizeof(u32)) != 0 )
-        extra_word++;
-    
-    ASSERT(extra_word <= TRACE_EXTRA_MAX);
-    extra_word = min_t(int, extra_word, TRACE_EXTRA_MAX);
-
-    /* Round size up to nearest word */
-    extra = extra_word * sizeof(u32);
+    /*
+     * extra data needs to be an exact multiple of uint32_t to prevent the
+     * later logic over-reading the object.  Reject out-of-spec records.  Any
+     * failure here is an error in the caller.
+     */
+    if ( extra % sizeof(uint32_t) ||
+         extra / sizeof(uint32_t) > TRACE_EXTRA_MAX )
+        return printk_once(XENLOG_WARNING
+                           "Trace event %#x bad size %u, discarding\n",
+                           event, extra);
 
     if ( (tb_event_mask & event) == 0 )
         return;