diff mbox series

[3/4] xen/version: Drop bogus return values for XENVER_platform_parameters

Message ID 20230103200943.5801-4-andrew.cooper3@citrix.com (mailing list archive)
State Superseded
Headers show
Series Fix truncation of various XENVER_* subops | expand

Commit Message

Andrew Cooper Jan. 3, 2023, 8:09 p.m. UTC
A split in virtual address space is only applicable for x86 PV guests.
Furthermore, the information returned for x86 64bit PV guests is wrong.

Explain the problem in version.h, stating the other information that PV guests
need to know.

For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
less wrong than the values currently returned.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
---
 xen/common/kernel.c          |  6 ++++--
 xen/include/public/version.h | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)

Comments

Jan Beulich Jan. 4, 2023, 4:40 p.m. UTC | #1
On 03.01.2023 21:09, Andrew Cooper wrote:
> A split in virtual address space is only applicable for x86 PV guests.
> Furthermore, the information returned for x86 64bit PV guests is wrong.
> 
> Explain the problem in version.h, stating the other information that PV guests
> need to know.
> 
> For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
> less wrong than the values currently returned.

I disagree for the 64-bit part of this. Seeing Linux'es exposure of the
value in sysfs I even wonder whether we can change this like you do for
HVM. Who knows what is being inferred from the value, and by whom.

> --- a/xen/include/public/version.h
> +++ b/xen/include/public/version.h
> @@ -42,6 +42,26 @@ typedef char xen_capabilities_info_t[1024];
>  typedef char xen_changeset_info_t[64];
>  #define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
>  
> +/*
> + * This API is problematic.
> + *
> + * It is only applicable to guests which share pagetables with Xen (x86 PV
> + * guests), and is supposed to identify the virtual address split between
> + * guest kernel and Xen.
> + *
> + * For 32bit PV guests, it mostly does this, but the caller needs to know that
> + * Xen lives between the split and 4G.
> + *
> + * For 64bit PV guests, Xen lives at the bottom of the upper canonical range.
> + * This previously returned the start of the upper canonical range (which is
> + * the userspace/Xen split), not the Xen/kernel split (which is 8TB further
> + * on).  This now returns 0 because the old number wasn't correct, and
> + * changing it to anything else would be even worse.

Whether the guest runs user mode code in the low or high half (or in yet
another way of splitting) isn't really dictated by the PV ABI, is it? So
whether the value is "wrong" is entirely unclear. Instead ...

> + * For all guest types using hardware virt extentions, Xen is not mapped into
> + * the guest kernel virtual address space.  This now return 0, where it
> + * previously returned unrelated data.
> + */
>  #define XENVER_platform_parameters 5
>  struct xen_platform_parameters {
>      xen_ulong_t virt_start;

... the field name tells me that all that is being conveyed is the virtual
address of where the hypervisor area starts.

Jan
Andrew Cooper Jan. 4, 2023, 7:55 p.m. UTC | #2
On 04/01/2023 4:40 pm, Jan Beulich wrote:
> On 03.01.2023 21:09, Andrew Cooper wrote:
>> A split in virtual address space is only applicable for x86 PV guests.
>> Furthermore, the information returned for x86 64bit PV guests is wrong.
>>
>> Explain the problem in version.h, stating the other information that PV guests
>> need to know.
>>
>> For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
>> less wrong than the values currently returned.
> I disagree for the 64-bit part of this. Seeing Linux'es exposure of the
> value in sysfs I even wonder whether we can change this like you do for
> HVM. Who knows what is being inferred from the value, and by whom.

Linux's sysfs ABI isn't relevant to us here.  The sysfs ABI says it
reports what the hypervisor presents, not that it will be a nonzero number.

>> --- a/xen/include/public/version.h
>> +++ b/xen/include/public/version.h
>> @@ -42,6 +42,26 @@ typedef char xen_capabilities_info_t[1024];
>>  typedef char xen_changeset_info_t[64];
>>  #define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
>>  
>> +/*
>> + * This API is problematic.
>> + *
>> + * It is only applicable to guests which share pagetables with Xen (x86 PV
>> + * guests), and is supposed to identify the virtual address split between
>> + * guest kernel and Xen.
>> + *
>> + * For 32bit PV guests, it mostly does this, but the caller needs to know that
>> + * Xen lives between the split and 4G.
>> + *
>> + * For 64bit PV guests, Xen lives at the bottom of the upper canonical range.
>> + * This previously returned the start of the upper canonical range (which is
>> + * the userspace/Xen split), not the Xen/kernel split (which is 8TB further
>> + * on).  This now returns 0 because the old number wasn't correct, and
>> + * changing it to anything else would be even worse.
> Whether the guest runs user mode code in the low or high half (or in yet
> another way of splitting) isn't really dictated by the PV ABI, is it?

No, but given a choice of reporting the thing which is an architectural
boundary, or the one which is the actual split between the two adjacent
ranges, reporting the architectural boundary is clearly the unhelpful thing.

>  So
> whether the value is "wrong" is entirely unclear. Instead ...
>
>> + * For all guest types using hardware virt extentions, Xen is not mapped into
>> + * the guest kernel virtual address space.  This now return 0, where it
>> + * previously returned unrelated data.
>> + */
>>  #define XENVER_platform_parameters 5
>>  struct xen_platform_parameters {
>>      xen_ulong_t virt_start;
> ... the field name tells me that all that is being conveyed is the virtual
> address of where the hypervisor area starts.

IMO, it doesn't matter what the name of the field is.  It dates from the
days when 32bit PV was the only type of guest.

32bit PV guests really do have a variable split, so the guest kernel
really does need to get this value from Xen.

The split for 64bit PV guests is compile-time constant, hence why 64bit
PV kernels don't care.

For compat HVM, it happens to pick up the -1 from:

#ifdef CONFIG_PV32
    HYPERVISOR_COMPAT_VIRT_START(d) =
        is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
#endif

in arch_domain_create(), whereas for non-compat HVM, it gets a number in
an address space it has no connection to in the slightest.  ARM guests
end up getting XEN_VIRT_START (== 2M) handed back, but this absolutely
an internal detail that guests have no business knowing.


The only reason I'm not issuing an XSA for this is because we don't have
any pretence of KASLR in Xen.  Pretty much every other kernel gets CVEs
for infoleaks like this.

We feasibly could do KASLR in !PV builds, at which point this would
qualify for an XSA.

~Andrew
Jan Beulich Jan. 5, 2023, 7:57 a.m. UTC | #3
On 04.01.2023 20:55, Andrew Cooper wrote:
> On 04/01/2023 4:40 pm, Jan Beulich wrote:
>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>> A split in virtual address space is only applicable for x86 PV guests.
>>> Furthermore, the information returned for x86 64bit PV guests is wrong.
>>>
>>> Explain the problem in version.h, stating the other information that PV guests
>>> need to know.
>>>
>>> For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
>>> less wrong than the values currently returned.
>> I disagree for the 64-bit part of this. Seeing Linux'es exposure of the
>> value in sysfs I even wonder whether we can change this like you do for
>> HVM. Who knows what is being inferred from the value, and by whom.
> 
> Linux's sysfs ABI isn't relevant to us here.  The sysfs ABI says it
> reports what the hypervisor presents, not that it will be a nonzero number.

It effectively reports the hypervisor (virtual) base address there. How
can we not care if something (kexec would come to mind) would be using
it for whatever purpose. And thinking of it, the tool stack has uses,
too. Assuming you audited them, did you consider removing dead uses in
a prereq patch (and discuss the effects on live ones in the description)?

>>> --- a/xen/include/public/version.h
>>> +++ b/xen/include/public/version.h
>>> @@ -42,6 +42,26 @@ typedef char xen_capabilities_info_t[1024];
>>>  typedef char xen_changeset_info_t[64];
>>>  #define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
>>>  
>>> +/*
>>> + * This API is problematic.
>>> + *
>>> + * It is only applicable to guests which share pagetables with Xen (x86 PV
>>> + * guests), and is supposed to identify the virtual address split between
>>> + * guest kernel and Xen.
>>> + *
>>> + * For 32bit PV guests, it mostly does this, but the caller needs to know that
>>> + * Xen lives between the split and 4G.
>>> + *
>>> + * For 64bit PV guests, Xen lives at the bottom of the upper canonical range.
>>> + * This previously returned the start of the upper canonical range (which is
>>> + * the userspace/Xen split), not the Xen/kernel split (which is 8TB further
>>> + * on).  This now returns 0 because the old number wasn't correct, and
>>> + * changing it to anything else would be even worse.
>> Whether the guest runs user mode code in the low or high half (or in yet
>> another way of splitting) isn't really dictated by the PV ABI, is it?
> 
> No, but given a choice of reporting the thing which is an architectural
> boundary, or the one which is the actual split between the two adjacent
> ranges, reporting the architectural boundary is clearly the unhelpful thing.

Hmm. To properly parallel the 32-bit variant, a [start,end] range would need
exposing for 64-bit, rather than exposing nothing. Not the least because ...

>>  So
>> whether the value is "wrong" is entirely unclear. Instead ...
>>
>>> + * For all guest types using hardware virt extentions, Xen is not mapped into
>>> + * the guest kernel virtual address space.  This now return 0, where it
>>> + * previously returned unrelated data.
>>> + */
>>>  #define XENVER_platform_parameters 5
>>>  struct xen_platform_parameters {
>>>      xen_ulong_t virt_start;
>> ... the field name tells me that all that is being conveyed is the virtual
>> address of where the hypervisor area starts.
> 
> IMO, it doesn't matter what the name of the field is.  It dates from the
> days when 32bit PV was the only type of guest.
> 
> 32bit PV guests really do have a variable split, so the guest kernel
> really does need to get this value from Xen.
> 
> The split for 64bit PV guests is compile-time constant, hence why 64bit
> PV kernels don't care.

... once we get to run Xen in 5-level mode, 4-level PV guests could also
gain a variable split: Like for 32-bit guests now, only the r/o M2P would
need to live in that area, and this may well occupy less than the full
range presently reserved for the hypervisor.

> For compat HVM, it happens to pick up the -1 from:
> 
> #ifdef CONFIG_PV32
>     HYPERVISOR_COMPAT_VIRT_START(d) =
>         is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
> #endif
> 
> in arch_domain_create(), whereas for non-compat HVM, it gets a number in
> an address space it has no connection to in the slightest.  ARM guests
> end up getting XEN_VIRT_START (== 2M) handed back, but this absolutely
> an internal detail that guests have no business knowing.

Well, okay, this looks to be good enough an argument to make the adjustment
you propose for !PV guests.

> The only reason I'm not issuing an XSA for this is because we don't have
> any pretence of KASLR in Xen.  Pretty much every other kernel gets CVEs
> for infoleaks like this.
> 
> We feasibly could do KASLR in !PV builds, at which point this would
> qualify for an XSA.

I would question that, but I can see your view as one possible one.

Jan
Andrew Cooper Jan. 5, 2023, 10:17 p.m. UTC | #4
On 05/01/2023 7:57 am, Jan Beulich wrote:
> On 04.01.2023 20:55, Andrew Cooper wrote:
>> On 04/01/2023 4:40 pm, Jan Beulich wrote:
>>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>>> A split in virtual address space is only applicable for x86 PV guests.
>>>> Furthermore, the information returned for x86 64bit PV guests is wrong.
>>>>
>>>> Explain the problem in version.h, stating the other information that PV guests
>>>> need to know.
>>>>
>>>> For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
>>>> less wrong than the values currently returned.
>>> I disagree for the 64-bit part of this. Seeing Linux'es exposure of the
>>> value in sysfs I even wonder whether we can change this like you do for
>>> HVM. Who knows what is being inferred from the value, and by whom.
>> Linux's sysfs ABI isn't relevant to us here.  The sysfs ABI says it
>> reports what the hypervisor presents, not that it will be a nonzero number.
> It effectively reports the hypervisor (virtual) base address there. How
> can we not care if something (kexec would come to mind) would be using
> it for whatever purpose.

What about kexec do you think would care?

The only thing kexec-tools cares about is XENVER_capabilities, but even
that's not actually correct for figuring out whether Xen can do kexec
transitions to ELF64/32.

> And thinking of it, the tool stack has uses,
> too. Assuming you audited them, did you consider removing dead uses in
> a prereq patch (and discuss the effects on live ones in the description)?

There is only one toolstack use I can spot which is non-informational,
and it's broken AFAICT.

`xl dump-core` writes out a header which includes this metadata, but it
takes dom0's value, not domU's.  (Not that this is relevant AFAICT,
because the M2P is handled specially anyway.)


Most XENVER_* information is global (and by this, I mean invariant and
non-caller dependent, outside of livepatching.)

XENVER_guest_handle is caller-variant, but the toolstack has proper
interfaces to get/set this value.

XENVER_platform_parameters (and XENVER_get_features for that matter) are
caller-variant, and the toolstack has no way to get domU's view of this
data.


Every instance (well - this is the only interesting one) of the use of
XENVER_platform_parameters I can find is broken, even in the Xen code.

>>>> --- a/xen/include/public/version.h
>>>> +++ b/xen/include/public/version.h
>>>> @@ -42,6 +42,26 @@ typedef char xen_capabilities_info_t[1024];
>>>>  typedef char xen_changeset_info_t[64];
>>>>  #define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
>>>>  
>>>> +/*
>>>> + * This API is problematic.
>>>> + *
>>>> + * It is only applicable to guests which share pagetables with Xen (x86 PV
>>>> + * guests), and is supposed to identify the virtual address split between
>>>> + * guest kernel and Xen.
>>>> + *
>>>> + * For 32bit PV guests, it mostly does this, but the caller needs to know that
>>>> + * Xen lives between the split and 4G.
>>>> + *
>>>> + * For 64bit PV guests, Xen lives at the bottom of the upper canonical range.
>>>> + * This previously returned the start of the upper canonical range (which is
>>>> + * the userspace/Xen split), not the Xen/kernel split (which is 8TB further
>>>> + * on).  This now returns 0 because the old number wasn't correct, and
>>>> + * changing it to anything else would be even worse.
>>> Whether the guest runs user mode code in the low or high half (or in yet
>>> another way of splitting) isn't really dictated by the PV ABI, is it?
>> No, but given a choice of reporting the thing which is an architectural
>> boundary, or the one which is the actual split between the two adjacent
>> ranges, reporting the architectural boundary is clearly the unhelpful thing.
> Hmm. To properly parallel the 32-bit variant, a [start,end] range would need
> exposing for 64-bit, rather than exposing nothing.

The 32-bit version is a start/end pair, but with end being implicit at
the 4G architectural boundary.

If we were doing 64-bit from scratch, then reporting end would have been
sensible, because for 64-bit, start is the architectural boundary which
can be implicit.

But there is no such thing as a 64bit PV guest with any (useful) idea of
a variable split, because this number has been junk for the entire
lifetime of 64bit PV guests.  In particular, ...

>>>> + * For all guest types using hardware virt extentions, Xen is not mapped into
>>>> + * the guest kernel virtual address space.  This now return 0, where it
>>>> + * previously returned unrelated data.
>>>> + */
>>>>  #define XENVER_platform_parameters 5
>>>>  struct xen_platform_parameters {
>>>>      xen_ulong_t virt_start;
>>> ... the field name tells me that all that is being conveyed is the virtual
>>> address of where the hypervisor area starts.
>> IMO, it doesn't matter what the name of the field is.  It dates from the
>> days when 32bit PV was the only type of guest.
>>
>> 32bit PV guests really do have a variable split, so the guest kernel
>> really does need to get this value from Xen.
>>
>> The split for 64bit PV guests is compile-time constant, hence why 64bit
>> PV kernels don't care.
> ... once we get to run Xen in 5-level mode, 4-level PV guests could also
> gain a variable split: Like for 32-bit guests now, only the r/o M2P would
> need to live in that area, and this may well occupy less than the full
> range presently reserved for the hypervisor.

... you can't do this, because it only works for guests which have
chosen to find the M2P using XENMEM_machphys_mapping (e.g. Linux), and
doesn't for e.g. MiniOS which does:

#define machine_to_phys_mapping ((unsigned long *)HYPERVISOR_VIRT_START)

in fact, looking at this, MiniOS is also broken as a 32bit PV dom0,
because it hardcodes __MACH2PHYS_VIRT_START in the case where the split
really is variable.


Its only PV guests which are LA57 aware which can possibly benefit from
a variable position M2P, and only because that will be a new ELFNOTE
protocol.

>
>> For compat HVM, it happens to pick up the -1 from:
>>
>> #ifdef CONFIG_PV32
>>     HYPERVISOR_COMPAT_VIRT_START(d) =
>>         is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
>> #endif
>>
>> in arch_domain_create(), whereas for non-compat HVM, it gets a number in
>> an address space it has no connection to in the slightest.  ARM guests
>> end up getting XEN_VIRT_START (== 2M) handed back, but this absolutely
>> an internal detail that guests have no business knowing.
> Well, okay, this looks to be good enough an argument to make the adjustment
> you propose for !PV guests.

Right, HVM (on all architectures) is very cut and dry.

But it feels wrong to not address the PV64 issue at the same time
because it is similar level of broken, despite there being (in theory) a
legitimate need for a PV guest kernel to know it.

~Andrew
Jan Beulich Jan. 6, 2023, 7:54 a.m. UTC | #5
On 05.01.2023 23:17, Andrew Cooper wrote:
> On 05/01/2023 7:57 am, Jan Beulich wrote:
>> On 04.01.2023 20:55, Andrew Cooper wrote:
>>> On 04/01/2023 4:40 pm, Jan Beulich wrote:
>>>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>>>> A split in virtual address space is only applicable for x86 PV guests.
>>>>> Furthermore, the information returned for x86 64bit PV guests is wrong.
>>>>>
>>>>> Explain the problem in version.h, stating the other information that PV guests
>>>>> need to know.
>>>>>
>>>>> For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
>>>>> less wrong than the values currently returned.
>>>> I disagree for the 64-bit part of this. Seeing Linux'es exposure of the
>>>> value in sysfs I even wonder whether we can change this like you do for
>>>> HVM. Who knows what is being inferred from the value, and by whom.
>>> Linux's sysfs ABI isn't relevant to us here.  The sysfs ABI says it
>>> reports what the hypervisor presents, not that it will be a nonzero number.
>> It effectively reports the hypervisor (virtual) base address there. How
>> can we not care if something (kexec would come to mind) would be using
>> it for whatever purpose.
> 
> What about kexec do you think would care?

I didn't think about anything specific, but I could see why it may want to
know where in VA space Xen sits.

>>>>> --- a/xen/include/public/version.h
>>>>> +++ b/xen/include/public/version.h
>>>>> @@ -42,6 +42,26 @@ typedef char xen_capabilities_info_t[1024];
>>>>>  typedef char xen_changeset_info_t[64];
>>>>>  #define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
>>>>>  
>>>>> +/*
>>>>> + * This API is problematic.
>>>>> + *
>>>>> + * It is only applicable to guests which share pagetables with Xen (x86 PV
>>>>> + * guests), and is supposed to identify the virtual address split between
>>>>> + * guest kernel and Xen.
>>>>> + *
>>>>> + * For 32bit PV guests, it mostly does this, but the caller needs to know that
>>>>> + * Xen lives between the split and 4G.
>>>>> + *
>>>>> + * For 64bit PV guests, Xen lives at the bottom of the upper canonical range.
>>>>> + * This previously returned the start of the upper canonical range (which is
>>>>> + * the userspace/Xen split), not the Xen/kernel split (which is 8TB further
>>>>> + * on).  This now returns 0 because the old number wasn't correct, and
>>>>> + * changing it to anything else would be even worse.
>>>> Whether the guest runs user mode code in the low or high half (or in yet
>>>> another way of splitting) isn't really dictated by the PV ABI, is it?
>>> No, but given a choice of reporting the thing which is an architectural
>>> boundary, or the one which is the actual split between the two adjacent
>>> ranges, reporting the architectural boundary is clearly the unhelpful thing.
>> Hmm. To properly parallel the 32-bit variant, a [start,end] range would need
>> exposing for 64-bit, rather than exposing nothing.
> 
> The 32-bit version is a start/end pair, but with end being implicit at
> the 4G architectural boundary.
> 
> If we were doing 64-bit from scratch, then reporting end would have been
> sensible, because for 64-bit, start is the architectural boundary which
> can be implicit.
> 
> But there is no such thing as a 64bit PV guest with any (useful) idea of
> a variable split, because this number has been junk for the entire
> lifetime of 64bit PV guests.  In particular, ...
> 
>>>>> + * For all guest types using hardware virt extentions, Xen is not mapped into
>>>>> + * the guest kernel virtual address space.  This now return 0, where it
>>>>> + * previously returned unrelated data.
>>>>> + */
>>>>>  #define XENVER_platform_parameters 5
>>>>>  struct xen_platform_parameters {
>>>>>      xen_ulong_t virt_start;
>>>> ... the field name tells me that all that is being conveyed is the virtual
>>>> address of where the hypervisor area starts.
>>> IMO, it doesn't matter what the name of the field is.  It dates from the
>>> days when 32bit PV was the only type of guest.
>>>
>>> 32bit PV guests really do have a variable split, so the guest kernel
>>> really does need to get this value from Xen.
>>>
>>> The split for 64bit PV guests is compile-time constant, hence why 64bit
>>> PV kernels don't care.
>> ... once we get to run Xen in 5-level mode, 4-level PV guests could also
>> gain a variable split: Like for 32-bit guests now, only the r/o M2P would
>> need to live in that area, and this may well occupy less than the full
>> range presently reserved for the hypervisor.
> 
> ... you can't do this, because it only works for guests which have
> chosen to find the M2P using XENMEM_machphys_mapping (e.g. Linux), and
> doesn't for e.g. MiniOS which does:
> 
> #define machine_to_phys_mapping ((unsigned long *)HYPERVISOR_VIRT_START)

Hmm, looks like a misunderstanding? I certainly wasn't thinking about
making the start of that region variable, but rather the end (i.e. not
exactly like for 32-bit compat).

>>> For compat HVM, it happens to pick up the -1 from:
>>>
>>> #ifdef CONFIG_PV32
>>>     HYPERVISOR_COMPAT_VIRT_START(d) =
>>>         is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
>>> #endif
>>>
>>> in arch_domain_create(), whereas for non-compat HVM, it gets a number in
>>> an address space it has no connection to in the slightest.  ARM guests
>>> end up getting XEN_VIRT_START (== 2M) handed back, but this absolutely
>>> an internal detail that guests have no business knowing.
>> Well, okay, this looks to be good enough an argument to make the adjustment
>> you propose for !PV guests.
> 
> Right, HVM (on all architectures) is very cut and dry.
> 
> But it feels wrong to not address the PV64 issue at the same time
> because it is similar level of broken, despite there being (in theory) a
> legitimate need for a PV guest kernel to know it.

To me it feels wrong to address the 64-bit PV issue by removing information,
when - as you also say - it is actually _missing_ information. To me the
proper course of action would be to expose the upper bound as well (such
that, down the road, it could become dynamic). There's also no info leak
there, as the two (static) bounds are part of the PV ABI anyway.

Jan
Andrew Cooper Jan. 6, 2023, 12:14 p.m. UTC | #6
On 06/01/2023 7:54 am, Jan Beulich wrote:
> On 05.01.2023 23:17, Andrew Cooper wrote:
>> On 05/01/2023 7:57 am, Jan Beulich wrote:
>>> On 04.01.2023 20:55, Andrew Cooper wrote:
>>>> On 04/01/2023 4:40 pm, Jan Beulich wrote:
>>>>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>>>>> A split in virtual address space is only applicable for x86 PV guests.
>>>>>> Furthermore, the information returned for x86 64bit PV guests is wrong.
>>>>>>
>>>>>> Explain the problem in version.h, stating the other information that PV guests
>>>>>> need to know.
>>>>>>
>>>>>> For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
>>>>>> less wrong than the values currently returned.
>>>>> I disagree for the 64-bit part of this. Seeing Linux'es exposure of the
>>>>> value in sysfs I even wonder whether we can change this like you do for
>>>>> HVM. Who knows what is being inferred from the value, and by whom.
>>>> Linux's sysfs ABI isn't relevant to us here.  The sysfs ABI says it
>>>> reports what the hypervisor presents, not that it will be a nonzero number.
>>> It effectively reports the hypervisor (virtual) base address there. How
>>> can we not care if something (kexec would come to mind) would be using
>>> it for whatever purpose.
>> What about kexec do you think would care?
> I didn't think about anything specific, but I could see why it may want to
> know where in VA space Xen sits.

The kexec image doesn't run "under" Xen; it replaces Xen in memory, and
transition into the new image is via no paging (32bit) or identity
paging (64bit) in the reserved region.

We don't really support kexec load (it's there, but I don't expect
anyone has exercised it in anger), but if we were to load a new
Xen+dom0, then kexec-tools still has nothing relevant to do with
new-Xen+dom0's split.

>>>>>> + * For all guest types using hardware virt extentions, Xen is not mapped into
>>>>>> + * the guest kernel virtual address space.  This now return 0, where it
>>>>>> + * previously returned unrelated data.
>>>>>> + */
>>>>>>  #define XENVER_platform_parameters 5
>>>>>>  struct xen_platform_parameters {
>>>>>>      xen_ulong_t virt_start;
>>>>> ... the field name tells me that all that is being conveyed is the virtual
>>>>> address of where the hypervisor area starts.
>>>> IMO, it doesn't matter what the name of the field is.  It dates from the
>>>> days when 32bit PV was the only type of guest.
>>>>
>>>> 32bit PV guests really do have a variable split, so the guest kernel
>>>> really does need to get this value from Xen.
>>>>
>>>> The split for 64bit PV guests is compile-time constant, hence why 64bit
>>>> PV kernels don't care.
>>> ... once we get to run Xen in 5-level mode, 4-level PV guests could also
>>> gain a variable split: Like for 32-bit guests now, only the r/o M2P would
>>> need to live in that area, and this may well occupy less than the full
>>> range presently reserved for the hypervisor.
>> ... you can't do this, because it only works for guests which have
>> chosen to find the M2P using XENMEM_machphys_mapping (e.g. Linux), and
>> doesn't for e.g. MiniOS which does:
>>
>> #define machine_to_phys_mapping ((unsigned long *)HYPERVISOR_VIRT_START)
> Hmm, looks like a misunderstanding? I certainly wasn't thinking about
> making the start of that region variable, but rather the end (i.e. not
> exactly like for 32-bit compat).

But for what purpose?  You can't make 4-level guests have a variable range.

The best you can do is make a 5-level-aware guest running on 4-level Xen
have some new semantics, but a 4-level PV guest already owns the
overwhelming majority of virtual address space, so being able to hand
back a few extra TB is not interesting.  And for making that happen...

>>>> For compat HVM, it happens to pick up the -1 from:
>>>>
>>>> #ifdef CONFIG_PV32
>>>>     HYPERVISOR_COMPAT_VIRT_START(d) =
>>>>         is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
>>>> #endif
>>>>
>>>> in arch_domain_create(), whereas for non-compat HVM, it gets a number in
>>>> an address space it has no connection to in the slightest.  ARM guests
>>>> end up getting XEN_VIRT_START (== 2M) handed back, but this absolutely
>>>> an internal detail that guests have no business knowing.
>>> Well, okay, this looks to be good enough an argument to make the adjustment
>>> you propose for !PV guests.
>> Right, HVM (on all architectures) is very cut and dry.
>>
>> But it feels wrong to not address the PV64 issue at the same time
>> because it is similar level of broken, despite there being (in theory) a
>> legitimate need for a PV guest kernel to know it.
> To me it feels wrong to address the 64-bit PV issue by removing information,
> when - as you also say - it is actually _missing_ information. To me the
> proper course of action would be to expose the upper bound as well (such
> that, down the road, it could become dynamic). There's also no info leak
> there, as the two (static) bounds are part of the PV ABI anyway.

... the absolute best you could do here is introduce a new
XENVER_platform_parameters2 that has a different structure.

Which still leaves us with XENVER_platform_parameters providing data
which is somewhere between useless and actively unhelpful.

~Andrew
Jan Beulich Jan. 9, 2023, 8:06 a.m. UTC | #7
On 06.01.2023 13:14, Andrew Cooper wrote:
> On 06/01/2023 7:54 am, Jan Beulich wrote:
>> On 05.01.2023 23:17, Andrew Cooper wrote:
>>> On 05/01/2023 7:57 am, Jan Beulich wrote:
>>>> On 04.01.2023 20:55, Andrew Cooper wrote:
>>>>> On 04/01/2023 4:40 pm, Jan Beulich wrote:
>>>>>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>>>>>> + * For all guest types using hardware virt extentions, Xen is not mapped into
>>>>>>> + * the guest kernel virtual address space.  This now return 0, where it
>>>>>>> + * previously returned unrelated data.
>>>>>>> + */
>>>>>>>  #define XENVER_platform_parameters 5
>>>>>>>  struct xen_platform_parameters {
>>>>>>>      xen_ulong_t virt_start;
>>>>>> ... the field name tells me that all that is being conveyed is the virtual
>>>>>> address of where the hypervisor area starts.
>>>>> IMO, it doesn't matter what the name of the field is.  It dates from the
>>>>> days when 32bit PV was the only type of guest.
>>>>>
>>>>> 32bit PV guests really do have a variable split, so the guest kernel
>>>>> really does need to get this value from Xen.
>>>>>
>>>>> The split for 64bit PV guests is compile-time constant, hence why 64bit
>>>>> PV kernels don't care.
>>>> ... once we get to run Xen in 5-level mode, 4-level PV guests could also
>>>> gain a variable split: Like for 32-bit guests now, only the r/o M2P would
>>>> need to live in that area, and this may well occupy less than the full
>>>> range presently reserved for the hypervisor.
>>> ... you can't do this, because it only works for guests which have
>>> chosen to find the M2P using XENMEM_machphys_mapping (e.g. Linux), and
>>> doesn't for e.g. MiniOS which does:
>>>
>>> #define machine_to_phys_mapping ((unsigned long *)HYPERVISOR_VIRT_START)
>> Hmm, looks like a misunderstanding? I certainly wasn't thinking about
>> making the start of that region variable, but rather the end (i.e. not
>> exactly like for 32-bit compat).
> 
> But for what purpose?  You can't make 4-level guests have a variable range.

That entirely depends on the kernel. Linux, for example, could put its
direct map lower, ...

> The best you can do is make a 5-level-aware guest running on 4-level Xen
> have some new semantics, but a 4-level PV guest already owns the
> overwhelming majority of virtual address space, so being able to hand
> back a few extra TB is not interesting.  And for making that happen...

... allowing it to cover some more space. That's not a whole lot more,
sure.

>>>>> For compat HVM, it happens to pick up the -1 from:
>>>>>
>>>>> #ifdef CONFIG_PV32
>>>>>     HYPERVISOR_COMPAT_VIRT_START(d) =
>>>>>         is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
>>>>> #endif
>>>>>
>>>>> in arch_domain_create(), whereas for non-compat HVM, it gets a number in
>>>>> an address space it has no connection to in the slightest.  ARM guests
>>>>> end up getting XEN_VIRT_START (== 2M) handed back, but this absolutely
>>>>> an internal detail that guests have no business knowing.
>>>> Well, okay, this looks to be good enough an argument to make the adjustment
>>>> you propose for !PV guests.
>>> Right, HVM (on all architectures) is very cut and dry.
>>>
>>> But it feels wrong to not address the PV64 issue at the same time
>>> because it is similar level of broken, despite there being (in theory) a
>>> legitimate need for a PV guest kernel to know it.
>> To me it feels wrong to address the 64-bit PV issue by removing information,
>> when - as you also say - it is actually _missing_ information. To me the
>> proper course of action would be to expose the upper bound as well (such
>> that, down the road, it could become dynamic). There's also no info leak
>> there, as the two (static) bounds are part of the PV ABI anyway.
> 
> ... the absolute best you could do here is introduce a new
> XENVER_platform_parameters2 that has a different structure.

Indeed.

> Which still leaves us with XENVER_platform_parameters providing data
> which is somewhere between useless and actively unhelpful.

If it's useless, there's still no reason to remove / alter the information,
as you can never be absolutely certain that no-one uses this in some way.
And for the "actively unhelpful" aspect it looks like our views simply
differ. Is there no way we can settle on making the change affect HVM only?

Jan
diff mbox series

Patch

diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index ccee178ff17a..70e7dff87488 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -522,7 +522,9 @@  long do_xen_version(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( current->hcall_compat )
         {
             compat_platform_parameters_t params = {
-                .virt_start = HYPERVISOR_COMPAT_VIRT_START(current->domain),
+                .virt_start = is_pv_vcpu(current)
+                            ? HYPERVISOR_COMPAT_VIRT_START(current->domain)
+                            : 0,
             };
 
             if ( copy_to_guest(arg, &params, 1) )
@@ -532,7 +534,7 @@  long do_xen_version(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 #endif
         {
             xen_platform_parameters_t params = {
-                .virt_start = HYPERVISOR_VIRT_START,
+                .virt_start = 0,
             };
 
             if ( copy_to_guest(arg, &params, 1) )
diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index 0ff8bd9077c6..c8325219f648 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -42,6 +42,26 @@  typedef char xen_capabilities_info_t[1024];
 typedef char xen_changeset_info_t[64];
 #define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
 
+/*
+ * This API is problematic.
+ *
+ * It is only applicable to guests which share pagetables with Xen (x86 PV
+ * guests), and is supposed to identify the virtual address split between
+ * guest kernel and Xen.
+ *
+ * For 32bit PV guests, it mostly does this, but the caller needs to know that
+ * Xen lives between the split and 4G.
+ *
+ * For 64bit PV guests, Xen lives at the bottom of the upper canonical range.
+ * This previously returned the start of the upper canonical range (which is
+ * the userspace/Xen split), not the Xen/kernel split (which is 8TB further
+ * on).  This now returns 0 because the old number wasn't correct, and
+ * changing it to anything else would be even worse.
+ *
+ * For all guest types using hardware virt extentions, Xen is not mapped into
+ * the guest kernel virtual address space.  This now return 0, where it
+ * previously returned unrelated data.
+ */
 #define XENVER_platform_parameters 5
 struct xen_platform_parameters {
     xen_ulong_t virt_start;