Patchwork [RFC,v2,1/2] xen/pvh: Add memory map pointer to hvm_start_info struct

login
register
mail settings
Submitter Maran Wilson
Date Dec. 7, 2017, 10:45 p.m.
Message ID <1512686715-11488-2-git-send-email-maran.wilson@oracle.com>
Download mbox | patch
Permalink /patch/10101213/
State New
Headers show

Comments

Maran Wilson - Dec. 7, 2017, 10:45 p.m.
The start info structure that is defined as part of the x86/HVM direct
boot ABI and used for starting Xen PVH guests would be more versatile if
it also included a way to efficiently pass information about the memory
map to the guest.

That way Xen PVH guests would not be forced to use a hypercall to get the
information and would make it easier for KVM guests to share the PVH
entry point.
---
 include/xen/interface/hvm/start_info.h | 34 +++++++++++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)
Jan Beulich - Dec. 8, 2017, 8:49 a.m.
>>> On 07.12.17 at 23:45, <maran.wilson@oracle.com> wrote:
> The start info structure that is defined as part of the x86/HVM direct
> boot ABI and used for starting Xen PVH guests would be more versatile if
> it also included a way to efficiently pass information about the memory
> map to the guest.
> 
> That way Xen PVH guests would not be forced to use a hypercall to get the
> information and would make it easier for KVM guests to share the PVH
> entry point.
> ---
>  include/xen/interface/hvm/start_info.h | 34 +++++++++++++++++++++++++++++++---
>  1 file changed, 31 insertions(+), 3 deletions(-)

First of all such a change should be submitted against the canonical
copy of the header, which lives in the Xen tree.

The argument of avoiding a hypercall doesn't really count imo - this
isn't in any way performance critical code. The argument of making
re-use easier is fine, though.

> --- a/include/xen/interface/hvm/start_info.h
> +++ b/include/xen/interface/hvm/start_info.h
> @@ -33,7 +33,7 @@
>   *    | magic          | Contains the magic value XEN_HVM_START_MAGIC_VALUE
>   *    |                | ("xEn3" with the 0x80 bit of the "E" set).
>   *  4 +----------------+
> - *    | version        | Version of this structure. Current version is 0. New
> + *    | version        | Version of this structure. Current version is 1. New
>   *    |                | versions are guaranteed to be backwards-compatible.
>   *  8 +----------------+
>   *    | flags          | SIF_xxx flags.
> @@ -48,6 +48,12 @@
>   * 32 +----------------+
>   *    | rsdp_paddr     | Physical address of the RSDP ACPI data structure.
>   * 40 +----------------+
> + *    | memmap_paddr   | Physical address of the memory map. Only present in
> + *    |                | version 1 and newer of the structure.
> + * 48 +----------------+
> + *    | memmap_entries | Number of entries in the memory map table. Only
> + *    |                | present in version 1 and newer of the structure.
> + * 52 +----------------+

Please let's make this optional even in v1 (and later), i.e. spell out
that it may be zero. That way Xen code could continue to use the
hypercall approach even.

Also please spell out a 4-byte reserved entry at the end, to make
the specified structure a multiple of 8 in size again regardless of
bitness of the producer/consumer.

> @@ -62,6 +68,17 @@
>   *    | reserved       |
>   * 32 +----------------+
>   *
> + * The layout of each entry in the memory map table is as follows and no
> + * padding is used between entries in the array:
> + *
> + *  0 +----------------+
> + *    | addr           | Base address
> + *  8 +----------------+
> + *    | size           | Size of mapping
> + * 16 +----------------+
> + *    | type           | E820_TYPE_xxx
> + * 20 +----------------|

I'm not convinced of re-using E820 types here. I can see that this
might ease the consumption in Linux, but I don't think there should
be any connection to x86 aspects here - the data being supplied is
x86-agnostic, and Linux'es placement of the header is also making
no connection to x86 (oddly enough, the current placement in the
Xen tree does, for a reason which escapes me).

I could also imagine reasons to add new types without them being
sanctioned by whoever maintains E820 type assignments.

As to the size field - you need to spell out whether these are bytes
or pages (it might be worthwhile to also make this explicit for the
addr one, but there I view it as less of a problem, since "address"
doesn't commonly mean a page granular entity).

Also this again lacks a 4-byte reserved field at the end.

> @@ -86,13 +103,24 @@ struct hvm_start_info {
>      uint64_t cmdline_paddr;     /* Physical address of the command line.     */
>      uint64_t rsdp_paddr;        /* Physical address of the RSDP ACPI data    */
>                                  /* structure.                               
>  */
> -};
> +    uint64_t memmap_paddr;	/* Physical address of an array of           */
> +				/* hvm_memmap_table_entry. Only present in   */
> +				/* Ver 1 or later. For e820 mem map table.   */
> +    uint32_t memmap_entries;	/* Only present in Ver 1 or later. Number of */
> +				/* entries in the memmap table.              */
> +} __attribute__((packed));

No packed attribute here and below please, at least not in the
canonical (non-Linux) variant of the header.

Jan
Maran Wilson - Dec. 8, 2017, 7:05 p.m.
Thanks for taking a look Jan. More below...

On 12/8/2017 12:49 AM, Jan Beulich wrote:
>>>> On 07.12.17 at 23:45, <maran.wilson@oracle.com> wrote:
>> The start info structure that is defined as part of the x86/HVM direct
>> boot ABI and used for starting Xen PVH guests would be more versatile if
>> it also included a way to efficiently pass information about the memory
>> map to the guest.
>>
>> That way Xen PVH guests would not be forced to use a hypercall to get the
>> information and would make it easier for KVM guests to share the PVH
>> entry point.
>> ---
>>   include/xen/interface/hvm/start_info.h | 34 +++++++++++++++++++++++++++++++---
>>   1 file changed, 31 insertions(+), 3 deletions(-)
> First of all such a change should be submitted against the canonical
> copy of the header, which lives in the Xen tree.

Understood. Will do that when this converts from RFC to actual patch.

> The argument of avoiding a hypercall doesn't really count imo - this
> isn't in any way performance critical code. The argument of making
> re-use easier is fine, though.

Okay, I will reword the commit message.

>> --- a/include/xen/interface/hvm/start_info.h
>> +++ b/include/xen/interface/hvm/start_info.h
>> @@ -33,7 +33,7 @@
>>    *    | magic          | Contains the magic value XEN_HVM_START_MAGIC_VALUE
>>    *    |                | ("xEn3" with the 0x80 bit of the "E" set).
>>    *  4 +----------------+
>> - *    | version        | Version of this structure. Current version is 0. New
>> + *    | version        | Version of this structure. Current version is 1. New
>>    *    |                | versions are guaranteed to be backwards-compatible.
>>    *  8 +----------------+
>>    *    | flags          | SIF_xxx flags.
>> @@ -48,6 +48,12 @@
>>    * 32 +----------------+
>>    *    | rsdp_paddr     | Physical address of the RSDP ACPI data structure.
>>    * 40 +----------------+
>> + *    | memmap_paddr   | Physical address of the memory map. Only present in
>> + *    |                | version 1 and newer of the structure.
>> + * 48 +----------------+
>> + *    | memmap_entries | Number of entries in the memory map table. Only
>> + *    |                | present in version 1 and newer of the structure.
>> + * 52 +----------------+
> Please let's make this optional even in v1 (and later), i.e. spell out
> that it may be zero. That way Xen code could continue to use the
> hypercall approach even.

Yes, my intention was to make this optional. I will spell it out.

> Also please spell out a 4-byte reserved entry at the end, to make
> the specified structure a multiple of 8 in size again regardless of
> bitness of the producer/consumer.

Sure, I can add that.

>> @@ -62,6 +68,17 @@
>>    *    | reserved       |
>>    * 32 +----------------+
>>    *
>> + * The layout of each entry in the memory map table is as follows and no
>> + * padding is used between entries in the array:
>> + *
>> + *  0 +----------------+
>> + *    | addr           | Base address
>> + *  8 +----------------+
>> + *    | size           | Size of mapping
>> + * 16 +----------------+
>> + *    | type           | E820_TYPE_xxx
>> + * 20 +----------------|
> I'm not convinced of re-using E820 types here. I can see that this
> might ease the consumption in Linux, but I don't think there should
> be any connection to x86 aspects here - the data being supplied is
> x86-agnostic, and Linux'es placement of the header is also making
> no connection to x86 (oddly enough, the current placement in the
> Xen tree does, for a reason which escapes me).
>
> I could also imagine reasons to add new types without them being
> sanctioned by whoever maintains E820 type assignments.

So there are three aspects to discuss here.

1) The addition of the "E820_TYPE_xxx" comment. I am fine with just 
changing that to "mapping type" and leaving it as something to be 
coordinated between the hypervisor and the guest OS being started by 
that hypervisor.

2) x86 vs x86-agnostic. While I'm trying to keep this interface generic 
in terms of guest OS (like Linux, FreeBSD, possible other guests in the 
future) and hypervisor type (Xen, QEMU/KVM, etc), I was actually under 
the impression that we are dealing with an ABI that is very much x86 
specific.

The canonical document describing the ABI 
(https://xenbits.xen.org/docs/unstable/misc/pvh.html) is titled "x86/HVM 
direct boot ABI" and goes on to describe an interface in very 
x86-specific terms. i.e. The ebx register must contain a pointer, cs, 
ds, es must be set a certain way, etc.

That is probably why Xen's placement of the header file is in a x86 
section of the tree. And also why there already exist a number of "x86" 
references in the existing header file. A quick grep of the existing 
header file will show lines like:

    "C representation of the x86/HVM start info layout"
    "Start of day structure passed to PVH guests and to HVM guests in %ebx"
    "Xen on x86 will always try to place all the data below the 4GiB"

If at some point in the future someone decides to implement a similar 
ABI for a different CPU architecture while re-using this same 
hvm_start_info struct, then this header will have to be redone a bit 
anyway. But I'm not aware of any other such ABI that exists or is 
currently in the works.

3) The (packed) layout of the hvm_memmap_table_entry struct. I did 
initially consider just making this a new structure that did not 
necessarily match struct e820_entry in its array layout. But, it's not 
just the consumer that has an easier time digesting it in the e820_entry 
array format. It's also the producer side (QEMU for instance) where code 
already exists to lay out this information in e820_entry array format. 
And since this is all x86 specific anyway, it just seemed like I would 
be needlessly making more work for both ends by inventing a completely 
new memory map layout just for the sake of being different. Especially 
when there doesn't seem to be anything terribly broken about the 
existing e820_entry array format as a general purpose memory map.

Would that be acceptable?

> As to the size field - you need to spell out whether these are bytes
> or pages (it might be worthwhile to also make this explicit for the
> addr one, but there I view it as less of a problem, since "address"
> doesn't commonly mean a page granular entity).

Sure, I will change that to "Size of the mapping in bytes".

> Also this again lacks a 4-byte reserved field at the end.

For the reasons described above, I'd much prefer to leave this struct 
packed and without the 4-byte reserved field. But let me know if there 
is a compelling reason that I have missed.

>> @@ -86,13 +103,24 @@ struct hvm_start_info {
>>       uint64_t cmdline_paddr;     /* Physical address of the command line.     */
>>       uint64_t rsdp_paddr;        /* Physical address of the RSDP ACPI data    */
>>                                   /* structure.
>>   */
>> -};
>> +    uint64_t memmap_paddr;	/* Physical address of an array of           */
>> +				/* hvm_memmap_table_entry. Only present in   */
>> +				/* Ver 1 or later. For e820 mem map table.   */
>> +    uint32_t memmap_entries;	/* Only present in Ver 1 or later. Number of */
>> +				/* entries in the memmap table.              */
>> +} __attribute__((packed));
> No packed attribute here and below please, at least not in the
> canonical (non-Linux) variant of the header.

Sure, with the addition of the 4 byte reserved field you mentioned 
earlier, I'll remove the packed attribute for the two existing 
structures. But as discussed above, would prefer to leave it in place 
for the new memory map array struct.

Thanks,
-Maran

> Jan
Jan Beulich - Dec. 11, 2017, 9:25 a.m.
>>> On 08.12.17 at 20:05, <maran.wilson@oracle.com> wrote:
> On 12/8/2017 12:49 AM, Jan Beulich wrote:
>> I'm not convinced of re-using E820 types here. I can see that this
>> might ease the consumption in Linux, but I don't think there should
>> be any connection to x86 aspects here - the data being supplied is
>> x86-agnostic, and Linux'es placement of the header is also making
>> no connection to x86 (oddly enough, the current placement in the
>> Xen tree does, for a reason which escapes me).
>>
>> I could also imagine reasons to add new types without them being
>> sanctioned by whoever maintains E820 type assignments.
> 
> So there are three aspects to discuss here.
> 
> 1) The addition of the "E820_TYPE_xxx" comment. I am fine with just 
> changing that to "mapping type" and leaving it as something to be 
> coordinated between the hypervisor and the guest OS being started by 
> that hypervisor.
> 
> 2) x86 vs x86-agnostic. While I'm trying to keep this interface generic 
> in terms of guest OS (like Linux, FreeBSD, possible other guests in the 
> future) and hypervisor type (Xen, QEMU/KVM, etc), I was actually under 
> the impression that we are dealing with an ABI that is very much x86 
> specific.
> 
> The canonical document describing the ABI 
> (https://xenbits.xen.org/docs/unstable/misc/pvh.html) is titled "x86/HVM 
> direct boot ABI" and goes on to describe an interface in very 
> x86-specific terms. i.e. The ebx register must contain a pointer, cs, 
> ds, es must be set a certain way, etc.
> 
> That is probably why Xen's placement of the header file is in a x86 
> section of the tree. And also why there already exist a number of "x86" 
> references in the existing header file. A quick grep of the existing 
> header file will show lines like:
> 
>     "C representation of the x86/HVM start info layout"
>     "Start of day structure passed to PVH guests and to HVM guests in %ebx"
>     "Xen on x86 will always try to place all the data below the 4GiB"
> 
> If at some point in the future someone decides to implement a similar 
> ABI for a different CPU architecture while re-using this same 
> hvm_start_info struct, then this header will have to be redone a bit 
> anyway. But I'm not aware of any other such ABI that exists or is 
> currently in the works.

Anything like this being in the works elsewhere doesn't mean much.
Otherwise you'd advocate for anyone introducing something new
on a single architecture only to ignore all portability concerns. My
fundamental point here is that within the currently defined
structures there's nothing x86 specific, no matter that a few
comments are mentioning x86 (and in at least two of the three
cases for obvious [implementation] reasons rather than to
explain why this interface can/should be x86 only).

> 3) The (packed) layout of the hvm_memmap_table_entry struct. I did 
> initially consider just making this a new structure that did not 
> necessarily match struct e820_entry in its array layout. But, it's not 
> just the consumer that has an easier time digesting it in the e820_entry 
> array format. It's also the producer side (QEMU for instance) where code 
> already exists to lay out this information in e820_entry array format. 
> And since this is all x86 specific anyway, it just seemed like I would 
> be needlessly making more work for both ends by inventing a completely 
> new memory map layout just for the sake of being different. Especially 
> when there doesn't seem to be anything terribly broken about the 
> existing e820_entry array format as a general purpose memory map.

The brokenness is the mis-alignment of every other array member.
This doesn't matter on x86, but it would matter on any architecture
requiring strict alignment. Furthermore please note that in the
canonical headers you can't even express what you want: Use of
#pragma pack or the packed attribute is prohibited there - we
demand that the headers can be used by any C89-compatible
compiler (lately there have been a few extensions requiring C99,
but these need to be opted in for by consumers, which imo is not
an option here).

Jan
Paolo Bonzini - Dec. 11, 2017, 9:59 p.m.
On 08/12/2017 09:49, Jan Beulich wrote:
>> + * The layout of each entry in the memory map table is as follows and no
>> + * padding is used between entries in the array:
>> + *
>> + *  0 +----------------+
>> + *    | addr           | Base address
>> + *  8 +----------------+
>> + *    | size           | Size of mapping
>> + * 16 +----------------+
>> + *    | type           | E820_TYPE_xxx
>> + * 20 +----------------|
> I'm not convinced of re-using E820 types here. I can see that this
> might ease the consumption in Linux, but I don't think there should
> be any connection to x86 aspects here - the data being supplied is
> x86-agnostic, and Linux'es placement of the header is also making
> no connection to x86 (oddly enough, the current placement in the
> Xen tree does, for a reason which escapes me).

FWIW, e820 types are now part of the ACPI standard.  So using them is
not necessarily related to x86, and reasonably x86-agnostic.

Paolo

> I could also imagine reasons to add new types without them being
> sanctioned by whoever maintains E820 type assignments.
Jan Beulich - Dec. 12, 2017, 8:06 a.m.
>>> On 11.12.17 at 22:59, <pbonzini@redhat.com> wrote:
> On 08/12/2017 09:49, Jan Beulich wrote:
>>> + * The layout of each entry in the memory map table is as follows and no
>>> + * padding is used between entries in the array:
>>> + *
>>> + *  0 +----------------+
>>> + *    | addr           | Base address
>>> + *  8 +----------------+
>>> + *    | size           | Size of mapping
>>> + * 16 +----------------+
>>> + *    | type           | E820_TYPE_xxx
>>> + * 20 +----------------|
>> I'm not convinced of re-using E820 types here. I can see that this
>> might ease the consumption in Linux, but I don't think there should
>> be any connection to x86 aspects here - the data being supplied is
>> x86-agnostic, and Linux'es placement of the header is also making
>> no connection to x86 (oddly enough, the current placement in the
>> Xen tree does, for a reason which escapes me).
> 
> FWIW, e820 types are now part of the ACPI standard.  So using them is
> not necessarily related to x86, and reasonably x86-agnostic.

Sort of - the description of it starts with "This interface is used in
real mode only on IA-PC-based systems ..."

But it being there is useful in another way: It shows that there's
an optional field making the full structure 64-bit aligned again. (It
at the same time shows - I admit I had forgotten about this aspect -
that the structure size isn't fixed in the first place, so consumers
have to convert [truncate/extend] the output to their internal
representation anyway, and hence there's even less of a reason
to tie the proposed structure's layout to the E820 one.)

Jan
Paolo Bonzini - Dec. 12, 2017, 9:54 a.m.
On 12/12/2017 09:06, Jan Beulich wrote:
>>>> On 11.12.17 at 22:59, <pbonzini@redhat.com> wrote:
>> On 08/12/2017 09:49, Jan Beulich wrote:
>>>> + * The layout of each entry in the memory map table is as follows and no
>>>> + * padding is used between entries in the array:
>>>> + *
>>>> + *  0 +----------------+
>>>> + *    | addr           | Base address
>>>> + *  8 +----------------+
>>>> + *    | size           | Size of mapping
>>>> + * 16 +----------------+
>>>> + *    | type           | E820_TYPE_xxx
>>>> + * 20 +----------------|
>>> I'm not convinced of re-using E820 types here. I can see that this
>>> might ease the consumption in Linux, but I don't think there should
>>> be any connection to x86 aspects here - the data being supplied is
>>> x86-agnostic, and Linux'es placement of the header is also making
>>> no connection to x86 (oddly enough, the current placement in the
>>> Xen tree does, for a reason which escapes me).
>>
>> FWIW, e820 types are now part of the ACPI standard.  So using them is
>> not necessarily related to x86, and reasonably x86-agnostic.
> 
> Sort of - the description of it starts with "This interface is used in
> real mode only on IA-PC-based systems ..."

Note I said the e820 *types*.  While the interface is there for PC
compatibility, the ACPI address range types (AddressRangeMemory,
AddressRangeReserved, AddressRangeACPI, etc.) are exactly the e820 types.

> But it being there is useful in another way: It shows that there's
> an optional field making the full structure 64-bit aligned again. (It
> at the same time shows - I admit I had forgotten about this aspect -
> that the structure size isn't fixed in the first place, so consumers
> have to convert [truncate/extend] the output to their internal
> representation anyway, and hence there's even less of a reason
> to tie the proposed structure's layout to the E820 one.)

My point was that the e820 types are okay to use in an
architecture-agnostic way in my opinion.  The layout only matters so
much, as there aren't many ways to encode a memory map (note I do agree
about that alignment dword).

Paolo

Patch

diff --git a/include/xen/interface/hvm/start_info.h b/include/xen/interface/hvm/start_info.h
index 6484159..60206bb 100644
--- a/include/xen/interface/hvm/start_info.h
+++ b/include/xen/interface/hvm/start_info.h
@@ -33,7 +33,7 @@ 
  *    | magic          | Contains the magic value XEN_HVM_START_MAGIC_VALUE
  *    |                | ("xEn3" with the 0x80 bit of the "E" set).
  *  4 +----------------+
- *    | version        | Version of this structure. Current version is 0. New
+ *    | version        | Version of this structure. Current version is 1. New
  *    |                | versions are guaranteed to be backwards-compatible.
  *  8 +----------------+
  *    | flags          | SIF_xxx flags.
@@ -48,6 +48,12 @@ 
  * 32 +----------------+
  *    | rsdp_paddr     | Physical address of the RSDP ACPI data structure.
  * 40 +----------------+
+ *    | memmap_paddr   | Physical address of the memory map. Only present in
+ *    |                | version 1 and newer of the structure.
+ * 48 +----------------+
+ *    | memmap_entries | Number of entries in the memory map table. Only
+ *    |                | present in version 1 and newer of the structure.
+ * 52 +----------------+
  *
  * The layout of each entry in the module structure is the following:
  *
@@ -62,6 +68,17 @@ 
  *    | reserved       |
  * 32 +----------------+
  *
+ * The layout of each entry in the memory map table is as follows and no
+ * padding is used between entries in the array:
+ *
+ *  0 +----------------+
+ *    | addr           | Base address
+ *  8 +----------------+
+ *    | size           | Size of mapping
+ * 16 +----------------+
+ *    | type           | E820_TYPE_xxx
+ * 20 +----------------|
+ *
  * The address and sizes are always a 64bit little endian unsigned integer.
  *
  * NB: Xen on x86 will always try to place all the data below the 4GiB
@@ -86,13 +103,24 @@  struct hvm_start_info {
     uint64_t cmdline_paddr;     /* Physical address of the command line.     */
     uint64_t rsdp_paddr;        /* Physical address of the RSDP ACPI data    */
                                 /* structure.                                */
-};
+    uint64_t memmap_paddr;	/* Physical address of an array of           */
+				/* hvm_memmap_table_entry. Only present in   */
+				/* Ver 1 or later. For e820 mem map table.   */
+    uint32_t memmap_entries;	/* Only present in Ver 1 or later. Number of */
+				/* entries in the memmap table.              */
+} __attribute__((packed));
 
 struct hvm_modlist_entry {
     uint64_t paddr;             /* Physical address of the module.           */
     uint64_t size;              /* Size of the module in bytes.              */
     uint64_t cmdline_paddr;     /* Physical address of the command line.     */
     uint64_t reserved;
-};
+} __attribute__((packed));
+
+struct hvm_memmap_table_entry {
+    uint64_t addr;		/* Base address of the memory region         */
+    uint64_t size;		/* Size of the memory region                 */
+    uint32_t type;		/* E820_TYPE_xxx of the memory region        */
+} __attribute__((packed));
 
 #endif /* __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ */