[v2] x86/HVMlite: document the BSP/AP boot ABI

Message ID 1453717009-91577-1-git-send-email-royger@FreeBSD.org
State New, archived

Commit Message

Roger Pau Monné Jan. 25, 2016, 10:16 a.m. UTC
From: Roger Pau Monne <roger.pau@citrix.com>

The discussion in [1] led to an agreement on the pieces missing from PVH
(or HVM without a device-model) in order to progress with its
implementation.

One of the missing pieces is a new boot ABI that replaces the PV boot
ABI. The aim of this new boot ABI is to remove the limitations of the
PV boot ABI that are no longer present when using auto-translated
guests. The new boot protocol should allow using the same entry point
for both 32bit and 64bit guests, and let the guest choose its bitness
at run time without the domain builder knowing in advance.

This patch introduces a new document called hvmlite.markdown, with the
intention of merging it into pvh.markdown once the HVMlite implementation
has feature parity with PVH and the old PVH ABI is replaced with the
HVMlite one.

[1] http://lists.xen.org/archives/html/xen-devel/2015-06/msg00258.html

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Tim Deegan <tim@xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v1:
 - Prepend x86 to the title.
 - Consistently use "must".
---
 docs/misc/hvmlite.markdown | 82 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)
 create mode 100644 docs/misc/hvmlite.markdown

Comments

Andrew Cooper Jan. 25, 2016, 10:21 a.m. UTC | #1
On 25/01/16 10:16, Roger Pau Monne wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
>
> The discussion in [1] lead to an agreement of the missing pieces in PVH
> (or HVM without a device-model) in order to progress with it's
> implementation.
>
> One of the missing pieces is a new boot ABI, that replaces the PV boot
> ABI. The aim of this new boot ABI is to remove the limitations of the
> PV boot ABI, that are no longer present when using auto-translated
> guests. The new boot protocol should allow to use the same entry point
> for both 32bit and 64bit guests, and let the guest choose it's bitness

I would add "and paging mode" here, as it is just as important as bitness.

Otherwise, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Patch

diff --git a/docs/misc/hvmlite.markdown b/docs/misc/hvmlite.markdown
new file mode 100644
index 0000000..c1b75c6
--- /dev/null
+++ b/docs/misc/hvmlite.markdown
@@ -0,0 +1,82 @@ 
+**NOTE**: this document will be merged into `pvh.markdown` once PVH is replaced
+with the HVMlite implementation.
+
+# x86/HVM direct boot ABI #
+
+Since the Xen entry point into the kernel can be different from the
+native entry point, an `ELFNOTE` is used in order to tell the domain
+builder how to load and jump into the kernel entry point:
+
+    ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,          .long,  xen_start32)
+
+The presence of the `XEN_ELFNOTE_PHYS32_ENTRY` note indicates that the
+kernel supports the boot ABI described in this document.
+
+The domain builder must load the kernel into the guest memory space and
+jump into the entry point defined at `XEN_ELFNOTE_PHYS32_ENTRY` with the
+following machine state:
+
+ * `ebx`: contains the physical memory address where the loader has placed
+   the boot start info structure.
+
+ * `cr0`: bit 0 (PE) must be set. All the other writeable bits are cleared.
+
+ * `cr4`: all bits are cleared.
+
+ * `cs`: must be a 32-bit read/execute code segment with a base of '0'
+   and a limit of '0xFFFFFFFF'. The selector value is unspecified.
+
+ * `ds`, `es`: must be a 32-bit read/write data segment with a base of
+   '0' and a limit of '0xFFFFFFFF'. The selector values are all unspecified.
+
+ * `tr`: must be a 32-bit TSS (active) with a base of '0' and a limit of '0x67'.
+
+ * `eflags`: bit 17 (VM) must be cleared. Bit 9 (IF) must be cleared.
+   Bit 8 (TF) must be cleared. Other bits are all unspecified.
+
+All other processor registers and flag bits are unspecified. The OS is in
+charge of setting up its own stack, GDT and IDT.
+
+The format of the boot start info structure is the following (pointed to
+by %ebx):
+
+    struct hvm_start_info {
+    #define HVM_START_MAGIC_VALUE 0x336ec578
+        uint32_t magic;             /* Contains the magic value 0x336ec578       */
+                                    /* ("xEn3" with the 0x80 bit of the "E" set).*/
+        uint32_t flags;             /* SIF_xxx flags.                            */
+        uint32_t cmdline_paddr;     /* Physical address of the command line.     */
+        uint32_t nr_modules;        /* Number of modules passed to the kernel.   */
+        uint32_t modlist_paddr;     /* Physical address of an array of           */
+                                    /* hvm_modlist_entry.                        */
+    };
+
+    struct hvm_modlist_entry {
+        uint32_t paddr;             /* Physical address of the module.           */
+        uint32_t size;              /* Size of the module in bytes.              */
+    };
+
+Other relevant information needed in order to boot a guest kernel
+(console page address, xenstore event channel...) can be obtained
+using HVMPARAMS, just like it's done on HVM guests.
+
+The setup of the hypercall page is also performed in the same way
+as HVM guests, using the hypervisor cpuid leaves and msr ranges.
+
+## AP startup ##
+
+AP startup is performed using hypercalls. The following VCPU operations
+are used in order to bring up secondary vCPUs:
+
+ * `VCPUOP_initialise` is used to set the initial state of the vCPU. The
+   argument passed to the hypercall must be of the type vcpu_hvm_context.
+   See `public/hvm/hvm_vcpu.h` for the layout of the structure. Note that
+   this hypercall allows starting the vCPU in several modes (16/32/64 bits),
+   regardless of the mode the BSP is currently running in.
+
+ * `VCPUOP_up` is used to launch the vCPU once the initial state has been
+   set using `VCPUOP_initialise`.
+
+ * `VCPUOP_down` is used to bring down a vCPU.
+
+ * `VCPUOP_is_up` is used to scan the number of available vCPUs.
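
For illustration only (not part of the patch): a minimal C sketch of how a guest might validate the start info structure handed over in `ebx` and walk the module list. The two structures and the magic value are copied from the document above; everything else is an assumption made for the example, namely the `guest_mem` type and the `guest_ptr` and `total_module_bytes` helpers. To keep the sketch runnable outside a guest, physical addresses are treated as offsets into a byte buffer. A real kernel entered with paging disabled would dereference the physical addresses directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout copied from the ABI document above. */
#define HVM_START_MAGIC_VALUE 0x336ec578

struct hvm_start_info {
    uint32_t magic;          /* "xEn3" with the 0x80 bit of the "E" set. */
    uint32_t flags;          /* SIF_xxx flags.                           */
    uint32_t cmdline_paddr;  /* Physical address of the command line.    */
    uint32_t nr_modules;     /* Number of modules passed to the kernel.  */
    uint32_t modlist_paddr;  /* Physical address of the module array.    */
};

struct hvm_modlist_entry {
    uint32_t paddr;          /* Physical address of the module.          */
    uint32_t size;           /* Size of the module in bytes.             */
};

/* Both structures consist only of 32-bit fields, so no padding is expected. */
static_assert(sizeof(struct hvm_start_info) == 20, "unexpected padding");
static_assert(sizeof(struct hvm_modlist_entry) == 8, "unexpected padding");

/* Hypothetical stand-in for guest memory: physical address == buffer offset. */
struct guest_mem {
    const uint8_t *base;
    uint32_t size;
};

/* Return a pointer to len bytes at paddr, or NULL if the range is out of bounds. */
static const void *guest_ptr(const struct guest_mem *m, uint32_t paddr,
                             uint32_t len)
{
    if (paddr > m->size || len > m->size - paddr)
        return NULL;
    return m->base + paddr;
}

/*
 * Hypothetical helper: check the magic value, then sum the sizes of all
 * modules. Returns -1 if the magic is wrong or any range is out of bounds.
 */
int64_t total_module_bytes(const struct guest_mem *m, uint32_t start_info_paddr)
{
    struct hvm_start_info si;
    const void *p = guest_ptr(m, start_info_paddr, sizeof(si));
    int64_t total = 0;

    if (p == NULL)
        return -1;
    memcpy(&si, p, sizeof(si)); /* memcpy avoids alignment assumptions */
    if (si.magic != HVM_START_MAGIC_VALUE)
        return -1;

    for (uint32_t i = 0; i < si.nr_modules; i++) {
        struct hvm_modlist_entry e;
        uint64_t off = (uint64_t)si.modlist_paddr + (uint64_t)i * sizeof(e);

        if (off > UINT32_MAX)
            return -1;
        p = guest_ptr(m, (uint32_t)off, sizeof(e));
        if (p == NULL)
            return -1;
        memcpy(&e, p, sizeof(e));
        /* Reject modules that do not fit inside guest memory. */
        if (guest_ptr(m, e.paddr, e.size) == NULL)
            return -1;
        total += e.size;
    }
    return total;
}
```

Checking the magic value before touching any other field lets a kernel that was entered through some other path fail early instead of interpreting unrelated memory as a module list.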