diff mbox series

[v5,15/16] KVM: Add documentation for Xen hypercall and shared_info updates

Message ID 20210111195725.4601-16-dwmw2@infradead.org (mailing list archive)
State New, archived
Series KVM: Add minimal support for Xen HVM guests

Commit Message

David Woodhouse Jan. 11, 2021, 7:57 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 Documentation/virt/kvm/api.rst | 124 ++++++++++++++++++++++++++++++++-
 1 file changed, 123 insertions(+), 1 deletion(-)

Comments

Paolo Bonzini Jan. 28, 2021, 12:18 p.m. UTC | #1
On 11/01/21 20:57, David Woodhouse wrote:
> 
> +
> +KVM_XEN_ATTR_TYPE_LONG_MODE
> +  Sets the ABI mode of the VM to 32-bit or 64-bit (long mode). This
> +  determines the layout of the shared info pages exposed to the VM.
> +
> +KVM_XEN_ATTR_TYPE_SHARED_INFO
> +  Sets the guest physical frame number at which the Xen "shared info"
> +  page resides. Note that although Xen places vcpu_info for the first
> +  32 vCPUs in the shared_info page, KVM does not automatically do so
> +  and requires that KVM_XEN_ATTR_TYPE_VCPU_INFO be used explicitly
> +  even when the vcpu_info for a given vCPU resides at the "default"
> +  location in the shared_info page. This is because KVM is not aware
> +  of the Xen CPU id which is used as the index into the vcpu_info[]
> +  array, so cannot know the correct default location.
> +
> +KVM_XEN_ATTR_TYPE_VCPU_INFO
> +  Sets the guest physical address of the vcpu_info for a given vCPU.
> +
> +KVM_XEN_ATTR_TYPE_VCPU_TIME_INFO
> +  Sets the guest physical address of an additional pvclock structure
> +  for a given vCPU. This is typically used for guest vsyscall support.
> +
> +KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE
> +  Sets the guest physical address of the vcpu_runstate_info for a given
> +  vCPU. This is how a Xen guest tracks CPU state such as steal time.
> +

My only qualm is really that the userspace API is really ugly.

Can you just have both a VM and a VCPU ioctl (so no vcpu_id to pass!), 
add a generous padding to the struct, and just get everything out with a 
single ioctl without having to pass in a type?

Paolo
David Woodhouse Jan. 28, 2021, 4:49 p.m. UTC | #2
On Thu, 2021-01-28 at 13:18 +0100, Paolo Bonzini wrote:
> My only qualm is really that the userspace API is really ugly.
> 
> Can you just have both a VM and a VCPU ioctl (so no vcpu_id to pass!), 

Sure, that seems like a sensible thing to do.

> add a generous padding to the struct,

I think I added *some* padding to the struct kvm_xen_hvm_attr which
wasn't there in Joao's original. I can add more, certainly.

>  and just get everything out with a 
> single ioctl without having to pass in a type?

Honestly, I don't even care about reading it out except for long_mode
which the kernel *does* infer for itself when the MSR is used to fill
in the hypercall page.

I quite like keeping them separate; they *do* get set separately, in
response to different hypercalls from the guest. And the capabilities
translate naturally to a given field existing or not existing; having
another mapping of that to fields in a binary structure would be
additional complexity.
Paolo Bonzini Jan. 28, 2021, 5:06 p.m. UTC | #3
On 28/01/21 17:49, David Woodhouse wrote:
> On Thu, 2021-01-28 at 13:18 +0100, Paolo Bonzini wrote:
>> My only qualm is really that the userspace API is really ugly.
>>
>> Can you just have both a VM and a VCPU ioctl (so no vcpu_id to pass!),
> 
> Sure, that seems like a sensible thing to do.
> 
>> add a generous padding to the struct,
> 
> I think I added *some* padding to the struct kvm_xen_hvm_attr which
> wasn't there in Joao's original. I can add more, certainly.
> 
>>   and just get everything out with a
>> single ioctl without having to pass in a type?
> 
> Honestly, I don't even care about reading it out except for long_mode
> which the kernel *does* infer for itself when the MSR is used to fill
> in the hypercall page.

What about VM migration?

> I quite like keeping them separate; they *do* get set separately, in
> response to different hypercalls from the guest. And the capabilities
> translate naturally to a given field existing or not existing; having
> another mapping of that to fields in a binary structure would be
> additional complexity.

Does it make sense to reuse the bits that you return from 
KVM_CHECK_EXTENSION as a bitset for both the get and set ioctls?  The 
struct then would be

	struct kvm_xen_attr {
		uint32_t valid;
		uint32_t lm;
		struct {
		} ...;
		uint8_t pad[nnn /* to 128 bytes */];
	};

The get ioctl would return a constant value in "valid" (different for 
the VM and VCPU ioctls of course), the set ioctl would look only at the 
fields mentioned in "valid" and error out if they're unsupported or 
invalid for VM/VCPU.  Basically the "switch" you have becomes a series 
of "if (attr->valid & ...)" statements.

Paolo
Paolo Bonzini Jan. 29, 2021, 7:59 a.m. UTC | #4
On 28/01/21 20:26, David Woodhouse wrote:
>>> Honestly, I don't even care about reading it out except for long_mode
>>> which the kernel *does* infer for itself when the MSR is used to fill
>>> in the hypercall page.
>>
>> What about VM migration?
> 
> The VMM needs to know all these things anyway. It's not like it can
> *forget* where each vCPU's vcpu_info is, then just ask the kernel when
> it's time to orchestrate a migration.

Yeah it may not be particularly useful but it's not hard to write the 
code and it's easier to document.

> I suppose *maybe* the upcall vector is an exception; the VMM *could*
> forget that if it really wanted to, then ask the kernel for it on
> migration. But we don't because that seems pointless.
>
> But then it *theoretically* could have a sparse bitmap of "this feature
> but not the next feature", even though that would probably never happen
> in practice without weird selective backports. But it's still an icky
> API. And frankly I could live without the 'get' for any of these except
> LONG_MODE which the kernel might actually flip for us. The rest could
> be write-only for all I care.

I don't think it's that icky, the alternative would be a KVM_GET_ONE_REG 
like architecture that just returns a u64 but the bitmap works too. 
It's not hard to write the code.

Paolo

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index c136e254b496..911171caeedb 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -961,6 +961,13 @@  memory.
 	__u8 pad2[30];
   };
 
+If the KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL flag is returned from the
+KVM_CAP_XEN_HVM check, it may be set in the flags field of this ioctl.
+This requests KVM to generate the contents of the hypercall page
+automatically, and also to intercept hypercalls with KVM_EXIT_XEN.
+In this case, all of the blob size and address fields must be zero.
+
+No other flags are currently valid.
 
 4.29 KVM_GET_CLOCK
 ------------------
@@ -4830,6 +4837,71 @@  into user space.
 If a vCPU is in running state while this ioctl is invoked, the vCPU may
 experience inconsistent filtering behavior on MSR accesses.
 
+4.127 KVM_XEN_HVM_SET_ATTR
+--------------------------
+
+:Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_xen_hvm_attr
+:Returns: 0 on success, < 0 on error
+
+::
+
+  struct kvm_xen_hvm_attr {
+	__u16 type;
+
+	union {
+		__u8 long_mode;
+		struct {
+			__u64 gfn;
+		} shared_info;
+		struct {
+			__u32 vcpu_id;
+			__u64 gpa;
+		} vcpu_attr;
+		__u64 pad[4];
+	} u;
+  };
+
+type values:
+
+KVM_XEN_ATTR_TYPE_LONG_MODE
+  Sets the ABI mode of the VM to 32-bit or 64-bit (long mode). This
+  determines the layout of the shared info pages exposed to the VM.
+
+KVM_XEN_ATTR_TYPE_SHARED_INFO
+  Sets the guest physical frame number at which the Xen "shared info"
+  page resides. Note that although Xen places vcpu_info for the first
+  32 vCPUs in the shared_info page, KVM does not automatically do so
+  and requires that KVM_XEN_ATTR_TYPE_VCPU_INFO be used explicitly
+  even when the vcpu_info for a given vCPU resides at the "default"
+  location in the shared_info page. This is because KVM is not aware
+  of the Xen CPU id which is used as the index into the vcpu_info[]
+  array, so cannot know the correct default location.
+
+KVM_XEN_ATTR_TYPE_VCPU_INFO
+  Sets the guest physical address of the vcpu_info for a given vCPU.
+
+KVM_XEN_ATTR_TYPE_VCPU_TIME_INFO
+  Sets the guest physical address of an additional pvclock structure
+  for a given vCPU. This is typically used for guest vsyscall support.
+
+KVM_XEN_ATTR_TYPE_VCPU_RUNSTATE
+  Sets the guest physical address of the vcpu_runstate_info for a given
+  vCPU. This is how a Xen guest tracks CPU state such as steal time.
+
+4.128 KVM_XEN_HVM_GET_ATTR
+--------------------------
+
+:Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_xen_hvm_attr
+:Returns: 0 on success, < 0 on error
+
+Allows Xen VM attributes to be read. For the structure and types,
+see KVM_XEN_HVM_SET_ATTR above.
 
 5. The kvm_run structure
 ========================
@@ -5326,6 +5398,34 @@  wants to write. Once finished processing the event, user space must continue
 vCPU execution. If the MSR write was unsuccessful, user space also sets the
 "error" field to "1".
 
+::
+
+
+		struct kvm_xen_exit {
+  #define KVM_EXIT_XEN_HCALL          1
+			__u32 type;
+			union {
+				struct {
+					__u32 longmode;
+					__u32 cpl;
+					__u64 input;
+					__u64 result;
+					__u64 params[6];
+				} hcall;
+			} u;
+		};
+		/* KVM_EXIT_XEN */
+                struct kvm_hyperv_exit xen;
+
+Indicates that the VCPU exits into userspace to process some tasks
+related to Xen emulation.
+
+Valid values for 'type' are:
+
+  - KVM_EXIT_XEN_HCALL -- synchronously notify user-space about Xen hypercall.
+    Userspace is expected to place the hypercall result into the appropriate
+    field before invoking KVM_RUN again.
+
 ::
 
 		/* Fix the size of the union. */
@@ -6414,7 +6514,6 @@  guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
 (0x40000001). Otherwise, a guest may use the paravirtual features
 regardless of what has actually been exposed through the CPUID leaf.
 
-
 8.29 KVM_CAP_DIRTY_LOG_RING
 ---------------------------
 
@@ -6501,3 +6600,26 @@  KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
 KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
 machine will switch to ring-buffer dirty page tracking and further
 KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
+
+8.30 KVM_CAP_XEN_HVM
+--------------------
+
+:Architectures: x86
+
+This capability indicates the features that KVM supports for hosting Xen
+PVHVM guests. Valid flags are::
+
+  #define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR	(1 << 0)
+  #define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL	(1 << 1)
+  #define KVM_XEN_HVM_CONFIG_SHARED_INFO	(1 << 2)
+
+The KVM_XEN_HVM_CONFIG_HYPERCALL_MSR flag indicates that the KVM_XEN_HVM_CONFIG
+ioctl is available, for the guest to set its hypercall page.
+
+If KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL is also set, the same flag may also be
+provided in the flags to KVM_XEN_HVM_CONFIG, without providing hypercall page
+contents, to request that KVM generate hypercall page content automatically
+and also enable interception of guest hypercalls with KVM_EXIT_XEN.
+
+The KVM_XEN_HVM_CONFIG_SHARED_INFO flag indicates the availability of the
+KVM_XEN_HVM_SET_ATTR and KVM_XEN_HVM_GET_ATTR ioctls.