diff mbox

[v8,2/2] kvm: x86: hyperv: guest->host event signaling via eventfd

Message ID 20180131135633.11228-3-rkagan@virtuozzo.com (mailing list archive)
State New, archived
Headers show

Commit Message

Roman Kagan Jan. 31, 2018, 1:56 p.m. UTC
In Hyper-V, the fast guest->host notification mechanism is the
SIGNAL_EVENT hypercall, with a single parameter of the connection ID to
signal.

Currently this hypercall incurs a user exit and requires the userspace
to decode the parameters and trigger the notification of the potentially
different I/O context.

To avoid the costly user exit, process this hypercall and signal the
corresponding eventfd in KVM, similar to ioeventfd.  The association
between the connection id and the eventfd is established via the newly
introduced KVM_HYPERV_EVENTFD ioctl, and maintained in an
(srcu-protected) IDR.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
v7 -> v8:
 - rebase to latest master (adjust ioctl and doc section numbers)

v6 -> v7:
 - reject non-zero flag number as invalid
 - adjust error returns to better match the spec
 [these changes didn't look critical so I retained David's r-b]

v5 -> v6:
 - drop unnecessary srcu_read_lock/unlock, and clean up after that

v4 -> v5:
 - fix block comment formatting

v3 -> v4:
 - switch to kvm_vcpu_read_guest and take srcu_read_lock around it
 - rework and document the interpretation of the hypercall parameter
 - merge !fast version into kvm_hvcall_signal_event for brevity

v2 -> v3:
 - expand docs on allowed values and return codes
 - fix uninitialized return
 - style fixes

v1 -> v2:
 - make data types consistent
 - get by without the recently dropped struct hv_input_signal_event
 - fix subject prefix

 Documentation/virtual/kvm/api.txt  |  31 +++++++++++
 arch/x86/include/asm/kvm_host.h    |   2 +
 arch/x86/include/uapi/asm/hyperv.h |   2 +
 arch/x86/kvm/hyperv.h              |   1 +
 include/uapi/linux/kvm.h           |  13 +++++
 arch/x86/kvm/hyperv.c              | 103 ++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c                 |  10 ++++
 7 files changed, 161 insertions(+), 1 deletion(-)

Comments

Radim Krčmář Jan. 31, 2018, 6:18 p.m. UTC | #1
2018-01-31 16:56+0300, Roman Kagan:
> In Hyper-V, the fast guest->host notification mechanism is the
> SIGNAL_EVENT hypercall, with a single parameter of the connection ID to
> signal.
> 
> Currently this hypercall incurs a user exit and requires the userspace
> to decode the parameters and trigger the notification of the potentially
> different I/O context.
> 
> To avoid the costly user exit, process this hypercall and signal the
> corresponding eventfd in KVM, similar to ioeventfd.  The association
> between the connection id and the eventfd is established via the newly
> introduced KVM_HYPERV_EVENTFD ioctl, and maintained in an
> (srcu-protected) IDR.
> 
> Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> ---
> v7 -> v8:
>  - rebase to latest master (adjust ioctl and doc section numbers)
> 
> v6 -> v7:
>  - reject non-zero flag number as invalid
>  - adjust error returns to better match the spec
>  [these changes didn't look critical so I retained David's r-b]
> 
> v5 -> v6:
>  - drop unnecessary srcu_read_lock/unlock, and clean up after that
> 
> v4 -> v5:
>  - fix block comment formatting
> 
> v3 -> v4:
>  - switch to kvm_vcpu_read_guest and take srcu_read_lock around it
>  - rework and document the interpretation of the hypercall parameter
>  - merge !fast version into kvm_hvcall_signal_event for brevity
> 
> v2 -> v3:
>  - expand docs on allowed values and return codes
>  - fix uninitialized return
>  - style fixes
> 
> v1 -> v2:
>  - make data types consistent
>  - get by without the recently dropped struct hv_input_signal_event
>  - fix subject prefix
> 
>  Documentation/virtual/kvm/api.txt  |  31 +++++++++++
>  arch/x86/include/asm/kvm_host.h    |   2 +
>  arch/x86/include/uapi/asm/hyperv.h |   2 +
>  arch/x86/kvm/hyperv.h              |   1 +
>  include/uapi/linux/kvm.h           |  13 +++++
>  arch/x86/kvm/hyperv.c              | 103 ++++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/x86.c                 |  10 ++++
>  7 files changed, 161 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> @@ -1226,6 +1227,43 @@ static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> +static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
> +{
> +	struct eventfd_ctx *eventfd;
> +
> +	if (unlikely(!fast)) {
> +		int ret;
> +		gpa_t gpa = param;
> +
> +		if ((gpa & (__alignof__(param) - 1)) ||
> +		    offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
> +			return HV_STATUS_INVALID_ALIGNMENT;
> +
> +		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
> +		if (ret < 0)
> +			return HV_STATUS_INVALID_ALIGNMENT;
> +	}

I have completely missed this series in the past, sorry.

It looks good, but I don't follow this:

> +
> +	/*
> +	 * Per spec, bits 32-47 contain the extra "flag number".  However, we
> +	 * have no use for it, and in all known usecases it is zero, so just
> +	 * require it here.
> +	 */

The spec says:

  FlagNumber specifies the relative index of the event flag that the
  caller wants to set within the target SIEF area. This number is
  relative to the base flag number associated with the port.

And then:

  11.5.1 Event Flag Delivery
  When a partition calls HvSignalEvent, it specifies an event flag
  number. The hypervisor responds by atomically setting a bit within the
  receiving virtual processor’s SIEF page. (See section 11.9 for a
  detailed description of the SIEF page.) Virtual processors whose SynIC
  or SIEF page is disabled will not be considered as potential targets.
  If no targets are available, the hypervisor terminates the operation
  and returns an error to the caller.

  If the event flag was previously cleared, the hypervisor attempts to
  notify the receiving partition that the flag is now set by generating
  an edge-triggered interrupt. The target virtual processor, along with
  the target SINTx, is specified as part of a port’s creation. (See the
  following for information about ports.) If the SINTx is masked,
  HvSignalEvent returns HV_STATUS_INVALID_SYNIC_STATE.

  As with any fixed-priority external interrupt, the interrupt is not
  acknowledged by the virtual processor until the process priority
  register (PPR) is less than the vector specified in the SINTx register
  and interrupts are not masked by the virtual processor (rFLAGS[IF] is
  set to 1).

Don't we want to pass the flag into userspace?

Thanks.

> +	if (param & 0xffff00000000)
> +		return HV_STATUS_INVALID_PARAMETER;
> +	/* remaining bits are reserved-zero */
> +	if (param & ~KVM_HYPERV_CONN_ID_MASK)
> +		return HV_STATUS_INVALID_HYPERCALL_INPUT;
> +
> +	/* conn_to_evt is protected by vcpu->kvm->srcu */
> +	eventfd = idr_find(&vcpu->kvm->arch.hyperv.conn_to_evt, param);
> +	if (!eventfd)
> +		return HV_STATUS_INVALID_PORT_ID;
> +
> +	eventfd_signal(eventfd, 1);
> +	return HV_STATUS_SUCCESS;
> +}
> +
Roman Kagan Feb. 1, 2018, 12:45 p.m. UTC | #2
On Wed, Jan 31, 2018 at 07:18:12PM +0100, Radim Krčmář wrote:
> 2018-01-31 16:56+0300, Roman Kagan:
> I have completely missed this series in the past, sorry.

No wonder, everybody was too busy with all this Meltdown+Spectre
stuff.

> > +	/*
> > +	 * Per spec, bits 32-47 contain the extra "flag number".  However, we
> > +	 * have no use for it, and in all known usecases it is zero, so just
> > +	 * require it here.
> > +	 */
> 
> The spec says:
> 
>   FlagNumber specifies the relative index of the event flag that the
>   caller wants to set within the target SIEF area. This number is
>   relative to the base flag number associated with the port.

This smells like an overdesign to me.  Anyway we've never seen non-zero
FlagNumbers.  The guest code in Linux doesn't even define a way to pass
such a FlagNumber.

> Don't we want to pass the flag into userspace?

To be future-proof in case there turns out to be a user for non-zero
FlagNumber?  Makes sense, will respin with that.

Thanks,
Roman.
Radim Krčmář Feb. 1, 2018, 1:20 p.m. UTC | #3
2018-02-01 15:45+0300, Roman Kagan:
> On Wed, Jan 31, 2018 at 07:18:12PM +0100, Radim Krčmář wrote:
> > 2018-01-31 16:56+0300, Roman Kagan:
> > I have completely missed this series in the past, sorry.
> 
> No wonder, everybody was too busy with all this Meltdown+Spectre
> stuff.
> 
> > > +	/*
> > > +	 * Per spec, bits 32-47 contain the extra "flag number".  However, we
> > > +	 * have no use for it, and in all known usecases it is zero, so just
> > > +	 * require it here.
> > > +	 */
> > 
> > The spec says:
> > 
> >   FlagNumber specifies the relative index of the event flag that the
> >   caller wants to set within the target SIEF area. This number is
> >   relative to the base flag number associated with the port.
> 
> This smells like an overdesign to me.  Anyway we've never seen non-zero
> FlagNumbers.  The guest code in Linux doesn't even define a way to pass
> such a FlagNumber.
> 
> > Don't we want to pass the flag into userspace?
> 
> To be future-proof in case there turns out to be a user for non-zero
> FlagNumber?  Makes sense, will respin with that.

Exactly, thanks!
diff mbox

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index fc3ae951bc07..3a3fbcc3377d 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3449,6 +3449,37 @@  array bounds check and the array access.
 These fields use the same bit definitions as the new
 H_GET_CPU_CHARACTERISTICS hypercall.
 
+4.110 KVM_HYPERV_EVENTFD
+
+Capability: KVM_CAP_HYPERV_EVENTFD
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_hyperv_eventfd (in)
+
+This ioctl (un)registers an eventfd to receive notifications from the guest on
+the specified Hyper-V connection id through the SIGNAL_EVENT hypercall, without
+causing a user exit.
+
+struct kvm_hyperv_eventfd {
+	__u32 conn_id;
+	__s32 fd;
+	__u32 flags;
+	__u32 padding[3];
+};
+
+The conn_id field should fit within 24 bits:
+
+#define KVM_HYPERV_CONN_ID_MASK		0x00ffffff
+
+The acceptable values for the flags field are:
+
+#define KVM_HYPERV_EVENTFD_DEASSIGN	(1 << 0)
+
+Returns: 0 on success,
+	-EINVAL if conn_id or flags is outside the allowed range
+	-ENOENT on deassign if the conn_id isn't registered
+	-EEXIST on assign if the conn_id is already registered
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 516798431328..6a9914752a84 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -752,6 +752,8 @@  struct kvm_hv {
 	u64 hv_crash_ctl;
 
 	HV_REFERENCE_TSC_PAGE tsc_ref;
+
+	struct idr conn_to_evt;
 };
 
 enum kvm_irqchip_mode {
diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index 1a5bfead93b4..f9ed479d479c 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -276,7 +276,9 @@  enum HV_GENERIC_SET_FORMAT {
 #define HV_STATUS_INVALID_HYPERCALL_CODE	2
 #define HV_STATUS_INVALID_HYPERCALL_INPUT	3
 #define HV_STATUS_INVALID_ALIGNMENT		4
+#define HV_STATUS_INVALID_PARAMETER		5
 #define HV_STATUS_INSUFFICIENT_MEMORY		11
+#define HV_STATUS_INVALID_PORT_ID		17
 #define HV_STATUS_INVALID_CONNECTION_ID		18
 #define HV_STATUS_INSUFFICIENT_BUFFERS		19
 
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index cc2468244ca2..837465d69c6d 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -90,5 +90,6 @@  void kvm_hv_setup_tsc_page(struct kvm *kvm,
 
 void kvm_hv_init_vm(struct kvm *kvm);
 void kvm_hv_destroy_vm(struct kvm *kvm);
+int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
 
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 8fb90a0819c3..38d29ea65c64 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -934,6 +934,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_AIS_MIGRATION 150
 #define KVM_CAP_PPC_GET_CPU_CHAR 151
 #define KVM_CAP_S390_BPB 152
+#define KVM_CAP_HYPERV_EVENTFD 153
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1363,6 +1364,8 @@  struct kvm_s390_ucas_mapping {
 #define KVM_S390_GET_CMMA_BITS      _IOWR(KVMIO, 0xb8, struct kvm_s390_cmma_log)
 #define KVM_S390_SET_CMMA_BITS      _IOW(KVMIO, 0xb9, struct kvm_s390_cmma_log)
 
+#define KVM_HYPERV_EVENTFD	_IOW(KVMIO,  0xba, struct kvm_hyperv_eventfd)
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
@@ -1423,4 +1426,14 @@  struct kvm_assigned_msix_entry {
 #define KVM_ARM_DEV_EL1_PTIMER		(1 << 1)
 #define KVM_ARM_DEV_PMU			(1 << 2)
 
+struct kvm_hyperv_eventfd {
+	__u32 conn_id;
+	__s32 fd;
+	__u32 flags;
+	__u32 padding[3];
+};
+
+#define KVM_HYPERV_CONN_ID_MASK		0x00ffffff
+#define KVM_HYPERV_EVENTFD_DEASSIGN	(1 << 0)
+
 #endif /* __LINUX_KVM_H */
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 015fb06c7522..7c283c404238 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -29,6 +29,7 @@ 
 #include <linux/kvm_host.h>
 #include <linux/highmem.h>
 #include <linux/sched/cputime.h>
+#include <linux/eventfd.h>
 
 #include <asm/apicdef.h>
 #include <trace/events/kvm.h>
@@ -1226,6 +1227,43 @@  static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, bool fast, u64 param)
+{
+	struct eventfd_ctx *eventfd;
+
+	if (unlikely(!fast)) {
+		int ret;
+		gpa_t gpa = param;
+
+		if ((gpa & (__alignof__(param) - 1)) ||
+		    offset_in_page(gpa) + sizeof(param) > PAGE_SIZE)
+			return HV_STATUS_INVALID_ALIGNMENT;
+
+		ret = kvm_vcpu_read_guest(vcpu, gpa, &param, sizeof(param));
+		if (ret < 0)
+			return HV_STATUS_INVALID_ALIGNMENT;
+	}
+
+	/*
+	 * Per spec, bits 32-47 contain the extra "flag number".  However, we
+	 * have no use for it, and in all known usecases it is zero, so just
+	 * require it here.
+	 */
+	if (param & 0xffff00000000)
+		return HV_STATUS_INVALID_PARAMETER;
+	/* remaining bits are reserved-zero */
+	if (param & ~KVM_HYPERV_CONN_ID_MASK)
+		return HV_STATUS_INVALID_HYPERCALL_INPUT;
+
+	/* conn_to_evt is protected by vcpu->kvm->srcu */
+	eventfd = idr_find(&vcpu->kvm->arch.hyperv.conn_to_evt, param);
+	if (!eventfd)
+		return HV_STATUS_INVALID_PORT_ID;
+
+	eventfd_signal(eventfd, 1);
+	return HV_STATUS_SUCCESS;
+}
+
 int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 {
 	u64 param, ingpa, outgpa, ret;
@@ -1276,8 +1314,12 @@  int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	case HVCALL_NOTIFY_LONG_SPIN_WAIT:
 		kvm_vcpu_on_spin(vcpu, true);
 		break;
-	case HVCALL_POST_MESSAGE:
 	case HVCALL_SIGNAL_EVENT:
+		res = kvm_hvcall_signal_event(vcpu, fast, ingpa);
+		if (res != HV_STATUS_INVALID_PORT_ID)
+			break;
+		/* maybe userspace knows this conn_id: fall through */
+	case HVCALL_POST_MESSAGE:
 		/* don't bother userspace if it has no way to handle it */
 		if (!vcpu_to_synic(vcpu)->active) {
 			res = HV_STATUS_INVALID_HYPERCALL_CODE;
@@ -1305,8 +1347,67 @@  int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 void kvm_hv_init_vm(struct kvm *kvm)
 {
 	mutex_init(&kvm->arch.hyperv.hv_lock);
+	idr_init(&kvm->arch.hyperv.conn_to_evt);
 }
 
 void kvm_hv_destroy_vm(struct kvm *kvm)
 {
+	struct eventfd_ctx *eventfd;
+	int i;
+
+	idr_for_each_entry(&kvm->arch.hyperv.conn_to_evt, eventfd, i)
+		eventfd_ctx_put(eventfd);
+	idr_destroy(&kvm->arch.hyperv.conn_to_evt);
+}
+
+static int kvm_hv_eventfd_assign(struct kvm *kvm, u32 conn_id, int fd)
+{
+	struct kvm_hv *hv = &kvm->arch.hyperv;
+	struct eventfd_ctx *eventfd;
+	int ret;
+
+	eventfd = eventfd_ctx_fdget(fd);
+	if (IS_ERR(eventfd))
+		return PTR_ERR(eventfd);
+
+	mutex_lock(&hv->hv_lock);
+	ret = idr_alloc(&hv->conn_to_evt, eventfd, conn_id, conn_id + 1,
+			GFP_KERNEL);
+	mutex_unlock(&hv->hv_lock);
+
+	if (ret >= 0)
+		return 0;
+
+	if (ret == -ENOSPC)
+		ret = -EEXIST;
+	eventfd_ctx_put(eventfd);
+	return ret;
+}
+
+static int kvm_hv_eventfd_deassign(struct kvm *kvm, u32 conn_id)
+{
+	struct kvm_hv *hv = &kvm->arch.hyperv;
+	struct eventfd_ctx *eventfd;
+
+	mutex_lock(&hv->hv_lock);
+	eventfd = idr_remove(&hv->conn_to_evt, conn_id);
+	mutex_unlock(&hv->hv_lock);
+
+	if (!eventfd)
+		return -ENOENT;
+
+	synchronize_srcu(&kvm->srcu);
+	eventfd_ctx_put(eventfd);
+	return 0;
+}
+
+int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args)
+{
+	if ((args->flags & ~KVM_HYPERV_EVENTFD_DEASSIGN) ||
+	    (args->conn_id & ~KVM_HYPERV_CONN_ID_MASK))
+		return -EINVAL;
+
+	if (args->flags == KVM_HYPERV_EVENTFD_DEASSIGN)
+		return kvm_hv_eventfd_deassign(kvm, args->conn_id);
+	return kvm_hv_eventfd_assign(kvm, args->conn_id, args->fd);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f57635bd9e61..aa2707f2b396 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2701,6 +2701,7 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_HYPERV_SYNIC:
 	case KVM_CAP_HYPERV_SYNIC2:
 	case KVM_CAP_HYPERV_VP_INDEX:
+	case KVM_CAP_HYPERV_EVENTFD:
 	case KVM_CAP_PCI_SEGMENT:
 	case KVM_CAP_DEBUGREGS:
 	case KVM_CAP_X86_ROBUST_SINGLESTEP:
@@ -4295,6 +4296,15 @@  long kvm_arch_vm_ioctl(struct file *filp,
 		r = kvm_vm_ioctl_enable_cap(kvm, &cap);
 		break;
 	}
+	case KVM_HYPERV_EVENTFD: {
+		struct kvm_hyperv_eventfd hvevfd;
+
+		r = -EFAULT;
+		if (copy_from_user(&hvevfd, argp, sizeof(hvevfd)))
+			goto out;
+		r = kvm_vm_ioctl_hv_eventfd(kvm, &hvevfd);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}