diff mbox

[3/3] KVM: Enable MSI-X for KVM assigned device

Message ID 1234339731-3195-4-git-send-email-sheng@linux.intel.com (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Sheng Yang Feb. 11, 2009, 8:08 a.m. UTC
This patch finally enable MSI-X.

What we need for MSI-X:
1. Intercept one page in MMIO region of device. So that we can get guest desired
MSI-X table and set up the real one. Now this have been done by guest, and
transfer to kernel using ioctl KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY.

2. Information for incoming interrupt. Now one device can have more than one
interrupt, and they are all handled by one workqueue structure. So we need to
identify them. The previous patch enable gsi_msg_pending_bitmap get this done.

3. Mapping from host IRQ to guest gsi as well as guest gsi to real MSI/MSI-X
message address/data. We used same entry number for the host and guest here, so
that it's easy to find the correlated guest gsi.

What we lack for now:
1. The PCI spec said nothing can existed with MSI-X table in the same page of
MMIO region, except pending bits. The patch ignore pending bits as the first
step (so they are always 0 - no pending).

2. The PCI spec allowed to change MSI-X table dynamically. That means, the OS
can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The patch
didn't support this, and Linux also don't work in this way.

3. The patch didn't implement MSI-X mask all and mask single entry. I would
implement the former in driver/pci/msi.c later. And for single entry, userspace
should have reposibility to handle it.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 include/linux/kvm.h |    8 ++++
 virt/kvm/kvm_main.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 109 insertions(+), 6 deletions(-)

Comments

Avi Kivity Feb. 11, 2009, 12:48 p.m. UTC | #1
Sheng Yang wrote:
> This patch finally enable MSI-X.
>
> What we need for MSI-X:
> 1. Intercept one page in MMIO region of device. So that we can get guest desired
> MSI-X table and set up the real one. Now this have been done by guest, and
> transfer to kernel using ioctl KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY.
>
>   

Looking at the patch, you're doing that in userspace.  Good.

> 2. The PCI spec allowed to change MSI-X table dynamically. That means, the OS
> can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The patch
> didn't support this, and Linux also don't work in this way.
>   

What about Windows?
Sheng Yang Feb. 12, 2009, 6:03 a.m. UTC | #2
On Wednesday 11 February 2009 20:48:55 Avi Kivity wrote:
> Sheng Yang wrote:
> > This patch finally enable MSI-X.
> >
> > What we need for MSI-X:
> > 1. Intercept one page in MMIO region of device. So that we can get guest
> > desired MSI-X table and set up the real one. Now this have been done by
> > guest, and transfer to kernel using ioctl KVM_SET_MSIX_NR and
> > KVM_SET_MSIX_ENTRY.
>
> Looking at the patch, you're doing that in userspace.  Good.
>
> > 2. The PCI spec allowed to change MSI-X table dynamically. That means,
> > the OS can enable MSI-X, then mask one MSI-X entry, modify it, and unmask
> > it. The patch didn't support this, and Linux also don't work in this way.
>
> What about Windows?

Not sure. Don't have Windows driver to test... And only Vista and later 
support MSI/MSI-X, not sure it would be so aggressive... We can add some debug 
if such condition happened.
Marcelo Tosatti Feb. 12, 2009, 8:40 p.m. UTC | #3
On Wed, Feb 11, 2009 at 04:08:51PM +0800, Sheng Yang wrote:
> This patch finally enable MSI-X.
> 
> What we need for MSI-X:
> 1. Intercept one page in MMIO region of device. So that we can get guest desired
> MSI-X table and set up the real one. Now this have been done by guest, and
> transfer to kernel using ioctl KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY.
> 
> 2. Information for incoming interrupt. Now one device can have more than one
> interrupt, and they are all handled by one workqueue structure. So we need to
> identify them. The previous patch enable gsi_msg_pending_bitmap get this done.
> 
> 3. Mapping from host IRQ to guest gsi as well as guest gsi to real MSI/MSI-X
> message address/data. We used same entry number for the host and guest here, so
> that it's easy to find the correlated guest gsi.
> 
> What we lack for now:
> 1. The PCI spec said nothing can existed with MSI-X table in the same page of
> MMIO region, except pending bits. The patch ignore pending bits as the first
> step (so they are always 0 - no pending).
> 
> 2. The PCI spec allowed to change MSI-X table dynamically. That means, the OS
> can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The patch
> didn't support this, and Linux also don't work in this way.

Have you thought about supporting this with the current ioctl interface?
Point is that once ioctl's are in and userspace code uses it, you can't
change. So:

- If userspace calls kvm_vm_ioctl_set_msix_entry twice don't you leak
  memory (that is a bug actually unless I'm mistaken)? So in the future
  kvm_vm_ioctl_set_msix_entry will deallocate the currently allocated
  guest/host entries and replace with userspace supplied entries?

- interrupt context can read the table while kvm_vm_ioctl_set_msix_entry
  is modifying it. So you either need to forbid more than one
  kvm_vm_ioctl_set_msix_entry call in the lifetime of a guest (which
  you can later allow when you support MSI table change), or handle 
  accesses from multiple contexes now. It seems forbidding is enough for
  the moment from what you said.

But in general looks good to me (not familiar with the internals of
pci / msi).
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sheng Yang Feb. 13, 2009, 5:29 a.m. UTC | #4
On Friday 13 February 2009 04:40:05 Marcelo Tosatti wrote:
> On Wed, Feb 11, 2009 at 04:08:51PM +0800, Sheng Yang wrote:
> > This patch finally enable MSI-X.
> >
> > What we need for MSI-X:
> > 1. Intercept one page in MMIO region of device. So that we can get guest
> > desired MSI-X table and set up the real one. Now this have been done by
> > guest, and transfer to kernel using ioctl KVM_SET_MSIX_NR and
> > KVM_SET_MSIX_ENTRY.
> >
> > 2. Information for incoming interrupt. Now one device can have more than
> > one interrupt, and they are all handled by one workqueue structure. So we
> > need to identify them. The previous patch enable gsi_msg_pending_bitmap
> > get this done.
> >
> > 3. Mapping from host IRQ to guest gsi as well as guest gsi to real
> > MSI/MSI-X message address/data. We used same entry number for the host
> > and guest here, so that it's easy to find the correlated guest gsi.
> >
> > What we lack for now:
> > 1. The PCI spec said nothing can existed with MSI-X table in the same
> > page of MMIO region, except pending bits. The patch ignore pending bits
> > as the first step (so they are always 0 - no pending).
> >
> > 2. The PCI spec allowed to change MSI-X table dynamically. That means,
> > the OS can enable MSI-X, then mask one MSI-X entry, modify it, and unmask
> > it. The patch didn't support this, and Linux also don't work in this way.
>
> Have you thought about supporting this with the current ioctl interface?
> Point is that once ioctl's are in and userspace code uses it, you can't
> change. So:
>
> - If userspace calls kvm_vm_ioctl_set_msix_entry twice don't you leak
>   memory (that is a bug actually unless I'm mistaken)? So in the future
>   kvm_vm_ioctl_set_msix_entry will deallocate the currently allocated
>   guest/host entries and replace with userspace supplied entries?
>

The allocation only happen once, at the second time it would report error in 
current code. But allocate/deallocate is also acceptable for future.

> - interrupt context can read the table while kvm_vm_ioctl_set_msix_entry
>   is modifying it. So you either need to forbid more than one
>   kvm_vm_ioctl_set_msix_entry call in the lifetime of a guest (which
>   you can later allow when you support MSI table change), or handle
>   accesses from multiple contexes now. It seems forbidding is enough for
>   the moment from what you said.

Yeah.

But for the modifying the MSI-X table, the most critical problem is, current 
Linux didn't support it IIRC. So I have to disable MSI-X then enable it again 
with new table, and it would result in lost interrupt.

So seems the most reasonable method is to modify pci_enable_msix() and related 
function's action to support this...
Marcelo Tosatti Feb. 13, 2009, 5:13 p.m. UTC | #5
On Fri, Feb 13, 2009 at 01:29:42PM +0800, Sheng Yang wrote:
> The allocation only happen once, at the second time it would report error in 
> current code. But allocate/deallocate is also acceptable for future.

Oops, it OK. 

> > - interrupt context can read the table while kvm_vm_ioctl_set_msix_entry
> >   is modifying it. So you either need to forbid more than one
> >   kvm_vm_ioctl_set_msix_entry call in the lifetime of a guest (which
> >   you can later allow when you support MSI table change), or handle
> >   accesses from multiple contexes now. It seems forbidding is enough for
> >   the moment from what you said.
> 
> Yeah.
> 
> But for the modifying the MSI-X table, the most critical problem is, current 
> Linux didn't support it IIRC. So I have to disable MSI-X then enable it again 
> with new table, and it would result in lost interrupt.
> 
> So seems the most reasonable method is to modify pci_enable_msix() and related 
> function's action to support this...

Alright, then just reply "[PATCH 1/3] KVM: Ioctls for init MSI-X entry"
with the minor comments reworked so Avi can apply.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 5200768..fcd08da 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -439,6 +439,9 @@  struct kvm_irq_routing {
 };
 
 #endif
+#if defined(CONFIG_X86)
+#define KVM_CAP_DEVICE_MSIX 26
+#endif
 
 /*
  * ioctls for VM fds
@@ -596,6 +599,11 @@  struct kvm_assigned_irq {
 #define KVM_DEV_IRQ_ASSIGN_MSI_ACTION	KVM_DEV_IRQ_ASSIGN_ENABLE_MSI
 #define KVM_DEV_IRQ_ASSIGN_ENABLE_MSI	(1 << 0)
 
+#define KVM_DEV_IRQ_ASSIGN_MSIX_ACTION  (KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX |\
+					KVM_DEV_IRQ_ASSIGN_MASK_MSIX)
+#define KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX  (1 << 1)
+#define KVM_DEV_IRQ_ASSIGN_MASK_MSIX    (1 << 2)
+
 struct kvm_assigned_msix_nr {
 	__u32 assigned_dev_id;
 	__u16 entry_nr;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 961603f..c774027 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -281,13 +281,33 @@  static void kvm_free_assigned_irq(struct kvm *kvm,
 	 * now, the kvm state is still legal for probably we also have to wait
 	 * interrupt_work done.
 	 */
-	disable_irq_nosync(assigned_dev->host_irq);
-	cancel_work_sync(&assigned_dev->interrupt_work);
+	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_MSIX) {
+		int i;
+		for (i = 0; i < assigned_dev->entries_nr; i++)
+			disable_irq_nosync(assigned_dev->
+					   host_msix_entries[i].vector);
+
+		cancel_work_sync(&assigned_dev->interrupt_work);
+
+		for (i = 0; i < assigned_dev->entries_nr; i++)
+			free_irq(assigned_dev->host_msix_entries[i].vector,
+				 (void *)assigned_dev);
+
+		assigned_dev->entries_nr = 0;
+		kfree(assigned_dev->host_msix_entries);
+		kfree(assigned_dev->guest_msix_entries);
+		pci_disable_msix(assigned_dev->dev);
+	} else {
+		/* Deal with MSI and INTx */
+		disable_irq_nosync(assigned_dev->host_irq);
+		cancel_work_sync(&assigned_dev->interrupt_work);
 
-	free_irq(assigned_dev->host_irq, (void *)assigned_dev);
+		free_irq(assigned_dev->host_irq, (void *)assigned_dev);
 
-	if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI)
-		pci_disable_msi(assigned_dev->dev);
+		if (assigned_dev->irq_requested_type &
+				KVM_ASSIGNED_DEV_HOST_MSI)
+			pci_disable_msi(assigned_dev->dev);
+	}
 
 	assigned_dev->irq_requested_type = 0;
 }
@@ -416,6 +436,69 @@  static int assigned_device_update_msi(struct kvm *kvm,
 	adev->irq_requested_type |= KVM_ASSIGNED_DEV_HOST_MSI;
 	return 0;
 }
+
+static int assigned_device_update_msix(struct kvm *kvm,
+			struct kvm_assigned_dev_kernel *adev,
+			struct kvm_assigned_irq *airq)
+{
+	/* TODO Deal with KVM_DEV_IRQ_ASSIGNED_MASK_MSIX */
+	int i, r;
+
+	adev->ack_notifier.gsi = -1;
+
+	if (irqchip_in_kernel(kvm)) {
+		if (airq->flags & KVM_DEV_IRQ_ASSIGN_MASK_MSIX) {
+			printk(KERN_WARNING
+			       "kvm: unsupported mask MSI-X, flags 0x%x!\n",
+			       airq->flags);
+			return 0;
+		}
+
+		if (!(airq->flags & KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX)) {
+			/* Guest disable MSI-X */
+			kvm_free_assigned_irq(kvm, adev);
+			if (msi2intx) {
+				pci_enable_msi(adev->dev);
+				if (adev->dev->msi_enabled)
+					return assigned_device_update_msi(kvm,
+							adev, airq);
+			}
+			return assigned_device_update_intx(kvm, adev, airq);
+		}
+
+		/* host_msix_entries and guest_msix_entries should have been
+		 * initialized */
+
+		if (adev->entries_nr == 0) {
+			printk(KERN_ERR
+			       "kvm: MSI-X entry not initialized!\n");
+			return -EFAULT;
+		}
+
+		kvm_free_assigned_irq(kvm, adev);
+
+		r = pci_enable_msix(adev->dev, adev->host_msix_entries,
+				    adev->entries_nr);
+		if (r) {
+			printk(KERN_ERR "Fail to enable MSI-X feature!\n");
+			return r;
+		}
+
+		for (i = 0; i < adev->entries_nr; i++) {
+			r = request_irq((adev->host_msix_entries + i)->vector,
+					kvm_assigned_dev_intr, 0,
+					"kvm_assigned_msix_device",
+					(void *)adev);
+			if (r)
+				return r;
+		}
+	}
+
+	adev->irq_requested_type |= KVM_ASSIGNED_DEV_MSIX;
+
+	return 0;
+}
+
 #endif
 
 static int kvm_vm_ioctl_assign_irq(struct kvm *kvm,
@@ -462,12 +545,24 @@  static int kvm_vm_ioctl_assign_irq(struct kvm *kvm,
 		}
 	}
 
-	if ((match->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI) &&
+	if (match->irq_requested_type & KVM_ASSIGNED_DEV_MSIX)
+		current_flags |= KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX;
+	else if ((match->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI) &&
 		 (match->irq_requested_type & KVM_ASSIGNED_DEV_GUEST_MSI))
 		current_flags |= KVM_DEV_IRQ_ASSIGN_ENABLE_MSI;
 
 	changed_flags = assigned_irq->flags ^ current_flags;
 
+#ifdef CONFIG_X86
+	if (changed_flags & KVM_DEV_IRQ_ASSIGN_MSIX_ACTION) {
+		r = assigned_device_update_msix(kvm, match, assigned_irq);
+		if (r) {
+			printk(KERN_WARNING "kvm: failed to execute "
+					"MSI-X action!\n");
+			goto out_release;
+		}
+	} else
+#endif
 	if ((changed_flags & KVM_DEV_IRQ_ASSIGN_MSI_ACTION) ||
 	    (msi2intx && match->dev->msi_enabled)) {
 #ifdef CONFIG_X86