diff mbox

KVM PCI device assignment issues

Message ID 20090213182245.GB9898@x200.localdomain (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wright Feb. 13, 2009, 6:22 p.m. UTC
* Matthew Wilcox (matthew@wil.cx) wrote:
> On Fri, Feb 13, 2009 at 04:32:47PM +0000, Mark McLoughlin wrote:
> >   - Conventional PCI devices (i.e. PCI/PCI-X, not PCIe) behind the same 
> >     bridge must be assigned to the same VT-d domain - i.e given device 
> >     A (0000:0f:1.0) and device B (and 0000:0f:2.0), if you assign 
> >     device A to guest, you cannot then use device B in the host or 
> >     another guest.
> 
> Is that a limitation of the VT-d / IOMMU setup?

Yes.  The source id will essentially show up as the bridge.

> >   - Some newer PCIe devices (and newer conventional PCI devices too via 
> >     PCI Advanced Features) support Function Level Reset (FLR). This 
> >     allows a PCI function to be reset without affecting any other 
> >     functions on that device, or any other devices. This feature is not 
> >     widespread yet AFAIK - e.g. I've seen it on an audio controller, 
> >     and it must also be supported by SR-IOV devices.
> 
> Yes, that's definitely not very widespread yet.  OTOH, we don't need to
> worry about disturbing other functions if all devices behind the same
> bridge have to be mapped to the same guest.

FLR (when it exists) would work fine for devices not behind a conventional
pci bridge.

> > Driver Unbinding
> > ================
> > 
> > Before a device is assigned to a guest, we should make sure that no host
> > device driver is currently bound to the device.
> > 
> > We can do that with e.g.
> > 
> >  $> echo -n "8086 10de"  > /sys/bus/pci/drivers/pci-stub/new_id
> >  $> echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
> >  $> echo -n 0000:00:19.0 > /sys/bus/pci/drivers/pci-stub/bind
> > 
> > One minor problem with this scheme is that at this point you can't
> > unbind from pci-stub and trigger a re-probe and have e1000e bind to it.
> > In order to support that, we need a "remove_id" interface to remove the
> > dynamic ID.
> 
> It sounds like you'd be OK with a 'remove_id' interface that only
> removes subsequently-added interfaces.
> 
> I might suggest a second approach which would be to have an explicit
> echo to the bind file ignore the list of ids.  Then you wouldn't need to
> 'echo -n "8086 10de"' to begin with.

I tried that first, and it dips into the driver logic, where it wants
to filter via ->match.  Untested patch below _should_ be enough to avoid
adding the id to begin with.

> > Furthermore, it should be possible to do this without actually affecting
> > any of the devices - i.e. a "try to unbind and see if we oops" approach
> > clearly isn't great.
> 
> Well, yes.  I'd even be upset if my network or storage flickered away
> briefly while another using was starting to run KVM.
> 
> > This last constraint is the most difficult and points to the logic
> > needing to be in userland management libraries. Possibly the only sane
> > kernel space support would be "try to unbind and reset; if it works then
> > the device is assignable".
> 
> If we expose a 'reset' file in the /sys/bus/pci/devices/*/ directories
> for devices that are resettable, that should be enough, I would think.

Yes, currently it's only internal and it's not robust given the reset
constraints.

thanks,
-chris
--

 drivers/base/base.h |    1 +
 drivers/base/bus.c  |    2 +-
 drivers/base/dd.c   |   15 +++++++++++++--
 3 files changed, 15 insertions(+), 3 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Chris Wright Feb. 13, 2009, 7:47 p.m. UTC | #1
* Chris Wright (chrisw@redhat.com) wrote:
> * Matthew Wilcox (matthew@wil.cx) wrote:
> > I might suggest a second approach which would be to have an explicit
> > echo to the bind file ignore the list of ids.  Then you wouldn't need to
> > 'echo -n "8086 10de"' to begin with.
> 
> I tried that first, and it dips into the driver logic, where it wants
> to filter via ->match.  Untested patch below _should_ be enough to avoid
> adding the id to begin with.

OK, after making it actually compile.  Still gets trapped into generic
logic, this time in pci core.  I'm starting to remember why dynid looked
like the better option.

pci_device_probe
  __pci_device_probe
    pci_match_device()  <-- fails
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 0a5f055..60dc346 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -86,6 +86,7 @@  extern void bus_remove_driver(struct device_driver *drv);
 
 extern void driver_detach(struct device_driver *drv);
 extern int driver_probe_device(struct device_driver *drv, struct device *dev);
+extern int driver_bind_probe_device(struct device_driver *drv, struct device *dev);
 
 extern void sysdev_shutdown(void);
 extern int sysdev_suspend(pm_message_t state);
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 83f32b8..ad28338 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -202,7 +202,7 @@  static ssize_t driver_bind(struct device_driver *drv,
 		if (dev->parent)	/* Needed for USB */
 			down(&dev->parent->sem);
 		down(&dev->sem);
-		err = driver_probe_device(drv, dev);
+		err = driver_bind_probe_device(drv, dev);
 		up(&dev->sem);
 		if (dev->parent)
 			up(&dev->parent->sem);
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 315bed8..fba6463 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -184,13 +184,14 @@  int driver_probe_done(void)
  * This function must be called with @dev->sem held.  When called for a
  * USB interface, @dev->parent->sem must be held as well.
  */
-int driver_probe_device(struct device_driver *drv, struct device *dev)
+static int __driver_probe_device(struct device_driver *drv, struct device *dev,
+				 bool force)
 {
 	int ret = 0;
 
 	if (!device_is_registered(dev))
 		return -ENODEV;
-	if (drv->bus->match && !drv->bus->match(dev, drv))
+	if (!force && drv->bus->match && !drv->bus->match(dev, drv))
 		goto done;
 
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
@@ -202,6 +203,16 @@  done:
 	return ret;
 }
 
+int driver_probe_bind_device(struct device_driver *drv, struct device *dev)
+{
+	return __driver_probe_device(drv, dev, 1);
+}
+
+int driver_probe_device(struct device_driver *drv, struct device *dev)
+{
+	return __driver_probe_device(drv, dev, 0);
+}
+
 static int __device_attach(struct device_driver *drv, void *data)
 {
 	struct device *dev = data;