diff mbox

[PATCHv4] uio: add generic driver for PCI 2.3 devices

Message ID 20090715201340.GA12279@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Michael S. Tsirkin July 15, 2009, 8:13 p.m. UTC
This adds a generic uio driver that can bind to any PCI device.  First
user will be virtualization where a qemu userspace process needs to give
guest OS access to the device.

Interrupts are handled using the Interrupt Disable bit in the PCI command
register and Interrupt Status bit in the PCI status register.  All devices
compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should
support these bits.  Driver detects this support, and won't bind to devices
which do not support the Interrupt Disable Bit in the command register.

It's expected that more features of interest to virtualization will be
added to this driver in the future. Possibilities are: mmap for device
resources, MSI/MSI-X, eventfd (to interface with kvm), iommu.

Acked-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Hans, Greg, please review and consider for upstream.

This is intended to solve the problem in virtualization that shared
interrupts do not work with assigned devices. Earlier versions of this
patch have circulated on kvm@vger.

Changes since v1:
- some naming changes
- do a single read to get both command and status register
Changes since v2:
- remove irqcontrol: user can enable interrupts by
  writing command register directly
- don't claim resources as we don't support mmap yet,
  but note the intent to do so in the commit log
Changes since v3:
- minor driver version fix

 MAINTAINERS                   |    8 ++
 drivers/uio/Kconfig           |   10 ++
 drivers/uio/Makefile          |    1 +
 drivers/uio/uio_pci_generic.c |  207 +++++++++++++++++++++++++++++++++++++++++
 include/linux/pci_regs.h      |    1 +
 5 files changed, 227 insertions(+), 0 deletions(-)
 create mode 100644 drivers/uio/uio_pci_generic.c

Comments

Hans J. Koch July 15, 2009, 9:39 p.m. UTC | #1
On Wed, Jul 15, 2009 at 11:13:40PM +0300, Michael S. Tsirkin wrote:
> This adds a generic uio driver that can bind to any PCI device.  First
> user will be virtualization where a qemu userspace process needs to give
> guest OS access to the device.
> 
> Interrupts are handled using the Interrupt Disable bit in the PCI command
> register and Interrupt Status bit in the PCI status register.  All devices
> compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should
> support these bits.  Driver detects this support, and won't bind to devices
> which do not support the Interrupt Disable Bit in the command register.
> 
> It's expected that more features of interest to virtualization will be
> added to this driver in the future. Possibilities are: mmap for device
> resources, MSI/MSI-X, eventfd (to interface with kvm), iommu.

Well, I'm not enough of a PCI expert to tell whether your 2.3-test works or
not (can it have side effects, e.g. trigger an interrupt when you toggle that
bit?). I've added Jesse Barnes to Cc: since you modify a PCI core header file.
If there are no objections from the PCI people, I guess we can take this.

Thanks,
Hans

> 
> Acked-by: Chris Wright <chrisw@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Signed-off-by: Hans J. Koch <hjk@linutronix.de>

> ---
> 
> Hans, Greg, please review and consider for upstream.
> 
> This is intended to solve the problem in virtualization that shared
> interrupts do not work with assigned devices. Earlier versions of this
> patch have circulated on kvm@vger.
> 
> Changes since v1:
> - some naming changes
> - do a single read to get both command and status register
> Changes since v2:
> - remove irqcontrol: user can enable interrupts by
>   writing command register directly
> - don't claim resources as we don't support mmap yet,
>   but note the intent to do so in the commit log
> Changes since v3:
> - minor driver version fix
> 
>  MAINTAINERS                   |    8 ++
>  drivers/uio/Kconfig           |   10 ++
>  drivers/uio/Makefile          |    1 +
>  drivers/uio/uio_pci_generic.c |  207 +++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci_regs.h      |    1 +
>  5 files changed, 227 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/uio/uio_pci_generic.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 18c3f0c..39c7207 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2538,6 +2538,14 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git
>  S:	Maintained
>  F:	include/asm-generic
>  
> +GENERIC UIO DRIVER FOR PCI DEVICES
> +P:	Michael S. Tsirkin
> +M:	mst@redhat.com
> +L:	kvm@vger.kernel.org
> +L:	linux-kernel@vger.kernel.org
> +S:	Supported
> +F:	drivers/uio/uio_pci_generic.c
> +
>  GFS2 FILE SYSTEM
>  P:	Steven Whitehouse
>  M:	swhiteho@redhat.com
> diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
> index 7f86534..0f14c8e 100644
> --- a/drivers/uio/Kconfig
> +++ b/drivers/uio/Kconfig
> @@ -89,4 +89,14 @@ config UIO_SERCOS3
>  
>  	  If you compile this as a module, it will be called uio_sercos3.
>  
> +config UIO_PCI_GENERIC
> +	tristate "Generic driver for PCI 2.3 and PCI Express cards"
> +	depends on PCI
> +	default n
> +	help
> +	  Generic driver that you can bind, dynamically, to any
> +	  PCI 2.3 compliant and PCI Express card. It is useful,
> +	  primarily, for virtualization scenarios.
> +	  If you compile this as a module, it will be called uio_pci_generic.
> +
>  endif
> diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
> index 5c2586d..73b2e75 100644
> --- a/drivers/uio/Makefile
> +++ b/drivers/uio/Makefile
> @@ -5,3 +5,4 @@ obj-$(CONFIG_UIO_PDRV_GENIRQ)	+= uio_pdrv_genirq.o
>  obj-$(CONFIG_UIO_SMX)	+= uio_smx.o
>  obj-$(CONFIG_UIO_AEC)	+= uio_aec.o
>  obj-$(CONFIG_UIO_SERCOS3)	+= uio_sercos3.o
> +obj-$(CONFIG_UIO_PCI_GENERIC)	+= uio_pci_generic.o
> diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c
> new file mode 100644
> index 0000000..313da35
> --- /dev/null
> +++ b/drivers/uio/uio_pci_generic.c
> @@ -0,0 +1,207 @@
> +/* uio_pci_generic - generic UIO driver for PCI 2.3 devices
> + *
> + * Copyright (C) 2009 Red Hat, Inc.
> + * Author: Michael S. Tsirkin <mst@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + *
> + * Since the driver does not declare any device ids, you must allocate
> + * id and bind the device to the driver yourself.  For example:
> + *
> + * # echo "8086 10f5" > /sys/bus/pci/drivers/uio_pci_generic/new_id
> + * # echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
> + * # echo -n 0000:00:19.0 > /sys/bus/pci/drivers/uio_pci_generic/bind
> + * # ls -l /sys/bus/pci/devices/0000:00:19.0/driver
> + * .../0000:00:19.0/driver -> ../../../bus/pci/drivers/uio_pci_generic
> + *
> + * Driver won't bind to devices which do not support the Interrupt Disable Bit
> + * in the command register. All devices compliant to PCI 2.3 (circa 2002) and
> + * all compliant PCI Express devices should support this bit.
> + */
> +
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/uio_driver.h>
> +#include <linux/spinlock.h>
> +
> +#define DRIVER_VERSION	"0.01.0"
> +#define DRIVER_AUTHOR	"Michael S. Tsirkin <mst@redhat.com>"
> +#define DRIVER_DESC	"Generic UIO driver for PCI 2.3 devices"
> +
> +struct uio_pci_generic_dev {
> +	struct uio_info info;
> +	struct pci_dev *pdev;
> +	spinlock_t lock; /* guards command register accesses */
> +};
> +
> +static inline struct uio_pci_generic_dev *
> +to_uio_pci_generic_dev(struct uio_info *info)
> +{
> +	return container_of(info, struct uio_pci_generic_dev, info);
> +}
> +
> +/* Interrupt handler. Read/modify/write the command register to disable
> + * the interrupt. */
> +static irqreturn_t irqhandler(int irq, struct uio_info *info)
> +{
> +	struct uio_pci_generic_dev *gdev = to_uio_pci_generic_dev(info);
> +	struct pci_dev *pdev = gdev->pdev;
> +	irqreturn_t ret = IRQ_NONE;
> +	u32 cmd_status_dword;
> +	u16 origcmd, newcmd, status;
> +
> +	/* We do a single dword read to retrieve both command and status.
> +	 * Document assumptions that make this possible. */
> +	BUILD_BUG_ON(PCI_COMMAND % 4);
> +	BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
> +
> +	spin_lock_irq(&gdev->lock);
> +	pci_block_user_cfg_access(pdev);
> +
> +	/* Read both command and status registers in a single 32-bit operation.
> +	 * Note: we could cache the value for command and move the status read
> +	 * out of the lock if there was a way to get notified of user changes
> +	 * to command register through sysfs. Should be good for shared irqs. */
> +	pci_read_config_dword(pdev, PCI_COMMAND, &cmd_status_dword);
> +	origcmd = cmd_status_dword;
> +	status = cmd_status_dword >> 16;
> +
> +	/* Check interrupt status register to see whether our device
> +	 * triggered the interrupt. */
> +	if (!(status & PCI_STATUS_INTERRUPT))
> +		goto done;
> +
> +	/* We triggered the interrupt, disable it. */
> +	newcmd = origcmd | PCI_COMMAND_INTX_DISABLE;
> +	if (newcmd != origcmd)
> +		pci_write_config_word(pdev, PCI_COMMAND, newcmd);
> +
> +	/* UIO core will signal the user process. */
> +	ret = IRQ_HANDLED;
> +done:
> +
> +	pci_unblock_user_cfg_access(pdev);
> +	spin_unlock_irq(&gdev->lock);
> +	return ret;
> +}
> +
> +/* Verify that the device supports Interrupt Disable bit in command register,
> + * per PCI 2.3, by flipping this bit and reading it back: this bit was readonly
> + * in PCI 2.2. */
> +static int __devinit verify_pci_2_3(struct pci_dev *pdev)
> +{
> +	u16 orig, new;
> +	int err = 0;
> +
> +	pci_block_user_cfg_access(pdev);
> +	pci_read_config_word(pdev, PCI_COMMAND, &orig);
> +	pci_write_config_word(pdev, PCI_COMMAND,
> +			      orig ^ PCI_COMMAND_INTX_DISABLE);
> +	pci_read_config_word(pdev, PCI_COMMAND, &new);
> +	/* There's no way to protect against
> +	 * hardware bugs or detect them reliably, but as long as we know
> +	 * what the value should be, let's go ahead and check it. */
> +	if ((new ^ orig) & ~PCI_COMMAND_INTX_DISABLE) {
> +		err = -EBUSY;
> +		dev_err(&pdev->dev, "Command changed from 0x%x to 0x%x: "
> +			"driver or HW bug?\n", orig, new);
> +		goto err;
> +	}
> +	if (!((new ^ orig) & PCI_COMMAND_INTX_DISABLE)) {
> +		dev_warn(&pdev->dev, "Device does not support "
> +			 "disabling interrupts: unable to bind.\n");
> +		err = -ENODEV;
> +		goto err;
> +	}
> +	/* Now restore the original value. */
> +	pci_write_config_word(pdev, PCI_COMMAND, orig);
> +err:
> +	pci_unblock_user_cfg_access(pdev);
> +	return err;
> +}
> +
> +static int __devinit probe(struct pci_dev *pdev,
> +			   const struct pci_device_id *id)
> +{
> +	struct uio_pci_generic_dev *gdev;
> +	int err;
> +
> +	if (!pdev->irq) {
> +		dev_warn(&pdev->dev, "No IRQ assigned to device: "
> +			 "no support for interrupts?\n");
> +		return -ENODEV;
> +	}
> +
> +	err = pci_enable_device(pdev);
> +	if (err) {
> +		dev_err(&pdev->dev, "%s: pci_enable_device failed: %d\n",
> +			__func__, err);
> +		return err;
> +	}
> +
> +	err = verify_pci_2_3(pdev);
> +	if (err)
> +		goto err_verify;
> +
> +	gdev = kzalloc(sizeof(struct uio_pci_generic_dev), GFP_KERNEL);
> +	if (!gdev) {
> +		err = -ENOMEM;
> +		goto err_alloc;
> +	}
> +
> +	gdev->info.name = "uio_pci_generic";
> +	gdev->info.version = DRIVER_VERSION;
> +	gdev->info.irq = pdev->irq;
> +	gdev->info.irq_flags = IRQF_SHARED;
> +	gdev->info.handler = irqhandler;
> +	gdev->pdev = pdev;
> +	spin_lock_init(&gdev->lock);
> +
> +	if (uio_register_device(&pdev->dev, &gdev->info))
> +		goto err_register;
> +	pci_set_drvdata(pdev, gdev);
> +
> +	return 0;
> +err_register:
> +	kfree(gdev);
> +err_alloc:
> +err_verify:
> +	pci_disable_device(pdev);
> +	return err;
> +}
> +
> +static void remove(struct pci_dev *pdev)
> +{
> +	struct uio_pci_generic_dev *gdev = pci_get_drvdata(pdev);
> +
> +	uio_unregister_device(&gdev->info);
> +	pci_disable_device(pdev);
> +	kfree(gdev);
> +}
> +
> +static struct pci_driver driver = {
> +	.name = "uio_pci_generic",
> +	.id_table = NULL, /* only dynamic id's */
> +	.probe = probe,
> +	.remove = remove,
> +};
> +
> +static int __init init(void)
> +{
> +	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
> +	return pci_register_driver(&driver);
> +}
> +
> +static void __exit cleanup(void)
> +{
> +	pci_unregister_driver(&driver);
> +}
> +
> +module_init(init);
> +module_exit(cleanup);
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
> index fcaee42..dd0bed4 100644
> --- a/include/linux/pci_regs.h
> +++ b/include/linux/pci_regs.h
> @@ -42,6 +42,7 @@
>  #define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
>  
>  #define PCI_STATUS		0x06	/* 16 bits */
> +#define  PCI_STATUS_INTERRUPT	0x08	/* Interrupt status */
>  #define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
>  #define  PCI_STATUS_66MHZ	0x20	/* Support 66 Mhz PCI 2.1 bus */
>  #define  PCI_STATUS_UDF		0x40	/* Support User Definable Features [obsolete] */
> -- 
> 1.6.2.5
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman July 15, 2009, 10:08 p.m. UTC | #2
On Wed, Jul 15, 2009 at 11:13:40PM +0300, Michael S. Tsirkin wrote:
> This adds a generic uio driver that can bind to any PCI device.  First
> user will be virtualization where a qemu userspace process needs to give
> guest OS access to the device.
> 
> Interrupts are handled using the Interrupt Disable bit in the PCI command
> register and Interrupt Status bit in the PCI status register.  All devices
> compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should
> support these bits.  Driver detects this support, and won't bind to devices
> which do not support the Interrupt Disable Bit in the command register.
> 
> It's expected that more features of interest to virtualization will be
> added to this driver in the future. Possibilities are: mmap for device
> resources, MSI/MSI-X, eventfd (to interface with kvm), iommu.
> 
> Acked-by: Chris Wright <chrisw@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> Hans, Greg, please review and consider for upstream.
> 
> This is intended to solve the problem in virtualization that shared
> interrupts do not work with assigned devices. Earlier versions of this
> patch have circulated on kvm@vger.

How does this play with the pci-stub driver that I thought was written
to solve this very problem?  Will it conflict?

In fact, it looks like you copied the comments for this driver directly
from the pci-stub driver :)

How about moving that documentation into a place that people will notice
it, like the rest of the UIO documentation?

And right now you are just sending the irq to userspace, what is
userspace supposed to do with it?  Do you have a userspace program that
uses this interface today to verify that everything works?  If so, care
to provide a pointer to it?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin July 16, 2009, 12:31 p.m. UTC | #3
On Wed, Jul 15, 2009 at 03:08:29PM -0700, Greg KH wrote:
> On Wed, Jul 15, 2009 at 11:13:40PM +0300, Michael S. Tsirkin wrote:
> > This adds a generic uio driver that can bind to any PCI device.  First
> > user will be virtualization where a qemu userspace process needs to give
> > guest OS access to the device.
> > 
> > Interrupts are handled using the Interrupt Disable bit in the PCI command
> > register and Interrupt Status bit in the PCI status register.  All devices
> > compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should
> > support these bits.  Driver detects this support, and won't bind to devices
> > which do not support the Interrupt Disable Bit in the command register.
> > 
> > It's expected that more features of interest to virtualization will be
> > added to this driver in the future. Possibilities are: mmap for device
> > resources, MSI/MSI-X, eventfd (to interface with kvm), iommu.
> > 
> > Acked-by: Chris Wright <chrisw@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > Hans, Greg, please review and consider for upstream.
> > 
> > This is intended to solve the problem in virtualization that shared
> > interrupts do not work with assigned devices. Earlier versions of this
> > patch have circulated on kvm@vger.
> 
> How does this play with the pci-stub driver that I thought was written
> to solve this very problem?


AFAIK the problem pci stub was written to solve is simply to bind to a
device. You then have to use another kernel module which looks the
device up with something like pci_get_bus_and_slot to do anything
useful. In particular, for non-shared interrupts, we can disable the
interrupt in the apic. But this does not work well for shared
interrupts. Thus this work.

The uio driver will be used in virtualization scenarious, a couple
of possible ones that have been mentioned on the kvm list are:
- device assignment (guest access to device) for simple devices with
  shared interrupts: emulating PCI is tricky enough to better be done in
  userspace. shared interrupt support is important as it happens
  with real devices
- simple communication between guest and host:
  we create a virtual device in host, and userspace
  driver in guest gets events and passes them on
  to e.g. dbus. shared interrupt support is important
  to avoid wasting irqs


>  Will it conflict?

No in a sense that you can't bind both drivers to the same device.

> In fact, it looks like you copied the comments for this driver directly
> from the pci-stub driver :)

Right.

> How about moving that documentation into a place that people will notice
> it, like the rest of the UIO documentation?

OK.

> And right now you are just sending the irq to userspace, what is
> userspace supposed to do with it?

Userspace uses libpci (i.e. pci sysfs) to talk to the device and to
re-enable interrupts by writing to the command register.

In the case of device assignment, this will be qemu which
acts as a proxy for driver running in guest context.
In case of uio loaded in guest, the driver will be in guest
userspace, and the device is emulated in qemu.


> Do you have a userspace program that
> uses this interface today to verify that everything works?  If so, care
> to provide a pointer to it?

Sure. I used an emulated device for this.
First, you patch qemu to add the device:
http://www.linux-kvm.org/downloads/mst/test_irq.patch

Now, run with the new kernel (-kernel flag), adding
 -device test-irq

Once in guest, assign the device id
echo "1af4 2009" > /sys/bus/pci/drivers/uio_pci_generic/new_id

Compile and run this driver:
http://www.linux-kvm.org/downloads/mst/testuio.c
(it does not use any libraries besides libc,
 so just gcc testuio.c -o testuio)

And now make the device assert interrupts, like this:

while
sleep 1
do
setpci -s 00:04.0 0x40.B=0x1
done

You should see messages printed as the driver gets interrupts, but no
error messages about missed interrupts.

> thanks,
> 
> greg k-h
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sheng Yang July 16, 2009, 1:33 p.m. UTC | #4
On Thursday 16 July 2009 20:31:01 Michael S. Tsirkin wrote:
> On Wed, Jul 15, 2009 at 03:08:29PM -0700, Greg KH wrote:
> > On Wed, Jul 15, 2009 at 11:13:40PM +0300, Michael S. Tsirkin wrote:
> > > This adds a generic uio driver that can bind to any PCI device.  First
> > > user will be virtualization where a qemu userspace process needs to
> > > give guest OS access to the device.
> > >
> > > Interrupts are handled using the Interrupt Disable bit in the PCI
> > > command register and Interrupt Status bit in the PCI status register. 
> > > All devices compliant to PCI 2.3 (circa 2002) and all compliant PCI
> > > Express devices should support these bits.  Driver detects this
> > > support, and won't bind to devices which do not support the Interrupt
> > > Disable Bit in the command register.
> > >
> > > It's expected that more features of interest to virtualization will be
> > > added to this driver in the future. Possibilities are: mmap for device
> > > resources, MSI/MSI-X, eventfd (to interface with kvm), iommu.
> > >
> > > Acked-by: Chris Wright <chrisw@redhat.com>
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >
> > > Hans, Greg, please review and consider for upstream.
> > >
> > > This is intended to solve the problem in virtualization that shared
> > > interrupts do not work with assigned devices. Earlier versions of this
> > > patch have circulated on kvm@vger.
> >
> > How does this play with the pci-stub driver that I thought was written
> > to solve this very problem?
>
> AFAIK the problem pci stub was written to solve is simply to bind to a
> device. You then have to use another kernel module which looks the
> device up with something like pci_get_bus_and_slot to do anything
> useful. In particular, for non-shared interrupts, we can disable the
> interrupt in the apic. But this does not work well for shared
> interrupts. Thus this work.
>
> The uio driver will be used in virtualization scenarious, a couple
> of possible ones that have been mentioned on the kvm list are:
> - device assignment (guest access to device) for simple devices with
>   shared interrupts: emulating PCI is tricky enough to better be done in
>   userspace. shared interrupt support is important as it happens
>   with real devices

One comments for shared interrupt: if you means guest device shares interrupt 
with device in other domain(that means guest or host), it's still a security 
hole, and our position seems still won't-do it. Could you explain how the 
situation change with this patch? I am not sure if I understand your meaning 
completely...

Thanks.
Michael S. Tsirkin July 16, 2009, 1:50 p.m. UTC | #5
On Thu, Jul 16, 2009 at 09:33:05PM +0800, Sheng Yang wrote:
> On Thursday 16 July 2009 20:31:01 Michael S. Tsirkin wrote:
> > On Wed, Jul 15, 2009 at 03:08:29PM -0700, Greg KH wrote:
> > > On Wed, Jul 15, 2009 at 11:13:40PM +0300, Michael S. Tsirkin wrote:
> > > > This adds a generic uio driver that can bind to any PCI device.  First
> > > > user will be virtualization where a qemu userspace process needs to
> > > > give guest OS access to the device.
> > > >
> > > > Interrupts are handled using the Interrupt Disable bit in the PCI
> > > > command register and Interrupt Status bit in the PCI status register. 
> > > > All devices compliant to PCI 2.3 (circa 2002) and all compliant PCI
> > > > Express devices should support these bits.  Driver detects this
> > > > support, and won't bind to devices which do not support the Interrupt
> > > > Disable Bit in the command register.
> > > >
> > > > It's expected that more features of interest to virtualization will be
> > > > added to this driver in the future. Possibilities are: mmap for device
> > > > resources, MSI/MSI-X, eventfd (to interface with kvm), iommu.
> > > >
> > > > Acked-by: Chris Wright <chrisw@redhat.com>
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > >
> > > > Hans, Greg, please review and consider for upstream.
> > > >
> > > > This is intended to solve the problem in virtualization that shared
> > > > interrupts do not work with assigned devices. Earlier versions of this
> > > > patch have circulated on kvm@vger.
> > >
> > > How does this play with the pci-stub driver that I thought was written
> > > to solve this very problem?
> >
> > AFAIK the problem pci stub was written to solve is simply to bind to a
> > device. You then have to use another kernel module which looks the
> > device up with something like pci_get_bus_and_slot to do anything
> > useful. In particular, for non-shared interrupts, we can disable the
> > interrupt in the apic. But this does not work well for shared
> > interrupts. Thus this work.
> >
> > The uio driver will be used in virtualization scenarious, a couple
> > of possible ones that have been mentioned on the kvm list are:
> > - device assignment (guest access to device) for simple devices with
> >   shared interrupts: emulating PCI is tricky enough to better be done in
> >   userspace. shared interrupt support is important as it happens
> >   with real devices
> 
> One comments for shared interrupt: if you means guest device shares interrupt 
> with device in other domain(that means guest or host), it's still a security 
> hole, and our position seems still won't-do it. Could you explain how the 
> situation change with this patch? I am not sure if I understand your meaning 
> completely...
> 
> Thanks.

Yes, this lets you safely share an interrupt between guests. Here's how this works:
a device asserts interrupt
host (kernel) sets INTD bit in device, wakes up guest
guest handles interrupt and acks it
host (userspace) clears INTD bit in device

As you see, INTD bit is under control of the host, thus guest can not
deny service to other devices sharing the interrupt.

Performance is likely to be lower than with non-shared interrupts,
but that's often the case with interrupt sharing anyway.
Michael S. Tsirkin July 16, 2009, 2:07 p.m. UTC | #6
On Wed, Jul 15, 2009 at 03:08:29PM -0700, Greg KH wrote:
> How about moving that documentation into a place that people will notice
> it, like the rest of the UIO documentation?

Greg,

would it make more sense to add this to
Documentation/DocBook/uio-howto.xml, or to create
Documentation/uio_pci_generic.txt ?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hans J. Koch July 16, 2009, 3:12 p.m. UTC | #7
On Thu, Jul 16, 2009 at 05:07:10PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 15, 2009 at 03:08:29PM -0700, Greg KH wrote:
> > How about moving that documentation into a place that people will notice
> > it, like the rest of the UIO documentation?
> 
> Greg,
> 
> would it make more sense to add this to
> Documentation/DocBook/uio-howto.xml, or to create
> Documentation/uio_pci_generic.txt ?

Hi Michael,
I'd prefer to have it in uio-howto.xml so that people only have to look in
one place. In does not have to be very detailled, just a short explanation
what this driver is all about and a short example how to use it.

Thanks,
Hans

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman July 16, 2009, 3:52 p.m. UTC | #8
On Thu, Jul 16, 2009 at 05:12:55PM +0200, Hans J. Koch wrote:
> On Thu, Jul 16, 2009 at 05:07:10PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jul 15, 2009 at 03:08:29PM -0700, Greg KH wrote:
> > > How about moving that documentation into a place that people will notice
> > > it, like the rest of the UIO documentation?
> > 
> > Greg,
> > 
> > would it make more sense to add this to
> > Documentation/DocBook/uio-howto.xml, or to create
> > Documentation/uio_pci_generic.txt ?
> 
> Hi Michael,
> I'd prefer to have it in uio-howto.xml so that people only have to look in
> one place. In does not have to be very detailled, just a short explanation
> what this driver is all about and a short example how to use it.

I agree, that would be the best place for it.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman July 16, 2009, 6:19 p.m. UTC | #9
On Thu, Jul 16, 2009 at 03:31:01PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 15, 2009 at 03:08:29PM -0700, Greg KH wrote:
> > On Wed, Jul 15, 2009 at 11:13:40PM +0300, Michael S. Tsirkin wrote:
> > > This adds a generic uio driver that can bind to any PCI device.  First
> > > user will be virtualization where a qemu userspace process needs to give
> > > guest OS access to the device.
> > > 
> > > Interrupts are handled using the Interrupt Disable bit in the PCI command
> > > register and Interrupt Status bit in the PCI status register.  All devices
> > > compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should
> > > support these bits.  Driver detects this support, and won't bind to devices
> > > which do not support the Interrupt Disable Bit in the command register.
> > > 
> > > It's expected that more features of interest to virtualization will be
> > > added to this driver in the future. Possibilities are: mmap for device
> > > resources, MSI/MSI-X, eventfd (to interface with kvm), iommu.
> > > 
> > > Acked-by: Chris Wright <chrisw@redhat.com>
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > > 
> > > Hans, Greg, please review and consider for upstream.
> > > 
> > > This is intended to solve the problem in virtualization that shared
> > > interrupts do not work with assigned devices. Earlier versions of this
> > > patch have circulated on kvm@vger.
> > 
> > How does this play with the pci-stub driver that I thought was written
> > to solve this very problem?
> 
> 
> AFAIK the problem pci stub was written to solve is simply to bind to a
> device. You then have to use another kernel module which looks the
> device up with something like pci_get_bus_and_slot to do anything
> useful. In particular, for non-shared interrupts, we can disable the
> interrupt in the apic. But this does not work well for shared
> interrupts. Thus this work.
> 
> The uio driver will be used in virtualization scenarious, a couple
> of possible ones that have been mentioned on the kvm list are:
> - device assignment (guest access to device) for simple devices with
>   shared interrupts: emulating PCI is tricky enough to better be done in
>   userspace. shared interrupt support is important as it happens
>   with real devices
> - simple communication between guest and host:
>   we create a virtual device in host, and userspace
>   driver in guest gets events and passes them on
>   to e.g. dbus. shared interrupt support is important
>   to avoid wasting irqs

Ah, ok, thanks for all of the explanation, that makes sense.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesse Barnes July 20, 2009, 5:17 p.m. UTC | #10
On Wed, 15 Jul 2009 23:39:11 +0200
"Hans J. Koch" <hjk@linutronix.de> wrote:

> On Wed, Jul 15, 2009 at 11:13:40PM +0300, Michael S. Tsirkin wrote:
> > This adds a generic uio driver that can bind to any PCI device.
> > First user will be virtualization where a qemu userspace process
> > needs to give guest OS access to the device.
> > 
> > Interrupts are handled using the Interrupt Disable bit in the PCI
> > command register and Interrupt Status bit in the PCI status
> > register.  All devices compliant to PCI 2.3 (circa 2002) and all
> > compliant PCI Express devices should support these bits.  Driver
> > detects this support, and won't bind to devices which do not
> > support the Interrupt Disable Bit in the command register.
> > 
> > It's expected that more features of interest to virtualization will
> > be added to this driver in the future. Possibilities are: mmap for
> > device resources, MSI/MSI-X, eventfd (to interface with kvm), iommu.
> 
> Well, I'm not enough of a PCI expert to tell whether your 2.3-test
> works or not (can it have side effects, e.g. trigger an interrupt
> when you toggle that bit?). I've added Jesse Barnes to Cc: since you
> modify a PCI core header file. If there are no objections from the
> PCI people, I guess we can take this.

pci_reg.h portion looks fine to me, and only supporting devices with
the interrupt disable bit certainly simplifies things.  There were some
other questions on the thread though (like Greg's similar driver); not
sure if you've answered those yet.
Michael S. Tsirkin July 20, 2009, 6:19 p.m. UTC | #11
On Mon, Jul 20, 2009 at 10:17:02AM -0700, Jesse Barnes wrote:
> On Wed, 15 Jul 2009 23:39:11 +0200
> "Hans J. Koch" <hjk@linutronix.de> wrote:
> 
> > On Wed, Jul 15, 2009 at 11:13:40PM +0300, Michael S. Tsirkin wrote:
> > > This adds a generic uio driver that can bind to any PCI device.
> > > First user will be virtualization where a qemu userspace process
> > > needs to give guest OS access to the device.
> > > 
> > > Interrupts are handled using the Interrupt Disable bit in the PCI
> > > command register and Interrupt Status bit in the PCI status
> > > register.  All devices compliant to PCI 2.3 (circa 2002) and all
> > > compliant PCI Express devices should support these bits.  Driver
> > > detects this support, and won't bind to devices which do not
> > > support the Interrupt Disable Bit in the command register.
> > > 
> > > It's expected that more features of interest to virtualization will
> > > be added to this driver in the future. Possibilities are: mmap for
> > > device resources, MSI/MSI-X, eventfd (to interface with kvm), iommu.
> > 
> > Well, I'm not enough of a PCI expert to tell whether your 2.3-test
> > works or not (can it have side effects, e.g. trigger an interrupt
> > when you toggle that bit?). I've added Jesse Barnes to Cc: since you
> > modify a PCI core header file. If there are no objections from the
> > PCI people, I guess we can take this.
> 
> pci_reg.h portion looks fine to me, and only supporting devices with
> the interrupt disable bit certainly simplifies things.  There were some
> other questions on the thread though (like Greg's similar driver); not
> sure if you've answered those yet.

Yes, and Greg seems satisfied :). See 20090716181919.GB27811@suse.de.
Chris Wright (author of pci-stub that was mentioned) also acked the
patch.

> -- 
> Jesse Barnes, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 18c3f0c..39c7207 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2538,6 +2538,14 @@  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git
 S:	Maintained
 F:	include/asm-generic
 
+GENERIC UIO DRIVER FOR PCI DEVICES
+P:	Michael S. Tsirkin
+M:	mst@redhat.com
+L:	kvm@vger.kernel.org
+L:	linux-kernel@vger.kernel.org
+S:	Supported
+F:	drivers/uio/uio_pci_generic.c
+
 GFS2 FILE SYSTEM
 P:	Steven Whitehouse
 M:	swhiteho@redhat.com
diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 7f86534..0f14c8e 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -89,4 +89,14 @@  config UIO_SERCOS3
 
 	  If you compile this as a module, it will be called uio_sercos3.
 
+config UIO_PCI_GENERIC
+	tristate "Generic driver for PCI 2.3 and PCI Express cards"
+	depends on PCI
+	default n
+	help
+	  Generic driver that you can bind, dynamically, to any
+	  PCI 2.3 compliant and PCI Express card. It is useful,
+	  primarily, for virtualization scenarios.
+	  If you compile this as a module, it will be called uio_pci_generic.
+
 endif
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index 5c2586d..73b2e75 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -5,3 +5,4 @@  obj-$(CONFIG_UIO_PDRV_GENIRQ)	+= uio_pdrv_genirq.o
 obj-$(CONFIG_UIO_SMX)	+= uio_smx.o
 obj-$(CONFIG_UIO_AEC)	+= uio_aec.o
 obj-$(CONFIG_UIO_SERCOS3)	+= uio_sercos3.o
+obj-$(CONFIG_UIO_PCI_GENERIC)	+= uio_pci_generic.o
diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c
new file mode 100644
index 0000000..313da35
--- /dev/null
+++ b/drivers/uio/uio_pci_generic.c
@@ -0,0 +1,207 @@ 
+/* uio_pci_generic - generic UIO driver for PCI 2.3 devices
+ *
+ * Copyright (C) 2009 Red Hat, Inc.
+ * Author: Michael S. Tsirkin <mst@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * Since the driver does not declare any device ids, you must allocate
+ * id and bind the device to the driver yourself.  For example:
+ *
+ * # echo "8086 10f5" > /sys/bus/pci/drivers/uio_pci_generic/new_id
+ * # echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
+ * # echo -n 0000:00:19.0 > /sys/bus/pci/drivers/uio_pci_generic/bind
+ * # ls -l /sys/bus/pci/devices/0000:00:19.0/driver
+ * .../0000:00:19.0/driver -> ../../../bus/pci/drivers/uio_pci_generic
+ *
+ * Driver won't bind to devices which do not support the Interrupt Disable Bit
+ * in the command register. All devices compliant to PCI 2.3 (circa 2002) and
+ * all compliant PCI Express devices should support this bit.
+ */
+
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/uio_driver.h>
+#include <linux/spinlock.h>
+
+#define DRIVER_VERSION	"0.01.0"
+#define DRIVER_AUTHOR	"Michael S. Tsirkin <mst@redhat.com>"
+#define DRIVER_DESC	"Generic UIO driver for PCI 2.3 devices"
+
+struct uio_pci_generic_dev {
+	struct uio_info info;
+	struct pci_dev *pdev;
+	spinlock_t lock; /* guards command register accesses */
+};
+
+static inline struct uio_pci_generic_dev *
+to_uio_pci_generic_dev(struct uio_info *info)
+{
+	return container_of(info, struct uio_pci_generic_dev, info);
+}
+
+/* Interrupt handler. Read/modify/write the command register to disable
+ * the interrupt. */
+static irqreturn_t irqhandler(int irq, struct uio_info *info)
+{
+	struct uio_pci_generic_dev *gdev = to_uio_pci_generic_dev(info);
+	struct pci_dev *pdev = gdev->pdev;
+	irqreturn_t ret = IRQ_NONE;
+	u32 cmd_status_dword;
+	u16 origcmd, newcmd, status;
+
+	/* We do a single dword read to retrieve both command and status.
+	 * Document assumptions that make this possible. */
+	BUILD_BUG_ON(PCI_COMMAND % 4);
+	BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
+
+	spin_lock_irq(&gdev->lock);
+	pci_block_user_cfg_access(pdev);
+
+	/* Read both command and status registers in a single 32-bit operation.
+	 * Note: we could cache the value for command and move the status read
+	 * out of the lock if there was a way to get notified of user changes
+	 * to command register through sysfs. Should be good for shared irqs. */
+	pci_read_config_dword(pdev, PCI_COMMAND, &cmd_status_dword);
+	origcmd = cmd_status_dword;
+	status = cmd_status_dword >> 16;
+
+	/* Check interrupt status register to see whether our device
+	 * triggered the interrupt. */
+	if (!(status & PCI_STATUS_INTERRUPT))
+		goto done;
+
+	/* We triggered the interrupt, disable it. */
+	newcmd = origcmd | PCI_COMMAND_INTX_DISABLE;
+	if (newcmd != origcmd)
+		pci_write_config_word(pdev, PCI_COMMAND, newcmd);
+
+	/* UIO core will signal the user process. */
+	ret = IRQ_HANDLED;
+done:
+
+	pci_unblock_user_cfg_access(pdev);
+	spin_unlock_irq(&gdev->lock);
+	return ret;
+}
+
+/* Verify that the device supports Interrupt Disable bit in command register,
+ * per PCI 2.3, by flipping this bit and reading it back: this bit was readonly
+ * in PCI 2.2. */
+static int __devinit verify_pci_2_3(struct pci_dev *pdev)
+{
+	u16 orig, new;
+	int err = 0;
+
+	pci_block_user_cfg_access(pdev);
+	pci_read_config_word(pdev, PCI_COMMAND, &orig);
+	pci_write_config_word(pdev, PCI_COMMAND,
+			      orig ^ PCI_COMMAND_INTX_DISABLE);
+	pci_read_config_word(pdev, PCI_COMMAND, &new);
+	/* There's no way to protect against
+	 * hardware bugs or detect them reliably, but as long as we know
+	 * what the value should be, let's go ahead and check it. */
+	if ((new ^ orig) & ~PCI_COMMAND_INTX_DISABLE) {
+		err = -EBUSY;
+		dev_err(&pdev->dev, "Command changed from 0x%x to 0x%x: "
+			"driver or HW bug?\n", orig, new);
+		goto err;
+	}
+	if (!((new ^ orig) & PCI_COMMAND_INTX_DISABLE)) {
+		dev_warn(&pdev->dev, "Device does not support "
+			 "disabling interrupts: unable to bind.\n");
+		err = -ENODEV;
+		goto err;
+	}
+	/* Now restore the original value. */
+	pci_write_config_word(pdev, PCI_COMMAND, orig);
+err:
+	pci_unblock_user_cfg_access(pdev);
+	return err;
+}
+
+static int __devinit probe(struct pci_dev *pdev,
+			   const struct pci_device_id *id)
+{
+	struct uio_pci_generic_dev *gdev;
+	int err;
+
+	if (!pdev->irq) {
+		dev_warn(&pdev->dev, "No IRQ assigned to device: "
+			 "no support for interrupts?\n");
+		return -ENODEV;
+	}
+
+	err = pci_enable_device(pdev);
+	if (err) {
+		dev_err(&pdev->dev, "%s: pci_enable_device failed: %d\n",
+			__func__, err);
+		return err;
+	}
+
+	err = verify_pci_2_3(pdev);
+	if (err)
+		goto err_verify;
+
+	gdev = kzalloc(sizeof(struct uio_pci_generic_dev), GFP_KERNEL);
+	if (!gdev) {
+		err = -ENOMEM;
+		goto err_alloc;
+	}
+
+	gdev->info.name = "uio_pci_generic";
+	gdev->info.version = DRIVER_VERSION;
+	gdev->info.irq = pdev->irq;
+	gdev->info.irq_flags = IRQF_SHARED;
+	gdev->info.handler = irqhandler;
+	gdev->pdev = pdev;
+	spin_lock_init(&gdev->lock);
+
+	if (uio_register_device(&pdev->dev, &gdev->info))
+		goto err_register;
+	pci_set_drvdata(pdev, gdev);
+
+	return 0;
+err_register:
+	kfree(gdev);
+err_alloc:
+err_verify:
+	pci_disable_device(pdev);
+	return err;
+}
+
+static void remove(struct pci_dev *pdev)
+{
+	struct uio_pci_generic_dev *gdev = pci_get_drvdata(pdev);
+
+	uio_unregister_device(&gdev->info);
+	pci_disable_device(pdev);
+	kfree(gdev);
+}
+
+static struct pci_driver driver = {
+	.name = "uio_pci_generic",
+	.id_table = NULL, /* only dynamic id's */
+	.probe = probe,
+	.remove = remove,
+};
+
+static int __init init(void)
+{
+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
+	return pci_register_driver(&driver);
+}
+
+static void __exit cleanup(void)
+{
+	pci_unregister_driver(&driver);
+}
+
+module_init(init);
+module_exit(cleanup);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index fcaee42..dd0bed4 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -42,6 +42,7 @@ 
 #define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
 
 #define PCI_STATUS		0x06	/* 16 bits */
+#define  PCI_STATUS_INTERRUPT	0x08	/* Interrupt status */
 #define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
 #define  PCI_STATUS_66MHZ	0x20	/* Support 66 Mhz PCI 2.1 bus */
 #define  PCI_STATUS_UDF		0x40	/* Support User Definable Features [obsolete] */