[corrected,RFC] uio: add generic driver for PCI 2.3 devices

Message ID 20090709114834.GB26479@redhat.com (mailing list archive)
State New, archived

Commit Message

Michael S. Tsirkin July 9, 2009, 11:48 a.m. UTC
Resending with corrected addresses. Sorry about the churn.

-----------

I got annoyed by the fact that we don't support shared interrupts
with PCI in assigned devices, so here's a draft patch to add that
support in kernel through uio.

I intend to send this to lkml, but meanwhile I'd appreciate some early
feedback/flames from people on the list.

Thanks!


----------->

This adds a generic uio driver that can bind to any PCI device.  First
user will be virtualization where a qemu userspace process needs to give
guest OS access to the device.

Interrupts are handled using the Interrupt Disable bit in the PCI command
register and Interrupt Status bit in the PCI status register.  All devices
compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should
support these bits.  Driver detects this support, and won't bind to devices
which do not support the Interrupt Disable Bit in the command register.
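
To illustrate the detection step, here is a minimal userspace sketch of the
decision the driver makes after flipping Interrupt Disable and reading the
command register back.  The helper name classify_intx_probe is hypothetical
and not part of the patch; only the bit value is taken from pci_regs.h.

```c
#include <errno.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400 /* value from pci_regs.h */

/*
 * Hypothetical helper, not part of the patch.
 * 'orig' is the command register before the test write, 'readback' is
 * the value read after writing orig ^ PCI_COMMAND_INTX_DISABLE.
 */
static int classify_intx_probe(uint16_t orig, uint16_t readback)
{
	uint16_t changed = orig ^ readback;

	/* Bits other than Interrupt Disable moved: something else is
	 * touching the register, or the hardware misbehaves. */
	if (changed & ~PCI_COMMAND_INTX_DISABLE)
		return -EBUSY;
	/* The bit did not flip: it is read-only, as in PCI 2.2. */
	if (!(changed & PCI_COMMAND_INTX_DISABLE))
		return -ENODEV;
	return 0;
}
```

A result of 0 means the device can be bound; in that case the caller still
has to write the original value back.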

It's expected that MSI/MSI-X support will be added to this driver in the
future, to interface with virtualization irqfd/eventfd infrastructure.
Another area to examine, and of interest to virtualization, is iommu.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/uio/Kconfig           |   10 ++
 drivers/uio/Makefile          |    1 +
 drivers/uio/uio_pci_generic.c |  202 +++++++++++++++++++++++++++++++++++++++++
 include/linux/pci_regs.h      |    1 +
 4 files changed, 214 insertions(+), 0 deletions(-)
 create mode 100644 drivers/uio/uio_pci_generic.c

Comments

Anthony Liguori July 9, 2009, 2:54 p.m. UTC | #1
Michael S. Tsirkin wrote:
> Resending with corrected addresses. Sorry about the churn.
>
> -----------
>
> I got annoyed by the fact that we don't support shared interrupts
> with PCI in assigned devices, so here's a draft patch to add that
> support in kernel through uio.
>
> I intend to send this to lkml, but meanwhile I'd appreciate some early
> feedback/flames from people on the list.
>
> Thanks!
>
>
> ----------->
>
> This adds a generic uio driver that can bind to any PCI device.  First
> user will be virtualization where a qemu userspace process needs to give
> guest OS access to the device.
>
> Interrupts are handled using the Interrupt Disable bit in the PCI command
> register and Interrupt Status bit in the PCI status register.  All devices
> compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should
> support these bits.  Driver detects this support, and won't bind to devices
> which do not support the Interrupt Disable Bit in the command register.
>
> It's expected that MSI/MSI-X support will be added to this driver in the
> future, to interface with virtualization irqfd/eventfd infrastructure.
> Another area to examine, and of interest to virtualization, is iommu.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>   

I didn't know this was possible...  so we could also use this driver for 
vm-channel.

> +
> +	err = pci_request_regions(pdev, "uio_pci_generic");
> +	if (err) {
> +		dev_err(&pdev->dev, "%s: pci_request_regions failed: %d\n",
> +			 __func__, err);
> +		goto err_verify;
> +	}
> +
> +	dev = kzalloc(sizeof(struct generic_dev), GFP_KERNEL);
> +	if (!dev) {
> +		err = -ENOMEM;
> +		goto err_alloc;
> +	}
> +
> +	dev->info.name = "uio_pci_generic";
> +	dev->info.version = "0.01";
> +	dev->info.irq = pdev->irq;
> +	dev->info.irq_flags = IRQF_SHARED;
> +	dev->info.handler = irqhandler;
> +	dev->info.irqcontrol = irqcontrol;
> +	dev->pdev = pdev;
> +	spin_lock_init(&dev->lock);
>   

I know it's not strictly needed for PCI pass through, but it would be 
useful to register the IO regions via UIO.  The userspace implementation 
would then use UIO strictly instead of poking the sysfs pci info 
directly.  I think that ends up being cleaner.

Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin July 9, 2009, 6:12 p.m. UTC | #2
On Thu, Jul 09, 2009 at 09:54:43AM -0500, Anthony Liguori wrote:
> I know it's not strictly needed for PCI pass through, but it would be  
> useful to register the IO regions via UIO.  The userspace implementation  
> would then use UIO strictly instead of poking the sysfs pci info  
> directly.  I think that ends up being cleaner.

Hmm, this is good for specific drivers, but for a generic one like qemu,
still need sysfs to figure out the size at least, and
we need config accesses which uio does not support now.
And if you use libpci as qemu does now, this interface will likely
go unused. So .. there does not seem to be much point at the moment.

My idea is, let's start with a minimal interface, longer term
let's see if we can add config access, mmap and other stuff like eventfd.
Makes sense?
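
For reference, a rough sketch of how userspace might drive this minimal
interface, assuming the standard UIO character device semantics: a 4-byte
read() blocks until the next interrupt and returns the event count, and a
4-byte write() of 1 or 0 reaches the driver's irqcontrol.  The helper names
are made up for illustration.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: block until the next interrupt on a UIO fd.
 * Returns 0 and stores the event count, or -1 on a short read/error. */
static int uio_wait_irq(int fd, int32_t *count)
{
	ssize_t n = read(fd, count, sizeof(*count));

	return n == (ssize_t)sizeof(*count) ? 0 : -1;
}

/* Hypothetical helper: ask the driver to unmask (on = 1) or mask
 * (on = 0) the interrupt.  With this patch an unmask clears
 * Interrupt Disable in the command register. */
static int uio_irq_control(int fd, int32_t on)
{
	ssize_t n = write(fd, &on, sizeof(on));

	return n == (ssize_t)sizeof(on) ? 0 : -1;
}
```

Since the handler masks INTx before waking the reader, a userspace loop
would call uio_wait_irq(), service the device, then call
uio_irq_control(fd, 1) to re-enable the interrupt.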
Anthony Liguori July 9, 2009, 8:14 p.m. UTC | #3
Michael S. Tsirkin wrote:
> On Thu, Jul 09, 2009 at 09:54:43AM -0500, Anthony Liguori wrote:
>   
>> I know it's not strictly needed for PCI pass through, but it would be  
>> useful to register the IO regions via UIO.  The userspace implementation  
>> would then use UIO strictly instead of poking the sysfs pci info  
>> directly.  I think that ends up being cleaner.
>>     
>
> Hmm, this is good for specific drivers, but for a generic one like qemu,
> still need sysfs to figure out the size at least,

size of what?

>  and
> we need config accesses which uio does not support now.
> And if you use libpci as qemu does now, this interface will likely
> go unused. So .. there does not seem to be much point at the moment.
>   

Right, I would expect uio to replace libpci.

> My idea is, let's start with a minimal interface, longer term
> let's see if we can add config access, mmap and other stuff like eventfd.
> Makes sense?
>   

It can certainly grow more features down the road.

Regards,

Anthony Liguori

Michael S. Tsirkin July 9, 2009, 8:21 p.m. UTC | #4
On Thu, Jul 09, 2009 at 03:14:53PM -0500, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>> On Thu, Jul 09, 2009 at 09:54:43AM -0500, Anthony Liguori wrote:
>>   
>>> I know it's not strictly needed for PCI pass through, but it would be 
>>>  useful to register the IO regions via UIO.  The userspace 
>>> implementation  would then use UIO strictly instead of poking the 
>>> sysfs pci info  directly.  I think that ends up being cleaner.
>>>     
>>
>> Hmm, this is good for specific drivers, but for a generic one like qemu,
>> still need sysfs to figure out the size at least,
>
> size of what?

memory region which you mmap.
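
As a sketch of what "figure out the size" means on the sysfs side: assuming
each line of the device's sysfs "resource" file holds three hex fields
(start, end, flags), the size of a region is end - start + 1.  The helper
name bar_size_from_line is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: size in bytes of the region described by one
 * line of the sysfs "resource" file ("start end flags", all hex).
 * Returns 0 for an unused region or a line that does not parse. */
static unsigned long long bar_size_from_line(const char *line)
{
	unsigned long long start, end, flags;

	if (sscanf(line, "%llx %llx %llx", &start, &end, &flags) != 3)
		return 0;
	if (start == 0 && end == 0)
		return 0; /* unused region */
	return end - start + 1;
}
```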

>>  and
>> we need config accesses which uio does not support now.
>> And if you use libpci as qemu does now, this interface will likely
>> go unused. So .. there does not seem to be much point at the moment.
>>   
>
> Right, I would expect uio to replace libpci.
> 
>> My idea is, let's start with a minimal interface, longer term
>> let's see if we can add config access, mmap and other stuff like eventfd.
>> Makes sense?
>>   
>
> It can certainly grow more features down the road.

OK, I'll try to get it included as is, then grow it.
Chris Wright July 10, 2009, 2:19 a.m. UTC | #5
* Michael S. Tsirkin (mst@redhat.com) wrote:
> +struct generic_dev {

I know I commented on this one on an earlier, private version, and naming
is not my strength... maybe "struct uio_generic_pci_dev" or "struct
uio_generic_pci"?

> +	struct uio_info info;
> +	struct pci_dev *pdev;
> +	spinlock_t lock; /* guards command register accesses */
> +};
> +
> +/* Read/modify/write command register to disable interrupts.
> + * Note: we could cache the value and optimize the read if there was a way to
> + * get notified of user changes to command register through sysfs.
> + * */

For the irqcontrol case, I don't think RMW is a problem (coming from
userspace it's already a slower path).

For the irqhandler case, you can grab the full dword to get Command+Status
(since you needed status anyway, and config reads are dword).
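
A rough sketch of the dword idea in plain C, assuming the usual config space
layout where Command sits at offset 0x04 and Status at 0x06, so a dword read
at 0x04 carries Command in the low half and Status in the high half.  The
helper names are made up; the bit values are the ones used in the patch.

```c
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400
#define PCI_STATUS_INTERRUPT     0x08

/* Hypothetical helpers operating on the value of a dword config read
 * at PCI_COMMAND (offset 0x04). */
static uint16_t cmd_of(uint32_t cmd_status)
{
	return cmd_status & 0xffff;	/* low 16 bits: Command */
}

static uint16_t status_of(uint32_t cmd_status)
{
	return cmd_status >> 16;	/* high 16 bits: Status */
}

/* Nonzero if the device reports a pending INTx interrupt. */
static int intx_pending(uint32_t cmd_status)
{
	return (status_of(cmd_status) & PCI_STATUS_INTERRUPT) != 0;
}

/* Command value to write back in order to mask INTx. */
static uint16_t cmd_with_intx_masked(uint32_t cmd_status)
{
	return cmd_of(cmd_status) | PCI_COMMAND_INTX_DISABLE;
}
```

With this the handler needs one config read instead of two: check
intx_pending() and, if set, write cmd_with_intx_masked() to PCI_COMMAND.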

> +static void irqtoggle(struct generic_dev *dev, int irq_on)
> +{
> +	struct pci_dev *pdev = dev->pdev;
> +	unsigned long flags;
> +	u16 orig, new;
> +
> +	spin_lock_irqsave(&dev->lock, flags);
> +	pci_block_user_cfg_access(pdev);
> +	pci_read_config_word(pdev, PCI_COMMAND, &orig);
> +	new = irq_on ? orig & ~PCI_COMMAND_INTX_DISABLE :
> +		orig | PCI_COMMAND_INTX_DISABLE;
> +	if (new != orig)
> +		pci_write_config_word(dev->pdev, PCI_COMMAND, new);
> +	pci_unblock_user_cfg_access(pdev);
> +	spin_unlock_irqrestore(&dev->lock, flags);
> +}
> +
> +/* irqcontrol is used by userspace to enable/disable interrupts. */
> +static int irqcontrol(struct uio_info *info, s32 irq_on)
> +{
> +	struct generic_dev *dev = container_of(info, struct generic_dev, info);
> +	irqtoggle(dev, irq_on);
> +	return 0;
> +}
> +
> +static irqreturn_t irqhandler(int irq, struct uio_info *info)
> +{
> +	struct generic_dev *dev = container_of(info, struct generic_dev, info);
> +	irqreturn_t ret = IRQ_NONE;
> +	u16 status;
> +
> +	/* Check interrupt status register to see whether our device
> +	 * triggered the interrupt. */
> +	pci_read_config_word(dev->pdev, PCI_STATUS, &status);
> +	if (!(status & PCI_STATUS_INTERRUPT))
> +		goto done;
> +
> +	/* We triggered the interrupt, disable it. */
> +	irqtoggle(dev, 0);
> +	/* UIO core will signal the user process. */
> +	ret = IRQ_HANDLED;
> +done:
> +	return ret;
> +}
> +
> +/* Verify that the device supports Interrupt Disable bit in command register,
> + * per PCI 2.3, by flipping this bit and reading it back: this bit was readonly
> + * in PCI 2.2. */

Wonder if this should also restrict from things like bridges?

> +static int __devinit verify_pci_2_3(struct pci_dev *pdev)
> +{
> +	u16 orig, new;
> +	int err = 0;
> +
> +	pci_block_user_cfg_access(pdev);
> +	pci_read_config_word(pdev, PCI_COMMAND, &orig);
> +	pci_write_config_word(pdev, PCI_COMMAND,
> +			      orig ^ PCI_COMMAND_INTX_DISABLE);
> +	pci_read_config_word(pdev, PCI_COMMAND, &new);
> +	/* There's no way to protect against
> +	 * hardware bugs or detect them reliably, but as long as we know
> +	 * what the value should be, let's go ahead and check it. */
> +	if ((new ^ orig) & ~PCI_COMMAND_INTX_DISABLE) {
> +		err = -EBUSY;
> +		dev_err(&pdev->dev, "Command changed from 0x%x to 0x%x: "
> +			"driver or HW bug?\n", orig, new);
> +		goto err;
> +	}
> +	if (!((new ^ orig) & PCI_COMMAND_INTX_DISABLE)) {
> +		dev_warn(&pdev->dev, "Device does not support "
> +			 "disabling interrupts: unable to bind.\n");
> +		err = -ENODEV;
> +		goto err;
> +	}
> +	/* Now restore the original value. */
> +	pci_write_config_word(pdev, PCI_COMMAND, orig);
> +err:
> +	pci_unblock_user_cfg_access(pdev);
> +	return err;
> +}
> +
> +static int __devinit probe(struct pci_dev *pdev,
> +			   const struct pci_device_id *id)
> +{
> +	struct generic_dev *dev;
> +	int err;
> +
> +	err = pci_enable_device(pdev);
> +	if (err) {
> +		dev_err(&pdev->dev, "%s: pci_enable_device failed: %d\n",
> +			 __func__, err);
> +		return err;
> +	}
> +
> +	err = verify_pci_2_3(pdev);
> +	if (err)
> +		goto err_verify;
> +
> +	err = pci_request_regions(pdev, "uio_pci_generic");
> +	if (err) {
> +		dev_err(&pdev->dev, "%s: pci_request_regions failed: %d\n",
> +			 __func__, err);
> +		goto err_verify;
> +	}
> +
> +	dev = kzalloc(sizeof(struct generic_dev), GFP_KERNEL);
> +	if (!dev) {
> +		err = -ENOMEM;
> +		goto err_alloc;
> +	}
> +
> +	dev->info.name = "uio_pci_generic";
> +	dev->info.version = "0.01";
> +	dev->info.irq = pdev->irq;

May need to verify pdev->irq in verify too?

> +	dev->info.irq_flags = IRQF_SHARED;
> +	dev->info.handler = irqhandler;
> +	dev->info.irqcontrol = irqcontrol;
> +	dev->pdev = pdev;
> +	spin_lock_init(&dev->lock);
> +
> +	pci_reset_function(pdev);

I think this could be a bit much, since it will fall back to secondary
bus reset.

thanks,
-chris
Chris Wright July 10, 2009, 2:22 a.m. UTC | #6
* Anthony Liguori (anthony@codemonkey.ws) wrote:
> I didn't know this was possible...  so we could also use this driver for  
> vm-channel.

With MSI things are much nicer (which we could define for vm-channel).
This is for shared legacy INTx.

>> +
>> +	err = pci_request_regions(pdev, "uio_pci_generic");
>> +	if (err) {
>> +		dev_err(&pdev->dev, "%s: pci_request_regions failed: %d\n",
>> +			 __func__, err);
>> +		goto err_verify;
>> +	}
>> +
>> +	dev = kzalloc(sizeof(struct generic_dev), GFP_KERNEL);
>> +	if (!dev) {
>> +		err = -ENOMEM;
>> +		goto err_alloc;
>> +	}
>> +
>> +	dev->info.name = "uio_pci_generic";
>> +	dev->info.version = "0.01";
>> +	dev->info.irq = pdev->irq;
>> +	dev->info.irq_flags = IRQF_SHARED;
>> +	dev->info.handler = irqhandler;
>> +	dev->info.irqcontrol = irqcontrol;
>> +	dev->pdev = pdev;
>> +	spin_lock_init(&dev->lock);
>>   
>
> I know it's not strictly needed for PCI pass through, but it would be  
> useful to register the IO regions via UIO.  The userspace implementation  
> would then use UIO strictly instead of poking the sysfs pci info  
> directly.  I think that ends up being cleaner.

I don't see what the advantage is?

thanks,
-chris
Michael S. Tsirkin July 10, 2009, 1:56 p.m. UTC | #7
On Thu, Jul 09, 2009 at 07:19:45PM -0700, Chris Wright wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > +struct generic_dev {
> 
> I know I commented on this one on an earlier, private version, and naming
> is not my strength... maybe "struct uio_generic_pci_dev" or "struct
> uio_generic_pci"?

Hmm, I'd like to keep it short. The struct is private to the file after all.
A longer name will not let me init variables of this type on the same
line.  Isn't generic_dev enough to keep grep/find within the file happy?

> > +	struct uio_info info;
> > +	struct pci_dev *pdev;
> > +	spinlock_t lock; /* guards command register accesses */
> > +};
> > +
> > +/* Read/modify/write command register to disable interrupts.
> > + * Note: we could cache the value and optimize the read if there was a way to
> > + * get notified of user changes to command register through sysfs.
> > + * */
> 
> For the irqcontrol case, I don't think RMW is a problem (coming from
> userspace it's already a slower path).

I still expect it to be noticeable ...

> For the irqhandler case, you can grab the full dword to get Command+Status
> (since you needed status anyway, and config reads are dword).

Good point.

> > +static void irqtoggle(struct generic_dev *dev, int irq_on)
> > +{
> > +	struct pci_dev *pdev = dev->pdev;
> > +	unsigned long flags;
> > +	u16 orig, new;
> > +
> > +	spin_lock_irqsave(&dev->lock, flags);
> > +	pci_block_user_cfg_access(pdev);
> > +	pci_read_config_word(pdev, PCI_COMMAND, &orig);
> > +	new = irq_on ? orig & ~PCI_COMMAND_INTX_DISABLE :
> > +		orig | PCI_COMMAND_INTX_DISABLE;
> > +	if (new != orig)
> > +		pci_write_config_word(dev->pdev, PCI_COMMAND, new);
> > +	pci_unblock_user_cfg_access(pdev);
> > +	spin_unlock_irqrestore(&dev->lock, flags);
> > +}
> > +
> > +/* irqcontrol is used by userspace to enable/disable interrupts. */
> > +static int irqcontrol(struct uio_info *info, s32 irq_on)
> > +{
> > +	struct generic_dev *dev = container_of(info, struct generic_dev, info);
> > +	irqtoggle(dev, irq_on);
> > +	return 0;
> > +}
> > +
> > +static irqreturn_t irqhandler(int irq, struct uio_info *info)
> > +{
> > +	struct generic_dev *dev = container_of(info, struct generic_dev, info);
> > +	irqreturn_t ret = IRQ_NONE;
> > +	u16 status;
> > +
> > +	/* Check interrupt status register to see whether our device
> > +	 * triggered the interrupt. */
> > +	pci_read_config_word(dev->pdev, PCI_STATUS, &status);
> > +	if (!(status & PCI_STATUS_INTERRUPT))
> > +		goto done;
> > +
> > +	/* We triggered the interrupt, disable it. */
> > +	irqtoggle(dev, 0);
> > +	/* UIO core will signal the user process. */
> > +	ret = IRQ_HANDLED;
> > +done:
> > +	return ret;
> > +}
> > +
> > +/* Verify that the device supports Interrupt Disable bit in command register,
> > + * per PCI 2.3, by flipping this bit and reading it back: this bit was readonly
> > + * in PCI 2.2. */
> 
> Wonder if this should also restrict from things like bridges?

Nope, why should it? If the device deasserts interrupt, the bridge
does not care about the reason.

> > +static int __devinit verify_pci_2_3(struct pci_dev *pdev)
> > +{
> > +	u16 orig, new;
> > +	int err = 0;
> > +
> > +	pci_block_user_cfg_access(pdev);
> > +	pci_read_config_word(pdev, PCI_COMMAND, &orig);
> > +	pci_write_config_word(pdev, PCI_COMMAND,
> > +			      orig ^ PCI_COMMAND_INTX_DISABLE);
> > +	pci_read_config_word(pdev, PCI_COMMAND, &new);
> > +	/* There's no way to protect against
> > +	 * hardware bugs or detect them reliably, but as long as we know
> > +	 * what the value should be, let's go ahead and check it. */
> > +	if ((new ^ orig) & ~PCI_COMMAND_INTX_DISABLE) {
> > +		err = -EBUSY;
> > +		dev_err(&pdev->dev, "Command changed from 0x%x to 0x%x: "
> > +			"driver or HW bug?\n", orig, new);
> > +		goto err;
> > +	}
> > +	if (!((new ^ orig) & PCI_COMMAND_INTX_DISABLE)) {
> > +		dev_warn(&pdev->dev, "Device does not support "
> > +			 "disabling interrupts: unable to bind.\n");
> > +		err = -ENODEV;
> > +		goto err;
> > +	}
> > +	/* Now restore the original value. */
> > +	pci_write_config_word(pdev, PCI_COMMAND, orig);
> > +err:
> > +	pci_unblock_user_cfg_access(pdev);
> > +	return err;
> > +}
> > +
> > +static int __devinit probe(struct pci_dev *pdev,
> > +			   const struct pci_device_id *id)
> > +{
> > +	struct generic_dev *dev;
> > +	int err;
> > +
> > +	err = pci_enable_device(pdev);
> > +	if (err) {
> > +		dev_err(&pdev->dev, "%s: pci_enable_device failed: %d\n",
> > +			 __func__, err);
> > +		return err;
> > +	}
> > +
> > +	err = verify_pci_2_3(pdev);
> > +	if (err)
> > +		goto err_verify;
> > +
> > +	err = pci_request_regions(pdev, "uio_pci_generic");
> > +	if (err) {
> > +		dev_err(&pdev->dev, "%s: pci_request_regions failed: %d\n",
> > +			 __func__, err);
> > +		goto err_verify;
> > +	}
> > +
> > +	dev = kzalloc(sizeof(struct generic_dev), GFP_KERNEL);
> > +	if (!dev) {
> > +		err = -ENOMEM;
> > +		goto err_alloc;
> > +	}
> > +
> > +	dev->info.name = "uio_pci_generic";
> > +	dev->info.version = "0.01";
> > +	dev->info.irq = pdev->irq;
> 
> May need to verify pdev->irq in verify too?

Good point. I am not sure why irq might be 0,
but from the code this seems possible.

> > +	dev->info.irq_flags = IRQF_SHARED;
> > +	dev->info.handler = irqhandler;
> > +	dev->info.irqcontrol = irqcontrol;
> > +	dev->pdev = pdev;
> > +	spin_lock_init(&dev->lock);
> > +
> > +	pci_reset_function(pdev);
> 
> I think this could be a bit much,
> since it will fall back to secondary
> bus reset.

The reason I put it here, is because with userspace driver the device
might be left in a very strange state if you kill the driver abruptly.
With uio I don't currently get notification on open (which would be
ideal) so user will have to unbind/rebind to reset the device.
And btw, an unfortunate side effect that I didn't realise
is that this actually is restricted to PCI Express devices.

So, what I think I will do is add a patch in pci-sysfs.c with a sysfs
entry that resets the device. Should be pretty useful generally.
Makes sense?

> thanks,
> -chris
Avi Kivity July 12, 2009, 6:17 a.m. UTC | #8
On 07/10/2009 05:22 AM, Chris Wright wrote:
>> I know it's not strictly needed for PCI pass through, but it would be
>> useful to register the IO regions via UIO.  The userspace implementation
>> would then use UIO strictly instead of poking the sysfs pci info
>> directly.  I think that ends up being cleaner.
>>      
>
> I don't see what the advantage is?
>    

Have a single fd represent the assigned device, so security can be 
concentrated at that point.  Some privileged server can then hand out 
the fd to qemu.
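
A minimal sketch of the "hand out the fd" step, assuming the privileged
server and qemu share a connected Unix domain socket and use SCM_RIGHTS to
pass the descriptor.  send_fd and recv_fd are hypothetical names, and
nothing here is specific to uio.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: pass 'fd' over the connected Unix socket 'sock'.
 * Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one data byte must accompany the fd */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor sent by send_fd().
 * Returns the new fd, or -1 on failure. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver never needs permission to open the device node itself, which
is the point of concentrating the security check in the server.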

Patch

diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 7f86534..0f14c8e 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -89,4 +89,14 @@  config UIO_SERCOS3
 
 	  If you compile this as a module, it will be called uio_sercos3.
 
+config UIO_PCI_GENERIC
+	tristate "Generic driver for PCI 2.3 and PCI Express cards"
+	depends on PCI
+	default n
+	help
+	  Generic driver that you can bind, dynamically, to any
+	  PCI 2.3 compliant and PCI Express card. It is useful,
+	  primarily, for virtualization scenarios.
+	  If you compile this as a module, it will be called uio_pci_generic.
+
 endif
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index 5c2586d..73b2e75 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -5,3 +5,4 @@  obj-$(CONFIG_UIO_PDRV_GENIRQ)	+= uio_pdrv_genirq.o
 obj-$(CONFIG_UIO_SMX)	+= uio_smx.o
 obj-$(CONFIG_UIO_AEC)	+= uio_aec.o
 obj-$(CONFIG_UIO_SERCOS3)	+= uio_sercos3.o
+obj-$(CONFIG_UIO_PCI_GENERIC)	+= uio_pci_generic.o
diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c
new file mode 100644
index 0000000..dd0df44
--- /dev/null
+++ b/drivers/uio/uio_pci_generic.c
@@ -0,0 +1,202 @@ 
+/* uio_pci_generic - generic UIO driver for PCI 2.3 devices
+ *
+ * Copyright (C) 2009 Red Hat, Inc.
+ * Author: Michael S. Tsirkin <mst@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * Since the driver does not declare any device ids, you must allocate
+ * id and bind the device to the driver yourself.  For example:
+ *
+ * # echo "8086 10f5" > /sys/bus/pci/drivers/uio_pci_generic/new_id
+ * # echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
+ * # echo -n 0000:00:19.0 > /sys/bus/pci/drivers/uio_pci_generic/bind
+ * # ls -l /sys/bus/pci/devices/0000:00:19.0/driver
+ * .../0000:00:19.0/driver -> ../../../bus/pci/drivers/uio_pci_generic
+ *
+ * Driver won't bind to devices which do not support the Interrupt Disable Bit
+ * in the command register. All devices compliant to PCI 2.3 (circa 2002) and
+ * all compliant PCI Express devices should support this bit.
+ */
+
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/uio_driver.h>
+#include <linux/spinlock.h>
+
+struct generic_dev {
+	struct uio_info info;
+	struct pci_dev *pdev;
+	spinlock_t lock; /* guards command register accesses */
+};
+
+/* Read/modify/write command register to disable interrupts.
+ * Note: we could cache the value and optimize the read if there was a way to
+ * get notified of user changes to command register through sysfs.
+ * */
+static void irqtoggle(struct generic_dev *dev, int irq_on)
+{
+	struct pci_dev *pdev = dev->pdev;
+	unsigned long flags;
+	u16 orig, new;
+
+	spin_lock_irqsave(&dev->lock, flags);
+	pci_block_user_cfg_access(pdev);
+	pci_read_config_word(pdev, PCI_COMMAND, &orig);
+	new = irq_on ? orig & ~PCI_COMMAND_INTX_DISABLE :
+		orig | PCI_COMMAND_INTX_DISABLE;
+	if (new != orig)
+		pci_write_config_word(dev->pdev, PCI_COMMAND, new);
+	pci_unblock_user_cfg_access(pdev);
+	spin_unlock_irqrestore(&dev->lock, flags);
+}
+
+/* irqcontrol is used by userspace to enable/disable interrupts. */
+static int irqcontrol(struct uio_info *info, s32 irq_on)
+{
+	struct generic_dev *dev = container_of(info, struct generic_dev, info);
+	irqtoggle(dev, irq_on);
+	return 0;
+}
+
+static irqreturn_t irqhandler(int irq, struct uio_info *info)
+{
+	struct generic_dev *dev = container_of(info, struct generic_dev, info);
+	irqreturn_t ret = IRQ_NONE;
+	u16 status;
+
+	/* Check interrupt status register to see whether our device
+	 * triggered the interrupt. */
+	pci_read_config_word(dev->pdev, PCI_STATUS, &status);
+	if (!(status & PCI_STATUS_INTERRUPT))
+		goto done;
+
+	/* We triggered the interrupt, disable it. */
+	irqtoggle(dev, 0);
+	/* UIO core will signal the user process. */
+	ret = IRQ_HANDLED;
+done:
+	return ret;
+}
+
+/* Verify that the device supports Interrupt Disable bit in command register,
+ * per PCI 2.3, by flipping this bit and reading it back: this bit was readonly
+ * in PCI 2.2. */
+static int __devinit verify_pci_2_3(struct pci_dev *pdev)
+{
+	u16 orig, new;
+	int err = 0;
+
+	pci_block_user_cfg_access(pdev);
+	pci_read_config_word(pdev, PCI_COMMAND, &orig);
+	pci_write_config_word(pdev, PCI_COMMAND,
+			      orig ^ PCI_COMMAND_INTX_DISABLE);
+	pci_read_config_word(pdev, PCI_COMMAND, &new);
+	/* There's no way to protect against
+	 * hardware bugs or detect them reliably, but as long as we know
+	 * what the value should be, let's go ahead and check it. */
+	if ((new ^ orig) & ~PCI_COMMAND_INTX_DISABLE) {
+		err = -EBUSY;
+		dev_err(&pdev->dev, "Command changed from 0x%x to 0x%x: "
+			"driver or HW bug?\n", orig, new);
+		goto err;
+	}
+	if (!((new ^ orig) & PCI_COMMAND_INTX_DISABLE)) {
+		dev_warn(&pdev->dev, "Device does not support "
+			 "disabling interrupts: unable to bind.\n");
+		err = -ENODEV;
+		goto err;
+	}
+	/* Now restore the original value. */
+	pci_write_config_word(pdev, PCI_COMMAND, orig);
+err:
+	pci_unblock_user_cfg_access(pdev);
+	return err;
+}
+
+static int __devinit probe(struct pci_dev *pdev,
+			   const struct pci_device_id *id)
+{
+	struct generic_dev *dev;
+	int err;
+
+	err = pci_enable_device(pdev);
+	if (err) {
+		dev_err(&pdev->dev, "%s: pci_enable_device failed: %d\n",
+			 __func__, err);
+		return err;
+	}
+
+	err = verify_pci_2_3(pdev);
+	if (err)
+		goto err_verify;
+
+	err = pci_request_regions(pdev, "uio_pci_generic");
+	if (err) {
+		dev_err(&pdev->dev, "%s: pci_request_regions failed: %d\n",
+			 __func__, err);
+		goto err_verify;
+	}
+
+	dev = kzalloc(sizeof(struct generic_dev), GFP_KERNEL);
+	if (!dev) {
+		err = -ENOMEM;
+		goto err_alloc;
+	}
+
+	dev->info.name = "uio_pci_generic";
+	dev->info.version = "0.01";
+	dev->info.irq = pdev->irq;
+	dev->info.irq_flags = IRQF_SHARED;
+	dev->info.handler = irqhandler;
+	dev->info.irqcontrol = irqcontrol;
+	dev->pdev = pdev;
+	spin_lock_init(&dev->lock);
+
+	pci_reset_function(pdev);
+	err = uio_register_device(&pdev->dev, &dev->info);
+	if (err)
+		goto err_register;
+	pci_set_drvdata(pdev, dev);
+
+	return 0;
+err_register:
+	kfree(dev);
+err_alloc:
+	pci_release_regions(pdev);
+err_verify:
+	pci_disable_device(pdev);
+	return err;
+}
+
+static void remove(struct pci_dev *pdev)
+{
+	struct generic_dev *dev = pci_get_drvdata(pdev);
+
+	uio_unregister_device(&dev->info);
+	kfree(dev);
+}
+
+static struct pci_driver driver = {
+	.name = "uio_pci_generic",
+	.id_table = NULL, /* only dynamic id's */
+	.probe = probe,
+	.remove = remove,
+};
+
+static int __init init(void)
+{
+	return pci_register_driver(&driver);
+}
+
+static void __exit cleanup(void)
+{
+	pci_unregister_driver(&driver);
+}
+
+module_init(init);
+module_exit(cleanup);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Michael S. Tsirkin <mst@redhat.com>");
diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index 616bf8b..bfb9b31 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -42,6 +42,7 @@ 
 #define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
 
 #define PCI_STATUS		0x06	/* 16 bits */
+#define  PCI_STATUS_INTERRUPT	0x08	/* Interrupt status */
 #define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
 #define  PCI_STATUS_66MHZ	0x20	/* Support 66 Mhz PCI 2.1 bus */
 #define  PCI_STATUS_UDF		0x40	/* Support User Definable Features [obsolete] */