diff mbox

[RFC] PCI pass-through fixups

Message ID 1238789816.15558.399.camel@lappy (mailing list archive)
State New, archived
Headers show

Commit Message

Alex Williamson April 3, 2009, 8:16 p.m. UTC
I'm wondering if we need a spot for device specific fixups for PCI
pass-through.  In the example below, I want to expose a single port of
an Intel 82571EB quad port copper NIC to a guest.  It works great until
I shutdown the guest, at which point the guest e1000e driver knows by
the device ID that the NIC is a quad port, and blindly attempts to
twiddle some bits on the bridge above it (that doesn't exist).
Obviously some robustness could be added to the driver, but would it
make sense to do something like below and automatically remap these
devices to identical single port device IDs?  Thanks,

Alex

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
--



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Chris Wright April 7, 2009, 12:02 a.m. UTC | #1
* Alex Williamson (alex.williamson@hp.com) wrote:
> 
> I'm wondering if we need a spot for device specific fixups for PCI
> pass-through.  In the example below, I want to expose a single port of
> an Intel 82571EB quad port copper NIC to a guest.  It works great until
> I shutdown the guest, at which point the guest e1000e driver knows by
> the device ID that the NIC is a quad port, and blindly attempts to
> twiddle some bits on the bridge above it (that doesn't exist).

And what happens?

> Obviously some robustness could be added to the driver, but would it
> make sense to do something like below and automatically remap these
> devices to identical single port device IDs?  Thanks,

Sounds quite fragile to me.

thanks,
-chris
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sheng Yang April 7, 2009, 6:25 a.m. UTC | #2
On Tuesday 07 April 2009 08:02:10 Chris Wright wrote:
> * Alex Williamson (alex.williamson@hp.com) wrote:
> > I'm wondering if we need a spot for device specific fixups for PCI
> > pass-through.  In the example below, I want to expose a single port of
> > an Intel 82571EB quad port copper NIC to a guest.  It works great until
> > I shutdown the guest, at which point the guest e1000e driver knows by
> > the device ID that the NIC is a quad port, and blindly attempts to
> > twiddle some bits on the bridge above it (that doesn't exist).
>
> And what happens?

And the output of DEVICE_ASSIGNMENT_DEBUG=1 may help. And I don't have trouble 
here with another dual/quad port card by Intel, but not 82571EB. 
>
> > Obviously some robustness could be added to the driver, but would it
> > make sense to do something like below and automatically remap these
> > devices to identical single port device IDs?  Thanks,
>
> Sounds quite fragile to me.

Same here... We'd better to know what's happened. (wow, you even know the 
single port ones' device ID... :) )
Chris Wright April 7, 2009, 2:54 p.m. UTC | #3
* Sheng Yang (sheng@linux.intel.com) wrote:
> On Tuesday 07 April 2009 08:02:10 Chris Wright wrote:
> > * Alex Williamson (alex.williamson@hp.com) wrote:
> > > I'm wondering if we need a spot for device specific fixups for PCI
> > > pass-through.  In the example below, I want to expose a single port of
> > > an Intel 82571EB quad port copper NIC to a guest.  It works great until
> > > I shutdown the guest, at which point the guest e1000e driver knows by
> > > the device ID that the NIC is a quad port, and blindly attempts to
> > > twiddle some bits on the bridge above it (that doesn't exist).
> >
> > And what happens?
> 
> And the output of DEVICE_ASSIGNMENT_DEBUG=1 may help. And I don't have trouble 
> here with another dual/quad port card by Intel, but not 82571EB. 

Right, it's driver dependent.  I too can assign a single port w/out
trouble.  I'm assuming the guest just complains at rmmod?

thanks,
-chris
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson April 7, 2009, 3:23 p.m. UTC | #4
On Tue, 2009-04-07 at 07:54 -0700, Chris Wright wrote:
> * Sheng Yang (sheng@linux.intel.com) wrote:
> > On Tuesday 07 April 2009 08:02:10 Chris Wright wrote:
> > > * Alex Williamson (alex.williamson@hp.com) wrote:
> > > > I'm wondering if we need a spot for device specific fixups for PCI
> > > > pass-through.  In the example below, I want to expose a single port of
> > > > an Intel 82571EB quad port copper NIC to a guest.  It works great until
> > > > I shutdown the guest, at which point the guest e1000e driver knows by
> > > > the device ID that the NIC is a quad port, and blindly attempts to
> > > > twiddle some bits on the bridge above it (that doesn't exist).
> > >
> > > And what happens?
> > 
> > And the output of DEVICE_ASSIGNMENT_DEBUG=1 may help. And I don't have trouble 
> > here with another dual/quad port card by Intel, but not 82571EB. 
> 
> Right, it's driver dependent.  I too can assign a single port w/out
> trouble.  I'm assuming the guest just complains at rmmod?

No, it's a little more severe than that, see below.  This may be rather
unique, I certainly wasn't expecting different device IDs for a quad
port versus a single port.  However, you can see in
e1000e/82571.c:e1000_get_variants_82571() we set a flag for quad port
device IDs, apparently for WOL only being supported on the first port.
Then when we shutdown, we hit e1000e/netdev.c:e1000_suspend() where we
take that quad port flag and blindly walk up to bus->self, which is
NULL, and try to get the PCI capabilities on it.

It may be prudent for the driver to check the pointer, but there's
probably also an argument that the device ID indicates something about
the topology that makes that unnecessary, so we may hit the same problem
in drivers we don't have source for.  I agree that my rough sketch of a
patch is pretty fragile and static.  Maybe an option to override a
device ID on the command line would be more flexible.  Something like:

-pcidevice host=09:00.0,device_id=0x105E

This puts the burden on the user, but at least allows us an out.
Thanks,

Alex

Sheng - BTW my copies to you bounced with "X-Postfix; Server certificate
not verified"

[   26.462952] e1000e 0000:00:04.0: PCI INT B disabled
[   26.464083] BUG: unable to handle kernel NULL pointer dereference at 0000002d
[   26.465694] IP: [<c0208a95>] pci_find_capability+0x9/0x69
[   26.466837] *pde = 00000000 
[   26.467485] Oops: 0000 [#1] SMP 
[   26.468046] last sysfs file: /sys/block/sda/removable
[   26.468046] Modules linked in: ipv6 loop button serio_raw parport_pc snd_pcsp virtio_balloon psmouse parport snd_pcm snd_timer i2c_piix4 i2c_core snd soundcore snd_page_alloc evdev ext3 jbd mbcache sg sr_mod cdrom sd_mod piix ide_pci_generic ide_core ata_piix floppy ata_generic e1000e virtio_pci libata scsi_mod thermal processor fan thermal_sys
[   26.468046] 
[   26.468046] Pid: 2365, comm: halt Not tainted (2.6.29-net #1) 
[   26.468046] EIP: 0060:[<c0208a95>] EFLAGS: 00010286 CPU: 0
[   26.468046] EIP is at pci_find_capability+0x9/0x69
[   26.468046] EAX: 00000000 EBX: 00000000 ECX: f5c8be3c EDX: 00000010
[   26.468046] ESI: 00000010 EDI: f60a3c00 EBP: 00000000 ESP: f5c8be60
[   26.468046]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[   26.468046] Process halt (pid: 2365, ti=f5c8a000 task=f5e2a2c0 task.ti=f5c8a000)
[   26.468046] Stack:
[   26.468046]  00428360 f60a3c00 f5c08360 00000000 f81a2197 00000000 00000002 c0270cbc
[   26.468046]  f60a3c00 4321fedc 28121969 f5c8a000 c020b635 f60a1058 c02614ea ffffffff
[   26.468046]  c013418c c01343bd c0122c31 0021bbc7 00000000 00000000 00000046 f707d900
[   26.468046] Call Trace:
[   26.468046]  [<f81a2197>] e1000_suspend+0x1ec/0x269 [e1000e]
[   26.468046]  [<c0270cbc>] cmos_suspend+0x77/0xae
[   26.468046]  [<c020b635>] pci_device_shutdown+0x16/0x25
[   26.468046]  [<c02614ea>] device_shutdown+0x31/0x63
[   26.468046]  [<c013418c>] kernel_power_off+0xa/0x2f
[   26.468046]  [<c01343bd>] sys_reboot+0xf6/0x16c
[   26.468046]  [<c0122c31>] try_to_wake_up+0x1ed/0x1f7
[   26.468046]  [<c01178ca>] pvclock_clocksource_read+0x48/0xa7
[   26.468046]  [<c0102026>] __switch_to+0xcd/0x14e
[   26.468046]  [<c01244de>] finish_task_switch+0x42/0xc7
[   26.468046]  [<c02eb510>] schedule+0x811/0x88b
[   26.468046]  [<c0196a6e>] dput+0x1a/0x108
[   26.468046]  [<c019b8cf>] mntput_no_expire+0x1a/0xfe
[   26.468046]  [<c0188281>] filp_close+0x4e/0x54
[   26.468046]  [<c01882eb>] sys_close+0x64/0x9f
[   26.468046]  [<c0103466>] syscall_call+0x7/0xb
[   26.468046] Code: 44 24 1c 75 06 0f b6 04 24 eb 10 fe 04 24 8b 17 8d 42 ff 85 d2 89 07 75 ab 31 c0 5b 5e 5b 5e 5f 5d c3 56 89 d6 53 89 c3 83 ec 08 <8a> 40 2d 8d 4c 24 04 88 44 24 03 8b 53 1c 8b 43 08 51 b9 06 00 
[   26.468046] EIP: [<c0208a95>] pci_find_capability+0x9/0x69 SS:ESP 0068:f5c8be60
[   26.529902] ---[ end trace 69f68df891ae55cc ]---


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Wright April 7, 2009, 3:47 p.m. UTC | #5
* Alex Williamson (alex.williamson@hp.com) wrote:
> On Tue, 2009-04-07 at 07:54 -0700, Chris Wright wrote:
> > * Sheng Yang (sheng@linux.intel.com) wrote:
> > > On Tuesday 07 April 2009 08:02:10 Chris Wright wrote:
> > > > * Alex Williamson (alex.williamson@hp.com) wrote:
> > > > > I'm wondering if we need a spot for device specific fixups for PCI
> > > > > pass-through.  In the example below, I want to expose a single port of
> > > > > an Intel 82571EB quad port copper NIC to a guest.  It works great until
> > > > > I shutdown the guest, at which point the guest e1000e driver knows by
> > > > > the device ID that the NIC is a quad port, and blindly attempts to
> > > > > twiddle some bits on the bridge above it (that doesn't exist).
> > > >
> > > > And what happens?
> > > 
> > > And the output of DEVICE_ASSIGNMENT_DEBUG=1 may help. And I don't have trouble 
> > > here with another dual/quad port card by Intel, but not 82571EB. 
> > 
> > Right, it's driver dependent.  I too can assign a single port w/out
> > trouble.  I'm assuming the guest just complains at rmmod?
> 
> No, it's a little more severe than that, see below.  This may be rather
> unique, I certainly wasn't expecting different device IDs for a quad
> port versus a single port.  However, you can see in
> e1000e/82571.c:e1000_get_variants_82571() we set a flag for quad port
> device IDs, apparently for WOL only being supported on the first port.
> Then when we shutdown, we hit e1000e/netdev.c:e1000_suspend() where we
> take that quad port flag and blindly walk up to bus->self, which is
> NULL, and try to get the PCI capabilities on it.

That's a form of guest complaint ;-)  Yes, I noticed the global wol bit,
but didn't see the bus->self...ick

> It may be prudent for the driver to check the pointer, but there's
> probably also an argument that the device ID indicates something about
> the topology that makes that unnecessary, so we may hit the same problem
> in drivers we don't have source for.  I agree that my rough sketch of a
> patch is pretty fragile and static.  Maybe an option to override a
> device ID on the command line would be more flexible.  Something like:
> 
> -pcidevice host=09:00.0,device_id=0x105E
> 
> This puts the burden on the user, but at least allows us an out.

Still ugly, esp since there's no real way to know ahead of time
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index b7f9fa6..1c6d1e8 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -427,6 +427,35 @@  static int assigned_dev_register_regions(PCIRegion *io_regions,
     return 0;
 }
 
+static void assigned_device_fixup(AssignedDevice *pci_dev)
+{
+    uint16_t vendor_id, device_id;
+
+    vendor_id = pci_dev->dev.config[0] | pci_dev->dev.config[1] << 8;
+    device_id = pci_dev->dev.config[2] | pci_dev->dev.config[3] << 8;
+
+    switch (vendor_id) {
+        case 0x8086:
+            switch (device_id) {
+                case 0x10A4:
+                case 0x10BC:
+                    /* quad port copper -> single port copper */
+                    pci_dev->dev.config[2] = 0x5E;
+                    break;
+                case 0x10A5:
+                    /* quad port fiber -> single port fiber */
+                    pci_dev->dev.config[2] = 0x5F;
+                    break;
+                case 0x10DA:
+                case 0x10D9:
+                    /* dual/quad port serdes -> single port serdes */
+                    pci_dev->dev.config[2] = 0x60;
+                    break;
+            }
+            break;
+    }
+}
+
 static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
                            uint8_t r_dev, uint8_t r_func)
 {
@@ -524,6 +553,8 @@  again:
     }
     fclose(f);
 
+    assigned_device_fixup(pci_dev);
+
     /* dealing with virtual function device */
     snprintf(name, sizeof(name), "%sphysfn/", dir);
     if (!stat(name, &statbuf))