| Message ID | 20180108151932.23bb70ea@t450s.home (mailing list archive) |
|---|---|
| State | New, archived |
At first, thank you very much!

On 08.01.2018 23:19, Alex Williamson wrote:
> We already have quirks to support various other versions of the Marvell
> chip, but the 9128 is missing, so it's just a couple lines to add it.
> This is against v4.9.75:
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 98eba9127a0b..19ca3c9fac3a 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3868,6 +3868,8 @@ static void quirk_dma_func1_alias(struct pci_dev *dev)
>  /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c49 */
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9230,
>  			 quirk_dma_func1_alias);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9128,
> +			 quirk_dma_func1_alias);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0642,
>  			 quirk_dma_func1_alias);
>  /* https://bugs.gentoo.org/show_bug.cgi?id=497630 */

There is good news, and there is bad news.

The good news is that the patch works as expected. I have applied it to kernel 4.9 and recompiled the kernel (which was not that easy for me because this machine boots from ZFS, so beware of forgetting to recompile the ZFS modules as well and to include them correctly in the new kernel / initramfs ...). I then booted the new kernel with intel_iommu=on. The boot process went normally; the AHCI / SATA driver now behaves correctly when initializing the controller in question.

I then configured my system to let the vfio_pci kernel driver grab that controller during the boot process, and made sure that vfio_pci gets loaded before the AHCI kernel driver. That also worked well; dmesg | grep vfio showed the expected output, and lspci showed that the controller was indeed under the control of vfio_pci.

But I couldn't get any further, and this is the bad news: I have spent the rest of my day trying to actually pass the controller through to the VM in question.
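As a quick sanity check of the binding described above, the driver line from lspci can be tested mechanically. A sketch: the sample output below is illustrative (device address and IDs taken from this thread), standing in for a live `lspci -nnk -s 02:00.0` query.

```shell
# On a live host, replace the sample with:  sample=$(lspci -nnk -s 02:00.0)
sample='02:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9128 [1b4b:9128]
        Kernel driver in use: vfio-pci'
# Succeeds (and prints a confirmation) only if vfio-pci owns the device.
echo "$sample" | grep -q 'Kernel driver in use: vfio-pci' \
  && echo "controller is bound to vfio-pci"
```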
I start this VM from the command line: qemu_xxx <options>

My first step was to change the machine model from pc (the default) to q35, because I thought it would be a good idea to use the default pcie.0 bus that model provides. Since https://github.com/qemu/qemu/blob/master/docs/pcie.txt says that we shouldn't connect PCIe devices directly to the pcie.0 bus, I then added

-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1

to the command line and booted the VM. This went normally, but of course the OS in the VM did not find the controller, because the above line only adds a new PCIe root port, but does not pass through the controller. However, I consider it noteworthy that this worked.

As the final step, I added

-device vfio-pci,host=02:00.0,bus=root.1,addr=00.0

to the command line. For the rest of the day, I tested every combination of vfio-pci and ioh3420 options that I found in various tutorials / threads or that came to my mind. With each of these combinations, the VM shows the same behavior: the SeaBIOS boot screen hangs for about a minute or so. Then the OS (W2K8 R2 Server 64 bit) hangs forever at the first screen which shows the progress bar. By booting into safe mode, I have found out that this happens when it tries to load the classpnp.sys driver.

In some cases, when starting the VM, there was a message on the console saying it was disabling IRQ 16.

This is the point where I am lost (again). I think I have done something very basic badly wrong; my interpretation is that the guest does not find the bus topology it expects. What scares me is that even SeaBIOS already hangs, although most of the articles out there propose (more or less) exactly what I am doing. Could my (Debian stretch's) qemu be too old (it is 2.8.0)? Or does qemu / vfio_pci have the same requester ID problem as the kernel? What else could be the reason?
An example of a command line I have used:

/usr/bin/qemu-system-x86_64
-machine q35,accel=kvm
-cpu host
-smp cores=2,threads=2,sockets=1
-rtc base=localtime,clock=host,driftfix=none
-drive file=/vm-image/dax.img,format=raw,if=virtio,cache=writeback,index=0
-drive file=/dev/sda,format=raw,if=virtio,cache=none,index=1
-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1
-device vfio-pci,host=02:00.0,bus=root.1,addr=00.0
-boot c
-pidfile /root/qemu-kvm/qemu-dax.pid
-m 12288
-k de
-daemonize
-usb -usbdevice "tablet"
-name dax
-device virtio-net-pci,vlan=0,mac=02:01:01:01:02:01
-net tap,vlan=0,name=dax,ifname=dax0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
-vnc :2

> Personally, I don't often assign storage controllers and they're mostly
> all terrible. The Marvell controllers nearly all have this DMA
> aliasing issue afaik, but you can see in the code nearby the patch
> above that we already have quirks for many of them. For instance you
> could buy a similarly priced Marvell 9230 card rather than a 9128 and
> it might have worked since we've already got a quirk for it. Sorry I
> can't be more precise, even as the device assignment maintainer I
> generally use virtio for VM disks and find it to be sufficiently fast
> and feature-ful. Thanks,

Thank you very much - no problem. Just a short explanation: my issue is not performance; instead, I need to be able to dynamically mount and unmount ("eject") disks from within the VM (via the famous Windows tray icon "Safely remove hardware"). Some days ago, Paolo Bonzini explained to me on this list how I could achieve clean removal of HDDs from a VM, either using SCSI hotplug or PCIe hotplug. Both suggestions seemed to work at first sight. However, I am not sure whether W2K8 R2 reliably handles SCSI / PCIe hotplug every time, and both methods require commands in the VM and in the host system.
For my use case (changing a disk twice a day without restarting the VM) this is too complicated and error prone; I really would like a solution where I only need to eject the disk from within the Windows VM. If I finally could pass that (or another) SATA controller through into the VM, this problem would be solved in the most elegant way.

Thank you very much again for any help,
Binarus
To answer my own message:

On 09.01.2018 18:58, Binarus wrote:
> The Seabios boot screen hangs for about a minute or so. Then the OS
> (W2K8 R2 server 64 bit) hangs forever at the first screen which shows
> the progress bar. By booting into safe mode, I have found out that this
> happens when it tries to load the classpnp.sys driver.
>
> In some cases, when starting the VM, there was a message on the console
> saying it was disabling IRQ 16.
>
> This is the point where I am lost (again).

It seems I have got it to work. I have added the option "x-no-kvm-intx=on" to the device definition. My command line is now:

/usr/bin/qemu-system-x86_64
-machine q35,accel=kvm
-cpu host
-smp cores=2,threads=2,sockets=1
-rtc base=localtime,clock=host,driftfix=none
-drive file=/vm-image/dax.img,format=raw,if=virtio,cache=writeback,index=0
-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1
-device vfio-pci,host=02:00.0,bus=root.1,addr=00.0,x-no-kvm-intx=on
-boot c
-pidfile /root/qemu-kvm/qemu-dax.pid
-m 12288
-k de
-daemonize
-usb -usbdevice "tablet"
-name dax
-device virtio-net-pci,vlan=0,mac=02:01:01:01:02:01
-net tap,vlan=0,name=dax,ifname=dax0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
-vnc :2

This command line makes SeaBIOS hang for between 30 and 60 seconds (the time it takes seems to vary) during the boot process, but then the W2K8 R2 server boots up without any issue. Within the VM, I have installed the Marvell Windows drivers for the controller's chipset. Great!

And as desired, I can now cleanly "eject" the disks connected to that controller without leaving the VM, i.e. without visiting the host's console.

Remaining questions:

- What could make SeaBIOS hang for such a long time upon every boot?
- Could you please briefly explain what the option "x-no-kvm-intx=on" does and why I need it in this case?
- Could you please briefly explain what exactly it wants to tell me when it says that it disables IRQ xx, and notably whether this is a bad thing I should take care of?
- What about the "x-no-kvm-msi" and "x-no-kvm-msix" options? Would it be better to use them as well? I couldn't find any sound information about what exactly they do. (Note: initially, I had all three of those "x-no..." options active, which made the VM boot for the first time; later, out of curiosity, I found out that "x-no-kvm-intx" is the essential one. Without it, the VM won't boot; the other two don't seem to change anything in my case.)
- Could we expect your patch to go upstream (perhaps after the above issues / questions have been investigated)? I will try to convince the Debian people to include the patch in 4.9; if they refuse, I will have to compile a new kernel each time they release one, which has been happening quite often (probably security fixes) for some time ...

Thank you very much again,
Binarus
On Tue, 9 Jan 2018 22:36:01 +0100
Binarus <lists@binarus.de> wrote:

> To answer my own message:
>
> On 09.01.2018 18:58, Binarus wrote:
>
> > The Seabios boot screen hangs for about a minute or so. Then the OS
> > (W2K8 R2 server 64 bit) hangs forever at the first screen which shows
> > the progress bar. By booting into safe mode, I have found out that this
> > happens when it tries to load the classpnp.sys driver.
> >
> > In some cases, when starting the VM, there was a message on the console
> > saying it was disabling IRQ 16.
> >
> > This is the point where I am lost (again).
>
> It seems I have got it to work. I have added the option
> "x-no-kvm-intx=on" to the device definition. My command line is now:
>
> /usr/bin/qemu-system-x86_64
> -machine q35,accel=kvm
> -cpu host
> -smp cores=2,threads=2,sockets=1
> -rtc base=localtime,clock=host,driftfix=none
> -drive file=/vm-image/dax.img,format=raw,if=virtio,cache=writeback,index=0
> -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1
> -device vfio-pci,host=02:00.0,bus=root.1,addr=00.0,x-no-kvm-intx=on
> -boot c
> -pidfile /root/qemu-kvm/qemu-dax.pid
> -m 12288
> -k de
> -daemonize
> -usb -usbdevice "tablet"
> -name dax
> -device virtio-net-pci,vlan=0,mac=02:01:01:01:02:01
> -net tap,vlan=0,name=dax,ifname=dax0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
> -vnc :2
>
> This command line makes the Seabios hang for between 30 and 60 seconds
> (it seems the time it takes is not always the same) during the boot
> process, but then boots up the W2K8 R2 server without any issue. Within
> the VM, I have installed the Marvell Windows drivers for the
> controller's chipset. Great!
>
> And as desired, I can now cleanly "eject" the disks connected to that
> controller without leaving the VM, i.e. without visiting the host's console.
>
> Remaining questions:
>
> - What could make the Seabios hang for such a long time upon every boot?

Perhaps some sort of problem with the device ROM.
Assuming you're not booting the VM from the assigned device, you can add rombar=0 to the qemu vfio-pci device options to disable the ROM. I suppose it's possible that SeaBIOS might know how to talk to the device regardless of the ROM, so no guarantees that will resolve it. Setting a bootindex on both the vfio-pci device and the actual boot device could help. I think the '-boot c' option is deprecated; explicitly specifying an emulated controller would be better. virt-install or virt-manager would do this for you. Also, using q35 vs 440fx for the VM machine type makes no difference; q35 is, if anything, more troublesome imo.

> - Could you please shortly explain what the option "x-no-kvm-intx=on"
> does and why I need it in this case?

INTx is the legacy PCI interrupt (ie. INTA, INTB, INTC, INTD). This is a level triggered interrupt, therefore it continues to assert until the device is serviced. It must therefore be masked on the host while it is handled by the guest. There are two paths we can use for injecting this interrupt into the VM and unmasking it on the host once the VM samples the interrupt. When KVM is used for acceleration, these happen via a direct connection between the vfio-pci and kvm modules using eventfds and irqfds. The x-no-kvm-intx option disables that path, instead bouncing out to QEMU to do the same. TBH, I have no idea why this would make it work. The QEMU path is slower than the KVM path, but they should be functionally identical.

> - Could you please shortly explain what exactly it wants to tell me when
> it says that it disables INT xx, and notable if this is a bad thing I
> should take care of?

The "Disabling IRQ XX, nobody cared" message means that the specified IRQ asserted many times without any of the interrupt handlers claiming that it was their device asserting it. It then masks the interrupt at the APIC. With device assignment this can mean that the mechanism we use to mask the device doesn't work for that device.
There's a vfio-pci module option you can use to have vfio-pci mask the interrupt at the APIC rather than at the device: nointxmask=1. The trouble with this option is that it can only be used with exclusive interrupts, so if any other devices share the interrupt, starting the VM will fail. As a test, you can unbind conflicting devices from their drivers (assuming they are non-critical devices). The troublesome point here is that regardless of x-no-kvm-intx, the kernel uses the same masking technique for the device, so it's unclear why one works and the other does not.

> - What about the "x-no-kvm-msi" and "x-no-kvm-msix" options? Would it be
> better to use them as well? I couldn't find any sound information about
> what exactly they do (Note: Initially, I had all three of those
> "x-no..." options active, which made the VM boot the first time, and
> later out of curiosity found out that "x-no-kvm-intx" is the essential
> one. Without this one, the VM won't boot; the other two don't seem to
> change anything in my case).

Similar to the INTx version, they route the interrupts out through QEMU rather than injecting them through a side channel with KVM. They're just slower. Generally these options are only used for debugging, as they make the interrupts visible to QEMU; functionality is generally not affected.

What interrupt mode does the device operate in once the VM is running? You can run 'lspci -vs <device address>' on the host and see something like:

Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] MSI-X: Enable+ Count=10 Masked-

In this case the Enable+ shows the device is using MSI-X rather than MSI, which shows Enable-. The device might not support both (or either). If none show Enable+, legacy interrupts are probably being used. Often legacy interrupts are only used at boot and then the device switches to MSI/X. If that's the case for this device, x-no-kvm-intx doesn't really hurt you at runtime.
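As a concrete sketch of applying the nointxmask module option described above (Debian-flavored; assumes vfio-pci is built as a module rather than built in, and the config file name is arbitrary):

```shell
# Persist the vfio-pci option and refresh the initramfs so it takes
# effect when the module is loaded at early boot.
echo "options vfio-pci nointxmask=1" > /etc/modprobe.d/vfio-pci.conf
update-initramfs -u   # Debian / Ubuntu; other distros use dracut etc.
```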
> - Could we expect your patch to go into upstream (perhaps after the
> above issues / questions have been investigated)? I will try to convince
> the Debian people to include the patch into 4.9; if they refuse, I will
> have to compile a new kernel each time they release one, which happens
> quite often (probably security fixes) since some time ...

I would not recommend trying to convince Debian to take a non-upstream patch. The process is that I need to do more research to figure out why this device isn't already quirked; I'm sure others have complained, but did similar patches make things worse for them, or did they simply disappear? Can you confirm whether the device behaves properly for host use with the patch? Issues with assigning the device could be considered secondary if the host behavior is obviously improved. Alternatively, the 9230, and various others in that section of the quirk code, are already quirked, so you can decide if picking a different $30 card is a better option for you ;) Thanks,

Alex
Thank you very much for the detailed and invaluable information!

In the meantime, it has turned out that host and VM are stable, but that performance is a disaster. Therefore, the success is a Pyrrhic victory. I have connected two disks to the controller and copied a large file between them from within the VM. The speed was about 3 MB/s. Of course, this does not make any sense. In any case, I will follow your advice and buy another adapter card, probably one with the ASM1061. But it still would be interesting (hopefully) to figure out what is going on here. Thus, ...

On 09.01.2018 23:41, Alex Williamson wrote:
>> Remaining questions:
>>
>> - What could make the Seabios hang for such a long time upon every boot?
>
> Perhaps some sort of problem with the device ROM. Assuming you're not
> booting the VM from the assigned device, you can add rombar=0 to the
> qemu vfio-pci device options to disable the ROM.

I have now tried that. Sadly, rombar=0 did not change anything. SeaBIOS still hangs during boot for a minute or so; then the VM boots up without problems. SeaBIOS hangs whether or not disks are connected to the controller.

> Setting a bootindex
> both on the vfio-pci device and the actual boot device could help.

Unfortunately, setting the bootindex on the actual boot device is not possible since the boot device's image format is raw. Trying to set a bootindex makes qemu emit the following error message upon start:

"[...] Block format 'raw' does not support the option 'bootindex'"

I then set the bootindex of the vfio device to 9; that did not change anything. Additionally, I have tried -boot strict=on; that didn't change anything either.

I think I remember a message from you on another list (or maybe the same) where you were helping a person with a similar problem. If memory serves me, you were suggesting that the SeaBIOS might be too old. Could that be the case for me, too?
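A note on the bootindex error above: bootindex is a property of a front-end device, not of a shorthand -drive with if=virtio, so the usual workaround is to split the drive definition. A sketch (drive id "disk0" is an arbitrary name; file and cache settings taken from the command lines in this thread):

```shell
# Back-end only (if=none), then an explicit virtio-blk front-end that
# accepts bootindex regardless of the image format.
-drive file=/vm-image/dax.img,format=raw,if=none,id=disk0,cache=writeback
-device virtio-blk-pci,drive=disk0,bootindex=1
```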
> I
> think the '-boot c' option is deprecated, explicitly specifying a
> emulated controller would be better.

I have re-read qemu's manual for my host system, and of course, you are right :-) I'll try to figure out how to set the boot order in a non-deprecated fashion (but still without using bootindex).

> Also, using q35 vs 440fx for the VM machine
> type makes no difference, q35 is, if anything, more troublesome imo.

This is interesting. I have re-tested and confirmed my initial findings: when I use -machine pc,... instead of -machine q35,..., qemu emits the following error when starting:

-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1: Bus 'pcie.0' not found

This is one of the few things I thought I had understood. According to my research, the q35 model establishes a PCI Express root bus by default (pcie.0), while the pc (= 440fx) model establishes only a conventional PCI root bus by default (pci.0). The device which I would like to pass through is a PCIe device. According to https://github.com/qemu/qemu/blob/master/docs/pcie.txt (as far as I have understood it), we should put PCIe devices only on PCIe buses, not on PCI buses. If I used -machine pc, there would be only a conventional PCI root bus, and although we could plug the pass-through device in there, we shouldn't do it (or should we?). Did I get this wrong?

>> - Could you please shortly explain what the option "x-no-kvm-intx=on"
>> does and why I need it in this case?
>
> INTx is the legacy PCI interrupt (ie. INTA, INTB, INTC, INTD). This is
> a level triggered interrupt therefore it continues to assert until the
> device is serviced. It must therefore be masked on the host while it
> is handled by the guest. There are two paths we can use for injecting
> this interrupt into the VM and unmasking it on the host once the VM
> samples the interrupt.
> When KVM is used for acceleration, these happen
> via direct connection between the vfio-pci and kvm modules using
> eventfds and irqfds. The x-no-kvm-intx option disables that path,
> instead bouncing out to QEMU to do the same.

I see. Thank you very much for explaining so clearly.

> TBH, I have no idea why this would make it work. The QEMU path is
> slower than the KVM path, but they should be functionally identical.

Possibly the device design is indeed so badly broken that functional identity is not given in this case. I suppose that the difference in speed between the two paths is not great enough to explain the extremely slow data transfer in the VM?

>> - Could you please shortly explain what exactly it wants to tell me when
>> it says that it disables INT xx, and notable if this is a bad thing I
>> should take care of?
>
> The "Disabling IRQ XX, nobody cared" message means that the specified
> IRQ asserted many times without any of the interrupt handlers claiming
> that it was their device asserting it. It then masks the interrupt at
> the APIC. With device assignment this can mean that the mechanism we
> use to mask the device doesn't work for that device. There's a
> vfio-pci module option you can use to have vfio-pci mask the interrupt
> at the APIC rather than the device, nointxmask=1. The trouble with
> this option is that it can only be used with exclusive interrupts, so
> if any other devices share the interrupt, starting the VM will fail.
> As a test, you can unbind conflicting devices from their drivers
> (assuming non-critical devices).

Again, thank you very much for the clear explanation. I'll investigate and report back in a few hours.

> The troublesome point here is that regardless of x-no-kvm-intx, the
> kernel uses the same masking technique for the device, so it's unclear
> why one works and the other does not.

>> - What about the "x-no-kvm-msi" and "x-no-kvm-msix" options? Would it be
>> better to use them as well?
>> I couldn't find any sound information about
>> what exactly they do (Note: Initially, I had all three of those
>> "x-no..." options active, which made the VM boot the first time, and
>> later out of curiosity found out that "x-no-kvm-intx" is the essential
>> one. Without this one, the VM won't boot; the other two don't seem to
>> change anything in my case).
>
> Similar to the INTx version, they route the interrupts out through QEMU
> rather than inject them through a side channel with KVM. They're just
> slower. Generally these options are only used for debugging as they
> make the interrupts visible to QEMU, functionality is generally not
> affected.

Thank you very much - got it.

> What interrupt mode does the device operate in once the VM is running?
> You can run 'lspci -vs <device address>' on the host and see something
> like:
>
> Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
>
> In this case the Enable+ shows the device is using MSI-X rather than
> MSI, which shows Enable-. The device might not support both (or
> either). If none are Enable+, legacy interrupts are probably being
> used.

It says:

...
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
...
Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
...

Nothing else containing the string "MSI" is in the output.

> Often legacy interrupts are only used at boot and then the device
> switches to MSI/X. If that's the case for this device, x-no-kvm-intx
> doesn't really hurt you runtime.

>> - Could we expect your patch to go into upstream (perhaps after the
>> above issues / questions have been investigated)? I will try to convince
>> the Debian people to include the patch into 4.9; if they refuse, I will
>> have to compile a new kernel each time they release one, which happens
>> quite often (probably security fixes) since some time ...
> I would not recommend trying to convince Debian to take a non-upstream
> patch. The process is that I need to do more research to figure out
> why this device isn't already quirked; I'm sure others have complained,
> but did similar patches make things worse for them, or did they simply
> disappear? Can you confirm whether the device behaves properly for
> host use with the patch? Issues with assigning the device could be
> considered secondary if the host behavior is obviously improved.

I can definitely confirm that the patch vastly improves behavior for the host. As I described in my first message, without the patch and with intel_iommu=on, the boot process hung for a minute or so when the kernel tried to initialize that controller, obviously hitting timeouts and spitting out error messages multiple times. The two most relevant messages were that the SATA link speed would be reduced (saying one time to 3.0 Gbps and the next time to 1.5 Gbps, repeating multiple times), for both channels, and that the disk(s) could not be identified (if any disks were connected). This applies to both channels; consequently, the respective block devices were missing after the boot process had finished.

I have verified this behavior multiple times with the controller card connected to different slots, with and without HDDs connected, and after cold boots as well as after warm boots. There were no issues when the kernel parameter intel_iommu had *not* been given.

With your patch applied, the system boots up without any problem whether or not intel_iommu=on is given. I have verified this multiple times, putting the controller in different slots. In every case, the boot process went normally, and the disks connected to the controller had become block devices as expected once the system had finished booting. Likewise, I have tested the behavior with the patched kernel, but without the intel_iommu parameter. I did not notice any problems.
All tests were done with the Debian 4.9.0 kernel with Debian patches (version 65). When patching the kernel, I downloaded the Debian kernel source package, unpacked it, copied the config from the stock kernel, applied your patch and then compiled.

During yesterday's research, I had the system running most of the time without passing through that controller (because pass-through didn't work yet); instead, I had passed two disks (i.e. the block devices) connected to the controller via virtio into the VM in question. I did not notice any problem or misbehavior. This is a production (VM) server, so I surely would have noticed if there had been problems :-)

In summary, despite the short testing time, we can conclude:

1) Your patch only affects people with a Marvell 9128 SATA chipset.

2) People without intel_iommu=on do not benefit from your patch, and are not hurt by it.

3) People with intel_iommu=on and a stock kernel will not be able to boot cleanly if that SATA chip is in the system; the disks connected to that chip probably won't be recognized (as in my case); if they are recognized nevertheless, it would probably be dangerous to use them.

4) People with the patched kernel will be able to use that controller without any problem, whether intel_iommu=on is given or not; at least, I can definitely confirm that the boot problems are solved by that patch. Long-term stability should be tested further.

Although I am personally convinced and will use the controller in production (either for pass-through, if I can make it work in terms of performance, or in another machine for the host system), I do not take responsibility for it. I am just reporting my personal experience.
> Alternatively, the 9230, or various others in that section of the
> quirk code, are already quirked, so you can decide if picking a
> different $30 card is a better option for you ;) Thanks,

Perhaps I'll even buy two different ones: one with the 9230 (though I seriously wonder why its design should be less flawed than that of the 9128), and one with the ASM1061 (hoping there is at least one company which did it right; getting Windows drivers for that one could be a nightmare, though).

Thank you very much again,
Binarus
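For anyone repeating the lspci capability check from earlier in this thread, the Enable+ scan can be scripted. A sketch, run here against the sample capability lines from Alex's message rather than a live device:

```shell
# On a live host, replace the sample with:  lspci_sample=$(lspci -vs 02:00.0)
lspci_sample='Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] MSI-X: Enable+ Count=10 Masked-'
# Print the name of whichever MSI/MSI-X capability is currently enabled.
echo "$lspci_sample" | awk 'index($0, "Enable+") { sub(":$", "", $3); print $3 }'
```

If nothing is printed, no message-signaled mode is enabled and the device is presumably on legacy INTx.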
Alex, thank you! I think I have solved the performance problem and have made some interesting observations.

On 09.01.2018 23:41, Alex Williamson wrote:
>> - Could you please shortly explain what exactly it wants to tell me when
>> it says that it disables INT xx, and notable if this is a bad thing I
>> should take care of?
>
> The "Disabling IRQ XX, nobody cared" message means that the specified
> IRQ asserted many times without any of the interrupt handlers claiming
> that it was their device asserting it. It then masks the interrupt at
> the APIC. With device assignment this can mean that the mechanism we
> use to mask the device doesn't work for that device. There's a
> vfio-pci module option you can use to have vfio-pci mask the interrupt
> at the APIC rather than the device, nointxmask=1. The trouble with
> this option is that it can only be used with exclusive interrupts, so
> if any other devices share the interrupt, starting the VM will fail.
> As a test, you can unbind conflicting devices from their drivers
> (assuming non-critical devices).

This statement put me on the right track.

First, I rebooted the machine without vfio_pci and looked into /proc/interrupts. The SATA controller in question was bound to IRQ 37 and was the *only* device using that IRQ.

I then rebooted with vfio_pci active and tried to start the VM, passing the SATA controller through to it. As described in my previous messages, the console showed an error message saying that it disabled IRQ 16 (!) when starting the VM. I looked into /proc/interrupts again and noticed that IRQ 16 was bound to one of the USB ports, and that this was the only device using IRQ 16.

Then I added nointxmask=1 to vfio_pci's options, ran depmod, updated the initramfs, and kept this setting for all further experiments. After having rebooted, I removed all "x-no-" options (the ones we talked about recently) from the device definitions of the VM. Then I unbound the USB port in question (i.e.
the one which used IRQ 16) from its driver. Although lspci still claimed that this USB port was using IRQ 16, /proc/interrupts showed that IRQ 16 was not bound to a driver any more. Then I started the VM. The console did not show any messages any more, the VM booted without any issue, *and SATA speed was back to normal* (100 MB/s with nointxmask=1 and that USB port unbound, versus 2 MB/s without nointxmask and without unbinding that USB port). I have lost one USB port, but finally have full SATA hardware in the VM. I can very well live with the lost USB port because there are plenty of them, and it was USB 1.1 anyway. I will stick with this configuration for the time being.

*And here is the interesting (from my naive point of view) part which might explain what happened:* /proc/interrupts (with the VM running!) shows that *vfio-intx is using IRQ 16* now. KVM / QEMU obviously had the idea to assign IRQ 16 to the vfio device *although* IRQ 16 was already bound to a USB port which was active in the host, and *although* the device which is passed through would be at IRQ 37 if vfio_pci were not active. Therefore, the console was showing the error message regarding IRQ 16; obviously, the kernel / KVM / QEMU could not handle the interrupt sharing between the host USB port and the vfio_pci device which KVM / QEMU had made necessary. By the way, this is the only vfio_pci device on this machine.

Should we consider this behavior a bug? Why does a vfio_pci device get bound to an interrupt which is already bound to another hardware device on the host? Do we have any chance to influence that (modinfo vfio_pci does not show any parameter related to interrupt numbers)?

>> - Could we expect your patch to go into upstream (perhaps after the
>> above issues / questions have been investigated)?
>> I will try to convince
>> the Debian people to include the patch into 4.9; if they refuse, I will
>> have to compile a new kernel each time they release one, which happens
>> quite often (probably security fixes) since some time ...
>
> I would not recommend trying to convince Debian to take a non-upstream
> patch. The process is that I need to do more research to figure out
> why this device isn't already quirked; I'm sure others have complained,
> but did similar patches make things worse for them, or did they simply
> disappear? Can you confirm whether the device behaves properly for
> host use with the patch? Issues with assigning the device could be
> considered secondary if the host behavior is obviously improved.
> Alternatively, the 9230, or various others in that section of the
> quirk code, are already quirked, so you can decide if picking a
> different $30 card is a better option for you ;) Thanks,

I am not sure whether the interrupt conflict between the USB port and vfio_pci is related to this chipset in particular. I guess (and it's really just that: a guess) that KVM or QEMU does not assign an appropriate interrupt number to vfio_pci devices under certain circumstances. If this is the case, it could happen with other controllers / chipsets of all kinds as well.

Thus, I assume we have that controller running now. If you are interested, I will test for a while and report back whether it is stable; I would like to keep it passed through into the VM, though, so I can't test whether it is stable for the host. However, if the latter is a high-priority thing for you, I'll revert the configuration and let it run in the host for a week or so.

Regards and many thanks,
Binarus
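The IRQ check and driver unbind described in the message above can be sketched as follows. The /proc/interrupts line is sample text modeled on this thread, and the USB controller address and driver name in the unbind step are hypothetical placeholders.

```shell
# Step 1: see which handler is registered on a given IRQ line. On a live
# host, read /proc/interrupts directly; a sample line stands in here.
line=' 16:   12345   IO-APIC   16-fasteoi   uhci_hcd:usb3'
echo "$line" | awk '{ print "IRQ " $1 " handled by: " $NF }'

# Step 2 (live host only; hypothetical device address and driver): release
# the conflicting device from its host driver so the IRQ becomes exclusive.
# echo 0000:00:1a.0 > /sys/bus/pci/drivers/uhci_hcd/unbind
```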
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 98eba9127a0b..19ca3c9fac3a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3868,6 +3868,8 @@ static void quirk_dma_func1_alias(struct pci_dev *dev)
 /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c49 */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9230,
 			 quirk_dma_func1_alias);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9128,
+			 quirk_dma_func1_alias);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0642,
 			 quirk_dma_func1_alias);
 /* https://bugs.gentoo.org/show_bug.cgi?id=497630 */