Message ID | 1343115724.1813.69.camel@haakon2.linux-iscsi.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 2012-07-24 09:42, Nicholas A. Bellinger wrote: > Hi Anthony, Stefan & QEMU folks, > > So during the process of separating out the patches from Zhi's > vhost-scsi tree this evening, I managed to squash everything down to > nine nicely reviewable patches that apply against the current > qemu.git/master: > > Nicholas Bellinger (1): > virtio-scsi: Set max_target=0 during vhost-scsi operation > > Stefan Hajnoczi (8): > notifier: add validity check and notify function > virtio-pci: support host notifiers in TCG mode > virtio-pci: check that event notification worked > vhost: Pass device path to vhost_dev_init() > virtio-scsi: Add wwpn and tgpt properties > virtio-scsi: Open and initialize /dev/vhost-scsi > virtio-scsi: Start/stop vhost > vhost-scsi: add -vhost-scsi host device > > > However, thus far I've not been able to get virtio-scsi <-> tcm_vhost > I/O to actually work against the latest qemu.git/master.. > > So while doing a (manual) bisection w/ this series to track down the > issue with qemu/master, I managed to run across something else.. With > the vhost-scsi series applied, everything is working as expected up > until the following commit: > > commit 1523ed9e1d46b0b54540049d491475ccac7e6421 > Author: Jan Kiszka <jan.kiszka@siemens.com> > Date: Thu May 17 10:32:39 2012 -0300 > > virtio/vhost: Add support for KVM in-kernel MSI injection > > > This commit ends up triggering the following assert immediately after > starting qemu with virtio-scsi <-> tcm_vhost: > > qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515: > msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && > dev->msix_vector_release_notifier' failed. > > OK, so adding the following hack allows me to boot: > > diff --git a/hw/msix.c b/hw/msix.c > index 59c7a83..6036909 100644 > --- a/hw/msix.c > +++ b/hw/msix.c > @@ -511,6 +511,11 @@ void msix_unset_vector_notifiers(PCIDevice *dev) > { > int vector; > > + if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier) { > + printf("Hit NULL msix_unset_vector_notifiers for: %s\n", dev->name); > + return; > + } > + > assert(dev->msix_vector_use_notifier && > dev->msix_vector_release_notifier); > > -- Can you post a backtrace from gdb? Also, is there a git tree and a way to reproduce this without special hardware needs? > > and virtio-scsi is then able to load + detect tcm_vhost LUNs as > expected. > > However the random I/O performance with commit 1523ed9e1d46b is off by a > couple of orders of magnitude, ~6K IOPs compared to ~60K IOPs on raw > block flash using just the previous commit bdd00bdc64ba in Jan's series. > > So AFAICT there appears to be a serious performance regression that is > easily reproducible with that patch, which is about as far along as I've > been able to diagnose yet. > > Interestingly enough, virtio-scsi-raw performance does not seem to be > effected AFAICT by this regression, and is still able to go ~20K IOPs > with the same workload using commit 1523ed9e1d46b. (Roughly the same as > before) > > Does anyone have any idea why commit 1523ed9e1d46b would be killing > vhost / tcm_vhost performance so terribly, or is there something else > that vhost / vhost-scsi should be doing with new code..? No good idea yet, will have to look closer. Maybe you are somehow deassigning (via set_guest_notifiers) before assigning. But that would not yet explain performance regressions. Your target is exposing MSI-X, isn't it? Jan
On Tue, Jul 24, 2012 at 8:57 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote: > On 2012-07-24 09:42, Nicholas A. Bellinger wrote: >> Hi Anthony, Stefan & QEMU folks, >> >> So during the process of separating out the patches from Zhi's >> vhost-scsi tree this evening, I managed to squash everything down to >> nine nicely reviewable patches that apply against the current >> qemu.git/master: >> >> Nicholas Bellinger (1): >> virtio-scsi: Set max_target=0 during vhost-scsi operation >> >> Stefan Hajnoczi (8): >> notifier: add validity check and notify function >> virtio-pci: support host notifiers in TCG mode >> virtio-pci: check that event notification worked >> vhost: Pass device path to vhost_dev_init() >> virtio-scsi: Add wwpn and tgpt properties >> virtio-scsi: Open and initialize /dev/vhost-scsi >> virtio-scsi: Start/stop vhost >> vhost-scsi: add -vhost-scsi host device >> >> >> However, thus far I've not been able to get virtio-scsi <-> tcm_vhost >> I/O to actually work against the latest qemu.git/master.. >> >> So while doing a (manual) bisection w/ this series to track down the >> issue with qemu/master, I managed to run across something else.. With >> the vhost-scsi series applied, everything is working as expected up >> until the following commit: >> >> commit 1523ed9e1d46b0b54540049d491475ccac7e6421 >> Author: Jan Kiszka <jan.kiszka@siemens.com> >> Date: Thu May 17 10:32:39 2012 -0300 >> >> virtio/vhost: Add support for KVM in-kernel MSI injection >> >> >> This commit ends up triggering the following assert immediately after >> starting qemu with virtio-scsi <-> tcm_vhost: >> >> qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515: >> msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && >> dev->msix_vector_release_notifier' failed. >> >> OK, so adding the following hack allows me to boot: >> >> diff --git a/hw/msix.c b/hw/msix.c >> index 59c7a83..6036909 100644 >> --- a/hw/msix.c >> +++ b/hw/msix.c >> @@ -511,6 +511,11 @@ void msix_unset_vector_notifiers(PCIDevice *dev) >> { >> int vector; >> >> + if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier) { >> + printf("Hit NULL msix_unset_vector_notifiers for: %s\n", dev->name); >> + return; >> + } >> + >> assert(dev->msix_vector_use_notifier && >> dev->msix_vector_release_notifier); >> >> -- > > Can you post a backtrace from gdb? > > Also, is there a git tree and a way to reproduce this without special > hardware needs? > >> >> and virtio-scsi is then able to load + detect tcm_vhost LUNs as >> expected. >> >> However the random I/O performance with commit 1523ed9e1d46b is off by a >> couple of orders of magnitude, ~6K IOPs compared to ~60K IOPs on raw >> block flash using just the previous commit bdd00bdc64ba in Jan's series. >> >> So AFAICT there appears to be a serious performance regression that is >> easily reproducible with that patch, which is about as far along as I've >> been able to diagnose yet. >> >> Interestingly enough, virtio-scsi-raw performance does not seem to be >> effected AFAICT by this regression, and is still able to go ~20K IOPs >> with the same workload using commit 1523ed9e1d46b. (Roughly the same as >> before) >> >> Does anyone have any idea why commit 1523ed9e1d46b would be killing >> vhost / tcm_vhost performance so terribly, or is there something else >> that vhost / vhost-scsi should be doing with new code..? > > No good idea yet, will have to look closer. > > Maybe you are somehow deassigning (via set_guest_notifiers) before > assigning. But that would not yet explain performance regressions. Your > target is exposing MSI-X, isn't it? Regarding performance, have the command-line options to enable irqchip or anything like that changed after the qemu-kvm.git -> qemu.git refactoring? Stefan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2012-07-24 14:05, Stefan Hajnoczi wrote: > On Tue, Jul 24, 2012 at 8:57 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote: >> On 2012-07-24 09:42, Nicholas A. Bellinger wrote: >>> Hi Anthony, Stefan & QEMU folks, >>> >>> So during the process of separating out the patches from Zhi's >>> vhost-scsi tree this evening, I managed to squash everything down to >>> nine nicely reviewable patches that apply against the current >>> qemu.git/master: >>> >>> Nicholas Bellinger (1): >>> virtio-scsi: Set max_target=0 during vhost-scsi operation >>> >>> Stefan Hajnoczi (8): >>> notifier: add validity check and notify function >>> virtio-pci: support host notifiers in TCG mode >>> virtio-pci: check that event notification worked >>> vhost: Pass device path to vhost_dev_init() >>> virtio-scsi: Add wwpn and tgpt properties >>> virtio-scsi: Open and initialize /dev/vhost-scsi >>> virtio-scsi: Start/stop vhost >>> vhost-scsi: add -vhost-scsi host device >>> >>> >>> However, thus far I've not been able to get virtio-scsi <-> tcm_vhost >>> I/O to actually work against the latest qemu.git/master.. >>> >>> So while doing a (manual) bisection w/ this series to track down the >>> issue with qemu/master, I managed to run across something else.. With >>> the vhost-scsi series applied, everything is working as expected up >>> until the following commit: >>> >>> commit 1523ed9e1d46b0b54540049d491475ccac7e6421 >>> Author: Jan Kiszka <jan.kiszka@siemens.com> >>> Date: Thu May 17 10:32:39 2012 -0300 >>> >>> virtio/vhost: Add support for KVM in-kernel MSI injection >>> >>> >>> This commit ends up triggering the following assert immediately after >>> starting qemu with virtio-scsi <-> tcm_vhost: >>> >>> qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515: >>> msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && >>> dev->msix_vector_release_notifier' failed. >>> >>> OK, so adding the following hack allows me to boot: >>> >>> diff --git a/hw/msix.c b/hw/msix.c >>> index 59c7a83..6036909 100644 >>> --- a/hw/msix.c >>> +++ b/hw/msix.c >>> @@ -511,6 +511,11 @@ void msix_unset_vector_notifiers(PCIDevice *dev) >>> { >>> int vector; >>> >>> + if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier) { >>> + printf("Hit NULL msix_unset_vector_notifiers for: %s\n", dev->name); >>> + return; >>> + } >>> + >>> assert(dev->msix_vector_use_notifier && >>> dev->msix_vector_release_notifier); >>> >>> -- >> >> Can you post a backtrace from gdb? >> >> Also, is there a git tree and a way to reproduce this without special >> hardware needs? >> >>> >>> and virtio-scsi is then able to load + detect tcm_vhost LUNs as >>> expected. >>> >>> However the random I/O performance with commit 1523ed9e1d46b is off by a >>> couple of orders of magnitude, ~6K IOPs compared to ~60K IOPs on raw >>> block flash using just the previous commit bdd00bdc64ba in Jan's series. >>> >>> So AFAICT there appears to be a serious performance regression that is >>> easily reproducible with that patch, which is about as far along as I've >>> been able to diagnose yet. >>> >>> Interestingly enough, virtio-scsi-raw performance does not seem to be >>> effected AFAICT by this regression, and is still able to go ~20K IOPs >>> with the same workload using commit 1523ed9e1d46b. (Roughly the same as >>> before) >>> >>> Does anyone have any idea why commit 1523ed9e1d46b would be killing >>> vhost / tcm_vhost performance so terribly, or is there something else >>> that vhost / vhost-scsi should be doing with new code..? >> >> No good idea yet, will have to look closer. >> >> Maybe you are somehow deassigning (via set_guest_notifiers) before >> assigning. But that would not yet explain performance regressions. Your >> target is exposing MSI-X, isn't it? > > Regarding performance, have the command-line options to enable irqchip > or anything like that changed after the qemu-kvm.git -> qemu.git > refactoring? Current qemu.git has kernel_irqchip=on by default now, so the command line should not matter (unless you explicitly turn it off). Jan
On Tue, 2012-07-24 at 09:57 +0200, Jan Kiszka wrote: > On 2012-07-24 09:42, Nicholas A. Bellinger wrote: > > Hi Anthony, Stefan & QEMU folks, > > <SNIP> > > However, thus far I've not been able to get virtio-scsi <-> tcm_vhost > > I/O to actually work against the latest qemu.git/master.. > > > > So while doing a (manual) bisection w/ this series to track down the > > issue with qemu/master, I managed to run across something else.. With > > the vhost-scsi series applied, everything is working as expected up > > until the following commit: > > > > commit 1523ed9e1d46b0b54540049d491475ccac7e6421 > > Author: Jan Kiszka <jan.kiszka@siemens.com> > > Date: Thu May 17 10:32:39 2012 -0300 > > > > virtio/vhost: Add support for KVM in-kernel MSI injection > > > > > > This commit ends up triggering the following assert immediately after > > starting qemu with virtio-scsi <-> tcm_vhost: > > > > qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515: > > msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && > > dev->msix_vector_release_notifier' failed. > > > > OK, so adding the following hack allows me to boot: > > > > diff --git a/hw/msix.c b/hw/msix.c > > index 59c7a83..6036909 100644 > > --- a/hw/msix.c > > +++ b/hw/msix.c > > @@ -511,6 +511,11 @@ void msix_unset_vector_notifiers(PCIDevice *dev) > > { > > int vector; > > > > + if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier) { > > + printf("Hit NULL msix_unset_vector_notifiers for: %s\n", dev->name); > > + return; > > + } > > + > > assert(dev->msix_vector_use_notifier && > > dev->msix_vector_release_notifier); > > > > -- > > Can you post a backtrace from gdb? > Sure, w/o the above patch the backtrace with commit 1523ed9e1d looks like the following: (gdb) run Starting program: /usr/src/qemu.git/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -smp 2 -m 2048 -serial file:/tmp/vhost-serial.txt -hda /usr/src/qemu-vhost.git/debian_squeeze_amd64_standard-old.qcow2 -vhost-scsi id=vhost-scsi0,wwpn=naa.600140579ad21088,tpgt=1 -device virtio-scsi-pci,vhost-scsi=vhost-scsi0,event_idx=off [Thread debugging using libthread_db enabled] wwpn = "vhost-scsi0" tpgt = "1" [New Thread 0x7ffff45f8700 (LWP 26508)] [New Thread 0x7ffff3bf6700 (LWP 26509)] [New Thread 0x7ffff33f5700 (LWP 26510)] vhost_scsi_stop Failed to clear endpoint qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515: msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed. Program received signal SIGABRT, Aborted. 0x00007ffff5e8b165 in raise () from /lib/libc.so.6 (gdb) bt #0 0x00007ffff5e8b165 in raise () from /lib/libc.so.6 #1 0x00007ffff5e8df70 in abort () from /lib/libc.so.6 #2 0x00007ffff5e842b1 in __assert_fail () from /lib/libc.so.6 #3 0x00000000004a84a1 in msix_unset_vector_notifiers (dev=0x1463a70) at /usr/src/qemu.git/hw/msix.c:514 #4 0x00000000004d2865 in virtio_pci_set_guest_notifiers (opaque=0x6788, assign=136) at /usr/src/qemu.git/hw/virtio-pci.c:703 #5 0x000000000062955f in vhost_dev_stop (hdev=0x126c8a8, vdev=0x1465220) at /usr/src/qemu.git/hw/vhost.c:954 #6 0x0000000000628989 in vhost_scsi_stop (vs=0x126c890, vdev=0x1465220) at /usr/src/qemu.git/hw/vhost-scsi.c:115 #7 0x000000000062f5c9 in virtio_scsi_set_status (vdev=0x1465220, val=<value optimized out>) at /usr/src/qemu.git/hw/virtio-scsi.c:631 #8 0x0000000000632082 in virtio_set_status (vdev=0x1465220, val=136 '\210') at /usr/src/qemu.git/hw/virtio.c:507 #9 0x0000000000633410 in virtio_reset (opaque=0x6788) at /usr/src/qemu.git/hw/virtio.c:517 #10 0x00000000004d30a9 in virtio_pci_reset (d=0x1463a70) at /usr/src/qemu.git/hw/virtio-pci.c:280 #11 0x00000000004fc909 in qdev_reset_one (dev=0x6788, opaque=0x6788) at /usr/src/qemu.git/hw/qdev.c:207 #12 0x00000000004fc670 in qdev_walk_children (dev=0x1463a70, devfn=0x4fc8f0 <qdev_reset_one>, busfn=0x4fc510 <qbus_reset_one>, opaque=0x0) at /usr/src/qemu.git/hw/qdev.c:372 #13 0x00000000004ae43d in pci_device_reset (dev=0x6788) at /usr/src/qemu.git/hw/pci.c:163 #14 0x00000000004ae64f in pci_bus_reset (bus=0x1415bd0) at /usr/src/qemu.git/hw/pci.c:206 #15 0x00000000004ae699 in pcibus_reset (qbus=0x6788) at /usr/src/qemu.git/hw/pci.c:213 #16 0x00000000004fc710 in qbus_walk_children (bus=0x1415bd0, devfn=0x4fc8f0 <qdev_reset_one>, busfn=0x6, opaque=0x0) at /usr/src/qemu.git/hw/qdev.c:349 #17 0x00000000004fc6a3 in qdev_walk_children (dev=<value optimized out>, devfn=0x4fc8f0 <qdev_reset_one>, busfn=0x4fc510 <qbus_reset_one>, opaque=0x0) at /usr/src/qemu.git/hw/qdev.c:379 #18 0x00000000004fc745 in qbus_walk_children (bus=<value optimized out>, devfn=0x4fc8f0 <qdev_reset_one>, busfn=0x4fc510 <qbus_reset_one>, opaque=0x0) at /usr/src/qemu.git/hw/qdev.c:356 #19 0x00000000004d5822 in qemu_system_reset (report=false) at /usr/src/qemu.git/vl.c:1412 #20 0x00000000004d70bb in main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/qemu.git/vl.c:3647 (gdb) > Also, is there a git tree and a way to reproduce this without special > hardware needs? > I'll push this series + branches to demonstrate the issue into an public tree this afternoon. Also, the particular backend is a Fusion-IO raw block flash device, but I'm pretty sure that using a TCM RAMDISK into tcm_vhost would exhibit the same type of behavior. (Will double check on that shortly..) > > > > and virtio-scsi is then able to load + detect tcm_vhost LUNs as > > expected. > > > > However the random I/O performance with commit 1523ed9e1d46b is off by a > > couple of orders of magnitude, ~6K IOPs compared to ~60K IOPs on raw > > block flash using just the previous commit bdd00bdc64ba in Jan's series. > > > > So AFAICT there appears to be a serious performance regression that is > > easily reproducible with that patch, which is about as far along as I've > > been able to diagnose yet. > > > > Interestingly enough, virtio-scsi-raw performance does not seem to be > > effected AFAICT by this regression, and is still able to go ~20K IOPs > > with the same workload using commit 1523ed9e1d46b. (Roughly the same as > > before) > > > > Does anyone have any idea why commit 1523ed9e1d46b would be killing > > vhost / tcm_vhost performance so terribly, or is there something else > > that vhost / vhost-scsi should be doing with new code..? > > No good idea yet, will have to look closer. > <nod>, thanks for your help Jan. ;) > Maybe you are somehow deassigning (via set_guest_notifiers) before > assigning. But that would not yet explain performance regressions. Your > target is exposing MSI-X, isn't it? > I believe that is correct. --nab -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2012-07-24 at 13:20 -0700, Nicholas A. Bellinger wrote: > On Tue, 2012-07-24 at 09:57 +0200, Jan Kiszka wrote: > > On 2012-07-24 09:42, Nicholas A. Bellinger wrote: > > > Hi Anthony, Stefan & QEMU folks, > > > > > <SNIP> > > > > However, thus far I've not been able to get virtio-scsi <-> tcm_vhost > > > I/O to actually work against the latest qemu.git/master.. > > > > > > So while doing a (manual) bisection w/ this series to track down the > > > issue with qemu/master, I managed to run across something else.. With > > > the vhost-scsi series applied, everything is working as expected up > > > until the following commit: > > > > > > commit 1523ed9e1d46b0b54540049d491475ccac7e6421 > > > Author: Jan Kiszka <jan.kiszka@siemens.com> > > > Date: Thu May 17 10:32:39 2012 -0300 > > > > > > virtio/vhost: Add support for KVM in-kernel MSI injection > > > > > > > > > This commit ends up triggering the following assert immediately after > > > starting qemu with virtio-scsi <-> tcm_vhost: > > > > > > qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515: > > > msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && > > > dev->msix_vector_release_notifier' failed. > > > > > > OK, so adding the following hack allows me to boot: <SNIP> > > Also, is there a git tree and a way to reproduce this without special > > hardware needs? > > > > I'll push this series + branches to demonstrate the issue into an public > tree this afternoon. > OK, the updated vhost-scsi QEMU working tree is now available here: http://git.kernel.org/?p=virt/kvm/nab/qemu-kvm.git;a=summary Please ignore the qemu-kvm.git name for the moment, as it really is based on upstream qemu. ;) The branch breakdown looks like: master 401a663 remove unused QemuOpts parameter from net init functions * vhost-scsi bdd00bd msix: Add msix_nr_vectors_allocated vhost-scsi-merge e4fdb8c [ahead 9] vhost-scsi: add -vhost-scsi host device vhost-scsi-workaround 32ed0db msix: Work-around for vhost-scsi with KVM in-kernel MSI injection - 'vhost-scsi' contains vhost-scsi patches + pre commit 1523ed9e1d logic (Running stable at ~67.5K IOPs, see below) - 'vhost-scsi-merge' contains a vhost-scsi series against yesterday's qemu.git HEAD (not working atm, but will be sending out for RFC shortly) - 'vhost-scsi-workaround' contains the msix.c workaround to get commit 1523ed9e1d to boot (reproducible performance regression, down to 6K IOPs) > Also, the particular backend is a Fusion-IO raw block flash device, but > I'm pretty sure that using a TCM RAMDISK into tcm_vhost would exhibit > the same type of behavior. (Will double check on that shortly..) > Interestingly enough, after running some test with the vhost-scsi branch above with the last good working commit (bdd00bd), the same fio benchmark inline below is now reaching ~67.5K IOPs using v3.5-rc2 guest + host.. rw=randrw rwmixwrite=25 rwmixread=75 size=32768m ioengine=libaio direct=1 iodepth=64 blocksize=4k numjobs=2 filename=/dev/sdb This ends up being within a few percentage of what the bare-metal HW is capable of.. (~70K IOPs per LUN for an ioDrive Duo, Jen's CC'ed). Note the older QEMU vhost-scsi tree based on pre 1.1.0 code was doing ~60K IOPs for the same test.. So this looks promising pre commit 1523ed9e1d vs. what we've been using with vhost-scsi so far.. ;) --nab -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2012-07-24 22:20, Nicholas A. Bellinger wrote: > On Tue, 2012-07-24 at 09:57 +0200, Jan Kiszka wrote: >> On 2012-07-24 09:42, Nicholas A. Bellinger wrote: >>> Hi Anthony, Stefan & QEMU folks, >>> > > <SNIP> > >>> However, thus far I've not been able to get virtio-scsi <-> tcm_vhost >>> I/O to actually work against the latest qemu.git/master.. >>> >>> So while doing a (manual) bisection w/ this series to track down the >>> issue with qemu/master, I managed to run across something else.. With >>> the vhost-scsi series applied, everything is working as expected up >>> until the following commit: >>> >>> commit 1523ed9e1d46b0b54540049d491475ccac7e6421 >>> Author: Jan Kiszka <jan.kiszka@siemens.com> >>> Date: Thu May 17 10:32:39 2012 -0300 >>> >>> virtio/vhost: Add support for KVM in-kernel MSI injection >>> >>> >>> This commit ends up triggering the following assert immediately after >>> starting qemu with virtio-scsi <-> tcm_vhost: >>> >>> qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515: >>> msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && >>> dev->msix_vector_release_notifier' failed. >>> >>> OK, so adding the following hack allows me to boot: >>> >>> diff --git a/hw/msix.c b/hw/msix.c >>> index 59c7a83..6036909 100644 >>> --- a/hw/msix.c >>> +++ b/hw/msix.c >>> @@ -511,6 +511,11 @@ void msix_unset_vector_notifiers(PCIDevice *dev) >>> { >>> int vector; >>> >>> + if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier) { >>> + printf("Hit NULL msix_unset_vector_notifiers for: %s\n", dev->name); >>> + return; >>> + } >>> + >>> assert(dev->msix_vector_use_notifier && >>> dev->msix_vector_release_notifier); >>> >>> -- >> >> Can you post a backtrace from gdb? >> > > Sure, w/o the above patch the backtrace with commit 1523ed9e1d looks > like the following: > > (gdb) run > Starting program: /usr/src/qemu.git/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -smp 2 -m 2048 -serial file:/tmp/vhost-serial.txt -hda /usr/src/qemu-vhost.git/debian_squeeze_amd64_standard-old.qcow2 -vhost-scsi id=vhost-scsi0,wwpn=naa.600140579ad21088,tpgt=1 -device virtio-scsi-pci,vhost-scsi=vhost-scsi0,event_idx=off > [Thread debugging using libthread_db enabled] > wwpn = "vhost-scsi0" tpgt = "1" > [New Thread 0x7ffff45f8700 (LWP 26508)] > [New Thread 0x7ffff3bf6700 (LWP 26509)] > [New Thread 0x7ffff33f5700 (LWP 26510)] > vhost_scsi_stop > Failed to clear endpoint > qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515: msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed. > > Program received signal SIGABRT, Aborted. > 0x00007ffff5e8b165 in raise () from /lib/libc.so.6 > (gdb) bt > #0 0x00007ffff5e8b165 in raise () from /lib/libc.so.6 > #1 0x00007ffff5e8df70 in abort () from /lib/libc.so.6 > #2 0x00007ffff5e842b1 in __assert_fail () from /lib/libc.so.6 > #3 0x00000000004a84a1 in msix_unset_vector_notifiers (dev=0x1463a70) at /usr/src/qemu.git/hw/msix.c:514 > #4 0x00000000004d2865 in virtio_pci_set_guest_notifiers (opaque=0x6788, assign=136) There are obviously random parameter values submitted. This is broken. > at /usr/src/qemu.git/hw/virtio-pci.c:703 > #5 0x000000000062955f in vhost_dev_stop (hdev=0x126c8a8, vdev=0x1465220) at /usr/src/qemu.git/hw/vhost.c:954 > #6 0x0000000000628989 in vhost_scsi_stop (vs=0x126c890, vdev=0x1465220) at /usr/src/qemu.git/hw/vhost-scsi.c:115 And I suspect this is the origin: A stop on an unstarted vhost device. Am I right? But such a bug may not yet explain a performance regression - unless the spurious stop creates a state that slows things down later on. Did you check that direct MSI-X vector injection is actually in use later on? Jan
On Tue, Jul 24, 2012 at 01:20:56PM -0700, Nicholas A. Bellinger wrote: > On Tue, 2012-07-24 at 09:57 +0200, Jan Kiszka wrote: > > On 2012-07-24 09:42, Nicholas A. Bellinger wrote: > > > Hi Anthony, Stefan & QEMU folks, > > > > > <SNIP> > > > > However, thus far I've not been able to get virtio-scsi <-> tcm_vhost > > > I/O to actually work against the latest qemu.git/master.. > > > > > > So while doing a (manual) bisection w/ this series to track down the > > > issue with qemu/master, I managed to run across something else.. With > > > the vhost-scsi series applied, everything is working as expected up > > > until the following commit: > > > > > > commit 1523ed9e1d46b0b54540049d491475ccac7e6421 > > > Author: Jan Kiszka <jan.kiszka@siemens.com> > > > Date: Thu May 17 10:32:39 2012 -0300 > > > > > > virtio/vhost: Add support for KVM in-kernel MSI injection > > > > > > > > > This commit ends up triggering the following assert immediately after > > > starting qemu with virtio-scsi <-> tcm_vhost: > > > > > > qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515: > > > msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && > > > dev->msix_vector_release_notifier' failed. > > > > > > OK, so adding the following hack allows me to boot: > > > > > > diff --git a/hw/msix.c b/hw/msix.c > > > index 59c7a83..6036909 100644 > > > --- a/hw/msix.c > > > +++ b/hw/msix.c > > > @@ -511,6 +511,11 @@ void msix_unset_vector_notifiers(PCIDevice *dev) > > > { > > > int vector; > > > > > > + if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier) { > > > + printf("Hit NULL msix_unset_vector_notifiers for: %s\n", dev->name); > > > + return; > > > + } > > > + > > > assert(dev->msix_vector_use_notifier && > > > dev->msix_vector_release_notifier); > > > > > > -- > > > > Can you post a backtrace from gdb? > > > > Sure, w/o the above patch the backtrace with commit 1523ed9e1d looks > like the following: > > (gdb) run > Starting program: /usr/src/qemu.git/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -smp 2 -m 2048 -serial file:/tmp/vhost-serial.txt -hda /usr/src/qemu-vhost.git/debian_squeeze_amd64_standard-old.qcow2 -vhost-scsi id=vhost-scsi0,wwpn=naa.600140579ad21088,tpgt=1 -device virtio-scsi-pci,vhost-scsi=vhost-scsi0,event_idx=off > [Thread debugging using libthread_db enabled] > wwpn = "vhost-scsi0" tpgt = "1" > [New Thread 0x7ffff45f8700 (LWP 26508)] > [New Thread 0x7ffff3bf6700 (LWP 26509)] > [New Thread 0x7ffff33f5700 (LWP 26510)] > vhost_scsi_stop > Failed to clear endpoint > qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515: msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed. > > Program received signal SIGABRT, Aborted. > 0x00007ffff5e8b165 in raise () from /lib/libc.so.6 > (gdb) bt > #0 0x00007ffff5e8b165 in raise () from /lib/libc.so.6 > #1 0x00007ffff5e8df70 in abort () from /lib/libc.so.6 > #2 0x00007ffff5e842b1 in __assert_fail () from /lib/libc.so.6 > #3 0x00000000004a84a1 in msix_unset_vector_notifiers (dev=0x1463a70) at /usr/src/qemu.git/hw/msix.c:514 > #4 0x00000000004d2865 in virtio_pci_set_guest_notifiers (opaque=0x6788, assign=136) > at /usr/src/qemu.git/hw/virtio-pci.c:703 > #5 0x000000000062955f in vhost_dev_stop (hdev=0x126c8a8, vdev=0x1465220) at /usr/src/qemu.git/hw/vhost.c:954 > #6 0x0000000000628989 in vhost_scsi_stop (vs=0x126c890, vdev=0x1465220) at /usr/src/qemu.git/hw/vhost-scsi.c:115 > #7 0x000000000062f5c9 in virtio_scsi_set_status (vdev=0x1465220, val=<value optimized out>) > at /usr/src/qemu.git/hw/virtio-scsi.c:631 > #8 0x0000000000632082 in virtio_set_status (vdev=0x1465220, val=136 '\210') at /usr/src/qemu.git/hw/virtio.c:507 > #9 0x0000000000633410 in virtio_reset (opaque=0x6788) at /usr/src/qemu.git/hw/virtio.c:517 This is strange, code shows virtio_reset calls virtio_set_status with value 0. Why is it 136 here? Stack corruption? We actually did have a memory corruptor with virtio scsi: You sent patch virtio-scsi: Fix incorrect VirtIOSCSI->cmd_vqs[0] definition maybe the bug goes away if you apply that? > #10 0x00000000004d30a9 in virtio_pci_reset (d=0x1463a70) at /usr/src/qemu.git/hw/virtio-pci.c:280 > #11 0x00000000004fc909 in qdev_reset_one (dev=0x6788, opaque=0x6788) at /usr/src/qemu.git/hw/qdev.c:207 > #12 0x00000000004fc670 in qdev_walk_children (dev=0x1463a70, devfn=0x4fc8f0 <qdev_reset_one>, > busfn=0x4fc510 <qbus_reset_one>, opaque=0x0) at /usr/src/qemu.git/hw/qdev.c:372 > #13 0x00000000004ae43d in pci_device_reset (dev=0x6788) at /usr/src/qemu.git/hw/pci.c:163 > #14 0x00000000004ae64f in pci_bus_reset (bus=0x1415bd0) at /usr/src/qemu.git/hw/pci.c:206 > #15 0x00000000004ae699 in pcibus_reset (qbus=0x6788) at /usr/src/qemu.git/hw/pci.c:213 > #16 0x00000000004fc710 in qbus_walk_children (bus=0x1415bd0, devfn=0x4fc8f0 <qdev_reset_one>, busfn=0x6, opaque=0x0) > at /usr/src/qemu.git/hw/qdev.c:349 > #17 0x00000000004fc6a3 in qdev_walk_children (dev=<value optimized out>, devfn=0x4fc8f0 <qdev_reset_one>, > busfn=0x4fc510 <qbus_reset_one>, opaque=0x0) at /usr/src/qemu.git/hw/qdev.c:379 > #18 0x00000000004fc745 in qbus_walk_children (bus=<value optimized out>, devfn=0x4fc8f0 <qdev_reset_one>, > busfn=0x4fc510 <qbus_reset_one>, opaque=0x0) at /usr/src/qemu.git/hw/qdev.c:356 > #19 0x00000000004d5822 in qemu_system_reset (report=false) at /usr/src/qemu.git/vl.c:1412 > #20 0x00000000004d70bb in main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) > at /usr/src/qemu.git/vl.c:3647 > (gdb) > > > > Also, is there a git tree and a way to reproduce this without special > > hardware needs? > > > > I'll push this series + branches to demonstrate the issue into an public > tree this afternoon. > > Also, the particular backend is a Fusion-IO raw block flash device, but > I'm pretty sure that using a TCM RAMDISK into tcm_vhost would exhibit > the same type of behavior. (Will double check on that shortly..) > > > > > > > and virtio-scsi is then able to load + detect tcm_vhost LUNs as > > > expected. > > > > > > However the random I/O performance with commit 1523ed9e1d46b is off by a > > > couple of orders of magnitude, ~6K IOPs compared to ~60K IOPs on raw > > > block flash using just the previous commit bdd00bdc64ba in Jan's series. > > > > > > So AFAICT there appears to be a serious performance regression that is > > > easily reproducible with that patch, which is about as far along as I've > > > been able to diagnose yet. > > > > > > Interestingly enough, virtio-scsi-raw performance does not seem to be > > > effected AFAICT by this regression, and is still able to go ~20K IOPs > > > with the same workload using commit 1523ed9e1d46b. (Roughly the same as > > > before) > > > > > > Does anyone have any idea why commit 1523ed9e1d46b would be killing > > > vhost / tcm_vhost performance so terribly, or is there something else > > > that vhost / vhost-scsi should be doing with new code..? > > > > No good idea yet, will have to look closer. > > > > <nod>, thanks for your help Jan. ;) > > > Maybe you are somehow deassigning (via set_guest_notifiers) before > > assigning. But that would not yet explain performance regressions. Your > > target is exposing MSI-X, isn't it? > > > > I believe that is correct. > > --nab -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/hw/msix.c b/hw/msix.c index 59c7a83..6036909 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -511,6 +511,11 @@ void msix_unset_vector_notifiers(PCIDevice *dev) { int vector; + if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier) { + printf("Hit NULL msix_unset_vector_notifiers for: %s\n", dev->name); + return; + } + assert(dev->msix_vector_use_notifier && dev->msix_vector_release_notifier);