Message ID | 1459679740-17519-1-git-send-email-mst@redhat.com (mailing list archive)
---|---
State | New, archived
On 04/03/2016 12:37 PM, Michael S. Tsirkin wrote:
> Reentrancy cannot happen while the BQL is being held,
> so we should never enter this condition.
>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> This is a replacement for [PATCH 9/9] virtio: remove starting/stopping
> checks
>
> Christian, could you please give it a spin with debug enabled?
> Since you reported above Paolo's patch triggers segfaults, I expect this
> one to trigger assertions as well, which should give us more info on
> the root cause.

The assert triggered (see below).

(gdb) thread apply all bt

Thread 5 (Thread 0x3ffa9fff910 (LWP 41714)):
#0  0x000003ffab68841e in syscall () at /lib64/libc.so.6
#1  0x00000000803e84f6 in futex_wait (ev=0x80a65bd4 <rcu_call_ready_event>, val=4294967295) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:292
#2  0x00000000803e8786 in qemu_event_wait (ev=0x80a65bd4 <rcu_call_ready_event>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:399
#3  0x0000000080405ec4 in call_rcu_thread (opaque=0x0) at /home/cborntra/REPOS/qemu/util/rcu.c:250
#4  0x000003ffab787c2c in start_thread () at /lib64/libpthread.so.0
#5  0x000003ffab68ec9a in thread_start () at /lib64/libc.so.6

Thread 4 (Thread 0x3ffa97ff910 (LWP 41718)):
#0  0x000000008001b09a in address_space_read_continue (as=0x805da230 <address_space_memory>, addr=350645744, attrs=..., buf=0x3ffa97f8450 "", len=0, addr1=350645728, l=16, mr=0x80b0d6a0) at /home/cborntra/REPOS/qemu/exec.c:2738
#1  0x000000008001b186 in address_space_read_full (as=0x805da230 <address_space_memory>, addr=350645728, attrs=..., buf=0x3ffa97f8440 "\230\001q\024", len=16) at /home/cborntra/REPOS/qemu/exec.c:2752
#2  0x00000000800ed284 in vring_desc_read (len=16, buf=0x3ffa97f8440 "\230\001q\024", attrs=..., addr=350645728, as=0x805da230 <address_space_memory>) at /home/cborntra/REPOS/qemu/include/exec/memory.h:1431
#3  0x00000000800ed284 in vring_desc_read (vdev=0x80e44b88, desc=0x3ffa97f8440, desc_pa=350645696, i=2) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:124
#4  0x00000000800ee05e in virtqueue_read_next_desc (vdev=0x80e44b88, desc=0x3ffa97f8440, desc_pa=350645696, max=3) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:366
#5  0x00000000800eecbe in virtqueue_pop (vq=0x80f221c0, sz=160) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:602
#6  0x00000000800b40b0 in virtio_blk_get_request (s=0x80e44b88) at /home/cborntra/REPOS/qemu/hw/block/virtio-blk.c:192
#7  0x00000000800b56e0 in virtio_blk_handle_vq (s=0x80e44b88, vq=0x80f221c0) at /home/cborntra/REPOS/qemu/hw/block/virtio-blk.c:588
#8  0x00000000800b78a2 in virtio_blk_data_plane_handle_output (vdev=0x80e44b88, vq=0x80f221c0) at /home/cborntra/REPOS/qemu/hw/block/dataplane/virtio-blk.c:195
#9  0x00000000800f0cb4 in virtio_queue_notify_aio_vq (vq=0x80f221c0) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1098
#10 0x00000000800f3664 in virtio_queue_host_notifier_aio_read (n=0x80f22220) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1799
#11 0x00000000802f1a0c in aio_dispatch (ctx=0x80acd5d0) at /home/cborntra/REPOS/qemu/aio-posix.c:327
#12 0x00000000802f2392 in aio_poll (ctx=0x80acd5d0, blocking=true) at /home/cborntra/REPOS/qemu/aio-posix.c:475
#13 0x000000008016590a in iothread_run (opaque=0x80acd090) at /home/cborntra/REPOS/qemu/iothread.c:46
#14 0x000003ffab787c2c in start_thread () at /lib64/libpthread.so.0
#15 0x000003ffab68ec9a in thread_start () at /lib64/libc.so.6

Thread 3 (Thread 0x3ff8b9d2910 (LWP 41739)):
#0  0x000003ffab68334a in ioctl () at /lib64/libc.so.6
#1  0x0000000080081c84 in kvm_vcpu_ioctl (cpu=0x80e4d2b0, type=44672) at /home/cborntra/REPOS/qemu/kvm-all.c:1984
#2  0x000000008008154c in kvm_cpu_exec (cpu=0x80e4d2b0) at /home/cborntra/REPOS/qemu/kvm-all.c:1834
#3  0x000000008006075c in qemu_kvm_cpu_thread_fn (arg=0x80e4d2b0) at /home/cborntra/REPOS/qemu/cpus.c:1056
#4  0x000003ffab787c2c in start_thread () at /lib64/libpthread.so.0
#5  0x000003ffab68ec9a in thread_start () at /lib64/libc.so.6

Thread 2 (Thread 0x3ff8b1d2910 (LWP 41743)):
#0  0x000003ffab68334a in ioctl () at /lib64/libc.so.6
#1  0x0000000080081c84 in kvm_vcpu_ioctl (cpu=0x80b40040, type=44672) at /home/cborntra/REPOS/qemu/kvm-all.c:1984
#2  0x000000008008154c in kvm_cpu_exec (cpu=0x80b40040) at /home/cborntra/REPOS/qemu/kvm-all.c:1834
#3  0x000000008006075c in qemu_kvm_cpu_thread_fn (arg=0x80b40040) at /home/cborntra/REPOS/qemu/cpus.c:1056
#4  0x000003ffab787c2c in start_thread () at /lib64/libpthread.so.0
#5  0x000003ffab68ec9a in thread_start () at /lib64/libc.so.6

Thread 1 (Thread 0x3ffad25bb90 (LWP 41685)):
#0  0x000003ffab5be2c0 in raise () at /lib64/libc.so.6
#1  0x000003ffab5bfc26 in abort () at /lib64/libc.so.6
#2  0x000003ffab5b5bce in __assert_fail_base () at /lib64/libc.so.6
#3  0x000003ffab5b5c5c in () at /lib64/libc.so.6
#4  0x00000000800b79e4 in virtio_blk_data_plane_start (s=0x80b195a0) at /home/cborntra/REPOS/qemu/hw/block/dataplane/virtio-blk.c:210
#5  0x00000000800b57ba in virtio_blk_handle_output (vdev=0x80e4b0f8, vq=0x80eaa180) at /home/cborntra/REPOS/qemu/hw/block/virtio-blk.c:607
#6  0x00000000800f0d74 in virtio_queue_notify_vq (vq=0x80eaa180) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1108
#7  0x00000000800f376c in virtio_queue_host_notifier_read (n=0x80eaa1e0) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1820
#8  0x00000000800f381c in virtio_queue_set_host_notifier_fd_handler (vq=0x80eaa180, assign=false, set_handler=false) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1836
#9  0x000000008010b808 in virtio_ccw_set_guest2host_notifier (dev=0x80e49fb0, n=0, assign=false, set_handler=false) at /home/cborntra/REPOS/qemu/hw/s390x/virtio-ccw.c:98
#10 0x000000008010baaa in virtio_ccw_stop_ioeventfd (dev=0x80e49fb0) at /home/cborntra/REPOS/qemu/hw/s390x/virtio-ccw.c:155
#11 0x000000008010f162 in virtio_ccw_set_host_notifier (d=0x80e49fb0, n=0, assign=true) at /home/cborntra/REPOS/qemu/hw/s390x/virtio-ccw.c:1212
#12 0x00000000800b7ab0 in virtio_blk_data_plane_start (s=0x80b195a0) at /home/cborntra/REPOS/qemu/hw/block/dataplane/virtio-blk.c:225
#13 0x00000000800b57ba in virtio_blk_handle_output (vdev=0x80e4b0f8, vq=0x80eaa180) at /home/cborntra/REPOS/qemu/hw/block/virtio-blk.c:607
#14 0x00000000800f0d74 in virtio_queue_notify_vq (vq=0x80eaa180) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1108
#15 0x00000000800f376c in virtio_queue_host_notifier_read (n=0x80eaa1e0) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1820
#16 0x00000000802f1a0c in aio_dispatch (ctx=0x80abae30) at /home/cborntra/REPOS/qemu/aio-posix.c:327
#17 0x00000000802df4d4 in aio_ctx_dispatch (source=0x80abae30, callback=0x0, user_data=0x0) at /home/cborntra/REPOS/qemu/async.c:233
#18 0x000003ffabfd1c0a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#19 0x00000000802ee70e in glib_pollfds_poll () at /home/cborntra/REPOS/qemu/main-loop.c:213
#20 0x00000000802ee84a in os_host_main_loop_wait (timeout=1472000000) at /home/cborntra/REPOS/qemu/main-loop.c:258
#21 0x00000000802ee956 in main_loop_wait (nonblocking=0) at /home/cborntra/REPOS/qemu/main-loop.c:506
#22 0x000000008017dc0c in main_loop () at /home/cborntra/REPOS/qemu/vl.c:1934
#23 0x00000000801860e0 in main (argc=72, argv=0x3ffd60fe838, envp=0x3ffd60fea80) at /home/cborntra/REPOS/qemu/vl.c:4652
On 03/04/2016 21:59, Christian Borntraeger wrote:
> Thread 1 (Thread 0x3ffad25bb90 (LWP 41685)):
> #0  0x000003ffab5be2c0 in raise () at /lib64/libc.so.6
> #1  0x000003ffab5bfc26 in abort () at /lib64/libc.so.6
> #2  0x000003ffab5b5bce in __assert_fail_base () at /lib64/libc.so.6
> #3  0x000003ffab5b5c5c in () at /lib64/libc.so.6
> #4  0x00000000800b79e4 in virtio_blk_data_plane_start (s=0x80b195a0) at /home/cborntra/REPOS/qemu/hw/block/dataplane/virtio-blk.c:210
> #5  0x00000000800b57ba in virtio_blk_handle_output (vdev=0x80e4b0f8, vq=0x80eaa180) at /home/cborntra/REPOS/qemu/hw/block/virtio-blk.c:607
> #6  0x00000000800f0d74 in virtio_queue_notify_vq (vq=0x80eaa180) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1108
> #7  0x00000000800f376c in virtio_queue_host_notifier_read (n=0x80eaa1e0) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1820
> #8  0x00000000800f381c in virtio_queue_set_host_notifier_fd_handler (vq=0x80eaa180, assign=false, set_handler=false) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1836
> #9  0x000000008010b808 in virtio_ccw_set_guest2host_notifier (dev=0x80e49fb0, n=0, assign=false, set_handler=false) at /home/cborntra/REPOS/qemu/hw/s390x/virtio-ccw.c:98
> #10 0x000000008010baaa in virtio_ccw_stop_ioeventfd (dev=0x80e49fb0) at /home/cborntra/REPOS/qemu/hw/s390x/virtio-ccw.c:155
> #11 0x000000008010f162 in virtio_ccw_set_host_notifier (d=0x80e49fb0, n=0, assign=true) at /home/cborntra/REPOS/qemu/hw/s390x/virtio-ccw.c:1212
> #12 0x00000000800b7ab0 in virtio_blk_data_plane_start (s=0x80b195a0) at /home/cborntra/REPOS/qemu/hw/block/dataplane/virtio-blk.c:225
> #13 0x00000000800b57ba in virtio_blk_handle_output (vdev=0x80e4b0f8, vq=0x80eaa180) at /home/cborntra/REPOS/qemu/hw/block/virtio-blk.c:607
> #14 0x00000000800f0d74 in virtio_queue_notify_vq (vq=0x80eaa180) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1108
> #15 0x00000000800f376c in virtio_queue_host_notifier_read (n=0x80eaa1e0) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:1820
> #16 0x00000000802f1a0c in aio_dispatch (ctx=0x80abae30) at /home/cborntra/REPOS/qemu/aio-posix.c:327
> #17 0x00000000802df4d4 in aio_ctx_dispatch (source=0x80abae30, callback=0x0, user_data=0x0) at /home/cborntra/REPOS/qemu/async.c:233
> #18 0x000003ffabfd1c0a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
> #19 0x00000000802ee70e in glib_pollfds_poll () at /home/cborntra/REPOS/qemu/main-loop.c:213
> #20 0x00000000802ee84a in os_host_main_loop_wait (timeout=1472000000) at /home/cborntra/REPOS/qemu/main-loop.c:258
> #21 0x00000000802ee956 in main_loop_wait (nonblocking=0) at /home/cborntra/REPOS/qemu/main-loop.c:506
> #22 0x000000008017dc0c in main_loop () at /home/cborntra/REPOS/qemu/vl.c:1934
> #23 0x00000000801860e0 in main (argc=72, argv=0x3ffd60fe838, envp=0x3ffd60fea80) at /home/cborntra/REPOS/qemu/vl.c:4652

This will be fixed by Cornelia's rework, and is an example of why I
think patch 1/9 is a good idea (IOW, assign=false is harmful).

Thanks,

Paolo
On Sun, 3 Apr 2016 23:13:28 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 03/04/2016 21:59, Christian Borntraeger wrote:
> > Thread 1 (Thread 0x3ffad25bb90 (LWP 41685)):
> > #0  0x000003ffab5be2c0 in raise () at /lib64/libc.so.6
> > #1  0x000003ffab5bfc26 in abort () at /lib64/libc.so.6
> > [...]
> > #4  0x00000000800b79e4 in virtio_blk_data_plane_start (s=0x80b195a0) at /home/cborntra/REPOS/qemu/hw/block/dataplane/virtio-blk.c:210
> > [...]
> > #12 0x00000000800b7ab0 in virtio_blk_data_plane_start (s=0x80b195a0) at /home/cborntra/REPOS/qemu/hw/block/dataplane/virtio-blk.c:225
> > [...]
> > #23 0x00000000801860e0 in main (argc=72, argv=0x3ffd60fe838, envp=0x3ffd60fea80) at /home/cborntra/REPOS/qemu/vl.c:4652
>
> This will be fixed by Cornelia's rework, and is an example of why I
> think patch 1/9 is a good idea (IOW, assign=false is harmful).

So what do we want to do for 2.6? The aio handler rework (without the
cleanup) is needed. Do we want to include the minimal version of my
"keep handler assigned" patch (the one without the api rework) as well,
as it fixes a latent bug?
On 04/04/2016 10:10, Cornelia Huck wrote:
> > This will be fixed by Cornelia's rework, and is an example of why I
> > think patch 1/9 is a good idea (IOW, assign=false is harmful).
>
> So what do we want to do for 2.6? The aio handler rework (without the
> cleanup) is needed. Do we want to include the minimal version of my
> "keep handler assigned" patch (the one without the api rework) as well,
> as it fixes a latent bug?

I would, but Michael is more conservative in general. Since the
difference between a bug and a feature is very fuzzy here, I would just
omit my patch 9.

Paolo
On Mon, 4 Apr 2016 10:19:42 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 04/04/2016 10:10, Cornelia Huck wrote:
> > > This will be fixed by Cornelia's rework, and is an example of why I
> > > think patch 1/9 is a good idea (IOW, assign=false is harmful).
> >
> > So what do we want to do for 2.6? The aio handler rework (without the
> > cleanup) is needed. Do we want to include the minimal version of my
> > "keep handler assigned" patch (the one without the api rework) as well,
> > as it fixes a latent bug?
>
> I would, but Michael is more conservative in general. Since the
> difference between a bug and a feature is very fuzzy here, I would just
> omit my patch 9.

I'd omit patch 9 as well, but the knowledge that the "handler
deassigned" bug is still lurking makes me uncomfortable.

Would like to see a test from someone with a large setup, anyway (and I
need to enhance my test setup, I guess...)
On Mon, Apr 04, 2016 at 10:25:34AM +0200, Cornelia Huck wrote:
> On Mon, 4 Apr 2016 10:19:42 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> > On 04/04/2016 10:10, Cornelia Huck wrote:
> > > > This will be fixed by Cornelia's rework, and is an example of why I
> > > > think patch 1/9 is a good idea (IOW, assign=false is harmful).
> > >
> > > So what do we want to do for 2.6? The aio handler rework (without the
> > > cleanup) is needed. Do we want to include the minimal version of my
> > > "keep handler assigned" patch (the one without the api rework) as well,
> > > as it fixes a latent bug?
> >
> > I would, but Michael is more conservative in general. Since the
> > difference between a bug and a feature is very fuzzy here, I would just
> > omit my patch 9.
>
> I'd omit patch 9 as well, but the knowledge that the "handler
> deassigned" bug is still lurking makes me uncomfortable.

It's not a bug as such - that logic was relying on the handler invoking
itself being a nop, and that assumption broke with the dataplane rework.

> Would like to see a test from someone with a large setup, anyway (and I
> need to enhance my test setup, I guess...)

Now that Christian sent the backtrace I feel we understand the issues,
but more testing is always good :)
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index fd06726..04e0e0d 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -203,10 +203,12 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
     VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);
     int r;
 
-    if (vblk->dataplane_started || s->starting) {
+    if (vblk->dataplane_started) {
         return;
     }
 
+    assert(!s->starting);
+
     s->starting = true;
 
     s->vq = virtio_get_queue(s->vdev, 0);
@@ -257,10 +259,12 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);
 
-    if (!vblk->dataplane_started || s->stopping) {
+    if (!vblk->dataplane_started) {
         return;
     }
 
+    assert(!s->stopping);
+
     /* Better luck next time. */
     if (vblk->dataplane_disabled) {
         vblk->dataplane_disabled = false;
Reentrancy cannot happen while the BQL is being held,
so we should never enter this condition.

Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

This is a replacement for [PATCH 9/9] virtio: remove starting/stopping
checks

Christian, could you please give it a spin with debug enabled?
Since you reported above Paolo's patch triggers segfaults, I expect this
one to trigger assertions as well, which should give us more info on
the root cause.

 hw/block/dataplane/virtio-blk.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)