[1/1] Revert "linux-aio: Cancel BH if not needed"

Message ID 1466775608-31052-1-git-send-email-roman.penyaev@profitbricks.com (mailing list archive)
State New, archived

Commit Message

Roman Pen June 24, 2016, 1:40 p.m. UTC
This reverts commit ccb9dc10129954d0bcd7814298ed445e684d5a2a,
which causes multiqueue (MQ) I/O through virtio_blk to get stuck.

The hang is very easy to reproduce on Stefan's recent v4 series
using num-queues=4:

  "[PATCH v4 0/7] virtio-blk: multiqueue support"
  https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg05999.html

Some debug output from guest:
-----------------------------

[root@andbd-vm ~]# cat /sys/block/vda/inflight
     106       98
[root@andbd-vm ~]# cat /sys/block/vda/mq/*/tags
nr_tags=128, reserved_tags=0, bits_per_word=5
nr_free=89, nr_reserved=0
active_queues=0
nr_tags=128, reserved_tags=0, bits_per_word=5
nr_free=83, nr_reserved=0
active_queues=0
nr_tags=128, reserved_tags=0, bits_per_word=5
nr_free=31, nr_reserved=0
active_queues=0
nr_tags=128, reserved_tags=0, bits_per_word=5
nr_free=105, nr_reserved=0
active_queues=0

Fio configuration:
------------------

[global]
description=Emulation of Storage Server Access Pattern
bssplit=512/20:1k/16:2k/9:4k/12:8k/19:16k/10:32k/8:64k/4
fadvise_hint=0
rw=randrw:2
direct=1

ioengine=libaio
iodepth=64
iodepth_batch_submit=64
iodepth_batch_complete=64
numjobs=8
gtod_reduce=1
group_reporting=1

time_based=1
runtime=30

[job]
filename=/dev/vda

VM configuration:
-----------------

-object iothread,id=t0 \
-drive if=none,id=d0,file=/dev/nullb0,format=raw,snapshot=off,cache=none,aio=native \
-device virtio-blk-pci,drive=d0,iothread=t0,num-queues=4,disable-modern=off,disable-legacy=on \

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-devel@nongnu.org
---
 block/linux-aio.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Kevin Wolf June 24, 2016, 2:46 p.m. UTC | #1
Am 24.06.2016 um 15:40 hat Roman Pen geschrieben:
> This reverts commit ccb9dc10129954d0bcd7814298ed445e684d5a2a,
> which causes multiqueue (MQ) I/O through virtio_blk to get stuck.

It would be good to have a theory why this happens.

> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index e468960..fe7cece 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -149,8 +149,6 @@ static void qemu_laio_completion_bh(void *opaque)
>      if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
>          ioq_submit(s);
>      }
> -
> -    qemu_bh_cancel(s->completion_bh);
>  }

Maybe if a nested event loop cancels the BH, it's missing on the next
loop iteration. Before my patch, the nested callback happened to leave
an additional BH around, which the outer one actually needs.

I find this a bit ugly, but if we're okay with this mechanism we could
add a counter for the nesting level and only cancel on the top level.

If you find it as ugly as I do, a cleaner solution would be to schedule
the BH inside the loop.
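
Roughly what I have in mind for the counter variant (untested sketch;
the nesting field below is made up for illustration, it does not exist
in block/linux-aio.c):

static void qemu_laio_completion_bh(void *opaque)
{
    LinuxAioState *s = opaque;

    s->completion_bh_nesting++;    /* hypothetical field, sketch only */

    /* ... fetch and process the pending completion events as today ... */

    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
        ioq_submit(s);
    }

    /* Only the outermost invocation may cancel the BH; a nested event
     * loop must leave it scheduled for the outer iteration. */
    if (--s->completion_bh_nesting == 0) {
        qemu_bh_cancel(s->completion_bh);
    }
}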

> @@ -158,7 +156,7 @@ static void qemu_laio_completion_cb(EventNotifier *e)
>      LinuxAioState *s = container_of(e, LinuxAioState, e);
>  
>      if (event_notifier_test_and_clear(&s->e)) {
> -        qemu_laio_completion_bh(s);
> +        qemu_bh_schedule(s->completion_bh);
>      }
>  }

I can't see how this hunk would make a difference. Can you confirm that
just the first hunk is enough to fix the problem?

Kevin
Stefan Hajnoczi June 27, 2016, 3:09 p.m. UTC | #2
On Fri, Jun 24, 2016 at 3:46 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 24.06.2016 um 15:40 hat Roman Pen geschrieben:
>> This reverts commit ccb9dc10129954d0bcd7814298ed445e684d5a2a,
>> which causes multiqueue (MQ) I/O through virtio_blk to get stuck.
>
> It would be good to have a theory why this happens.

It's worth taking the batch notify BH out of the equation in
virtio_blk_data_plane_notify():

-    set_bit(virtio_get_queue_index(vq), s->batch_notify_vqs);
-    qemu_bh_schedule(s->bh);
+    if (virtio_should_notify(s->vdev, vq)) {
+        event_notifier_set(virtio_queue_get_guest_notifier(vq));
+    }

I wonder if that makes any difference?

I don't have a concrete theory why batch notify interferes with
Kevin's patch though.

Stefan
Stefan Hajnoczi June 27, 2016, 4:01 p.m. UTC | #3
On Fri, Jun 24, 2016 at 2:40 PM, Roman Pen
<roman.penyaev@profitbricks.com> wrote:
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index e468960..fe7cece 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -149,8 +149,6 @@ static void qemu_laio_completion_bh(void *opaque)
>      if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
>          ioq_submit(s);
>      }
> -
> -    qemu_bh_cancel(s->completion_bh);

This was the cause.  I've found the root cause and will send a patch.

Stefan
Stefan Hajnoczi June 28, 2016, 8:41 a.m. UTC | #4
On Fri, Jun 24, 2016 at 3:46 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> diff --git a/block/linux-aio.c b/block/linux-aio.c
>> index e468960..fe7cece 100644
>> --- a/block/linux-aio.c
>> +++ b/block/linux-aio.c
>> @@ -149,8 +149,6 @@ static void qemu_laio_completion_bh(void *opaque)
>>      if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
>>          ioq_submit(s);
>>      }
>> -
>> -    qemu_bh_cancel(s->completion_bh);
>>  }
>
> Maybe if a nested event loop cancels the BH, it's missing on the next
> loop iteration. Before my patch, the nested callback happened to leave
> an additional BH around, which the outer one actually needs.

The scenario you described is:

qemu_laio_completion_bh()
 -> cb1()
     -> aio_poll()
         -> qemu_laio_completion_bh()
         <- qemu_laio_completion_bh() (cancel BH)
     <- aio_poll()
 <- cb1()
 -> cb2()
     -> aio_poll()
        (hang!)

This hang seems impossible because the qemu_laio_completion_bh() loop
processes all pending events.  Therefore cb1() consumes all pending
events and cb2() will not poll.

If new I/O was submitted during cb1() and cb2() waits for it, then the
eventfd will become readable upon completion and cb2() does not hang
in that case either.

If, instead of the original scenario, cb1() nests deeper, then the BH
is still scheduled and events will be processed without a hang.

In summary, the job of scheduling the BH is not to force all nested
callbacks to call qemu_laio_completion_bh().  Only the first nested
callback needs the BH so that all pending events will be processed.
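
For reference, the shape of qemu_laio_completion_bh() that this
argument relies on is roughly the following (simplified sketch; the two
laio_*_events() helpers are illustrative stand-ins for the
io_getevents() handling, not the actual code):

static void qemu_laio_completion_bh(void *opaque)
{
    LinuxAioState *s = opaque;

    /* Fetch a batch of pending io_getevents() completions (abbreviated). */
    if (!laio_fetch_events(s)) {          /* illustrative helper */
        return;                           /* nothing to do */
    }

    /* Keep the BH scheduled so that a nested aio_poll() entered from a
     * completion callback runs this function again and keeps draining. */
    qemu_bh_schedule(s->completion_bh);

    /* Run the callback for every fetched completion; each callback may
     * itself call aio_poll() and recurse into this function. */
    laio_process_fetched_events(s);       /* illustrative helper */

    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
        ioq_submit(s);
    }

    /* The qemu_bh_cancel() added by ccb9dc1 (and removed again by this
     * revert) sits here, after the fetched events have been consumed. */
    qemu_bh_cancel(s->completion_bh);
}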

Stefan

Patch

diff --git a/block/linux-aio.c b/block/linux-aio.c
index e468960..fe7cece 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -149,8 +149,6 @@ static void qemu_laio_completion_bh(void *opaque)
     if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
-
-    qemu_bh_cancel(s->completion_bh);
 }
 
 static void qemu_laio_completion_cb(EventNotifier *e)
@@ -158,7 +156,7 @@ static void qemu_laio_completion_cb(EventNotifier *e)
     LinuxAioState *s = container_of(e, LinuxAioState, e);
 
     if (event_notifier_test_and_clear(&s->e)) {
-        qemu_laio_completion_bh(s);
+        qemu_bh_schedule(s->completion_bh);
     }
 }