diff mbox

vl.c/exit: pause cpus before closing block devices

Message ID 20170808100243.GG4850@dhcp-200-186.str.redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Kevin Wolf Aug. 8, 2017, 10:02 a.m. UTC
Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
> >> the root cause of this bug is related to this as well:
> >> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
> >>
> >> From commit 99723548 we started assuming (incorrectly?) that blk_
> >> functions always WILL have an attached BDS, but this is not always true,
> >> for instance, flushing the cache from an empty CDROM.
> >>
> >> Paolo, can we move the flight counter increment outside of the
> >> block-backend layer, is that safe?
> > I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
> > regardless of the throttling timer issue discussed below.  BB cannot
> > assume that the BDS graph is non-empty.
> 
> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
> attached BDS?  That would make it much easier to fix.

Would the proper fix be much more complicated than the following? I must
admit that I don't fully understand the current state of affairs with
respect to threading, AioContext etc. so I may well be missing
something.

Note that my blk_drain() implementation doesn't necessarily drain
blk_bs(blk) completely, but only those requests that came from the
specific BlockBackend. I think this is what the callers want, but
if otherwise, it shouldn't be hard to change.

Kevin

Comments

Paolo Bonzini Aug. 8, 2017, 11:04 a.m. UTC | #1
On 08/08/2017 12:02, Kevin Wolf wrote:
> Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>> the root cause of this bug is related to this as well:
>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>
>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>> functions always WILL have an attached BDS, but this is not always true,
>>>> for instance, flushing the cache from an empty CDROM.
>>>>
>>>> Paolo, can we move the flight counter increment outside of the
>>>> block-backend layer, is that safe?
>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>> regardless of the throttling timer issue discussed below.  BB cannot
>>> assume that the BDS graph is non-empty.
>>
>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>> attached BDS?  That would make it much easier to fix.
> 
> Would the proper fix be much more complicated than the following? I must
> admit that I don't fully understand the current state of affairs with
> respect to threading, AioContext etc. so I may well be missing
> something.

Not much, but it's not complete either.  The issues I see are that: 1)
blk_drain_all does not take the new counter into account; 2)
bdrv_drain_all callers need to be audited to see if they should be
blk_drain_all (or more likely, only device BlockBackends should be drained).

Paolo

> Note that my blk_drain() implementation doesn't necessarily drain
> blk_bs(blk) completely, but only those requests that came from the
> specific BlockBackend. I think this is what the callers want, but
> if otherwise, it shouldn't be hard to change.

Yes, this should be what they want.

Paolo

> Kevin
> 
> diff --git a/block.c b/block.c
> index f1c3e98a4d..07decc3b5f 100644
> --- a/block.c
> +++ b/block.c
> @@ -4563,7 +4563,7 @@ out:
>  
>  AioContext *bdrv_get_aio_context(BlockDriverState *bs)
>  {
> -    return bs->aio_context;
> +    return bs ? bs->aio_context : qemu_get_aio_context();
>  }
>  
>  void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 968438c149..649d11b4ef 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -68,6 +68,9 @@ struct BlockBackend {
>      NotifierList remove_bs_notifiers, insert_bs_notifiers;
>  
>      int quiesce_counter;
> +
> +    /* Number of in-flight requests. Accessed with atomic ops. */
> +    unsigned int in_flight;
>  };
>  
>  typedef struct BlockBackendAIOCB {
> @@ -1109,6 +1112,16 @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
>      return bdrv_make_zero(blk->root, flags);
>  }
>  
> +static void blk_inc_in_flight(BlockBackend *blk)
> +{
> +    atomic_inc(&blk->in_flight);
> +}
> +
> +static void blk_dec_in_flight(BlockBackend *blk)
> +{
> +    atomic_dec(&blk->in_flight);
> +}
> +
>  static void error_callback_bh(void *opaque)
>  {
>      struct BlockBackendAIOCB *acb = opaque;
> @@ -1147,7 +1160,7 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {
>  static void blk_aio_complete(BlkAioEmAIOCB *acb)
>  {
>      if (acb->has_returned) {
> -        bdrv_dec_in_flight(acb->common.bs);
> +        blk_dec_in_flight(acb->rwco.blk);
>          acb->common.cb(acb->common.opaque, acb->rwco.ret);
>          qemu_aio_unref(acb);
>      }
> @@ -1168,7 +1181,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
>      BlkAioEmAIOCB *acb;
>      Coroutine *co;
>  
> -    bdrv_inc_in_flight(blk_bs(blk));
> +    blk_inc_in_flight(blk);
>      acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
>      acb->rwco = (BlkRwCo) {
>          .blk    = blk,
> @@ -1405,8 +1418,16 @@ int blk_flush(BlockBackend *blk)
>  
>  void blk_drain(BlockBackend *blk)
>  {
> -    if (blk_bs(blk)) {
> -        bdrv_drain(blk_bs(blk));
> +    AioContext *ctx = blk_get_aio_context(blk);
> +
> +    while (atomic_read(&blk->in_flight)) {
> +        aio_context_acquire(ctx);
> +        aio_poll(ctx, false);
> +        aio_context_release(ctx);
> +
> +        if (blk_bs(blk)) {
> +            bdrv_drain(blk_bs(blk));
> +        }
>      }
>  }
>  
> @@ -1453,10 +1474,11 @@ static void send_qmp_error_event(BlockBackend *blk,
>                                   bool is_read, int error)
>  {
>      IoOperationType optype;
> +    BlockDriverState *bs = blk_bs(blk);
>  
>      optype = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
>      qapi_event_send_block_io_error(blk_name(blk),
> -                                   bdrv_get_node_name(blk_bs(blk)), optype,
> +                                   bs ? bdrv_get_node_name(bs) : "", optype,
>                                     action, blk_iostatus_is_enabled(blk),
>                                     error == ENOSPC, strerror(error),
>                                     &error_abort);
> diff --git a/tests/ide-test.c b/tests/ide-test.c
> index bfd79ddbdc..ffbfb04271 100644
> --- a/tests/ide-test.c
> +++ b/tests/ide-test.c
> @@ -689,6 +689,24 @@ static void test_flush_nodev(void)
>      ide_test_quit();
>  }
>  
> +static void test_flush_empty_drive(void)
> +{
> +    QPCIDevice *dev;
> +    QPCIBar bmdma_bar, ide_bar;
> +
> +    ide_test_start("-device ide-cd,bus=ide.0");
> +    dev = get_pci_device(&bmdma_bar, &ide_bar);
> +
> +    /* FLUSH CACHE command on device 0 */
> +    qpci_io_writeb(dev, ide_bar, reg_device, 0);
> +    qpci_io_writeb(dev, ide_bar, reg_command, CMD_FLUSH_CACHE);
> +
> +    /* Just testing that qemu doesn't crash... */
> +
> +    free_pci_device(dev);
> +    ide_test_quit();
> +}
> +
>  static void test_pci_retry_flush(void)
>  {
>      test_retry_flush("pc");
> @@ -954,6 +972,7 @@ int main(int argc, char **argv)
>  
>      qtest_add_func("/ide/flush", test_flush);
>      qtest_add_func("/ide/flush/nodev", test_flush_nodev);
> +    qtest_add_func("/ide/flush/empty_drive", test_flush_empty_drive);
>      qtest_add_func("/ide/flush/retry_pci", test_pci_retry_flush);
>      qtest_add_func("/ide/flush/retry_isa", test_isa_retry_flush);
>  
> diff --git a/tests/test-block-backend.c b/tests/test-block-backend.c
> new file mode 100644
> index 0000000000..5348781a42
> --- /dev/null
> +++ b/tests/test-block-backend.c
> @@ -0,0 +1,62 @@
> +/*
> + * BlockBackend tests
> + *
> + * Copyright (c) 2017 Kevin Wolf <kwolf@redhat.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "block/block.h"
> +#include "sysemu/block-backend.h"
> +#include "qapi/error.h"
> +
> +static void test_drain_aio_error_flush_cb(void *opaque, int ret)
> +{
> +    bool *completed = opaque;
> +
> +    g_assert(ret == -ENOMEDIUM);
> +    *completed = true;
> +}
> +
> +static void test_drain_aio_error(void)
> +{
> +    BlockBackend *blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
> +    BlockAIOCB *acb;
> +    bool completed = false;
> +
> +    acb = blk_aio_flush(blk, test_drain_aio_error_flush_cb, &completed);
> +    g_assert(acb != NULL);
> +    g_assert(completed == false);
> +
> +    blk_drain(blk);
> +    g_assert(completed == true);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +    bdrv_init();
> +    qemu_init_main_loop(&error_abort);
> +
> +    g_test_init(&argc, &argv, NULL);
> +
> +    g_test_add_func("/block-backend/drain_aio_error", test_drain_aio_error);
> +
> +    return g_test_run();
> +}
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 59e536bf0b..7beb65397b 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -56,6 +56,7 @@ check-unit-y += tests/test-hbitmap$(EXESUF)
>  gcov-files-test-hbitmap-y = blockjob.c
>  check-unit-y += tests/test-blockjob$(EXESUF)
>  check-unit-y += tests/test-blockjob-txn$(EXESUF)
> +check-unit-y += tests/test-block-backend$(EXESUF)
>  check-unit-y += tests/test-x86-cpuid$(EXESUF)
>  # all code tested by test-x86-cpuid is inside topology.h
>  gcov-files-test-x86-cpuid-y =
> @@ -556,6 +557,7 @@ tests/test-aio-multithread$(EXESUF): tests/test-aio-multithread.o $(test-block-o
>  tests/test-throttle$(EXESUF): tests/test-throttle.o $(test-block-obj-y)
>  tests/test-blockjob$(EXESUF): tests/test-blockjob.o $(test-block-obj-y) $(test-util-obj-y)
>  tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o $(test-block-obj-y) $(test-util-obj-y)
> +tests/test-block-backend$(EXESUF): tests/test-block-backend.o $(test-block-obj-y) $(test-util-obj-y)
>  tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
>  tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y)
>  tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) $(test-crypto-obj-y)
>
Kevin Wolf Aug. 8, 2017, 11:56 a.m. UTC | #2
Am 08.08.2017 um 13:04 hat Paolo Bonzini geschrieben:
> On 08/08/2017 12:02, Kevin Wolf wrote:
> > Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
> >> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
> >>>> the root cause of this bug is related to this as well:
> >>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
> >>>>
> >>>> From commit 99723548 we started assuming (incorrectly?) that blk_
> >>>> functions always WILL have an attached BDS, but this is not always true,
> >>>> for instance, flushing the cache from an empty CDROM.
> >>>>
> >>>> Paolo, can we move the flight counter increment outside of the
> >>>> block-backend layer, is that safe?
> >>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
> >>> regardless of the throttling timer issue discussed below.  BB cannot
> >>> assume that the BDS graph is non-empty.
> >>
> >> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
> >> attached BDS?  That would make it much easier to fix.
> > 
> > Would the proper fix be much more complicated than the following? I must
> > admit that I don't fully understand the current state of affairs with
> > respect to threading, AioContext etc. so I may well be missing
> > something.
> 
> Not much, but it's not complete either.  The issues I see are that: 1)
> blk_drain_all does not take the new counter into account;

Ok, I think this does the trick:

void blk_drain_all(void)
{
    BlockBackend *blk = NULL;

    bdrv_drain_all_begin();
    while ((blk = blk_all_next(blk)) != NULL) {
        blk_drain(blk);
    }
    bdrv_drain_all_end();
}

> 2) bdrv_drain_all callers need to be audited to see if they should be
> blk_drain_all (or more likely, only device BlockBackends should be drained).

qmp_transaction() is unclear to me. It should be changed in some way
anyway because it uses bdrv_drain_all() rather than a begin/end pair.

do_vm_stop() and vm_stop_force_state() probably want blk_drain_all().

xen_invalidate_map_cache() - wtf? Looks like the wrong layer to do this,
but I guess blk_drain_all(), too.

block_migration_cleanup() is just lazy and really means a blk_drain()
for its own BlockBackends. blk_drain_all() as the simple conversion.

migration/savevm: Migration wants blk_drain_all() to get the devices
quiesced.

qemu-io: blk_drain_all(), too.

Hm, looks like there won't be many callers of bdrv_drain_all() left. :-)

> > Note that my blk_drain() implementation doesn't necessarily drain
> > blk_bs(blk) completely, but only those requests that came from the
> > specific BlockBackend. I think this is what the callers want, but
> > if otherwise, it shouldn't be hard to change.
> 
> Yes, this should be what they want.

Apparently not; block jobs don't complete with it any more. I haven't
checked in detail, but it makes sense that they can have a BH (e.g. for
block_job_defer_to_main_loop) without a request being in flight.

So I'm including an unconditional bdrv_drain() again. Or I guess,
calling aio_poll() unconditionally and including its return value
in the loop condition would be the cleaner approach?

Kevin
Paolo Bonzini Aug. 8, 2017, 12:47 p.m. UTC | #3
On 08/08/2017 13:56, Kevin Wolf wrote:
> Am 08.08.2017 um 13:04 hat Paolo Bonzini geschrieben:
>> On 08/08/2017 12:02, Kevin Wolf wrote:
>>> Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
>>>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>>>> the root cause of this bug is related to this as well:
>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>>>
>>>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>>>> functions always WILL have an attached BDS, but this is not always true,
>>>>>> for instance, flushing the cache from an empty CDROM.
>>>>>>
>>>>>> Paolo, can we move the flight counter increment outside of the
>>>>>> block-backend layer, is that safe?
>>>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>>>> regardless of the throttling timer issue discussed below.  BB cannot
>>>>> assume that the BDS graph is non-empty.
>>>>
>>>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>>>> attached BDS?  That would make it much easier to fix.
>>>
>>> Would the proper fix be much more complicated than the following? I must
>>> admit that I don't fully understand the current state of affairs with
>>> respect to threading, AioContext etc. so I may well be missing
>>> something.
>>
>> Not much, but it's not complete either.  The issues I see are that: 1)
>> blk_drain_all does not take the new counter into account;
> 
> Ok, I think this does the trick:
> 
> void blk_drain_all(void)
> {
>     BlockBackend *blk = NULL;
> 
>     bdrv_drain_all_begin();
>     while ((blk = blk_all_next(blk)) != NULL) {
>         blk_drain(blk);
>     }
>     bdrv_drain_all_end();
> }

Small note: blk_drain_all is called while no AioContext has been
acquired, while blk_drain requires you to acquire blk's AioContext.

And this of course should be split in blk_drain_all_begin/blk_drain_all_end.

>> 2) bdrv_drain_all callers need to be audited to see if they should be
>> blk_drain_all (or more likely, only device BlockBackends should be drained).
> 
> qmp_transaction() is unclear to me. It should be changed in some way
> anyway because it uses bdrv_drain_all() rather than a begin/end pair.

I think every ->prepare should do drained_begin and ->clean should end
the section.  Most of them are doing it already.

> do_vm_stop() and vm_stop_force_state() probably want blk_drain_all().

Or just all devices.  From the October 2016 discussion, "qdev
drive properties would install a vmstate notifier to quiesce their own
BlockBackend rather than relying on a blind bdrv_drain_all from vm_stop"
(though there'll surely be some non-qdevified straggler).

> xen_invalidate_map_cache() - wtf? Looks like the wrong layer to do this,
> but I guess blk_drain_all(), too.

It seems to be an advisory operation triggered by balloon deflation.
Whatever.

> block_migration_cleanup() is just lazy and really means a blk_drain()
> for its own BlockBackends. blk_drain_all() as the simple conversion.
> 
> migration/savevm: Migration wants blk_drain_all() to get the devices
> quiesced.

It already does vm_stop though.  It doesn't seem to be necessary at all.

> qemu-io: blk_drain_all(), too.

Or just blk_drain(qemuio_blk).

> Hm, looks like there won't be many callers of bdrv_drain_all() left. :-)

Not many blk_drain_all(), and that's not a bad thing.

You missed bdrv_reopen_multiple.  Maybe it also should not do anything,
and the caller should do begin/end instead.

>>> Note that my blk_drain() implementation doesn't necessarily drain
>>> blk_bs(blk) completely, but only those requests that came from the
>>> specific BlockBackend. I think this is what the callers want, but
>>> if otherwise, it shouldn't be hard to change.
>>
>> Yes, this should be what they want.
> 
> Apparently not; block jobs don't complete with it any more. I haven't
> checked in detail, but it makes sense that they can have a BH (e.g. for
> block_job_defer_to_main_loop) without a request being in flight.

That BH should be handled here:

    while (!job->completed) {
        aio_poll(qemu_get_aio_context(), true);
    }

so that should be okay.  (Block jobs have been a huge pain every time I
touched bdrv_drain, indeed...).

> So I'm including an unconditional bdrv_drain() again. Or I guess,
> calling aio_poll() unconditionally and including its return value
> in the loop condition would be the cleaner approach?

Note that if you have no block device then you don't have an AioContext
for aio_poll() to use.  In fact, we're really close to removing the
AioContext dependency for submitting I/O requests (AioContext remains
only as a place for network drivers to register their sockets' I/O
handlers), so you shouldn't use it from block-backend.c.

That's why I suggested returning NULL from AIO operations if there is no
blk_bs(blk).  It may be a little less tidy (I'm more worried about
static analysis lossage than anything else), but it does make some
sense.  Most callers are checking blk_is_available or blk_is_inserted
already, which explains why we're not seeing more crashes in
block-backend.c.

If you can set aside the blk_bs(blk) == NULL case, you're basically
reinventing BDRV_POLL_WHILE at this point.  If you add a bdrv_wakeup to
blk_dec_in_flight, and use BDRV_POLL_WHILE in blk_drain, then things
should in theory just work (though most likely they won't for some
reason that takes hours to debug).

Paolo
diff mbox

Patch

diff --git a/block.c b/block.c
index f1c3e98a4d..07decc3b5f 100644
--- a/block.c
+++ b/block.c
@@ -4563,7 +4563,7 @@  out:
 
 AioContext *bdrv_get_aio_context(BlockDriverState *bs)
 {
-    return bs->aio_context;
+    return bs ? bs->aio_context : qemu_get_aio_context();
 }
 
 void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
diff --git a/block/block-backend.c b/block/block-backend.c
index 968438c149..649d11b4ef 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -68,6 +68,9 @@  struct BlockBackend {
     NotifierList remove_bs_notifiers, insert_bs_notifiers;
 
     int quiesce_counter;
+
+    /* Number of in-flight requests. Accessed with atomic ops. */
+    unsigned int in_flight;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -1109,6 +1112,16 @@  int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
     return bdrv_make_zero(blk->root, flags);
 }
 
+static void blk_inc_in_flight(BlockBackend *blk)
+{
+    atomic_inc(&blk->in_flight);
+}
+
+static void blk_dec_in_flight(BlockBackend *blk)
+{
+    atomic_dec(&blk->in_flight);
+}
+
 static void error_callback_bh(void *opaque)
 {
     struct BlockBackendAIOCB *acb = opaque;
@@ -1147,7 +1160,7 @@  static const AIOCBInfo blk_aio_em_aiocb_info = {
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
     if (acb->has_returned) {
-        bdrv_dec_in_flight(acb->common.bs);
+        blk_dec_in_flight(acb->rwco.blk);
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
         qemu_aio_unref(acb);
     }
@@ -1168,7 +1181,7 @@  static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
     BlkAioEmAIOCB *acb;
     Coroutine *co;
 
-    bdrv_inc_in_flight(blk_bs(blk));
+    blk_inc_in_flight(blk);
     acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
     acb->rwco = (BlkRwCo) {
         .blk    = blk,
@@ -1405,8 +1418,16 @@  int blk_flush(BlockBackend *blk)
 
 void blk_drain(BlockBackend *blk)
 {
-    if (blk_bs(blk)) {
-        bdrv_drain(blk_bs(blk));
+    AioContext *ctx = blk_get_aio_context(blk);
+
+    while (atomic_read(&blk->in_flight)) {
+        aio_context_acquire(ctx);
+        aio_poll(ctx, false);
+        aio_context_release(ctx);
+
+        if (blk_bs(blk)) {
+            bdrv_drain(blk_bs(blk));
+        }
     }
 }
 
@@ -1453,10 +1474,11 @@  static void send_qmp_error_event(BlockBackend *blk,
                                  bool is_read, int error)
 {
     IoOperationType optype;
+    BlockDriverState *bs = blk_bs(blk);
 
     optype = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
     qapi_event_send_block_io_error(blk_name(blk),
-                                   bdrv_get_node_name(blk_bs(blk)), optype,
+                                   bs ? bdrv_get_node_name(bs) : "", optype,
                                    action, blk_iostatus_is_enabled(blk),
                                    error == ENOSPC, strerror(error),
                                    &error_abort);
diff --git a/tests/ide-test.c b/tests/ide-test.c
index bfd79ddbdc..ffbfb04271 100644
--- a/tests/ide-test.c
+++ b/tests/ide-test.c
@@ -689,6 +689,24 @@  static void test_flush_nodev(void)
     ide_test_quit();
 }
 
+static void test_flush_empty_drive(void)
+{
+    QPCIDevice *dev;
+    QPCIBar bmdma_bar, ide_bar;
+
+    ide_test_start("-device ide-cd,bus=ide.0");
+    dev = get_pci_device(&bmdma_bar, &ide_bar);
+
+    /* FLUSH CACHE command on device 0 */
+    qpci_io_writeb(dev, ide_bar, reg_device, 0);
+    qpci_io_writeb(dev, ide_bar, reg_command, CMD_FLUSH_CACHE);
+
+    /* Just testing that qemu doesn't crash... */
+
+    free_pci_device(dev);
+    ide_test_quit();
+}
+
 static void test_pci_retry_flush(void)
 {
     test_retry_flush("pc");
@@ -954,6 +972,7 @@  int main(int argc, char **argv)
 
     qtest_add_func("/ide/flush", test_flush);
     qtest_add_func("/ide/flush/nodev", test_flush_nodev);
+    qtest_add_func("/ide/flush/empty_drive", test_flush_empty_drive);
     qtest_add_func("/ide/flush/retry_pci", test_pci_retry_flush);
     qtest_add_func("/ide/flush/retry_isa", test_isa_retry_flush);
 
diff --git a/tests/test-block-backend.c b/tests/test-block-backend.c
new file mode 100644
index 0000000000..5348781a42
--- /dev/null
+++ b/tests/test-block-backend.c
@@ -0,0 +1,62 @@ 
+/*
+ * BlockBackend tests
+ *
+ * Copyright (c) 2017 Kevin Wolf <kwolf@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "block/block.h"
+#include "sysemu/block-backend.h"
+#include "qapi/error.h"
+
+static void test_drain_aio_error_flush_cb(void *opaque, int ret)
+{
+    bool *completed = opaque;
+
+    g_assert(ret == -ENOMEDIUM);
+    *completed = true;
+}
+
+static void test_drain_aio_error(void)
+{
+    BlockBackend *blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+    BlockAIOCB *acb;
+    bool completed = false;
+
+    acb = blk_aio_flush(blk, test_drain_aio_error_flush_cb, &completed);
+    g_assert(acb != NULL);
+    g_assert(completed == false);
+
+    blk_drain(blk);
+    g_assert(completed == true);
+}
+
+int main(int argc, char **argv)
+{
+    bdrv_init();
+    qemu_init_main_loop(&error_abort);
+
+    g_test_init(&argc, &argv, NULL);
+
+    g_test_add_func("/block-backend/drain_aio_error", test_drain_aio_error);
+
+    return g_test_run();
+}
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 59e536bf0b..7beb65397b 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -56,6 +56,7 @@  check-unit-y += tests/test-hbitmap$(EXESUF)
 gcov-files-test-hbitmap-y = blockjob.c
 check-unit-y += tests/test-blockjob$(EXESUF)
 check-unit-y += tests/test-blockjob-txn$(EXESUF)
+check-unit-y += tests/test-block-backend$(EXESUF)
 check-unit-y += tests/test-x86-cpuid$(EXESUF)
 # all code tested by test-x86-cpuid is inside topology.h
 gcov-files-test-x86-cpuid-y =
@@ -556,6 +557,7 @@  tests/test-aio-multithread$(EXESUF): tests/test-aio-multithread.o $(test-block-o
 tests/test-throttle$(EXESUF): tests/test-throttle.o $(test-block-obj-y)
 tests/test-blockjob$(EXESUF): tests/test-blockjob.o $(test-block-obj-y) $(test-util-obj-y)
 tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o $(test-block-obj-y) $(test-util-obj-y)
+tests/test-block-backend$(EXESUF): tests/test-block-backend.o $(test-block-obj-y) $(test-util-obj-y)
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
 tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y)
 tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) $(test-crypto-obj-y)