[v8,14/16] block: Rewrite bdrv_close_all()

Message ID 1453917600-2663-15-git-send-email-mreitz@redhat.com (mailing list archive)
State New, archived

Commit Message

Max Reitz Jan. 27, 2016, 5:59 p.m. UTC
This patch rewrites bdrv_close_all(): Until now, all root BDSs have been
force-closed. This is bad because it can lead to cached data not being
flushed to disk.

Instead, try to make all reference holders relinquish their reference
voluntarily:

1. All BlockBackend users are handled by making all BBs simply eject
   their BDS tree. Since a BDS can never be on top of a BB, this will
   not cause any of the issues as seen with the force-closing of BDSs.
   The references will be relinquished and any further access to the BB
   will fail gracefully.
2. All BDSs which are owned by the monitor itself (because they do not
   have a BB) are relinquished next.
3. Besides BBs and the monitor, block jobs and other BDSs are the only
   things left that can hold a reference to BDSs. After every remaining
   block job has been canceled, there should not be any BDSs left (and
   the loop added here will always terminate (as long as NDEBUG is not
   defined), because either all_bdrv_states will be empty or there will
   not be any block job left to cancel, failing the assertion).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 45 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 37 insertions(+), 8 deletions(-)
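
In compact form, the teardown sequence implemented by the patch (see the
full diff below) is roughly:

    bdrv_drain_all();                 /* settle requests still in flight       */
    blockdev_close_all_bdrv_states(); /* drop monitor-owned BDS references     */
    blk_remove_all_bs();              /* BlockBackends eject their BDS trees   */
    /* then: repeatedly find a BDS with a job (assert one exists), cancel the
     * job, and aio_poll() until its reference is dropped */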

Comments

Fam Zheng Jan. 28, 2016, 4:17 a.m. UTC | #1
On Wed, 01/27 18:59, Max Reitz wrote:
> This patch rewrites bdrv_close_all(): Until now, all root BDSs have been
> force-closed. This is bad because it can lead to cached data not being
> flushed to disk.
> 
> Instead, try to make all reference holders relinquish their reference
> voluntarily:
> 
> 1. All BlockBackend users are handled by making all BBs simply eject
>    their BDS tree. Since a BDS can never be on top of a BB, this will
>    not cause any of the issues as seen with the force-closing of BDSs.
>    The references will be relinquished and any further access to the BB
>    will fail gracefully.
> 2. All BDSs which are owned by the monitor itself (because they do not
>    have a BB) are relinquished next.
> 3. Besides BBs and the monitor, block jobs and other BDSs are the only
>    things left that can hold a reference to BDSs. After every remaining
>    block job has been canceled, there should not be any BDSs left (and
>    the loop added here will always terminate (as long as NDEBUG is not
>    defined), because either all_bdrv_states will be empty or there will
>    not be any block job left to cancel, failing the assertion).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 45 +++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 37 insertions(+), 8 deletions(-)
> 
> diff --git a/block.c b/block.c
> index f8dd4a3..478e0db 100644
> --- a/block.c
> +++ b/block.c
> @@ -2145,9 +2145,7 @@ static void bdrv_close(BlockDriverState *bs)
>  {
>      BdrvAioNotifier *ban, *ban_next;
>  
> -    if (bs->job) {
> -        block_job_cancel_sync(bs->job);
> -    }
> +    assert(!bs->job);
>  
>      /* Disable I/O limits and drain all pending throttled requests */
>      if (bs->throttle_state) {
> @@ -2213,13 +2211,44 @@ static void bdrv_close(BlockDriverState *bs)
>  void bdrv_close_all(void)
>  {
>      BlockDriverState *bs;
> +    AioContext *aio_context;
> +    int original_refcount = 0;
>  
> -    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
> -        AioContext *aio_context = bdrv_get_aio_context(bs);
> +    /* Drop references from requests still in flight, such as canceled block
> +     * jobs whose AIO context has not been polled yet */
> +    bdrv_drain_all();
>  
> -        aio_context_acquire(aio_context);
> -        bdrv_close(bs);
> -        aio_context_release(aio_context);
> +    blockdev_close_all_bdrv_states();
> +    blk_remove_all_bs();

This (monitor before BB) doesn't match the order in the commit message (BB
before monitor).
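
For illustration, matching the commit message's order would mean swapping
the two calls (a sketch of the alternative only; the author may instead
adjust the commit message):

    blk_remove_all_bs();              /* 1. BlockBackends eject their BDS trees  */
    blockdev_close_all_bdrv_states(); /* 2. monitor-owned references are dropped */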

> +
> +    /* Cancel all block jobs */
> +    while (!QTAILQ_EMPTY(&all_bdrv_states)) {
> +        QTAILQ_FOREACH(bs, &all_bdrv_states, bs_list) {
> +            aio_context = bdrv_get_aio_context(bs);
> +
> +            aio_context_acquire(aio_context);
> +            if (bs->job) {
> +                /* So we can safely query the current refcount */
> +                bdrv_ref(bs);
> +                original_refcount = bs->refcnt;
> +
> +                block_job_cancel_sync(bs->job);
> +                aio_context_release(aio_context);
> +                break;
> +            }
> +            aio_context_release(aio_context);
> +        }
> +
> +        /* All the remaining BlockDriverStates are referenced directly or
> +         * indirectly from block jobs, so there needs to be at least one BDS
> +         * directly used by a block job */
> +        assert(bs);
> +
> +        /* Wait for the block job to release its reference */
> +        while (bs->refcnt >= original_refcount) {
> +            aio_poll(aio_context, true);

Why is this safe without acquiring aio_context? But oh wait, completion of
block jobs is deferred to a main loop BH, so I think that to release the
reference, aio_poll(qemu_get_aio_context(), ...) is the right thing to do.
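
A sketch of what that suggestion would look like in the wait loop (an
illustration only, not a tested patch; it assumes the rest of the v8 loop
stays as posted):

    /* Poll the main loop context so that the BH which completes the block
     * job (and drops its reference) actually gets a chance to run. */
    while (bs->refcnt >= original_refcount) {
        aio_poll(qemu_get_aio_context(), true);
    }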

This is also a problem in block_job_cancel_sync, which can loop forever
waiting for the job->completed flag without processing main loop BHs.
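
For context, the pattern Fam is describing in block_job_cancel_sync looks
roughly like this (a simplified sketch, not the exact QEMU code):

    /* Simplified: cancellation waits for job->completed by polling the
     * job's own AioContext.  If completion is signalled via a BH on the
     * main loop, polling only this context can spin forever. */
    block_job_cancel(job);
    while (!job->completed) {
        aio_poll(bdrv_get_aio_context(job->bs), true);
    }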

Fam

> +        }
> +        bdrv_unref(bs);
>      }
>  }
>  
> -- 
> 2.7.0
>
Max Reitz Jan. 29, 2016, 1:54 p.m. UTC | #2
On 28.01.2016 05:17, Fam Zheng wrote:
> On Wed, 01/27 18:59, Max Reitz wrote:
>> This patch rewrites bdrv_close_all(): Until now, all root BDSs have been
>> force-closed. This is bad because it can lead to cached data not being
>> flushed to disk.
>>
>> Instead, try to make all reference holders relinquish their reference
>> voluntarily:
>>
>> 1. All BlockBackend users are handled by making all BBs simply eject
>>    their BDS tree. Since a BDS can never be on top of a BB, this will
>>    not cause any of the issues as seen with the force-closing of BDSs.
>>    The references will be relinquished and any further access to the BB
>>    will fail gracefully.
>> 2. All BDSs which are owned by the monitor itself (because they do not
>>    have a BB) are relinquished next.
>> 3. Besides BBs and the monitor, block jobs and other BDSs are the only
>>    things left that can hold a reference to BDSs. After every remaining
>>    block job has been canceled, there should not be any BDSs left (and
>>    the loop added here will always terminate (as long as NDEBUG is not
>>    defined), because either all_bdrv_states will be empty or there will
>>    not be any block job left to cancel, failing the assertion).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>>  block.c | 45 +++++++++++++++++++++++++++++++++++++--------
>>  1 file changed, 37 insertions(+), 8 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index f8dd4a3..478e0db 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2145,9 +2145,7 @@ static void bdrv_close(BlockDriverState *bs)
>>  {
>>      BdrvAioNotifier *ban, *ban_next;
>>  
>> -    if (bs->job) {
>> -        block_job_cancel_sync(bs->job);
>> -    }
>> +    assert(!bs->job);
>>  
>>      /* Disable I/O limits and drain all pending throttled requests */
>>      if (bs->throttle_state) {
>> @@ -2213,13 +2211,44 @@ static void bdrv_close(BlockDriverState *bs)
>>  void bdrv_close_all(void)
>>  {
>>      BlockDriverState *bs;
>> +    AioContext *aio_context;
>> +    int original_refcount = 0;
>>  
>> -    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
>> -        AioContext *aio_context = bdrv_get_aio_context(bs);
>> +    /* Drop references from requests still in flight, such as canceled block
>> +     * jobs whose AIO context has not been polled yet */
>> +    bdrv_drain_all();
>>  
>> -        aio_context_acquire(aio_context);
>> -        bdrv_close(bs);
>> -        aio_context_release(aio_context);
>> +    blockdev_close_all_bdrv_states();
>> +    blk_remove_all_bs();
> 
> This (monitor before BB) doesn't match the order in the commit message (BB
> before monitor).

Will ask random.org whether to change the order here or in the commit
message. :-)

>> +
>> +    /* Cancel all block jobs */
>> +    while (!QTAILQ_EMPTY(&all_bdrv_states)) {
>> +        QTAILQ_FOREACH(bs, &all_bdrv_states, bs_list) {
>> +            aio_context = bdrv_get_aio_context(bs);
>> +
>> +            aio_context_acquire(aio_context);
>> +            if (bs->job) {
>> +                /* So we can safely query the current refcount */
>> +                bdrv_ref(bs);
>> +                original_refcount = bs->refcnt;
>> +
>> +                block_job_cancel_sync(bs->job);
>> +                aio_context_release(aio_context);
>> +                break;
>> +            }
>> +            aio_context_release(aio_context);
>> +        }
>> +
>> +        /* All the remaining BlockDriverStates are referenced directly or
>> +         * indirectly from block jobs, so there needs to be at least one BDS
>> +         * directly used by a block job */
>> +        assert(bs);
>> +
>> +        /* Wait for the block job to release its reference */
>> +        while (bs->refcnt >= original_refcount) {
>> +            aio_poll(aio_context, true);
> 
> Why is this safe without acquiring aio_context? But oh wait, completion of
> block jobs is deferred to a main loop BH, so I think that to release the
> reference, aio_poll(qemu_get_aio_context(), ...) is the right thing to do.

Actually, I think commit 94db6d2d30962cc0422a69c88c3b3e9981b33e50 made
this loop completely unnecessary. Will investigate.

Max

> This is also a problem in block_job_cancel_sync, which can loop forever
> waiting for the job->completed flag without processing main loop BHs.
> 
> Fam
> 
>> +        }
>> +        bdrv_unref(bs);
>>      }
>>  }
>>  
>> -- 
>> 2.7.0
>>

Patch

diff --git a/block.c b/block.c
index f8dd4a3..478e0db 100644
--- a/block.c
+++ b/block.c
@@ -2145,9 +2145,7 @@  static void bdrv_close(BlockDriverState *bs)
 {
     BdrvAioNotifier *ban, *ban_next;
 
-    if (bs->job) {
-        block_job_cancel_sync(bs->job);
-    }
+    assert(!bs->job);
 
     /* Disable I/O limits and drain all pending throttled requests */
     if (bs->throttle_state) {
@@ -2213,13 +2211,44 @@  static void bdrv_close(BlockDriverState *bs)
 void bdrv_close_all(void)
 {
     BlockDriverState *bs;
+    AioContext *aio_context;
+    int original_refcount = 0;
 
-    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
+    /* Drop references from requests still in flight, such as canceled block
+     * jobs whose AIO context has not been polled yet */
+    bdrv_drain_all();
 
-        aio_context_acquire(aio_context);
-        bdrv_close(bs);
-        aio_context_release(aio_context);
+    blockdev_close_all_bdrv_states();
+    blk_remove_all_bs();
+
+    /* Cancel all block jobs */
+    while (!QTAILQ_EMPTY(&all_bdrv_states)) {
+        QTAILQ_FOREACH(bs, &all_bdrv_states, bs_list) {
+            aio_context = bdrv_get_aio_context(bs);
+
+            aio_context_acquire(aio_context);
+            if (bs->job) {
+                /* So we can safely query the current refcount */
+                bdrv_ref(bs);
+                original_refcount = bs->refcnt;
+
+                block_job_cancel_sync(bs->job);
+                aio_context_release(aio_context);
+                break;
+            }
+            aio_context_release(aio_context);
+        }
+
+        /* All the remaining BlockDriverStates are referenced directly or
+         * indirectly from block jobs, so there needs to be at least one BDS
+         * directly used by a block job */
+        assert(bs);
+
+        /* Wait for the block job to release its reference */
+        while (bs->refcnt >= original_refcount) {
+            aio_poll(aio_context, true);
+        }
+        bdrv_unref(bs);
     }
 }