[v6,16/42] block: Flush all children in generic code

Message ID	20190809161407.11920-17-mreitz@redhat.com
State	New, archived
Headers
  Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
  From: Max Reitz <mreitz@redhat.com>
  To: qemu-block@nongnu.org
  Date: Fri, 9 Aug 2019 18:13:41 +0200
  Message-Id: <20190809161407.11920-17-mreitz@redhat.com>
  In-Reply-To: <20190809161407.11920-1-mreitz@redhat.com>
  References: <20190809161407.11920-1-mreitz@redhat.com>
  MIME-Version: 1.0
  Content-Transfer-Encoding: quoted-printable
  Subject: [Qemu-devel] [PATCH v6 16/42] block: Flush all children in generic code
  Precedence: list
  Cc: Kevin Wolf <kwolf@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>, qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>
  Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
  Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Series	block: Deal with filters
  [v6,00/42] block: Deal with filters
  [v6,01/42] block: Mark commit and mirror as filter drivers
  [v6,02/42] copy-on-read: Support compressed writes
  [v6,03/42] throttle: Support compressed writes
  [v6,04/42] block: Add child access functions
  [v6,05/42] block: Add chain helper functions
  [v6,06/42] qcow2: Implement .bdrv_storage_child()
  [v6,07/42] block: filtered_cow_child() for has_zero_init()
  [v6,08/42] block: bdrv_set_backing_hd() is about bs->backing
  [v6,09/42] block: Include filters when freezing backing chain
  [v6,10/42] block: Drop bdrv_is_encrypted()
  [v6,11/42] block: Add bdrv_supports_compressed_writes()
  [v6,12/42] block: Use bdrv_filtered_rw* where obvious
  [v6,13/42] block: Use CAFs in block status functions
  [v6,14/42] block: Use CAFs when working with backing chains
  [v6,15/42] block: Re-evaluate backing file handling in reopen
  [v6,16/42] block: Flush all children in generic code
  [v6,17/42] block: Use CAFs in bdrv_refresh_limits()
  [v6,18/42] block: Use CAFs in bdrv_refresh_filename()
  [v6,19/42] block: Use CAF in bdrv_co_rw_vmstate()
  [v6,20/42] block/snapshot: Fix fallback
  [v6,21/42] block: Use CAFs for debug breakpoints
  [v6,22/42] block: Fix bdrv_get_allocated_file_size's fallback
  [v6,23/42] blockdev: Use CAF in external_snapshot_prepare()
  [v6,24/42] block: Use child access functions for QAPI queries
  [v6,25/42] mirror: Deal with filters
  [v6,26/42] backup: Deal with filters
  [v6,27/42] commit: Deal with filters
  [v6,28/42] stream: Deal with filters
  [v6,29/42] nbd: Use CAF when looking for dirty bitmap
  [v6,30/42] qemu-img: Use child access functions
  [v6,31/42] block: Drop backing_bs()
  [v6,32/42] block: Make bdrv_get_cumulative_perm() public
  [v6,33/42] blockdev: Fix active commit choice
  [v6,34/42] block: Inline bdrv_co_block_status_from_*()
  [v6,35/42] block: Fix check_to_replace_node()
  [v6,36/42] iotests: Add tests for mirror @replaces loops
  [v6,37/42] block: Leave BDS.backing_file constant
  [v6,38/42] iotests: Let complete_and_wait() work with commit
  [v6,39/42] iotests: Add filter commit test cases
  [v6,40/42] iotests: Add filter mirror test cases
  [v6,41/42] iotests: Add test for commit in sub directory
  [v6,42/42] iotests: Test committing to overridden backing

Max Reitz Aug. 9, 2019, 4:13 p.m. UTC

If the driver does not support .bdrv_co_flush(), so that bdrv_co_flush()
itself has to flush the children of the given node, it should not flush
just bs->file->bs, but in fact all children.

In any case, the BLKDBG_EVENT() should be emitted on the primary child,
because that is where a blkdebug node would be if there is one.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/io.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)
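A minimal, self-contained C model of the fallback this patch introduces may help when reading the diff quoted in the replies. Node, Child, model_flush(), leaf_ok() and leaf_eio() are hypothetical stand-ins, not QEMU's BlockDriverState, BdrvChild or bdrv_co_flush(); a plain linked list replaces the QLIST, and coroutines, BLKDBG events and generation counters are left out. The sketch only shows two properties of the new loop: every child is flushed (not just bs->file), and the first error is the one reported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified block graph. These are NOT QEMU's
 * BlockDriverState/BdrvChild; they only model the control flow.
 */
typedef struct Node Node;

typedef struct Child {
    Node *bs;            /* the child node */
    struct Child *next;  /* stand-in for the bs->children QLIST */
} Child;

struct Node {
    int (*drv_flush)(Node *bs); /* stand-in for .bdrv_co_flush; NULL if unsupported */
    int flush_count;            /* how many times this node was flushed */
    Child *children;            /* singly linked list of children */
};

/*
 * If the driver flushes everything itself, let it do so. Otherwise
 * recurse into every child and report the first error, but keep
 * flushing the remaining children after a failure.
 */
static int model_flush(Node *bs)
{
    int ret = 0;

    assert(bs != NULL);
    bs->flush_count++;

    if (bs->drv_flush) {
        return bs->drv_flush(bs);
    }

    for (Child *c = bs->children; c != NULL; c = c->next) {
        int this_child_ret = model_flush(c->bs);
        if (!ret) {
            ret = this_child_ret; /* remember only the first error */
        }
    }
    return ret;
}

/* Leaf "protocol" drivers with fixed results, for demonstration only. */
static int leaf_ok(Node *bs)  { (void)bs; return 0; }
static int leaf_eio(Node *bs) { (void)bs; return -EIO; }
```

With a failing first child and a healthy second child, model_flush() returns -EIO and still flushes the second child, which is what the `if (!ret) { ret = this_child_ret; }` pattern in the patch achieves.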

Vladimir Sementsov-Ogievskiy Aug. 10, 2019, 3:36 p.m. UTC | #1

09.08.2019 19:13, Max Reitz wrote:
> If the driver does not support .bdrv_co_flush() so bdrv_co_flush()
> itself has to flush the children of the given node, it should not flush
> just bs->file->bs, but in fact all children.
> 
> In any case, the BLKDBG_EVENT() should be emitted on the primary child,
> because that is where a blkdebug node would be if there is any.
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/io.c | 23 +++++++++++++++++------
>   1 file changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index c5a8e3e6a3..bcc770d336 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2572,6 +2572,8 @@ static void coroutine_fn bdrv_flush_co_entry(void *opaque)
>   
>   int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>   {
> +    BdrvChild *primary_child = bdrv_primary_child(bs);
> +    BdrvChild *child;
>       int current_gen;
>       int ret = 0;
>   
> @@ -2601,7 +2603,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>       }
>   
>       /* Write back cached data to the OS even with cache=unsafe */
> -    BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_OS);
> +    BLKDBG_EVENT(primary_child, BLKDBG_FLUSH_TO_OS);
>       if (bs->drv->bdrv_co_flush_to_os) {
>           ret = bs->drv->bdrv_co_flush_to_os(bs);
>           if (ret < 0) {
> @@ -2611,15 +2613,15 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>   
>       /* But don't actually force it to the disk with cache=unsafe */
>       if (bs->open_flags & BDRV_O_NO_FLUSH) {
> -        goto flush_parent;
> +        goto flush_children;
>       }
>   
>       /* Check if we really need to flush anything */
>       if (bs->flushed_gen == current_gen) {
> -        goto flush_parent;
> +        goto flush_children;
>       }
>   
> -    BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
> +    BLKDBG_EVENT(primary_child, BLKDBG_FLUSH_TO_DISK);
>       if (!bs->drv) {
>           /* bs->drv->bdrv_co_flush() might have ejected the BDS
>            * (even in case of apparent success) */
> @@ -2663,8 +2665,17 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>       /* Now flush the underlying protocol.  It will also have BDRV_O_NO_FLUSH
>        * in the case of cache=unsafe, so there are no useless flushes.
>        */
> -flush_parent:
> -    ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
> +flush_children:
> +    ret = 0;
> +    QLIST_FOREACH(child, &bs->children, next) {
> +        int this_child_ret;
> +
> +        this_child_ret = bdrv_co_flush(child->bs);
> +        if (!ret) {
> +            ret = this_child_ret;
> +        }
> +    }

Hmm, you said that we want to flush only children with write access from the parent.
Shouldn't we check it? Or do we assume that it's always safe to call bdrv_co_flush on
a node?

> +
>   out:
>       /* Notify any pending flushes that we have completed */
>       if (ret == 0) {
>
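The check suggested above could look roughly like the following sketch, which only recurses into children on which the parent holds write permission. MODEL_PERM_WRITE, Child.perm and flush_writable_children() are hypothetical names, loosely modeled on QEMU's BdrvChild.perm field and the BLK_PERM_WRITE bit; this is not QEMU code, and the bit value is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for BLK_PERM_WRITE; the value is arbitrary. */
#define MODEL_PERM_WRITE (UINT64_C(1) << 1)

typedef struct Node Node;

typedef struct Child {
    Node *bs;            /* the child node */
    uint64_t perm;       /* permissions the parent holds on this child */
    struct Child *next;  /* next sibling */
} Child;

struct Node {
    int flush_count;     /* how many times this node was flushed */
    Child *children;     /* singly linked list of children */
};

/* Recurse only into children the parent may have written to. */
static int flush_writable_children(Node *bs)
{
    int ret = 0;

    assert(bs != NULL);
    bs->flush_count++;

    for (Child *c = bs->children; c != NULL; c = c->next) {
        if (!(c->perm & MODEL_PERM_WRITE)) {
            continue; /* e.g. a read-only backing file: nothing written through this edge */
        }
        int this_child_ret = flush_writable_children(c->bs);
        if (!ret) {
            ret = this_child_ret; /* keep the first error */
        }
    }
    return ret;
}
```

For a format node with a writable file child, a writable data-file child and a read-only backing child, only the first two are flushed; the backing node is never visited, so the recursion does not walk down the backing chain.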

Max Reitz Aug. 12, 2019, 12:58 p.m. UTC | #2

On 10.08.19 17:36, Vladimir Sementsov-Ogievskiy wrote:
> 09.08.2019 19:13, Max Reitz wrote:
>> [...]
> 
> Hmm, you said that we want to flush only children with write access from the parent.

Good that you remember it, I must have overlooked it (when reading the
replies to the previous version). :-)

> Shouldn't we check it? Or do we assume that it's always safe to call bdrv_co_flush on
> a node?

I think it’s always safe.  But checking it seems like a nice touch, yes.

Max


Kevin Wolf Sept. 5, 2019, 4:24 p.m. UTC | #3

On 12.08.2019 at 14:58, Max Reitz wrote:
> On 10.08.19 17:36, Vladimir Sementsov-Ogievskiy wrote:
> > 09.08.2019 19:13, Max Reitz wrote:
> >> [...]
> > 
> > Hmm, you said that we want to flush only children with write access from the parent.
> 
> Good that you remember it, I must have overlooked it (when reading the
> replies to the previous version). :-)
> 
> > Shouldn't we check it? Or do we assume that it's always safe to call bdrv_co_flush on
> > a node?
> 
> I think it’s always safe.  But checking it seems like a nice touch, yes.

I'm not sure why we would unconditionally flush all children anyway. The
only drivers I can think of that really need to flush more than one
child are blkverify and quorum, and both of them already implement this.
blkverify implements .bdrv_co_flush, so it's not affected by the change
anyway, but quorum children will be flushed twice now.

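The double flush described above can be reproduced in a small model, assuming that quorum's flush callback runs inside the generic path (as a .bdrv_co_flush_to_disk callback does) instead of replacing it (as .bdrv_co_flush does). Every name below is hypothetical; the hook simply flushes all children itself, the way a quorum-like driver would.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

typedef struct Child {
    Node *bs;            /* the child node */
    struct Child *next;  /* next sibling */
} Child;

struct Node {
    int (*flush_to_disk)(Node *bs); /* stand-in for .bdrv_co_flush_to_disk; may be NULL */
    int flush_count;                /* how many times this node was flushed */
    Child *children;                /* singly linked list of children */
};

static int generic_flush(Node *bs);

/* Quorum-like hook: the driver flushes every child on its own. */
static int quorum_like_flush_to_disk(Node *bs)
{
    int ret = 0;

    for (Child *c = bs->children; c != NULL; c = c->next) {
        int r = generic_flush(c->bs);
        if (!ret) {
            ret = r;
        }
    }
    return ret;
}

/* Generic path: run the driver hook, then recurse into all children (the new loop). */
static int generic_flush(Node *bs)
{
    int ret = 0;

    assert(bs != NULL);
    bs->flush_count++;

    if (bs->flush_to_disk) {
        ret = bs->flush_to_disk(bs);
        if (ret < 0) {
            return ret; /* mirrors bailing out to the "out" label on error */
        }
    }
    for (Child *c = bs->children; c != NULL; c = c->next) {
        int r = generic_flush(c->bs);
        if (!ret) {
            ret = r;
        }
    }
    return ret;
}
```

Flushing a quorum-like node with two leaf children leaves each child with flush_count == 2: once from the driver hook and once from the generic loop.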
But more than this, I'm worried about the overhead of needlessly
recursing through the whole backing chain and calling flush on every
node there.  Maybe bs->write_gen saves us so that at least this doesn't
result in an fdatasync() call for each, but still... Without a use case,
I'd rather not do this.

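The bs->write_gen point can be illustrated with a generation-counter sketch that mirrors the `bs->flushed_gen == current_gen` check in the quoted diff. Node, model_write() and model_flush() are hypothetical, and sync_calls only counts where a real protocol driver would call fdatasync(). The check saves the system call for a node that nothing was written to, but the caller still has to reach that node first, which is the remaining overhead of recursing through the whole backing chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node that carries only the generation counters. */
typedef struct Node {
    unsigned write_gen;   /* bumped on every completed write */
    unsigned flushed_gen; /* value of write_gen at the last successful flush */
    int sync_calls;       /* number of simulated fdatasync() calls */
} Node;

static void model_write(Node *bs)
{
    assert(bs != NULL);
    bs->write_gen++;
}

static int model_flush(Node *bs)
{
    unsigned current_gen;

    assert(bs != NULL);
    current_gen = bs->write_gen;
    if (bs->flushed_gen == current_gen) {
        return 0; /* nothing written since the last flush: skip the syscall */
    }
    bs->sync_calls++;             /* stand-in for the expensive fdatasync() */
    bs->flushed_gen = current_gen;
    return 0;
}
```

An untouched backing node never reaches the simulated fdatasync(), while the active node syncs once per batch of writes, no matter how often it is flushed in between.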
Oh, well, after having written all of this, I see that qcow2 with an
external data file is buggy... This could be fixed in the qcow2 driver,
but maybe restricting the recursion to read-only is actually good enough
then. Can you mention this case in the commit message and maybe build a
test for it?

Kevin

Max Reitz Sept. 9, 2019, 8:31 a.m. UTC | #4

On 05.09.19 18:24, Kevin Wolf wrote:
> [...]
> 
> Oh, well, after having written all of this, I see that qcow2 with an
> external data file is buggy... This could be fixed in the qcow2 driver,
> but maybe restricting the recursion to read-only is actually good enough
> then. Can you mention this case in the commit message and maybe build a
> test for it?

And I should thus probably drop vmdk’s .bdrv_co_flush_to_disk()
implementation.

I will indeed try to write a test, but to be completely honest, I feel
like this series is long enough.

Max

Kevin Wolf Sept. 9, 2019, 10:01 a.m. UTC | #5

On 09.09.2019 at 10:31, Max Reitz wrote:
> On 05.09.19 18:24, Kevin Wolf wrote:
> > [...]
> > 
> > Oh, well, after having written all of this, I see that qcow2 with an
> > external data file is buggy... This could be fixed in the qcow2 driver,
> > but maybe restricting the recursion to read-only is actually good enough
> > then. Can you mention this case in the commit message and maybe build a
> > test for it?
> 
> And I should thus probably drop vmdk’s .bdrv_co_flush_to_disk()
> implementation.
> 
> I will indeed try to write a test, but to be completely honest, I feel
> like this series is long enough.

I guess I could already merge patch 1 to give you space for another test
patch. ;-)

Kevin

Max Reitz Sept. 9, 2019, 2:15 p.m. UTC | #6

On 09.09.19 12:01, Kevin Wolf wrote:
> On 09.09.2019 at 10:31, Max Reitz wrote:
>> [...]
>>
>> I will indeed try to write a test, but to be completely honest, I feel
>> like this series is long enough.
> 
> I guess I could already merge patch 1 to give you space for another test
> patch. ;-)

Don’t forget the mirror-top patch.  And AFAIR, there was some comment
from Vladimir that also required an additional patch.  So it would need
to be three!

(Or I just start squashing from the back?)

Max
