
[v6,28/42] stream: Deal with filters

Message ID 20190809161407.11920-29-mreitz@redhat.com (mailing list archive)
State New, archived
Series: block: Deal with filters

Commit Message

Max Reitz Aug. 9, 2019, 4:13 p.m. UTC
Because of the recent changes that make the stream job independent of
the base node and instead track the node above it, we have to split that
"bottom" node into two cases: The bottom COW node, and the node directly
above the base node (which may be an R/W filter or the bottom COW node).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 qapi/block-core.json |  4 ++++
 block/stream.c       | 52 ++++++++++++++++++++++++++++----------------
 blockdev.c           |  2 +-
 3 files changed, 38 insertions(+), 20 deletions(-)
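
For orientation, a rough sketch of the two nodes the job now tracks, in the
spirit of the chain diagram discussed further down the thread (node names are
purely illustrative):

    /*
     * Illustrative chain, written base-to-top:
     *
     *   base <- filter (R/W) <- overlay <- ... <- top (@bs)
     *              |               |
     *          above_base    bottom_cow_node
     *
     * bottom_cow_node: the bottom COW node, i.e. the lowest COW node
     *                  above @base
     * above_base:      the node directly above @base; may be an R/W
     *                  filter, or bottom_cow_node itself
     */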

Comments

Vladimir Sementsov-Ogievskiy Aug. 12, 2019, 11:55 a.m. UTC | #1
09.08.2019 19:13, Max Reitz wrote:
> Because of the recent changes that make the stream job independent of
> the base node and instead track the node above it, we have to split that
> "bottom" node into two cases: The bottom COW node, and the node directly
> above the base node (which may be an R/W filter or the bottom COW node).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Kevin Wolf Sept. 13, 2019, 2:16 p.m. UTC | #2
On 09.08.2019 18:13, Max Reitz wrote:
> Because of the recent changes that make the stream job independent of
> the base node and instead track the node above it, we have to split that
> "bottom" node into two cases: The bottom COW node, and the node directly
> above the base node (which may be an R/W filter or the bottom COW node).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  qapi/block-core.json |  4 ++++
>  block/stream.c       | 52 ++++++++++++++++++++++++++++----------------
>  blockdev.c           |  2 +-
>  3 files changed, 38 insertions(+), 20 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 38c4dbd7c3..3c54717870 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2516,6 +2516,10 @@
>  # On successful completion the image file is updated to drop the backing file
>  # and the BLOCK_JOB_COMPLETED event is emitted.
>  #
> +# In case @device is a filter node, block-stream modifies the first non-filter
> +# overlay node below it to point to base's backing node (or NULL if @base was
> +# not specified) instead of modifying @device itself.
> +#
>  # @job-id: identifier for the newly-created block job. If
>  #          omitted, the device name will be used. (Since 2.7)
>  #
> diff --git a/block/stream.c b/block/stream.c
> index 4c8b89884a..bd4a351dae 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -31,7 +31,8 @@ enum {
>  
>  typedef struct StreamBlockJob {
>      BlockJob common;
> -    BlockDriverState *bottom;
> +    BlockDriverState *bottom_cow_node;
> +    BlockDriverState *above_base;

Confusing naming, especially because in commit you used above_base for
what is bottom_cow_node here. Vladimir already suggested using
base_overlay consistently, so we should do this here too (for
bottom_cow_node). above_base can keep its name because the different
above_base in commit is going to be renamed.

>      BlockdevOnError on_error;
>      char *backing_file_str;
>      bool bs_read_only;
> @@ -54,7 +55,7 @@ static void stream_abort(Job *job)
>  
>      if (s->chain_frozen) {
>          BlockJob *bjob = &s->common;
> -        bdrv_unfreeze_chain(blk_bs(bjob->blk), s->bottom);
> +        bdrv_unfreeze_chain(blk_bs(bjob->blk), s->above_base);
>      }
>  }
>  
> @@ -63,14 +64,15 @@ static int stream_prepare(Job *job)
>      StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>      BlockJob *bjob = &s->common;
>      BlockDriverState *bs = blk_bs(bjob->blk);
> -    BlockDriverState *base = backing_bs(s->bottom);
> +    BlockDriverState *unfiltered_bs = bdrv_skip_rw_filters(bs);
> +    BlockDriverState *base = bdrv_filtered_bs(s->above_base);
>      Error *local_err = NULL;
>      int ret = 0;
>  
> -    bdrv_unfreeze_chain(bs, s->bottom);
> +    bdrv_unfreeze_chain(bs, s->above_base);
>      s->chain_frozen = false;
>  
> -    if (bs->backing) {
> +    if (bdrv_filtered_cow_child(unfiltered_bs)) {
>          const char *base_id = NULL, *base_fmt = NULL;
>          if (base) {
>              base_id = s->backing_file_str;
> @@ -78,8 +80,8 @@ static int stream_prepare(Job *job)
>                  base_fmt = base->drv->format_name;
>              }
>          }
> -        bdrv_set_backing_hd(bs, base, &local_err);
> -        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
> +        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
> +        ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt);
>          if (local_err) {
>              error_report_err(local_err);
>              return -EPERM;
> @@ -110,7 +112,8 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>      StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>      BlockBackend *blk = s->common.blk;
>      BlockDriverState *bs = blk_bs(blk);
> -    bool enable_cor = !backing_bs(s->bottom);
> +    BlockDriverState *unfiltered_bs = bdrv_skip_rw_filters(bs);
> +    bool enable_cor = !bdrv_filtered_bs(s->above_base);
>      int64_t len;
>      int64_t offset = 0;
>      uint64_t delay_ns = 0;
> @@ -119,7 +122,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>      int64_t n = 0; /* bytes */
>      void *buf;
>  
> -    if (bs == s->bottom) {
> +    if (unfiltered_bs == s->bottom_cow_node) {
>          /* Nothing to stream */
>          return 0;
>      }
> @@ -154,13 +157,14 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>  
>          copy = false;
>  
> -        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &n);
> +        ret = bdrv_is_allocated(unfiltered_bs, offset, STREAM_BUFFER_SIZE, &n);
>          if (ret == 1) {
>              /* Allocated in the top, no need to copy.  */
>          } else if (ret >= 0) {
>              /* Copy if allocated in the intermediate images.  Limit to the
>               * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
> -            ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom, true,
> +            ret = bdrv_is_allocated_above(bdrv_filtered_cow_bs(unfiltered_bs),
> +                                          s->bottom_cow_node, true,
>                                            offset, n, &n);
>              /* Finish early if end of backing file has been reached */
>              if (ret == 0 && n == 0) {
> @@ -231,9 +235,16 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>      BlockDriverState *iter;
>      bool bs_read_only;
>      int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
> -    BlockDriverState *bottom = bdrv_find_overlay(bs, base);
> +    BlockDriverState *bottom_cow_node = bdrv_find_overlay(bs, base);
> +    BlockDriverState *above_base;

Do we need to check for bottom_cow_node == NULL?

I think you could get a bs that is a filter of bottom_cow_node, and then
bdrv_find_overlay() returns NULL and...

> -    if (bdrv_freeze_chain(bs, bottom, errp) < 0) {
> +    /* Find the node directly above @base */
> +    for (above_base = bottom_cow_node;
> +         bdrv_filtered_bs(above_base) != base;
> +         above_base = bdrv_filtered_bs(above_base))
> +    {}

...bottom_cow_node == NULL turns this into an endless loop.
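
A minimal sketch of the kind of guard this would need in stream_start()
(illustrative only, not part of the patch; whether to set an error or just
return silently is an open question -- Max's reply below calls it a "nothing
to stream" case):

    BlockDriverState *bottom_cow_node = bdrv_find_overlay(bs, base);
    BlockDriverState *above_base;

    if (!bottom_cow_node) {
        /* @bs is only a chain of filters away from @base: there is no
         * COW node to stream into, so the job has nothing to do. */
        error_setg(errp, "Nothing to stream");
        return;
    }

    /* Find the node directly above @base, as in the patch */
    for (above_base = bottom_cow_node;
         bdrv_filtered_bs(above_base) != base;
         above_base = bdrv_filtered_bs(above_base))
    {}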

> +    if (bdrv_freeze_chain(bs, above_base, errp) < 0) {
>          return;
>      }

Hm... This feels odd. There are two places where stopping to freeze the
chain would make obvious sense: At base, like we originally did; or at
base_overlay, like we (intend to) do since commit c624b015, because we
say that we don't actually mind if the user replaces the base image. I
don't see how stopping at the first filter above base makes sense.

So should this use bottom_cow_node/base_overlay instead of above_base?

You couldn't use StreamBlockJob.above_base any more then because it
could change, but you also don't really need it anywhere. It's only used
for unfreezing (which would change) and for finding the base (you can
still find bdrv_backing_chain_next(s->base_overlay)). I guess this would
even be a code simplification.
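
A rough sketch of what that simplification could look like (illustrative
only; s->base_overlay is the bottom_cow_node field under its suggested new
name, and bdrv_backing_chain_next() is the helper mentioned above):

    /* Only s->base_overlay would be kept in the job: the chain is
     * (un)frozen up to it, and the base is recomputed from it on demand. */
    bdrv_unfreeze_chain(bs, s->base_overlay);
    base = bdrv_backing_chain_next(s->base_overlay);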

> @@ -261,16 +272,19 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>       * disappear from the chain after this operation. The streaming job reads
>       * every block only once, assuming that it doesn't change, so forbid writes
>       * and resizes. Reassign the base node pointer because the backing BS of the
> -     * bottom node might change after the call to bdrv_reopen_set_read_only()
> -     * due to parallel block jobs running.
> +     * above_base node might change after the call to
> +     * bdrv_reopen_set_read_only() due to parallel block jobs running.
>       */
> -    base = backing_bs(bottom);
> -    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
> +    base = bdrv_filtered_bs(above_base);

We just calculated above_base such that it's the parent of base. Why
would base not already have the value we're assigning it again here?

> +    for (iter = bdrv_filtered_bs(bs); iter && iter != base;
> +         iter = bdrv_filtered_bs(iter))
> +    {
>          block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>                             basic_flags, &error_abort);
>      }
>  
> -    s->bottom = bottom;
> +    s->bottom_cow_node = bottom_cow_node;
> +    s->above_base = above_base;
>      s->backing_file_str = g_strdup(backing_file_str);
>      s->bs_read_only = bs_read_only;
>      s->chain_frozen = true;
> @@ -284,5 +298,5 @@ fail:
>      if (bs_read_only) {
>          bdrv_reopen_set_read_only(bs, true, NULL);
>      }
> -    bdrv_unfreeze_chain(bs, bottom);
> +    bdrv_unfreeze_chain(bs, above_base);
>  }

Kevin
Max Reitz Sept. 16, 2019, 9:52 a.m. UTC | #3
On 13.09.19 16:16, Kevin Wolf wrote:
> On 09.08.2019 18:13, Max Reitz wrote:
>> Because of the recent changes that make the stream job independent of
>> the base node and instead track the node above it, we have to split that
>> "bottom" node into two cases: The bottom COW node, and the node directly
>> above the base node (which may be an R/W filter or the bottom COW node).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  qapi/block-core.json |  4 ++++
>>  block/stream.c       | 52 ++++++++++++++++++++++++++++----------------
>>  blockdev.c           |  2 +-
>>  3 files changed, 38 insertions(+), 20 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 38c4dbd7c3..3c54717870 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2516,6 +2516,10 @@
>>  # On successful completion the image file is updated to drop the backing file
>>  # and the BLOCK_JOB_COMPLETED event is emitted.
>>  #
>> +# In case @device is a filter node, block-stream modifies the first non-filter
>> +# overlay node below it to point to base's backing node (or NULL if @base was
>> +# not specified) instead of modifying @device itself.
>> +#
>>  # @job-id: identifier for the newly-created block job. If
>>  #          omitted, the device name will be used. (Since 2.7)
>>  #
>> diff --git a/block/stream.c b/block/stream.c
>> index 4c8b89884a..bd4a351dae 100644
>> --- a/block/stream.c
>> +++ b/block/stream.c
>> @@ -31,7 +31,8 @@ enum {
>>  
>>  typedef struct StreamBlockJob {
>>      BlockJob common;
>> -    BlockDriverState *bottom;
>> +    BlockDriverState *bottom_cow_node;
>> +    BlockDriverState *above_base;
> 
> Confusing naming, especially because in commit you used above_base for
> what is bottom_cow_node here. Vladimir already suggested using
> base_overlay consistently, so we should do this here too (for
> bottom_cow_node). above_base can keep its name because the different
> above_base in commit is going to be renamed.

Sure.

>>      BlockdevOnError on_error;
>>      char *backing_file_str;
>>      bool bs_read_only;
>> @@ -54,7 +55,7 @@ static void stream_abort(Job *job)
>>  
>>      if (s->chain_frozen) {
>>          BlockJob *bjob = &s->common;
>> -        bdrv_unfreeze_chain(blk_bs(bjob->blk), s->bottom);
>> +        bdrv_unfreeze_chain(blk_bs(bjob->blk), s->above_base);
>>      }
>>  }
>>  
>> @@ -63,14 +64,15 @@ static int stream_prepare(Job *job)
>>      StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>      BlockJob *bjob = &s->common;
>>      BlockDriverState *bs = blk_bs(bjob->blk);
>> -    BlockDriverState *base = backing_bs(s->bottom);
>> +    BlockDriverState *unfiltered_bs = bdrv_skip_rw_filters(bs);
>> +    BlockDriverState *base = bdrv_filtered_bs(s->above_base);
>>      Error *local_err = NULL;
>>      int ret = 0;
>>  
>> -    bdrv_unfreeze_chain(bs, s->bottom);
>> +    bdrv_unfreeze_chain(bs, s->above_base);
>>      s->chain_frozen = false;
>>  
>> -    if (bs->backing) {
>> +    if (bdrv_filtered_cow_child(unfiltered_bs)) {
>>          const char *base_id = NULL, *base_fmt = NULL;
>>          if (base) {
>>              base_id = s->backing_file_str;
>> @@ -78,8 +80,8 @@ static int stream_prepare(Job *job)
>>                  base_fmt = base->drv->format_name;
>>              }
>>          }
>> -        bdrv_set_backing_hd(bs, base, &local_err);
>> -        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
>> +        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
>> +        ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt);
>>          if (local_err) {
>>              error_report_err(local_err);
>>              return -EPERM;
>> @@ -110,7 +112,8 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>>      StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>      BlockBackend *blk = s->common.blk;
>>      BlockDriverState *bs = blk_bs(blk);
>> -    bool enable_cor = !backing_bs(s->bottom);
>> +    BlockDriverState *unfiltered_bs = bdrv_skip_rw_filters(bs);
>> +    bool enable_cor = !bdrv_filtered_bs(s->above_base);
>>      int64_t len;
>>      int64_t offset = 0;
>>      uint64_t delay_ns = 0;
>> @@ -119,7 +122,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>>      int64_t n = 0; /* bytes */
>>      void *buf;
>>  
>> -    if (bs == s->bottom) {
>> +    if (unfiltered_bs == s->bottom_cow_node) {
>>          /* Nothing to stream */
>>          return 0;
>>      }
>> @@ -154,13 +157,14 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
>>  
>>          copy = false;
>>  
>> -        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &n);
>> +        ret = bdrv_is_allocated(unfiltered_bs, offset, STREAM_BUFFER_SIZE, &n);
>>          if (ret == 1) {
>>              /* Allocated in the top, no need to copy.  */
>>          } else if (ret >= 0) {
>>              /* Copy if allocated in the intermediate images.  Limit to the
>>               * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
>> -            ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom, true,
>> +            ret = bdrv_is_allocated_above(bdrv_filtered_cow_bs(unfiltered_bs),
>> +                                          s->bottom_cow_node, true,
>>                                            offset, n, &n);
>>              /* Finish early if end of backing file has been reached */
>>              if (ret == 0 && n == 0) {
>> @@ -231,9 +235,16 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>>      BlockDriverState *iter;
>>      bool bs_read_only;
>>      int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
>> -    BlockDriverState *bottom = bdrv_find_overlay(bs, base);
>> +    BlockDriverState *bottom_cow_node = bdrv_find_overlay(bs, base);
>> +    BlockDriverState *above_base;
> 
> Do we need to check for bottom_cow_node == NULL?
> 
> I think you could get a bs that is a filter of bottom_cow_node, and then
> bdrv_find_overlay() returns NULL and...

Ah, yes.  It isn’t even about the infinite loop, it’s just a case of
“Nothing to stream” (if @bs is just a filter chain away from @base).

Also, I just noticed that bdrv_find_overlay() in the version of this
series won’t work if the BDS passed to it (@base here) is a filter, so
that’s something else to be fixed.

>> -    if (bdrv_freeze_chain(bs, bottom, errp) < 0) {
>> +    /* Find the node directly above @base */
>> +    for (above_base = bottom_cow_node;
>> +         bdrv_filtered_bs(above_base) != base;
>> +         above_base = bdrv_filtered_bs(above_base))
>> +    {}
> 
> ...bottom_cow_node == NULL turns this into an endless loop.
> 
>> +    if (bdrv_freeze_chain(bs, above_base, errp) < 0) {
>>          return;
>>      }
> 
> Hm... This feels odd. There are two places where stopping to freeze the
> chain would make obvious sense: At base, like we originally did; or at
> base_overlay, like we (intend to) do since commit c624b015, because we
> say that we don't actually mind if the user replaces the base image. I
> don't see how stopping at the first filter above base makes sense.
> 
> So should this use bottom_cow_node/base_overlay instead of above_base?

I suppose I thought “Better be safe than sorry”.

> You couldn't use StreamBlockJob.above_base any more then because it
> could change, but you also don't really need it anywhere. It's only used
> for unfreezing (which would change) and for finding the base (you can
> still find bdrv_backing_chain_next(s->base_overlay)). I guess this would
> even be a code simplification.

Great, I’ll see to it.

>> @@ -261,16 +272,19 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>>       * disappear from the chain after this operation. The streaming job reads
>>       * every block only once, assuming that it doesn't change, so forbid writes
>>       * and resizes. Reassign the base node pointer because the backing BS of the
>> -     * bottom node might change after the call to bdrv_reopen_set_read_only()
>> -     * due to parallel block jobs running.
>> +     * above_base node might change after the call to
>> +     * bdrv_reopen_set_read_only() due to parallel block jobs running.
>>       */
>> -    base = backing_bs(bottom);
>> -    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
>> +    base = bdrv_filtered_bs(above_base);
> 
> We just calculated above_base such that it's the parent of base. Why
> would base not already have the value we're assigning it again here?

That’s no change to existing code, whose reasoning is explained in the
comment above: bdrv_reopen_set_read_only() can yield, which might lead
to children of the bottom node changing.

If you feel like either that’s superfluous, or that if something like
that were to happen we’d have much bigger problems, be my guest to drop
both.

But in this series I’d rather just not change it.

Max
Kevin Wolf Sept. 16, 2019, 2:47 p.m. UTC | #4
On 16.09.2019 11:52, Max Reitz wrote:
> On 13.09.19 16:16, Kevin Wolf wrote:
> > On 09.08.2019 18:13, Max Reitz wrote:
> >> @@ -261,16 +272,19 @@ void stream_start(const char *job_id, BlockDriverState *bs,
> >>       * disappear from the chain after this operation. The streaming job reads
> >>       * every block only once, assuming that it doesn't change, so forbid writes
> >>       * and resizes. Reassign the base node pointer because the backing BS of the
> >> -     * bottom node might change after the call to bdrv_reopen_set_read_only()
> >> -     * due to parallel block jobs running.
> >> +     * above_base node might change after the call to
> >> +     * bdrv_reopen_set_read_only() due to parallel block jobs running.
> >>       */
> >> -    base = backing_bs(bottom);
> >> -    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
> >> +    base = bdrv_filtered_bs(above_base);
> > 
> > We just calculated above_base such that it's the parent of base. Why
> > would base not already have the value we're assigning it again here?
> 
> That’s no change to existing code, whose reasoning is explained in the
> comment above: bdrv_reopen_set_read_only() can yield, which might lead
> to children of the bottom node changing.
> 
> If you feel like either that’s superfluous, or that if something like
> that were to happen we’d have much bigger problems, be my guest to drop
> both.
> 
> But in this series I’d rather just not change it.

Ah, you mean comments are there to be read?

But actually, I think iterating down to base is too much anyway. The
reasoning in the comment for block_job_add_bdrv() is that the nodes will
be dropped at the end. But base with all of its filters will be kept
after this patch.

So I think the for loop should stop after bs->base_overlay. And then
concurrently changing links aren't even a problem any more because
that's exactly the place up to which we've frozen the chain.

Kevin
Max Reitz Dec. 11, 2019, 12:52 p.m. UTC | #5
On 16.09.19 11:52, Max Reitz wrote:
> On 13.09.19 16:16, Kevin Wolf wrote:
>> On 09.08.2019 18:13, Max Reitz wrote:
>>> Because of the recent changes that make the stream job independent of
>>> the base node and instead track the node above it, we have to split that
>>> "bottom" node into two cases: The bottom COW node, and the node directly
>>> above the base node (which may be an R/W filter or the bottom COW node).
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>  qapi/block-core.json |  4 ++++
>>>  block/stream.c       | 52 ++++++++++++++++++++++++++++----------------
>>>  blockdev.c           |  2 +-
>>>  3 files changed, 38 insertions(+), 20 deletions(-)


[...]

>>> +    if (bdrv_freeze_chain(bs, above_base, errp) < 0) {
>>>          return;
>>>      }
>>
>> Hm... This feels odd. There are two places where stopping to freeze the
>> chain would make obvious sense: At base, like we originally did; or at
>> base_overlay, like we (intend to) do since commit c624b015, because we
>> say that we don't actually mind if the user replaces the base image. I
>> don't see how stopping at the first filter above base makes sense.
>>
>> So should this use bottom_cow_node/base_overlay instead of above_base?
> 
> I suppose I thought “Better be safe than sorry”.
> 
>> You couldn't use StreamBlockJob.above_base any more then because it
>> could change, but you also don't really need it anywhere. It's only used
>> for unfreezing (which would change) and for finding the base (you can
>> still find bdrv_backing_chain_next(s->base_overlay)). I guess this would
>> even be a code simplification.
> 
> Great, I’ll see to it.

On second thought (yes, I know, it’s been a couple of months...) I’m not
so sure.

If @base is a filter, then bdrv_backing_chain_next(s->base_overlay) will
not return it.  So then the filter will be dropped, but that probably
isn’t what the user intended.

(In fact, the block-stream doc says “When streaming completes the image
file will have the base file as its backing file.”)

So now this gets hairy.  We do want exactly @base as the backing file
unless the user changed the graph.  But how can we detect that and if it
hasn’t changed find @base again?

What this patch did in this version worked because the graph was frozen
until @above_base.

Alternatively, we could store a pointer to @base directly (or its node
name) and then see whether there is some node between s->base_overlay
and bdrv_backing_chain_next(s->base_overlay) that matches that at the
end of streaming.

Well, actually, a pointer won’t work because of course maybe that node
was deleted and the area is now used for an unrelated node that the user
doesn’t want as the new backing file.

The node name could actually work, though.  I mean, if there is some
node in the immediate backing filter chain of base_overlay with that
node name after streaming, it does make sense to assume that the user
wants this to be the backing file; regardless of whether that’s exactly
the same node as it was at the start of streaming.
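
A rough sketch of such a lookup at job completion (purely illustrative;
s->base_node_name is a hypothetical field that would have to be stored when
the job is created):

    /* Walk the filter chain directly below s->base_overlay, down to and
     * including the next COW node, and use the node that still carries
     * the remembered node name (if there is one) as the new backing node. */
    BlockDriverState *next_cow = bdrv_backing_chain_next(s->base_overlay);
    BlockDriverState *new_base = NULL;
    BlockDriverState *iter;

    for (iter = bdrv_filtered_bs(s->base_overlay); iter;
         iter = bdrv_filtered_bs(iter))
    {
        if (!strcmp(iter->node_name, s->base_node_name)) {
            new_base = iter;
            break;
        }
        if (iter == next_cow) {
            break; /* left the filter chain immediately below base_overlay */
        }
    }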

But we now also have to think about what to do when there is no node
with such a node name.  Should we keep all filters below base_overlay?
Or should we drop all of them?  I actually think keeping them is the
safer choice...

Max
Kevin Wolf Dec. 11, 2019, 3:52 p.m. UTC | #6
On 11.12.2019 13:52, Max Reitz wrote:
> On 16.09.19 11:52, Max Reitz wrote:
> > On 13.09.19 16:16, Kevin Wolf wrote:
> >> On 09.08.2019 18:13, Max Reitz wrote:
> >>> Because of the recent changes that make the stream job independent of
> >>> the base node and instead track the node above it, we have to split that
> >>> "bottom" node into two cases: The bottom COW node, and the node directly
> >>> above the base node (which may be an R/W filter or the bottom COW node).
> >>>
> >>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>> ---
> >>>  qapi/block-core.json |  4 ++++
> >>>  block/stream.c       | 52 ++++++++++++++++++++++++++++----------------
> >>>  blockdev.c           |  2 +-
> >>>  3 files changed, 38 insertions(+), 20 deletions(-)
> 
> 
> [...]
> 
> >>> +    if (bdrv_freeze_chain(bs, above_base, errp) < 0) {
> >>>          return;
> >>>      }
> >>
> >> Hm... This feels odd. There are two places where stopping to freeze the
> >> chain would make obvious sense: At base, like we originally did; or at
> >> base_overlay, like we (intend to) do since commit c624b015, because we
> >> say that we don't actually mind if the user replaces the base image. I
> >> don't see how stopping at the first filter above base makes sense.
> >>
> >> So should this use bottom_cow_node/base_overlay instead of above_base?
> > 
> > I suppose I thought “Better be safe than sorry”.
> > 
> >> You couldn't use StreamBlockJob.above_base any more then because it
> >> could change, but you also don't really need it anywhere. It's only used
> >> for unfreezing (which would change) and for finding the base (you can
> >> still find bdrv_backing_chain_next(s->base_overlay)). I guess this would
> >> even be a code simplification.
> > 
> > Great, I’ll see to it.
> 
> On second thought (yes, I know, it’s been a couple of months...) I’m not
> so sure.
> 
> If @base is a filter, then bdrv_backing_chain_next(s->base_overlay) will
> not return it.  So then the filter will be dropped, but that probably
> isn’t what the user intended.
> 
> (In fact, the block-stream doc says “When streaming completes the image
> file will have the base file as its backing file.”)

Hm... Okay, let's try an example:

    ... <- backing <- filter1 <- filter2 <- filter3 <- top
                         |         |                    |
                      @base    above_base            @device
                                                   base_overlay


The expected result after the job has completed is:

    ... <- backing <- filter1 <- top

This means that filter1 must not go away until the job has completed. In
other words, we would need to freeze the link between @base and
above_base.

If we use backing_bs(above_base) as we currently do, we wouldn't
necessarily get filter1 as the new backing child of top (as the
documentation promises), but whatever is the backing child of filter2
when the job completes. In other words, documentation and code don't
match currently.

Let's look at a few more examples to see which of the options makes more
sense:

1. Assume we drop filter1 while the stream job is running, i.e. backing
   is now the backing child of filter 2. I think it's obvious that when
   the stream job completes, you want backing to be the new backing
   child of top and not add filter1 back to the chain.

2. If we add filter1b between filter1 and filter2, do we want to drop it
   when the job completes? It's not entirely clear, but there are
   probably cases where keeping it makes sense. The user can always drop
   it manually again, but if we delete a node, some requests will go
   through unfiltered.

3. Imagine we replace filter1 wholesale, for example because a
   concurrently running mirror job performs a storage migration for
   everything up to filter1. Do we then want to reinstate the whole old
   subtree when the stream job completes? Certainly not.

So from this I would probably conclude that the bug is the
documentation, not necessarily in the code.

> So now this gets hairy.  We do want exactly @base as the backing file
> unless the user changed the graph.  But how can we detect that and if it
> hasn’t changed find @base again?
> 
> What this patch did in this version worked because the graph was frozen
> until @above_base.

No, it didn't provide the promised semantics, because "unless the user
changed the graph" isn't part of the documentation. But the promised
semantics are probably not what we want, so it's okay.

> Alternatively, we could store a pointer to @base directly (or its node
> name) and then see whether there is some node between s->base_overlay
> and bdrv_backing_chain_next(s->base_overlay) that matches that at the
> end of streaming.
> 
> Well, actually, a pointer won’t work because of course maybe that node
> was deleted and the area is now used for an unrelated node that the user
> doesn’t want as the new backing file.
> 
> The node name could actually work, though.  I mean, if there is some
> node in the immediate backing filter chain of base_overlay with that
> node name after streaming, it does make sense to assume that the user
> wants this to be the backing file; regardless of whether that’s exactly
> the same node as it was at the start of streaming.
> 
> But we now also have to think about what to do when there is no node
> with such a node name.  Should we keep all filters below base_overlay?
> Or should we drop all of them?  I actually think keeping them is the
> safer choice...

Using node names feels completely wrong to me. If we can't keep a
pointer (with a reference) because we want the user to be able to
delete the node, then we certainly can't keep the node name either
because that could refer to an entirely different node when the job
completes.

I don't think it's useful to waste too much thought on how to implement
the behaviour required by the documentation. The requirement seems to be
just wrong.

So in the end, maybe your current code is fine, and the way to address
the "doesn't make obvious sense" part is having a comment that explains
why above_base is where we stop freezing.

Kevin
Max Reitz Dec. 11, 2019, 4:12 p.m. UTC | #7
On 11.12.19 16:52, Kevin Wolf wrote:
> On 11.12.2019 13:52, Max Reitz wrote:
>> On 16.09.19 11:52, Max Reitz wrote:
>>> On 13.09.19 16:16, Kevin Wolf wrote:
>>>> On 09.08.2019 18:13, Max Reitz wrote:
>>>>> Because of the recent changes that make the stream job independent of
>>>>> the base node and instead track the node above it, we have to split that
>>>>> "bottom" node into two cases: The bottom COW node, and the node directly
>>>>> above the base node (which may be an R/W filter or the bottom COW node).
>>>>>
>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>> ---
>>>>>  qapi/block-core.json |  4 ++++
>>>>>  block/stream.c       | 52 ++++++++++++++++++++++++++++----------------
>>>>>  blockdev.c           |  2 +-
>>>>>  3 files changed, 38 insertions(+), 20 deletions(-)
>>
>>
>> [...]
>>
>>>>> +    if (bdrv_freeze_chain(bs, above_base, errp) < 0) {
>>>>>          return;
>>>>>      }
>>>>
>>>> Hm... This feels odd. There are two places where stopping to freeze the
>>>> chain would make obvious sense: At base, like we originally did; or at
>>>> base_overlay, like we (intend to) do since commit c624b015, because we
>>>> say that we don't actually mind if the user replaces the base image. I
>>>> don't see how stopping at the first filter above base makes sense.
>>>>
>>>> So should this use bottom_cow_node/base_overlay instead of above_base?
>>>
>>> I suppose I thought “Better be safe than sorry”.
>>>
>>>> You couldn't use StreamBlockJob.above_base any more then because it
>>>> could change, but you also don't really need it anywhere. It's only used
>>>> for unfreezing (which would change) and for finding the base (you can
>>>> still find bdrv_backing_chain_next(s->base_overlay)). I guess this would
>>>> even be a code simplification.
>>>
>>> Great, I’ll see to it.
>>
>> On second thought (yes, I know, it’s been a couple of months...) I’m not
>> so sure.
>>
>> If @base is a filter, then bdrv_backing_chain_next(s->base_overlay) will
>> not return it.  So then the filter will be dropped, but that probably
>> isn’t what the user intended.
>>
>> (In fact, the block-stream doc says “When streaming completes the image
>> file will have the base file as its backing file.”)
> 
> Hm... Okay, let's try an example:
> 
>     ... <- backing <- filter1 <- filter2 <- filter3 <- top
>                          |         |                    |
>                       @base    above_base            @device
>                                                    base_overlay
> 
> 
> The expected result after the job has completed is:
> 
>     ... <- backing <- filter1 <- top
> 
> This means that filter1 must not go away until the job has completed. In
> other words, we would need to freeze the link between @base and
> above_base.
> 
> If we use backing_bs(above_base) as we currently do, we wouldn't
> necessarily get filter1 as the new backing child of top (as the
> documentation promises), but whatever is the backing child of filter2
> when the job completes. In other words, documentation and code don't
> match currently.

Correct.

> Let's look at a few more examples to see which of the options makes more
> sense:
> 
> 1. Assume we drop filter1 while the stream job is running, i.e. backing
>    is now the backing child of filter 2. I think it's obvious that when
>    the stream job completes, you want backing to be the new backing
>    child of top and not add filter1 back to the chain.
> 
> 2. If we add filter1b between filter1 and filter2, do we want to drop it
>    when the job completes? It's not entirely clear, but there are
>    probably cases where keeping it makes sense. The user can always drop
>    it manually again, but if we delete a node, some requests will go
>    through unfiltered.

Depends.  If the base-node was given to be literally "backing", then I’d
say the user wants us to use "backing" as the new base still.

> 3. Imagine we replace filter1 wholesale, for example because a
>    concurrently running mirror job performs a storage migration for
>    everything up to filter1. Do we then want to reinstate the whole old
>    subtree when the stream job completes? Certainly not.

I’m not sure.  Again, if I as the user specified the "backing" node-name
as the base, I’d expect that to be the new backing file in all cases.

> So from this I would probably conclude that the bug is the
> documentation, not necessarily in the code.

It certainly is true that it does not address what happens when you
meddle with base.

(Except it can be argued (and I suppose I did argue that) that as it is
it does say the base file should be the backing file after stream, so if
the base file is still there, I do interpret that to mean that the
stream job always uses that as the backing node.  Of course even if it
says that, that doesn’t mean that it makes sense.  And if it doesn’t
make sense, then that’s the definition of a bug in the documentation, yes.)

>> So now this gets hairy.  We do want exactly @base as the backing file
>> unless the user changed the graph.  But how can we detect that and if it
>> hasn’t changed find @base again?
>>
>> What this patch did in this version worked because the graph was frozen
>> until @above_base.
> 
> No, it didn't provide the promised semantics, because "unless the user
> changed the graph" isn't part of the documentation. But the promised
> semantics are probably not what we want, so it's okay.

I would have said that no semantics are promised for when the node is no
longer a valid base, so we can basically do what we think makes sense.

>> Alternatively, we could store a pointer to @base directly (or its node
> >> name) and then see whether there is some node between s->base_overlay
>> and bdrv_backing_chain_next(s->base_overlay) that matches that at the
>> end of streaming.
>>
>> Well, actually, a pointer won’t work because of course maybe that node
>> was deleted and the area is now used for an unrelated node that the user
>> doesn’t want as the new backing file.
>>
>> The node name could actually work, though.  I mean, if there is some
>> node in the immediate backing filter chain of base_overlay with that
>> node name after streaming, it does make sense to assume that the user
>> wants this to be the backing file; regardless of whether that’s exactly
>> the same node as it was at the start of streaming.
>>
>> But we now also have to think about what to do when there is no node
>> with such a node name.  Should we keep all filters below base_overlay?
>> Or should we drop all of them?  I actually think keeping them is the
>> safer choice...
> 
> Using node names feels completely wrong to me. If we can't keep a
> pointer (with a reference) because we want the user to be able to
> delete the node, then we certainly can't keep the node name either
> because that could refer to an entirely different node when the job
> completes.

I was thinking that if the user does graph manipulation, we can expect
them to have given base-node.  So they actually did refer to a node name
and it makes sense to me to keep looking it up.

(As a side note: I originally intended to say “storing a pointer or the
node-name are the only things that come to mind, but obviously both are
stupid” – but then I changed my mind and realized that the node name
maybe actually isn’t that stupid.)

> I don't think it's useful to waste too much thought on how to implement
> the behaviour required by the documentation. The requirement seems to be
> just wrong.
> 
> So in the end, maybe your current code is fine, and the way to address
> the "doesn't make obvious sense" part is having a comment that explains
> why above_base is where we stop freezing.

Let’s maybe talk about it tomorrow. :-)

Max
Kevin Wolf Dec. 11, 2019, 4:35 p.m. UTC | #8
On 11.12.2019 17:12, Max Reitz wrote:
> On 11.12.19 16:52, Kevin Wolf wrote:
> > On 11.12.2019 13:52, Max Reitz wrote:
> >> On 16.09.19 11:52, Max Reitz wrote:
> >>> On 13.09.19 16:16, Kevin Wolf wrote:
> >>>> On 09.08.2019 18:13, Max Reitz wrote:
> >>>>> Because of the recent changes that make the stream job independent of
> >>>>> the base node and instead track the node above it, we have to split that
> >>>>> "bottom" node into two cases: The bottom COW node, and the node directly
> >>>>> above the base node (which may be an R/W filter or the bottom COW node).
> >>>>>
> >>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>>> ---
> >>>>>  qapi/block-core.json |  4 ++++
> >>>>>  block/stream.c       | 52 ++++++++++++++++++++++++++++----------------
> >>>>>  blockdev.c           |  2 +-
> >>>>>  3 files changed, 38 insertions(+), 20 deletions(-)
> >>
> >>
> >> [...]
> >>
> >>>>> +    if (bdrv_freeze_chain(bs, above_base, errp) < 0) {
> >>>>>          return;
> >>>>>      }
> >>>>
> >>>> Hm... This feels odd. There are two places where stopping to freeze the
> >>>> chain would make obvious sense: At base, like we originally did; or at
> >>>> base_overlay, like we (intend to) do since commit c624b015, because we
> >>>> say that we don't actually mind if the user replaces the base image. I
> >>>> don't see how stopping at the first filter above base makes sense.
> >>>>
> >>>> So should this use bottom_cow_node/base_overlay instead of above_base?
> >>>
> >>> I suppose I thought “Better be safe than sorry”.
> >>>
> >>>> You couldn't use StreamBlockJob.above_base any more then because it
> >>>> could change, but you also don't really need it anywhere. It's only used
> >>>> for unfreezing (which would change) and for finding the base (you can
> >>>> still find bdrv_backing_chain_next(s->base_overlay)). I guess this would
> >>>> even be a code simplification.
> >>>
> >>> Great, I’ll see to it.
> >>
> >> On second thought (yes, I know, it’s been a couple of months...) I’m not
> >> so sure.
> >>
> >> If @base is a filter, then bdrv_backing_chain_next(s->base_overlay) will
> >> not return it.  So then the filter will be dropped, but that probably
> >> isn’t what the user intended.
> >>
> >> (In fact, the block-stream doc says “When streaming completes the image
> >> file will have the base file as its backing file.”)
> > 
> > Hm... Okay, let's try an example:
> > 
> >     ... <- backing <- filter1 <- filter2 <- filter3 <- top
> >                          |         |                    |
> >                       @base    above_base            @device
> >                                                    base_overlay
> > 
> > 
> > The expected result after the job has completed is:
> > 
> >     ... <- backing <- filter1 <- top
> > 
> > This means that filter1 must not go away until the job has completed. In
> > other words, we would need to freeze the link between @base and
> > above_base.
> > 
> > If we use backing_bs(above_base) as we currently do, we wouldn't
> > necessarily get filter1 as the new backing child of top (as the
> > documentation promises), but whatever is the backing child of filter2
> > when the job completes. In other words, documentation and code don't
> > match currently.
> 
> Correct.
> 
> > Let's look at a few more examples to see which of the options makes more
> > sense:
> > 
> > 1. Assume we drop filter1 while the stream job is running, i.e. backing
> >    is now the backing child of filter 2. I think it's obvious that when
> >    the stream job completes, you want backing to be the new backing
> >    child of top and not add filter1 back to the chain.
> > 
> > 2. If we add filter1b between filter1 and filter2, do we want to drop it
> >    when the job completes? It's not entirely clear, but there are
> >    probably cases where keeping it makes sense. The user can always drop
> >    it manually again, but if we delete a node, some requests will go
> >    through unfiltered.
> 
> Depends.  If the base-node was given to be literally "backing", then I’d
> say the user wants us to use "backing" as the new base still.
> 
> > 3. Imagine we replace filter1 wholesale, for example because a
> >    concurrently running mirror job performs a storage migration for
> >    everything up to filter1. Do we then want to reinstate the whole old
> >    subtree when the stream job completes? Certainly not.
> 
> I’m not sure.  Again, if I as the user specified the "backing" node-name
> as the base, I’d expect that to be the new backing file in all cases.
> 
> > So from this I would probably conclude that the bug is the
> > documentation, not necessarily in the code.
> 
> It certainly is true that it does not address what happens when you
> meddle with base.

Which is because it wasn't relevant then: You were not allowed to meddle
with base when the block-stream documentation was written. It's only a
possible case since commit c624b015.

If we really want to guarantee that it's exactly base that will be used
at the end, then we need to freeze all links down to base, essentially
reverting commit c624b015 and disallowing some operations that would be
possible if we didn't make that promise. I don't think that's the right
thing to do, but at least it would be self-consistent.

Promising base, but not freezing the links, leads to weird results in
some of the cases mentioned above.

> (Except it can be argued (and I suppose I did argue that) that as it is
> it does say the base file should be the backing file after stream, so if
> the base file is still there, I do interpret that to mean that the
> stream job always uses that as the backing node.  Of course even if it
> says that, that doesn’t mean that it makes sense.  And if it doesn’t
> make sense, then that’s the definition of a bug in the documentation, yes.)
> 
> >> So now this gets hairy.  We do want exactly @base as the backing file
> >> unless the user changed the graph.  But how can we detect that and if it
> >> hasn’t changed find @base again?
> >>
> >> What this patch did in this version worked because the graph was frozen
> >> until @above_base.
> > 
> > No, it didn't provide the promised semantics, because "unless the user
> > changed the graph" isn't part of the documentation. But the promised
> > semantics are probably not what we want, so it's okay.
> 
> I would have said that no semantics are promised for when the node is no
> longer a valid base, so we can basically do what we think makes sense.

Like undefined behaviour in C?

Sounds great, let's reboot the guest into some 512-byte demo then.

> >> Alternatively, we could store a pointer to @base directly (or its node
> >> name) and then see whether there is some node between s->base_overlay
> >> and bdrv_backing_chain_next(s->base_overlay) that matches that at the
> >> end of streaming.
> >>
> >> Well, actually, a pointer won’t work because of course maybe that node
> >> was deleted and the area is now used for an unrelated node that the user
> >> doesn’t want as the new backing file.
> >>
> >> The node name could actually work, though.  I mean, if there is some
> >> node in the immediate backing filter chain of base_overlay with that
> >> node name after streaming, it does make sense to assume that the user
> >> wants this to be the backing file; regardless of whether that’s exactly
> >> the same node as it was at the start of streaming.
> >>
> >> But we now also have to think about what to do when there is no node
> >> with such a node name.  Should we keep all filters below base_overlay?
> >> Or should we drop all of them?  I actually think keeping them is the
> >> safer choice...
> > 
> > Using node names feels completely wrong to me. If we can't keep a
> > pointer (with a reference) because we want the user to be able to
> > delete the node, then we certainly can't keep the node name either
> > because that could refer to an entirely different node when the job
> > completes.
> 
> I was thinking that if the user does graph manipulation, we can expect
> them to have given base-node.  So they actually did refer to a node name
> and it makes sense to me to keep looking it up.
> 
> (As a side note: I originally intended to say “storing a pointer or the
> node-name are the only things that come to mind, but obviously both are
> stupid” – but then I changed my mind and realized that the node name
> maybe actually isn’t that stupid.)

I think it is. A node name refers to the node with that name at the
time of the QMP command. What node was referenced in that command
doesn't change later because of graph manipulations.

> > I don't think it's useful to waste too much thought on how to implement
> > the behaviour required by the documentation. The requirement seems to be
> > just wrong.
> > 
> > So in the end, maybe your current code is fine, and the way to address
> > the "doesn't make obvious sense" part is having a comment that explains
> > why above_base is where we stop freezing.
> 
> Let’s maybe talk about it tomorrow. :-)

Okay, added to my preliminary agenda.

Kevin

Patch

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 38c4dbd7c3..3c54717870 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2516,6 +2516,10 @@ 
 # On successful completion the image file is updated to drop the backing file
 # and the BLOCK_JOB_COMPLETED event is emitted.
 #
+# In case @device is a filter node, block-stream modifies the first non-filter
+# overlay node below it to point to base's backing node (or NULL if @base was
+# not specified) instead of modifying @device itself.
+#
 # @job-id: identifier for the newly-created block job. If
 #          omitted, the device name will be used. (Since 2.7)
 #
diff --git a/block/stream.c b/block/stream.c
index 4c8b89884a..bd4a351dae 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -31,7 +31,8 @@  enum {
 
 typedef struct StreamBlockJob {
     BlockJob common;
-    BlockDriverState *bottom;
+    BlockDriverState *bottom_cow_node;
+    BlockDriverState *above_base;
     BlockdevOnError on_error;
     char *backing_file_str;
     bool bs_read_only;
@@ -54,7 +55,7 @@  static void stream_abort(Job *job)
 
     if (s->chain_frozen) {
         BlockJob *bjob = &s->common;
-        bdrv_unfreeze_chain(blk_bs(bjob->blk), s->bottom);
+        bdrv_unfreeze_chain(blk_bs(bjob->blk), s->above_base);
     }
 }
 
@@ -63,14 +64,15 @@  static int stream_prepare(Job *job)
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockJob *bjob = &s->common;
     BlockDriverState *bs = blk_bs(bjob->blk);
-    BlockDriverState *base = backing_bs(s->bottom);
+    BlockDriverState *unfiltered_bs = bdrv_skip_rw_filters(bs);
+    BlockDriverState *base = bdrv_filtered_bs(s->above_base);
     Error *local_err = NULL;
     int ret = 0;
 
-    bdrv_unfreeze_chain(bs, s->bottom);
+    bdrv_unfreeze_chain(bs, s->above_base);
     s->chain_frozen = false;
 
-    if (bs->backing) {
+    if (bdrv_filtered_cow_child(unfiltered_bs)) {
         const char *base_id = NULL, *base_fmt = NULL;
         if (base) {
             base_id = s->backing_file_str;
@@ -78,8 +80,8 @@  static int stream_prepare(Job *job)
                 base_fmt = base->drv->format_name;
             }
         }
-        bdrv_set_backing_hd(bs, base, &local_err);
-        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
+        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
+        ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt);
         if (local_err) {
             error_report_err(local_err);
             return -EPERM;
@@ -110,7 +112,8 @@  static int coroutine_fn stream_run(Job *job, Error **errp)
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockBackend *blk = s->common.blk;
     BlockDriverState *bs = blk_bs(blk);
-    bool enable_cor = !backing_bs(s->bottom);
+    BlockDriverState *unfiltered_bs = bdrv_skip_rw_filters(bs);
+    bool enable_cor = !bdrv_filtered_bs(s->above_base);
     int64_t len;
     int64_t offset = 0;
     uint64_t delay_ns = 0;
@@ -119,7 +122,7 @@  static int coroutine_fn stream_run(Job *job, Error **errp)
     int64_t n = 0; /* bytes */
     void *buf;
 
-    if (bs == s->bottom) {
+    if (unfiltered_bs == s->bottom_cow_node) {
         /* Nothing to stream */
         return 0;
     }
@@ -154,13 +157,14 @@  static int coroutine_fn stream_run(Job *job, Error **errp)
 
         copy = false;
 
-        ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &n);
+        ret = bdrv_is_allocated(unfiltered_bs, offset, STREAM_BUFFER_SIZE, &n);
         if (ret == 1) {
             /* Allocated in the top, no need to copy.  */
         } else if (ret >= 0) {
             /* Copy if allocated in the intermediate images.  Limit to the
              * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
-            ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom, true,
+            ret = bdrv_is_allocated_above(bdrv_filtered_cow_bs(unfiltered_bs),
+                                          s->bottom_cow_node, true,
                                           offset, n, &n);
             /* Finish early if end of backing file has been reached */
             if (ret == 0 && n == 0) {
@@ -231,9 +235,16 @@  void stream_start(const char *job_id, BlockDriverState *bs,
     BlockDriverState *iter;
     bool bs_read_only;
     int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
-    BlockDriverState *bottom = bdrv_find_overlay(bs, base);
+    BlockDriverState *bottom_cow_node = bdrv_find_overlay(bs, base);
+    BlockDriverState *above_base;
 
-    if (bdrv_freeze_chain(bs, bottom, errp) < 0) {
+    /* Find the node directly above @base */
+    for (above_base = bottom_cow_node;
+         bdrv_filtered_bs(above_base) != base;
+         above_base = bdrv_filtered_bs(above_base))
+    {}
+
+    if (bdrv_freeze_chain(bs, above_base, errp) < 0) {
         return;
     }
 
@@ -261,16 +272,19 @@  void stream_start(const char *job_id, BlockDriverState *bs,
      * disappear from the chain after this operation. The streaming job reads
      * every block only once, assuming that it doesn't change, so forbid writes
      * and resizes. Reassign the base node pointer because the backing BS of the
-     * bottom node might change after the call to bdrv_reopen_set_read_only()
-     * due to parallel block jobs running.
+     * above_base node might change after the call to
+     * bdrv_reopen_set_read_only() due to parallel block jobs running.
      */
-    base = backing_bs(bottom);
-    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
+    base = bdrv_filtered_bs(above_base);
+    for (iter = bdrv_filtered_bs(bs); iter && iter != base;
+         iter = bdrv_filtered_bs(iter))
+    {
         block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
                            basic_flags, &error_abort);
     }
 
-    s->bottom = bottom;
+    s->bottom_cow_node = bottom_cow_node;
+    s->above_base = above_base;
     s->backing_file_str = g_strdup(backing_file_str);
     s->bs_read_only = bs_read_only;
     s->chain_frozen = true;
@@ -284,5 +298,5 @@  fail:
     if (bs_read_only) {
         bdrv_reopen_set_read_only(bs, true, NULL);
     }
-    bdrv_unfreeze_chain(bs, bottom);
+    bdrv_unfreeze_chain(bs, above_base);
 }
diff --git a/blockdev.c b/blockdev.c
index 7bef41c0b0..ee8b951154 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3297,7 +3297,7 @@  void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
     }
 
     /* Check for op blockers in the whole chain between bs and base */
-    for (iter = bs; iter && iter != base_bs; iter = backing_bs(iter)) {
+    for (iter = bs; iter && iter != base_bs; iter = bdrv_filtered_bs(iter)) {
         if (bdrv_op_is_blocked(iter, BLOCK_OP_TYPE_STREAM, errp)) {
             goto out;
         }