
[07/12] block/backup: add 'always' bitmap sync policy

Message ID 20190620010356.19164-8-jsnow@redhat.com (mailing list archive)
State New, archived
Series bitmaps: introduce 'bitmap' sync mode

Commit Message

John Snow June 20, 2019, 1:03 a.m. UTC
This adds an "always" policy for bitmap synchronization. Regardless of
whether the job succeeds or fails, the bitmap is *always* synchronized. This
means that for backups that fail part-way through, the bitmap retains a
record of which sectors need to be copied out to accomplish a new backup
using the old, partial result.

In effect, this allows us to "resume" a failed backup; however, the new
backup will be from the new point in time, so it isn't a "resume" as much as
it is an "incremental retry." This can be useful in the case of extremely
large backups that fail a considerable way through the operation, where we'd
like not to waste the work that was already performed.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 qapi/block-core.json |  5 ++++-
 block/backup.c       | 10 ++++++----
 2 files changed, 10 insertions(+), 5 deletions(-)

Comments

Max Reitz June 20, 2019, 5 p.m. UTC | #1
On 20.06.19 03:03, John Snow wrote:
> This adds an "always" policy for bitmap synchronization. Regardless of if
> the job succeeds or fails, the bitmap is *always* synchronized. This means
> that for backups that fail part-way through, the bitmap retains a record of
> which sectors need to be copied out to accomplish a new backup using the
> old, partial result.
> 
> In effect, this allows us to "resume" a failed backup; however the new backup
> will be from the new point in time, so it isn't a "resume" as much as it is
> an "incremental retry." This can be useful in the case of extremely large
> backups that fail considerably through the operation and we'd like to not waste
> the work that was already performed.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  qapi/block-core.json |  5 ++++-
>  block/backup.c       | 10 ++++++----
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 0332dcaabc..58d267f1f5 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1143,6 +1143,9 @@
>  # An enumeration of possible behaviors for the synchronization of a bitmap
>  # when used for data copy operations.
>  #
> +# @always: The bitmap is always synchronized with remaining blocks to copy,
> +#          whether or not the operation has completed successfully or not.
> +#
>  # @conditional: The bitmap is only synchronized when the operation is successul.
>  #               This is useful for Incremental semantics.
>  #
> @@ -1153,7 +1156,7 @@
>  # Since: 4.1
>  ##
>  { 'enum': 'BitmapSyncMode',
> -  'data': ['conditional', 'never'] }
> +  'data': ['always', 'conditional', 'never'] }
>  
>  ##
>  # @MirrorCopyMode:
> diff --git a/block/backup.c b/block/backup.c
> index 627f724b68..beb2078696 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -266,15 +266,17 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>      BlockDriverState *bs = blk_bs(job->common.blk);
>  
>      if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
> -        /* Failure, or we don't want to synchronize the bitmap.
> -         * Merge the successor back into the parent, delete nothing. */
> +        /* Failure, or we don't want to synchronize the bitmap. */
> +        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
> +            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);

Hmm...  OK, bitmaps in backup always confuse me, so bear with me, please.

(Hi, I’m a time traveler from the end of this section and I can tell you
that everything is fine.  I was just confused.  I’ll still keep this
here, because it was so much work.)

The copy_bitmap is copied from the sync_bitmap at the beginning, so the
sync_bitmap can continue to be dirtied, but that won’t affect the job.
In normal incremental mode, this means that the sync point is always at
the beginning of the job.  (Well, naturally, because that’s how backup
is supposed to go.)

But then replacing the sync_bitmap with the copy_bitmap here means that
all of these dirtyings that happened during the job are lost.  Hmm, but
that doesn’t matter, does it?  Because whenever something was dirtied in
sync_bitmap, the corresponding area must have been copied to the backup
due to the job.

Ah, yes, it would actually be wrong to keep the new dirty bits, because
in this mode, sync_bitmap should (on failure) reflect what is left to
copy to make the backup complete.  Copying these newly dirtied sectors
would be wrong.  (Yes, I know you wrote that in the documentation of
@always.  I just tried to get a different perspective.)

Yes, yes, and copy_bitmap is always set whenever a CBW to the target
fails before the source can be updated.  Good, good.


Hi, I’m the time traveler from above.  I also left the section here so I
can give one of my trademark “Ramble, ramble,

Reviewed-by: Max Reitz <mreitz@redhat.com>

”
> +        }
> +        /* Merge the successor back into the parent. */
>          bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
> -        assert(bm);
>      } else {
>          /* Everything is fine, delete this bitmap and install the backup. */
>          bm = bdrv_dirty_bitmap_abdicate(bs, job->sync_bitmap, NULL);
> -        assert(bm);
>      }
> +    assert(bm);
>  }
>  
>  static void backup_commit(Job *job)
>
John Snow June 20, 2019, 6:44 p.m. UTC | #2
On 6/20/19 1:00 PM, Max Reitz wrote:
> On 20.06.19 03:03, John Snow wrote:
>> This adds an "always" policy for bitmap synchronization. Regardless of if
>> the job succeeds or fails, the bitmap is *always* synchronized. This means
>> that for backups that fail part-way through, the bitmap retains a record of
>> which sectors need to be copied out to accomplish a new backup using the
>> old, partial result.
>>
>> In effect, this allows us to "resume" a failed backup; however the new backup
>> will be from the new point in time, so it isn't a "resume" as much as it is
>> an "incremental retry." This can be useful in the case of extremely large
>> backups that fail considerably through the operation and we'd like to not waste
>> the work that was already performed.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  qapi/block-core.json |  5 ++++-
>>  block/backup.c       | 10 ++++++----
>>  2 files changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 0332dcaabc..58d267f1f5 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -1143,6 +1143,9 @@
>>  # An enumeration of possible behaviors for the synchronization of a bitmap
>>  # when used for data copy operations.
>>  #
>> +# @always: The bitmap is always synchronized with remaining blocks to copy,
>> +#          whether or not the operation has completed successfully or not.
>> +#
>>  # @conditional: The bitmap is only synchronized when the operation is successul.
>>  #               This is useful for Incremental semantics.
>>  #
>> @@ -1153,7 +1156,7 @@
>>  # Since: 4.1
>>  ##
>>  { 'enum': 'BitmapSyncMode',
>> -  'data': ['conditional', 'never'] }
>> +  'data': ['always', 'conditional', 'never'] }
>>  
>>  ##
>>  # @MirrorCopyMode:
>> diff --git a/block/backup.c b/block/backup.c
>> index 627f724b68..beb2078696 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -266,15 +266,17 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>>      BlockDriverState *bs = blk_bs(job->common.blk);
>>  
>>      if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
>> -        /* Failure, or we don't want to synchronize the bitmap.
>> -         * Merge the successor back into the parent, delete nothing. */
>> +        /* Failure, or we don't want to synchronize the bitmap. */
>> +        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
>> +            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
> 
> Hmm...  OK, bitmaps in backup always confuse me, so bear with me, please.
> 

I realize this is an extremely dense section that actually covers a
*lot* of pathways.

> (Hi, I’m a time traveler from the end of this section and I can tell you
> that everything is fine.  I was just confused.  I’ll still keep this
> here, because it was so much work.)
> 
> The copy_bitmap is copied from the sync_bitmap at the beginning, so the
> sync_bitmap can continue to be dirtied, but that won’t affect the job.
> In normal incremental mode, this means that the sync point is always at
> the beginning of the job.  (Well, naturally, because that’s how backup
> is supposed to go.)
> 

sync_bitmap: This is used as an initial manifest for which sectors to
copy out. It is the user-provided bitmap. We actually *never* edit this
bitmap in the body of the job.

copy_bitmap: This is the manifest for which blocks remain to be copied
out. We clear bits in this as we go, because we use it as our loop
condition.

So what you say is actually only half-true: the sync_bitmap actually
remains static during the duration of the job, and it has an anonymous
child that accrues new writes. This is a holdover from before we had a
copy_bitmap, and we used to use a sync_bitmap directly as our loop
condition.

(This could be simplified upstream at present; but after this patch it
cannot be for reasons explained below. We do wish to maintain three
distinct sets of bits:
1. The bits at the start of the operation,
2. The bits accrued during the operation, and
3. The bits that remain to be, or were not, copied during the operation.)

So there's actually three bitmaps:

- sync_bitmap: actually just static and read-only
- sync_bitmap's anonymous child: accrues new writes.
- copy_bitmap: loop conditional.
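
To make that concrete, this is roughly the shape of the setup (a sketch
only, not the literal backup_job_create()/backup_run() code; the helper at
the end is hypothetical and the signatures are approximate):

    /* Freeze the user's bitmap and install an anonymous successor on the
     * node. The successor quietly accrues every write made while the job
     * runs; sync_bitmap itself stays read-only for the duration. */
    bdrv_dirty_bitmap_create_successor(bs, job->sync_bitmap, errp);

    /* Seed the job-internal copy_bitmap with the bits sync_bitmap held at
     * job start. Bits are cleared from it as each area is copied out (and
     * set again if a copy-before-write to the target fails), which is why
     * it can serve as the loop condition. */
    backup_init_copy_bitmap(job);    /* hypothetical helper */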

> But then replacing the sync_bitmap with the copy_bitmap here means that
> all of these dirtyings that happened during the job are lost.  Hmm, but
> that doesn’t matter, does it?  Because whenever something was dirtied in
> sync_bitmap, the corresponding area must have been copied to the backup
> due to the job.
> 

The new dirty bits were accrued very secretly in the anonymous child.
The new dirty bits are merged in via the reclaim() function.

So, what happens is:

- Sync_bitmap gets the bit pattern of copy_bitmap (one way or another)
- Sync_bitmap reclaims (merges with) its anonymous child.

> Ah, yes, it would actually be wrong to keep the new dirty bits, because
> in this mode, sync_bitmap should (on failure) reflect what is left to
> copy to make the backup complete.  Copying these newly dirtied sectors
> would be wrong.  (Yes, I know you wrote that in the documentation of
> @always.  I just tried to get a different perspective.)
> 
> Yes, yes, and copy_bitmap is always set whenever a CBW to the target
> fails before the source can be updated.  Good, good.
> 

You might have slightly the wrong idea; it's important to keep track of
what was dirtied during the operation because that data is important for
the next bitmap backup.

The merging of "sectors left to copy" (in the case of a failed backup)
and "sectors dirtied since we started the operation" forms the actual
minimal set needed to re-write to this target to achieve a new
functioning point in time. This is what you get with the "always" mode
in a failure case.

In a success case, it just so happens that "sectors left to copy" is the
empty set.

It's like an incremental on top of the incremental.

Consider this:

We have a 4TB drive and we have dirtied 3TB of it since our full backup.
We copy out 2TB as part of a new incremental backup before suffering
some kind of failure.

Today, you'd need to start a new incremental backup that copies that
entire 3TB *plus* whatever was dirtied since the job failed.

With this mode, you'd only need to copy the remaining 1TB + whatever was
dirtied since.

So, what this logic is really doing is:

If we failed, OR if we want the "never" sync policy:

Merge the anonymous child (bits written during op) back into sync_bitmap
(bits we were instructed to copy), leaving us as if we have never
started this operation.

If, however, we failed and we have the "always" sync policy, we destroy
the sync_bitmap (bits we were instructed to copy) and replace it with
the copy_bitmap (bits remaining to copy). Then, we merge that with the
anonymous child (bits written during op).

Or, in success cases (when sync policy is not never), we simply delete
the sync_bitmap (bits we were instructed to copy) and replace it with
its anonymous child (bits written during op).
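
Spelled out against the hunk above (the calls are the ones in the diff; the
annotations of which bits the user-visible bitmap ends up holding are mine):

    if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
            /* always + failure: start from "bits left to copy" ... */
            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
        }
        /* ... then fold the anonymous child (writes during the job) back in.
         * Result:
         *   never (any outcome), or conditional + failure:
         *       bitmap = bits at job start | writes during the job
         *   always + failure:
         *       bitmap = bits left to copy | writes during the job */
        bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
    } else {
        /* conditional or always + success:
         *       bitmap = writes during the job only */
        bm = bdrv_dirty_bitmap_abdicate(bs, job->sync_bitmap, NULL);
    }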

> 
> Hi, I’m the time traveler from above.  I also left the section here so I
> can give one of my trademark “Ramble, ramble,
> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
> ”
> 
>> +        }
>> +        /* Merge the successor back into the parent. */
>>          bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
>> -        assert(bm);
>>      } else {
>>          /* Everything is fine, delete this bitmap and install the backup. */
>>          bm = bdrv_dirty_bitmap_abdicate(bs, job->sync_bitmap, NULL);
>> -        assert(bm);
>>      }
>> +    assert(bm);
>>  }
>>  
>>  static void backup_commit(Job *job)
>>
> 
>
Max Reitz June 20, 2019, 6:53 p.m. UTC | #3
On 20.06.19 20:44, John Snow wrote:
> 
> 
> On 6/20/19 1:00 PM, Max Reitz wrote:
>> On 20.06.19 03:03, John Snow wrote:
>>> This adds an "always" policy for bitmap synchronization. Regardless of if
>>> the job succeeds or fails, the bitmap is *always* synchronized. This means
>>> that for backups that fail part-way through, the bitmap retains a record of
>>> which sectors need to be copied out to accomplish a new backup using the
>>> old, partial result.
>>>
>>> In effect, this allows us to "resume" a failed backup; however the new backup
>>> will be from the new point in time, so it isn't a "resume" as much as it is
>>> an "incremental retry." This can be useful in the case of extremely large
>>> backups that fail considerably through the operation and we'd like to not waste
>>> the work that was already performed.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>  qapi/block-core.json |  5 ++++-
>>>  block/backup.c       | 10 ++++++----
>>>  2 files changed, 10 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index 0332dcaabc..58d267f1f5 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -1143,6 +1143,9 @@
>>>  # An enumeration of possible behaviors for the synchronization of a bitmap
>>>  # when used for data copy operations.
>>>  #
>>> +# @always: The bitmap is always synchronized with remaining blocks to copy,
>>> +#          whether or not the operation has completed successfully or not.
>>> +#
>>>  # @conditional: The bitmap is only synchronized when the operation is successul.
>>>  #               This is useful for Incremental semantics.
>>>  #
>>> @@ -1153,7 +1156,7 @@
>>>  # Since: 4.1
>>>  ##
>>>  { 'enum': 'BitmapSyncMode',
>>> -  'data': ['conditional', 'never'] }
>>> +  'data': ['always', 'conditional', 'never'] }
>>>  
>>>  ##
>>>  # @MirrorCopyMode:
>>> diff --git a/block/backup.c b/block/backup.c
>>> index 627f724b68..beb2078696 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>> @@ -266,15 +266,17 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>>>      BlockDriverState *bs = blk_bs(job->common.blk);
>>>  
>>>      if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
>>> -        /* Failure, or we don't want to synchronize the bitmap.
>>> -         * Merge the successor back into the parent, delete nothing. */
>>> +        /* Failure, or we don't want to synchronize the bitmap. */
>>> +        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
>>> +            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
>>
>> Hmm...  OK, bitmaps in backup always confuse me, so bear with me, please.
>>
> 
> I realize this is an extremely dense section that actually covers a
> *lot* of pathways.
> 
>> (Hi, I’m a time traveler from the end of this section and I can tell you
>> that everything is fine.  I was just confused.  I’ll still keep this
>> here, because it was so much work.)
>>
>> The copy_bitmap is copied from the sync_bitmap at the beginning, so the
>> sync_bitmap can continue to be dirtied, but that won’t affect the job.
>> In normal incremental mode, this means that the sync point is always at
>> the beginning of the job.  (Well, naturally, because that’s how backup
>> is supposed to go.)
>>
> 
> sync_bitmap: This is used as an initial manifest for which sectors to
> copy out. It is the user-provided bitmap. We actually *never* edit this
> bitmap in the body of the job.
> 
> copy_bitmap: This is the manifest for which blocks remain to be copied
> out. We clear bits in this as we go, because we use it as our loop
> condition.
> 
> So what you say is actually only half-true: the sync_bitmap actually
> remains static during the duration of the job, and it has an anonymous
> child that accrues new writes. This is a holdover from before we had a
> copy_bitmap, and we used to use a sync_bitmap directly as our loop
> condition.
> 
> (This could be simplified upstream at present; but after this patch it
> cannot be for reasons explained below. We do wish to maintain three
> distinct sets of bits:
> 1. The bits at the start of the operation,
> 2. The bits accrued during the operation, and
> 3. The bits that remain to be, or were not, copied during the operation.)
> 
> So there's actually three bitmaps:
> 
> - sync_bitmap: actually just static and read-only
> - sync_bitmap's anonymous child: accrues new writes.

Ah, right...  Thanks for writing that up.

> - copy_bitmap: loop conditional.
> 
>> But then replacing the sync_bitmap with the copy_bitmap here means that
>> all of these dirtyings that happened during the job are lost.  Hmm, but
>> that doesn’t matter, does it?  Because whenever something was dirtied in
>> sync_bitmap, the corresponding area must have been copied to the backup
>> due to the job.
>>
> 
> The new dirty bits were accrued very secretly in the anonymous child.
> The new dirty bits are merged in via the reclaim() function.
> 
> So, what happens is:
> 
> - Sync_bitmap gets the bit pattern of copy_bitmap (one way or another)
> - Sync_bitmap reclaims (merges with) its anonymous child.
> 
>> Ah, yes, it would actually be wrong to keep the new dirty bits, because
>> in this mode, sync_bitmap should (on failure) reflect what is left to
>> copy to make the backup complete.  Copying these newly dirtied sectors
>> would be wrong.  (Yes, I know you wrote that in the documentation of
>> @always.  I just tried to get a different perspective.)
>>
>> Yes, yes, and copy_bitmap is always set whenever a CBW to the target
>> fails before the source can be updated.  Good, good.
>>
> 
> You might have slightly the wrong idea; it's important to keep track of
> what was dirtied during the operation because that data is important for
> the next bitmap backup.
> 
> The merging of "sectors left to copy" (in the case of a failed backup)
> and "sectors dirtied since we started the operation" forms the actual
> minimal set needed to re-write to this target to achieve a new
> functioning point in time. This is what you get with the "always" mode
> in a failure case.
> 
> In a success case, it just so happens that "sectors left to copy" is the
> empty set.
> 
> It's like an incremental on top of the incremental.
> 
> Consider this:
> 
> We have a 4TB drive and we have dirtied 3TB of it since our full backup.
> We copy out 2TB as part of a new incremental backup before suffering
> some kind of failure.
> 
> Today, you'd need to start a new incremental backup that copies that
> entire 3TB *plus* whatever was dirtied since the job failed.
> 
> With this mode, you'd only need to copy the remaining 1TB + whatever was
> dirtied since.
> 
> So, what this logic is really doing is:
> 
> If we failed, OR if we want the "never" sync policy:
> 
> Merge the anonymous child (bits written during op) back into sync_bitmap
> (bits we were instructed to copy), leaving us as if we have never
> started this operation.
> 
> If, however, we failed and we have the "always" sync policy, we destroy
> the sync_bitmap (bits we were instructed to copy) and replace it with
> the copy_bitmap (bits remaining to copy). Then, we merge that with the
> anonymous child (bits written during op).

Oh, so that’s the way it works.  I thought “always” meant that you can
repeat the backup.  But it just means you keep your partial backup and
pretend it’s a full incremental one.

Now that I think about it again...  Yeah, you can’t repeat a backup at a
later point, of course.  If data is gone in the meantime, it’s gone.

So, uh, I was wrong that it’s all good, because it would have been
wrong?  But thankfully I was just wrong myself, and so it is all good
after all?  My confusion with bitmaps as lifted, now I’m just confused
with myself.

I revoke my R-b and give a new one:

Reviewed-by: Max Reitz <mreitz@redhat.com>

Or something like that.

Again, thanks a lot for clarifying.

Max

> Or, in success cases (when sync policy is not never), we simply delete
> the sync_bitmap (bits we were instructed to copy) and replace it with
> its anonymous child (bits written during op).
Vladimir Sementsov-Ogievskiy June 21, 2019, 12:57 p.m. UTC | #4
20.06.2019 4:03, John Snow wrote:
> This adds an "always" policy for bitmap synchronization. Regardless of if
> the job succeeds or fails, the bitmap is *always* synchronized. This means
> that for backups that fail part-way through, the bitmap retains a record of
> which sectors need to be copied out to accomplish a new backup using the
> old, partial result.
> 
> In effect, this allows us to "resume" a failed backup; however the new backup
> will be from the new point in time, so it isn't a "resume" as much as it is
> an "incremental retry." This can be useful in the case of extremely large
> backups that fail considerably through the operation and we'd like to not waste
> the work that was already performed.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   qapi/block-core.json |  5 ++++-
>   block/backup.c       | 10 ++++++----
>   2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 0332dcaabc..58d267f1f5 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1143,6 +1143,9 @@
>   # An enumeration of possible behaviors for the synchronization of a bitmap
>   # when used for data copy operations.
>   #
> +# @always: The bitmap is always synchronized with remaining blocks to copy,
> +#          whether or not the operation has completed successfully or not.

Hmm, now I think that 'always' sounds a bit like 'really always' i.e. during backup
too, which is confusing.. But I don't have better suggestion.

> +#
>   # @conditional: The bitmap is only synchronized when the operation is successul.
>   #               This is useful for Incremental semantics.
>   #
> @@ -1153,7 +1156,7 @@
>   # Since: 4.1
>   ##
>   { 'enum': 'BitmapSyncMode',
> -  'data': ['conditional', 'never'] }
> +  'data': ['always', 'conditional', 'never'] }
>   
>   ##
>   # @MirrorCopyMode:
> diff --git a/block/backup.c b/block/backup.c
> index 627f724b68..beb2078696 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -266,15 +266,17 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>       BlockDriverState *bs = blk_bs(job->common.blk);
>   
>       if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
> -        /* Failure, or we don't want to synchronize the bitmap.
> -         * Merge the successor back into the parent, delete nothing. */
> +        /* Failure, or we don't want to synchronize the bitmap. */
> +        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
> +            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
> +        }
> +        /* Merge the successor back into the parent. */
>           bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);

Hmm good, it should work. It's a lot more tricky than just
"synchronized with remaining blocks to copy", but I'm not sure that we need
more details in the spec.

What we have in backup? So, from one hand we have an incremental backup, and a bitmap, counting from it.
On the other hand it's not normal incremental backup, as it don't correspond to any valid state of vm disk,
and it may be used only as a backing in a chain of further successful incremental backup, yes?

And then I think: with this mode we can not stop on first error, but ignore it, just leaving dirty bit for
resulting bitmap.. We have BLOCKDEV_ON_ERROR_IGNORE, which may be used to achieve it, but seems it don't
work as expected, as in backup_loop() we retry operation if ret < 0 and  action != BLOCK_ERROR_ACTION_REPORT.

And another thought: can user take a decision of discarding (like CONDITIONAL) or saving in backing chain (like
ALWAYS) failed backup result _after_ backup job complete? For example, for small resulting backup it may be
better to discard it and for large - to save.
Will it work if we start job with ALWAYS mode and autocomplete = false, then on fail we can look at job progress,
and if it is small we cancel job, otherwise call complete? Or stop, block-job-complete will not work with failure
scenarios? Then we have to set BLOCKDEV_ON_ERROR_IGNORE, and on first error event decide, cancel or not? But we
can only cancel or continue..

Hmm. Cancel. So on cancel and abort you synchronize bitmap too? Seems in bad relation with what cancel should do,
and in transactions in general...


> -        assert(bm);
>       } else {
>           /* Everything is fine, delete this bitmap and install the backup. */
>           bm = bdrv_dirty_bitmap_abdicate(bs, job->sync_bitmap, NULL);
> -        assert(bm);
>       }
> +    assert(bm);
>   }
>   
>   static void backup_commit(Job *job)
>
Vladimir Sementsov-Ogievskiy June 21, 2019, 12:59 p.m. UTC | #5
21.06.2019 15:57, Vladimir Sementsov-Ogievskiy wrote:
> 20.06.2019 4:03, John Snow wrote:
>> This adds an "always" policy for bitmap synchronization. Regardless of if
>> the job succeeds or fails, the bitmap is *always* synchronized. This means
>> that for backups that fail part-way through, the bitmap retains a record of
>> which sectors need to be copied out to accomplish a new backup using the
>> old, partial result.
>>
>> In effect, this allows us to "resume" a failed backup; however the new backup
>> will be from the new point in time, so it isn't a "resume" as much as it is
>> an "incremental retry." This can be useful in the case of extremely large
>> backups that fail considerably through the operation and we'd like to not waste
>> the work that was already performed.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   qapi/block-core.json |  5 ++++-
>>   block/backup.c       | 10 ++++++----
>>   2 files changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 0332dcaabc..58d267f1f5 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -1143,6 +1143,9 @@
>>   # An enumeration of possible behaviors for the synchronization of a bitmap
>>   # when used for data copy operations.
>>   #
>> +# @always: The bitmap is always synchronized with remaining blocks to copy,
>> +#          whether or not the operation has completed successfully or not.
> 
> Hmm, now I think that 'always' sounds a bit like 'really always' i.e. during backup
> too, which is confusing.. But I don't have better suggestion.
> 
>> +#
>>   # @conditional: The bitmap is only synchronized when the operation is successul.
>>   #               This is useful for Incremental semantics.
>>   #
>> @@ -1153,7 +1156,7 @@
>>   # Since: 4.1
>>   ##
>>   { 'enum': 'BitmapSyncMode',
>> -  'data': ['conditional', 'never'] }
>> +  'data': ['always', 'conditional', 'never'] }
>>   ##
>>   # @MirrorCopyMode:
>> diff --git a/block/backup.c b/block/backup.c
>> index 627f724b68..beb2078696 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -266,15 +266,17 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>>       BlockDriverState *bs = blk_bs(job->common.blk);
>>       if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
>> -        /* Failure, or we don't want to synchronize the bitmap.
>> -         * Merge the successor back into the parent, delete nothing. */
>> +        /* Failure, or we don't want to synchronize the bitmap. */
>> +        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
>> +            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
>> +        }
>> +        /* Merge the successor back into the parent. */
>>           bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
> 
> Hmm good, it should work. It's a lot more tricky, than just
> "synchronized with remaining blocks to copy", but I'm not sure the we need more details in
> spec.
> 
> What we have in backup? So, from one hand we have an incremental backup, and a bitmap, counting from it.
> On the other hand it's not normal incremental backup, as it don't correspond to any valid state of vm disk,
> and it may be used only as a backing in a chain of further successful incremental backup, yes?
> 
> And then I think: with this mode we can not stop on first error, but ignore it, just leaving dirty bit for
> resulting bitmap.. We have BLOCKDEV_ON_ERROR_IGNORE, which may be used to achieve it, but seems it don't
> work as expected, as in backup_loop() we retry operation if ret < 0 and  action != BLOCK_ERROR_ACTION_REPORT.
> 
> And another thought: can user take a decision of discarding (like CONDITIONAL) or saving in backing chain (like
> ALWAYS) failed backup result _after_ backup job complete? For example, for small resulting backup it may be
> better to discard it and for large - to save.
> Will it work if we start job with ALWAYS mode and autocomplete = false, then on fail we can look at job progress,
> and if it is small we cancel job, otherwise call complete? Or stop, block-job-complete will not work with failure
> scenarios? Then we have to set BLOCKDEV_ON_ERROR_IGNORE, and on first error event decide, cancel or not? But we
> can only cancel or continue..
> 
> Hmm. Cancel. So on cancel and abort you synchronize bitmap too? Seems in bad relation with what cancel should do,
> and in transactions in general...

I mean grouped transaction mode, how should it work with this?

> 
> 
>> -        assert(bm);
>>       } else {
>>           /* Everything is fine, delete this bitmap and install the backup. */
>>           bm = bdrv_dirty_bitmap_abdicate(bs, job->sync_bitmap, NULL);
>> -        assert(bm);
>>       }
>> +    assert(bm);
>>   }
>>   static void backup_commit(Job *job)
>>
> 
>
Vladimir Sementsov-Ogievskiy June 21, 2019, 1:08 p.m. UTC | #6
21.06.2019 15:59, Vladimir Sementsov-Ogievskiy wrote:
> 21.06.2019 15:57, Vladimir Sementsov-Ogievskiy wrote:
>> 20.06.2019 4:03, John Snow wrote:
>>> This adds an "always" policy for bitmap synchronization. Regardless of if
>>> the job succeeds or fails, the bitmap is *always* synchronized. This means
>>> that for backups that fail part-way through, the bitmap retains a record of
>>> which sectors need to be copied out to accomplish a new backup using the
>>> old, partial result.
>>>
>>> In effect, this allows us to "resume" a failed backup; however the new backup
>>> will be from the new point in time, so it isn't a "resume" as much as it is
>>> an "incremental retry." This can be useful in the case of extremely large
>>> backups that fail considerably through the operation and we'd like to not waste
>>> the work that was already performed.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>   qapi/block-core.json |  5 ++++-
>>>   block/backup.c       | 10 ++++++----
>>>   2 files changed, 10 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index 0332dcaabc..58d267f1f5 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -1143,6 +1143,9 @@
>>>   # An enumeration of possible behaviors for the synchronization of a bitmap
>>>   # when used for data copy operations.
>>>   #
>>> +# @always: The bitmap is always synchronized with remaining blocks to copy,
>>> +#          whether or not the operation has completed successfully or not.
>>
>> Hmm, now I think that 'always' sounds a bit like 'really always' i.e. during backup
>> too, which is confusing.. But I don't have better suggestion.
>>
>>> +#
>>>   # @conditional: The bitmap is only synchronized when the operation is successul.
>>>   #               This is useful for Incremental semantics.
>>>   #
>>> @@ -1153,7 +1156,7 @@
>>>   # Since: 4.1
>>>   ##
>>>   { 'enum': 'BitmapSyncMode',
>>> -  'data': ['conditional', 'never'] }
>>> +  'data': ['always', 'conditional', 'never'] }
>>>   ##
>>>   # @MirrorCopyMode:
>>> diff --git a/block/backup.c b/block/backup.c
>>> index 627f724b68..beb2078696 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>> @@ -266,15 +266,17 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>>>       BlockDriverState *bs = blk_bs(job->common.blk);
>>>       if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
>>> -        /* Failure, or we don't want to synchronize the bitmap.
>>> -         * Merge the successor back into the parent, delete nothing. */
>>> +        /* Failure, or we don't want to synchronize the bitmap. */
>>> +        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
>>> +            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
>>> +        }
>>> +        /* Merge the successor back into the parent. */
>>>           bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
>>
>> Hmm good, it should work. It's a lot more tricky, than just
>> "synchronized with remaining blocks to copy", but I'm not sure the we need more details in
>> spec.
>>
>> What we have in backup? So, from one hand we have an incremental backup, and a bitmap, counting from it.
>> On the other hand it's not normal incremental backup, as it don't correspond to any valid state of vm disk,
>> and it may be used only as a backing in a chain of further successful incremental backup, yes?
>>
>> And then I think: with this mode we can not stop on first error, but ignore it, just leaving dirty bit for
>> resulting bitmap.. We have BLOCKDEV_ON_ERROR_IGNORE, which may be used to achieve it, but seems it don't
>> work as expected, as in backup_loop() we retry operation if ret < 0 and  action != BLOCK_ERROR_ACTION_REPORT.
>>
>> And another thought: can user take a decision of discarding (like CONDITIONAL) or saving in backing chain (like
>> ALWAYS) failed backup result _after_ backup job complete? For example, for small resulting backup it may be
>> better to discard it and for large - to save.
>> Will it work if we start job with ALWAYS mode and autocomplete = false, then on fail we can look at job progress,
>> and if it is small we cancel job, otherwise call complete? Or stop, block-job-complete will not work with failure
>> scenarios? Then we have to set BLOCKDEV_ON_ERROR_IGNORE, and on first error event decide, cancel or not? But we
>> can only cancel or continue..
>>
>> Hmm. Cancel. So on cancel and abort you synchronize bitmap too? Seems in bad relation with what cancel should do,
>> and in transactions in general...
> 
> I mean grouped transaction mode, how should it work with this?

Actually the problem is that you want to implement partial success, and block jobs api and transactions api are not prepared
for such a thing

> 
>>
>>
>>> -        assert(bm);
>>>       } else {
>>>           /* Everything is fine, delete this bitmap and install the backup. */
>>>           bm = bdrv_dirty_bitmap_abdicate(bs, job->sync_bitmap, NULL);
>>> -        assert(bm);
>>>       }
>>> +    assert(bm);
>>>   }
>>>   static void backup_commit(Job *job)
>>>
>>
>>
> 
>
Vladimir Sementsov-Ogievskiy June 21, 2019, 1:44 p.m. UTC | #7
21.06.2019 16:08, Vladimir Sementsov-Ogievskiy wrote:
> 21.06.2019 15:59, Vladimir Sementsov-Ogievskiy wrote:
>> 21.06.2019 15:57, Vladimir Sementsov-Ogievskiy wrote:
>>> 20.06.2019 4:03, John Snow wrote:
>>>> This adds an "always" policy for bitmap synchronization. Regardless of if
>>>> the job succeeds or fails, the bitmap is *always* synchronized. This means
>>>> that for backups that fail part-way through, the bitmap retains a record of
>>>> which sectors need to be copied out to accomplish a new backup using the
>>>> old, partial result.
>>>>
>>>> In effect, this allows us to "resume" a failed backup; however the new backup
>>>> will be from the new point in time, so it isn't a "resume" as much as it is
>>>> an "incremental retry." This can be useful in the case of extremely large
>>>> backups that fail considerably through the operation and we'd like to not waste
>>>> the work that was already performed.
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>> ---
>>>>   qapi/block-core.json |  5 ++++-
>>>>   block/backup.c       | 10 ++++++----
>>>>   2 files changed, 10 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>> index 0332dcaabc..58d267f1f5 100644
>>>> --- a/qapi/block-core.json
>>>> +++ b/qapi/block-core.json
>>>> @@ -1143,6 +1143,9 @@
>>>>   # An enumeration of possible behaviors for the synchronization of a bitmap
>>>>   # when used for data copy operations.
>>>>   #
>>>> +# @always: The bitmap is always synchronized with remaining blocks to copy,
>>>> +#          whether or not the operation has completed successfully or not.
>>>
>>> Hmm, now I think that 'always' sounds a bit like 'really always' i.e. during backup
>>> too, which is confusing.. But I don't have better suggestion.
>>>
>>>> +#
>>>>   # @conditional: The bitmap is only synchronized when the operation is successul.
>>>>   #               This is useful for Incremental semantics.
>>>>   #
>>>> @@ -1153,7 +1156,7 @@
>>>>   # Since: 4.1
>>>>   ##
>>>>   { 'enum': 'BitmapSyncMode',
>>>> -  'data': ['conditional', 'never'] }
>>>> +  'data': ['always', 'conditional', 'never'] }
>>>>   ##
>>>>   # @MirrorCopyMode:
>>>> diff --git a/block/backup.c b/block/backup.c
>>>> index 627f724b68..beb2078696 100644
>>>> --- a/block/backup.c
>>>> +++ b/block/backup.c
>>>> @@ -266,15 +266,17 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>>>>       BlockDriverState *bs = blk_bs(job->common.blk);
>>>>       if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
>>>> -        /* Failure, or we don't want to synchronize the bitmap.
>>>> -         * Merge the successor back into the parent, delete nothing. */
>>>> +        /* Failure, or we don't want to synchronize the bitmap. */
>>>> +        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
>>>> +            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
>>>> +        }
>>>> +        /* Merge the successor back into the parent. */
>>>>           bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
>>>
>>> Hmm good, it should work. It's a lot more tricky, than just
>>> "synchronized with remaining blocks to copy", but I'm not sure the we need more details in
>>> spec.
>>>
>>> What we have in backup? So, from one hand we have an incremental backup, and a bitmap, counting from it.
>>> On the other hand it's not normal incremental backup, as it don't correspond to any valid state of vm disk,
>>> and it may be used only as a backing in a chain of further successful incremental backup, yes?
>>>
>>> And then I think: with this mode we can not stop on first error, but ignore it, just leaving dirty bit for
>>> resulting bitmap.. We have BLOCKDEV_ON_ERROR_IGNORE, which may be used to achieve it, but seems it don't
>>> work as expected, as in backup_loop() we retry operation if ret < 0 and  action != BLOCK_ERROR_ACTION_REPORT.
>>>
>>> And another thought: can user take a decision of discarding (like CONDITIONAL) or saving in backing chain (like
>>> ALWAYS) failed backup result _after_ backup job complete? For example, for small resulting backup it may be
>>> better to discard it and for large - to save.
>>> Will it work if we start job with ALWAYS mode and autocomplete = false, then on fail we can look at job progress,
>>> and if it is small we cancel job, otherwise call complete? Or stop, block-job-complete will not work with failure
>>> scenarios? Then we have to set BLOCKDEV_ON_ERROR_IGNORE, and on first error event decide, cancel or not? But we
>>> can only cancel or continue..
>>>
>>> Hmm. Cancel. So on cancel and abort you synchronize bitmap too? Seems in bad relation with what cancel should do,
>>> and in transactions in general...
>>
>> I mean grouped transaction mode, how should it work with this?
> 
> Actual the problem is that you want to implement partial success, and block jobs api and transactions api are not prepared
> for such thing


Should it be OK if we just:

1. restrict using ALWAYS together with grouped transaction mode, so we don't need to deal with other job failures.
2. don't claim but only reclaim on cancel even in ALWAYS mode, to make cancel roll-back all things

?
John Snow June 21, 2019, 8:58 p.m. UTC | #8
On 6/21/19 9:44 AM, Vladimir Sementsov-Ogievskiy wrote:
> 21.06.2019 16:08, Vladimir Sementsov-Ogievskiy wrote:
>> 21.06.2019 15:59, Vladimir Sementsov-Ogievskiy wrote:
>>> 21.06.2019 15:57, Vladimir Sementsov-Ogievskiy wrote:

^ Home Run!

I'm going to reply to all four of these mails at once below. I'm sorry
for the words, but I want to make sure I am being clear in my intent.

>>>> 20.06.2019 4:03, John Snow wrote:
>>>>> This adds an "always" policy for bitmap synchronization. Regardless of if
>>>>> the job succeeds or fails, the bitmap is *always* synchronized. This means
>>>>> that for backups that fail part-way through, the bitmap retains a record of
>>>>> which sectors need to be copied out to accomplish a new backup using the
>>>>> old, partial result.
>>>>>
>>>>> In effect, this allows us to "resume" a failed backup; however the new backup
>>>>> will be from the new point in time, so it isn't a "resume" as much as it is
>>>>> an "incremental retry." This can be useful in the case of extremely large
>>>>> backups that fail considerably through the operation and we'd like to not waste
>>>>> the work that was already performed.
>>>>>
>>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>>> ---
>>>>>   qapi/block-core.json |  5 ++++-
>>>>>   block/backup.c       | 10 ++++++----
>>>>>   2 files changed, 10 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>>> index 0332dcaabc..58d267f1f5 100644
>>>>> --- a/qapi/block-core.json
>>>>> +++ b/qapi/block-core.json
>>>>> @@ -1143,6 +1143,9 @@
>>>>>   # An enumeration of possible behaviors for the synchronization of a bitmap
>>>>>   # when used for data copy operations.
>>>>>   #
>>>>> +# @always: The bitmap is always synchronized with remaining blocks to copy,
>>>>> +#          whether or not the operation has completed successfully or not.
>>>>
>>>> Hmm, now I think that 'always' sounds a bit like 'really always' i.e. during backup
>>>> too, which is confusing.. But I don't have better suggestion.
>>>>

I could probably clarify to say "at the conclusion of the operation",
but we should also keep in mind that bitmaps tied to an operation can't
be used during that timeframe anyway.

>>>>> +#
>>>>>   # @conditional: The bitmap is only synchronized when the operation is successul.
>>>>>   #               This is useful for Incremental semantics.
>>>>>   #
>>>>> @@ -1153,7 +1156,7 @@
>>>>>   # Since: 4.1
>>>>>   ##
>>>>>   { 'enum': 'BitmapSyncMode',
>>>>> -  'data': ['conditional', 'never'] }
>>>>> +  'data': ['always', 'conditional', 'never'] }
>>>>>   ##
>>>>>   # @MirrorCopyMode:
>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>> index 627f724b68..beb2078696 100644
>>>>> --- a/block/backup.c
>>>>> +++ b/block/backup.c
>>>>> @@ -266,15 +266,17 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>>>>>       BlockDriverState *bs = blk_bs(job->common.blk);
>>>>>       if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
>>>>> -        /* Failure, or we don't want to synchronize the bitmap.
>>>>> -         * Merge the successor back into the parent, delete nothing. */
>>>>> +        /* Failure, or we don't want to synchronize the bitmap. */
>>>>> +        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
>>>>> +            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
>>>>> +        }
>>>>> +        /* Merge the successor back into the parent. */
>>>>>           bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
>>>>
>>>> Hmm good, it should work. It's a lot more tricky, than just
>>>> "synchronized with remaining blocks to copy", but I'm not sure the we need more details in
>>>> spec.
>>>>

Right, it's complicated because backups involve two points in time: the
start and finish of the operation. The actual technical truth of what
happens is hard to phrase succinctly.

It was difficult to phrase for even the normal Incremental/conditional
mode that we have.

I can't help but feel I need to write a blog post that has some good
diagrams that can be used to explain the concept clearly.

>>>> What we have in backup? So, from one hand we have an incremental backup, and a bitmap, counting from it.
>>>> On the other hand it's not normal incremental backup, as it don't correspond to any valid state of vm disk,
>>>> and it may be used only as a backing in a chain of further successful incremental backup, yes?
>>>>

You can also continue writing directly into it, which is likely the
smarter choice because it saves you the trouble of doing an intermediate
block commit later, and then you don't keep any image files that are
"meaningless" by themselves.

However, yes, iotest 257 uses them as backing images.

>>>> And then I think: with this mode we can not stop on first error, but ignore it, just leaving dirty bit for
>>>> resulting bitmap.. We have BLOCKDEV_ON_ERROR_IGNORE, which may be used to achieve it, but seems it don't
>>>> work as expected, as in backup_loop() we retry operation if ret < 0 and  action != BLOCK_ERROR_ACTION_REPORT.
>>>>

This strikes me as a good idea, but I wonder: if we already retry for
'ignore', it seems likely that transient network errors recover on
their own as a result. Are there cases where we really want
the job to move forward, because we expect certain sectors will never
copy correctly, like reading from unreliable media? Are those cases ones
we expect to be able to fix later?

(Actually, what happens if we ignore errors and we get stuck on a
sector? How many times do we retry this before we give up and admit that
it's actually an error we can't ignore?)

The use cases aren't clear to me right away, but it's worth looking into
because it sounds like it could be useful. I think that should not be
part of this series, however.

>>>> And another thought: can user take a decision of discarding (like CONDITIONAL) or saving in backing chain (like
>>>> ALWAYS) failed backup result _after_ backup job complete? For example, for small resulting backup it may be
>>>> better to discard it and for large - to save.

That seems complicated, because you need to keep the bitmap split into
its component subsets (original, copy manifest, and writes since start)
all the way until AFTER the job, which means more bitmap management
commands that need to be issued after the job is done.

Which means the job would move the bitmap into yet another new state
where it is "busy" and cannot be used, but is awaiting some kind of
rollover command from the user.

However, you could also just use our 'merge' support to make a copy of
the bitmap before you begin and use the 'always' sync mode, then if you
decide it's not worth restarting after the fact, you can just delete the
copy.
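
Something along these lines over QMP (a sketch; node and bitmap names are
made up):

    { "execute": "block-dirty-bitmap-add",
      "arguments": { "node": "drive0", "name": "bitmap0-copy" } }
    { "execute": "block-dirty-bitmap-merge",
      "arguments": { "node": "drive0", "target": "bitmap0-copy",
                     "bitmaps": [ "bitmap0" ] } }

    ... run the bitmap-mode backup against "bitmap0-copy" with the 'always'
        policy; if you later decide the partial result isn't worth keeping ...

    { "execute": "block-dirty-bitmap-remove",
      "arguments": { "node": "drive0", "name": "bitmap0-copy" } }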

>>>> Will it work if we start job with ALWAYS mode and autocomplete = false, then on fail we can look at job progress,
>>>> and if it is small we cancel job, otherwise call complete? Or stop, block-job-complete will not work with failure
>>>> scenarios? Then we have to set BLOCKDEV_ON_ERROR_IGNORE, and on first error event decide, cancel or not? But we
>>>> can only cancel or continue..
>>>>
>>>> Hmm. Cancel. So on cancel and abort you synchronize bitmap too? Seems in bad relation with what cancel should do,
>>>> and in transactions in general...
>>>
>>> I mean grouped transaction mode, how should it work with this?
>>
>> Actual the problem is that you want to implement partial success, and block jobs api and transactions api are not prepared
>> for such thing

I wouldn't call it partial success, but rather a "failure with detailed
error log" -- but I concede I am playing games with terminology.

The operation failed, but the bitmap can be considered a record of
exactly which bitmap regions didn't succeed in being copied.

You're right, though; the regions that got cleared could be considered a
record of partial success; but I think I might resist the idea of
wanting to formalize that in a new API. I think it's easier to
conceptualize it as a recoverable failure, and the bitmap behaves as the
resume/recovery data.

> 
> 
> Should it be OK if we just:
> 
> 1. restrict using ALWAYS together with grouped transaction mode, so we don't need to deal with other job failures.
> 2. don't claim but only reclaim on cancel even in ALWAYS mode, to make cancel roll-back all things
> 
> ?
> 
> 

> "grouped transaction mode, how should it work with this?"

With or without the grouped completion mode, it does the same thing: it
ALWAYS synchronizes!

Yes, that means that:

1. In the case of user cancellation, it still saves partial work.
2. In the case of an induced cancellation from a peer job, it saves
partial work.

I think this behavior is correct because grouped completion mode does
not actually guarantee that jobs that are already running clean up
completely as if they were never launched; that is, we cannot undo the
fact that we DID copy data out to a target. Therefore, because we
factually DID perform some work, this mode simply preserves a record of
what DID occur, in the case that the client prefers to salvage partial
work instead of restarting from scratch.

Just because we launched this as part of a transaction, in other words,
does not seem like a good case for negating the intention of the user to
be able to resume from failures if they occur.

I realize this seems like a strange usage of a transaction -- because
state from the transaction can escape a failed transaction -- but real
database systems in practice do allow the ability to do partial unwinds
to minimize the amount of work that needs to be redone. I don't think
this is too surprising -- and it happens ONLY when the user specifically
requests it, so I believe this is safe behavior.

I do agree that it *probably* doesn't make sense to use these together
-- it will work just fine, but why would you be okay with one type of
partial completion when you're not ok with another? I don't understand
why you'd ask for it, but I think it will do exactly what you ask for.
It's actually less code to just let it work like this.


> "So on cancel and abort you synchronize bitmap too?"

I will concede that this means that if you ask for a bitmap backup with
the 'always' policy and, for whatever reason change your mind about
this, there's no way to "cancel" the job in a manner that does not edit
the bitmap at this point.

I do agree that this seems to go against the wishes of the user, because
we have different "kinds" of cancellations:

A) Cancellations that actually represent failures in transactions
B) Cancellations that represent genuine user intention

It might be nice to allow the user to say "No wait, please don't edit
that bitmap, I made a mistake!"

In these cases, how about a slight overloading of the "forced" user
cancellation? For mirror, a "forced" cancel means "Please don't try to
sync the mirror, just exit immediately." whereas an unforced cancel
means "Please complete!"

For backup, it could mean something similar:

force: "Please exit immediately, don't even sync the bitmap!"
!force: "Exit, but proceed with normal cleanup."

This would mean that in the grouped completion failure case, we actually
still sync the bitmap, but if the user sends a forced-cancel, they can
actually abort the entire transaction.

Actually, that needs just a minor edit:

job.c:746:

job_cancel_async(other_job, false);

should become:

job_cancel_async(other_job, job->force_cancel)


And then the bitmap cleanup code can check for this condition to avoid
synchronizing the bitmap in these cases.
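
e.g., on the backup side, something like this (sketch only, untested; the
field path to force_cancel is from memory):

    /* in backup_cleanup_sync_bitmap(): */
    if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
        /* A forced cancel skips the 'always' synchronization entirely and
         * just reclaims, i.e. behaves like a failed 'conditional' job. */
        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS &&
            !job->common.job.force_cancel) {
            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
        }
        bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
    }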
Max Reitz June 21, 2019, 9:48 p.m. UTC | #9
On 21.06.19 22:58, John Snow wrote:
> 
> 
> On 6/21/19 9:44 AM, Vladimir Sementsov-Ogievskiy wrote:

[...]

Just chiming in on this:

>> "So on cancel and abort you synchronize bitmap too?"
> 
> I will concede that this means that if you ask for a bitmap backup with
> the 'always' policy and, for whatever reason change your mind about
> this, there's no way to "cancel" the job in a manner that does not edit
> the bitmap at this point.
> 
> I do agree that this seems to go against the wishes of the user, because
> we have different "kinds" of cancellations:
> 
> A) Cancellations that actually represent failures in transactions
> B) Cancellations that represent genuine user intention
> 
> It might be nice to allow the user to say "No wait, please don't edit
> that bitmap, I made a mistake!"

So that “always” doesn’t mean “always”?  To me, that seems like not so
good an idea.

If the user uses always, they have to live with that.  I had to live
with calling “rm” on the wrong file before.  Life’s tough.

In all seriousness: “Always” is not something a user would use, is it?
It’s something for management tools.  Why would they cancel because
“They made a mistake”?

Second, what’s the worst thing that may come out of such a mistake?
Having to perform a full backup?  If so, that doesn’t seem so bad to me.
It certainly doesn’t seem bad enough to justify letting an unrelated
mechanic influence whether “always” means “always”.

Also, this cancel idea would only work for jobs where the bitmap mode
does not come into play until the job is done, i.e. backup.  I suppose
if we want to have bitmap modes other than 'always' for mirror, that too
would have to make a copy of the user-supplied bitmap, so there the
bitmap mode would make a difference only at the end of the job, too, but
who knows.

And if it only makes a difference at the end of the job, you might as
well just add a way to change a running job’s bitmap-mode.

Max
John Snow June 21, 2019, 10:52 p.m. UTC | #10
On 6/21/19 5:48 PM, Max Reitz wrote:
> On 21.06.19 22:58, John Snow wrote:
>>
>>
>> On 6/21/19 9:44 AM, Vladimir Sementsov-Ogievskiy wrote:
> 
> [...]
> 
> Just chiming in on this:
> 
>>> "So on cancel and abort you synchronize bitmap too?"
>>
>> I will concede that this means that if you ask for a bitmap backup with
>> the 'always' policy and, for whatever reason change your mind about
>> this, there's no way to "cancel" the job in a manner that does not edit
>> the bitmap at this point.
>>
>> I do agree that this seems to go against the wishes of the user, because
>> we have different "kinds" of cancellations:
>>
>> A) Cancellations that actually represent failures in transactions
>> B) Cancellations that represent genuine user intention
>>
>> It might be nice to allow the user to say "No wait, please don't edit
>> that bitmap, I made a mistake!"
> 
> So that “always” doesn’t mean “always”?  To me, that seems like not so
> good an idea.
> 
> If the user uses always, they have to live with that.  I had to live
> with calling “rm” on the wrong file before.  Life’s tough.
> 

I actually agree, but I was making a concession in the ONE conceivable
case where you would theoretically want to abort "always".

> In all seriousness: “Always” is not something a user would use, is it?
> It’s something for management tools.  Why would they cancel because
> “They made a mistake”?
> 

A user might use it -- it's an attractive mode. It's basically
Incremental with retry ability. It is designed for use by a management
utility though, yes.

> Second, what’s the worst thing that may come out of such a mistake?
> Having to perform a full backup?  If so, that doesn’t seem so bad to me.
>  It certainly doesn’t seem so bad to make an unrelated mechanic have an
> influence on whether “always” means “always”.
> 

No, if you "accidentally" issue always (and you change your mind for
whatever reason), the correct way to fix this is:

(1) If the job completes successfully, nothing. Everything is situation
normal. This behaves exactly like "incremental" mode.

(2) If the job fails so hard you don't succeed in writing data anywhere
at all, nothing. Everything is fine. This behaves exactly like a failure
in "incremental" mode. The only way to reliably tell if this happened is
if job never even succeeded in creating a target for you, or your target
is still verifiably empty. (Even so: good practice would be to never
delete a target if you used 'always' mode.)

(3) If the job fails after writing SOME data, you simply issue another
mode=bitmap policy=always against the same target. (Presumably after
fixing your network or clearing room on the target storage.)
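
(As a rough QMP sketch -- "sync": "bitmap" and "bitmap-mode" are what this
series proposes, and the device/target names are made up. "mode": "existing"
is what lets you point at the same, partially-written target file:)

    { "execute": "drive-backup",
      "arguments": { "device": "drive0", "target": "backup-partial.qcow2",
                     "format": "qcow2", "mode": "existing",
                     "sync": "bitmap", "bitmap": "bitmap0",
                     "bitmap-mode": "always" } }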

The worst mistake you can make is this:

- Issue sync=bitmap policy=always
- Cancel the job because it's taking too long, and you are impatient
- Forget that you used "always", delete the incomplete backup target

Oops! That had data that QEMU was counting on having written out
already. Your bitmap is now garbage.

You fix this with a full backup, yes.

> Also, this cancel idea would only work for jobs where the bitmap mode
> does not come into play until the job is done, i.e. backup.  I suppose
> if we want to have bitmap modes other than 'always' for mirror, that too
> would have to make a copy of the user-supplied bitmap, so there the
> bitmap mode would make a difference only at the end of the job, too, but
> who knows.
> 

Reasonable point; at the moment I modeled bitmap support for mirror to
only do synchronization at the end of the job as well. In this case,
"soft cancels" are modeled (upstream, today) as ret == 0, so those won't
count as failures at all.

(And, actually, force cancels will count as real failures. So maybe it
IS best not to overload this already hacky semantic we have on cancel.)

> And if it only makes a difference at the end of the job, you might as
> well just add a way to change a running job’s bitmap-mode.
> 

This is prescient. I have wanted a "completion-mode" for mirror that you
can change during its runtime (and to deprecate cancel as a way to
"complete" the job) for a very long time.

It's just that the QAPI for it always seems ugly so I shy away from it.

> Max
> 

So, I will say this:

1) I think the implementation of "always" is perfectly correct, in
single, transaction, or grouped-completion transaction modes.

2) Some of these combinations don't make much practical sense, but it's
more work to disallow them, and past experience reminds me that it's not
my job to save the user from themselves at the primitive level.

3) Nicer features like "I want a different completion mode since I
started this job" don't exist for any other mode or any other job
right now, and I don't think I will add them to this series.

Patch

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0332dcaabc..58d267f1f5 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1143,6 +1143,9 @@ 
 # An enumeration of possible behaviors for the synchronization of a bitmap
 # when used for data copy operations.
 #
+# @always: The bitmap is always synchronized with remaining blocks to copy,
+#          whether or not the operation has completed successfully or not.
+#
 # @conditional: The bitmap is only synchronized when the operation is successul.
 #               This is useful for Incremental semantics.
 #
@@ -1153,7 +1156,7 @@ 
 # Since: 4.1
 ##
 { 'enum': 'BitmapSyncMode',
-  'data': ['conditional', 'never'] }
+  'data': ['always', 'conditional', 'never'] }
 
 ##
 # @MirrorCopyMode:
diff --git a/block/backup.c b/block/backup.c
index 627f724b68..beb2078696 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -266,15 +266,17 @@  static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
     BlockDriverState *bs = blk_bs(job->common.blk);
 
     if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
-        /* Failure, or we don't want to synchronize the bitmap.
-         * Merge the successor back into the parent, delete nothing. */
+        /* Failure, or we don't want to synchronize the bitmap. */
+        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
+            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
+        }
+        /* Merge the successor back into the parent. */
         bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
-        assert(bm);
     } else {
         /* Everything is fine, delete this bitmap and install the backup. */
         bm = bdrv_dirty_bitmap_abdicate(bs, job->sync_bitmap, NULL);
-        assert(bm);
     }
+    assert(bm);
 }
 
 static void backup_commit(Job *job)