[v2,10/11] blockjob: refactor backup_start as backup_job_create

Message ID 1475272849-19990-11-git-send-email-jsnow@redhat.com (mailing list archive)
State New, archived

Commit Message

John Snow Sept. 30, 2016, 10 p.m. UTC
Refactor backup_start as backup_job_create, which only creates the job,
but does not automatically start it. The old interface, 'backup_start',
is not kept, in favor of limiting the number of nearly-identical interfaces
that would have to be edited to keep up with QAPI changes in the future.

Callers that wish to synchronously start the backup_block_job can
instead just call block_job_start immediately after calling
backup_job_create.
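
(For illustration, a synchronous caller would then look roughly like the
sketch below; the argument order follows the declaration in
include/block/block_int.h, the surrounding variables are placeholders, and
error handling is abbreviated.)

    BlockJob *job;
    Error *local_err = NULL;

    job = backup_job_create(job_id, bs, target_bs, speed, sync_mode,
                            sync_bitmap, compress, on_source_error,
                            on_target_error,
                            NULL, NULL,   /* cb, opaque */
                            NULL,         /* txn */
                            &local_err);
    if (!job) {
        error_propagate(errp, local_err);
        return;
    }
    block_job_start(job);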

Transactions are updated to use the new interface, calling block_job_start
only during the .commit phase, which helps prevent race conditions where
jobs may finish before we even finish building the transaction. This may
happen, for instance, during empty block backup jobs.

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c            | 26 ++++++++-------
 block/replication.c       | 11 ++++---
 blockdev.c                | 81 +++++++++++++++++++++++++++++++----------------
 include/block/block_int.h | 21 ++++++------
 4 files changed, 86 insertions(+), 53 deletions(-)

Comments

John Snow Oct. 7, 2016, 6:39 p.m. UTC | #1
On 09/30/2016 06:00 PM, John Snow wrote:
> Refactor backup_start as backup_job_create, which only creates the job,
> but does not automatically start it. The old interface, 'backup_start',
> is not kept in favor of limiting the number of nearly-identical iterfaces
> that would have to be edited to keep up with QAPI changes in the future.
>
> Callers that wish to synchronously start the backup_block_job can
> instead just call block_job_start immediately after calling
> backup_job_create.
>
> Transactions are updated to use the new interface, calling block_job_start
> only during the .commit phase, which helps prevent race conditions where
> jobs may finish before we even finish building the transaction. This may
> happen, for instance, during empty block backup jobs.
>

Sadly for me, I realized this patch has a potential problem. When we 
were adding the bitmap operations, it became clear that the atomicity 
point was during .prepare, not .commit.

e.g. the bitmap is cleared or created during prepare, and backup_run 
installs its Write Notifier at that point in time, too.

By changing BlockJobs to only run on commit, we've severed the atomicity 
point such that some actions will take effect during prepare, and others 
at commit.

I still think it's the correct thing to do to delay the BlockJobs until 
the commit phase, so I will start auditing the code to see how hard it 
is to shift the atomicity point to commit instead. If it's possible to 
do that, I think from the POV of the managing application, having the 
atomicity point be at commit is the less surprising behavior.

Feel free to chime in with suggestions and counterpoints until then.

--js

Kevin Wolf Oct. 10, 2016, 8:57 a.m. UTC | #2
On 07.10.2016 at 20:39, John Snow wrote:
> On 09/30/2016 06:00 PM, John Snow wrote:
> >Refactor backup_start as backup_job_create, which only creates the job,
> >but does not automatically start it. The old interface, 'backup_start',
> >is not kept in favor of limiting the number of nearly-identical iterfaces
> >that would have to be edited to keep up with QAPI changes in the future.
> >
> >Callers that wish to synchronously start the backup_block_job can
> >instead just call block_job_start immediately after calling
> >backup_job_create.
> >
> >Transactions are updated to use the new interface, calling block_job_start
> >only during the .commit phase, which helps prevent race conditions where
> >jobs may finish before we even finish building the transaction. This may
> >happen, for instance, during empty block backup jobs.
> >
> 
> Sadly for me, I realized this patch has a potential problem. When we
> were adding the bitmap operations, it became clear that the
> atomicity point was during .prepare, not .commit.
> 
> e.g. the bitmap is cleared or created during prepare, and backup_run
> installs its Write Notifier at that point in time, too.

Strictly speaking that's wrong then.

The write notifier doesn't really hurt because it is never triggered
between prepare and commit (we're holding the lock) and it can just be
removed again.

Clearing the bitmap is a bug because the caller could expect that the
bitmap is in its original state if the transaction fails. I doubt this
is a problem in practice, but we should fix it anyway.

By the way, why did we allow to add a 'bitmap' option for DriveBackup
without adding it to BlockdevBackup at the same time?

> By changing BlockJobs to only run on commit, we've severed the
> atomicity point such that some actions will take effect during
> prepare, and others at commit.
> 
> I still think it's the correct thing to do to delay the BlockJobs
> until the commit phase, so I will start auditing the code to see how
> hard it is to shift the atomicity point to commit instead. If it's
> possible to do that, I think from the POV of the managing
> application, having the atomicity point be
> 
> Feel free to chime in with suggestions and counterpoints until then.

I agree that jobs have to be started only at commit. There may be other
things that are currently happening in prepare that really should be
moved as well, but as long as moving one thing but not the other doesn't
break anything that was working, we can fix one thing at a time.

Kevin
John Snow Oct. 10, 2016, 10:51 p.m. UTC | #3
On 10/10/2016 04:57 AM, Kevin Wolf wrote:
> On 07.10.2016 at 20:39, John Snow wrote:
>> On 09/30/2016 06:00 PM, John Snow wrote:
>>> Refactor backup_start as backup_job_create, which only creates the job,
>>> but does not automatically start it. The old interface, 'backup_start',
>>> is not kept in favor of limiting the number of nearly-identical iterfaces

(Ah yes, 'iterfaces.')

>>> that would have to be edited to keep up with QAPI changes in the future.
>>>
>>> Callers that wish to synchronously start the backup_block_job can
>>> instead just call block_job_start immediately after calling
>>> backup_job_create.
>>>
>>> Transactions are updated to use the new interface, calling block_job_start
>>> only during the .commit phase, which helps prevent race conditions where
>>> jobs may finish before we even finish building the transaction. This may
>>> happen, for instance, during empty block backup jobs.
>>>
>>
>> Sadly for me, I realized this patch has a potential problem. When we
>> were adding the bitmap operations, it became clear that the
>> atomicity point was during .prepare, not .commit.
>>
>> e.g. the bitmap is cleared or created during prepare, and backup_run
>> installs its Write Notifier at that point in time, too.
>
> Strictly speaking that's wrong then.
>

I agree, though I do remember this coming up during the bitmap review 
process: at the moment, the point-in-time spot is during prepare.

I do think that while it's at least a consistent model (the model where 
we do in fact commit during .prepare(), simply undo or revert during 
.abort(), and only clean up or remove the undo cache in .commit()), it 
certainly violates the principle of least surprise and is a little rude...
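
(Schematically, the model described above looks something like the sketch
below. FooState, do_thing(), undo_thing() and free_undo_state() are made-up
placeholders, not real QEMU interfaces; only BlkActionState and DO_UPCAST
are taken from blockdev.c.)

    static void foo_prepare(BlkActionState *common, Error **errp)
    {
        FooState *state = DO_UPCAST(FooState, common, common);

        /* The externally visible effect already happens here ... */
        state->undo = do_thing(state->bs, errp);
    }

    static void foo_abort(BlkActionState *common)
    {
        FooState *state = DO_UPCAST(FooState, common, common);

        /* ... .abort() reverts it ... */
        undo_thing(state->bs, state->undo);
    }

    static void foo_commit(BlkActionState *common)
    {
        FooState *state = DO_UPCAST(FooState, common, common);

        /* ... and .commit() merely drops the undo cache. */
        free_undo_state(state->undo);
    }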

> The write notifier doesn't really hurt because it is never triggered
> between prepare and commit (we're holding the lock) and it can just be
> removed again.
>
> Clearing the bitmap is a bug because the caller could expect that the
> bitmap is in its original state if the transaction fails. I doubt this
> is a problem in practice, but we should fix it anyway.
>

We make a backup to undo the process if it fails. I only mention it to 
emphasize that the atomic point appears to be during prepare. In 
practice we hold the locks for the whole process, but... I think Paolo 
may be actively trying to change that.

> By the way, why did we allow to add a 'bitmap' option for DriveBackup
> without adding it to BlockdevBackup at the same time?
>

I don't remember. I'm not sure anyone ever audited it to convince 
themselves it was a useful or safe thing to do. I believe at the time I 
was pushing for bitmaps in DriveBackup, Fam was still authoring the 
BlockdevBackup interface.

>> By changing BlockJobs to only run on commit, we've severed the
>> atomicity point such that some actions will take effect during
>> prepare, and others at commit.
>>
>> I still think it's the correct thing to do to delay the BlockJobs
>> until the commit phase, so I will start auditing the code to see how
>> hard it is to shift the atomicity point to commit instead. If it's
>> possible to do that, I think from the POV of the managing
>> application, having the atomicity point be
>>
>> Feel free to chime in with suggestions and counterpoints until then.
>
> I agree that jobs have to be started only at commit. There may be other
> things that are currently happening in prepare that really should be
> moved as well, but unless moving one thing but not the other doesn't
> break anything that was working, we can fix one thing at a time.
>
> Kevin
>

Alright, let's give this a whirl.

We have 8 transaction actions:

drive_backup
blockdev_backup
block_dirty_bitmap_add
block_dirty_bitmap_clear
abort
blockdev_snapshot
blockdev_snapshot_sync
blockdev_snapshot_internal_sync

Drive and Blockdev backup are already modified to behave point-in-time 
at time of .commit() by changing them to only begin running once the 
commit phase occurs.

Bitmap add and clear are trivial to rework; clear just moves the call to 
clear into commit, with possibly some action taken to prevent the bitmap 
from becoming used by some other process in the meantime. Add is easy to 
rework too: we can create it during prepare but reset it back to zero 
during commit if necessary.
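
(A rough, non-authoritative sketch of what that rework could look like for
the clear action; the bodies are left as comments marking where the
existing lookup and clear calls would move.)

    static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
                                                 Error **errp)
    {
        /* Look the bitmap up, stash it in the action state, and fail
         * early if it is frozen or otherwise unusable, but do not touch
         * its contents yet. */
    }

    static void block_dirty_bitmap_clear_commit(BlkActionState *common)
    {
        /* The point-in-time effect happens only here: the existing
         * clear call moves into this function. */
    }

    static void block_dirty_bitmap_clear_abort(BlkActionState *common)
    {
        /* Nothing to undo, since .prepare() modified no bitmap state. */
    }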

Abort needs no changes.

blockdev_snapshot[_sync] actually appears to already be doing the right 
thing, by only installing the new top layer during commit, which makes 
this action inconsistent with the current semantics, but requires no 
changes to move to the desired new semantics.

That leaves only the internal snapshot to worry about, which does 
admittedly look like quite the yak to shave. It's a bit out of scope for 
me, but Kevin, do you think this is possible?

Looks like implementations are qcow2, rbd, and sheepdog. I imagine this 
would need to be split into prepare and commit semantics to accommodate 
this change... though we don't have any meaningful control over the rbd 
implementation.

Any thoughts? I could conceivably just change everything over to working 
primarily during .commit(), and just argue that the locks held for the 
transaction are sufficient to leave the internal snapshot alone "for 
now," ...

--js
Paolo Bonzini Oct. 11, 2016, 8:56 a.m. UTC | #4
On 11/10/2016 00:51, John Snow wrote:
>> Clearing the bitmap is a bug because the caller could expect that the
>> bitmap is in its original state if the transaction fails. I doubt this
>> is a problem in practice, but we should fix it anyway.
> 
> We make a backup to undo the process if it fails. I only mention it to
> emphasize that the atomic point appears to be during prepare. In
> practice we hold the locks for the whole process, but... I think Paolo
> may be actively trying to change that.

Even now, atomicity must be ensured with bdrv_drained_begin/end.  The
AioContext lock does not promise atomicity.
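
(In code terms that is the drained section the backup prepare handlers
already open, roughly:)

    bdrv_drained_begin(bs);   /* quiesce the device; no new requests */
    /* ... .prepare()/.commit()/.abort() all run while drained ... */
    bdrv_drained_end(bs);     /* presumably in .clean(); requests resume */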

Paolo
Kevin Wolf Oct. 11, 2016, 9:35 a.m. UTC | #5
On 11.10.2016 at 00:51, John Snow wrote:
> >>Sadly for me, I realized this patch has a potential problem. When we
> >>were adding the bitmap operations, it became clear that the
> >>atomicity point was during .prepare, not .commit.
> >>
> >>e.g. the bitmap is cleared or created during prepare, and backup_run
> >>installs its Write Notifier at that point in time, too.
> >
> >Strictly speaking that's wrong then.
> >
> 
> I agree, though I do remember this coming up during the bitmap
> review process that the current point-in-time spot is during prepare
> at the moment.
> 
> I do think that while it's at least a consistent model (The model
> where we do in fact commit during .prepare(), and simply undo or
> revert during .abort(), and only clean or remove undo-cache in
> .commit()) it certainly violates the principle of least surprise and
> is a little rude...

As long as we can reliably undo things in .abort (i.e. use operations
that can't fail) and keep the locks and the device drained, we should be
okay in terms of atomicity.

I think it's still nicer if we can enable things only in .commit, but
sometimes we have to use operations that could fail, so we have to do
them in .prepare.

The exact split between .prepare/.commit/.abort isn't visible on the
external interfaces as long as it's done correctly, so it doesn't
necessarily have to be the same for all commands.

> >The write notifier doesn't really hurt because it is never triggered
> >between prepare and commit (we're holding the lock) and it can just be
> >removed again.
> >
> >Clearing the bitmap is a bug because the caller could expect that the
> >bitmap is in its original state if the transaction fails. I doubt this
> >is a problem in practice, but we should fix it anyway.
> 
> We make a backup to undo the process if it fails. I only mention it
> to emphasize that the atomic point appears to be during prepare. In
> practice we hold the locks for the whole process, but... I think
> Paolo may be actively trying to change that.

Well, the whole .prepare/.commit or .prepare/.abort sequence is supposed
to be atomic, so it's really the same thing. Changing this would break
the transactional behaviour, so that's not possible anyway.

> >By the way, why did we allow to add a 'bitmap' option for DriveBackup
> >without adding it to BlockdevBackup at the same time?
> 
> I don't remember. I'm not sure anyone ever audited it to convince
> themselves it was a useful or safe thing to do. I believe at the
> time I was pushing for bitmaps in DriveBackup, Fam was still
> authoring the BlockdevBackup interface.

Hm, maybe that's why. I checked the commit dates of both (and there
BlockdevBackup was earlier), but I didn't check the development history.

Should we add it now or is it a bad idea?

> >>By changing BlockJobs to only run on commit, we've severed the
> >>atomicity point such that some actions will take effect during
> >>prepare, and others at commit.
> >>
> >>I still think it's the correct thing to do to delay the BlockJobs
> >>until the commit phase, so I will start auditing the code to see how
> >>hard it is to shift the atomicity point to commit instead. If it's
> >>possible to do that, I think from the POV of the managing
> >>application, having the atomicity point be
> >>
> >>Feel free to chime in with suggestions and counterpoints until then.
> >
> >I agree that jobs have to be started only at commit. There may be other
> >things that are currently happening in prepare that really should be
> >moved as well, but unless moving one thing but not the other doesn't
> >break anything that was working, we can fix one thing at a time.
> >
> >Kevin
> >
> 
> Alright, let's give this a whirl.
> 
> We have 8 transaction actions:
> 
> drive_backup
> blockdev_backup
> block_dirty_bitmap_add
> block_dirty_bitmap_clear
> abort
> blockdev_snapshot
> blockdev_snapshot_sync
> blockdev_snapshot_internal_sync
> 
> Drive and Blockdev backup are already modified to behave
> point-in-time at time of .commit() by changing them to only begin
> running once the commit phase occurs.
> 
> Bitmap add and clear are trivial to rework; clear just moves the
> call to clear in commit, with possibly some action taken to prevent
> the bitmap from become used by some other process in the meantime.
> Add is easy to rework too, we can create it during prepare but reset
> it back to zero during commit if necessary.
> 
> Abort needs no changes.
> 
> blockdev_snapshot[_sync] actually appears to already be doing the
> right thing, by only installing the new top layer during commit,
> which makes this action inconsistent by current semantics, but
> requires no changes to move to the desired new semantics.

This doesn't sound too bad.

> That leaves only the internal snapshot to worry about, which does
> admittedly look like quite the yak to shave. It's a bit out of scope
> for me, but Kevin, do you think this is possible?
> 
> Looks like implementations are qcow2, rbd, and sheepdog. I imagine
> this would need to be split into prepare and commit semantics to
> accommodate this change... though we don't have any meaningful
> control over the rbd implementation.
> 
> Any thoughts? I could conceivably just change everything over to
> working primarily during .commit(), and just argue that the locks
> held for the transaction are sufficient to leave the internal
> snapshot alone "for now," ...

Leave them alone. We don't really support atomic internal snapshots. We
could make some heavy refactoring in order to split the BlockDriver
callbacks into prepare/commit/abort, but that's probably not worth the
effort and would make some code that already isn't tested much a lot
more complex.

If we ever decided to get serious about internal snapshots, we could
still do this. I kind of like internal snapshots, but I doubt it will
happen.

Kevin
Fam Zheng Oct. 17, 2016, 8:59 a.m. UTC | #6
On Tue, 10/11 11:35, Kevin Wolf wrote:
> > >By the way, why did we allow to add a 'bitmap' option for DriveBackup
> > >without adding it to BlockdevBackup at the same time?
> > 
> > I don't remember. I'm not sure anyone ever audited it to convince
> > themselves it was a useful or safe thing to do. I believe at the
> > time I was pushing for bitmaps in DriveBackup, Fam was still
> > authoring the BlockdevBackup interface.
> 
> Hm, maybe that's why. I checked the commit dates of both (and there
> BlockdevBackup was earlier), but I didn't check the development history.
> 
> Should we add it now or is it a bad idea?

Yes, we should add it. I'll send a separate patch. Thanks for catching that.

Fam

Patch

diff --git a/block/backup.c b/block/backup.c
index 7294169..aad69eb 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -527,7 +527,7 @@  static const BlockJobDriver backup_job_driver = {
     .attached_aio_context   = backup_attached_aio_context,
 };
 
-void backup_start(const char *job_id, BlockDriverState *bs,
+BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *target, int64_t speed,
                   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
                   bool compress,
@@ -546,52 +546,52 @@  void backup_start(const char *job_id, BlockDriverState *bs,
 
     if (bs == target) {
         error_setg(errp, "Source and target cannot be the same");
-        return;
+        return NULL;
     }
 
     if (!bdrv_is_inserted(bs)) {
         error_setg(errp, "Device is not inserted: %s",
                    bdrv_get_device_name(bs));
-        return;
+        return NULL;
     }
 
     if (!bdrv_is_inserted(target)) {
         error_setg(errp, "Device is not inserted: %s",
                    bdrv_get_device_name(target));
-        return;
+        return NULL;
     }
 
     if (compress && target->drv->bdrv_co_pwritev_compressed == NULL) {
         error_setg(errp, "Compression is not supported for this drive %s",
                    bdrv_get_device_name(target));
-        return;
+        return NULL;
     }
 
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
-        return;
+        return NULL;
     }
 
     if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_BACKUP_TARGET, errp)) {
-        return;
+        return NULL;
     }
 
     if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
         if (!sync_bitmap) {
             error_setg(errp, "must provide a valid bitmap name for "
                              "\"incremental\" sync mode");
-            return;
+            return NULL;
         }
 
         /* Create a new bitmap, and freeze/disable this one. */
         if (bdrv_dirty_bitmap_create_successor(bs, sync_bitmap, errp) < 0) {
-            return;
+            return NULL;
         }
     } else if (sync_bitmap) {
         error_setg(errp,
                    "a sync_bitmap was provided to backup_run, "
                    "but received an incompatible sync_mode (%s)",
                    MirrorSyncMode_lookup[sync_mode]);
-        return;
+        return NULL;
     }
 
     len = bdrv_getlength(bs);
@@ -638,8 +638,8 @@  void backup_start(const char *job_id, BlockDriverState *bs,
     bdrv_op_block_all(target, job->common.blocker);
     job->common.len = len;
     block_job_txn_add_job(txn, &job->common);
-    block_job_start(&job->common);
-    return;
+
+    return &job->common;
 
  error:
     if (sync_bitmap) {
@@ -649,4 +649,6 @@  void backup_start(const char *job_id, BlockDriverState *bs,
         blk_unref(job->target);
         block_job_unref(&job->common);
     }
+
+    return NULL;
 }
diff --git a/block/replication.c b/block/replication.c
index b604b93..d9cdc36 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -409,6 +409,7 @@  static void replication_start(ReplicationState *rs, ReplicationMode mode,
     int64_t active_length, hidden_length, disk_length;
     AioContext *aio_context;
     Error *local_err = NULL;
+    BlockJob *job;
 
     aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(aio_context);
@@ -496,16 +497,18 @@  static void replication_start(ReplicationState *rs, ReplicationMode mode,
         bdrv_op_block_all(top_bs, s->blocker);
         bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
 
-        backup_start("replication-backup", s->secondary_disk->bs,
-                     s->hidden_disk->bs, 0, MIRROR_SYNC_MODE_NONE, NULL, false,
-                     BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
-                     backup_job_completed, s, NULL, &local_err);
+        job = backup_job_create("replication-backup", s->secondary_disk->bs,
+                                s->hidden_disk->bs, 0, MIRROR_SYNC_MODE_NONE,
+                                NULL, false, BLOCKDEV_ON_ERROR_REPORT,
+                                BLOCKDEV_ON_ERROR_REPORT, backup_job_completed,
+                                s, NULL, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             backup_job_cleanup(s);
             aio_context_release(aio_context);
             return;
         }
+        block_job_start(job);
         break;
     default:
         aio_context_release(aio_context);
diff --git a/blockdev.c b/blockdev.c
index 0ac507f..37d78d3 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1866,7 +1866,7 @@  typedef struct DriveBackupState {
     BlockJob *job;
 } DriveBackupState;
 
-static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
+static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
                             Error **errp);
 
 static void drive_backup_prepare(BlkActionState *common, Error **errp)
@@ -1890,23 +1890,27 @@  static void drive_backup_prepare(BlkActionState *common, Error **errp)
     bdrv_drained_begin(bs);
     state->bs = bs;
 
-    do_drive_backup(backup, common->block_job_txn, &local_err);
+    state->job = do_drive_backup(backup, common->block_job_txn, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
     }
+}
 
-    state->job = state->bs->job;
+static void drive_backup_commit(BlkActionState *common)
+{
+    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
+    if (state->job) {
+        block_job_start(state->job);
+    }
 }
 
 static void drive_backup_abort(BlkActionState *common)
 {
     DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
-    BlockDriverState *bs = state->bs;
 
-    /* Only cancel if it's the job we started */
-    if (bs && bs->job && bs->job == state->job) {
-        block_job_cancel_sync(bs->job);
+    if (state->job) {
+        block_job_cancel_sync(state->job);
     }
 }
 
@@ -1927,8 +1931,8 @@  typedef struct BlockdevBackupState {
     AioContext *aio_context;
 } BlockdevBackupState;
 
-static void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
-                               Error **errp);
+static BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
+                                    Error **errp);
 
 static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
 {
@@ -1961,23 +1965,27 @@  static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
     state->bs = bs;
     bdrv_drained_begin(state->bs);
 
-    do_blockdev_backup(backup, common->block_job_txn, &local_err);
+    state->job = do_blockdev_backup(backup, common->block_job_txn, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
     }
+}
 
-    state->job = state->bs->job;
+static void blockdev_backup_commit(BlkActionState *common)
+{
+    BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
+    if (state->job) {
+        block_job_start(state->job);
+    }
 }
 
 static void blockdev_backup_abort(BlkActionState *common)
 {
     BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
-    BlockDriverState *bs = state->bs;
 
-    /* Only cancel if it's the job we started */
-    if (bs && bs->job && bs->job == state->job) {
-        block_job_cancel_sync(bs->job);
+    if (state->job) {
+        block_job_cancel_sync(state->job);
     }
 }
 
@@ -2127,12 +2135,14 @@  static const BlkActionOps actions[] = {
     [TRANSACTION_ACTION_KIND_DRIVE_BACKUP] = {
         .instance_size = sizeof(DriveBackupState),
         .prepare = drive_backup_prepare,
+        .commit = drive_backup_commit,
         .abort = drive_backup_abort,
         .clean = drive_backup_clean,
     },
     [TRANSACTION_ACTION_KIND_BLOCKDEV_BACKUP] = {
         .instance_size = sizeof(BlockdevBackupState),
         .prepare = blockdev_backup_prepare,
+        .commit = blockdev_backup_commit,
         .abort = blockdev_backup_abort,
         .clean = blockdev_backup_clean,
     },
@@ -3126,11 +3136,13 @@  out:
     aio_context_release(aio_context);
 }
 
-static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
+static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
+                                 Error **errp)
 {
     BlockDriverState *bs;
     BlockDriverState *target_bs;
     BlockDriverState *source = NULL;
+    BlockJob *job = NULL;
     BdrvDirtyBitmap *bmap = NULL;
     AioContext *aio_context;
     QDict *options = NULL;
@@ -3159,7 +3171,7 @@  static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
 
     bs = qmp_get_root_bs(backup->device, errp);
     if (!bs) {
-        return;
+        return NULL;
     }
 
     aio_context = bdrv_get_aio_context(bs);
@@ -3233,9 +3245,10 @@  static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
         }
     }
 
-    backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
-                 bmap, backup->compress, backup->on_source_error,
-                 backup->on_target_error, NULL, bs, txn, &local_err);
+    job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
+                            backup->sync, bmap, backup->compress,
+                            backup->on_source_error, backup->on_target_error,
+                            NULL, bs, txn, &local_err);
     bdrv_unref(target_bs);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
@@ -3244,11 +3257,17 @@  static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
 
 out:
     aio_context_release(aio_context);
+    return job;
 }
 
 void qmp_drive_backup(DriveBackup *arg, Error **errp)
 {
-    return do_drive_backup(arg, NULL, errp);
+
+    BlockJob *job;
+    job = do_drive_backup(arg, NULL, errp);
+    if (job) {
+        block_job_start(job);
+    }
 }
 
 BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
@@ -3256,12 +3275,14 @@  BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
     return bdrv_named_nodes_list(errp);
 }
 
-void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
+BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
+                             Error **errp)
 {
     BlockDriverState *bs;
     BlockDriverState *target_bs;
     Error *local_err = NULL;
     AioContext *aio_context;
+    BlockJob *job = NULL;
 
     if (!backup->has_speed) {
         backup->speed = 0;
@@ -3281,7 +3302,7 @@  void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
 
     bs = qmp_get_root_bs(backup->device, errp);
     if (!bs) {
-        return;
+        return NULL;
     }
 
     aio_context = bdrv_get_aio_context(bs);
@@ -3303,19 +3324,25 @@  void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
             goto out;
         }
     }
-    backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
-                 NULL, backup->compress, backup->on_source_error,
-                 backup->on_target_error, NULL, bs, txn, &local_err);
+    job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
+                            backup->sync, NULL, backup->compress,
+                            backup->on_source_error, backup->on_target_error,
+                            NULL, bs, txn, &local_err);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
     }
 out:
     aio_context_release(aio_context);
+    return job;
 }
 
 void qmp_blockdev_backup(BlockdevBackup *arg, Error **errp)
 {
-    do_blockdev_backup(arg, NULL, errp);
+    BlockJob *job;
+    job = do_blockdev_backup(arg, NULL, errp);
+    if (job) {
+        block_job_start(job);
+    }
 }
 
 /* Parameter check and block job starting for drive mirroring.
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 686f6a8..738e4b4 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -737,7 +737,7 @@  void mirror_start(const char *job_id, BlockDriverState *bs,
                   void *opaque, Error **errp);
 
 /*
- * backup_start:
+ * backup_job_create:
  * @job_id: The id of the newly-created job, or %NULL to use the
  * device name of @bs.
  * @bs: Block device to operate on.
@@ -751,17 +751,18 @@  void mirror_start(const char *job_id, BlockDriverState *bs,
  * @opaque: Opaque pointer value passed to @cb.
  * @txn: Transaction that this job is part of (may be NULL).
  *
- * Start a backup operation on @bs.  Clusters in @bs are written to @target
+ * Create a backup operation on @bs.  Clusters in @bs are written to @target
  * until the job is cancelled or manually completed.
  */
-void backup_start(const char *job_id, BlockDriverState *bs,
-                  BlockDriverState *target, int64_t speed,
-                  MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
-                  bool compress,
-                  BlockdevOnError on_source_error,
-                  BlockdevOnError on_target_error,
-                  BlockCompletionFunc *cb, void *opaque,
-                  BlockJobTxn *txn, Error **errp);
+BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
+                            BlockDriverState *target, int64_t speed,
+                            MirrorSyncMode sync_mode,
+                            BdrvDirtyBitmap *sync_bitmap,
+                            bool compress,
+                            BlockdevOnError on_source_error,
+                            BlockdevOnError on_target_error,
+                            BlockCompletionFunc *cb, void *opaque,
+                            BlockJobTxn *txn, Error **errp);
 
 /**
  * block_job_start: