diff mbox

[RFC,v4,12/21] blockjobs: ensure abort is called for cancelled jobs

Message ID 20180223235142.21501-13-jsnow@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

John Snow Feb. 23, 2018, 11:51 p.m. UTC
Presently, even if a job is canceled post-completion as a result of
a failing peer in a transaction, it will still call .commit because
nothing has updated or changed its return code.

The reason why this does not cause problems currently is because
backup's implementation of .commit checks for cancellation itself.

I'd like to simplify this contract:

(1) Abort is called if the job/transaction fails
(2) Commit is called if the job/transaction succeeds

To this end: A job's return code, if 0, will be forcibly set as
-ECANCELED if that job has already concluded. Remove the now
redundant check in the backup job implementation.

We need to check for cancellation in both block_job_completed
AND block_job_completed_single, because jobs may be cancelled between
those two calls; for instance in transactions.

The check in block_job_completed could be removed, but there's no
point in starting to attempt to succeed a transaction that we know
in advance will fail.

This does NOT affect mirror jobs that are "canceled" during their
synchronous phase. The mirror job itself forcibly sets the canceled
property to false prior to ceding control, so such cases will invoke
the "commit" callback.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c     |  2 +-
 block/trace-events |  1 +
 blockjob.c         | 19 +++++++++++++++----
 3 files changed, 17 insertions(+), 5 deletions(-)

Comments

Eric Blake Feb. 27, 2018, 7:49 p.m. UTC | #1
On 02/23/2018 05:51 PM, John Snow wrote:
> Presently, even if a job is canceled post-completion as a result of
> a failing peer in a transaction, it will still call .commit because
> nothing has updated or changed its return code.
> 
> The reason why this does not cause problems currently is because
> backup's implementation of .commit checks for cancellation itself.
> 
> I'd like to simplify this contract:
> 
> (1) Abort is called if the job/transaction fails
> (2) Commit is called if the job/transaction succeeds
> 
> To this end: A job's return code, if 0, will be forcibly set as
> -ECANCELED if that job has already concluded. Remove the now
> redundant check in the backup job implementation.
> 
> We need to check for cancellation in both block_job_completed
> AND block_job_completed_single, because jobs may be cancelled between
> those two calls; for instance in transactions.
> 
> The check in block_job_completed could be removed, but there's no
> point in starting to attempt to succeed a transaction that we know
> in advance will fail.
> 
> This does NOT affect mirror jobs that are "canceled" during their
> synchronous phase. The mirror job itself forcibly sets the canceled
> property to false prior to ceding control, so such cases will invoke
> the "commit" callback.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   block/backup.c     |  2 +-
>   block/trace-events |  1 +
>   blockjob.c         | 19 +++++++++++++++----
>   3 files changed, 17 insertions(+), 5 deletions(-)

More lines of code, but the contract does seem simpler and useful for 
the later patches.

Reviewed-by: Eric Blake <eblake@redhat.com>
John Snow Feb. 27, 2018, 8:43 p.m. UTC | #2
On 02/27/2018 02:49 PM, Eric Blake wrote:
> On 02/23/2018 05:51 PM, John Snow wrote:
>> Presently, even if a job is canceled post-completion as a result of
>> a failing peer in a transaction, it will still call .commit because
>> nothing has updated or changed its return code.
>>
>> The reason why this does not cause problems currently is because
>> backup's implementation of .commit checks for cancellation itself.
>>
>> I'd like to simplify this contract:
>>
>> (1) Abort is called if the job/transaction fails
>> (2) Commit is called if the job/transaction succeeds
>>
>> To this end: A job's return code, if 0, will be forcibly set as
>> -ECANCELED if that job has already concluded. Remove the now
>> redundant check in the backup job implementation.
>>
>> We need to check for cancellation in both block_job_completed
>> AND block_job_completed_single, because jobs may be cancelled between
>> those two calls; for instance in transactions.
>>
>> The check in block_job_completed could be removed, but there's no
>> point in starting to attempt to succeed a transaction that we know
>> in advance will fail.
>>
>> This does NOT affect mirror jobs that are "canceled" during their
>> synchronous phase. The mirror job itself forcibly sets the canceled
>> property to false prior to ceding control, so such cases will invoke
>> the "commit" callback.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   block/backup.c     |  2 +-
>>   block/trace-events |  1 +
>>   blockjob.c         | 19 +++++++++++++++----
>>   3 files changed, 17 insertions(+), 5 deletions(-)
> 
> More lines of code, but the contract does seem simpler and useful for
> the later patches.
> 

Unfortunately yes, but it would be less overall if more than Backup made
use of commit/abort, which I anticipate in the future when I try to
start switching jobs over to using the .prepare callback.

It was just genuinely shocking to me that we'd call commit(), but backup
secretly knew we wanted abort. That type of logic belongs in the core
dispatch layer.

> Reviewed-by: Eric Blake <eblake@redhat.com>
> 

Thanks for the reviews!
Kevin Wolf Feb. 28, 2018, 4:05 p.m. UTC | #3
Am 24.02.2018 um 00:51 hat John Snow geschrieben:
> Presently, even if a job is canceled post-completion as a result of
> a failing peer in a transaction, it will still call .commit because
> nothing has updated or changed its return code.
> 
> The reason why this does not cause problems currently is because
> backup's implementation of .commit checks for cancellation itself.
> 
> I'd like to simplify this contract:
> 
> (1) Abort is called if the job/transaction fails
> (2) Commit is called if the job/transaction succeeds
> 
> To this end: A job's return code, if 0, will be forcibly set as
> -ECANCELED if that job has already concluded. Remove the now
> redundant check in the backup job implementation.

Thanks, another fix that addresses a comment I made on an earlier patch!

> We need to check for cancellation in both block_job_completed
> AND block_job_completed_single, because jobs may be cancelled between
> those two calls; for instance in transactions.
> 
> The check in block_job_completed could be removed, but there's no
> point in starting to attempt to succeed a transaction that we know
> in advance will fail.
> 
> This does NOT affect mirror jobs that are "canceled" during their
> synchronous phase. The mirror job itself forcibly sets the canceled
> property to false prior to ceding control, so such cases will invoke
> the "commit" callback.

Which is a sign that mirror is abusing the interface. But yes, we need
to keep it working.

> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/backup.c     |  2 +-
>  block/trace-events |  1 +
>  blockjob.c         | 19 +++++++++++++++----
>  3 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 7e254dabff..453cd62c24 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -206,7 +206,7 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>      BdrvDirtyBitmap *bm;
>      BlockDriverState *bs = blk_bs(job->common.blk);
>  
> -    if (ret < 0 || block_job_is_cancelled(&job->common)) {
> +    if (ret < 0) {
>          /* Merge the successor back into the parent, delete nothing. */
>          bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
>          assert(bm);
> diff --git a/block/trace-events b/block/trace-events
> index 266afd9e99..5e531e0310 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -5,6 +5,7 @@ bdrv_open_common(void *bs, const char *filename, int flags, const char *format_n
>  bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
>  
>  # blockjob.c
> +block_job_completed(void *job, int ret, int jret) "job %p ret %d corrected ret %d"
>  block_job_state_transition(void *job,  int ret, const char *legal, const char *s0, const char *s1) "job %p (ret: %d) attempting %s transition (%s-->%s)"
>  block_job_apply_verb(void *job, const char *state, const char *verb, const char *legal) "job %p in state %s; applying verb %s (%s)"
>  
> diff --git a/blockjob.c b/blockjob.c
> index 4d29391673..ef17dea004 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -384,13 +384,22 @@ void block_job_start(BlockJob *job)
>      bdrv_coroutine_enter(blk_bs(job->blk), job->co);
>  }
>  
> +static void block_job_update_rc(BlockJob *job)
> +{
> +    if (!job->ret && block_job_is_cancelled(job)) {
> +        job->ret = -ECANCELED;
> +    }
> +    if (job->ret) {
> +        block_job_state_transition(job, BLOCK_JOB_STATUS_ABORTING);
> +    }
> +}
> +
>  static void block_job_completed_single(BlockJob *job)
>  {
>      assert(job->completed);
>  
> -    if (job->ret || block_job_is_cancelled(job)) {
> -        block_job_state_transition(job, BLOCK_JOB_STATUS_ABORTING);
> -    }
> +    /* Ensure abort is called for late-transactional failures */
> +    block_job_update_rc(job);
>  
>      if (!job->ret) {
>          if (job->driver->commit) {
> @@ -898,7 +907,9 @@ void block_job_completed(BlockJob *job, int ret)
>      assert(blk_bs(job->blk)->job == job);
>      job->completed = true;
>      job->ret = ret;
> -    if (ret < 0 || block_job_is_cancelled(job)) {
> +    block_job_update_rc(job);

Hmm... We are transitioning to ABORTED here now. This means that in
block_job_completed_single() we get an ABORTED -> ABORTED transition.
A bit ugly, but not really a problem. If you decide to remove the
s0 == s1 special case in block_job_state_transition(), you'll need to
allow this transition in the table.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
diff mbox

Patch

diff --git a/block/backup.c b/block/backup.c
index 7e254dabff..453cd62c24 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -206,7 +206,7 @@  static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
     BdrvDirtyBitmap *bm;
     BlockDriverState *bs = blk_bs(job->common.blk);
 
-    if (ret < 0 || block_job_is_cancelled(&job->common)) {
+    if (ret < 0) {
         /* Merge the successor back into the parent, delete nothing. */
         bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
         assert(bm);
diff --git a/block/trace-events b/block/trace-events
index 266afd9e99..5e531e0310 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -5,6 +5,7 @@  bdrv_open_common(void *bs, const char *filename, int flags, const char *format_n
 bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
 
 # blockjob.c
+block_job_completed(void *job, int ret, int jret) "job %p ret %d corrected ret %d"
 block_job_state_transition(void *job,  int ret, const char *legal, const char *s0, const char *s1) "job %p (ret: %d) attempting %s transition (%s-->%s)"
 block_job_apply_verb(void *job, const char *state, const char *verb, const char *legal) "job %p in state %s; applying verb %s (%s)"
 
diff --git a/blockjob.c b/blockjob.c
index 4d29391673..ef17dea004 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -384,13 +384,22 @@  void block_job_start(BlockJob *job)
     bdrv_coroutine_enter(blk_bs(job->blk), job->co);
 }
 
+static void block_job_update_rc(BlockJob *job)
+{
+    if (!job->ret && block_job_is_cancelled(job)) {
+        job->ret = -ECANCELED;
+    }
+    if (job->ret) {
+        block_job_state_transition(job, BLOCK_JOB_STATUS_ABORTING);
+    }
+}
+
 static void block_job_completed_single(BlockJob *job)
 {
     assert(job->completed);
 
-    if (job->ret || block_job_is_cancelled(job)) {
-        block_job_state_transition(job, BLOCK_JOB_STATUS_ABORTING);
-    }
+    /* Ensure abort is called for late-transactional failures */
+    block_job_update_rc(job);
 
     if (!job->ret) {
         if (job->driver->commit) {
@@ -898,7 +907,9 @@  void block_job_completed(BlockJob *job, int ret)
     assert(blk_bs(job->blk)->job == job);
     job->completed = true;
     job->ret = ret;
-    if (ret < 0 || block_job_is_cancelled(job)) {
+    block_job_update_rc(job);
+    trace_block_job_completed(job, ret, job->ret);
+    if (job->ret) {
         block_job_completed_txn_abort(job);
     } else {
         block_job_completed_txn_success(job);