diff mbox series

[for-6.1?,4/6] job: Add job_cancel_requested()

Message ID 20210722122627.29605-5-mreitz@redhat.com (mailing list archive)
State New, archived
Headers show
Series mirror: Handle errors after READY cancel | expand

Commit Message

Max Reitz July 22, 2021, 12:26 p.m. UTC
Most callers of job_is_cancelled() actually want to know whether the job
is on its way to immediate termination.  For example, we refuse to pause
jobs that are cancelled; but this only makes sense for jobs that are
really actually cancelled.

A mirror job that is cancelled during READY with force=false should
absolutely be allowed to pause.  This "cancellation" (which is actually
a kind of completion) may take an indefinite amount of time, and so
should behave like any job during normal operation.  For example, with
on-target-error=stop, the job should stop on write errors.  (In
contrast, force-cancelled jobs should not get write errors, as they
should just terminate and not do further I/O.)

Therefore, redefine job_is_cancelled() to only return true for jobs that
are force-cancelled (which as of HEAD^ means any job that interprets the
cancellation request as a request for immediate termination), and add
job_cancel_request() as the general variant, which returns true for any
jobs which have been requested to be cancelled, whether it be
immediately or after an arbitrarily long completion phase.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/qemu/job.h |  8 +++++++-
 block/mirror.c     |  9 ++++-----
 job.c              | 13 +++++++++----
 3 files changed, 20 insertions(+), 10 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy July 22, 2021, 5:58 p.m. UTC | #1
22.07.2021 15:26, Max Reitz wrote:
> Most callers of job_is_cancelled() actually want to know whether the job
> is on its way to immediate termination.  For example, we refuse to pause
> jobs that are cancelled; but this only makes sense for jobs that are
> really actually cancelled.
> 
> A mirror job that is cancelled during READY with force=false should
> absolutely be allowed to pause.  This "cancellation" (which is actually
> a kind of completion) 

You have to repeat that this "cancel" is not "cancel".

So, the whole problem is that feature of mirror, on cancel in READY state do not cancel but do some specific kind of completion.

You try to make this thing correctly handled on generic layer..

Did you consider instead just drop the feature from generic layer? So that all *cancel* functions always do force-cancel. Then the internal implementation become a lot clearer.

But we have to support the qmp block-job-cancel of READY mirror (and commit) with force=false.

We can do it as an exclusion in qmp_block_job_cancel, something like:

if (job is mirror or commit AND it's ready AND force = false)
    mirror_soft_cancel(...);
else
    job_cancel(...);


> may take an indefinite amount of time, and so
> should behave like any job during normal operation.  For example, with
> on-target-error=stop, the job should stop on write errors.  (In
> contrast, force-cancelled jobs should not get write errors, as they
> should just terminate and not do further I/O.)
> 
> Therefore, redefine job_is_cancelled() to only return true for jobs that
> are force-cancelled (which as of HEAD^ means any job that interprets the
> cancellation request as a request for immediate termination), and add
> job_cancel_request() as the general variant, which returns true for any
> jobs which have been requested to be cancelled, whether it be
> immediately or after an arbitrarily long completion phase.
> 
> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---

[..]

> --- a/job.c
> +++ b/job.c
> @@ -216,6 +216,11 @@ const char *job_type_str(const Job *job)
>   }
>   
>   bool job_is_cancelled(Job *job)
> +{
> +    return job->cancelled && job->force_cancel;
> +}
> +
> +bool job_cancel_requested(Job *job)
>   {
>       return job->cancelled;
>   }
> @@ -650,7 +655,7 @@ static void job_conclude(Job *job)
>   
>   static void job_update_rc(Job *job)
>   {
> -    if (!job->ret && job_is_cancelled(job)) {
> +    if (!job->ret && job_cancel_requested(job)) {

Why not job_is_cancelled() here?

So in case of mirror other kind of completion we set ret to -ECANCELED?

>           job->ret = -ECANCELED;
>       }
>       if (job->ret) {
> @@ -704,7 +709,7 @@ static int job_finalize_single(Job *job)
>   
>       /* Emit events only if we actually started */
>       if (job_started(job)) {
> -        if (job_is_cancelled(job)) {
> +        if (job_cancel_requested(job)) {
>               job_event_cancelled(job);

Same question here.. Shouldn't mirror report COMPLETED event in case of not-force cancelled in READY state?

>           } else {
>               job_event_completed(job);
> @@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp)
>       if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
>           return;
>       }
> -    if (job_is_cancelled(job) || !job->driver->complete) {
> +    if (job_cancel_requested(job) || !job->driver->complete) {
>           error_setg(errp, "The active block job '%s' cannot be completed",
>                      job->id);
>           return;
> @@ -1043,7 +1048,7 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
>       AIO_WAIT_WHILE(job->aio_context,
>                      (job_enter(job), !job_is_completed(job)));
>   
> -    ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
> +    ret = (job_cancel_requested(job) && job->ret == 0) ? -ECANCELED : job->ret;
>       job_unref(job);
>       return ret;
>   }
>
Max Reitz July 26, 2021, 7:09 a.m. UTC | #2
On 22.07.21 19:58, Vladimir Sementsov-Ogievskiy wrote:
> 22.07.2021 15:26, Max Reitz wrote:
>> Most callers of job_is_cancelled() actually want to know whether the job
>> is on its way to immediate termination.  For example, we refuse to pause
>> jobs that are cancelled; but this only makes sense for jobs that are
>> really actually cancelled.
>>
>> A mirror job that is cancelled during READY with force=false should
>> absolutely be allowed to pause.  This "cancellation" (which is actually
>> a kind of completion) 
>
> You have to repeat that this "cancel" is not "cancel".
>
> So, the whole problem is that feature of mirror, on cancel in READY 
> state do not cancel but do some specific kind of completion.
>
> You try to make this thing correctly handled on generic layer..
>
> Did you consider instead just drop the feature from generic layer? So 
> that all *cancel* functions always do force-cancel. Then the internal 
> implementation become a lot clearer.

Yes, I considered that, and I’ve decided against it (for now), because 
such a change would obviously be an incompatible change.  It would 
require a deprecation period, and so we would need to fix this bug now 
anyway.

> But we have to support the qmp block-job-cancel of READY mirror (and 
> commit) with force=false.
>
> We can do it as an exclusion in qmp_block_job_cancel, something like:
>
> if (job is mirror or commit AND it's ready AND force = false)
>    mirror_soft_cancel(...);
> else
>    job_cancel(...);

I didn’t consider such a hack, though.  I don’t like it.  If we think 
that we should change our approach because mirror’s soft cancel is 
actually a completion mode, and the current situation is too confusing, 
such a change should be user-visible, too.  (I think there was this idea 
of having job-specific flags or parameters you could change at runtime, 
and so you’d just change the “pivot” parameter between true or false.)

Also, I don’t know whether this would really make anything “a lot” 
easier.  After this series job_is_cancelled() already tells the true 
story, so all we could really change is to drop force_cancel and unify 
the “s->should_complete || job_cancel_requested()” conditions in 
block/mirror.c into one variable.  So when I considered making cancel 
exclusively force-cancel jobs, I thought it wouldn’t actually be worth 
it in practice.

>> may take an indefinite amount of time, and so
>> should behave like any job during normal operation.  For example, with
>> on-target-error=stop, the job should stop on write errors.  (In
>> contrast, force-cancelled jobs should not get write errors, as they
>> should just terminate and not do further I/O.)
>>
>> Therefore, redefine job_is_cancelled() to only return true for jobs that
>> are force-cancelled (which as of HEAD^ means any job that interprets the
>> cancellation request as a request for immediate termination), and add
>> job_cancel_request() as the general variant, which returns true for any
>> jobs which have been requested to be cancelled, whether it be
>> immediately or after an arbitrarily long completion phase.
>>
>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>
> [..]
>
>> --- a/job.c
>> +++ b/job.c
>> @@ -216,6 +216,11 @@ const char *job_type_str(const Job *job)
>>   }
>>     bool job_is_cancelled(Job *job)
>> +{
>> +    return job->cancelled && job->force_cancel;
>> +}
>> +
>> +bool job_cancel_requested(Job *job)
>>   {
>>       return job->cancelled;
>>   }
>> @@ -650,7 +655,7 @@ static void job_conclude(Job *job)
>>     static void job_update_rc(Job *job)
>>   {
>> -    if (!job->ret && job_is_cancelled(job)) {
>> +    if (!job->ret && job_cancel_requested(job)) {
>
> Why not job_is_cancelled() here?
>
> So in case of mirror other kind of completion we set ret to -ECANCELED?

I thought the return value is a user-visible thing, so I left it as-is.

Seems I was wrong, more below.

>>           job->ret = -ECANCELED;
>>       }
>>       if (job->ret) {
>> @@ -704,7 +709,7 @@ static int job_finalize_single(Job *job)
>>         /* Emit events only if we actually started */
>>       if (job_started(job)) {
>> -        if (job_is_cancelled(job)) {
>> +        if (job_cancel_requested(job)) {
>>               job_event_cancelled(job);
>
> Same question here.. Shouldn't mirror report COMPLETED event in case 
> of not-force cancelled in READY state?

Same here, I thought this is user-visible, nothing internal, so I should 
leave it as-is.

Now I see that cancelling mirror post-READY indeed should result in a 
COMPLETED event.  So I’m actually not exactly sure how mirror does that, 
despite this code here (which functionally isn’t changed by this patch), 
but it’s absolutely true that job_is_cancelled() would be more 
appropriate here.

(No iotest failed, so I thought this change was right.  Well.)

>>           } else {
>>               job_event_completed(job);
>> @@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp)
>>       if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
>>           return;
>>       }
>> -    if (job_is_cancelled(job) || !job->driver->complete) {
>> +    if (job_cancel_requested(job) || !job->driver->complete) {
>>           error_setg(errp, "The active block job '%s' cannot be 
>> completed",
>>                      job->id);
>>           return;
>> @@ -1043,7 +1048,7 @@ int job_finish_sync(Job *job, void 
>> (*finish)(Job *, Error **errp), Error **errp)
>>       AIO_WAIT_WHILE(job->aio_context,
>>                      (job_enter(job), !job_is_completed(job)));
>>   -    ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : 
>> job->ret;
>> +    ret = (job_cancel_requested(job) && job->ret == 0) ? -ECANCELED 
>> : job->ret;

So here it should probably stay a job_is_cancelled(), too.

Max

>>       job_unref(job);
>>       return ret;
>>   }
>>
>
>
Vladimir Sementsov-Ogievskiy July 27, 2021, 2:47 p.m. UTC | #3
26.07.2021 10:09, Max Reitz wrote:
> 
>>>           job->ret = -ECANCELED;
>>>       }
>>>       if (job->ret) {
>>> @@ -704,7 +709,7 @@ static int job_finalize_single(Job *job)
>>>         /* Emit events only if we actually started */
>>>       if (job_started(job)) {
>>> -        if (job_is_cancelled(job)) {
>>> +        if (job_cancel_requested(job)) {
>>>               job_event_cancelled(job);
>>
>> Same question here.. Shouldn't mirror report COMPLETED event in case of not-force cancelled in READY state?
> 
> Same here, I thought this is user-visible, nothing internal, so I should leave it as-is.
> 
> Now I see that cancelling mirror post-READY indeed should result in a COMPLETED event.  So I’m actually not exactly sure how mirror does that, despite this code here


Hmm. Now looking at mirror code, I see that it does "s->common.job.cancelled = false"
Max Reitz July 27, 2021, 3:30 p.m. UTC | #4
On 27.07.21 16:47, Vladimir Sementsov-Ogievskiy wrote:
> 26.07.2021 10:09, Max Reitz wrote:
>>
>>>>           job->ret = -ECANCELED;
>>>>       }
>>>>       if (job->ret) {
>>>> @@ -704,7 +709,7 @@ static int job_finalize_single(Job *job)
>>>>         /* Emit events only if we actually started */
>>>>       if (job_started(job)) {
>>>> -        if (job_is_cancelled(job)) {
>>>> +        if (job_cancel_requested(job)) {
>>>>               job_event_cancelled(job);
>>>
>>> Same question here.. Shouldn't mirror report COMPLETED event in case 
>>> of not-force cancelled in READY state?
>>
>> Same here, I thought this is user-visible, nothing internal, so I 
>> should leave it as-is.
>>
>> Now I see that cancelling mirror post-READY indeed should result in a 
>> COMPLETED event.  So I’m actually not exactly sure how mirror does 
>> that, despite this code here
>
>
> Hmm. Now looking at mirror code, I see that it does 
> "s->common.job.cancelled = false"

*lol*, that’s nice.

So since we’ve missed the rc1 boat now, I think this is 6.2 material.  
I’ll look into whether we can drop that then, that would be nice.

Max
diff mbox series

Patch

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 8aa90f7395..032edf3c5f 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -436,9 +436,15 @@  const char *job_type_str(const Job *job);
 /** Returns true if the job should not be visible to the management layer. */
 bool job_is_internal(Job *job);
 
-/** Returns whether the job is scheduled for cancellation. */
+/** Returns whether the job is being cancelled. */
 bool job_is_cancelled(Job *job);
 
+/**
+ * Returns whether the job is scheduled for cancellation (at an
+ * indefinite point).
+ */
+bool job_cancel_requested(Job *job);
+
 /** Returns whether the job is in a completed state. */
 bool job_is_completed(Job *job);
 
diff --git a/block/mirror.c b/block/mirror.c
index c3514f4196..291d2ed040 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -938,7 +938,7 @@  static int coroutine_fn mirror_run(Job *job, Error **errp)
         job_transition_to_ready(&s->common.job);
         s->synced = true;
         s->actively_synced = true;
-        while (!job_is_cancelled(&s->common.job) && !s->should_complete) {
+        while (!job_cancel_requested(&s->common.job) && !s->should_complete) {
             job_yield(&s->common.job);
         }
         s->common.job.cancelled = false;
@@ -1046,7 +1046,7 @@  static int coroutine_fn mirror_run(Job *job, Error **errp)
             }
 
             should_complete = s->should_complete ||
-                job_is_cancelled(&s->common.job);
+                job_cancel_requested(&s->common.job);
             cnt = bdrv_get_dirty_count(s->dirty_bitmap);
         }
 
@@ -1089,7 +1089,7 @@  static int coroutine_fn mirror_run(Job *job, Error **errp)
         }
         trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
         job_sleep_ns(&s->common.job, delay_ns);
-        if (job_is_cancelled(&s->common.job) && s->common.job.force_cancel) {
+        if (job_is_cancelled(&s->common.job)) {
             break;
         }
         s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -1101,8 +1101,7 @@  immediate_exit:
          * or it was cancelled prematurely so that we do not guarantee that
          * the target is a copy of the source.
          */
-        assert(ret < 0 || (s->common.job.force_cancel &&
-               job_is_cancelled(&s->common.job)));
+        assert(ret < 0 || job_is_cancelled(&s->common.job));
         assert(need_drain);
         mirror_wait_for_all_io(s);
     }
diff --git a/job.c b/job.c
index e78d893a9c..c51c8077cb 100644
--- a/job.c
+++ b/job.c
@@ -216,6 +216,11 @@  const char *job_type_str(const Job *job)
 }
 
 bool job_is_cancelled(Job *job)
+{
+    return job->cancelled && job->force_cancel;
+}
+
+bool job_cancel_requested(Job *job)
 {
     return job->cancelled;
 }
@@ -650,7 +655,7 @@  static void job_conclude(Job *job)
 
 static void job_update_rc(Job *job)
 {
-    if (!job->ret && job_is_cancelled(job)) {
+    if (!job->ret && job_cancel_requested(job)) {
         job->ret = -ECANCELED;
     }
     if (job->ret) {
@@ -704,7 +709,7 @@  static int job_finalize_single(Job *job)
 
     /* Emit events only if we actually started */
     if (job_started(job)) {
-        if (job_is_cancelled(job)) {
+        if (job_cancel_requested(job)) {
             job_event_cancelled(job);
         } else {
             job_event_completed(job);
@@ -1015,7 +1020,7 @@  void job_complete(Job *job, Error **errp)
     if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
         return;
     }
-    if (job_is_cancelled(job) || !job->driver->complete) {
+    if (job_cancel_requested(job) || !job->driver->complete) {
         error_setg(errp, "The active block job '%s' cannot be completed",
                    job->id);
         return;
@@ -1043,7 +1048,7 @@  int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
     AIO_WAIT_WHILE(job->aio_context,
                    (job_enter(job), !job_is_completed(job)));
 
-    ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
+    ret = (job_cancel_requested(job) && job->ret == 0) ? -ECANCELED : job->ret;
     job_unref(job);
     return ret;
 }