Message ID | 20210722122627.29605-5-mreitz@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mirror: Handle errors after READY cancel | expand |
22.07.2021 15:26, Max Reitz wrote: > Most callers of job_is_cancelled() actually want to know whether the job > is on its way to immediate termination. For example, we refuse to pause > jobs that are cancelled; but this only makes sense for jobs that are > really actually cancelled. > > A mirror job that is cancelled during READY with force=false should > absolutely be allowed to pause. This "cancellation" (which is actually > a kind of completion) You have to repeat that this "cancel" is not "cancel". So, the whole problem is that feature of mirror, on cancel in READY state do not cancel but do some specific kind of completion. You try to make this thing correctly handled on generic layer.. Did you consider instead just drop the feature from generic layer? So that all *cancel* functions always do force-cancel. Then the internal implementation become a lot clearer. But we have to support the qmp block-job-cancel of READY mirror (and commit) with force=false. We can do it as an exclusion in qmp_block_job_cancel, something like: if (job is mirror or commit AND it's ready AND force = false) mirror_soft_cancel(...); else job_cancel(...); > may take an indefinite amount of time, and so > should behave like any job during normal operation. For example, with > on-target-error=stop, the job should stop on write errors. (In > contrast, force-cancelled jobs should not get write errors, as they > should just terminate and not do further I/O.) > > Therefore, redefine job_is_cancelled() to only return true for jobs that > are force-cancelled (which as of HEAD^ means any job that interprets the > cancellation request as a request for immediate termination), and add > job_cancel_request() as the general variant, which returns true for any > jobs which have been requested to be cancelled, whether it be > immediately or after an arbitrarily long completion phase. > > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462 > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- [..] > --- a/job.c > +++ b/job.c > @@ -216,6 +216,11 @@ const char *job_type_str(const Job *job) > } > > bool job_is_cancelled(Job *job) > +{ > + return job->cancelled && job->force_cancel; > +} > + > +bool job_cancel_requested(Job *job) > { > return job->cancelled; > } > @@ -650,7 +655,7 @@ static void job_conclude(Job *job) > > static void job_update_rc(Job *job) > { > - if (!job->ret && job_is_cancelled(job)) { > + if (!job->ret && job_cancel_requested(job)) { Why not job_is_cancelled() here? So in case of mirror other kind of completion we set ret to -ECANCELED? > job->ret = -ECANCELED; > } > if (job->ret) { > @@ -704,7 +709,7 @@ static int job_finalize_single(Job *job) > > /* Emit events only if we actually started */ > if (job_started(job)) { > - if (job_is_cancelled(job)) { > + if (job_cancel_requested(job)) { > job_event_cancelled(job); Same question here.. Shouldn't mirror report COMPLETED event in case of not-force cancelled in READY state? > } else { > job_event_completed(job); > @@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp) > if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) { > return; > } > - if (job_is_cancelled(job) || !job->driver->complete) { > + if (job_cancel_requested(job) || !job->driver->complete) { > error_setg(errp, "The active block job '%s' cannot be completed", > job->id); > return; > @@ -1043,7 +1048,7 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp) > AIO_WAIT_WHILE(job->aio_context, > (job_enter(job), !job_is_completed(job))); > > - ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret; > + ret = (job_cancel_requested(job) && job->ret == 0) ? -ECANCELED : job->ret; > job_unref(job); > return ret; > } >
On 22.07.21 19:58, Vladimir Sementsov-Ogievskiy wrote: > 22.07.2021 15:26, Max Reitz wrote: >> Most callers of job_is_cancelled() actually want to know whether the job >> is on its way to immediate termination. For example, we refuse to pause >> jobs that are cancelled; but this only makes sense for jobs that are >> really actually cancelled. >> >> A mirror job that is cancelled during READY with force=false should >> absolutely be allowed to pause. This "cancellation" (which is actually >> a kind of completion) > > You have to repeat that this "cancel" is not "cancel". > > So, the whole problem is that feature of mirror, on cancel in READY > state do not cancel but do some specific kind of completion. > > You try to make this thing correctly handled on generic layer.. > > Did you consider instead just drop the feature from generic layer? So > that all *cancel* functions always do force-cancel. Then the internal > implementation become a lot clearer. Yes, I considered that, and I’ve decided against it (for now), because such a change would obviously be an incompatible change. It would require a deprecation period, and so we would need to fix this bug now anyway. > But we have to support the qmp block-job-cancel of READY mirror (and > commit) with force=false. > > We can do it as an exclusion in qmp_block_job_cancel, something like: > > if (job is mirror or commit AND it's ready AND force = false) > mirror_soft_cancel(...); > else > job_cancel(...); I didn’t consider such a hack, though. I don’t like it. If we think that we should change our approach because mirror’s soft cancel is actually a completion mode, and the current situation is too confusing, such a change should be user-visible, too. (I think there was this idea of having job-specific flags or parameters you could change at runtime, and so you’d just change the “pivot” parameter between true or false.) Also, I don’t know whether this would really make anything “a lot” easier. After this series job_is_cancelled() already tells the true story, so all we could really change is to drop force_cancel and unify the “s->should_complete || job_cancel_requested()” conditions in block/mirror.c into one variable. So when I considered making cancel exclusively force-cancel jobs, I thought it wouldn’t actually be worth it in practice. >> may take an indefinite amount of time, and so >> should behave like any job during normal operation. For example, with >> on-target-error=stop, the job should stop on write errors. (In >> contrast, force-cancelled jobs should not get write errors, as they >> should just terminate and not do further I/O.) >> >> Therefore, redefine job_is_cancelled() to only return true for jobs that >> are force-cancelled (which as of HEAD^ means any job that interprets the >> cancellation request as a request for immediate termination), and add >> job_cancel_request() as the general variant, which returns true for any >> jobs which have been requested to be cancelled, whether it be >> immediately or after an arbitrarily long completion phase. >> >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462 >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> --- > > [..] > >> --- a/job.c >> +++ b/job.c >> @@ -216,6 +216,11 @@ const char *job_type_str(const Job *job) >> } >> bool job_is_cancelled(Job *job) >> +{ >> + return job->cancelled && job->force_cancel; >> +} >> + >> +bool job_cancel_requested(Job *job) >> { >> return job->cancelled; >> } >> @@ -650,7 +655,7 @@ static void job_conclude(Job *job) >> static void job_update_rc(Job *job) >> { >> - if (!job->ret && job_is_cancelled(job)) { >> + if (!job->ret && job_cancel_requested(job)) { > > Why not job_is_cancelled() here? > > So in case of mirror other kind of completion we set ret to -ECANCELED? I thought the return value is a user-visible thing, so I left it as-is. Seems I was wrong, more below. >> job->ret = -ECANCELED; >> } >> if (job->ret) { >> @@ -704,7 +709,7 @@ static int job_finalize_single(Job *job) >> /* Emit events only if we actually started */ >> if (job_started(job)) { >> - if (job_is_cancelled(job)) { >> + if (job_cancel_requested(job)) { >> job_event_cancelled(job); > > Same question here.. Shouldn't mirror report COMPLETED event in case > of not-force cancelled in READY state? Same here, I thought this is user-visible, nothing internal, so I should leave it as-is. Now I see that cancelling mirror post-READY indeed should result in a COMPLETED event. So I’m actually not exactly sure how mirror does that, despite this code here (which functionally isn’t changed by this patch), but it’s absolutely true that job_is_cancelled() would be more appropriate here. (No iotest failed, so I thought this change was right. Well.) >> } else { >> job_event_completed(job); >> @@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp) >> if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) { >> return; >> } >> - if (job_is_cancelled(job) || !job->driver->complete) { >> + if (job_cancel_requested(job) || !job->driver->complete) { >> error_setg(errp, "The active block job '%s' cannot be >> completed", >> job->id); >> return; >> @@ -1043,7 +1048,7 @@ int job_finish_sync(Job *job, void >> (*finish)(Job *, Error **errp), Error **errp) >> AIO_WAIT_WHILE(job->aio_context, >> (job_enter(job), !job_is_completed(job))); >> - ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : >> job->ret; >> + ret = (job_cancel_requested(job) && job->ret == 0) ? -ECANCELED >> : job->ret; So here it should probably stay a job_is_cancelled(), too. Max >> job_unref(job); >> return ret; >> } >> > >
26.07.2021 10:09, Max Reitz wrote: > >>> job->ret = -ECANCELED; >>> } >>> if (job->ret) { >>> @@ -704,7 +709,7 @@ static int job_finalize_single(Job *job) >>> /* Emit events only if we actually started */ >>> if (job_started(job)) { >>> - if (job_is_cancelled(job)) { >>> + if (job_cancel_requested(job)) { >>> job_event_cancelled(job); >> >> Same question here.. Shouldn't mirror report COMPLETED event in case of not-force cancelled in READY state? > > Same here, I thought this is user-visible, nothing internal, so I should leave it as-is. > > Now I see that cancelling mirror post-READY indeed should result in a COMPLETED event. So I’m actually not exactly sure how mirror does that, despite this code here Hmm. Now looking at mirror code, I see that it does "s->common.job.cancelled = false"
On 27.07.21 16:47, Vladimir Sementsov-Ogievskiy wrote: > 26.07.2021 10:09, Max Reitz wrote: >> >>>> job->ret = -ECANCELED; >>>> } >>>> if (job->ret) { >>>> @@ -704,7 +709,7 @@ static int job_finalize_single(Job *job) >>>> /* Emit events only if we actually started */ >>>> if (job_started(job)) { >>>> - if (job_is_cancelled(job)) { >>>> + if (job_cancel_requested(job)) { >>>> job_event_cancelled(job); >>> >>> Same question here.. Shouldn't mirror report COMPLETED event in case >>> of not-force cancelled in READY state? >> >> Same here, I thought this is user-visible, nothing internal, so I >> should leave it as-is. >> >> Now I see that cancelling mirror post-READY indeed should result in a >> COMPLETED event. So I’m actually not exactly sure how mirror does >> that, despite this code here > > > Hmm. Now looking at mirror code, I see that it does > "s->common.job.cancelled = false" *lol*, that’s nice. So since we’ve missed the rc1 boat now, I think this is 6.2 material. I’ll look into whether we can drop that then, that would be nice. Max
diff --git a/include/qemu/job.h b/include/qemu/job.h index 8aa90f7395..032edf3c5f 100644 --- a/include/qemu/job.h +++ b/include/qemu/job.h @@ -436,9 +436,15 @@ const char *job_type_str(const Job *job); /** Returns true if the job should not be visible to the management layer. */ bool job_is_internal(Job *job); -/** Returns whether the job is scheduled for cancellation. */ +/** Returns whether the job is being cancelled. */ bool job_is_cancelled(Job *job); +/** + * Returns whether the job is scheduled for cancellation (at an + * indefinite point). + */ +bool job_cancel_requested(Job *job); + /** Returns whether the job is in a completed state. */ bool job_is_completed(Job *job); diff --git a/block/mirror.c b/block/mirror.c index c3514f4196..291d2ed040 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -938,7 +938,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp) job_transition_to_ready(&s->common.job); s->synced = true; s->actively_synced = true; - while (!job_is_cancelled(&s->common.job) && !s->should_complete) { + while (!job_cancel_requested(&s->common.job) && !s->should_complete) { job_yield(&s->common.job); } s->common.job.cancelled = false; @@ -1046,7 +1046,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp) } should_complete = s->should_complete || - job_is_cancelled(&s->common.job); + job_cancel_requested(&s->common.job); cnt = bdrv_get_dirty_count(s->dirty_bitmap); } @@ -1089,7 +1089,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp) } trace_mirror_before_sleep(s, cnt, s->synced, delay_ns); job_sleep_ns(&s->common.job, delay_ns); - if (job_is_cancelled(&s->common.job) && s->common.job.force_cancel) { + if (job_is_cancelled(&s->common.job)) { break; } s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); @@ -1101,8 +1101,7 @@ immediate_exit: * or it was cancelled prematurely so that we do not guarantee that * the target is a copy of the source. */ - assert(ret < 0 || (s->common.job.force_cancel && - job_is_cancelled(&s->common.job))); + assert(ret < 0 || job_is_cancelled(&s->common.job)); assert(need_drain); mirror_wait_for_all_io(s); } diff --git a/job.c b/job.c index e78d893a9c..c51c8077cb 100644 --- a/job.c +++ b/job.c @@ -216,6 +216,11 @@ const char *job_type_str(const Job *job) } bool job_is_cancelled(Job *job) +{ + return job->cancelled && job->force_cancel; +} + +bool job_cancel_requested(Job *job) { return job->cancelled; } @@ -650,7 +655,7 @@ static void job_conclude(Job *job) static void job_update_rc(Job *job) { - if (!job->ret && job_is_cancelled(job)) { + if (!job->ret && job_cancel_requested(job)) { job->ret = -ECANCELED; } if (job->ret) { @@ -704,7 +709,7 @@ static int job_finalize_single(Job *job) /* Emit events only if we actually started */ if (job_started(job)) { - if (job_is_cancelled(job)) { + if (job_cancel_requested(job)) { job_event_cancelled(job); } else { job_event_completed(job); @@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp) if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) { return; } - if (job_is_cancelled(job) || !job->driver->complete) { + if (job_cancel_requested(job) || !job->driver->complete) { error_setg(errp, "The active block job '%s' cannot be completed", job->id); return; @@ -1043,7 +1048,7 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp) AIO_WAIT_WHILE(job->aio_context, (job_enter(job), !job_is_completed(job))); - ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret; + ret = (job_cancel_requested(job) && job->ret == 0) ? -ECANCELED : job->ret; job_unref(job); return ret; }
Most callers of job_is_cancelled() actually want to know whether the job is on its way to immediate termination. For example, we refuse to pause jobs that are cancelled; but this only makes sense for jobs that are really actually cancelled. A mirror job that is cancelled during READY with force=false should absolutely be allowed to pause. This "cancellation" (which is actually a kind of completion) may take an indefinite amount of time, and so should behave like any job during normal operation. For example, with on-target-error=stop, the job should stop on write errors. (In contrast, force-cancelled jobs should not get write errors, as they should just terminate and not do further I/O.) Therefore, redefine job_is_cancelled() to only return true for jobs that are force-cancelled (which as of HEAD^ means any job that interprets the cancellation request as a request for immediate termination), and add job_cancel_request() as the general variant, which returns true for any jobs which have been requested to be cancelled, whether it be immediately or after an arbitrarily long completion phase. Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462 Signed-off-by: Max Reitz <mreitz@redhat.com> --- include/qemu/job.h | 8 +++++++- block/mirror.c | 9 ++++----- job.c | 13 +++++++++---- 3 files changed, 20 insertions(+), 10 deletions(-)