[for-6.1?,v2,6/7] mirror: Check job_is_cancelled() earlier

Message ID 20210726144613.954844-7-mreitz@redhat.com (mailing list archive)
State New, archived
Series mirror: Handle errors after READY cancel

Commit Message

Max Reitz July 26, 2021, 2:46 p.m. UTC
We must check whether the job is force-cancelled early in our main loop,
most importantly before any `continue` statement.  For example, we used
to have `continue`s before our current checking location that are
triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
failing, force-cancelling the job would not terminate it.

A job being force-cancelled should be treated the same as the job having
failed, so put the check in the same place where we check `s->ret < 0`.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)
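
The hang described in the commit message can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the code in block/mirror.c: `ToyJob`, `toy_flush()` and `toy_run()` are hypothetical stand-ins, and the flush is assumed to fail on every iteration. With the cancel check placed after the `continue`, the loop never sees the cancel request; with the check at the top, next to the `ret < 0` test, it exits on the iteration where the request arrives.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mirror job state (not QEMU's MirrorBlockJob). */
typedef struct {
    bool cancelled;      /* set once a force-cancel has been requested */
    int cancel_at_iter;  /* iteration at which the cancel request arrives */
    int ret;             /* negative once the job has failed */
} ToyJob;

/* Stand-in for a flush that keeps failing. */
static int toy_flush(void)
{
    return -1;
}

/*
 * Run at most max_iters iterations of a simplified main loop.
 * Returns the iteration at which the loop exited, or max_iters if it
 * never exited on its own.
 */
static int toy_run(ToyJob *job, bool check_early, int max_iters)
{
    assert(max_iters > 0);

    for (int i = 0; i < max_iters; i++) {
        if (i == job->cancel_at_iter) {
            job->cancelled = true;
        }

        /* New location: checked together with the error condition. */
        if (job->ret < 0 || (check_early && job->cancelled)) {
            return i;
        }

        if (toy_flush() < 0) {
            continue;   /* retry; everything below is skipped */
        }

        /* Old location: unreachable while the flush keeps failing. */
        if (!check_early && job->cancelled) {
            return i;
        }
    }
    return max_iters;
}
```

Calling `toy_run()` with `check_early` set to `false` runs all `max_iters` iterations even though the cancel request arrived early, which mirrors the behaviour the patch is meant to fix.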

Comments

Vladimir Sementsov-Ogievskiy July 27, 2021, 1:13 p.m. UTC | #1
26.07.2021 17:46, Max Reitz wrote:
> We must check whether the job is force-cancelled early in our main loop,
> most importantly before any `continue` statement.  For example, we used
> to have `continue`s before our current checking location that are
> triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
> failing, force-cancelling the job would not terminate it.
> 
> A job being force-cancelled should be treated the same as the job having
> failed, so put the check in the same place where we check `s->ret < 0`.
> 
> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/mirror.c | 7 +------
>   1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 72e02fa34e..46d1a1e5a2 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>               mirror_wait_for_any_operation(s, true);
>           }
>   
> -        if (s->ret < 0) {
> +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
>               ret = s->ret;
>               goto immediate_exit;
>           }
> @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>               break;
>           }
>   
> -        ret = 0;
> -

That's just a cleanup, that statement is useless pre-patch, yes?

>           if (job_is_ready(&s->common.job) && !should_complete) {
>               delay_ns = (s->in_flight == 0 &&
>                           cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>           trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>                                     delay_ns);
>           job_sleep_ns(&s->common.job, delay_ns);
> -        if (job_is_cancelled(&s->common.job)) {
> -            break;
> -        }
>           s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>       }
>   
> 

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Max Reitz July 27, 2021, 3:40 p.m. UTC | #2
On 27.07.21 15:13, Vladimir Sementsov-Ogievskiy wrote:
> 26.07.2021 17:46, Max Reitz wrote:
>> We must check whether the job is force-cancelled early in our main loop,
>> most importantly before any `continue` statement.  For example, we used
>> to have `continue`s before our current checking location that are
>> triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
>> failing, force-cancelling the job would not terminate it.
>>
>> A job being force-cancelled should be treated the same as the job having
>> failed, so put the check in the same place where we check `s->ret < 0`.
>>
>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/mirror.c | 7 +------
>>   1 file changed, 1 insertion(+), 6 deletions(-)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 72e02fa34e..46d1a1e5a2 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>               mirror_wait_for_any_operation(s, true);
>>           }
>>
>> -        if (s->ret < 0) {
>> +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
>>               ret = s->ret;
>>               goto immediate_exit;
>>           }
>> @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>               break;
>>           }
>>
>> -        ret = 0;
>> -
>
> That's just a cleanup, that statement is useless pre-patch, yes?

I think it was intended for the case where we left this loop via the
job_is_cancelled() condition below.  Since that condition is removed, this
statement seems meaningless, so I removed it along with the `break`.

Max

>
>>           if (job_is_ready(&s->common.job) && !should_complete) {
>>               delay_ns = (s->in_flight == 0 &&
>>                           cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
>> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>           trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>>                                     delay_ns);
>>           job_sleep_ns(&s->common.job, delay_ns);
>> -        if (job_is_cancelled(&s->common.job)) {
>> -            break;
>> -        }
>>           s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>       }
>>
>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>
Kevin Wolf Aug. 3, 2021, 2:34 p.m. UTC | #3
On 26.07.2021 at 16:46, Max Reitz wrote:
> We must check whether the job is force-cancelled early in our main loop,
> most importantly before any `continue` statement.  For example, we used
> to have `continue`s before our current checking location that are
> triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
> failing, force-cancelling the job would not terminate it.
> 
> A job being force-cancelled should be treated the same as the job having
> failed, so put the check in the same place where we check `s->ret < 0`.
> 
> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/mirror.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 72e02fa34e..46d1a1e5a2 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>              mirror_wait_for_any_operation(s, true);
>          }
>  
> -        if (s->ret < 0) {
> +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
>              ret = s->ret;
>              goto immediate_exit;
>          }
> @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>              break;
>          }
>  
> -        ret = 0;
> -
>          if (job_is_ready(&s->common.job) && !should_complete) {
>              delay_ns = (s->in_flight == 0 &&
>                          cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>          trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>                                    delay_ns);
>          job_sleep_ns(&s->common.job, delay_ns);
> -        if (job_is_cancelled(&s->common.job)) {
> -            break;
> -        }

I think it was intentional that the check is here because it means
skipping the job_sleep_ns() and instead cancelling immediately, and we
probably still want that. Between your check above and here, the
coroutine can yield, so cancellation could have been newly requested.

So have the check in both places, I guess? And a comment to explain why
neither is redundant.

>          s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>      }

Kevin
Max Reitz Aug. 4, 2021, 8:25 a.m. UTC | #4
On 03.08.21 16:34, Kevin Wolf wrote:
> On 26.07.2021 at 16:46, Max Reitz wrote:
>> We must check whether the job is force-cancelled early in our main loop,
>> most importantly before any `continue` statement.  For example, we used
>> to have `continue`s before our current checking location that are
>> triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
>> failing, force-cancelling the job would not terminate it.
>>
>> A job being force-cancelled should be treated the same as the job having
>> failed, so put the check in the same place where we check `s->ret < 0`.
>>
>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/mirror.c | 7 +------
>>   1 file changed, 1 insertion(+), 6 deletions(-)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 72e02fa34e..46d1a1e5a2 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>               mirror_wait_for_any_operation(s, true);
>>           }
>>   
>> -        if (s->ret < 0) {
>> +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
>>               ret = s->ret;
>>               goto immediate_exit;
>>           }
>> @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>               break;
>>           }
>>   
>> -        ret = 0;
>> -
>>           if (job_is_ready(&s->common.job) && !should_complete) {
>>               delay_ns = (s->in_flight == 0 &&
>>                           cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
>> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>           trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>>                                     delay_ns);
>>           job_sleep_ns(&s->common.job, delay_ns);
>> -        if (job_is_cancelled(&s->common.job)) {
>> -            break;
>> -        }
> I think it was intentional that the check is here because it means
> skipping the job_sleep_ns() and instead cancelling immediately, and we
> probably still want that. Between your check above and here, the
> coroutine can yield, so cancellation could have been newly requested.

I’m afraid I don’t quite understand.  If cancel is requested in 
job_sleep_ns(), then we will go back to the top of the loop, wait for 
in-flight active requests and then break.  Waiting for the in-flight 
requests seems unnecessary, but does it really make a difference in 
practice?  We don’t start new requests, so it should be legal to wait 
for existing ones to settle, and also I believe someone will have to 
wait for those in-flight requests anyway (when the mirror top node is 
removed).  (The only thing we could do is to cancel the in-flight 
requests, but that is what mirror_cancel() does.)

Looking more at the whole loop, there are a couple of places that can 
yield.  Of course we can check whether the job has been cancelled after 
every single one of them, but that would be a bit strange.  We only 
really need to check before we initiate new requests or want to change 
the state.  I believe the right place to do the check would be after the 
job_pause_point().

And perhaps the active write functions (bdrv_mirror_top_do_write() and 
bdrv_mirror_top_pwritev()) should stop copying to the target if the job 
has been cancelled.

Max

> So have the check in both places, I guess? And a comment to explain why
> neither is redundant.
>
>>           s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>       }
> Kevin
>
Kevin Wolf Aug. 4, 2021, 9:48 a.m. UTC | #5
On 04.08.2021 at 10:25, Max Reitz wrote:
> On 03.08.21 16:34, Kevin Wolf wrote:
> > On 26.07.2021 at 16:46, Max Reitz wrote:
> > > We must check whether the job is force-cancelled early in our main loop,
> > > most importantly before any `continue` statement.  For example, we used
> > > to have `continue`s before our current checking location that are
> > > triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
> > > failing, force-cancelling the job would not terminate it.
> > > 
> > > A job being force-cancelled should be treated the same as the job having
> > > failed, so put the check in the same place where we check `s->ret < 0`.
> > > 
> > > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
> > > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > > ---
> > >   block/mirror.c | 7 +------
> > >   1 file changed, 1 insertion(+), 6 deletions(-)
> > > 
> > > diff --git a/block/mirror.c b/block/mirror.c
> > > index 72e02fa34e..46d1a1e5a2 100644
> > > --- a/block/mirror.c
> > > +++ b/block/mirror.c
> > > @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
> > >               mirror_wait_for_any_operation(s, true);
> > >           }
> > > -        if (s->ret < 0) {
> > > +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
> > >               ret = s->ret;
> > >               goto immediate_exit;
> > >           }
> > > @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
> > >               break;
> > >           }
> > > -        ret = 0;
> > > -
> > >           if (job_is_ready(&s->common.job) && !should_complete) {
> > >               delay_ns = (s->in_flight == 0 &&
> > >                           cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
> > > @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
> > >           trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
> > >                                     delay_ns);
> > >           job_sleep_ns(&s->common.job, delay_ns);
> > > -        if (job_is_cancelled(&s->common.job)) {
> > > -            break;
> > > -        }
> > I think it was intentional that the check is here because it means
> > skipping the job_sleep_ns() and instead cancelling immediately, and we
> > probably still want that. Between your check above and here, the
> > coroutine can yield, so cancellation could have been newly requested.
> 
> I’m afraid I don’t quite understand.

Hm, I don't either. Somehow I thought job_sleep_ns() was after the
check, while quoting the exact hunk that shows that it comes before
it...

I'm still not sure if sleeping before exiting is really useful, but it
seems we never cared about that.

Kevin
Max Reitz Aug. 4, 2021, 10:12 a.m. UTC | #6
On 04.08.21 11:48, Kevin Wolf wrote:
> On 04.08.2021 at 10:25, Max Reitz wrote:
>> On 03.08.21 16:34, Kevin Wolf wrote:
>>> On 26.07.2021 at 16:46, Max Reitz wrote:
>>>> We must check whether the job is force-cancelled early in our main loop,
>>>> most importantly before any `continue` statement.  For example, we used
>>>> to have `continue`s before our current checking location that are
>>>> triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
>>>> failing, force-cancelling the job would not terminate it.
>>>>
>>>> A job being force-cancelled should be treated the same as the job having
>>>> failed, so put the check in the same place where we check `s->ret < 0`.
>>>>
>>>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>    block/mirror.c | 7 +------
>>>>    1 file changed, 1 insertion(+), 6 deletions(-)
>>>>
>>>> diff --git a/block/mirror.c b/block/mirror.c
>>>> index 72e02fa34e..46d1a1e5a2 100644
>>>> --- a/block/mirror.c
>>>> +++ b/block/mirror.c
>>>> @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>>                mirror_wait_for_any_operation(s, true);
>>>>            }
>>>> -        if (s->ret < 0) {
>>>> +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
>>>>                ret = s->ret;
>>>>                goto immediate_exit;
>>>>            }
>>>> @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>>                break;
>>>>            }
>>>> -        ret = 0;
>>>> -
>>>>            if (job_is_ready(&s->common.job) && !should_complete) {
>>>>                delay_ns = (s->in_flight == 0 &&
>>>>                            cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
>>>> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>>>            trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>>>>                                      delay_ns);
>>>>            job_sleep_ns(&s->common.job, delay_ns);
>>>> -        if (job_is_cancelled(&s->common.job)) {
>>>> -            break;
>>>> -        }
>>> I think it was intentional that the check is here because it means
>>> skipping the job_sleep_ns() and instead cancelling immediately, and we
>>> probably still want that. Between your check above and here, the
>>> coroutine can yield, so cancellation could have been newly requested.
>> I’m afraid I don’t quite understand.
> Hm, I don't either. Somehow I thought job_sleep_ns() was after the
> check, while quoting the exact hunk that shows that it comes before
> it...
>
> I'm still not sure if sleeping before exiting is really useful, but it
> seems we never cared about that.

Jobs that are (force-)cancelled cannot yield or sleep anyway 
(job_sleep_ns(), job_yield(), and job_pause_point() will all return 
immediately when called on a cancelled job).

So I thought you meant that a job can only be cancelled while it is 
yielding, so we should prefer to put the is_cancelled check after a 
yield point (like job_pause_point()) than before it.

But I mean, if you’re happy, I’ll be happy, too. :)

Max
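
The point above, that sleeping on a cancelled job returns immediately, can be sketched as follows. This is a toy model under stated assumptions, not QEMU's job API: `ToySleepJob`, `toy_job_sleep_ns()` and `toy_slept_after()` are hypothetical names, and "sleeping" is modelled by accumulating the requested delay in a counter rather than by yielding a coroutine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy job; only tracks cancellation and total sleep time. */
typedef struct {
    bool cancelled;
    int64_t slept_ns;
} ToySleepJob;

/*
 * Toy counterpart of a job sleep helper: a cancelled job does not
 * sleep at all, so the caller gets control back right away.
 */
static void toy_job_sleep_ns(ToySleepJob *job, int64_t ns)
{
    assert(ns >= 0);

    if (job->cancelled) {
        return;             /* no delay for a cancelled job */
    }
    job->slept_ns += ns;    /* model the sleep by recording its length */
}

/* Convenience wrapper: how long did a fresh job sleep for one call? */
static int64_t toy_slept_after(bool cancelled, int64_t ns)
{
    ToySleepJob job = { .cancelled = cancelled, .slept_ns = 0 };

    toy_job_sleep_ns(&job, ns);
    return job.slept_ns;
}
```

Under this model, a loop that calls the sleep helper and then returns to a cancel check at the top of the loop loses no time once the job has been cancelled, so a second check directly after the sleep adds nothing.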
Patch

diff --git a/block/mirror.c b/block/mirror.c
index 72e02fa34e..46d1a1e5a2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -993,7 +993,7 @@  static int coroutine_fn mirror_run(Job *job, Error **errp)
             mirror_wait_for_any_operation(s, true);
         }
 
-        if (s->ret < 0) {
+        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
             ret = s->ret;
             goto immediate_exit;
         }
@@ -1078,8 +1078,6 @@  static int coroutine_fn mirror_run(Job *job, Error **errp)
             break;
         }
 
-        ret = 0;
-
         if (job_is_ready(&s->common.job) && !should_complete) {
             delay_ns = (s->in_flight == 0 &&
                         cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
@@ -1087,9 +1085,6 @@  static int coroutine_fn mirror_run(Job *job, Error **errp)
         trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
                                   delay_ns);
         job_sleep_ns(&s->common.job, delay_ns);
-        if (job_is_cancelled(&s->common.job)) {
-            break;
-        }
         s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     }