[v2,1/3] block/commit: implement final flush

Message ID	20240626145038.458709-2-vsementsov@yandex-team.ru (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org> From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> To: qemu-block@nongnu.org Cc: qemu-devel@nongnu.org, hreitz@redhat.com, kwolf@redhat.com, vsementsov@yandex-team.ru, jsnow@redhat.com, den@openvz.org, f.ebner@proxmox.com Subject: [PATCH v2 1/3] block/commit: implement final flush Date: Wed, 26 Jun 2024 17:50:36 +0300 Message-Id: <20240626145038.458709-2-vsementsov@yandex-team.ru> In-Reply-To: <20240626145038.458709-1-vsementsov@yandex-team.ru> References: <20240626145038.458709-1-vsementsov@yandex-team.ru> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=178.154.239.136; envelope-from=vsementsov@yandex-team.ru; helo=forwardcorp1b.mail.yandex.net X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action Precedence: list Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Series	block-jobs: add final flush \| expand [v2,0/3] block-jobs: add final flush [v2,1/3] block/commit: implement final flush [v2,2/3] block/stream: implement final flush [v2,3/3] block/backup: implement final flush

Message ID

20240626145038.458709-2-vsementsov@yandex-team.ru (mailing list archive)

State

New, archived

Headers

From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, hreitz@redhat.com, kwolf@redhat.com,
 vsementsov@yandex-team.ru, jsnow@redhat.com, den@openvz.org,
 f.ebner@proxmox.com
Subject: [PATCH v2 1/3] block/commit: implement final flush
Date: Wed, 26 Jun 2024 17:50:36 +0300
Message-Id: <20240626145038.458709-2-vsementsov@yandex-team.ru>
In-Reply-To: <20240626145038.458709-1-vsementsov@yandex-team.ru>
References: <20240626145038.458709-1-vsementsov@yandex-team.ru>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=178.154.239.136;
 envelope-from=vsementsov@yandex-team.ru; helo=forwardcorp1b.mail.yandex.net
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org

Series

block-jobs: add final flush | expand

Commit Message

Vladimir Sementsov-Ogievskiy June 26, 2024, 2:50 p.m. UTC

Actually block job is not completed without the final flush. It's
rather unexpected to have broken target when job was successfully
completed long ago and now we fail to flush or process just
crashed/killed.

Mirror job already has mirror_flush() for this. So, it's OK.

Do this for commit job too.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 block/commit.c | 41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

Comments

Kevin Wolf July 18, 2024, 7:22 p.m. UTC | #1

Am 26.06.2024 um 16:50 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Actually block job is not completed without the final flush. It's
> rather unexpected to have broken target when job was successfully
> completed long ago and now we fail to flush or process just
> crashed/killed.
> 
> Mirror job already has mirror_flush() for this. So, it's OK.
> 
> Do this for commit job too.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
>  block/commit.c | 41 +++++++++++++++++++++++++++--------------
>  1 file changed, 27 insertions(+), 14 deletions(-)
> 
> diff --git a/block/commit.c b/block/commit.c
> index 7c3fdcb0ca..81971692a2 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -134,6 +134,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
>      int64_t n = 0; /* bytes */
>      QEMU_AUTO_VFREE void *buf = NULL;
>      int64_t len, base_len;
> +    bool need_final_flush = true;
>  
>      len = blk_co_getlength(s->top);
>      if (len < 0) {
> @@ -155,8 +156,8 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
>  
>      buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);
>  
> -    for (offset = 0; offset < len; offset += n) {
> -        bool copy;
> +    for (offset = 0; offset < len || need_final_flush; offset += n) {

In general, the control flow would be nicer to read if the final flush
weren't integrated into the loop, but just added after it.

But I assume this is pretty much required for pausing the job during
error handling in the final flush if you don't want to duplicate a lot
of the logic into a second loop?

> +        bool copy = false;
>          bool error_in_source = true;
>  
>          /* Note that even when no rate limit is applied we need to yield
> @@ -166,22 +167,34 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
>          if (job_is_cancelled(&s->common.job)) {
>              break;
>          }
> -        /* Copy if allocated above the base */
> -        ret = blk_co_is_allocated_above(s->top, s->base_overlay, true,
> -                                        offset, COMMIT_BUFFER_SIZE, &n);
> -        copy = (ret > 0);
> -        trace_commit_one_iteration(s, offset, n, ret);
> -        if (copy) {
> -            assert(n < SIZE_MAX);
>  
> -            ret = blk_co_pread(s->top, offset, n, buf, 0);
> -            if (ret >= 0) {
> -                ret = blk_co_pwrite(s->base, offset, n, buf, 0);
> -                if (ret < 0) {
> -                    error_in_source = false;
> +        if (offset < len) {
> +            /* Copy if allocated above the base */
> +            ret = blk_co_is_allocated_above(s->top, s->base_overlay, true,
> +                                            offset, COMMIT_BUFFER_SIZE, &n);
> +            copy = (ret > 0);
> +            trace_commit_one_iteration(s, offset, n, ret);
> +            if (copy) {
> +                assert(n < SIZE_MAX);
> +
> +                ret = blk_co_pread(s->top, offset, n, buf, 0);
> +                if (ret >= 0) {
> +                    ret = blk_co_pwrite(s->base, offset, n, buf, 0);
> +                    if (ret < 0) {
> +                        error_in_source = false;
> +                    }
>                  }
>              }
> +        } else {
> +            assert(need_final_flush);
> +            ret = blk_co_flush(s->base);
> +            if (ret < 0) {
> +                error_in_source = false;
> +            } else {
> +                need_final_flush = false;
> +            }

Should we set n = 0 in this block to avoid counting the last chunk twice
for the progress?

>          }
> +
>          if (ret < 0) {
>              BlockErrorAction action =
>                  block_job_error_action(&s->common, s->on_error,

Kevin

Vladimir Sementsov-Ogievskiy July 19, 2024, 10:35 a.m. UTC | #2

On 18.07.24 22:22, Kevin Wolf wrote:
> Am 26.06.2024 um 16:50 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Actually block job is not completed without the final flush. It's
>> rather unexpected to have broken target when job was successfully
>> completed long ago and now we fail to flush or process just
>> crashed/killed.
>>
>> Mirror job already has mirror_flush() for this. So, it's OK.
>>
>> Do this for commit job too.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
>>   block/commit.c | 41 +++++++++++++++++++++++++++--------------
>>   1 file changed, 27 insertions(+), 14 deletions(-)
>>
>> diff --git a/block/commit.c b/block/commit.c
>> index 7c3fdcb0ca..81971692a2 100644
>> --- a/block/commit.c
>> +++ b/block/commit.c
>> @@ -134,6 +134,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
>>       int64_t n = 0; /* bytes */
>>       QEMU_AUTO_VFREE void *buf = NULL;
>>       int64_t len, base_len;
>> +    bool need_final_flush = true;
>>   
>>       len = blk_co_getlength(s->top);
>>       if (len < 0) {
>> @@ -155,8 +156,8 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
>>   
>>       buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);
>>   
>> -    for (offset = 0; offset < len; offset += n) {
>> -        bool copy;
>> +    for (offset = 0; offset < len || need_final_flush; offset += n) {
> 
> In general, the control flow would be nicer to read if the final flush
> weren't integrated into the loop, but just added after it.
> 
> But I assume this is pretty much required for pausing the job during
> error handling in the final flush if you don't want to duplicate a lot
> of the logic into a second loop?

Right, that's the reason.

> 
>> +        bool copy = false;
>>           bool error_in_source = true;
>>   
>>           /* Note that even when no rate limit is applied we need to yield
>> @@ -166,22 +167,34 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
>>           if (job_is_cancelled(&s->common.job)) {
>>               break;
>>           }
>> -        /* Copy if allocated above the base */
>> -        ret = blk_co_is_allocated_above(s->top, s->base_overlay, true,
>> -                                        offset, COMMIT_BUFFER_SIZE, &n);
>> -        copy = (ret > 0);
>> -        trace_commit_one_iteration(s, offset, n, ret);
>> -        if (copy) {
>> -            assert(n < SIZE_MAX);
>>   
>> -            ret = blk_co_pread(s->top, offset, n, buf, 0);
>> -            if (ret >= 0) {
>> -                ret = blk_co_pwrite(s->base, offset, n, buf, 0);
>> -                if (ret < 0) {
>> -                    error_in_source = false;
>> +        if (offset < len) {
>> +            /* Copy if allocated above the base */
>> +            ret = blk_co_is_allocated_above(s->top, s->base_overlay, true,
>> +                                            offset, COMMIT_BUFFER_SIZE, &n);
>> +            copy = (ret > 0);
>> +            trace_commit_one_iteration(s, offset, n, ret);
>> +            if (copy) {
>> +                assert(n < SIZE_MAX);
>> +
>> +                ret = blk_co_pread(s->top, offset, n, buf, 0);
>> +                if (ret >= 0) {
>> +                    ret = blk_co_pwrite(s->base, offset, n, buf, 0);
>> +                    if (ret < 0) {
>> +                        error_in_source = false;
>> +                    }
>>                   }
>>               }
>> +        } else {
>> +            assert(need_final_flush);
>> +            ret = blk_co_flush(s->base);
>> +            if (ret < 0) {
>> +                error_in_source = false;
>> +            } else {
>> +                need_final_flush = false;
>> +            }
> 
> Should we set n = 0 in this block to avoid counting the last chunk twice
> for the progress?

Oops, right. Will fix, thanks

> 
>>           }
>> +
>>           if (ret < 0) {
>>               BlockErrorAction action =
>>                   block_job_error_action(&s->common, s->on_error,
> 
> Kevin
>

Kevin Wolf July 29, 2024, 12:25 p.m. UTC | #3

Am 19.07.2024 um 12:35 hat Vladimir Sementsov-Ogievskiy geschrieben:
> On 18.07.24 22:22, Kevin Wolf wrote:
> > Am 26.06.2024 um 16:50 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > Actually block job is not completed without the final flush. It's
> > > rather unexpected to have broken target when job was successfully
> > > completed long ago and now we fail to flush or process just
> > > crashed/killed.
> > > 
> > > Mirror job already has mirror_flush() for this. So, it's OK.
> > > 
> > > Do this for commit job too.
> > > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > ---
> > >   block/commit.c | 41 +++++++++++++++++++++++++++--------------
> > >   1 file changed, 27 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/block/commit.c b/block/commit.c
> > > index 7c3fdcb0ca..81971692a2 100644
> > > --- a/block/commit.c
> > > +++ b/block/commit.c
> > > @@ -134,6 +134,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
> > >       int64_t n = 0; /* bytes */
> > >       QEMU_AUTO_VFREE void *buf = NULL;
> > >       int64_t len, base_len;
> > > +    bool need_final_flush = true;
> > >       len = blk_co_getlength(s->top);
> > >       if (len < 0) {
> > > @@ -155,8 +156,8 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
> > >       buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);
> > > -    for (offset = 0; offset < len; offset += n) {
> > > -        bool copy;
> > > +    for (offset = 0; offset < len || need_final_flush; offset += n) {
> > 
> > In general, the control flow would be nicer to read if the final flush
> > weren't integrated into the loop, but just added after it.
> > 
> > But I assume this is pretty much required for pausing the job during
> > error handling in the final flush if you don't want to duplicate a lot
> > of the logic into a second loop?
> 
> Right, that's the reason.

This would probably be the right solution if it affected only commit.
But I've thought a bit more about this and given that the same thing
happens in all of the block jobs, I'm really wondering if introducing a
block job helper function wouldn't be better, so that each block job
could just add something like this after its main loop:

    do {
        ret = blk_co_flush();
    } while (block_job_handle_error(job, ret));

And the helper would call block_job_error_action(), stop the job if
necessary, check if it's cancelled, include a pause point etc.

Kevin

Vladimir Sementsov-Ogievskiy Aug. 2, 2024, 11:12 a.m. UTC | #4

On 29.07.24 15:25, Kevin Wolf wrote:
> Am 19.07.2024 um 12:35 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> On 18.07.24 22:22, Kevin Wolf wrote:
>>> Am 26.06.2024 um 16:50 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> Actually block job is not completed without the final flush. It's
>>>> rather unexpected to have broken target when job was successfully
>>>> completed long ago and now we fail to flush or process just
>>>> crashed/killed.
>>>>
>>>> Mirror job already has mirror_flush() for this. So, it's OK.
>>>>
>>>> Do this for commit job too.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>> ---
>>>>    block/commit.c | 41 +++++++++++++++++++++++++++--------------
>>>>    1 file changed, 27 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/block/commit.c b/block/commit.c
>>>> index 7c3fdcb0ca..81971692a2 100644
>>>> --- a/block/commit.c
>>>> +++ b/block/commit.c
>>>> @@ -134,6 +134,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
>>>>        int64_t n = 0; /* bytes */
>>>>        QEMU_AUTO_VFREE void *buf = NULL;
>>>>        int64_t len, base_len;
>>>> +    bool need_final_flush = true;
>>>>        len = blk_co_getlength(s->top);
>>>>        if (len < 0) {
>>>> @@ -155,8 +156,8 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
>>>>        buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);
>>>> -    for (offset = 0; offset < len; offset += n) {
>>>> -        bool copy;
>>>> +    for (offset = 0; offset < len || need_final_flush; offset += n) {
>>>
>>> In general, the control flow would be nicer to read if the final flush
>>> weren't integrated into the loop, but just added after it.
>>>
>>> But I assume this is pretty much required for pausing the job during
>>> error handling in the final flush if you don't want to duplicate a lot
>>> of the logic into a second loop?
>>
>> Right, that's the reason.
> 
> This would probably be the right solution if it affected only commit.
> But I've thought a bit more about this and given that the same thing
> happens in all of the block jobs, I'm really wondering if introducing a
> block job helper function wouldn't be better, so that each block job
> could just add something like this after its main loop:
> 
>      do {
>          ret = blk_co_flush();
>      } while (block_job_handle_error(job, ret));
> 
> And the helper would call block_job_error_action(), stop the job if
> necessary, check if it's cancelled, include a pause point etc.
> 

Agree. Thanks, I'll try.

diff --git a/block/commit.c b/block/commit.c
index 7c3fdcb0ca..81971692a2 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -134,6 +134,7 @@  static int coroutine_fn commit_run(Job *job, Error **errp)
     int64_t n = 0; /* bytes */
     QEMU_AUTO_VFREE void *buf = NULL;
     int64_t len, base_len;
+    bool need_final_flush = true;
 
     len = blk_co_getlength(s->top);
     if (len < 0) {
@@ -155,8 +156,8 @@  static int coroutine_fn commit_run(Job *job, Error **errp)
 
     buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);
 
-    for (offset = 0; offset < len; offset += n) {
-        bool copy;
+    for (offset = 0; offset < len || need_final_flush; offset += n) {
+        bool copy = false;
         bool error_in_source = true;
 
         /* Note that even when no rate limit is applied we need to yield
@@ -166,22 +167,34 @@  static int coroutine_fn commit_run(Job *job, Error **errp)
         if (job_is_cancelled(&s->common.job)) {
             break;
         }
-        /* Copy if allocated above the base */
-        ret = blk_co_is_allocated_above(s->top, s->base_overlay, true,
-                                        offset, COMMIT_BUFFER_SIZE, &n);
-        copy = (ret > 0);
-        trace_commit_one_iteration(s, offset, n, ret);
-        if (copy) {
-            assert(n < SIZE_MAX);
 
-            ret = blk_co_pread(s->top, offset, n, buf, 0);
-            if (ret >= 0) {
-                ret = blk_co_pwrite(s->base, offset, n, buf, 0);
-                if (ret < 0) {
-                    error_in_source = false;
+        if (offset < len) {
+            /* Copy if allocated above the base */
+            ret = blk_co_is_allocated_above(s->top, s->base_overlay, true,
+                                            offset, COMMIT_BUFFER_SIZE, &n);
+            copy = (ret > 0);
+            trace_commit_one_iteration(s, offset, n, ret);
+            if (copy) {
+                assert(n < SIZE_MAX);
+
+                ret = blk_co_pread(s->top, offset, n, buf, 0);
+                if (ret >= 0) {
+                    ret = blk_co_pwrite(s->base, offset, n, buf, 0);
+                    if (ret < 0) {
+                        error_in_source = false;
+                    }
                 }
             }
+        } else {
+            assert(need_final_flush);
+            ret = blk_co_flush(s->base);
+            if (ret < 0) {
+                error_in_source = false;
+            } else {
+                need_final_flush = false;
+            }
         }
+
         if (ret < 0) {
             BlockErrorAction action =
                 block_job_error_action(&s->common, s->on_error,

[v2,1/3] block/commit: implement final flush

Commit Message

Comments

Patch