
[v3,0/4] block: seriously improve savevm performance

Message ID: 20200611171143.21589-1-den@openvz.org

Message

Denis V. Lunev June 11, 2020, 5:11 p.m. UTC
This series does two basic things:
- it creates an intermediate buffer for all writes from the QEMU migration
  code to the QCOW2 image,
- this buffer is sent to disk asynchronously, allowing several writes to
  run in parallel (a rough sketch of the idea follows below).
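
To make the mechanism concrete, below is a minimal sketch of the idea. It is
not the code from the patches: the names are made up, and POSIX AIO stands in
for QEMU's coroutine-based block layer. Tiny, unaligned writes are coalesced
into large aligned chunks, and several chunks can be in flight at once.

/*
 * Hypothetical sketch only; build with: cc -c sketch.c (link with -lrt).
 * Error handling and the final partial-chunk flush are omitted.
 */
#include <aio.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE  (1024 * 1024)   /* one aligned chunk, e.g. 1 MiB         */
#define NUM_CHUNKS  8               /* queue depth: writes in flight at once */

typedef struct Chunk {
    struct aiocb cb;                /* must stay valid while the write runs  */
    uint8_t      data[CHUNK_SIZE];
    int          busy;
} Chunk;

static Chunk   chunks[NUM_CHUNKS];
static int     cur;                 /* chunk currently being filled          */
static size_t  used;                /* bytes accumulated in chunks[cur]      */
static int64_t image_offset;        /* where the next chunk lands on disk    */

/* Queue the current chunk and rotate to the next one; wait only if the
 * next chunk is still in flight from a previous round. */
void submit_current(int fd)
{
    Chunk *c = &chunks[cur];

    memset(&c->cb, 0, sizeof(c->cb));
    c->cb.aio_fildes = fd;
    c->cb.aio_buf    = c->data;
    c->cb.aio_nbytes = used;
    c->cb.aio_offset = image_offset;
    aio_write(&c->cb);              /* returns immediately                   */
    c->busy = 1;

    image_offset += used;
    used = 0;
    cur = (cur + 1) % NUM_CHUNKS;

    if (chunks[cur].busy) {
        const struct aiocb *list[1] = { &chunks[cur].cb };
        aio_suspend(list, 1, NULL); /* throttle on the oldest request        */
        aio_return(&chunks[cur].cb);
        chunks[cur].busy = 0;
    }
}

/* The migration code calls this with many tiny, unaligned pieces; the
 * chunk pool turns them into large aligned writes with a real queue. */
void vmstate_write(int fd, const uint8_t *p, size_t len)
{
    while (len > 0) {
        size_t room = CHUNK_SIZE - used;
        size_t n = len < room ? len : room;

        memcpy(chunks[cur].data + used, p, n);
        used += n;
        p += n;
        len -= n;

        if (used == CHUNK_SIZE) {
            submit_current(fd);
        }
    }
}

The important part is the last branch: a full chunk is merely queued and the
producer keeps going, so the disk always sees a queue of large aligned
requests instead of one small synchronous write at a time.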

In general, the migration code is fantastically inefficient (by observation):
buffers are not aligned and are sent in arbitrary pieces, often less than
100 bytes per chunk, which results in read-modify-write operations when the
I/O is non-cached. It should also be noted that all writes go into
unallocated image blocks, which suffer further from partial writes to such
new clusters.
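
For a feel of the overhead, the toy calculation below (the 4 KiB granularity
and the offsets are assumptions for illustration) shows what a typical
sub-100-byte write costs once non-cached I/O forces the block layer to read
the covering sectors, patch them, and write them back:

/* Illustration only: assumed 4 KiB granularity for non-cached (direct) I/O. */
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    const int64_t sector = 4096;            /* assumed direct-I/O granularity */
    const int64_t off = 123457, len = 100;  /* a typical tiny vmstate write   */

    int64_t start = off / sector * sector;                      /* round down */
    int64_t end   = (off + len + sector - 1) / sector * sector; /* round up   */

    /* Prints: payload 100 B -> read 4096 B + write 4096 B */
    printf("payload %" PRId64 " B -> read %" PRId64 " B + write %" PRId64 " B\n",
           len, end - start, end - start);
    return 0;
}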

This patch series is an implementation of the idea discussed in the RFC
posted by Denis Plotnikov:
https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg01925.html
Results with this series over NVMe are better than with the original code:
                original     rfc    this
cached:          1.79s      2.38s   1.27s
non-cached:      3.29s      1.31s   0.81s

Changes from v2:
- code moved from the QCOW2 level to the generic block level
- created the bdrv_flush_vmstate helper to fix tests 022 and 029
- added recursion into bs->file in bdrv_co_flush_vmstate (fixes test 267)
- fixed the blk_save_vmstate helper
- fixed the coroutine wait as Vladimir suggested, with additional waiting
  fixes from me

Changes from v1:
- fixed the patchew warning
- fixed the validation that only one waiter is allowed in patch 1

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>

Comments

Dr. David Alan Gilbert June 15, 2020, 12:17 p.m. UTC | #1
* Denis V. Lunev (den@openvz.org) wrote:
> This series does two basic things:
> - it creates an intermediate buffer for all writes from the QEMU migration
>   code to the QCOW2 image,
> - this buffer is sent to disk asynchronously, allowing several writes to
>   run in parallel.
> 
> In general, the migration code is fantastically inefficient (by observation):
> buffers are not aligned and are sent in arbitrary pieces, often less than
> 100 bytes per chunk, which results in read-modify-write operations when the
> I/O is non-cached. It should also be noted that all writes go into
> unallocated image blocks, which suffer further from partial writes to such
> new clusters.

It surprises me a little that you're not benefiting from the buffer
internal to qemu-file.c

Dave

> This patch series is an implementation of the idea discussed in the RFC
> posted by Denis Plotnikov:
> https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg01925.html
> Results with this series over NVMe are better than with the original code:
>                 original     rfc    this
> cached:          1.79s      2.38s   1.27s
> non-cached:      3.29s      1.31s   0.81s
> 
> Changes from v2:
> - code moved from the QCOW2 level to the generic block level
> - created the bdrv_flush_vmstate helper to fix tests 022 and 029
> - added recursion into bs->file in bdrv_co_flush_vmstate (fixes test 267)
> - fixed the blk_save_vmstate helper
> - fixed the coroutine wait as Vladimir suggested, with additional waiting
>   fixes from me
> 
> Changes from v1:
> - fixed the patchew warning
> - fixed the validation that only one waiter is allowed in patch 1
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <fam@euphon.net>
> CC: Juan Quintela <quintela@redhat.com>
> CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Denis V. Lunev June 15, 2020, 12:36 p.m. UTC | #2
On 6/15/20 3:17 PM, Dr. David Alan Gilbert wrote:
> * Denis V. Lunev (den@openvz.org) wrote:
>> This series does two basic things:
>> - it creates an intermediate buffer for all writes from the QEMU migration
>>   code to the QCOW2 image,
>> - this buffer is sent to disk asynchronously, allowing several writes to
>>   run in parallel.
>>
>> In general, the migration code is fantastically inefficient (by observation):
>> buffers are not aligned and are sent in arbitrary pieces, often less than
>> 100 bytes per chunk, which results in read-modify-write operations when the
>> I/O is non-cached. It should also be noted that all writes go into
>> unallocated image blocks, which suffer further from partial writes to such
>> new clusters.
> It surprises me a little that you're not benefiting from the buffer
> internal to qemu-file.c
>
> Dave
There are a lot of problems with this buffer:

Flushes to the block driver state are performed at arbitrary places,
pushing out
  a) small I/O
  b) I/O that is aligned neither to
       1) the page size nor to
       2) the cluster size.
It should also be noted that the buffer in QEMUFile is quite small and
all I/O operations with it are synchronous. I/O, like Ethernet, wants
good queues.

The difference is shown in the table above.

Den
Dr. David Alan Gilbert June 15, 2020, 12:49 p.m. UTC | #3
* Denis V. Lunev (den@openvz.org) wrote:
> On 6/15/20 3:17 PM, Dr. David Alan Gilbert wrote:
> > * Denis V. Lunev (den@openvz.org) wrote:
> >> This series does two basic things:
> >> - it creates an intermediate buffer for all writes from the QEMU migration
> >>   code to the QCOW2 image,
> >> - this buffer is sent to disk asynchronously, allowing several writes to
> >>   run in parallel.
> >>
> >> In general, the migration code is fantastically inefficient (by observation):
> >> buffers are not aligned and are sent in arbitrary pieces, often less than
> >> 100 bytes per chunk, which results in read-modify-write operations when the
> >> I/O is non-cached. It should also be noted that all writes go into
> >> unallocated image blocks, which suffer further from partial writes to such
> >> new clusters.
> > It surprises me a little that you're not benefiting from the buffer
> > internal to qemu-file.c
> >
> > Dave
> There are a lot of problems with this buffer:
> 
> Flushes to the block driver state are performed at arbitrary places,
> pushing out
>   a) small I/O
>   b) I/O that is aligned neither to
>        1) the page size nor to
>        2) the cluster size.
> It should also be noted that the buffer in QEMUFile is quite small and
> all I/O operations with it are synchronous. I/O, like Ethernet, wants
> good queues.

Yeah, for Ethernet we immediately get the kernel's buffer so it's not too
bad; and I guess the async page writes are easier as well.

Dave

> The difference is shown in the table above.
> 
> Den
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK