Message ID | 20170818133118.8650-1-stefanha@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 08/18/2017 08:31 AM, Stefan Hajnoczi wrote: > Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1) > * cluster_size + 512 bytes upfront. Allocate s->cluster_cache and > s->cluster_data when the first read operation is performance on a > compressed cluster. > > The buffers are freed in .bdrv_close(). .bdrv_open() no longer has any > code paths that can allocate these buffers, so remove the free functions > in the error code path. > > Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> > Cc: Kevin Wolf <kwolf@redhat.com> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > --- > Alexey: Does this improve your memory profiling results? Is this a regression from earlier versions? Likely, it is NOT -rc4 material, and thus can wait for 2.11; but it should be fine for -stable as part of 2.10.1 down the road. > +++ b/block/qcow2-cluster.c > @@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset) > nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1; > sector_offset = coffset & 511; > csize = nb_csectors * 512 - sector_offset; > + > + /* Allocate buffers on first decompress operation, most images are > + * uncompressed and the memory overhead can be avoided. The buffers > + * are freed in .bdrv_close(). > + */ > + if (!s->cluster_data) { > + /* one more sector for decompressed data alignment */ > + s->cluster_data = qemu_try_blockalign(bs->file->bs, > + QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512); > + if (!s->cluster_data) { > + return -EIO; Is -ENOMEM any better than -EIO here?
On 18/08/17 23:31, Stefan Hajnoczi wrote: > Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1) > * cluster_size + 512 bytes upfront. Allocate s->cluster_cache and > s->cluster_data when the first read operation is performance on a > compressed cluster. > > The buffers are freed in .bdrv_close(). .bdrv_open() no longer has any > code paths that can allocate these buffers, so remove the free functions > in the error code path. > > Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> > Cc: Kevin Wolf <kwolf@redhat.com> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > --- > Alexey: Does this improve your memory profiling results? Yes, it does: was: 12.81% (1,023,193,088B) now: 05.36% (393,893,888B) Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> > block/qcow2-cluster.c | 17 +++++++++++++++++ > block/qcow2.c | 12 ------------ > 2 files changed, 17 insertions(+), 12 deletions(-) > > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c > index f06c08f64c..c47600a44e 100644 > --- a/block/qcow2-cluster.c > +++ b/block/qcow2-cluster.c > @@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset) > nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1; > sector_offset = coffset & 511; > csize = nb_csectors * 512 - sector_offset; > + > + /* Allocate buffers on first decompress operation, most images are > + * uncompressed and the memory overhead can be avoided. The buffers > + * are freed in .bdrv_close(). > + */ > + if (!s->cluster_data) { > + /* one more sector for decompressed data alignment */ > + s->cluster_data = qemu_try_blockalign(bs->file->bs, > + QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512); > + if (!s->cluster_data) { > + return -EIO; > + } > + } > + if (!s->cluster_cache) { > + s->cluster_cache = g_malloc(s->cluster_size); > + } > + > BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED); > ret = bdrv_read(bs->file, coffset >> 9, s->cluster_data, > nb_csectors); > diff --git a/block/qcow2.c b/block/qcow2.c > index 40ba26c111..0ac201910a 100644 > --- a/block/qcow2.c > +++ b/block/qcow2.c > @@ -1360,16 +1360,6 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags, > goto fail; > } > > - s->cluster_cache = g_malloc(s->cluster_size); > - /* one more sector for decompressed data alignment */ > - s->cluster_data = qemu_try_blockalign(bs->file->bs, QCOW_MAX_CRYPT_CLUSTERS > - * s->cluster_size + 512); > - if (s->cluster_data == NULL) { > - error_setg(errp, "Could not allocate temporary cluster buffer"); > - ret = -ENOMEM; > - goto fail; > - } > - > s->cluster_cache_offset = -1; > s->flags = flags; > > @@ -1507,8 +1497,6 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags, > if (s->refcount_block_cache) { > qcow2_cache_destroy(bs, s->refcount_block_cache); > } > - g_free(s->cluster_cache); > - qemu_vfree(s->cluster_data); > qcrypto_block_free(s->crypto); > qapi_free_QCryptoBlockOpenOptions(s->crypto_opts); > return ret; >
On Fri, Aug 18, 2017 at 10:18:37AM -0500, Eric Blake wrote: > On 08/18/2017 08:31 AM, Stefan Hajnoczi wrote: > > Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1) > > * cluster_size + 512 bytes upfront. Allocate s->cluster_cache and > > s->cluster_data when the first read operation is performance on a > > compressed cluster. > > > > The buffers are freed in .bdrv_close(). .bdrv_open() no longer has any > > code paths that can allocate these buffers, so remove the free functions > > in the error code path. > > > > Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> > > Cc: Kevin Wolf <kwolf@redhat.com> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > --- > > Alexey: Does this improve your memory profiling results? > > Is this a regression from earlier versions? Likely, it is NOT -rc4 > material, and thus can wait for 2.11; but it should be fine for -stable > as part of 2.10.1 down the road. It's not a regression. s->cluster_data was always allocated upfront in .bdrv_open(). > > +++ b/block/qcow2-cluster.c > > @@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset) > > nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1; > > sector_offset = coffset & 511; > > csize = nb_csectors * 512 - sector_offset; > > + > > + /* Allocate buffers on first decompress operation, most images are > > + * uncompressed and the memory overhead can be avoided. The buffers > > + * are freed in .bdrv_close(). > > + */ > > + if (!s->cluster_data) { > > + /* one more sector for decompressed data alignment */ > > + s->cluster_data = qemu_try_blockalign(bs->file->bs, > > + QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512); > > + if (!s->cluster_data) { > > + return -EIO; > > Is -ENOMEM any better than -EIO here? There is another instance of -ENOMEM in qcow2.c so it seems reasonable to use the more specific ENOMEM error code. I'll resend the patch. Stefan
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index f06c08f64c..c47600a44e 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset) nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1; sector_offset = coffset & 511; csize = nb_csectors * 512 - sector_offset; + + /* Allocate buffers on first decompress operation, most images are + * uncompressed and the memory overhead can be avoided. The buffers + * are freed in .bdrv_close(). + */ + if (!s->cluster_data) { + /* one more sector for decompressed data alignment */ + s->cluster_data = qemu_try_blockalign(bs->file->bs, + QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512); + if (!s->cluster_data) { + return -EIO; + } + } + if (!s->cluster_cache) { + s->cluster_cache = g_malloc(s->cluster_size); + } + BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED); ret = bdrv_read(bs->file, coffset >> 9, s->cluster_data, nb_csectors); diff --git a/block/qcow2.c b/block/qcow2.c index 40ba26c111..0ac201910a 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1360,16 +1360,6 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags, goto fail; } - s->cluster_cache = g_malloc(s->cluster_size); - /* one more sector for decompressed data alignment */ - s->cluster_data = qemu_try_blockalign(bs->file->bs, QCOW_MAX_CRYPT_CLUSTERS - * s->cluster_size + 512); - if (s->cluster_data == NULL) { - error_setg(errp, "Could not allocate temporary cluster buffer"); - ret = -ENOMEM; - goto fail; - } - s->cluster_cache_offset = -1; s->flags = flags; @@ -1507,8 +1497,6 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags, if (s->refcount_block_cache) { qcow2_cache_destroy(bs, s->refcount_block_cache); } - g_free(s->cluster_cache); - qemu_vfree(s->cluster_data); qcrypto_block_free(s->crypto); qapi_free_QCryptoBlockOpenOptions(s->crypto_opts); return ret;
Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1) * cluster_size + 512 bytes upfront. Allocate s->cluster_cache and s->cluster_data when the first read operation is performance on a compressed cluster. The buffers are freed in .bdrv_close(). .bdrv_open() no longer has any code paths that can allocate these buffers, so remove the free functions in the error code path. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> --- Alexey: Does this improve your memory profiling results? block/qcow2-cluster.c | 17 +++++++++++++++++ block/qcow2.c | 12 ------------ 2 files changed, 17 insertions(+), 12 deletions(-)