[v6,1/3] block: introduce compress filter driver

Message ID 1573488277-794975-2-git-send-email-andrey.shinkevich@virtuozzo.com (mailing list archive)
State New, archived
Series qcow2: advanced compression options

Commit Message

Andrey Shinkevich Nov. 11, 2019, 4:04 p.m. UTC
Allow writing all the data compressed through the filter driver.
The written data will be aligned to the cluster size.
With the current QEMU implementation, compressed data can be written to
unallocated clusters only. The filter may be used for a backup job.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 block/Makefile.objs     |   1 +
 block/filter-compress.c | 212 ++++++++++++++++++++++++++++++++++++++++++++++++
 qapi/block-core.json    |  10 ++-
 3 files changed, 219 insertions(+), 4 deletions(-)
 create mode 100644 block/filter-compress.c
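The commit message says that written data ends up aligned to the cluster size. The following is a rough standalone model of what that means for a single request. It assumes the generic block layer pads the request out to `request_alignment` and splits it at `max_transfer`, with both equal to the cluster size (which is what `zip_refresh_limits` in this patch tries to arrange). `align_down`, `align_up` and `compressed_write_count` are hypothetical helpers, not QEMU functions:

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a multiple of align (align must be non-zero). */
static uint64_t align_down(uint64_t x, uint64_t align)
{
    return x - x % align;
}

/* Round x up to a multiple of align (align must be non-zero). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return align_down(x + align - 1, align);
}

/*
 * Hypothetical model: number of cluster-sized, cluster-aligned compressed
 * writes that a request [offset, offset + bytes) turns into once it is
 * padded to cluster boundaries and split at one cluster per write.
 */
static uint64_t compressed_write_count(uint64_t offset, uint64_t bytes,
                                       uint64_t cluster_size)
{
    uint64_t start, end;

    if (bytes == 0) {
        return 0;
    }
    start = align_down(offset, cluster_size);
    end = align_up(offset + bytes, cluster_size);
    return (end - start) / cluster_size;
}
```

For example, with 64 KiB clusters, a 64 KiB write starting at offset 4096 straddles two clusters and so becomes two single-cluster compressed writes.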

Comments

Eric Blake Nov. 11, 2019, 8:47 p.m. UTC | #1
On 11/11/19 10:04 AM, Andrey Shinkevich wrote:
> Allow writing all the data compressed through the filter driver.
> The written data will be aligned by the cluster size.
> Based on the QEMU current implementation, that data can be written to
> unallocated clusters only. May be used for a backup job.
> 
> Suggested-by: Max Reitz <mreitz@redhat.com>
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>   block/Makefile.objs     |   1 +
>   block/filter-compress.c | 212 ++++++++++++++++++++++++++++++++++++++++++++++++
>   qapi/block-core.json    |  10 ++-
>   3 files changed, 219 insertions(+), 4 deletions(-)
>   create mode 100644 block/filter-compress.c

> +++ b/qapi/block-core.json
> @@ -2884,15 +2884,16 @@
>   # @copy-on-read: Since 3.0
>   # @blklogwrites: Since 3.0
>   # @blkreplay: Since 4.2
> +# @compress: Since 4.2

Are we still trying to get this in 4.2, even though soft freeze is past? Or are we going to have to defer it to 5.0 as a new feature?
Vladimir Sementsov-Ogievskiy Nov. 12, 2019, 8:57 a.m. UTC | #2
11.11.2019 23:47, Eric Blake wrote:
> On 11/11/19 10:04 AM, Andrey Shinkevich wrote:
>> Allow writing all the data compressed through the filter driver.
>> The written data will be aligned by the cluster size.
>> Based on the QEMU current implementation, that data can be written to
>> unallocated clusters only. May be used for a backup job.
>>
>> Suggested-by: Max Reitz <mreitz@redhat.com>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/Makefile.objs     |   1 +
>>   block/filter-compress.c | 212 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   qapi/block-core.json    |  10 ++-
>>   3 files changed, 219 insertions(+), 4 deletions(-)
>>   create mode 100644 block/filter-compress.c
> 
>> +++ b/qapi/block-core.json
>> @@ -2884,15 +2884,16 @@
>>   # @copy-on-read: Since 3.0
>>   # @blklogwrites: Since 3.0
>>   # @blkreplay: Since 4.2
>> +# @compress: Since 4.2
> 
> Are we still trying to get this in 4.2, even though soft freeze is past?  Or are we going to have to defer it to 5.0 as a new feature?
> 

5.0 of course
Kevin Wolf Nov. 12, 2019, 9:39 a.m. UTC | #3
On 11.11.2019 at 17:04, Andrey Shinkevich wrote:
> Allow writing all the data compressed through the filter driver.
> The written data will be aligned by the cluster size.
> Based on the QEMU current implementation, that data can be written to
> unallocated clusters only. May be used for a backup job.
> 
> Suggested-by: Max Reitz <mreitz@redhat.com>
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>

> +static BlockDriver bdrv_compress = {
> +    .format_name                        = "compress",
> +
> +    .bdrv_open                          = zip_open,
> +    .bdrv_child_perm                    = zip_child_perm,

Why do you call the functions zip_* when the driver is called compress?
I think zip would be a driver for zip archives, which we don't use here.

> +    .bdrv_getlength                     = zip_getlength,
> +    .bdrv_co_truncate                   = zip_co_truncate,
> +
> +    .bdrv_co_preadv                     = zip_co_preadv,
> +    .bdrv_co_preadv_part                = zip_co_preadv_part,
> +    .bdrv_co_pwritev                    = zip_co_pwritev,
> +    .bdrv_co_pwritev_part               = zip_co_pwritev_part,

If you implement .bdrv_co_preadv/pwritev_part, isn't the implementation
of .bdrv_co_preadv/pwritev (without _part) dead code?
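Kevin's point can be seen in a toy model of the generic I/O layer's dispatch. The model assumes, as in the `bdrv_driver_preadv`/`bdrv_driver_pwritev` helpers in block/io.c at the time of this thread, that the `_part` callback is tried before the plain one. `ModelDriver`, `model_driver_preadv`, `legacy_read`, `part_read` and the call counters are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified driver table: only the two read hooks. */
typedef struct ModelDriver {
    int (*preadv)(void);      /* plain entry point */
    int (*preadv_part)(void); /* entry point that also takes a qiov offset */
} ModelDriver;

/*
 * Model of the generic layer's dispatch: the _part callback is tried first,
 * so when it is set, the plain callback is never reached.
 */
static int model_driver_preadv(const ModelDriver *drv)
{
    if (drv->preadv_part) {
        return drv->preadv_part();
    }
    if (drv->preadv) {
        return drv->preadv();
    }
    return -1; /* stands in for -ENOTSUP */
}

static int legacy_calls;
static int part_calls;

static int legacy_read(void) { legacy_calls++; return 0; }
static int part_read(void)   { part_calls++;   return 0; }
```

With both callbacks set, only `part_read` runs; the plain callback is reachable only when `preadv_part` is NULL, which never happens for a driver that always sets both.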

> +    .bdrv_co_pwrite_zeroes              = zip_co_pwrite_zeroes,
> +    .bdrv_co_pdiscard                   = zip_co_pdiscard,
> +    .bdrv_refresh_limits                = zip_refresh_limits,
> +
> +    .bdrv_eject                         = zip_eject,
> +    .bdrv_lock_medium                   = zip_lock_medium,
> +
> +    .bdrv_co_block_status               = bdrv_co_block_status_from_backing,

Why not use bs->file? (Well, apart from the still not merged filter
series by Max...)

> +    .bdrv_recurse_is_first_non_filter   = zip_recurse_is_first_non_filter,
> +
> +    .has_variable_length                = true,
> +    .is_filter                          = true,
> +};

Kevin
Andrey Shinkevich Nov. 12, 2019, 10:07 a.m. UTC | #4
On 12/11/2019 12:39, Kevin Wolf wrote:
> On 11.11.2019 at 17:04, Andrey Shinkevich wrote:
>> Allow writing all the data compressed through the filter driver.
>> The written data will be aligned by the cluster size.
>> Based on the QEMU current implementation, that data can be written to
>> unallocated clusters only. May be used for a backup job.
>>
>> Suggested-by: Max Reitz <mreitz@redhat.com>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> 
>> +static BlockDriver bdrv_compress = {
>> +    .format_name                        = "compress",
>> +
>> +    .bdrv_open                          = zip_open,
>> +    .bdrv_child_perm                    = zip_child_perm,
> 
> Why do you call the functions zip_* when the driver is called compress?
> I think zip would be a driver for zip archives, which we don't use here.
> 

Kevin,
Thanks for your response.
I was trying to come up with a short form for 'compress'.
I will change 'zip' to something like 'compr'.

>> +    .bdrv_getlength                     = zip_getlength,
>> +    .bdrv_co_truncate                   = zip_co_truncate,
>> +
>> +    .bdrv_co_preadv                     = zip_co_preadv,
>> +    .bdrv_co_preadv_part                = zip_co_preadv_part,
>> +    .bdrv_co_pwritev                    = zip_co_pwritev,
>> +    .bdrv_co_pwritev_part               = zip_co_pwritev_part,
> 
> If you implement .bdrv_co_preadv/pwritev_part, isn't the implementation
> of .bdrv_co_preadv/pwritev (without _part) dead code?
> 

Understood and will remove the dead path.

>> +    .bdrv_co_pwrite_zeroes              = zip_co_pwrite_zeroes,
>> +    .bdrv_co_pdiscard                   = zip_co_pdiscard,
>> +    .bdrv_refresh_limits                = zip_refresh_limits,
>> +
>> +    .bdrv_eject                         = zip_eject,
>> +    .bdrv_lock_medium                   = zip_lock_medium,
>> +
>> +    .bdrv_co_block_status               = bdrv_co_block_status_from_backing,
> 
> Why not use bs->file? (Well, apart from the still not merged filter
> series by Max...)
> 

We need to keep the backing chain unbroken with the filter inserted, so
the backing child must not be NULL. This is necessary for the backup
job, for instance. When I initialized both children to point to the same
node, the code didn't work properly. I will have to reproduce it to
tell you exactly what happened, and I would appreciate your advice
about a proper design.

Andrey

>> +    .bdrv_recurse_is_first_non_filter   = zip_recurse_is_first_non_filter,
>> +
>> +    .has_variable_length                = true,
>> +    .is_filter                          = true,
>> +};
> 
> Kevin
>
Vladimir Sementsov-Ogievskiy Nov. 12, 2019, 10:24 a.m. UTC | #5
12.11.2019 13:07, Andrey Shinkevich wrote:
> On 12/11/2019 12:39, Kevin Wolf wrote:
>> On 11.11.2019 at 17:04, Andrey Shinkevich wrote:
>>> Allow writing all the data compressed through the filter driver.
>>> The written data will be aligned by the cluster size.
>>> Based on the QEMU current implementation, that data can be written to
>>> unallocated clusters only. May be used for a backup job.
>>>
>>> Suggested-by: Max Reitz <mreitz@redhat.com>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>
>>> +static BlockDriver bdrv_compress = {
>>> +    .format_name                        = "compress",
>>> +
>>> +    .bdrv_open                          = zip_open,
>>> +    .bdrv_child_perm                    = zip_child_perm,
>>
>> Why do you call the functions zip_* when the driver is called compress?
>> I think zip would be a driver for zip archives, which we don't use here.
>>
> 
> Kevin,
> Thanks for your response.
> I was trying to make my mind up with a short form for 'compress'.
> I will change the 'zip' for something like 'compr'.

I'd keep it 'compress'; it sounds better.

> 
>>> +    .bdrv_getlength                     = zip_getlength,
>>> +    .bdrv_co_truncate                   = zip_co_truncate,
>>> +
>>> +    .bdrv_co_preadv                     = zip_co_preadv,
>>> +    .bdrv_co_preadv_part                = zip_co_preadv_part,
>>> +    .bdrv_co_pwritev                    = zip_co_pwritev,
>>> +    .bdrv_co_pwritev_part               = zip_co_pwritev_part,
>>
>> If you implement .bdrv_co_preadv/pwritev_part, isn't the implementation
>> of .bdrv_co_preadv/pwritev (without _part) dead code?
>>
> 
> Understood and will remove the dead path.
> 
>>> +    .bdrv_co_pwrite_zeroes              = zip_co_pwrite_zeroes,
>>> +    .bdrv_co_pdiscard                   = zip_co_pdiscard,
>>> +    .bdrv_refresh_limits                = zip_refresh_limits,
>>> +
>>> +    .bdrv_eject                         = zip_eject,
>>> +    .bdrv_lock_medium                   = zip_lock_medium,
>>> +
>>> +    .bdrv_co_block_status               = bdrv_co_block_status_from_backing,
>>
>> Why not use bs->file? (Well, apart from the still not merged filter
>> series by Max...)
>>
> 
> We need to keep a backing chain unbroken with the filter inserted. So,
> the backing child should not be zero. It is necessary for the backup
> job, for instance. When I initialized both children pointing to the same
> node, the code didn't work properly. I have to reproduce it again to
> tell you exactly what happened then and will appreciate your advice
> about a proper design.
> 
> Andrey


File-child-based filters are a pain: they need Max's 42-patch series (which seems
postponed now :( ) to work correctly (or at least more correctly than now).
File-child-based filters break backing chains, while backing-child-based ones work
well. So I don't know of any benefit of file-child-based filters, and I think there
is no reason to create new ones. If in the future we need file-child support in the
filter for some reason, it's simple to add (so the filter will have only one child,
but it may be backing or file, as requested by the user).

So I propose starting with a backing child, which works better, and adding file-child
support in the future if needed.

Also, note that the backup-top filter uses a backing child, and there is a reason to make
it a public filter (to implement image fleecing without a backup job), so we'll have a
public backing-child-based filter anyway.

Also, we have a pending series about using the COR filter in block-stream (it breaks
the backing chain, as it is file-child-based), and now I think that the simplest
way to fix it is to support a backing child in block-stream (so the block-stream job
will create the COR filter with a backing child instead of a file child).
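The backing-child versus file-child argument can be illustrated with a toy graph in which each node has at most one `backing` and one `file` pointer, and a job walks the chain through `backing` links only. `Node`, `chain_bottom` and `chain_length` are hypothetical; QEMU's real graph uses `BdrvChild` objects and is considerably richer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node model: at most one backing and one file child. */
typedef struct Node {
    const char *name;
    struct Node *backing;
    struct Node *file;
} Node;

/* Follow only backing links, as a job walking the backing chain might. */
static const Node *chain_bottom(const Node *top)
{
    while (top->backing) {
        top = top->backing;
    }
    return top;
}

/* Count the nodes reachable from top via backing links (top included). */
static int chain_length(const Node *top)
{
    int n = 0;

    for (; top; top = top->backing) {
        n++;
    }
    return n;
}
```

A filter inserted through a backing child keeps the base reachable; the same filter inserted through a file child hides everything below it from a backing-only walk, which is the breakage described above.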

> 
>>> +    .bdrv_recurse_is_first_non_filter   = zip_recurse_is_first_non_filter,
>>> +
>>> +    .has_variable_length                = true,
>>> +    .is_filter                          = true,
>>> +};
>>
>> Kevin
>>
>
Vladimir Sementsov-Ogievskiy Nov. 12, 2019, 10:33 a.m. UTC | #6
11.11.2019 19:04, Andrey Shinkevich wrote:
> Allow writing all the data compressed through the filter driver.
> The written data will be aligned by the cluster size.
> Based on the QEMU current implementation, that data can be written to
> unallocated clusters only. May be used for a backup job.
> 
> Suggested-by: Max Reitz <mreitz@redhat.com>
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>   block/Makefile.objs     |   1 +
>   block/filter-compress.c | 212 ++++++++++++++++++++++++++++++++++++++++++++++++
>   qapi/block-core.json    |  10 ++-
>   3 files changed, 219 insertions(+), 4 deletions(-)
>   create mode 100644 block/filter-compress.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index e394fe0..330529b 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -43,6 +43,7 @@ block-obj-y += crypto.o
>   
>   block-obj-y += aio_task.o
>   block-obj-y += backup-top.o
> +block-obj-y += filter-compress.o
>   
>   common-obj-y += stream.o
>   
> diff --git a/block/filter-compress.c b/block/filter-compress.c
> new file mode 100644
> index 0000000..a7b0337
> --- /dev/null
> +++ b/block/filter-compress.c
> @@ -0,0 +1,212 @@
> +/*
> + * Compress filter block driver
> + *
> + * Copyright (c) 2019 Virtuozzo International GmbH
> + *
> + * Author:
> + *   Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> + *   (based on block/copy-on-read.c by Max Reitz)
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 or
> + * (at your option) any later version of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "block/block_int.h"
> +#include "qemu/module.h"
> +
> +
> +static int zip_open(BlockDriverState *bs, QDict *options, int flags,
> +                     Error **errp)
> +{
> +    bs->backing = bdrv_open_child(NULL, options, "file", bs, &child_file, false,
> +                                  errp);
> +    if (!bs->backing) {
> +        return -EINVAL;
> +    }
> +
> +    bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
> +        BDRV_REQ_WRITE_COMPRESSED |
> +        (BDRV_REQ_FUA & bs->backing->bs->supported_write_flags);
> +
> +    bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
> +        ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
> +            bs->backing->bs->supported_zero_flags);
> +
> +    return 0;
> +}
> +
> +
> +#define PERM_PASSTHROUGH (BLK_PERM_CONSISTENT_READ \
> +                          | BLK_PERM_WRITE \
> +                          | BLK_PERM_RESIZE)
> +#define PERM_UNCHANGED (BLK_PERM_ALL & ~PERM_PASSTHROUGH)
> +
> +static void zip_child_perm(BlockDriverState *bs, BdrvChild *c,
> +                            const BdrvChildRole *role,
> +                            BlockReopenQueue *reopen_queue,
> +                            uint64_t perm, uint64_t shared,
> +                            uint64_t *nperm, uint64_t *nshared)
> +{
> +    *nperm = perm & PERM_PASSTHROUGH;
> +    *nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
> +
> +    /*
> +     * We must not request write permissions for an inactive node, the child
> +     * cannot provide it.
> +     */
> +    if (!(bs->open_flags & BDRV_O_INACTIVE)) {
> +        *nperm |= BLK_PERM_WRITE_UNCHANGED;
> +    }
> +}
> +
> +
> +static int64_t zip_getlength(BlockDriverState *bs)
> +{
> +    return bdrv_getlength(bs->backing->bs);
> +}
> +
> +
> +static int coroutine_fn zip_co_truncate(BlockDriverState *bs, int64_t offset,
> +                                         bool exact, PreallocMode prealloc,
> +                                         Error **errp)
> +{
> +    return bdrv_co_truncate(bs->backing, offset, exact, prealloc, errp);
> +}
> +
> +
> +static int coroutine_fn zip_co_preadv(BlockDriverState *bs,
> +                                       uint64_t offset, uint64_t bytes,
> +                                       QEMUIOVector *qiov, int flags)
> +{
> +    return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
> +}
> +
> +
> +static int coroutine_fn zip_co_preadv_part(BlockDriverState *bs,
> +                                            uint64_t offset, uint64_t bytes,
> +                                            QEMUIOVector *qiov,
> +                                            size_t qiov_offset,
> +                                            int flags)
> +{
> +    return bdrv_co_preadv_part(bs->backing, offset, bytes, qiov, qiov_offset,
> +                               flags);
> +}
> +
> +
> +static int coroutine_fn zip_co_pwritev(BlockDriverState *bs,
> +                                        uint64_t offset, uint64_t bytes,
> +                                        QEMUIOVector *qiov, int flags)
> +{
> +    return bdrv_co_pwritev(bs->backing, offset, bytes, qiov,
> +                           flags | BDRV_REQ_WRITE_COMPRESSED);
> +}
> +
> +
> +static int coroutine_fn zip_co_pwritev_part(BlockDriverState *bs,
> +                                             uint64_t offset, uint64_t bytes,
> +                                             QEMUIOVector *qiov,
> +                                             size_t qiov_offset, int flags)
> +{
> +    return bdrv_co_pwritev_part(bs->backing, offset, bytes, qiov, qiov_offset,
> +                                flags | BDRV_REQ_WRITE_COMPRESSED);
> +}
> +
> +
> +static int coroutine_fn zip_co_pwrite_zeroes(BlockDriverState *bs,
> +                                              int64_t offset, int bytes,
> +                                              BdrvRequestFlags flags)
> +{
> +    return bdrv_co_pwrite_zeroes(bs->backing, offset, bytes, flags);
> +}
> +
> +
> +static int coroutine_fn zip_co_pdiscard(BlockDriverState *bs,
> +                                         int64_t offset, int bytes)
> +{
> +    return bdrv_co_pdiscard(bs->backing, offset, bytes);
> +}
> +
> +
> +static void zip_refresh_limits(BlockDriverState *bs, Error **errp)
> +{
> +    BlockDriverInfo bdi;
> +    int ret;
> +
> +    if (!bs->backing) {
> +        return;
> +    }
> +
> +    ret = bdrv_get_info(bs->backing->bs, &bdi);
> +    if (ret < 0 || bdi.cluster_size == 0) {
> +        return;
> +    }
> +
> +    bs->backing->bs->bl.request_alignment = bdi.cluster_size;
> +    bs->backing->bs->bl.max_transfer = bdi.cluster_size;


I think you should not edit these fields of the child; we don't own them.
This handler should set our own bs->bl, i.e. the bs->bl of the filter itself.

Also, after the next patch we need a way to unset max_transfer here, to allow
multiple-cluster compressed writes, but only for qcow2.

This means (sorry, I sent you down the wrong path) that we need a separate
bs->bl.max_write_compressed, which defaults to cluster_size and may be set
by the driver. And in the following patch, which adds multiple-cluster compressed
write support to qcow2, we should set this bs->bl.max_write_compressed to
INT_MAX.
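The review suggests that `zip_refresh_limits` should fill in the filter's own `bs->bl` instead of writing to the child's, and that compressed writes should get their own limit. Below is a toy model of that split. `Limits`, `FilterNode` and `filter_refresh_limits` are hypothetical stand-ins, and `max_write_compressed` was only a proposal at the time of this thread; the sketch assumes the child may advertise it and that a value of 0 means "not set":

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of a node's block limits. */
typedef struct Limits {
    uint32_t request_alignment;
    uint32_t max_transfer;         /* 0: no limit */
    uint32_t max_write_compressed; /* proposed field; 0: not set */
} Limits;

typedef struct FilterNode {
    Limits bl;              /* the filter's own limits: ours to set */
    const Limits *child_bl; /* the child's limits: read-only for us */
} FilterNode;

/* Derive the filter's own limits; the child's limits are never written. */
static void filter_refresh_limits(FilterNode *f, uint32_t cluster_size)
{
    if (cluster_size == 0) {
        return; /* no cluster information: keep the current limits */
    }
    f->bl.request_alignment = cluster_size;
    /* Inherit rather than clamp, so normal I/O is not split per cluster. */
    f->bl.max_transfer = f->child_bl->max_transfer;
    /* Default to one cluster unless the child advertises more. */
    f->bl.max_write_compressed = f->child_bl->max_write_compressed ?
                                 f->child_bl->max_write_compressed :
                                 cluster_size;
}
```

A child that advertises a larger compressed-write limit (as qcow2 would after the multi-cluster patch, per the proposal) passes it through, while the child's own `Limits` stay untouched.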

> +}
> +
> +
> +static void zip_eject(BlockDriverState *bs, bool eject_flag)
> +{
> +    bdrv_eject(bs->backing->bs, eject_flag);
> +}
> +
> +
> +static void zip_lock_medium(BlockDriverState *bs, bool locked)
> +{
> +    bdrv_lock_medium(bs->backing->bs, locked);
> +}
> +
> +
> +static bool zip_recurse_is_first_non_filter(BlockDriverState *bs,
> +                                             BlockDriverState *candidate)
> +{
> +    return bdrv_recurse_is_first_non_filter(bs->backing->bs, candidate);
> +}
> +
> +
> +static BlockDriver bdrv_compress = {
> +    .format_name                        = "compress",
> +
> +    .bdrv_open                          = zip_open,
> +    .bdrv_child_perm                    = zip_child_perm,
> +
> +    .bdrv_getlength                     = zip_getlength,
> +    .bdrv_co_truncate                   = zip_co_truncate,
> +
> +    .bdrv_co_preadv                     = zip_co_preadv,
> +    .bdrv_co_preadv_part                = zip_co_preadv_part,
> +    .bdrv_co_pwritev                    = zip_co_pwritev,
> +    .bdrv_co_pwritev_part               = zip_co_pwritev_part,
> +    .bdrv_co_pwrite_zeroes              = zip_co_pwrite_zeroes,
> +    .bdrv_co_pdiscard                   = zip_co_pdiscard,
> +    .bdrv_refresh_limits                = zip_refresh_limits,
> +
> +    .bdrv_eject                         = zip_eject,
> +    .bdrv_lock_medium                   = zip_lock_medium,
> +
> +    .bdrv_co_block_status               = bdrv_co_block_status_from_backing,
> +
> +    .bdrv_recurse_is_first_non_filter   = zip_recurse_is_first_non_filter,
> +
> +    .has_variable_length                = true,
> +    .is_filter                          = true,
> +};
> +
> +static void bdrv_compress_init(void)
> +{
> +    bdrv_register(&bdrv_compress);
> +}
> +
> +block_init(bdrv_compress_init);
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index aa97ee2..33d8cd8 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2884,15 +2884,16 @@
>   # @copy-on-read: Since 3.0
>   # @blklogwrites: Since 3.0
>   # @blkreplay: Since 4.2
> +# @compress: Since 4.2
>   #
>   # Since: 2.9
>   ##
>   { 'enum': 'BlockdevDriver',
>     'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
> -            'cloop', 'copy-on-read', 'dmg', 'file', 'ftp', 'ftps', 'gluster',
> -            'host_cdrom', 'host_device', 'http', 'https', 'iscsi', 'luks',
> -            'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels', 'qcow',
> -            'qcow2', 'qed', 'quorum', 'raw', 'rbd',
> +            'cloop', 'copy-on-read', 'compress', 'dmg', 'file', 'ftp', 'ftps',
> +            'gluster', 'host_cdrom', 'host_device', 'http', 'https', 'iscsi',
> +            'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
> +            'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
>               { 'name': 'replication', 'if': 'defined(CONFIG_REPLICATION)' },
>               'sheepdog',
>               'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs' ] }
> @@ -4045,6 +4046,7 @@
>         'bochs':      'BlockdevOptionsGenericFormat',
>         'cloop':      'BlockdevOptionsGenericFormat',
>         'copy-on-read':'BlockdevOptionsGenericFormat',
> +      'compress':   'BlockdevOptionsGenericFormat',
>         'dmg':        'BlockdevOptionsGenericFormat',
>         'file':       'BlockdevOptionsFile',
>         'ftp':        'BlockdevOptionsCurlFtp',
>

Patch

diff --git a/block/Makefile.objs b/block/Makefile.objs
index e394fe0..330529b 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -43,6 +43,7 @@  block-obj-y += crypto.o
 
 block-obj-y += aio_task.o
 block-obj-y += backup-top.o
+block-obj-y += filter-compress.o
 
 common-obj-y += stream.o
 
diff --git a/block/filter-compress.c b/block/filter-compress.c
new file mode 100644
index 0000000..a7b0337
--- /dev/null
+++ b/block/filter-compress.c
@@ -0,0 +1,212 @@ 
+/*
+ * Compress filter block driver
+ *
+ * Copyright (c) 2019 Virtuozzo International GmbH
+ *
+ * Author:
+ *   Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
+ *   (based on block/copy-on-read.c by Max Reitz)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) any later version of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "block/block_int.h"
+#include "qemu/module.h"
+
+
+static int zip_open(BlockDriverState *bs, QDict *options, int flags,
+                     Error **errp)
+{
+    bs->backing = bdrv_open_child(NULL, options, "file", bs, &child_file, false,
+                                  errp);
+    if (!bs->backing) {
+        return -EINVAL;
+    }
+
+    bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
+        BDRV_REQ_WRITE_COMPRESSED |
+        (BDRV_REQ_FUA & bs->backing->bs->supported_write_flags);
+
+    bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
+        ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
+            bs->backing->bs->supported_zero_flags);
+
+    return 0;
+}
+
+
+#define PERM_PASSTHROUGH (BLK_PERM_CONSISTENT_READ \
+                          | BLK_PERM_WRITE \
+                          | BLK_PERM_RESIZE)
+#define PERM_UNCHANGED (BLK_PERM_ALL & ~PERM_PASSTHROUGH)
+
+static void zip_child_perm(BlockDriverState *bs, BdrvChild *c,
+                            const BdrvChildRole *role,
+                            BlockReopenQueue *reopen_queue,
+                            uint64_t perm, uint64_t shared,
+                            uint64_t *nperm, uint64_t *nshared)
+{
+    *nperm = perm & PERM_PASSTHROUGH;
+    *nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
+
+    /*
+     * We must not request write permissions for an inactive node, the child
+     * cannot provide it.
+     */
+    if (!(bs->open_flags & BDRV_O_INACTIVE)) {
+        *nperm |= BLK_PERM_WRITE_UNCHANGED;
+    }
+}
+
+
+static int64_t zip_getlength(BlockDriverState *bs)
+{
+    return bdrv_getlength(bs->backing->bs);
+}
+
+
+static int coroutine_fn zip_co_truncate(BlockDriverState *bs, int64_t offset,
+                                         bool exact, PreallocMode prealloc,
+                                         Error **errp)
+{
+    return bdrv_co_truncate(bs->backing, offset, exact, prealloc, errp);
+}
+
+
+static int coroutine_fn zip_co_preadv(BlockDriverState *bs,
+                                       uint64_t offset, uint64_t bytes,
+                                       QEMUIOVector *qiov, int flags)
+{
+    return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
+}
+
+
+static int coroutine_fn zip_co_preadv_part(BlockDriverState *bs,
+                                            uint64_t offset, uint64_t bytes,
+                                            QEMUIOVector *qiov,
+                                            size_t qiov_offset,
+                                            int flags)
+{
+    return bdrv_co_preadv_part(bs->backing, offset, bytes, qiov, qiov_offset,
+                               flags);
+}
+
+
+static int coroutine_fn zip_co_pwritev(BlockDriverState *bs,
+                                        uint64_t offset, uint64_t bytes,
+                                        QEMUIOVector *qiov, int flags)
+{
+    return bdrv_co_pwritev(bs->backing, offset, bytes, qiov,
+                           flags | BDRV_REQ_WRITE_COMPRESSED);
+}
+
+
+static int coroutine_fn zip_co_pwritev_part(BlockDriverState *bs,
+                                             uint64_t offset, uint64_t bytes,
+                                             QEMUIOVector *qiov,
+                                             size_t qiov_offset, int flags)
+{
+    return bdrv_co_pwritev_part(bs->backing, offset, bytes, qiov, qiov_offset,
+                                flags | BDRV_REQ_WRITE_COMPRESSED);
+}
+
+
+static int coroutine_fn zip_co_pwrite_zeroes(BlockDriverState *bs,
+                                              int64_t offset, int bytes,
+                                              BdrvRequestFlags flags)
+{
+    return bdrv_co_pwrite_zeroes(bs->backing, offset, bytes, flags);
+}
+
+
+static int coroutine_fn zip_co_pdiscard(BlockDriverState *bs,
+                                         int64_t offset, int bytes)
+{
+    return bdrv_co_pdiscard(bs->backing, offset, bytes);
+}
+
+
+static void zip_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    BlockDriverInfo bdi;
+    int ret;
+
+    if (!bs->backing) {
+        return;
+    }
+
+    ret = bdrv_get_info(bs->backing->bs, &bdi);
+    if (ret < 0 || bdi.cluster_size == 0) {
+        return;
+    }
+
+    bs->backing->bs->bl.request_alignment = bdi.cluster_size;
+    bs->backing->bs->bl.max_transfer = bdi.cluster_size;
+}
+
+
+static void zip_eject(BlockDriverState *bs, bool eject_flag)
+{
+    bdrv_eject(bs->backing->bs, eject_flag);
+}
+
+
+static void zip_lock_medium(BlockDriverState *bs, bool locked)
+{
+    bdrv_lock_medium(bs->backing->bs, locked);
+}
+
+
+static bool zip_recurse_is_first_non_filter(BlockDriverState *bs,
+                                             BlockDriverState *candidate)
+{
+    return bdrv_recurse_is_first_non_filter(bs->backing->bs, candidate);
+}
+
+
+static BlockDriver bdrv_compress = {
+    .format_name                        = "compress",
+
+    .bdrv_open                          = zip_open,
+    .bdrv_child_perm                    = zip_child_perm,
+
+    .bdrv_getlength                     = zip_getlength,
+    .bdrv_co_truncate                   = zip_co_truncate,
+
+    .bdrv_co_preadv                     = zip_co_preadv,
+    .bdrv_co_preadv_part                = zip_co_preadv_part,
+    .bdrv_co_pwritev                    = zip_co_pwritev,
+    .bdrv_co_pwritev_part               = zip_co_pwritev_part,
+    .bdrv_co_pwrite_zeroes              = zip_co_pwrite_zeroes,
+    .bdrv_co_pdiscard                   = zip_co_pdiscard,
+    .bdrv_refresh_limits                = zip_refresh_limits,
+
+    .bdrv_eject                         = zip_eject,
+    .bdrv_lock_medium                   = zip_lock_medium,
+
+    .bdrv_co_block_status               = bdrv_co_block_status_from_backing,
+
+    .bdrv_recurse_is_first_non_filter   = zip_recurse_is_first_non_filter,
+
+    .has_variable_length                = true,
+    .is_filter                          = true,
+};
+
+static void bdrv_compress_init(void)
+{
+    bdrv_register(&bdrv_compress);
+}
+
+block_init(bdrv_compress_init);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index aa97ee2..33d8cd8 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2884,15 +2884,16 @@ 
 # @copy-on-read: Since 3.0
 # @blklogwrites: Since 3.0
 # @blkreplay: Since 4.2
+# @compress: Since 4.2
 #
 # Since: 2.9
 ##
 { 'enum': 'BlockdevDriver',
   'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
-            'cloop', 'copy-on-read', 'dmg', 'file', 'ftp', 'ftps', 'gluster',
-            'host_cdrom', 'host_device', 'http', 'https', 'iscsi', 'luks',
-            'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels', 'qcow',
-            'qcow2', 'qed', 'quorum', 'raw', 'rbd',
+            'cloop', 'copy-on-read', 'compress', 'dmg', 'file', 'ftp', 'ftps',
+            'gluster', 'host_cdrom', 'host_device', 'http', 'https', 'iscsi',
+            'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
+            'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
             { 'name': 'replication', 'if': 'defined(CONFIG_REPLICATION)' },
             'sheepdog',
             'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs' ] }
@@ -4045,6 +4046,7 @@ 
       'bochs':      'BlockdevOptionsGenericFormat',
       'cloop':      'BlockdevOptionsGenericFormat',
       'copy-on-read':'BlockdevOptionsGenericFormat',
+      'compress':   'BlockdevOptionsGenericFormat',
       'dmg':        'BlockdevOptionsGenericFormat',
       'file':       'BlockdevOptionsFile',
       'ftp':        'BlockdevOptionsCurlFtp',
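`zip_child_perm` in the patch passes read, write and resize through to the child, always shares the remaining permissions, and additionally takes WRITE_UNCHANGED unless the node is inactive. Below is a standalone model of that masking; the `PERM_*` bit values, the `MODEL_*` macros and `model_child_perm` are made up for illustration and are not QEMU's headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical permission bits chosen for the model. */
enum {
    PERM_CONSISTENT_READ = 1u << 0,
    PERM_WRITE           = 1u << 1,
    PERM_WRITE_UNCHANGED = 1u << 2,
    PERM_RESIZE          = 1u << 3,
    PERM_GRAPH_MOD       = 1u << 4,
    PERM_ALL             = (1u << 5) - 1,
};

#define MODEL_PASSTHROUGH (PERM_CONSISTENT_READ | PERM_WRITE | PERM_RESIZE)
#define MODEL_UNCHANGED   (PERM_ALL & ~MODEL_PASSTHROUGH)

/* Same masking as zip_child_perm, with the inactive flag passed in. */
static void model_child_perm(bool inactive, uint64_t perm, uint64_t shared,
                             uint64_t *nperm, uint64_t *nshared)
{
    *nperm = perm & MODEL_PASSTHROUGH;
    *nshared = (shared & MODEL_PASSTHROUGH) | MODEL_UNCHANGED;

    /* An inactive child cannot grant write permissions. */
    if (!inactive) {
        *nperm |= PERM_WRITE_UNCHANGED;
    }
}
```

Permissions outside the passthrough set (here WRITE_UNCHANGED and GRAPH_MOD) are never requested on the parent's behalf and are always shared.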