Message ID: 6d596d82ed62615a8565b661691a06bfaf32237e.1584468723.git.berto@igalia.com
State: New, archived
Series: Add subcluster allocation to qcow2
On Tue 17 Mar 2020 07:16:21 PM CET, Alberto Garcia <berto@igalia.com> wrote:
> Compressed clusters always have the bitmap part of the extended L2 entry set to 0.

I was just finishing some improvements to the new code that allows BDRV_REQ_ZERO_WRITE at the subcluster level, and I'm starting to entertain the idea of using the L2 bitmap for compressed clusters as well.

I will run some tests next week, but I would like to know your opinion in case I'm missing something.

A compressed cluster cannot be divided into subclusters on the image: you would not be able to allocate or overwrite them separately, therefore any write request necessarily has to write (or do COW of) the whole cluster.

However, if you consider the uncompressed guest data, I don't see any reason why you wouldn't be able to zeroize or even deallocate individual subclusters. These operations don't touch the cluster data on disk anyway; they only touch the L2 metadata in order to change what the guest sees.

'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any copy on write. The compressed data would remain untouched on disk, but some of the subclusters would have the 'all zeroes' bit set, exactly like what happens with normal clusters.

I think that this would make the on-disk format a bit simpler in general (no need to treat compressed clusters differently in some cases), and it would add a new optimization for compressed images. I just need to make sure that it doesn't complicate the code (my feeling is that it would actually simplify it, but I have to see).

Opinions?

Berto
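For reference, a sketch of the extended L2 entry layout the thread is about; the struct and field names here are illustrative, the authoritative layout is the one in the series' spec update:

    #include <stdint.h>

    /*
     * With extended L2 entries, each cluster is described by 128 bits:
     * the standard 64-bit L2 entry plus a 64-bit bitmap covering the
     * cluster's 32 subclusters.
     */
    typedef struct ExtendedL2Entry {
        uint64_t entry;   /* standard L2 entry (host offset, compressed flag, ...) */
        uint64_t bitmap;  /* bits 0-31:  subcluster is allocated
                           * bits 32-63: subcluster reads as zeroes */
    } ExtendedL2Entry;

    /*
     * The proposal above: on a 64k cluster (2k subclusters),
     * 'write -c 0 64k' followed by 'write -z 16k 16k' would leave the
     * compressed data on disk untouched and only set the "reads as
     * zeroes" bits for subclusters 8-15.
     */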
On 4/24/20 12:02 PM, Alberto Garcia wrote:
> On Tue 17 Mar 2020 07:16:21 PM CET, Alberto Garcia <berto@igalia.com> wrote:
>> Compressed clusters always have the bitmap part of the extended L2 entry set to 0.
>
> I was just finishing some improvements to the new code that allows BDRV_REQ_ZERO_WRITE at the subcluster level, and I'm starting to entertain the idea of using the L2 bitmap for compressed clusters as well.
>
> I will run some tests next week, but I would like to know your opinion in case I'm missing something.
>
> A compressed cluster cannot be divided into subclusters on the image: you would not be able to allocate or overwrite them separately, therefore any write request necessarily has to write (or do COW of) the whole cluster.
>
> However, if you consider the uncompressed guest data, I don't see any reason why you wouldn't be able to zeroize or even deallocate individual subclusters. These operations don't touch the cluster data on disk anyway; they only touch the L2 metadata in order to change what the guest sees.
>
> 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any copy on write. The compressed data would remain untouched on disk, but some of the subclusters would have the 'all zeroes' bit set, exactly like what happens with normal clusters.

It's a special case that avoids COW for write zeroes, but not for anything else. The moment you write any data (whether to the zero-above-compressed or the regular compressed portion), the entire cluster has to be rewritten.

I'm not sure how frequently guests will actually have the scenario of doing a zero request on a subcluster, but at the same time, I can see where you're coming from in stating that if it makes management of extended L2 easier to allow zero subclusters on top of a compressed cluster, then there's no reason to forbid it.

> I think that this would make the on-disk format a bit simpler in general (no need to treat compressed clusters differently in some cases), and it would add a new optimization for compressed images. I just need to make sure that it doesn't complicate the code (my feeling is that it would actually simplify it, but I have to see).
>
> Opinions?
>
> Berto
On Fri 24 Apr 2020 07:11:08 PM CEST, Eric Blake <eblake@redhat.com> wrote:
>> 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any copy on write. The compressed data would remain untouched on disk, but some of the subclusters would have the 'all zeroes' bit set, exactly like what happens with normal clusters.
>
> It's a special case that avoids COW for write zeroes, but not for anything else. The moment you write any data (whether to the zero-above-compressed or the regular compressed portion), the entire cluster has to be rewritten.

That's right, but you can still write zeroes without having to rewrite anything, and read back the zeroes without having to decompress the data.

> at the same time, I can see where you're coming from in stating that if it makes management of extended L2 easier to allow zero subclusters on top of a compressed cluster, then there's no reason to forbid it.

I'm not sure if it makes it easier. Some operations are definitely going to be easier, but maybe we have to add and handle _ZERO_COMPRESSED in addition to _ZERO_PLAIN and _ZERO_ALLOC (and the same for unallocated subclusters). Or maybe replace QCow2SubclusterType with something else. I need to evaluate that.

Berto
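To make the naming question concrete, a sketch of how QCow2SubclusterType might grow under this idea; the existing values are roughly as in the series under review, while the two _COMPRESSED entries are hypothetical names for the cases discussed above, not anything in the code:

    typedef enum QCow2SubclusterType {
        QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN,
        QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC,
        QCOW2_SUBCLUSTER_ZERO_PLAIN,
        QCOW2_SUBCLUSTER_ZERO_ALLOC,
        QCOW2_SUBCLUSTER_NORMAL,
        QCOW2_SUBCLUSTER_COMPRESSED,
        QCOW2_SUBCLUSTER_INVALID,

        /* Hypothetical additions if compressed clusters used the bitmap: */
        QCOW2_SUBCLUSTER_ZERO_COMPRESSED,        /* zero bit set over compressed data */
        QCOW2_SUBCLUSTER_UNALLOCATED_COMPRESSED, /* deallocated over compressed data */
    } QCow2SubclusterType;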
On 4/24/20 12:21 PM, Alberto Garcia wrote:
> On Fri 24 Apr 2020 07:11:08 PM CEST, Eric Blake <eblake@redhat.com> wrote:
>>> 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any copy on write. The compressed data would remain untouched on disk, but some of the subclusters would have the 'all zeroes' bit set, exactly like what happens with normal clusters.
>>
>> It's a special case that avoids COW for write zeroes, but not for anything else. The moment you write any data (whether to the zero-above-compressed or the regular compressed portion), the entire cluster has to be rewritten.
>
> That's right, but you can still write zeroes without having to rewrite anything, and read back the zeroes without having to decompress the data.
>
>> at the same time, I can see where you're coming from in stating that if it makes management of extended L2 easier to allow zero subclusters on top of a compressed cluster, then there's no reason to forbid it.
>
> I'm not sure if it makes it easier. Some operations are definitely going to be easier, but maybe we have to add and handle _ZERO_COMPRESSED in addition to _ZERO_PLAIN and _ZERO_ALLOC (and the same for unallocated subclusters). Or maybe replace QCow2SubclusterType with something else. I need to evaluate that.

Reading the entire cluster will be interesting - you'll have to decompress the whole cluster into memory, then overwrite the zeroed portions. The savings in reading occur only when your read is limited to just the subclusters that are zeroed.

But then again, even on a regular cluster, read has to pay attention to which subclusters are zeroed, so you already have the workhorse in read for detecting whether a normal read is sufficient or if you have to follow up with piecing together zeroed sections.
On Fri 24 Apr 2020 07:44:33 PM CEST, Eric Blake <eblake@redhat.com> wrote:
>>> at the same time, I can see where you're coming from in stating that if it makes management of extended L2 easier to allow zero subclusters on top of a compressed cluster, then there's no reason to forbid it.
>>
>> I'm not sure if it makes it easier. Some operations are definitely going to be easier, but maybe we have to add and handle _ZERO_COMPRESSED in addition to _ZERO_PLAIN and _ZERO_ALLOC (and the same for unallocated subclusters). Or maybe replace QCow2SubclusterType with something else. I need to evaluate that.
>
> Reading the entire cluster will be interesting - you'll have to decompress the whole cluster into memory, then overwrite the zeroed portions.

I don't think so, qcow2_get_host_offset() would detect the number of contiguous subclusters of the same type at the given offset. In this case they would be _ZERO subclusters so there's no need to decompress anything, or even read it (it works the same with uncompressed clusters).

Berto
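A simplified sketch of the read path being described here, modeled on qcow2_co_preadv_part() from the series (error handling, encryption and the AioTask pool omitted; treat it as an illustration, not the actual code):

    static coroutine_fn int preadv_sketch(BlockDriverState *bs, uint64_t offset,
                                          uint64_t bytes, QEMUIOVector *qiov,
                                          size_t qiov_offset)
    {
        while (bytes != 0) {
            unsigned int cur_bytes = MIN(bytes, INT_MAX);
            uint64_t host_offset;
            QCow2SubclusterType type;

            /* Returns how many contiguous bytes at 'offset' are backed by
             * subclusters of the *same* type, so a run of _ZERO subclusters
             * is reported as one chunk: */
            int ret = qcow2_get_host_offset(bs, offset, &cur_bytes,
                                            &host_offset, &type);
            if (ret < 0) {
                return ret;
            }

            switch (type) {
            case QCOW2_SUBCLUSTER_ZERO_PLAIN:
            case QCOW2_SUBCLUSTER_ZERO_ALLOC:
                /* No disk read and no decompression, just produce zeroes */
                qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
                break;
            case QCOW2_SUBCLUSTER_COMPRESSED:
                /* decompress the cluster and copy out the requested part */
                break;
            default:
                /* normal/unallocated cases: read from bs->file or backing */
                break;
            }

            offset += cur_bytes;
            bytes -= cur_bytes;
            qiov_offset += cur_bytes;
        }
        return 0;
    }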
24.04.2020 20:44, Eric Blake wrote:
> On 4/24/20 12:21 PM, Alberto Garcia wrote:
>> On Fri 24 Apr 2020 07:11:08 PM CEST, Eric Blake <eblake@redhat.com> wrote:
>>>> 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any copy on write. The compressed data would remain untouched on disk, but some of the subclusters would have the 'all zeroes' bit set, exactly like what happens with normal clusters.
>>>
>>> It's a special case that avoids COW for write zeroes, but not for anything else. The moment you write any data (whether to the zero-above-compressed or the regular compressed portion), the entire cluster has to be rewritten.
>>
>> That's right, but you can still write zeroes without having to rewrite anything, and read back the zeroes without having to decompress the data.
>>
>>> at the same time, I can see where you're coming from in stating that if it makes management of extended L2 easier to allow zero subclusters on top of a compressed cluster, then there's no reason to forbid it.
>>
>> I'm not sure if it makes it easier. Some operations are definitely going to be easier, but maybe we have to add and handle _ZERO_COMPRESSED in addition to _ZERO_PLAIN and _ZERO_ALLOC (and the same for unallocated subclusters). Or maybe replace QCow2SubclusterType with something else. I need to evaluate that.
>
> Reading the entire cluster will be interesting - you'll have to decompress the whole cluster into memory, then overwrite the zeroed portions. The savings in reading occur only when your read is limited to just the subclusters that are zeroed.
>
> But then again, even on a regular cluster, read has to pay attention to which subclusters are zeroed, so you already have the workhorse in read for detecting whether a normal read is sufficient or if you have to follow up with piecing together zeroed sections.

AFAIK compressed clusters currently can't be used in scenarios with a guest, as the qcow2 driver doesn't support rewriting them. Or am I wrong? And we normally don't combine normal and compressed clusters together in one image.

So, currently, the usual use case for compressed clusters is a fully compressed image, written once. This means that, with the current specification, subclusters add nothing to this case, and there is no reason to create a compressed image with subclusters. And even if we allow zero/unallocated subclusters, it seems they add nothing to this use case either.

So, I don't see real benefits for now, but no problems with it either, and I agree that it's mostly about which way is simpler.
24.04.2020 20:56, Alberto Garcia wrote:
> On Fri 24 Apr 2020 07:44:33 PM CEST, Eric Blake <eblake@redhat.com> wrote:
>>>> at the same time, I can see where you're coming from in stating that if it makes management of extended L2 easier to allow zero subclusters on top of a compressed cluster, then there's no reason to forbid it.
>>>
>>> I'm not sure if it makes it easier. Some operations are definitely going to be easier, but maybe we have to add and handle _ZERO_COMPRESSED in addition to _ZERO_PLAIN and _ZERO_ALLOC (and the same for unallocated subclusters). Or maybe replace QCow2SubclusterType with something else. I need to evaluate that.

Reviewing your series, it already crossed my mind that we may be doing too much with the conversion from L2 entry flags to a "type". Is it worth it? All these ZERO_PLAIN and UNALLOCATED_ALLOC values, and the "case <TYPE>:" lines combined three or four at a time into one case: do they help, or are they extra work? We just have to maintain two views of one model. But I don't suggest refactoring it in this series :)

>>
>> Reading the entire cluster will be interesting - you'll have to decompress the whole cluster into memory, then overwrite the zeroed portions.
>
> I don't think so, qcow2_get_host_offset() would detect the number of contiguous subclusters of the same type at the given offset. In this case they would be _ZERO subclusters so there's no need to decompress anything, or even read it (it works the same with uncompressed clusters).

But if at least one of the subclusters to read is not _ZERO, you'll have to decompress the whole cluster, and after decompression overwrite the zero subclusters with zeroes, as Eric says. Or have I lost the thread? :)
On Fri 24 Apr 2020 08:25:45 PM CEST, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>>> Reading the entire cluster will be interesting - you'll have to decompress the whole cluster into memory, then overwrite the zeroed portions.
>>
>> I don't think so, qcow2_get_host_offset() would detect the number of contiguous subclusters of the same type at the given offset. In this case they would be _ZERO subclusters so there's no need to decompress anything, or even read it (it works the same with uncompressed clusters).
>
> But if at least one of the subclusters to read is not _ZERO, you'll have to decompress the whole cluster, and after decompression overwrite the zero subclusters with zeroes, as Eric says. Or have I lost the thread? :)

I don't see why you would need to rewrite anything... you do have to decompress the whole cluster, and the uncompressed cluster in memory would have stale data, but you never need to use that data for anything, let alone to return it to the guest.

Even if there's a COW, the new cluster would inherit the compressed cluster's bitmap, so the zeroized subclusters still read as zeroes.

It's the same with normal clusters, 'write -P 0xff 0 64k' followed by 'write -z 16k 16k'. The host cluster on disk still contains 0xff, but the L2 entry indicates that part of it is just zeroes.

Berto
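To put concrete (purely illustrative) bitmap values on that last example, assuming a 64k cluster with 32 subclusters of 2k each, and assuming the zeroize path keeps the allocation bits (the _ZERO_ALLOC case):

    /*
     * 'write -P 0xff 0 64k' - host cluster X holds 64k of 0xff:
     *     allocated bits: 0xffffffff   "reads as zeroes" bits: 0x00000000
     *
     * 'write -z 16k 16k' - no COW, data on disk is still all 0xff, but
     * subclusters 8-15 (16k-32k) are now marked as reading as zeroes:
     *     allocated bits: 0xffffffff   "reads as zeroes" bits: 0x0000ff00
     *
     * The proposal is that a compressed cluster could get the same
     * treatment: the zero bits change what the guest sees without
     * touching the compressed data on disk.
     */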
On Fri 24 Apr 2020 08:15:04 PM CEST, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
> AFAIK compressed clusters currently can't be used in scenarios with a guest, as the qcow2 driver doesn't support rewriting them.

You can write to those images just fine, it's just not efficient because you have to COW the compressed clusters.

> Or am I wrong? And we normally don't combine normal and compressed clusters together in one image.

As soon as you start writing to an image with compressed clusters you'll have a combination of both.

But it's true that you don't use an image with compressed clusters if what you're looking for is performance. So I wouldn't add support for this if it complicates things too much.

Berto
On 4/24/20 1:37 PM, Alberto Garcia wrote:
> On Fri 24 Apr 2020 08:25:45 PM CEST, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>>>> Reading the entire cluster will be interesting - you'll have to decompress the whole cluster into memory, then overwrite the zeroed portions.
>>>
>>> I don't think so, qcow2_get_host_offset() would detect the number of contiguous subclusters of the same type at the given offset. In this case they would be _ZERO subclusters so there's no need to decompress anything, or even read it (it works the same with uncompressed clusters).
>>
>> But if at least one of the subclusters to read is not _ZERO, you'll have to decompress the whole cluster, and after decompression overwrite the zero subclusters with zeroes, as Eric says. Or have I lost the thread? :)
>
> I don't see why you would need to rewrite anything... you do have to decompress the whole cluster, and the uncompressed cluster in memory would have stale data, but you never need to use that data for anything, let alone to return it to the guest.
>
> Even if there's a COW, the new cluster would inherit the compressed cluster's bitmap, so the zeroized subclusters still read as zeroes.
>
> It's the same with normal clusters, 'write -P 0xff 0 64k' followed by 'write -z 16k 16k'. The host cluster on disk still contains 0xff, but the L2 entry indicates that part of it is just zeroes.

The point is this: Consider 'write -P 0xff 0 64k', then 'write -z 16k 16k', then 'read 0 64k'. For normal clusters, we can just do a scatter-gather iov read of 0-16k and 32k-64k, plus a memset of 16k-32k. But for compressed clusters, we have to read and decompress the entire 64k, AND also memset 16k-32k. But if zeroing after reading is not that expensive, then the same technique for normal clusters is fine (instead of a scatter-gather read of 48k, just read the whole 64k cluster before doing the memset).

So the question at hand is not what happens in writing, but in reading, and whether we are penalizing reads from a compressed cluster, or even from regular clusters, when the subclusters within a cluster have different statuses.
24.04.2020 21:41, Alberto Garcia wrote:
> On Fri 24 Apr 2020 08:15:04 PM CEST, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>> AFAIK compressed clusters currently can't be used in scenarios with a guest, as the qcow2 driver doesn't support rewriting them.
>
> You can write to those images just fine, it's just not efficient because you have to COW the compressed clusters.

No, rewriting doesn't work:

[root@kvm master]# ./qemu-img create -f qcow2 x 10M
Formatting 'x', fmt=qcow2 size=10485760 cluster_size=65536 lazy_refcounts=off refcount_bits=16
[root@kvm master]# ./qemu-io -c 'write -c 0 64K' x
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.23 sec (278.708 KiB/sec and 4.3548 ops/sec)
[root@kvm master]# ./qemu-io -c 'write -c 0 64K' x
write failed: Input/output error

>> Or am I wrong? And we normally don't combine normal and compressed clusters together in one image.
>
> As soon as you start writing to an image with compressed clusters you'll have a combination of both.

Ah, you mean rewriting compressed clusters with normal ones. So the use case is just to take a compressed backup image and use it for the guest, instead of converting it first. That makes sense.

> But it's true that you don't use an image with compressed clusters if what you're looking for is performance. So I wouldn't add support for this if it complicates things too much.
>
> Berto
On 24.04.20 20:47, Eric Blake wrote:
> On 4/24/20 1:37 PM, Alberto Garcia wrote:
>> On Fri 24 Apr 2020 08:25:45 PM CEST, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>>>>> Reading the entire cluster will be interesting - you'll have to decompress the whole cluster into memory, then overwrite the zeroed portions.
>>>>
>>>> I don't think so, qcow2_get_host_offset() would detect the number of contiguous subclusters of the same type at the given offset. In this case they would be _ZERO subclusters so there's no need to decompress anything, or even read it (it works the same with uncompressed clusters).
>>>
>>> But if at least one of the subclusters to read is not _ZERO, you'll have to decompress the whole cluster, and after decompression overwrite the zero subclusters with zeroes, as Eric says. Or have I lost the thread? :)
>>
>> I don't see why you would need to rewrite anything... you do have to decompress the whole cluster, and the uncompressed cluster in memory would have stale data, but you never need to use that data for anything, let alone to return it to the guest.
>>
>> Even if there's a COW, the new cluster would inherit the compressed cluster's bitmap, so the zeroized subclusters still read as zeroes.
>>
>> It's the same with normal clusters, 'write -P 0xff 0 64k' followed by 'write -z 16k 16k'. The host cluster on disk still contains 0xff, but the L2 entry indicates that part of it is just zeroes.
>
> The point is this: Consider 'write -P 0xff 0 64k', then 'write -z 16k 16k', then 'read 0 64k'. For normal clusters, we can just do a scatter-gather iov read of 0-16k and 32k-64k, plus a memset of 16k-32k. But for compressed clusters, we have to read and decompress the entire 64k, AND also memset 16k-32k. But if zeroing after reading is not that expensive, then the same technique for normal clusters is fine (instead of a scatter-gather read of 48k, just read the whole 64k cluster before doing the memset).

It would also mean letting qcow2_co_preadv_part() special-handle such cases, i.e., whenever the whole cluster is compressed, it needs to read it as a whole, regardless of the subcluster status, and then memset() to zero all the areas covered by all-zero subclusters. Otherwise we'd read and decompress the whole buffer twice (once for 0 to 16k, once for 32k to 64k).

This may be complicated a bit by the task scheme, i.e. that reads are scheduled in the task pool. For qcow2_co_preadv_part() to memset() some area after decompression, it would need to wait on the read_compressed task, which would make the whole task pool thing moot (for compressed clusters). Or it just does the memset() at the end, when we have to settle the task pool anyway, but then it would have to remember all the areas it still needs to zero.

Hm, or qcow2_co_preadv_compressed() could figure out where the zeroed subclusters are and then memset() them itself, e.g. by receiving the subcluster bitmap. Probably the simplest implementation, but it seems a bit like a layering breach.

Not sure how bad the complexity is on the write side for not letting zero writes just zero the subcluster, but it doesn't seem to me that the opposite would come for free on the read side.

Max
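A sketch of that last option; all names here are illustrative, not the actual QEMU code, and it assumes the "reads as zeroes" flags live in the high 32 bits of the L2 bitmap as in the series' spec. After qcow2_co_preadv_compressed() has decompressed the whole cluster into a buffer, a helper like this could clear the zeroed ranges before copying data out:

    #include <stdint.h>
    #include <string.h>

    /* Illustrative helper: clear the parts of a decompressed cluster
     * that are covered by all-zero subclusters, per the L2 bitmap. */
    static void zero_subclusters_in_buf(uint8_t *buf, size_t subcluster_size,
                                        unsigned subclusters_per_cluster,
                                        uint64_t l2_bitmap)
    {
        for (unsigned sc = 0; sc < subclusters_per_cluster; sc++) {
            /* "reads as zeroes" flag for subcluster sc */
            if (l2_bitmap & (1ULL << (32 + sc))) {
                memset(buf + (size_t)sc * subcluster_size, 0, subcluster_size);
            }
        }
    }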
On Mon 27 Apr 2020 09:49:00 AM CEST, Max Reitz wrote:
>> The point is this: Consider 'write -P 0xff 0 64k', then 'write -z 16k 16k', then 'read 0 64k'. For normal clusters, we can just do a scatter-gather iov read of 0-16k and 32k-64k, plus a memset of 16k-32k. But for compressed clusters, we have to read and decompress the entire 64k, AND also memset 16k-32k. But if zeroing after reading is not that expensive, then the same technique for normal clusters is fine (instead of a scatter-gather read of 48k, just read the whole 64k cluster before doing the memset).
>
> It would also mean letting qcow2_co_preadv_part() special-handle such cases, i.e., whenever the whole cluster is compressed, it needs to read it as a whole, regardless of the subcluster status, and then memset() to zero all the areas covered by all-zero subclusters. Otherwise we'd read and decompress the whole buffer twice (once for 0 to 16k, once for 32k to 64k).

This is actually a good reason against adding subcluster allocation to compressed clusters. I wouldn't like to complicate the code for this use case, so we either don't support it at all, or we support it with the problem that you mention (decompressing the whole buffer more than once if the cluster contains holes).

> Not sure how bad the complexity is on the write side for not letting zero writes just zero the subcluster

It is not bad, I just have to check the cluster type and return -ENOTSUP.

Berto
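That check could be as simple as the following sketch; QCOW_OFLAG_COMPRESSED is the real qcow2 L2 flag, but the helper itself is illustrative. Returning -ENOTSUP from the write-zeroes path makes the generic block layer fall back to an ordinary write, which does COW of the whole cluster:

    #include <errno.h>
    #include <stdint.h>

    #define QCOW_OFLAG_COMPRESSED (1ULL << 62)  /* as in block/qcow2.h */

    /* Illustrative: refuse subcluster-level write-zeroes on a compressed
     * cluster so the caller falls back to a regular (COW) write. */
    static int check_subcluster_zeroize(uint64_t l2_entry)
    {
        if (l2_entry & QCOW_OFLAG_COMPRESSED) {
            return -ENOTSUP;
        }
        return 0;
    }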
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index dfd8b66958..1f471db98c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -806,6 +806,9 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
     BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
     qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
     set_l2_entry(s, l2_slice, l2_index, cluster_offset);
+    if (has_subclusters(s)) {
+        set_l2_bitmap(s, l2_slice, l2_index, 0);
+    }
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);

     *host_offset = cluster_offset & s->cluster_offset_mask;