Message ID | 1463229957-14253-3-git-send-email-den@openvz.org (mailing list archive)
---|---
State | New, archived
On Sat, May 14, 2016 at 03:45:50PM +0300, Denis V. Lunev wrote:
> +    qemu_co_mutex_lock(&s->lock);
> +    cluster_offset = \
> +        qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9, out_len);

The backslash isn't necessary for wrapping lines in C. This kind of
thing is only necessary in languages like Python where the grammar is
whitespace sensitive.

The C compiler is happy with an arbitrary amount of whitespace
(newlines) in the middle of a statement. The backslash in C is handled
by the preprocessor: it joins the line. That's useful for macro
definitions where you need to tell the preprocessor that several lines
belong to one macro definition. But it's not needed for normal C code.

> +    if (!cluster_offset) {
> +        qemu_co_mutex_unlock(&s->lock);
> +        ret = -EIO;
> +        goto fail;
> +    }
> +    cluster_offset &= s->cluster_offset_mask;
>
> -        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
> -        ret = bdrv_pwrite(bs->file->bs, cluster_offset, out_buf, out_len);
> -        if (ret < 0) {
> -            goto fail;
> -        }
> +    ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
> +    qemu_co_mutex_unlock(&s->lock);
> +    if (ret < 0) {
> +        goto fail;
>      }
>
> +    iov = (struct iovec) {
> +        .iov_base = out_buf,
> +        .iov_len = out_len,
> +    };
> +    qemu_iovec_init_external(&hd_qiov, &iov, 1);
> +
> +    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
> +    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd_qiov, 0);

There is a race condition here:

If the newly allocated cluster is only partially filled by compressed
data then qcow2_alloc_compressed_cluster_offset() remembers that more
bytes are still available in the cluster. The next
qcow2_alloc_compressed_cluster_offset() caller will continue filling the
same cluster.

Imagine two compressed writes running at the same time. Write A
allocates just a few bytes so write B shares a sector with the first
write:

Sector 1
|AAABBBBBBBBB|

The race condition is that bdrv_co_pwritev() uses read-modify-write (a
bounce buffer). If both requests call bdrv_co_pwritev() around the same
time then the following could happen:

Sector 1
|000BBBBBBBBB|

or:

Sector 1
|AAA000000000|

It's necessary to hold s->lock around the compressed data write to avoid
this race condition.
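To make the backslash point concrete, here is a minimal sketch; the
CHECK_RET macro is an invented example, not code from the patch:

    /* A plain C statement may wrap across lines freely; the compiler
     * treats the newline as ordinary whitespace, so no backslash is
     * needed. */
    cluster_offset =
        qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9, out_len);

    /* A macro definition is consumed by the preprocessor one logical
     * line at a time, so here the trailing backslashes are required to
     * join the lines into one definition. */
    #define CHECK_RET(ret)      \
        do {                    \
            if ((ret) < 0) {    \
                goto fail;      \
            }                   \
        } while (0)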
On 27.05.2016 20:33, Stefan Hajnoczi wrote:
> On Sat, May 14, 2016 at 03:45:50PM +0300, Denis V. Lunev wrote:
>> +    qemu_co_mutex_lock(&s->lock);
>> +    cluster_offset = \
>> +        qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9, out_len);
>
> The backslash isn't necessary for wrapping lines in C. This kind of
> thing is only necessary in languages like Python where the grammar is
> whitespace sensitive.
>
> The C compiler is happy with an arbitrary amount of whitespace
> (newlines) in the middle of a statement. The backslash in C is handled
> by the preprocessor: it joins the line. That's useful for macro
> definitions where you need to tell the preprocessor that several lines
> belong to one macro definition. But it's not needed for normal C code.
>
Thanks for the explanation, but the backslash is used more as a visual
marker of the line break for the reader. The current coding style misses
this point, but I can remove the backslash, because I don't think it's
something important :)

>> +    if (!cluster_offset) {
>> +        qemu_co_mutex_unlock(&s->lock);
>> +        ret = -EIO;
>> +        goto fail;
>> +    }
>> +    cluster_offset &= s->cluster_offset_mask;
>>
>> -        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
>> -        ret = bdrv_pwrite(bs->file->bs, cluster_offset, out_buf, out_len);
>> -        if (ret < 0) {
>> -            goto fail;
>> -        }
>> +    ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    if (ret < 0) {
>> +        goto fail;
>>      }
>>
>> +    iov = (struct iovec) {
>> +        .iov_base = out_buf,
>> +        .iov_len = out_len,
>> +    };
>> +    qemu_iovec_init_external(&hd_qiov, &iov, 1);
>> +
>> +    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
>> +    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd_qiov, 0);
>
> There is a race condition here:
>
> If the newly allocated cluster is only partially filled by compressed
> data then qcow2_alloc_compressed_cluster_offset() remembers that more
> bytes are still available in the cluster. The next
> qcow2_alloc_compressed_cluster_offset() caller will continue filling the
> same cluster.
>
> Imagine two compressed writes running at the same time. Write A
> allocates just a few bytes so write B shares a sector with the first
> write:
>
> Sector 1
> |AAABBBBBBBBB|
>
> The race condition is that bdrv_co_pwritev() uses read-modify-write (a
> bounce buffer). If both requests call bdrv_co_pwritev() around the same
> time then the following could happen:
>
> Sector 1
> |000BBBBBBBBB|
>
> or:
>
> Sector 1
> |AAA000000000|
>
> It's necessary to hold s->lock around the compressed data write to avoid
> this race condition.
>
I agree, there really is a race. Thank you, this is a very good point!
On 30.05.2016 12:12, Pavel Butsykin wrote:
> On 27.05.2016 20:33, Stefan Hajnoczi wrote:
>> On Sat, May 14, 2016 at 03:45:50PM +0300, Denis V. Lunev wrote:
[...]
>> There is a race condition here:
>>
>> If the newly allocated cluster is only partially filled by compressed
>> data then qcow2_alloc_compressed_cluster_offset() remembers that more
>> bytes are still available in the cluster. The next
>> qcow2_alloc_compressed_cluster_offset() caller will continue filling the
>> same cluster.
>>
>> Imagine two compressed writes running at the same time. Write A
>> allocates just a few bytes so write B shares a sector with the first
>> write:

Sorry, but it seems this will never happen, because the second write
will not pass this check:

uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
                                               uint64_t offset,
                                               int compressed_size)
{
    ...
    /* Compression can't overwrite anything. Fail if the cluster was already
     * allocated. */
    cluster_offset = be64_to_cpu(l2_table[l2_index]);
    if (cluster_offset & L2E_OFFSET_MASK) {
        qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
        return 0;
    }
    ...

As you can see, we can't do a compressed write into an already allocated
cluster.

>>
>> Sector 1
>> |AAABBBBBBBBB|
>>
>> The race condition is that bdrv_co_pwritev() uses read-modify-write (a
>> bounce buffer). If both requests call bdrv_co_pwritev() around the same
>> time then the following could happen:
>>
>> Sector 1
>> |000BBBBBBBBB|
>>
>> or:
>>
>> Sector 1
>> |AAA000000000|
>>
>> It's necessary to hold s->lock around the compressed data write to avoid
>> this race condition.
>>
> I agree, there really is a race. Thank you, this is a very good point!
On 05/30/2016 06:58 AM, Pavel Butsykin wrote:
>
> Sorry, but it seems this will never happen, because the second write
> will not pass this check:
>
> uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>                                                uint64_t offset,
>                                                int compressed_size)
> {
>     ...
>     /* Compression can't overwrite anything. Fail if the cluster was already
>      * allocated. */
>     cluster_offset = be64_to_cpu(l2_table[l2_index]);
>     if (cluster_offset & L2E_OFFSET_MASK) {
>         qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
>         return 0;
>     }
>     ...
>
> As you can see, we can't do a compressed write into an already allocated
> cluster.

Umm, doesn't that defeat the point of compression, if every compressed
cluster becomes the head of a new cluster? The whole goal of
compression is to be able to fit multiple clusters within one.
On 05/31/2016 09:42 PM, Eric Blake wrote:
> On 05/30/2016 06:58 AM, Pavel Butsykin wrote:
>
>> Sorry, but it seems this will never happen, because the second write
>> will not pass this check:
>>
[...]
>>
>> As you can see, we can't do a compressed write into an already allocated
>> cluster.
> Umm, doesn't that defeat the point of compression, if every compressed
> cluster becomes the head of a new cluster? The whole goal of
> compression is to be able to fit multiple clusters within one.
>
AFAIK the file will be sparse in those unused areas.
On 05/31/2016 03:00 PM, Denis V. Lunev wrote:
> On 05/31/2016 09:42 PM, Eric Blake wrote:
>> On 05/30/2016 06:58 AM, Pavel Butsykin wrote:
>>
>>> Sorry, but it seems this will never happen, because the second write
>>> will not pass this check:
>>>
[...]
>> Umm, doesn't that defeat the point of compression, if every compressed
>> cluster becomes the head of a new cluster? The whole goal of
>> compression is to be able to fit multiple clusters within one.
>>
> AFAIK the file will be sparse in those unused areas.

IIRC, on the NTFS file system, the minimum hole size is 64k. If you also
have 64k clusters, then you don't have a sparse file - every tail of
zero sectors will be explicit in the filesystem, if you are using 1:1
clusters for compression. Other file systems may have finer granularity
for holes, but it's still rather awkward to be relying on sparseness
when a better solution is to pack compressed sectors consecutively.
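A rough worked example of Eric's point; the 20 KiB compressed size and
the 64 KiB hole granularity are assumed numbers for illustration, and
hole alignment is ignored:

    #include <stdio.h>

    int main(void)
    {
        const unsigned cluster_size = 64 * 1024; /* qcow2 cluster size */
        const unsigned compressed   = 20 * 1024; /* assumed compressed size */
        const unsigned hole_gran    = 64 * 1024; /* assumed NTFS hole unit */

        /* The zero tail of each cluster can only be deallocated in
         * units of the filesystem's hole granularity. */
        unsigned tail        = cluster_size - compressed;
        unsigned reclaimable = (tail / hole_gran) * hole_gran;

        /* Prints: zero tail: 44 KiB, reclaimable as a hole: 0 KiB */
        printf("zero tail: %u KiB, reclaimable as a hole: %u KiB\n",
               tail / 1024, reclaimable / 1024);
        return 0;
    }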
On 27.05.2016 at 19:33, Stefan Hajnoczi wrote:
> On Sat, May 14, 2016 at 03:45:50PM +0300, Denis V. Lunev wrote:
[...]
> There is a race condition here:
>
> If the newly allocated cluster is only partially filled by compressed
> data then qcow2_alloc_compressed_cluster_offset() remembers that more
> bytes are still available in the cluster. The next
> qcow2_alloc_compressed_cluster_offset() caller will continue filling the
> same cluster.
>
> Imagine two compressed writes running at the same time. Write A
> allocates just a few bytes so write B shares a sector with the first
> write:
>
> Sector 1
> |AAABBBBBBBBB|
>
> The race condition is that bdrv_co_pwritev() uses read-modify-write (a
> bounce buffer). If both requests call bdrv_co_pwritev() around the same
> time then the following could happen:
>
> Sector 1
> |000BBBBBBBBB|
>
> or:
>
> Sector 1
> |AAA000000000|
>
> It's necessary to hold s->lock around the compressed data write to avoid
> this race condition.

I don't think this race condition exists. bdrv_co_pwritev() can indeed
perform RMW if the lower layer requires alignment, but that's not
something callers need to care about (which would be an awful API) but
is fully handled there by marking requests serialising if they involve
RMW.

Kevin
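The following toy sketch shows the idea behind serialising RMW requests.
It is not QEMU's actual implementation (which tracks overlapping
in-flight requests per region rather than taking one global lock); it
only illustrates the shape of the fix:

    #include <pthread.h>
    #include <string.h>

    #define SECTOR_SIZE 512

    static unsigned char sector[SECTOR_SIZE];  /* one backing sector */
    static pthread_mutex_t rmw_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Sub-sector write implemented as read-modify-write through a bounce
     * buffer.  Without the lock, two overlapping writers could each read
     * the old sector contents, patch in their own bytes, and write back;
     * the second write-back would then undo the first writer's bytes --
     * exactly the |AAA000...| / |000BBB...| outcome described above.
     * Serialising the whole read-modify-write sequence removes the race. */
    static void pwrite_subsector(size_t off, const void *buf, size_t len)
    {
        unsigned char bounce[SECTOR_SIZE];

        pthread_mutex_lock(&rmw_lock);
        memcpy(bounce, sector, SECTOR_SIZE);  /* read the full sector  */
        memcpy(bounce + off, buf, len);       /* modify our bytes only */
        memcpy(sector, bounce, SECTOR_SIZE);  /* write the sector back */
        pthread_mutex_unlock(&rmw_lock);
    }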
On 31.05.2016 at 20:42, Eric Blake wrote:
> On 05/30/2016 06:58 AM, Pavel Butsykin wrote:
>>
>> Sorry, but it seems this will never happen, because the second write
>> will not pass this check:
>>
[...]
>>
>> As you can see, we can't do a compressed write into an already allocated
>> cluster.
>
> Umm, doesn't that defeat the point of compression, if every compressed
> cluster becomes the head of a new cluster? The whole goal of
> compression is to be able to fit multiple clusters within one.

With compression, be careful what kind of clusters you're looking at.

The clusters in the code above are guest clusters, and you can't
overwrite a guest cluster that has already been written. (I'm not sure
why, though - it didn't matter so far because the only writer is
qemu-img convert, which writes the whole image sequentially and doesn't
split clusters across requests, but if we want to allow compressed
writes in a running VM, we may want to change this.)

However, the compressed data of multiple guest clusters may end up in
the same host cluster, and that's indeed the whole point of compression.

Kevin
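A small sketch of the packing Kevin describes, with assumed compressed
sizes; real qcow2 also encodes the sector count into the L2 entry and
handles compressed data crossing a host-cluster boundary, which this
ignores:

    #include <stdio.h>

    int main(void)
    {
        const unsigned host_cluster = 64 * 1024;  /* host cluster size */
        const unsigned sizes[3] = { 7000, 12000, 3000 }; /* assumed sizes */
        unsigned off = 0;                         /* next free byte */

        /* Compressed guest clusters are appended back to back, so
         * several of them land inside the same 64 KiB host cluster. */
        for (int i = 0; i < 3; i++) {
            unsigned first = off / 512;
            unsigned last  = (off + sizes[i] - 1) / 512;
            printf("guest cluster %d: bytes %u..%u, sectors %u..%u, "
                   "host cluster %u\n",
                   i, off, off + sizes[i] - 1, first, last,
                   off / host_cluster);
            off += sizes[i];
        }
        return 0;  /* all three land in host cluster 0 */
    }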
On 01.06.2016 00:13, Eric Blake wrote:
> On 05/31/2016 03:00 PM, Denis V. Lunev wrote:
>> On 05/31/2016 09:42 PM, Eric Blake wrote:
>>> On 05/30/2016 06:58 AM, Pavel Butsykin wrote:
>>>
[...]
>> AFAIK the file will be sparse in those unused areas.
>
> IIRC, on the NTFS file system, the minimum hole size is 64k. If you also
> have 64k clusters, then you don't have a sparse file - every tail of
> zero sectors will be explicit in the filesystem, if you are using 1:1
> clusters for compression. Other file systems may have finer granularity
> for holes, but it's still rather awkward to be relying on sparseness
> when a better solution is to pack compressed sectors consecutively.
>
Maybe you're right. I agree that the issues mentioned above are real and
it would be good to solve them, but that looks like the next step. The
solution you propose amounts to an improvement of the format driver that
goes beyond the goals of this patch set, whose main goal is to make the
compression support of the format drivers available to the backup
function.
On Wed, Jun 01, 2016 at 11:25:57AM +0200, Kevin Wolf wrote:
> On 27.05.2016 at 19:33, Stefan Hajnoczi wrote:
>> On Sat, May 14, 2016 at 03:45:50PM +0300, Denis V. Lunev wrote:
[...]
> I don't think this race condition exists. bdrv_co_pwritev() can indeed
> perform RMW if the lower layer requires alignment, but that's not
> something callers need to care about (which would be an awful API) but
> is fully handled there by marking requests serialising if they involve
> RMW.

Good point.

Stefan
diff --git a/block/qcow2.c b/block/qcow2.c
index 62febfc..d948d44 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2533,13 +2533,16 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
 
 /* XXX: put compressed sectors first, then all the cluster
    aligned tables to avoid losing bytes in alignment */
-static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
-                                  const uint8_t *buf, int nb_sectors)
+static coroutine_fn int
+qcow2_co_write_compressed(BlockDriverState *bs, int64_t sector_num,
+                          int nb_sectors, QEMUIOVector *qiov)
 {
     BDRVQcow2State *s = bs->opaque;
+    QEMUIOVector hd_qiov;
+    struct iovec iov;
     z_stream strm;
     int ret, out_len;
-    uint8_t *out_buf;
+    uint8_t *buf, *out_buf;
     uint64_t cluster_offset;
 
     if (nb_sectors == 0) {
@@ -2549,29 +2552,25 @@ static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
         return bdrv_truncate(bs->file->bs, cluster_offset);
     }
 
+    buf = qemu_blockalign(bs, s->cluster_size);
     if (nb_sectors != s->cluster_sectors) {
-        ret = -EINVAL;
-
-        /* Zero-pad last write if image size is not cluster aligned */
-        if (sector_num + nb_sectors == bs->total_sectors &&
-            nb_sectors < s->cluster_sectors) {
-            uint8_t *pad_buf = qemu_blockalign(bs, s->cluster_size);
-            memset(pad_buf, 0, s->cluster_size);
-            memcpy(pad_buf, buf, nb_sectors * BDRV_SECTOR_SIZE);
-            ret = qcow2_write_compressed(bs, sector_num,
-                                         pad_buf, s->cluster_sectors);
-            qemu_vfree(pad_buf);
+        if (nb_sectors > s->cluster_sectors ||
+            sector_num + nb_sectors != bs->total_sectors)
+        {
+            qemu_vfree(buf);
+            return -EINVAL;
         }
-        return ret;
+        /* Zero-pad last write if image size is not cluster aligned */
+        memset(buf, 0, s->cluster_size);
     }
+    qemu_iovec_to_buf(qiov, 0, buf, qiov->size);
 
     out_buf = g_malloc(s->cluster_size + (s->cluster_size / 1000) + 128);
 
     /* best compression, small window, no zlib header */
     memset(&strm, 0, sizeof(strm));
-    ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION,
-                       Z_DEFLATED, -12,
-                       9, Z_DEFAULT_STRATEGY);
+    ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
+                       -12, 9, Z_DEFAULT_STRATEGY);
     if (ret != 0) {
         ret = -EINVAL;
         goto fail;
@@ -2593,34 +2592,50 @@ static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
     deflateEnd(&strm);
 
     if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
+        iov = (struct iovec) {
+            .iov_base = buf,
+            .iov_len = out_len,
+        };
+        qemu_iovec_init_external(&hd_qiov, &iov, 1);
         /* could not compress: write normal cluster */
-        ret = bdrv_write(bs, sector_num, buf, s->cluster_sectors);
+        ret = qcow2_co_writev(bs, sector_num, s->cluster_sectors, &hd_qiov);
         if (ret < 0) {
             goto fail;
         }
-    } else {
-        cluster_offset = qcow2_alloc_compressed_cluster_offset(bs,
-            sector_num << 9, out_len);
-        if (!cluster_offset) {
-            ret = -EIO;
-            goto fail;
-        }
-        cluster_offset &= s->cluster_offset_mask;
+        goto success;
+    }
 
-        ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
-        if (ret < 0) {
-            goto fail;
-        }
+    qemu_co_mutex_lock(&s->lock);
+    cluster_offset = \
+        qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9, out_len);
+    if (!cluster_offset) {
+        qemu_co_mutex_unlock(&s->lock);
+        ret = -EIO;
+        goto fail;
+    }
+    cluster_offset &= s->cluster_offset_mask;
 
-        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
-        ret = bdrv_pwrite(bs->file->bs, cluster_offset, out_buf, out_len);
-        if (ret < 0) {
-            goto fail;
-        }
+    ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
+    qemu_co_mutex_unlock(&s->lock);
+    if (ret < 0) {
+        goto fail;
     }
 
+    iov = (struct iovec) {
+        .iov_base = out_buf,
+        .iov_len = out_len,
+    };
+    qemu_iovec_init_external(&hd_qiov, &iov, 1);
+
+    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
+    ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, &hd_qiov, 0);
+    if (ret < 0) {
+        goto fail;
+    }
+
+success:
     ret = 0;
 fail:
+    qemu_vfree(buf);
     g_free(out_buf);
     return ret;
 }
@@ -3382,7 +3397,7 @@ BlockDriver bdrv_qcow2 = {
     .bdrv_co_write_zeroes   = qcow2_co_write_zeroes,
     .bdrv_co_discard        = qcow2_co_discard,
     .bdrv_truncate          = qcow2_truncate,
-    .bdrv_write_compressed  = qcow2_write_compressed,
+    .bdrv_co_write_compressed = qcow2_co_write_compressed,
     .bdrv_make_empty        = qcow2_make_empty,
 
     .bdrv_snapshot_create   = qcow2_snapshot_create,
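For reference, the deflateInit2() call in the hunk above selects raw
deflate output (negative windowBits, so no zlib header), a 4 KiB window
(-12) and the maximum memory level (9). Below is a self-contained sketch
of the same compression step; compress_cluster is a hypothetical helper,
not part of the patch:

    #include <stdint.h>
    #include <string.h>
    #include <zlib.h>

    /* Compress in_len bytes with the parameters used by the patch.
     * Returns the compressed size, or -1 if the data did not fit in
     * out_len bytes or an error occurred -- the qcow2 code falls back
     * to a normal uncompressed cluster write in that case. */
    static int compress_cluster(uint8_t *out, int out_len,
                                const uint8_t *in, int in_len)
    {
        z_stream strm;
        int ret;

        memset(&strm, 0, sizeof(strm));
        ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                           -12, 9, Z_DEFAULT_STRATEGY);
        if (ret != Z_OK) {
            return -1;
        }

        strm.next_in = (Bytef *)in;
        strm.avail_in = in_len;
        strm.next_out = out;
        strm.avail_out = out_len;

        ret = deflate(&strm, Z_FINISH);
        deflateEnd(&strm);

        /* Z_STREAM_END means all input was consumed and flushed. */
        return ret == Z_STREAM_END ? out_len - strm.avail_out : -1;
    }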