Message ID | 20200422152129.167074-5-kwolf@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: Fix resize (extending) of short overlays | expand |
On 4/22/20 10:21 AM, Kevin Wolf wrote: > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't > undo any previous preallocation, but just adds the zero flag to all > relevant L2 entries. If an external data file is in use, a write_zeroes > request to the data file is made instead. > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > --- > block/qcow2.c | 30 ++++++++++++++++++++++++++++++ > 1 file changed, 30 insertions(+) > > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, > g_assert_not_reached(); > } > > + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) { > + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size); > + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size); This rounds up beyond the new size... > + > + /* Use zero clusters as much as we can */ > + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0); and then requests that the extra be zeroed. Does that always work, even when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file? If so, Reviewed-by: Eric Blake <eblake@redhat.com> otherwise, you may have to treat the tail specially, the same way you treated an unaligned head.
Am 22.04.2020 um 17:33 hat Eric Blake geschrieben: > On 4/22/20 10:21 AM, Kevin Wolf wrote: > > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling > > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't > > undo any previous preallocation, but just adds the zero flag to all > > relevant L2 entries. If an external data file is in use, a write_zeroes > > request to the data file is made instead. > > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > > --- > > block/qcow2.c | 30 ++++++++++++++++++++++++++++++ > > 1 file changed, 30 insertions(+) > > > > > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, > > g_assert_not_reached(); > > } > > + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) { > > + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size); > > + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size); > > This rounds up beyond the new size... > > > + > > + /* Use zero clusters as much as we can */ > > + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0); > > and then requests that the extra be zeroed. Does that always work, even > when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file? You mean the data_file_is_raw() path in qcow2_cluster_zeroize()? It's currently not a code path that is run because we only set BDRV_REQ_ZERO_WRITE for truncate if the image has a backing file, and data_file_is_raw() doesn't work with backing files. But hypothetically, if someone called truncate with BDRV_REQ_ZERO_WRITE for such a file, I think it would fail. > If so, > > Reviewed-by: Eric Blake <eblake@redhat.com> > > otherwise, you may have to treat the tail specially, the same way you > treated an unaligned head. Actually, do I even need to round the tail? /* Caller must pass aligned values, except at image end */ assert(QEMU_IS_ALIGNED(offset, s->cluster_size)); assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) || end_offset == bs->total_sectors << BDRV_SECTOR_BITS); So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would still set the zero flag for the partial last cluster and for the external data file, bdrv_co_pwrite_zeroes() would have the correct size. Kevin
On 4/22/20 10:58 AM, Kevin Wolf wrote: >>> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, >>> g_assert_not_reached(); >>> } >>> + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) { >>> + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size); >>> + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size); >> >> This rounds up beyond the new size... >> >>> + >>> + /* Use zero clusters as much as we can */ >>> + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0); >> >> and then requests that the extra be zeroed. Does that always work, even >> when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file? > > You mean the data_file_is_raw() path in qcow2_cluster_zeroize()? It's > currently not a code path that is run because we only set > BDRV_REQ_ZERO_WRITE for truncate if the image has a backing file, and > data_file_is_raw() doesn't work with backing files. Good point. > > But hypothetically, if someone called truncate with BDRV_REQ_ZERO_WRITE > for such a file, I think it would fail. > >> If so, >> >> Reviewed-by: Eric Blake <eblake@redhat.com> >> >> otherwise, you may have to treat the tail specially, the same way you >> treated an unaligned head. > > Actually, do I even need to round the tail? > > /* Caller must pass aligned values, except at image end */ > assert(QEMU_IS_ALIGNED(offset, s->cluster_size)); > assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) || > end_offset == bs->total_sectors << BDRV_SECTOR_BITS); > > So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would > still set the zero flag for the partial last cluster and for the > external data file, bdrv_co_pwrite_zeroes() would have the correct size. Then I'm in favor of NOT rounding the tail. That's an easy enough change and we've now justified that it does what we want, so R-b stands with that one-line tweak.
On 22.04.20 17:21, Kevin Wolf wrote: > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't > undo any previous preallocation, but just adds the zero flag to all > relevant L2 entries. If an external data file is in use, a write_zeroes > request to the data file is made instead. > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > --- > block/qcow2.c | 30 ++++++++++++++++++++++++++++++ > 1 file changed, 30 insertions(+) > > diff --git a/block/qcow2.c b/block/qcow2.c > index 9cfbdfc939..bd632405d1 100644 > --- a/block/qcow2.c > +++ b/block/qcow2.c [...] > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, > g_assert_not_reached(); > } > > + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) { > + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size); > + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size); > + > + /* Use zero clusters as much as we can */ > + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0); It’s kind of a pity that this changes the cluster mappings that were established when using falloc/full preallocation already (i.e., they become preallocated zero clusters then, so when writing to them, we need COW again). But falloc/full preallocation do not guarantee that the new data is zero, so I suppose this is the only thing we can reasonably do. I personally don’t really care about whether zero_end is aligned or not, so either way: Reviewed-by: Max Reitz <mreitz@redhat.com>
Am 22.04.2020 um 18:14 hat Eric Blake geschrieben: > On 4/22/20 10:58 AM, Kevin Wolf wrote: > > > > > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, > > > > g_assert_not_reached(); > > > > } > > > > + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) { > > > > + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size); > > > > + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size); > > > > > > This rounds up beyond the new size... > > > > > > > + > > > > + /* Use zero clusters as much as we can */ > > > > + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0); > > > > > > and then requests that the extra be zeroed. Does that always work, even > > > when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file? > > > > You mean the data_file_is_raw() path in qcow2_cluster_zeroize()? It's > > currently not a code path that is run because we only set > > BDRV_REQ_ZERO_WRITE for truncate if the image has a backing file, and > > data_file_is_raw() doesn't work with backing files. > > Good point. > > > > > But hypothetically, if someone called truncate with BDRV_REQ_ZERO_WRITE > > for such a file, I think it would fail. > > > > > If so, > > > > > > Reviewed-by: Eric Blake <eblake@redhat.com> > > > > > > otherwise, you may have to treat the tail specially, the same way you > > > treated an unaligned head. > > > > Actually, do I even need to round the tail? > > > > /* Caller must pass aligned values, except at image end */ > > assert(QEMU_IS_ALIGNED(offset, s->cluster_size)); > > assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) || > > end_offset == bs->total_sectors << BDRV_SECTOR_BITS); > > > > So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would > > still set the zero flag for the partial last cluster and for the > > external data file, bdrv_co_pwrite_zeroes() would have the correct size. > > Then I'm in favor of NOT rounding the tail. That's an easy enough change > and we've now justified that it does what we want, so R-b stands with that > one-line tweak. Would have been too easy... bs->total_sectors isn't updated yet, so the assertion does fail. I can make the assertion check end_offset >= ... instead. That should still check what we wanted to check here and allow the unaligned extension. This feels like the better option to me compared to updating bs->total_sectors earlier and then undoing that change in every error path. Kevin
Am 23.04.2020 um 12:53 hat Max Reitz geschrieben: > On 22.04.20 17:21, Kevin Wolf wrote: > > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling > > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't > > undo any previous preallocation, but just adds the zero flag to all > > relevant L2 entries. If an external data file is in use, a write_zeroes > > request to the data file is made instead. > > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > > --- > > block/qcow2.c | 30 ++++++++++++++++++++++++++++++ > > 1 file changed, 30 insertions(+) > > > > diff --git a/block/qcow2.c b/block/qcow2.c > > index 9cfbdfc939..bd632405d1 100644 > > --- a/block/qcow2.c > > +++ b/block/qcow2.c > > [...] > > > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, > > g_assert_not_reached(); > > } > > > > + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) { > > + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size); > > + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size); > > + > > + /* Use zero clusters as much as we can */ > > + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0); > > It’s kind of a pity that this changes the cluster mappings that were > established when using falloc/full preallocation already (i.e., they > become preallocated zero clusters then, so when writing to them, we need > COW again). > > But falloc/full preallocation do not guarantee that the new data is > zero, so I suppose this is the only thing we can reasonably do. If we really want, I guess we could make full preallocation first try passing BDRV_REQ_ZERO_WRITE to the protocol layer and if that succeeds, we could skip setting the zero cluster flag at the qcow2 level. Feels like a separate patch, though. Kevin
On 23.04.20 15:25, Kevin Wolf wrote: > Am 23.04.2020 um 12:53 hat Max Reitz geschrieben: >> On 22.04.20 17:21, Kevin Wolf wrote: >>> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling >>> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't >>> undo any previous preallocation, but just adds the zero flag to all >>> relevant L2 entries. If an external data file is in use, a write_zeroes >>> request to the data file is made instead. >>> >>> Signed-off-by: Kevin Wolf <kwolf@redhat.com> >>> --- >>> block/qcow2.c | 30 ++++++++++++++++++++++++++++++ >>> 1 file changed, 30 insertions(+) >>> >>> diff --git a/block/qcow2.c b/block/qcow2.c >>> index 9cfbdfc939..bd632405d1 100644 >>> --- a/block/qcow2.c >>> +++ b/block/qcow2.c >> >> [...] >> >>> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, >>> g_assert_not_reached(); >>> } >>> >>> + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) { >>> + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size); >>> + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size); >>> + >>> + /* Use zero clusters as much as we can */ >>> + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0); >> >> It’s kind of a pity that this changes the cluster mappings that were >> established when using falloc/full preallocation already (i.e., they >> become preallocated zero clusters then, so when writing to them, we need >> COW again). >> >> But falloc/full preallocation do not guarantee that the new data is >> zero, so I suppose this is the only thing we can reasonably do. > > If we really want, I guess we could make full preallocation first try > passing BDRV_REQ_ZERO_WRITE to the protocol layer and if that succeeds, > we could skip setting the zero cluster flag at the qcow2 level. That might be nice. > Feels like a separate patch, though. Definitely, yes. Max
On 4/23/20 8:23 AM, Kevin Wolf wrote: >>> >>> So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would >>> still set the zero flag for the partial last cluster and for the >>> external data file, bdrv_co_pwrite_zeroes() would have the correct size. >> >> Then I'm in favor of NOT rounding the tail. That's an easy enough change >> and we've now justified that it does what we want, so R-b stands with that >> one-line tweak. > > Would have been too easy... bs->total_sectors isn't updated yet, so the > assertion does fail. > > I can make the assertion check end_offset >= ... instead. That should > still check what we wanted to check here and allow the unaligned > extension. Yes, that works for me. > > This feels like the better option to me compared to updating > bs->total_sectors earlier and then undoing that change in every error > path. Indeed.
diff --git a/block/qcow2.c b/block/qcow2.c index 9cfbdfc939..bd632405d1 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options, bs->supported_zero_flags = header.version >= 3 ? BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0; + bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE; /* Repair image if dirty */ if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only && @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, g_assert_not_reached(); } + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) { + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size); + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size); + + /* Use zero clusters as much as we can */ + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0); + if (ret < 0) { + error_setg_errno(errp, -ret, "Failed to zero out the new area"); + goto fail; + } + + /* Write explicit zeros for the unaligned head */ + if (zero_start > old_length) { + uint8_t *buf = qemu_blockalign0(bs, s->cluster_size); + QEMUIOVector qiov; + qemu_iovec_init_buf(&qiov, buf, zero_start - old_length); + + qemu_co_mutex_unlock(&s->lock); + ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0); + qemu_co_mutex_lock(&s->lock); + + qemu_vfree(buf); + if (ret < 0) { + error_setg_errno(errp, -ret, "Failed to zero out the new area"); + goto fail; + } + } + } + if (prealloc != PREALLOC_MODE_OFF) { /* Flush metadata before actually changing the image size */ ret = qcow2_write_caches(bs);
If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't undo any previous preallocation, but just adds the zero flag to all relevant L2 entries. If an external data file is in use, a write_zeroes request to the data file is made instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/qcow2.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+)