Message ID | 20170427014626.11553-10-eblake@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 27.04.2017 03:46, Eric Blake wrote: > We've already improved discards to operate efficiently on the tail > of an unaligned qcow2 image; it's time to make a similar improvement > to write zeroes. The special case is only valid at the tail > cluster of a file, where we must recognize that any sectors beyond > the image end would implicitly read as zero, and therefore should > not penalize our logic for widening a partial cluster into writing > the whole cluster as zero. > > Update test 154 to cover the new scenarios, using two images of > intentionally differing length. > > While at it, fix the test to gracefully skip when run as > ./check -qcow2 -o compat=0.10 154 > since the older format lacks zero clusters already required earlier > in the test. > > Signed-off-by: Eric Blake <eblake@redhat.com> > > --- > v10: rebase to better reporting of preallocated zero clusters > v9: new patch > --- > block/qcow2.c | 7 +++ > tests/qemu-iotests/154 | 112 ++++++++++++++++++++++++++++++++++++++++++++- > tests/qemu-iotests/154.out | 83 +++++++++++++++++++++++++++++++++ > 3 files changed, 200 insertions(+), 2 deletions(-) > > diff --git a/block/qcow2.c b/block/qcow2.c > index 5c1573c..8038793 100644 > --- a/block/qcow2.c > +++ b/block/qcow2.c > @@ -2449,6 +2449,10 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start, > BlockDriverState *file; > int64_t res; > > + if (start + count > bs->total_sectors) { > + count = bs->total_sectors - start; > + } > + > if (!count) { > return true; > } > @@ -2467,6 +2471,9 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs, > uint32_t tail = (offset + count) % s->cluster_size; > > trace_qcow2_pwrite_zeroes_start_req(qemu_coroutine_self(), offset, count); > + if (offset + count == bs->total_sectors * BDRV_SECTOR_SIZE) { > + tail = 0; > + } > > if (head || tail) { > int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS; > diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154 > index 7ca7219..005f09f 100755 > --- a/tests/qemu-iotests/154 > +++ b/tests/qemu-iotests/154 > @@ -2,7 +2,7 @@ > # > # qcow2 specific bdrv_pwrite_zeroes tests with backing files (complements 034) > # > -# Copyright (C) 2016 Red Hat, Inc. > +# Copyright (C) 2016-2017 Red Hat, Inc. > # > # This program is free software; you can redistribute it and/or modify > # it under the terms of the GNU General Public License as published by > @@ -42,7 +42,10 @@ _supported_proto file > _supported_os Linux > > CLUSTER_SIZE=4k > -size=128M > +size=$((128 * 1024 * 1024)) > + > +# This test requires zero clusters, added in v3 images > +_unsupported_imgopts compat=0.10 > > echo > echo == backing file contains zeros == > @@ -299,6 +302,111 @@ $QEMU_IO -c "read -P 0 75k 1k" "$TEST_IMG" | _filter_qemu_io > > $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map > > +echo > +echo == unaligned image tail cluster, no allocation needed == > + > +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) Any reason for the CLUSTER_SIZE? It passes with 64 kB as well, and I actually think that would be a better test because as it is, the image's "unaligned" tail is exactly one cluster (so it isn't really unaligned). > +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) > + > +# With no backing file, write to all or part of unallocated partial cluster > + > +# Backing file: 128m: ... | 00 -- Not sure what the "00 --" is supposed to mean. The cluster size is 1 kB, so this is just a single cluster; which will be converted from unallocated to a zero cluster because the tail reads as zero. > +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io > + > +# Backing file: 128m: ... | -- 00 Same here, but now it's the head that reads as zero (and it's already a zero cluster so nothing changes here). > +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io But I suppose you mean $((size + 512))? > + > +# Backing file: 128m: ... | 00 00 And finally this doesn't change anything either -- but then again this is a fully aligned request, so I don't quite see the point in testing this here. > +$QEMU_IO -c "write -z $size 1024" "$TEST_IMG.base" | _filter_qemu_io > + > +# No offset should be allocated, although the cluster itself is considered > +# allocated due to the zero flag > +$QEMU_IO -c "alloc $size 1024" "$TEST_IMG.base" | _filter_qemu_io > +$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map > + > +# With backing file that reads as zero, write to all or part of entire cluster > + > +# Backing file: 128m: ... | 00 00 > +# Active layer: 128m: ... | 00 00 00 00 > +$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map > + > +# Backing file: 128m: ... | 00 00 > +# Active layer: 128m: ... | 00 00 -- -- How so? Shouldn't this just stay a zero cluster because the rest of the cluster already reads as zero? > +$QEMU_IO -c "write -z $((size)) 1024" "$TEST_IMG" | _filter_qemu_io > + > +# Backing file: 128m: ... | 00 00 > +# Active layer: 128m: ... | -- -- 00 00 Same here. > +$QEMU_IO -c "write -z $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io > + > +# Backing file: 128m: ... | 00 00 > +# Active layer: 128m: ... | -- 00 00 -- Same here. Also, I'm not quite sure what you are testing by just issuing three consecutive write requests without checking the result each time. You assume it to be something which you wrote in the comments above each, but you can't be sure (yes, if something goes wrong in between, the following final map should catch it, but it's not for certain). > +$QEMU_IO -c "write -z $((size + 512)) 1024" "$TEST_IMG" | _filter_qemu_io > + > +# No offset should be allocated, although the cluster itself is considered > +# allocated due to the zero flag > +$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map > + > +# Preallocated cluster is not freed when unmap is not requested > + > +# Backing file: 128m: ... | XX -- => ... | XX 00 > +$QEMU_IO -c "write $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io > +$QEMU_IO -c "write -z $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io > + > +# Backing file: 128m: ... | XX 00 => ... | 00 00 > +$QEMU_IO -c "write -z $((size)) 1024" "$TEST_IMG.base" | _filter_qemu_io > +$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map | > + sed 's/"offset": [0-9]*/"offset": OFFSET/g' > + > +echo > +echo == unaligned image tail cluster, allocation required == > + > +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) > +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) > + > +# Write beyond backing file must COW > +# Backing file: 128m: ... | XX -- Here, the "--" is an unallocated cluster... > +# Active layer: 128m: ... | -- -- 00 -- ...while here it is normally allocated data. Or is "--" supposed to mean you just don't write to that area...? > + > +$QEMU_IO -c "write -P 1 $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io > +$QEMU_IO -c "write -z $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IO -c "read -P 1 $((size)) 512" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IO -c "read -P 0 $((size + 512)) 1536" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map > + > +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) > +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) > + > +# Writes at boundaries of (partial) cluster must not lose mid-cluster data > +# Backing file: 128m: ... | -- XX > +# Active layer: 128m: ... | 00 -- -- 00 > + > +$QEMU_IO -c "write -P 1 $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io > +$QEMU_IO -c "write -z $((size)) 512" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_> +$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io [1] > +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map > + > +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) > +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) > + > +# Partial write must not lose data > +# Backing file: 128m: ... | -- -- > +# Active layer: 128m: ... | -- -- XX 00 Well, you have basically tested this in [1] already. After the first write -z to the active layer, the final four sectors are fully allocated (as they are in a single cluster) and look like this: 00 01 00 00 = XX XX XX XX (Because the 00s are just written as data.) Now the write -z in [1] just overwrites the last of those four sectors with zero data (albeit it being zero already). But if you want to test it again, I'm not stopping you. :-) Max > + > +$QEMU_IO -c "write -P 1 $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IO -c "read -P 0 $((size)) 1024" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IO -c "read -P 1 $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IO -c "read -P 0 $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io > +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map > + > # success, all done > echo "*** done" > rm -f $seq.full > diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out > index da9eabd..f8a09af 100644 > --- a/tests/qemu-iotests/154.out > +++ b/tests/qemu-iotests/154.out > @@ -282,4 +282,87 @@ read 1024/1024 bytes at offset 76800 > { "start": 69632, "length": 4096, "depth": 0, "zero": true, "data": false}, > { "start": 73728, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576}, > { "start": 77824, "length": 134139904, "depth": 1, "zero": true, "data": false}] > + > +== unaligned image tail cluster, no allocation needed == > +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752 > +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base > +wrote 512/512 bytes at offset 134217728 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +wrote 512/512 bytes at offset 134217728 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +wrote 1024/1024 bytes at offset 134217728 > +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +1024/1024 bytes allocated at offset 128 MiB > +[{ "start": 0, "length": 134218752, "depth": 0, "zero": true, "data": false}] > +wrote 2048/2048 bytes at offset 134217728 > +2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false}, > +{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}] > +wrote 1024/1024 bytes at offset 134217728 > +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +wrote 1024/1024 bytes at offset 134218752 > +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +wrote 1024/1024 bytes at offset 134218240 > +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +2048/2048 bytes allocated at offset 128 MiB > +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false}, > +{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}] > +wrote 512/512 bytes at offset 134217728 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +wrote 512/512 bytes at offset 134218240 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +wrote 1024/1024 bytes at offset 134217728 > +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +[{ "start": 0, "length": 134217728, "depth": 0, "zero": true, "data": false}, > +{ "start": 134217728, "length": 1024, "depth": 0, "zero": true, "data": false, "offset": OFFSET}] > + > +== unaligned image tail cluster, allocation required == > +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752 > +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base > +wrote 512/512 bytes at offset 134217728 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +wrote 512/512 bytes at offset 134218752 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +read 512/512 bytes at offset 134217728 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +read 1536/1536 bytes at offset 134218240 > +1.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false}, > +{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}] > +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752 > +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base > +wrote 512/512 bytes at offset 134218240 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +wrote 512/512 bytes at offset 134217728 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +read 512/512 bytes at offset 134217728 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +read 512/512 bytes at offset 134218240 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +read 1024/1024 bytes at offset 134218752 > +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +wrote 512/512 bytes at offset 134219264 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +read 512/512 bytes at offset 134217728 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +read 512/512 bytes at offset 134218240 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +read 1024/1024 bytes at offset 134218752 > +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false}, > +{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}] > +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752 > +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base > +wrote 512/512 bytes at offset 134218752 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +wrote 512/512 bytes at offset 134219264 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +read 1024/1024 bytes at offset 134217728 > +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +read 512/512 bytes at offset 134218752 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +read 512/512 bytes at offset 134219264 > +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false}, > +{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}] > *** done >
On 04/28/2017 03:48 PM, Max Reitz wrote: > On 27.04.2017 03:46, Eric Blake wrote: >> We've already improved discards to operate efficiently on the tail >> of an unaligned qcow2 image; it's time to make a similar improvement >> to write zeroes. The special case is only valid at the tail >> cluster of a file, where we must recognize that any sectors beyond >> the image end would implicitly read as zero, and therefore should >> not penalize our logic for widening a partial cluster into writing >> the whole cluster as zero. >> >> Update test 154 to cover the new scenarios, using two images of >> intentionally differing length. >> >> While at it, fix the test to gracefully skip when run as >> ./check -qcow2 -o compat=0.10 154 >> since the older format lacks zero clusters already required earlier >> in the test. >> >> Signed-off-by: Eric Blake <eblake@redhat.com> >> >> >> +echo >> +echo == unaligned image tail cluster, no allocation needed == >> + >> +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) > > Any reason for the CLUSTER_SIZE? It passes with 64 kB as well, and I > actually think that would be a better test because as it is, the image's > "unaligned" tail is exactly one cluster (so it isn't really unaligned). Uhhh - copy-and-paste? You're right, that 1024 is too small for what I'm actually doing with it. :( > >> +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) >> + >> +# With no backing file, write to all or part of unallocated partial cluster >> + >> +# Backing file: 128m: ... | 00 -- > > Not sure what the "00 --" is supposed to mean. The cluster size is 1 kB, > so this is just a single cluster; which will be converted from > unallocated to a zero cluster because the tail reads as zero. > >> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io I'm doing a write of 512 bytes, so I was trying to show that in the cluster starting at 128m, we have a write request for one sector to be written to 00, while the rest of the sector is left uninitialized. The obvious intent is that we note that the uninitialized portion reads as zero, so we can optimize the entire cluster (even though it is a partial cluster) to be a cluster write-zero instead of a sector-based allocating write. But since you've pointed out that my setup is flawed, I'll have to double-check that I'm actually testing what I thought I was (I know I hit the right breakpoints in gdb, but its the iotest expected output that needs to show the difference between pre- and post-patched qemu with regards to the partial-tail-cluster optimizations). >> + >> +# Backing file: 128m: ... | -- 00 > > Same here, but now it's the head that reads as zero (and it's already a > zero cluster so nothing changes here). > >> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io > > But I suppose you mean $((size + 512))? Umm. Yeah. (Hey - you're not the only one that writes patches late at night). > >> + >> +# Backing file: 128m: ... | 00 00 > > And finally this doesn't change anything either -- but then again this > is a fully aligned request, so I don't quite see the point in testing > this here. At one point during my development, I had a bug where a partial write request at the head of the cluster behaved differently from a partial write request at the end (one allocated while the other did not). Maybe it's best if I do a single write, then inspect the map, then reset the image, rather than doing all three writes in a row and proving that the net result of the three writes had no allocation. In v9, when I was trying to use unallocated clusters as a shortcut on files with no backing image, this actually worked (after all three writes, the cluster would still be unallocated); but since Kevin convinced me that v10 has to set the zero flag on all three writes, I'm back to the point of needing to clear the zero flag between writes. Okay, I'll definitely have to fix it. >> +# With backing file that reads as zero, write to all or part of entire cluster >> + >> +# Backing file: 128m: ... | 00 00 >> +# Active layer: 128m: ... | 00 00 00 00 >> +$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map >> + >> +# Backing file: 128m: ... | 00 00 >> +# Active layer: 128m: ... | 00 00 -- -- > > How so? Shouldn't this just stay a zero cluster because the rest of the > cluster already reads as zero? Same problem as above. In v9, I was trying to exploit code that left a cluster unallocated; but that's not viable, so I'll have to reset the image between steps. > > Also, I'm not quite sure what you are testing by just issuing three > consecutive write requests without checking the result each time. You > assume it to be something which you wrote in the comments above each, > but you can't be sure (yes, if something goes wrong in between, the > following final map should catch it, but it's not for certain). Well, it helped me during development, but you're absolutely right that we can't be certain in the long-run. 3 separate checks, with resets between, is indeed the way to go. >> +echo >> +echo == unaligned image tail cluster, allocation required == >> + >> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) Here, my backing cluster size is intentionally small, so that bdrv_get_block_status() can easily return sane results for the backing file portions, so only the active layer has a partial cluster. But making it obvious in the comments can't hurt, given that my earlier tests were invalidated when my size choice didn't actually match with the operations I was attempting. >> +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) >> + >> +# Write beyond backing file must COW >> +# Backing file: 128m: ... | XX -- > > Here, the "--" is an unallocated cluster... > >> +# Active layer: 128m: ... | -- -- 00 -- > > ...while here it is normally allocated data. Or is "--" supposed to mean > you just don't write to that area...? If we had subclusters, then it would be unallocated data. But we don't. Since I just reset the image, I'm doing a write onto an unallocated cluster in the current layer; the write request is to a subset of the cluster, and the code checking whether the rest of the cluster reads as zeros will see that a COW (and corresponding allocation) is required. > >> + >> +$QEMU_IO -c "write -P 1 $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io >> +$QEMU_IO -c "write -z $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 1 $((size)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 0 $((size + 512)) 1536" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map >> + >> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) >> +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) >> + >> +# Writes at boundaries of (partial) cluster must not lose mid-cluster data >> +# Backing file: 128m: ... | -- XX >> +# Active layer: 128m: ... | 00 -- -- 00 >> + >> +$QEMU_IO -c "write -P 1 $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io >> +$QEMU_IO -c "write -z $((size)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_> +$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io > > [1] > >> +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map >> + >> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) >> +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) >> + >> +# Partial write must not lose data >> +# Backing file: 128m: ... | -- -- >> +# Active layer: 128m: ... | -- -- XX 00 > > Well, you have basically tested this in [1] already. After the first > write -z to the active layer, the final four sectors are fully allocated > (as they are in a single cluster) and look like this: > > 00 01 00 00 = XX XX XX XX > > (Because the 00s are just written as data.) > > Now the write -z in [1] just overwrites the last of those four sectors > with zero data (albeit it being zero already). > > But if you want to test it again, I'm not stopping you. :-) > Well, you've already proven that I need to revisit this test with a fine-tooth comb, especially if I split the series to do the blkdebug stuff separately from the qcow2 stuff.
On 04/28/2017 04:24 PM, Eric Blake wrote: >>> +echo >>> +echo == unaligned image tail cluster, no allocation needed == >>> + >>> +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) >> >> Any reason for the CLUSTER_SIZE? It passes with 64 kB as well, and I >> actually think that would be a better test because as it is, the image's >> "unaligned" tail is exactly one cluster (so it isn't really unaligned). > > Uhhh - copy-and-paste? You're right, that 1024 is too small for what > I'm actually doing with it. :( Actually, the whole test defaults to 4k clusters except when overridden, but I did learn while hammering at things that we don't have a nice way to tell that a backing file slightly shorter than the active file behaves as though we read all zeros for the difference. I'm thinking of proposing an RFC patch introducing BDRV_BLOCK_EOF, which gets set any time bdrv_get_block_status() clamps the returns *pnum because it hit EOF, to see what optimizations fall out elsewhere in the tree as a result. (It may not be worth it in the end, but we'll see).
diff --git a/block/qcow2.c b/block/qcow2.c index 5c1573c..8038793 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -2449,6 +2449,10 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start, BlockDriverState *file; int64_t res; + if (start + count > bs->total_sectors) { + count = bs->total_sectors - start; + } + if (!count) { return true; } @@ -2467,6 +2471,9 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs, uint32_t tail = (offset + count) % s->cluster_size; trace_qcow2_pwrite_zeroes_start_req(qemu_coroutine_self(), offset, count); + if (offset + count == bs->total_sectors * BDRV_SECTOR_SIZE) { + tail = 0; + } if (head || tail) { int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS; diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154 index 7ca7219..005f09f 100755 --- a/tests/qemu-iotests/154 +++ b/tests/qemu-iotests/154 @@ -2,7 +2,7 @@ # # qcow2 specific bdrv_pwrite_zeroes tests with backing files (complements 034) # -# Copyright (C) 2016 Red Hat, Inc. +# Copyright (C) 2016-2017 Red Hat, Inc. # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by @@ -42,7 +42,10 @@ _supported_proto file _supported_os Linux CLUSTER_SIZE=4k -size=128M +size=$((128 * 1024 * 1024)) + +# This test requires zero clusters, added in v3 images +_unsupported_imgopts compat=0.10 echo echo == backing file contains zeros == @@ -299,6 +302,111 @@ $QEMU_IO -c "read -P 0 75k 1k" "$TEST_IMG" | _filter_qemu_io $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map +echo +echo == unaligned image tail cluster, no allocation needed == + +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) + +# With no backing file, write to all or part of unallocated partial cluster + +# Backing file: 128m: ... | 00 -- +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io + +# Backing file: 128m: ... | -- 00 +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io + +# Backing file: 128m: ... | 00 00 +$QEMU_IO -c "write -z $size 1024" "$TEST_IMG.base" | _filter_qemu_io + +# No offset should be allocated, although the cluster itself is considered +# allocated due to the zero flag +$QEMU_IO -c "alloc $size 1024" "$TEST_IMG.base" | _filter_qemu_io +$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map + +# With backing file that reads as zero, write to all or part of entire cluster + +# Backing file: 128m: ... | 00 00 +# Active layer: 128m: ... | 00 00 00 00 +$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map + +# Backing file: 128m: ... | 00 00 +# Active layer: 128m: ... | 00 00 -- -- +$QEMU_IO -c "write -z $((size)) 1024" "$TEST_IMG" | _filter_qemu_io + +# Backing file: 128m: ... | 00 00 +# Active layer: 128m: ... | -- -- 00 00 +$QEMU_IO -c "write -z $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io + +# Backing file: 128m: ... | 00 00 +# Active layer: 128m: ... | -- 00 00 -- +$QEMU_IO -c "write -z $((size + 512)) 1024" "$TEST_IMG" | _filter_qemu_io + +# No offset should be allocated, although the cluster itself is considered +# allocated due to the zero flag +$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map + +# Preallocated cluster is not freed when unmap is not requested + +# Backing file: 128m: ... | XX -- => ... | XX 00 +$QEMU_IO -c "write $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io +$QEMU_IO -c "write -z $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io + +# Backing file: 128m: ... | XX 00 => ... | 00 00 +$QEMU_IO -c "write -z $((size)) 1024" "$TEST_IMG.base" | _filter_qemu_io +$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map | + sed 's/"offset": [0-9]*/"offset": OFFSET/g' + +echo +echo == unaligned image tail cluster, allocation required == + +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) + +# Write beyond backing file must COW +# Backing file: 128m: ... | XX -- +# Active layer: 128m: ... | -- -- 00 -- + +$QEMU_IO -c "write -P 1 $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io +$QEMU_IO -c "write -z $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -c "read -P 1 $((size)) 512" "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -c "read -P 0 $((size + 512)) 1536" "$TEST_IMG" | _filter_qemu_io +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map + +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) + +# Writes at boundaries of (partial) cluster must not lose mid-cluster data +# Backing file: 128m: ... | -- XX +# Active layer: 128m: ... | 00 -- -- 00 + +$QEMU_IO -c "write -P 1 $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io +$QEMU_IO -c "write -z $((size)) 512" "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map + +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024)) +_make_test_img -b "$TEST_IMG.base" $((size + 2048)) + +# Partial write must not lose data +# Backing file: 128m: ... | -- -- +# Active layer: 128m: ... | -- -- XX 00 + +$QEMU_IO -c "write -P 1 $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -c "read -P 0 $((size)) 1024" "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -c "read -P 1 $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -c "read -P 0 $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map + # success, all done echo "*** done" rm -f $seq.full diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out index da9eabd..f8a09af 100644 --- a/tests/qemu-iotests/154.out +++ b/tests/qemu-iotests/154.out @@ -282,4 +282,87 @@ read 1024/1024 bytes at offset 76800 { "start": 69632, "length": 4096, "depth": 0, "zero": true, "data": false}, { "start": 73728, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576}, { "start": 77824, "length": 134139904, "depth": 1, "zero": true, "data": false}] + +== unaligned image tail cluster, no allocation needed == +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752 +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base +wrote 512/512 bytes at offset 134217728 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 512/512 bytes at offset 134217728 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 1024/1024 bytes at offset 134217728 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +1024/1024 bytes allocated at offset 128 MiB +[{ "start": 0, "length": 134218752, "depth": 0, "zero": true, "data": false}] +wrote 2048/2048 bytes at offset 134217728 +2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false}, +{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}] +wrote 1024/1024 bytes at offset 134217728 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 1024/1024 bytes at offset 134218752 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 1024/1024 bytes at offset 134218240 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +2048/2048 bytes allocated at offset 128 MiB +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false}, +{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}] +wrote 512/512 bytes at offset 134217728 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 512/512 bytes at offset 134218240 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 1024/1024 bytes at offset 134217728 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +[{ "start": 0, "length": 134217728, "depth": 0, "zero": true, "data": false}, +{ "start": 134217728, "length": 1024, "depth": 0, "zero": true, "data": false, "offset": OFFSET}] + +== unaligned image tail cluster, allocation required == +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752 +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base +wrote 512/512 bytes at offset 134217728 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 512/512 bytes at offset 134218752 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 134217728 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 1536/1536 bytes at offset 134218240 +1.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false}, +{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}] +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752 +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base +wrote 512/512 bytes at offset 134218240 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 512/512 bytes at offset 134217728 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 134217728 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 134218240 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 1024/1024 bytes at offset 134218752 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 512/512 bytes at offset 134219264 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 134217728 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 134218240 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 1024/1024 bytes at offset 134218752 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false}, +{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}] +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752 +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base +wrote 512/512 bytes at offset 134218752 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 512/512 bytes at offset 134219264 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 1024/1024 bytes at offset 134217728 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 134218752 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 134219264 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false}, +{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}] *** done
We've already improved discards to operate efficiently on the tail of an unaligned qcow2 image; it's time to make a similar improvement to write zeroes. The special case is only valid at the tail cluster of a file, where we must recognize that any sectors beyond the image end would implicitly read as zero, and therefore should not penalize our logic for widening a partial cluster into writing the whole cluster as zero. Update test 154 to cover the new scenarios, using two images of intentionally differing length. While at it, fix the test to gracefully skip when run as ./check -qcow2 -o compat=0.10 154 since the older format lacks zero clusters already required earlier in the test. Signed-off-by: Eric Blake <eblake@redhat.com> --- v10: rebase to better reporting of preallocated zero clusters v9: new patch --- block/qcow2.c | 7 +++ tests/qemu-iotests/154 | 112 ++++++++++++++++++++++++++++++++++++++++++++- tests/qemu-iotests/154.out | 83 +++++++++++++++++++++++++++++++++ 3 files changed, 200 insertions(+), 2 deletions(-)