[for-4.2?,v3,0/8] block: Fix resize (extending) of short overlays

Message ID 20191122160511.8377-1-kwolf@redhat.com (mailing list archive)

Message

Kevin Wolf Nov. 22, 2019, 4:05 p.m. UTC
See patch 4 for the description of the bug fixed.

v3:
- Don't allow blocking the monitor for a zero write in block_resize
  (even though we can already block for other reasons there). This is
  mainly responsible for the increased complexity compared to v2.
  Personally, I think this is not an improvement over v2, but if this is
  what it takes to fix a corruption issue in 4.2... [Max]
- Don't use huge image files in the test case [Vladimir]

v2:
- Switched order of bs->total_sectors update and zero write [Vladimir]
- Fixed coding style [Vladimir]
- Changed the commit message to contain what was in the cover letter
- Test all preallocation modes
- Test allocation status with qemu-io 'map' [Vladimir]

Kevin Wolf (8):
  block: bdrv_co_do_pwrite_zeroes: 64 bit 'bytes' parameter
  block: Add no_fallback parameter to bdrv_co_truncate()
  qcow2: Declare BDRV_REQ_NO_FALLBACK supported
  block: truncate: Don't make backing file data visible
  iotests: Add qemu_io_log()
  iotests: Fix timeout in run_job()
  iotests: Support job-complete in run_job()
  iotests: Test committing to short backing file

 include/block/block.h          |   5 +-
 include/sysemu/block-backend.h |   2 +-
 block/block-backend.c          |   4 +-
 block/commit.c                 |   4 +-
 block/crypto.c                 |   4 +-
 block/io.c                     |  55 +++++++--
 block/mirror.c                 |   2 +-
 block/parallels.c              |   6 +-
 block/qcow.c                   |   4 +-
 block/qcow2-refcount.c         |   2 +-
 block/qcow2.c                  |  22 ++--
 block/qed.c                    |   2 +-
 block/raw-format.c             |   2 +-
 block/vdi.c                    |   2 +-
 block/vhdx-log.c               |   2 +-
 block/vhdx.c                   |   6 +-
 block/vmdk.c                   |  10 +-
 block/vpc.c                    |   2 +-
 blockdev.c                     |   2 +-
 qemu-img.c                     |   2 +-
 qemu-io-cmds.c                 |   2 +-
 tests/test-block-iothread.c    |   6 +-
 tests/qemu-iotests/274         | 152 ++++++++++++++++++++++++
 tests/qemu-iotests/274.out     | 203 +++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group       |   1 +
 tests/qemu-iotests/iotests.py  |  11 +-
 26 files changed, 463 insertions(+), 52 deletions(-)
 create mode 100755 tests/qemu-iotests/274
 create mode 100644 tests/qemu-iotests/274.out

Comments

Peter Maydell Nov. 22, 2019, 4:17 p.m. UTC | #1
On Fri, 22 Nov 2019 at 16:08, Kevin Wolf <kwolf@redhat.com> wrote:
>
> See patch 4 for the description of the bug fixed.

I guess my questions for trying to answer the "for-4.2?"
question in the subject are:
 1) is this a security (leaking data into the guest) bug ?
 2) is this a regression?
 3) is this something a lot of people are likely to run into?

Eyeballing of the diffstat plus the fact we're on v4 of
the patchset already makes me a little uneasy about
putting it into rc3, but if the bug we're fixing matters
enough we can do it.

thanks
-- PMM
Eric Blake Nov. 22, 2019, 4:41 p.m. UTC | #2
On 11/22/19 10:17 AM, Peter Maydell wrote:
> On Fri, 22 Nov 2019 at 16:08, Kevin Wolf <kwolf@redhat.com> wrote:
>>
>> See patch 4 for the description of the bug fixed.
> 
> I guess my questions for trying to answer the "for-4.2?"
> question in the subject are:
>   1) is this a security (leaking data into the guest) bug ?
>   2) is this a regression?
>   3) is this something a lot of people are likely to run into?

My thoughts (although Kevin's may be more definitive):

1) yes, there is a security aspect: certain resize or commit actions can 
result in the guest seeing a revival of stale data that the guest may 
have thought that it previously scrubbed.  Similarly, the tail end of 
the series proves via iotests that we have an actual case of data 
corruption after a block commit without this patch.

2) no, this is a long-standing bug, we've only recently noticed it

3) no, it is uncommon to have an overlay with a size shorter than its 
backing file (it's not even all that common to have an overlay longer 
than the backing file), so this is a corner case not many people will 
hit.  It's even less common to have the difference in overlay sizes also 
coincide with formats that introduce the speed penalty of a longer 
blocking due to the added zeroing.
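
To make point 1 concrete, here is a toy model of the stale-data problem. It is not QEMU code: images are plain Python sequences, `None` stands for an unallocated byte in the overlay, and the helper names (`read_guest`, `naive_resize`, `zeroing_resize`) are made up for this illustration.

```python
# Toy model, not QEMU code. An overlay is a list in which None marks an
# unallocated byte that falls through to the backing file.

def read_guest(overlay, backing):
    """Return what a guest would read from an overlay on top of backing."""
    out = bytearray()
    for i, b in enumerate(overlay):
        if b is not None:
            out.append(b)            # data allocated in the overlay
        elif i < len(backing):
            out.append(backing[i])   # falls through to the backing file
        else:
            out.append(0)            # beyond the backing file: zeros
    return bytes(out)

def naive_resize(overlay, new_len):
    """Grow the overlay and leave the new area unallocated."""
    return overlay + [None] * (new_len - len(overlay))

def zeroing_resize(overlay, new_len):
    """Grow the overlay and explicitly zero the new area."""
    return overlay + [0] * (new_len - len(overlay))

backing = b"AAAASTALE"   # 9 bytes; the tail is old data
overlay = [None] * 4     # short overlay: the guest only sees 4 bytes

leaky = read_guest(naive_resize(overlay, 9), backing)
safe = read_guest(zeroing_resize(overlay, 9), backing)
```

In this model `leaky` exposes the old tail of the backing file, while `safe` reads zeros in the newly added area, which is the behaviour the series is after.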

> 
> Eyeballing of the diffstat plus the fact we're on v4 of
> the patchset already makes me a little uneasy about
> putting it into rc3, but if the bug we're fixing matters
> enough we can do it.

In terms of diffstat, the v3 series was much smaller in impact.  Both 
versions add robustness, where the difference between v3 and v4 is 
whether we introduce a speed penalty on an unlikely setup (v3) or reject 
any operation where it would require a speed penalty to avoid data 
problems (v4).  I think all the patches in v3 were reviewed, but I'll go 
ahead and review v4 as well.

Because of point 1, I am leaning towards some version of this patch 
series (whether 3 or 4) making -rc3; but point 2 (it is not a 4.2 
regression) also seems to be a reasonable justification for slipping 
this to 5.0.
Max Reitz Nov. 25, 2019, 12:21 p.m. UTC | #3
On 22.11.19 17:41, Eric Blake wrote:
> On 11/22/19 10:17 AM, Peter Maydell wrote:

[...]

>> Eyeballing of the diffstat plus the fact we're on v4 of
>> the patchset already makes me a little uneasy about
>> putting it into rc3, but if the bug we're fixing matters
>> enough we can do it.
> 
> In terms of diffstat, the v3 series was much smaller in impact.  Both
> versions add robustness, where the difference between v3 and v4 is
> whether we introduce a speed penalty on an unlikely setup (v3) or reject
> any operation where it would require a speed penalty to avoid data
> problems (v4).

I’d just like to add that this isn’t just about a speed penalty, but
about the fact that the monitor is blocked while the operation is
running.  So the speed penalty has more impact than just some background
operation being slow.

Max
Max Reitz Nov. 25, 2019, 12:24 p.m. UTC | #4
On 22.11.19 17:05, Kevin Wolf wrote:
> See patch 4 for the description of the bug fixed.
> 
> v3:
> - Don't allow blocking the monitor for a zero write in block_resize
>   (even though we can already block for other reasons there). This is
>   mainly responsible for the increased complexity compared to v2.
>   Personally, I think this is not an improvement over v2, but if this is
>   what it takes to fix a corruption issue in 4.2... [Max]

I don’t find it so bad because the added complexity is:

(1) A mainly mechanical change of code to add another parameter to
{blk,bdrv}(_co)?_truncate(),

(2) qcow2 providing BDRV_REQ_NO_FALLBACK, and

(3) passing BDRV_REQ_NO_FALLBACK in bdrv_co_truncate() if the new
parameter is true.

(1) sees the most LoC changed, but it isn’t a complex change.  (2) and
(3) are both basically one-line changes each.


OTOH, as I’ve said on IRC, I believe you have a sufficient number of
R-bs on v2 to take it without mine, so the choice is yours.

Max
Kevin Wolf Dec. 10, 2019, 5:46 p.m. UTC | #5
Am 22.11.2019 um 17:05 hat Kevin Wolf geschrieben:
> See patch 4 for the description of the bug fixed.

I'm applying patches 3 and 5-7 to the block branch because they make
sense on their own.

The real fix will need another approach because the error handling is
broken in this one: If zeroing out fails (either because of NO_FALLBACK
or because of some other I/O error), bdrv_co_truncate() will return
failure, but the image size has already been increased, with potentially
incorrect data in the new area.

To fix this, we need to make sure that zeros will be read before we
commit the new image size to the image file (e.g. qcow2 header) and to
bs->total_sectors. In other words, it must become the responsibility of
the block driver.
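
The ordering problem can be sketched with a toy model. This is an illustration only, not the QEMU implementation: `ToyImage`, `truncate_size_first` and `truncate_zero_first` are hypothetical names, and the model ignores that a real driver cannot simply write past the current end of the image (which is part of why this has to happen inside the driver).

```python
# Toy model, not QEMU code: shows why the size must only be committed
# after the new area is known to read as zeros.

class ToyImage:
    def __init__(self, size, fail_zeroing=False):
        self.size = size                  # committed image size
        self.fail_zeroing = fail_zeroing  # simulate an I/O error

    def write_zeroes(self, start, end):
        if self.fail_zeroing:
            raise OSError("zeroing failed")

def truncate_size_first(img, new_size):
    """Commit the new size, then zero: a failure leaves the size grown."""
    old_size = img.size
    img.size = new_size
    img.write_zeroes(old_size, new_size)

def truncate_zero_first(img, new_size):
    """Zero first, commit the size only once zeroing has succeeded."""
    img.write_zeroes(img.size, new_size)
    img.size = new_size

def try_truncate(truncate, img, new_size):
    """Run a truncate variant and report whether it succeeded."""
    try:
        truncate(img, new_size)
        return True
    except OSError:
        return False

a = ToyImage(4, fail_zeroing=True)
b = ToyImage(4, fail_zeroing=True)
ok_a = try_truncate(truncate_size_first, a, 9)
ok_b = try_truncate(truncate_zero_first, b, 9)
```

Both calls report failure, but only `a` ends up with the larger size and a new area that was never zeroed; `b` keeps its old size.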

To this effect, I'm planning to introduce a PREALLOC_MODE_ZERO_INIT flag
that can be or'ed to the preallocation mode. This will fail by default
because it looks like just another unimplemented preallocation mode to
block drivers. It will be requested explicitly by commit jobs and
automatically added by bdrv_co_truncate() if the backing file would
become visible (like in this series, but now for all preallocation
modes). I'm planning to implement it for qcow2 and file-posix for now,
which should cover most interesting cases.
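
A rough sketch of how such a flag could behave, under stated assumptions: the numeric values, the helper names (`add_zero_init_if_needed`, `toy_driver_truncate`) and the exact "backing file would become visible" condition are invented for this illustration and are not taken from any patch.

```python
# Toy model, not QEMU code. Numeric values are illustrative only.
PREALLOC_MODE_OFF = 0
PREALLOC_MODE_METADATA = 1
PREALLOC_MODE_ZERO_INIT = 0x100   # proposed flag; bit value is assumed

def add_zero_init_if_needed(prealloc, old_size, new_size, backing_size):
    """Add the flag when growing would expose backing file data.

    Assumed condition: the image grows and the backing file extends
    beyond the old end of the image.
    """
    if new_size > old_size and backing_size > old_size:
        return prealloc | PREALLOC_MODE_ZERO_INIT
    return prealloc

def toy_driver_truncate(supported_modes, prealloc):
    """Reject any mode the driver does not know, flag included."""
    if prealloc not in supported_modes:
        raise NotImplementedError("unsupported preallocation mode %#x" % prealloc)
    return "ok"

# A driver that predates the flag sees OFF|ZERO_INIT as an unknown mode.
legacy_driver = {PREALLOC_MODE_OFF, PREALLOC_MODE_METADATA}
# A driver that implements the flag accepts the combined values too.
aware_driver = legacy_driver | {m | PREALLOC_MODE_ZERO_INIT for m in legacy_driver}

mode = add_zero_init_if_needed(PREALLOC_MODE_OFF, 4, 9, 9)
```

With these definitions, `toy_driver_truncate(legacy_driver, mode)` raises `NotImplementedError`, which mirrors the "fails by default" behaviour described above, while `toy_driver_truncate(aware_driver, mode)` succeeds.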

Does this make sense to you?

Kevin
Max Reitz Dec. 11, 2019, 7:09 a.m. UTC | #6
On 10.12.19 18:46, Kevin Wolf wrote:
> Am 22.11.2019 um 17:05 hat Kevin Wolf geschrieben:
>> See patch 4 for the description of the bug fixed.
> 
> I'm applying patches 3 and 5-7 to the block branch because they make
> sense on their own.
> 
> The real fix will need another approach because the error handling is
> broken in this one: If zeroing out fails (either because of NO_FALLBACK
> or because of some other I/O error), bdrv_co_truncate() will return
> failure, but the image size has already been increased, with potentially
> incorrect data in the new area.
> 
> To fix this, we need to make sure that zeros will be read before we
> commit the new image size to the image file (e.g. qcow2 header) and to
> bs->total_sectors. In other words, it must become the responsibility of
> the block driver.
> 
> To this effect, I'm planning to introduce a PREALLOC_MODE_ZERO_INIT flag
> that can be or'ed to the preallocation mode. This will fail by default
> because it looks like just another unimplemented preallocation mode to
> block drivers. It will be requested explicitly by commit jobs and
> automatically added by bdrv_co_truncate() if the backing file would
> become visible (like in this series, but now for all preallocation
> modes). I'm planning to implement it for qcow2 and file-posix for now,
> which should cover most interesting cases.
> 
> Does this make sense to you?

Sounds good to me.

Max
Vladimir Sementsov-Ogievskiy Dec. 19, 2019, 9:24 a.m. UTC | #7
10.12.2019 20:46, Kevin Wolf wrote:
> Am 22.11.2019 um 17:05 hat Kevin Wolf geschrieben:
>> See patch 4 for the description of the bug fixed.
> 
> I'm applying patches 3 and 5-7 to the block branch because they make
> sense on their own.
> 
> The real fix will need another approach because the error handling is
> broken in this one: If zeroing out fails (either because of NO_FALLBACK
> or because of some other I/O error), bdrv_co_truncate() will return
> failure, but the image size has already been increased, with potentially
> incorrect data in the new area.
> 
> To fix this, we need to make sure that zeros will be read before we
> commit the new image size to the image file (e.g. qcow2 header) and to
> bs->total_sectors. In other words, it must become the responsibility of
> the block driver.
> 
> To this effect, I'm planning to introduce a PREALLOC_MODE_ZERO_INIT flag
> that can be or'ed to the preallocation mode. This will fail by default
> because it looks like just another unimplemented preallocation mode to
> block drivers. It will be requested explicitly by commit jobs and
> automatically added by bdrv_co_truncate() if the backing file would
> become visible (like in this series, but now for all preallocation
> modes). I'm planning to implement it for qcow2 and file-posix for now,
> which should cover most interesting cases.
> 
> Does this make sense to you?
> 

This should work. Is this still on your timeline?
Kevin Wolf Dec. 19, 2019, 10:13 a.m. UTC | #8
Am 19.12.2019 um 10:24 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 10.12.2019 20:46, Kevin Wolf wrote:
> > Am 22.11.2019 um 17:05 hat Kevin Wolf geschrieben:
> >> See patch 4 for the description of the bug fixed.
> > 
> > I'm applying patches 3 and 5-7 to the block branch because they make
> > sense on their own.
> > 
> > The real fix will need another approach because the error handling is
> > broken in this one: If zeroing out fails (either because of NO_FALLBACK
> > or because of some other I/O error), bdrv_co_truncate() will return
> > failure, but the image size has already been increased, with potentially
> > incorrect data in the new area.
> > 
> > To fix this, we need to make sure that zeros will be read before we
> > commit the new image size to the image file (e.g. qcow2 header) and to
> > bs->total_sectors. In other words, it must become the responsibility of
> > the block driver.
> > 
> > To this effect, I'm planning to introduce a PREALLOC_MODE_ZERO_INIT flag
> > that can be or'ed to the preallocation mode. This will fail by default
> > because it looks like just another unimplemented preallocation mode to
> > block drivers. It will be requested explicitly by commit jobs and
> > automatically added by bdrv_co_truncate() if the backing file would
> > become visible (like in this series, but now for all preallocation
> > modes). I'm planning to implement it for qcow2 and file-posix for now,
> > which should cover most interesting cases.
> > 
> > Does this make sense to you?
> 
> This should work. Do you still have this plan in a timeline?

Still planning to do this, but tomorrow is my last working day for this
year. So I guess I'll get to it sometime in January.

Kevin
Vladimir Sementsov-Ogievskiy Dec. 19, 2019, 10:20 a.m. UTC | #9
19.12.2019 13:13, Kevin Wolf wrote:
> Am 19.12.2019 um 10:24 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 10.12.2019 20:46, Kevin Wolf wrote:
>>> Am 22.11.2019 um 17:05 hat Kevin Wolf geschrieben:
>>>> See patch 4 for the description of the bug fixed.
>>>
>>> I'm applying patches 3 and 5-7 to the block branch because they make
>>> sense on their own.
>>>
>>> The real fix will need another approach because the error handling is
>>> broken in this one: If zeroing out fails (either because of NO_FALLBACK
>>> or because of some other I/O error), bdrv_co_truncate() will return
>>> failure, but the image size has already been increased, with potentially
>>> incorrect data in the new area.
>>>
>>> To fix this, we need to make sure that zeros will be read before we
>>> commit the new image size to the image file (e.g. qcow2 header) and to
>>> bs->total_sectors. In other words, it must become the responsibility of
>>> the block driver.
>>>
>>> To this effect, I'm planning to introduce a PREALLOC_MODE_ZERO_INIT flag
>>> that can be or'ed to the preallocation mode. This will fail by default
>>> because it looks like just another unimplemented preallocation mode to
>>> block drivers. It will be requested explicitly by commit jobs and
>>> automatically added by bdrv_co_truncate() if the backing file would
>>> become visible (like in this series, but now for all preallocation
>>> modes). I'm planning to implement it for qcow2 and file-posix for now,
>>> which should cover most interesting cases.
>>>
>>> Does this make sense to you?
>>
>> This should work. Do you still have this plan in a timeline?
> 
> Still planning to do this, but tomorrow is my last working day for this
> year. So I guess I'll get to it sometime in January.
> 

Good. Have a nice holiday!
Vladimir Sementsov-Ogievskiy Feb. 5, 2020, 1:43 p.m. UTC | #10
19.12.2019 13:20, Vladimir Sementsov-Ogievskiy wrote:
> 19.12.2019 13:13, Kevin Wolf wrote:
>> Am 19.12.2019 um 10:24 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> 10.12.2019 20:46, Kevin Wolf wrote:
>>>> Am 22.11.2019 um 17:05 hat Kevin Wolf geschrieben:
>>>>> See patch 4 for the description of the bug fixed.
>>>>
>>>> I'm applying patches 3 and 5-7 to the block branch because they make
>>>> sense on their own.
>>>>
>>>> The real fix will need another approach because the error handling is
>>>> broken in this one: If zeroing out fails (either because of NO_FALLBACK
>>>> or because of some other I/O error), bdrv_co_truncate() will return
>>>> failure, but the image size has already been increased, with potentially
>>>> incorrect data in the new area.
>>>>
>>>> To fix this, we need to make sure that zeros will be read before we
>>>> commit the new image size to the image file (e.g. qcow2 header) and to
>>>> bs->total_sectors. In other words, it must become the responsibility of
>>>> the block driver.
>>>>
>>>> To this effect, I'm planning to introduce a PREALLOC_MODE_ZERO_INIT flag
>>>> that can be or'ed to the preallocation mode. This will fail by default
>>>> because it looks like just another unimplemented preallocation mode to
>>>> block drivers. It will be requested explicitly by commit jobs and
>>>> automatically added by bdrv_co_truncate() if the backing file would
>>>> become visible (like in this series, but now for all preallocation
>>>> modes). I'm planning to implement it for qcow2 and file-posix for now,
>>>> which should cover most interesting cases.
>>>>
>>>> Does this make sense to you?
>>>
>>> This should work. Do you still have this plan in a timeline?
>>
>> Still planning to do this, but tomorrow is my last working day for this
>> year. So I guess I'll get to it sometime in January.
>>
> 
> Good. Have a nice holiday!
> 
> 

Hi, did you forget about this? I'm just about to ping (or resend) my related
"[PATCH 0/4] fix & merge block_status_above and is_allocated_above", so
I'm pinging these patches too...