diff mbox series

[v2,2/6] block: truncate: Don't make backing file data visible

Message ID 20191120184501.28159-3-kwolf@redhat.com (mailing list archive)
State New, archived
Headers show
Series block: Fix resize (extending) of short overlays | expand

Commit Message

Kevin Wolf Nov. 20, 2019, 6:44 p.m. UTC
When extending the size of an image that has a backing file larger than
its old size, make sure that the backing file data doesn't become
visible in the guest, but the added area is properly zeroed out.

Consider the following scenario where the overlay is shorter than its
backing file:

    base.qcow2:     AAAAAAAA
    overlay.qcow2:  BBBB

When resizing (extending) overlay.qcow2, the new blocks should not stay
unallocated and make the additional As from base.qcow2 visible like
before this patch, but zeros should be read.

A similar case happens with the various variants of a commit job when an
intermediate file is short (- for unallocated):

    base.qcow2:     A-A-AAAA
    mid.qcow2:      BB-B
    top.qcow2:      C--C--C-

After commit top.qcow2 to mid.qcow2, the following happens:

    mid.qcow2:      CB-C00C0 (correct result)
    mid.qcow2:      CB-C--C- (before this fix)

Without the fix, blocks that previously read as zeros on top.qcow2
suddenly turn into A.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

Comments

Eric Blake Nov. 20, 2019, 9:15 p.m. UTC | #1
On 11/20/19 12:44 PM, Kevin Wolf wrote:
> When extending the size of an image that has a backing file larger than
> its old size, make sure that the backing file data doesn't become
> visible in the guest, but the added area is properly zeroed out.
> 
> Consider the following scenario where the overlay is shorter than its
> backing file:
> 
>      base.qcow2:     AAAAAAAA
>      overlay.qcow2:  BBBB
> 
> When resizing (extending) overlay.qcow2, the new blocks should not stay
> unallocated and make the additional As from base.qcow2 visible like
> before this patch, but zeros should be read.
> 
> A similar case happens with the various variants of a commit job when an
> intermediate file is short (- for unallocated):
> 
>      base.qcow2:     A-A-AAAA
>      mid.qcow2:      BB-B
>      top.qcow2:      C--C--C-
> 
> After commit top.qcow2 to mid.qcow2, the following happens:
> 
>      mid.qcow2:      CB-C00C0 (correct result)
>      mid.qcow2:      CB-C--C- (before this fix)
> 
> Without the fix, blocks that previously read as zeros on top.qcow2
> suddenly turn into A.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/io.c | 32 ++++++++++++++++++++++++++++++++
>   1 file changed, 32 insertions(+)
> 

> +    if (new_bytes && bs->backing && prealloc == PREALLOC_MODE_OFF) {
> +        int64_t backing_len;
> +
> +        backing_len = bdrv_getlength(backing_bs(bs));
> +        if (backing_len < 0) {
> +            ret = backing_len;
> +            goto out;
> +        }
> +
> +        if (backing_len > old_size) {
> +            ret = bdrv_co_do_pwrite_zeroes(
> +                    bs, old_size, MIN(new_bytes, backing_len - old_size),
> +                    BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP);
> +            if (ret < 0) {
> +                goto out;
> +            }
> +        }
> +    }

Note that if writing zeroes is not fast, and it turns out that we copy a 
lot of data rather than unallocated sections from the image being 
committed, that this can actually slow things down (doing a bulk 
pre-zero doubles up data I/O unless it is fast, which is why we added 
BDRV_REQ_NO_FALLBACK to avoid slow pre-zeroing).  However, the 
complication of zeroing only the unallocated clusters rather than a bulk 
pre-zeroing for something that is an unlikely corner case (how often do 
you create an overlay shorter than the backing file?) is not worth the 
extra code maintenance (unlike in the 'qemu-img convert' case where it 
was worth the optimization). So I'm fine with how you fixed it here.

Reviewed-by: Eric Blake <eblake@redhat.com>
Max Reitz Nov. 21, 2019, 8:59 a.m. UTC | #2
On 20.11.19 19:44, Kevin Wolf wrote:
> When extending the size of an image that has a backing file larger than
> its old size, make sure that the backing file data doesn't become
> visible in the guest, but the added area is properly zeroed out.
> 
> Consider the following scenario where the overlay is shorter than its
> backing file:
> 
>     base.qcow2:     AAAAAAAA
>     overlay.qcow2:  BBBB
> 
> When resizing (extending) overlay.qcow2, the new blocks should not stay
> unallocated and make the additional As from base.qcow2 visible like
> before this patch, but zeros should be read.
> 
> A similar case happens with the various variants of a commit job when an
> intermediate file is short (- for unallocated):
> 
>     base.qcow2:     A-A-AAAA
>     mid.qcow2:      BB-B
>     top.qcow2:      C--C--C-
> 
> After commit top.qcow2 to mid.qcow2, the following happens:
> 
>     mid.qcow2:      CB-C00C0 (correct result)
>     mid.qcow2:      CB-C--C- (before this fix)
> 
> Without the fix, blocks that previously read as zeros on top.qcow2
> suddenly turn into A.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/io.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)

Zeroing the intersection may take some time.  So is it right for QMP’s
block_resize to do this, seeing it is a synchronous operation?

As far as I can tell, jobs actually have the same problem.  I don’t
think mirror or commit have a pause point before truncating, so they
still block the monitor there, don’t they?

Max
Vladimir Sementsov-Ogievskiy Nov. 21, 2019, 9:46 a.m. UTC | #3
21.11.2019 11:59, Max Reitz wrote:
> On 20.11.19 19:44, Kevin Wolf wrote:
>> When extending the size of an image that has a backing file larger than
>> its old size, make sure that the backing file data doesn't become
>> visible in the guest, but the added area is properly zeroed out.
>>
>> Consider the following scenario where the overlay is shorter than its
>> backing file:
>>
>>      base.qcow2:     AAAAAAAA
>>      overlay.qcow2:  BBBB
>>
>> When resizing (extending) overlay.qcow2, the new blocks should not stay
>> unallocated and make the additional As from base.qcow2 visible like
>> before this patch, but zeros should be read.
>>
>> A similar case happens with the various variants of a commit job when an
>> intermediate file is short (- for unallocated):
>>
>>      base.qcow2:     A-A-AAAA
>>      mid.qcow2:      BB-B
>>      top.qcow2:      C--C--C-
>>
>> After commit top.qcow2 to mid.qcow2, the following happens:
>>
>>      mid.qcow2:      CB-C00C0 (correct result)
>>      mid.qcow2:      CB-C--C- (before this fix)
>>
>> Without the fix, blocks that previously read as zeros on top.qcow2
>> suddenly turn into A.
>>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>>   block/io.c | 32 ++++++++++++++++++++++++++++++++
>>   1 file changed, 32 insertions(+)
> 
> Zeroing the intersection may take some time.  So is it right for QMP’s
> block_resize to do this, seeing it is a synchronous operation?
> 
> As far as I can tell, jobs actually have the same problem.  I don’t
> think mirror or commit have a pause point before truncating, so they
> still block the monitor there, don’t they?
> 
> Max
> 

For mirror there is alternative way: we may somehow cheat with block-status
to consider space after EOF as allocated on upper level, then mirror will
ZERO it by itself..
Kevin Wolf Nov. 21, 2019, 11:22 a.m. UTC | #4
Am 20.11.2019 um 22:15 hat Eric Blake geschrieben:
> On 11/20/19 12:44 PM, Kevin Wolf wrote:
> > When extending the size of an image that has a backing file larger than
> > its old size, make sure that the backing file data doesn't become
> > visible in the guest, but the added area is properly zeroed out.
> > 
> > Consider the following scenario where the overlay is shorter than its
> > backing file:
> > 
> >      base.qcow2:     AAAAAAAA
> >      overlay.qcow2:  BBBB
> > 
> > When resizing (extending) overlay.qcow2, the new blocks should not stay
> > unallocated and make the additional As from base.qcow2 visible like
> > before this patch, but zeros should be read.
> > 
> > A similar case happens with the various variants of a commit job when an
> > intermediate file is short (- for unallocated):
> > 
> >      base.qcow2:     A-A-AAAA
> >      mid.qcow2:      BB-B
> >      top.qcow2:      C--C--C-
> > 
> > After commit top.qcow2 to mid.qcow2, the following happens:
> > 
> >      mid.qcow2:      CB-C00C0 (correct result)
> >      mid.qcow2:      CB-C--C- (before this fix)
> > 
> > Without the fix, blocks that previously read as zeros on top.qcow2
> > suddenly turn into A.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >   block/io.c | 32 ++++++++++++++++++++++++++++++++
> >   1 file changed, 32 insertions(+)
> > 
> 
> > +    if (new_bytes && bs->backing && prealloc == PREALLOC_MODE_OFF) {
> > +        int64_t backing_len;
> > +
> > +        backing_len = bdrv_getlength(backing_bs(bs));
> > +        if (backing_len < 0) {
> > +            ret = backing_len;
> > +            goto out;
> > +        }
> > +
> > +        if (backing_len > old_size) {
> > +            ret = bdrv_co_do_pwrite_zeroes(
> > +                    bs, old_size, MIN(new_bytes, backing_len - old_size),
> > +                    BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP);
> > +            if (ret < 0) {
> > +                goto out;
> > +            }
> > +        }
> > +    }
> 
> Note that if writing zeroes is not fast, and it turns out that we copy a lot
> of data rather than unallocated sections from the image being committed,
> that this can actually slow things down (doing a bulk pre-zero doubles up
> data I/O unless it is fast, which is why we added BDRV_REQ_NO_FALLBACK to
> avoid slow pre-zeroing).  However, the complication of zeroing only the
> unallocated clusters rather than a bulk pre-zeroing

I'm not sure how there could already be any allocated clusters in the
area that we just added? (Apart from preallocation, of course, which is
why this is restricted to PREALLOC_MODE_OFF.)

You're right that if this truncate was called in the context of a commit
block job rather than a resize, the zeros might be overwritten later.
I'm not sure if leaving clusters unallocated while the job is running
(and might be cancelled) is right, though. On the other hand, while the
job is running, the target image on its own is invalid anyway.

> for something that is an unlikely corner case (how often do you create
> an overlay shorter than the backing file?) is not worth the extra code
> maintenance (unlike in the 'qemu-img convert' case where it was worth
> the optimization). So I'm fine with how you fixed it here.

Also note that raw doesn't support backing files and qcow2 generally
supports zero writes. If you use an external data file and backing file
at the same time and your file system doesn't support zero writes, you
could run into the problem. Or if you use a more unusual image format
(including qcow2 v2).

So it's really two corner cases combined.

Kevin
Kevin Wolf Nov. 21, 2019, 11:34 a.m. UTC | #5
Am 21.11.2019 um 09:59 hat Max Reitz geschrieben:
> On 20.11.19 19:44, Kevin Wolf wrote:
> > When extending the size of an image that has a backing file larger than
> > its old size, make sure that the backing file data doesn't become
> > visible in the guest, but the added area is properly zeroed out.
> > 
> > Consider the following scenario where the overlay is shorter than its
> > backing file:
> > 
> >     base.qcow2:     AAAAAAAA
> >     overlay.qcow2:  BBBB
> > 
> > When resizing (extending) overlay.qcow2, the new blocks should not stay
> > unallocated and make the additional As from base.qcow2 visible like
> > before this patch, but zeros should be read.
> > 
> > A similar case happens with the various variants of a commit job when an
> > intermediate file is short (- for unallocated):
> > 
> >     base.qcow2:     A-A-AAAA
> >     mid.qcow2:      BB-B
> >     top.qcow2:      C--C--C-
> > 
> > After commit top.qcow2 to mid.qcow2, the following happens:
> > 
> >     mid.qcow2:      CB-C00C0 (correct result)
> >     mid.qcow2:      CB-C--C- (before this fix)
> > 
> > Without the fix, blocks that previously read as zeros on top.qcow2
> > suddenly turn into A.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/io.c | 32 ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> 
> Zeroing the intersection may take some time.  So is it right for QMP’s
> block_resize to do this, seeing it is a synchronous operation?

What else would be right? Returning an error?

Common cases (raw and qcow2 v3 without external data files) are quick
anyway.

> As far as I can tell, jobs actually have the same problem.  I don’t
> think mirror or commit have a pause point before truncating, so they
> still block the monitor there, don’t they?

Do you really need a pause point? They call bdrv_co_truncate() from
inside the job coroutine, so it will yield. I would expect that this
is enough.

But in fact, all jobs have a pause point before even calling .run(), so
even if that made a difference, it should still be fine.

Kevin
Max Reitz Nov. 21, 2019, 12:21 p.m. UTC | #6
On 21.11.19 12:34, Kevin Wolf wrote:
> Am 21.11.2019 um 09:59 hat Max Reitz geschrieben:
>> On 20.11.19 19:44, Kevin Wolf wrote:
>>> When extending the size of an image that has a backing file larger than
>>> its old size, make sure that the backing file data doesn't become
>>> visible in the guest, but the added area is properly zeroed out.
>>>
>>> Consider the following scenario where the overlay is shorter than its
>>> backing file:
>>>
>>>     base.qcow2:     AAAAAAAA
>>>     overlay.qcow2:  BBBB
>>>
>>> When resizing (extending) overlay.qcow2, the new blocks should not stay
>>> unallocated and make the additional As from base.qcow2 visible like
>>> before this patch, but zeros should be read.
>>>
>>> A similar case happens with the various variants of a commit job when an
>>> intermediate file is short (- for unallocated):
>>>
>>>     base.qcow2:     A-A-AAAA
>>>     mid.qcow2:      BB-B
>>>     top.qcow2:      C--C--C-
>>>
>>> After commit top.qcow2 to mid.qcow2, the following happens:
>>>
>>>     mid.qcow2:      CB-C00C0 (correct result)
>>>     mid.qcow2:      CB-C--C- (before this fix)
>>>
>>> Without the fix, blocks that previously read as zeros on top.qcow2
>>> suddenly turn into A.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>>  block/io.c | 32 ++++++++++++++++++++++++++++++++
>>>  1 file changed, 32 insertions(+)
>>
>> Zeroing the intersection may take some time.  So is it right for QMP’s
>> block_resize to do this, seeing it is a synchronous operation?
> 
> What else would be right? Returning an error?

Going through a deprecation cycle.

> Common cases (raw and qcow2 v3 without external data files) are quick
> anyway.

Well, but quick enough for a fully blocking operation?

>> As far as I can tell, jobs actually have the same problem.  I don’t
>> think mirror or commit have a pause point before truncating, so they
>> still block the monitor there, don’t they?
> 
> Do you really need a pause point? They call bdrv_co_truncate() from
> inside the job coroutine, so it will yield. I would expect that this
> is enough.

OK then.

> But in fact, all jobs have a pause point before even calling .run(), so
> even if that made a difference, it should still be fine.

Good.

But I believe this is still a problem for block_resize.  I don’t see why
this needs to be fixed in 4.2-rc3.  What’s the problem with going
through a proper deprecation cycle other than the fact that we can’t
start it in 4.2 because we don’t have a resize job yet?

Max
Kevin Wolf Nov. 21, 2019, 2:33 p.m. UTC | #7
Am 21.11.2019 um 13:21 hat Max Reitz geschrieben:
> On 21.11.19 12:34, Kevin Wolf wrote:
> > Am 21.11.2019 um 09:59 hat Max Reitz geschrieben:
> >> On 20.11.19 19:44, Kevin Wolf wrote:
> >>> When extending the size of an image that has a backing file larger than
> >>> its old size, make sure that the backing file data doesn't become
> >>> visible in the guest, but the added area is properly zeroed out.
> >>>
> >>> Consider the following scenario where the overlay is shorter than its
> >>> backing file:
> >>>
> >>>     base.qcow2:     AAAAAAAA
> >>>     overlay.qcow2:  BBBB
> >>>
> >>> When resizing (extending) overlay.qcow2, the new blocks should not stay
> >>> unallocated and make the additional As from base.qcow2 visible like
> >>> before this patch, but zeros should be read.
> >>>
> >>> A similar case happens with the various variants of a commit job when an
> >>> intermediate file is short (- for unallocated):
> >>>
> >>>     base.qcow2:     A-A-AAAA
> >>>     mid.qcow2:      BB-B
> >>>     top.qcow2:      C--C--C-
> >>>
> >>> After commit top.qcow2 to mid.qcow2, the following happens:
> >>>
> >>>     mid.qcow2:      CB-C00C0 (correct result)
> >>>     mid.qcow2:      CB-C--C- (before this fix)
> >>>
> >>> Without the fix, blocks that previously read as zeros on top.qcow2
> >>> suddenly turn into A.
> >>>
> >>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> >>> ---
> >>>  block/io.c | 32 ++++++++++++++++++++++++++++++++
> >>>  1 file changed, 32 insertions(+)
> >>
> >> Zeroing the intersection may take some time.  So is it right for QMP’s
> >> block_resize to do this, seeing it is a synchronous operation?
> > 
> > What else would be right? Returning an error?
> 
> Going through a deprecation cycle.

And keeping the buggy behaviour for two more releases?

> > Common cases (raw and qcow2 v3 without external data files) are quick
> > anyway.
> 
> Well, but quick enough for a fully blocking operation?

For raw definitely yes, because raw doesn't have backing files, so the
code will never run.

For qcow2, block_resize can already block for a relatively long time
while metadata tables are resized, clusters are discarded etc. I just
don't really see the difference in quality between that and allocating
some zero clusters in a corner case like having a short overlay.

Would you feel more comfortable if we set BDRV_REQ_NO_FALLBACK and
return an error if we can't zero out the area? We would have to
advertise that flag in bs->supported_zero_flags for qcow2 then (but
probably we should do that anyway?)

> >> As far as I can tell, jobs actually have the same problem.  I don’t
> >> think mirror or commit have a pause point before truncating, so they
> >> still block the monitor there, don’t they?
> > 
> > Do you really need a pause point? They call bdrv_co_truncate() from
> > inside the job coroutine, so it will yield. I would expect that this
> > is enough.
> 
> OK then.
> 
> > But in fact, all jobs have a pause point before even calling .run(), so
> > even if that made a difference, it should still be fine.
> 
> Good.
> 
> But I believe this is still a problem for block_resize.  I don’t see why
> this needs to be fixed in 4.2-rc3.  What’s the problem with going
> through a proper deprecation cycle other than the fact that we can’t
> start it in 4.2 because we don’t have a resize job yet?

That the behaviour is wrong.

For commit it's an image corruptor. For resize, I'll admit that it's
just wrong behaviour that is probably harmless in most cases, but it's
still wrong behaviour. It would be a corruptor in the same way as commit
if it allowed resizing intermediate nodes, but I _think_ the old-style
op blockers still forbid this. We'd have to double-check this if we
leave things broken for block_resize.

Anyway, so are you suggesting adding another parameter to
bdrv_co_truncate() that enables wrong, but quicker semantics, and that
would only be used by block_resize?

Kevin
Max Reitz Nov. 21, 2019, 3:25 p.m. UTC | #8
On 21.11.19 15:33, Kevin Wolf wrote:
> Am 21.11.2019 um 13:21 hat Max Reitz geschrieben:
>> On 21.11.19 12:34, Kevin Wolf wrote:
>>> Am 21.11.2019 um 09:59 hat Max Reitz geschrieben:
>>>> On 20.11.19 19:44, Kevin Wolf wrote:
>>>>> When extending the size of an image that has a backing file larger than
>>>>> its old size, make sure that the backing file data doesn't become
>>>>> visible in the guest, but the added area is properly zeroed out.
>>>>>
>>>>> Consider the following scenario where the overlay is shorter than its
>>>>> backing file:
>>>>>
>>>>>     base.qcow2:     AAAAAAAA
>>>>>     overlay.qcow2:  BBBB
>>>>>
>>>>> When resizing (extending) overlay.qcow2, the new blocks should not stay
>>>>> unallocated and make the additional As from base.qcow2 visible like
>>>>> before this patch, but zeros should be read.
>>>>>
>>>>> A similar case happens with the various variants of a commit job when an
>>>>> intermediate file is short (- for unallocated):
>>>>>
>>>>>     base.qcow2:     A-A-AAAA
>>>>>     mid.qcow2:      BB-B
>>>>>     top.qcow2:      C--C--C-
>>>>>
>>>>> After commit top.qcow2 to mid.qcow2, the following happens:
>>>>>
>>>>>     mid.qcow2:      CB-C00C0 (correct result)
>>>>>     mid.qcow2:      CB-C--C- (before this fix)
>>>>>
>>>>> Without the fix, blocks that previously read as zeros on top.qcow2
>>>>> suddenly turn into A.
>>>>>
>>>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>>>> ---
>>>>>  block/io.c | 32 ++++++++++++++++++++++++++++++++
>>>>>  1 file changed, 32 insertions(+)
>>>>
>>>> Zeroing the intersection may take some time.  So is it right for QMP’s
>>>> block_resize to do this, seeing it is a synchronous operation?
>>>
>>> What else would be right? Returning an error?
>>
>> Going through a deprecation cycle.
> 
> And keeping the buggy behaviour for two more releases?

This sounds like you don’t care about adding another bug when it means
fixing this bug.  I do care.  And so all I was saying is that it seemed
problematic to me to fix the problem in this way, because clearly this
would make block_resize block the monitor in some cases and clearly that
would be a problem, if not a bug.

(And that’s precisely what can be said about the current block_resize
behavior of revealing previous backing file data: It is a problem, but I
wouldn’t call it a full-on bug.  It just seems to me that nobody has
ever really thought about it.)

Also, I don’t see this as a really pressing issue for block_resize at
least, because this isn’t a recent regression or anything.  It was just
the behavior we had, I believe everyone knew it, we just never thought
about whether it really is the best kind of behavior.  So, yes, I’m in
absolutely no hurry to fix this for block_resize.  (Commit is a
different story, but then again commit is a job already, so it doesn’t
suffer from the blocking issue.)


But of course this wasn’t all that you’re saying, you give a very good
point next.

>>> Common cases (raw and qcow2 v3 without external data files) are quick
>>> anyway.
>>
>> Well, but quick enough for a fully blocking operation?> For qcow2, block_resize can already block for a relatively long time
> while metadata tables are resized, clusters are discarded etc. I just
> don't really see the difference in quality between that and allocating
> some zero clusters in a corner case like having a short overlay.

Discarding cropped clusters when shrinking is a good point.  I didn’t
think of that.  So block_resize already has the very problem I was
afraid you’d introduce, because of course discarding isn’t very
different from quick-zeroing.

> Would you feel more comfortable if we set BDRV_REQ_NO_FALLBACK and
> return an error if we can't zero out the area? We would have to
> advertise that flag in bs->supported_zero_flags for qcow2 then (but
> probably we should do that anyway?)

Hm.  I don’t feel that bad about disallowing this edge case (growing a
shrunken overlay) for potentially all non-qcow2v3 formats.  I don’t know
how bad it would be for users other than block_resize, though.

(And I suppose we want a resize job anyway then, and that would work for
those cases?)

>>>> As far as I can tell, jobs actually have the same problem.  I don’t
>>>> think mirror or commit have a pause point before truncating, so they
>>>> still block the monitor there, don’t they?
>>>
>>> Do you really need a pause point? They call bdrv_co_truncate() from
>>> inside the job coroutine, so it will yield. I would expect that this
>>> is enough.
>>
>> OK then.
>>
>>> But in fact, all jobs have a pause point before even calling .run(), so
>>> even if that made a difference, it should still be fine.
>>
>> Good.
>>
>> But I believe this is still a problem for block_resize.  I don’t see why
>> this needs to be fixed in 4.2-rc3.  What’s the problem with going
>> through a proper deprecation cycle other than the fact that we can’t
>> start it in 4.2 because we don’t have a resize job yet?
> 
> That the behaviour is wrong.

I know a couple of wrong behaviors that won’t be fixed in 4.2.

> For commit it's an image corruptor.

That’s a reason why we need it in 4.2.  It’s no reason why we need it
for block_resize.

> For resize, I'll admit that it's
> just wrong behaviour that is probably harmless in most cases, but it's
> still wrong behaviour. It would be a corruptor in the same way as commit
> if it allowed resizing intermediate nodes, but I _think_ the old-style
> op blockers still forbid this. We'd have to double-check this if we
> leave things broken for block_resize.
>> Anyway, so are you suggesting adding another parameter to
> bdrv_co_truncate() that enables wrong, but quicker semantics, and that
> would only be used by block_resize?

That would certainly be a possibility, yes.

I like your suggestion of only allowing it with NO_FALLBACK, but I
suppose we’d want the same parameter for that, too, because users other
than block_resize probably prefer a slow zeroing over an error.

So to me the question then is whether the added complexity is worth it
for an rc3.

Max
Alberto Garcia Nov. 22, 2019, 2:07 p.m. UTC | #9
On Thu 21 Nov 2019 03:33:31 PM CET, Kevin Wolf wrote:
> For commit it's an image corruptor. For resize, I'll admit that it's
> just wrong behaviour that is probably harmless in most cases, but it's
> still wrong behaviour. It would be a corruptor in the same way as
> commit if it allowed resizing intermediate nodes, but I _think_ the
> old-style op blockers still forbid this. We'd have to double-check
> this if we leave things broken for block_resize.

I tried but cannot make block_resize touch an intermediate image:

    if (!bdrv_is_first_non_filter(bs)) {
        error_setg(errp, QERR_FEATURE_DISABLED, "resize");
        goto out;
    }

Berto
Alberto Garcia Nov. 22, 2019, 2:27 p.m. UTC | #10
On Wed 20 Nov 2019 07:44:57 PM CET, Kevin Wolf wrote:
> When extending the size of an image that has a backing file larger than
> its old size, make sure that the backing file data doesn't become
> visible in the guest, but the added area is properly zeroed out.
>
> Consider the following scenario where the overlay is shorter than its
> backing file:
>
>     base.qcow2:     AAAAAAAA
>     overlay.qcow2:  BBBB
>
> When resizing (extending) overlay.qcow2, the new blocks should not stay
> unallocated and make the additional As from base.qcow2 visible like
> before this patch, but zeros should be read.
>
> A similar case happens with the various variants of a commit job when an
> intermediate file is short (- for unallocated):
>
>     base.qcow2:     A-A-AAAA
>     mid.qcow2:      BB-B
>     top.qcow2:      C--C--C-
>
> After commit top.qcow2 to mid.qcow2, the following happens:
>
>     mid.qcow2:      CB-C00C0 (correct result)
>     mid.qcow2:      CB-C--C- (before this fix)
>
> Without the fix, blocks that previously read as zeros on top.qcow2
> suddenly turn into A.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto
diff mbox series

Patch

diff --git a/block/io.c b/block/io.c
index 003f4ea38c..6a5144f8d2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3385,12 +3385,44 @@  int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
     ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not refresh total sector count");
+        goto fail_refresh_total_sectors;
     } else {
         offset = bs->total_sectors * BDRV_SECTOR_SIZE;
     }
+
+    /*
+     * If the image has a backing file that is large enough that it would
+     * provide data for the new area, we cannot leave it unallocated because
+     * then the backing file content would become visible. Instead, zero-fill
+     * the area where backing file and new area overlap.
+     *
+     * Note that if the image has a backing file, but was opened without the
+     * backing file, taking care of keeping things consistent with that backing
+     * file is the user's responsibility.
+     */
+    if (new_bytes && bs->backing && prealloc == PREALLOC_MODE_OFF) {
+        int64_t backing_len;
+
+        backing_len = bdrv_getlength(backing_bs(bs));
+        if (backing_len < 0) {
+            ret = backing_len;
+            goto out;
+        }
+
+        if (backing_len > old_size) {
+            ret = bdrv_co_do_pwrite_zeroes(
+                    bs, old_size, MIN(new_bytes, backing_len - old_size),
+                    BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP);
+            if (ret < 0) {
+                goto out;
+            }
+        }
+    }
+
     /* It's possible that truncation succeeded but refresh_total_sectors
      * failed, but the latter doesn't affect how we should finish the request.
      * Pass 0 as the last parameter so that dirty bitmaps etc. are handled. */
+fail_refresh_total_sectors:
     bdrv_co_write_req_finish(child, offset - new_bytes, new_bytes, &req, 0);
 
 out: