mbox series

[0/5] block/qcow2-bitmap: Enable resize with persistent bitmaps

Message ID 20190305234337.18353-1-jsnow@redhat.com (mailing list archive)
Headers show
Series block/qcow2-bitmap: Enable resize with persistent bitmaps | expand

Message

John Snow March 5, 2019, 11:43 p.m. UTC
This series aims to enable block resizes when persistent bitmaps are
in use. The basic approach here is to recognize that we now load all
persistent bitmaps in memory, and so we can rely on in-memory resizes
and then flush the changed metadata back to disk.

One part that is potentially now quite strange is that bitmap resizes
may happen twice: once during the qcow2 resize event only if persistent
bitmaps are found, and then again as part of the generic resize callback
event whether or not we have any persistent bitmaps.

The second round is required if we are not using qcow2 or we have only
transient bitmaps. The first round is required as we wish to flush the
bitmaps back to disk atomically with the qcow2 resize to avoid violating
our invariants for the bitmap metadata which is checked in many places.

This is harmless; hbitmap_truncate will recognize the second round as
a no-op.

John Snow (5):
  block/qcow2-bitmap: Skip length check in some cases
  block/qcow2-bitmap: Allow bitmap flushing
  block/qcow2-bitmap: don't remove bitmaps on reopen
  block/qcow2-bitmap: Allow resizes with persistent bitmaps
  tests/qemu-iotests: add bitmap resize test 246

 block/qcow2.h              |   2 +
 block/qcow2-bitmap.c       | 123 +++++++++++----
 block/qcow2.c              |  27 +++-
 tests/qemu-iotests/246     | 114 ++++++++++++++
 tests/qemu-iotests/246.out | 296 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 6 files changed, 528 insertions(+), 35 deletions(-)
 create mode 100755 tests/qemu-iotests/246
 create mode 100644 tests/qemu-iotests/246.out

Comments

John Snow March 6, 2019, 12:02 a.m. UTC | #1
On 3/5/19 6:43 PM, John Snow wrote:

++ This series relies on [Qemu-devel] [PATCH v3 0/7] bitmaps: add
inconsistent bit, you can apply it on top of those patches, or you can
use my github staging branch as a basis:

https://github.com/jnsnow/qemu/tree/bitmaps

> This series aims to enable block resizes when persistent bitmaps are
> in use. The basic approach here is to recognize that we now load all
> persistent bitmaps in memory, and so we can rely on in-memory resizes
> and then flush the changed metadata back to disk.
> 
> One part that is potentially now quite strange is that bitmap resizes
> may happen twice: once during the qcow2 resize event only if persistent
> bitmaps are found, and then again as part of the generic resize callback
> event whether or not we have any persistent bitmaps.
> 
> The second round is required if we are not using qcow2 or we have only
> transient bitmaps. The first round is required as we wish to flush the
> bitmaps back to disk atomically with the qcow2 resize to avoid violating
> our invariants for the bitmap metadata which is checked in many places.
> 
> This is harmless; hbitmap_truncate will recognize the second round as
> a no-op.
> 
> John Snow (5):
>   block/qcow2-bitmap: Skip length check in some cases
>   block/qcow2-bitmap: Allow bitmap flushing
>   block/qcow2-bitmap: don't remove bitmaps on reopen
>   block/qcow2-bitmap: Allow resizes with persistent bitmaps
>   tests/qemu-iotests: add bitmap resize test 246
> 
>  block/qcow2.h              |   2 +
>  block/qcow2-bitmap.c       | 123 +++++++++++----
>  block/qcow2.c              |  27 +++-
>  tests/qemu-iotests/246     | 114 ++++++++++++++
>  tests/qemu-iotests/246.out | 296 +++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/group   |   1 +
>  6 files changed, 528 insertions(+), 35 deletions(-)
>  create mode 100755 tests/qemu-iotests/246
>  create mode 100644 tests/qemu-iotests/246.out
>
no-reply@patchew.org March 10, 2019, 4:50 p.m. UTC | #2
Patchew URL: https://patchew.org/QEMU/20190305234337.18353-1-jsnow@redhat.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-mingw@fedora SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      crypto/tlssession.o
  CC      crypto/secret.o
/tmp/qemu-test/src/block/qcow2-bitmap.c: In function 'qcow2_truncate_bitmaps_check':
/tmp/qemu-test/src/block/qcow2-bitmap.c:1198:13: error: implicit declaration of function 'bdrv_dirty_bitmap_check'; did you mean 'bdrv_dirty_bitmap_lock'? [-Werror=implicit-function-declaration]
         if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, errp)) {
             ^~~~~~~~~~~~~~~~~~~~~~~
             bdrv_dirty_bitmap_lock
/tmp/qemu-test/src/block/qcow2-bitmap.c:1198:13: error: nested extern declaration of 'bdrv_dirty_bitmap_check' [-Werror=nested-externs]
/tmp/qemu-test/src/block/qcow2-bitmap.c:1198:45: error: 'BDRV_BITMAP_DEFAULT' undeclared (first use in this function); did you mean 'DM_OUT_DEFAULT'?
         if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, errp)) {
                                             ^~~~~~~~~~~~~~~~~~~
                                             DM_OUT_DEFAULT


The full log is available at
http://patchew.org/logs/20190305234337.18353-1-jsnow@redhat.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Eric Blake March 11, 2019, 4:18 p.m. UTC | #3
On 3/5/19 5:43 PM, John Snow wrote:
> This series aims to enable block resizes when persistent bitmaps are
> in use. The basic approach here is to recognize that we now load all
> persistent bitmaps in memory, and so we can rely on in-memory resizes
> and then flush the changed metadata back to disk.
> 
> One part that is potentially now quite strange is that bitmap resizes
> may happen twice: once during the qcow2 resize event only if persistent
> bitmaps are found, and then again as part of the generic resize callback
> event whether or not we have any persistent bitmaps.
> 
> The second round is required if we are not using qcow2 or we have only
> transient bitmaps. The first round is required as we wish to flush the
> bitmaps back to disk atomically with the qcow2 resize to avoid violating
> our invariants for the bitmap metadata which is checked in many places.
> 
> This is harmless; hbitmap_truncate will recognize the second round as
> a no-op.

FWIW - I have not yet reviewed this series closely, but I think it would
be wise to get this initial cut in before softfreeze (we can make
further tweaks to fix bugs in assumptions during rc1 and rc2, but it's a
lot harder to add the series at all if it misses softfreeze).
Vladimir Sementsov-Ogievskiy March 11, 2019, 5:36 p.m. UTC | #4
11.03.2019 19:18, Eric Blake wrote:
> On 3/5/19 5:43 PM, John Snow wrote:
>> This series aims to enable block resizes when persistent bitmaps are
>> in use. The basic approach here is to recognize that we now load all
>> persistent bitmaps in memory, and so we can rely on in-memory resizes
>> and then flush the changed metadata back to disk.
>>
>> One part that is potentially now quite strange is that bitmap resizes
>> may happen twice: once during the qcow2 resize event only if persistent
>> bitmaps are found, and then again as part of the generic resize callback
>> event whether or not we have any persistent bitmaps.
>>
>> The second round is required if we are not using qcow2 or we have only
>> transient bitmaps. The first round is required as we wish to flush the
>> bitmaps back to disk atomically with the qcow2 resize to avoid violating
>> our invariants for the bitmap metadata which is checked in many places.
>>
>> This is harmless; hbitmap_truncate will recognize the second round as
>> a no-op.
> 
> FWIW - I have not yet reviewed this series closely, but I think it would
> be wise to get this initial cut in before softfreeze (we can make
> further tweaks to fix bugs in assumptions during rc1 and rc2, but it's a
> lot harder to add the series at all if it misses softfreeze).
> 

I still not convinced that we need bitmap flushing. I think (and Den supports
me) that it's a bad idea. It makes things more difficult without a reason,
except improving debugging with --force-share which should never be used in
production.

Bitmaps are stored in qcow2_inactivate() which is true place for flushing caches.
Eric Blake March 11, 2019, 5:48 p.m. UTC | #5
On 3/11/19 12:36 PM, Vladimir Sementsov-Ogievskiy wrote:

>> FWIW - I have not yet reviewed this series closely, but I think it would
>> be wise to get this initial cut in before softfreeze (we can make
>> further tweaks to fix bugs in assumptions during rc1 and rc2, but it's a
>> lot harder to add the series at all if it misses softfreeze).
>>
> 
> I still not convinced that we need bitmap flushing. I think (and Den supports
> me) that it's a bad idea. It makes things more difficult without a reason,
> except improving debugging with --force-share which should never be used in
> production.

And I'm still not convinced that adding bitmap flushing will add a
benchmarkable delay to operation. Although debugging may not be needed
in a production environment, having it after a failure can be a lifesaver.

Looking at this from another point of view: if this series goes in now
(with its use of limited bitmap flushing on resize in order to make
coding bitmap resize easier, rather than global bitmap flushing after
every bitmap add/delete), then the only time that flushing to disk
happens is during resize (which is currently flat-out forbidden), so it
will not penalize existing use cases. And we still have rc1/rc2 to fix
any bugs if we can come up with some other way to get resize to work
without flushing (or even revert things if it proves to be too
invasive). But as a feature addition, if this series does not go in now,
then bitmap resize is stuck waiting for 4.2, regardless of what we can
figure out during rc1/rc2.

> 
> Bitmaps are stored in qcow2_inactivate() which is true place for flushing caches.
> 

There doesn't have to be just one place for flushing caches.  In terms
of IOPs, how much overhead does a flush cost in relation to everything
else?  And how infrequent are bitmap resize, add, and delete events?
Optimizing to the bare minimum of flushes just because it adds minimal
overhead may be premature optimization if that overhead is in the noise
anyway.
John Snow March 11, 2019, 6:05 p.m. UTC | #6
On 3/11/19 1:36 PM, Vladimir Sementsov-Ogievskiy wrote:
> 11.03.2019 19:18, Eric Blake wrote:
>> On 3/5/19 5:43 PM, John Snow wrote:
>>> This series aims to enable block resizes when persistent bitmaps are
>>> in use. The basic approach here is to recognize that we now load all
>>> persistent bitmaps in memory, and so we can rely on in-memory resizes
>>> and then flush the changed metadata back to disk.
>>>
>>> One part that is potentially now quite strange is that bitmap resizes
>>> may happen twice: once during the qcow2 resize event only if persistent
>>> bitmaps are found, and then again as part of the generic resize callback
>>> event whether or not we have any persistent bitmaps.
>>>
>>> The second round is required if we are not using qcow2 or we have only
>>> transient bitmaps. The first round is required as we wish to flush the
>>> bitmaps back to disk atomically with the qcow2 resize to avoid violating
>>> our invariants for the bitmap metadata which is checked in many places.
>>>
>>> This is harmless; hbitmap_truncate will recognize the second round as
>>> a no-op.
>>
>> FWIW - I have not yet reviewed this series closely, but I think it would
>> be wise to get this initial cut in before softfreeze (we can make
>> further tweaks to fix bugs in assumptions during rc1 and rc2, but it's a
>> lot harder to add the series at all if it misses softfreeze).
>>
> 
> I still not convinced that we need bitmap flushing. I think (and Den supports
> me) that it's a bad idea. It makes things more difficult without a reason,
> except improving debugging with --force-share which should never be used in
> production.
> 

I know you don't want it in the general case, but I think resizing
represents a genuine case for exactly when we want it.

Set aside your concerns about general case flushing for now -- this is a
distraction -- and let us consider ONLY resize events.

If QEMU crashes after a resize event, we will be unable to even open
qcow2 images if we don't resize the metadata information and write it
back out to disk.

It is far simpler -- logically and technically -- to just write this
data out to disk during a resize event. It won't happen often, shouldn't
incur much of an IO penalty, and (I think) is algorithmically
straightforward.

If it's really that much of a problem, we can revise it during the RC
period to eliminate the flush.

> Bitmaps are stored in qcow2_inactivate() which is true place for flushing caches.
>
Vladimir Sementsov-Ogievskiy March 11, 2019, 6:05 p.m. UTC | #7
11.03.2019 20:48, Eric Blake wrote:
> On 3/11/19 12:36 PM, Vladimir Sementsov-Ogievskiy wrote:
> 
>>> FWIW - I have not yet reviewed this series closely, but I think it would
>>> be wise to get this initial cut in before softfreeze (we can make
>>> further tweaks to fix bugs in assumptions during rc1 and rc2, but it's a
>>> lot harder to add the series at all if it misses softfreeze).
>>>
>>
>> I still not convinced that we need bitmap flushing. I think (and Den supports
>> me) that it's a bad idea. It makes things more difficult without a reason,
>> except improving debugging with --force-share which should never be used in
>> production.
> 
> And I'm still not convinced that adding bitmap flushing will add a
> benchmarkable delay to operation. Although debugging may not be needed
> in a production environment, having it after a failure can be a lifesaver.

What is the real debugging case, when we need this information? Keeping in mind,
that it may still be incorrect due to different error paths? (resize failed on
some point before flush, or resize successed but flush failed)

> 
> Looking at this from another point of view: if this series goes in now
> (with its use of limited bitmap flushing on resize in order to make
> coding bitmap resize easier, rather than global bitmap flushing after
> every bitmap add/delete), then the only time that flushing to disk
> happens is during resize (which is currently flat-out forbidden), so it
> will not penalize existing use cases. And we still have rc1/rc2 to fix
> any bugs if we can come up with some other way to get resize to work
> without flushing (or even revert things if it proves to be too
> invasive). But as a feature addition, if this series does not go in now,
> then bitmap resize is stuck waiting for 4.2, regardless of what we can
> figure out during rc1/rc2.
> 
>>
>> Bitmaps are stored in qcow2_inactivate() which is true place for flushing caches.
>>
> 
> There doesn't have to be just one place for flushing caches.  In terms
> of IOPs, how much overhead does a flush cost in relation to everything
> else?

Assume we have 16tb disk. Bitmap with default granularity will take 30mb.
Assume that we have several such bitmaps (for example, several disabled and
one active, remember differential backups and related staff). So, it will
be several hundreds of megabytes.

   And how infrequent are bitmap resize, add, and delete events?
> Optimizing to the bare minimum of flushes just because it adds minimal
> overhead may be premature optimization if that overhead is in the noise
> anyway.
>
Vladimir Sementsov-Ogievskiy March 11, 2019, 6:26 p.m. UTC | #8
11.03.2019 21:05, John Snow wrote:
> 
> 
> On 3/11/19 1:36 PM, Vladimir Sementsov-Ogievskiy wrote:
>> 11.03.2019 19:18, Eric Blake wrote:
>>> On 3/5/19 5:43 PM, John Snow wrote:
>>>> This series aims to enable block resizes when persistent bitmaps are
>>>> in use. The basic approach here is to recognize that we now load all
>>>> persistent bitmaps in memory, and so we can rely on in-memory resizes
>>>> and then flush the changed metadata back to disk.
>>>>
>>>> One part that is potentially now quite strange is that bitmap resizes
>>>> may happen twice: once during the qcow2 resize event only if persistent
>>>> bitmaps are found, and then again as part of the generic resize callback
>>>> event whether or not we have any persistent bitmaps.
>>>>
>>>> The second round is required if we are not using qcow2 or we have only
>>>> transient bitmaps. The first round is required as we wish to flush the
>>>> bitmaps back to disk atomically with the qcow2 resize to avoid violating
>>>> our invariants for the bitmap metadata which is checked in many places.
>>>>
>>>> This is harmless; hbitmap_truncate will recognize the second round as
>>>> a no-op.
>>>
>>> FWIW - I have not yet reviewed this series closely, but I think it would
>>> be wise to get this initial cut in before softfreeze (we can make
>>> further tweaks to fix bugs in assumptions during rc1 and rc2, but it's a
>>> lot harder to add the series at all if it misses softfreeze).
>>>
>>
>> I still not convinced that we need bitmap flushing. I think (and Den supports
>> me) that it's a bad idea. It makes things more difficult without a reason,
>> except improving debugging with --force-share which should never be used in
>> production.
>>
> 
> I know you don't want it in the general case, but I think resizing
> represents a genuine case for exactly when we want it.
> 
> Set aside your concerns about general case flushing for now -- this is a
> distraction -- and let us consider ONLY resize events.
> 
> If QEMU crashes after a resize event, we will be unable to even open
> qcow2 images if we don't resize the metadata information and write it
> back out to disk.
> 
> It is far simpler -- logically and technically -- to just write this
> data out to disk during a resize event. It won't happen often, shouldn't
> incur much of an IO penalty, and (I think) is algorithmically
> straightforward.
> 
> If it's really that much of a problem, we can revise it during the RC
> period to eliminate the flush.
> 
>> Bitmaps are stored in qcow2_inactivate() which is true place for flushing caches.
>>


Hmm, you are so sure, that I can't continue arguing)
What about patch 3, did you test it?
It's just strange for me that we are going to push unreviewed series just to have
the feature in 4.0.. Why is it so important?
Also, we have no comments from Max and Kevin who are maintainers of the only file,
changed in the series. And why this don't go through their trees?
This all just breaks all patterns I'm used to..

And I hope, I'll send a counter-propasal without flushing in a few minutes...
John Snow March 11, 2019, 6:35 p.m. UTC | #9
On 3/11/19 2:26 PM, Vladimir Sementsov-Ogievskiy wrote:
> 11.03.2019 21:05, John Snow wrote:
>>
>>
>> On 3/11/19 1:36 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> 11.03.2019 19:18, Eric Blake wrote:
>>>> On 3/5/19 5:43 PM, John Snow wrote:
>>>>> This series aims to enable block resizes when persistent bitmaps are
>>>>> in use. The basic approach here is to recognize that we now load all
>>>>> persistent bitmaps in memory, and so we can rely on in-memory resizes
>>>>> and then flush the changed metadata back to disk.
>>>>>
>>>>> One part that is potentially now quite strange is that bitmap resizes
>>>>> may happen twice: once during the qcow2 resize event only if persistent
>>>>> bitmaps are found, and then again as part of the generic resize callback
>>>>> event whether or not we have any persistent bitmaps.
>>>>>
>>>>> The second round is required if we are not using qcow2 or we have only
>>>>> transient bitmaps. The first round is required as we wish to flush the
>>>>> bitmaps back to disk atomically with the qcow2 resize to avoid violating
>>>>> our invariants for the bitmap metadata which is checked in many places.
>>>>>
>>>>> This is harmless; hbitmap_truncate will recognize the second round as
>>>>> a no-op.
>>>>
>>>> FWIW - I have not yet reviewed this series closely, but I think it would
>>>> be wise to get this initial cut in before softfreeze (we can make
>>>> further tweaks to fix bugs in assumptions during rc1 and rc2, but it's a
>>>> lot harder to add the series at all if it misses softfreeze).
>>>>
>>>
>>> I still not convinced that we need bitmap flushing. I think (and Den supports
>>> me) that it's a bad idea. It makes things more difficult without a reason,
>>> except improving debugging with --force-share which should never be used in
>>> production.
>>>
>>
>> I know you don't want it in the general case, but I think resizing
>> represents a genuine case for exactly when we want it.
>>
>> Set aside your concerns about general case flushing for now -- this is a
>> distraction -- and let us consider ONLY resize events.
>>
>> If QEMU crashes after a resize event, we will be unable to even open
>> qcow2 images if we don't resize the metadata information and write it
>> back out to disk.
>>
>> It is far simpler -- logically and technically -- to just write this
>> data out to disk during a resize event. It won't happen often, shouldn't
>> incur much of an IO penalty, and (I think) is algorithmically
>> straightforward.
>>
>> If it's really that much of a problem, we can revise it during the RC
>> period to eliminate the flush.
>>
>>> Bitmaps are stored in qcow2_inactivate() which is true place for flushing caches.
>>>
> 
> 
> Hmm, you are so sure, that I can't continue arguing)
> What about patch 3, did you test it?

I'm going to scrap that part. I'm not staging this series as-is.

> It's just strange for me that we are going to push unreviewed series just to have
> the feature in 4.0.. Why is it so important?

Libvirt is finalizing its API to use the bitmaps feature and consumers
above libvirt, oVirt et al, want to be able to resize their disks. I
consider it part of the minimum necessary package to deliver the libvirt
API, and QEMU 4.0 will be a minimum version to use that API.

> Also, we have no comments from Max and Kevin who are maintainers of the only file,
> changed in the series. And why this don't go through their trees?

I've discussed it with Kevin in the IRC channel; I don't think he has
strong feelings on flushing bitmap metadata versus not as much as he
cares that we make sure we leave openable images on crash. In general I
think bitmaps are "our problem".

> This all just breaks all patterns I'm used to..
> 

Come now, this is melodramatic. This changes nothing about how bitmaps
work except during resize operations which used to be prohibited.

> And I hope, I'll send a counter-propasal without flushing in a few minutes...
> 
Go ahead.

I'm going to have to make a decision tonight, though.

--js
Kevin Wolf March 12, 2019, 10:20 a.m. UTC | #10
Am 11.03.2019 um 19:35 hat John Snow geschrieben:
> On 3/11/19 2:26 PM, Vladimir Sementsov-Ogievskiy wrote:
> > Also, we have no comments from Max and Kevin who are maintainers of the only file,
> > changed in the series. And why this don't go through their trees?
> 
> I've discussed it with Kevin in the IRC channel; I don't think he has
> strong feelings on flushing bitmap metadata versus not as much as he
> cares that we make sure we leave openable images on crash. In general I
> think bitmaps are "our problem".

This, pretty much. I haven't been involved much in the bitmap code
before, and today I have other series to deal with for soft freeze. So
I'll trust you to do the right thing.

And the right thing means at least that after the resize operation has
returned (either success or failure), a crash doesn't lead to an image
file that can't be opened again.

Ideally, the same would be true for a crash in the middle the resize
operation, but as John explained to me that we require an exact match of
the size, this doesn't seem to completely work out without more changes:
Either a way to update both atomically (journal?) or at least accept
oversized bitmaps so we can just choose the right order. The latter
sounds tempting to me, though I haven't put much thought in the
consequences this would have.

Kevin
Vladimir Sementsov-Ogievskiy March 12, 2019, 1:39 p.m. UTC | #11
12.03.2019 13:20, Kevin Wolf wrote:
> Am 11.03.2019 um 19:35 hat John Snow geschrieben:
>> On 3/11/19 2:26 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> Also, we have no comments from Max and Kevin who are maintainers of the only file,
>>> changed in the series. And why this don't go through their trees?
>>
>> I've discussed it with Kevin in the IRC channel; I don't think he has
>> strong feelings on flushing bitmap metadata versus not as much as he
>> cares that we make sure we leave openable images on crash. In general I
>> think bitmaps are "our problem".
> 
> This, pretty much. I haven't been involved much in the bitmap code
> before, and today I have other series to deal with for soft freeze. So
> I'll trust you to do the right thing.
> 
> And the right thing means at least that after the resize operation has
> returned (either success or failure), a crash doesn't lead to an image
> file that can't be opened again.
> 
> Ideally, the same would be true for a crash in the middle the resize
> operation, but as John explained to me that we require an exact match of
> the size,

Not exact, and bitmap size is not stored in the image. But it is checked,
that bitmap table is enough to store bitmap corresponding to current image size.

  this doesn't seem to completely work out without more changes:
> Either a way to update both atomically (journal?) or at least accept
> oversized bitmaps so we can just choose the right order. The latter
> sounds tempting to me, though I haven't put much thought in the
> consequences this would have.

I think correct way is not doing this check for inconsistent bitmaps, considering
their size to be possibly outdated. And I've sent a counter proposal which goes
this way.