[v2,2/3] block/mirror: Fix target backing BDS

Message ID 20160606144212.24074-3-mreitz@redhat.com
State New

Commit Message

Max Reitz June 6, 2016, 2:42 p.m. UTC
Currently, we are trying to move the backing BDS from the source to the
target in bdrv_replace_in_backing_chain() which is called from
mirror_exit(). However, mirror_complete() already tries to open the
target's backing chain with a call to bdrv_open_backing_file().

First, we should only set the target's backing BDS once. Second, the
mirroring block job has a better idea of what to set it to than the
generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
conditions on when to move the backing BDS from source to target are not
really correct).

Therefore, remove that code from bdrv_replace_in_backing_chain() and
leave it to mirror_complete().

However, mirror_complete() in turn pursues a questionable strategy by
employing bdrv_open_backing_file(): On the one hand, because this may
open the wrong backing file with drive-mirror in "existing" mode, or
because it will not override a possibly wrong backing file in the
blockdev-mirror case.

On the other hand, we want to reuse the existing backing chain of the
source instead of opening everything anew, because the latter results in
having multiple BDSs for a single physical file and thus potentially
concurrent access which we should try to avoid.

Thus, instead of invoking bdrv_open_backing_file(), just set the correct
backing BDS directly via bdrv_set_backing_hd(). Also, do so only when
mirror_complete() is certain to succeed.

In contrast to what bdrv_replace_in_backing_chain() did so far, we do
not need to drop the source's backing file.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c        |  8 --------
 block/mirror.c | 21 +++++++++++++--------
 2 files changed, 13 insertions(+), 16 deletions(-)

Comments

Kevin Wolf June 8, 2016, 9:32 a.m. UTC | #1
On 06.06.2016 at 16:42, Max Reitz wrote:
> Currently, we are trying to move the backing BDS from the source to the
> target in bdrv_replace_in_backing_chain() which is called from
> mirror_exit(). However, mirror_complete() already tries to open the
> target's backing chain with a call to bdrv_open_backing_file().
> 
> First, we should only set the target's backing BDS once. Second, the
> mirroring block job has a better idea of what to set it to than the
> generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
> conditions on when to move the backing BDS from source to target are not
> really correct).
> 
> Therefore, remove that code from bdrv_replace_in_backing_chain() and
> leave it to mirror_complete().
> 
> However, mirror_complete() in turn pursues a questionable strategy by
> employing bdrv_open_backing_file(): On the one hand, because this may
> open the wrong backing file with drive-mirror in "existing" mode, or
> because it will not override a possibly wrong backing file in the
> blockdev-mirror case.
> 
> On the other hand, we want to reuse the existing backing chain of the
> source instead of opening everything anew, because the latter results in
> having multiple BDSs for a single physical file and thus potentially
> concurrent access which we should try to avoid.

Careful, this "wrong" backing file might actually be intended!

Consider a case where you want to move an image with its whole backing
chain to different storage. In that case, you would copy all of the
backing files (cp is good enough, they are read-only), create the
destination image which already points at the copied backing chain, and
then mirror in "existing" mode.

The intention is obviously that after the job completion the new backing
chain is used and not the old one.

I know that such cases were discussed when mirroring was introduced, I'm
not sure whether it's actually used. We need some input there:

Eric, can you tell us whether libvirt makes use of such a setup?

Nir, I'm not sure who is the right person in oVirt these days, but do
you either know yourself whether oVirt requires this to work, or do you
know who else would know?

> Thus, instead of invoking bdrv_open_backing_file(), just set the correct
> backing BDS directly via bdrv_set_backing_hd(). Also, do so only when
> mirror_complete() is certain to succeed.
> 
> In contrast to what bdrv_replace_in_backing_chain() did so far, we do
> not need to drop the source's backing file.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Leaving the actual code review for later when we have decided what
semantics we even want.

Kevin
Paolo Bonzini June 8, 2016, 11:28 a.m. UTC | #2
----- Original Message -----
> From: "Kevin Wolf" <kwolf@redhat.com>
> To: "Max Reitz" <mreitz@redhat.com>
> Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, "Fam Zheng" <famz@redhat.com>, nsoffer@redhat.com,
> eblake@redhat.com, pbonzini@redhat.com
> Sent: Wednesday, June 8, 2016 11:32:29 AM
> Subject: Re: [PATCH v2 2/3] block/mirror: Fix target backing BDS
> 
> On 06.06.2016 at 16:42, Max Reitz wrote:
> > Currently, we are trying to move the backing BDS from the source to the
> > target in bdrv_replace_in_backing_chain() which is called from
> > mirror_exit(). However, mirror_complete() already tries to open the
> > target's backing chain with a call to bdrv_open_backing_file().
> > 
> > First, we should only set the target's backing BDS once. Second, the
> > mirroring block job has a better idea of what to set it to than the
> > generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
> > conditions on when to move the backing BDS from source to target are not
> > really correct).
> > 
> > Therefore, remove that code from bdrv_replace_in_backing_chain() and
> > leave it to mirror_complete().
> > 
> > However, mirror_complete() in turn pursues a questionable strategy by
> > employing bdrv_open_backing_file(): On the one hand, because this may
> > open the wrong backing file with drive-mirror in "existing" mode, or
> > because it will not override a possibly wrong backing file in the
> > blockdev-mirror case.
> 
> Careful, this "wrong" backing file might actually be intended!
> 
> Consider a case where you want to move an image with its whole backing
> chain to different storage. In that case, you would copy all of the
> backing files (cp is good enough, they are read-only), create the
> destination image which already points at the copied backing chain, and
> then mirror in "existing" mode.
> 
> The intention is obviously that after the job completion the new backing
> chain is used and not the old one.

Yes, this is the intention and it should not be changed.  In addition
to what Kevin said, you can use drive-mirror to collapse the image to a
single file; in this case, QEMU should not be using the backing files of
the source.

bdrv_open_backing_file() is used because what we want to do is to
"undo" the BDRV_O_NO_BACKING flag used by qmp_drive_mirror.

If the contents change under the guest's feet, it's the layers above
QEMU that have screwed up.

Paolo
Kevin Wolf June 8, 2016, 11:47 a.m. UTC | #3
On 08.06.2016 at 13:28, Paolo Bonzini wrote:
> 
> 
> ----- Original Message -----
> > From: "Kevin Wolf" <kwolf@redhat.com>
> > To: "Max Reitz" <mreitz@redhat.com>
> > Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, "Fam Zheng" <famz@redhat.com>, nsoffer@redhat.com,
> > eblake@redhat.com, pbonzini@redhat.com
> > Sent: Wednesday, June 8, 2016 11:32:29 AM
> > Subject: Re: [PATCH v2 2/3] block/mirror: Fix target backing BDS
> > 
> > On 06.06.2016 at 16:42, Max Reitz wrote:
> > > Currently, we are trying to move the backing BDS from the source to the
> > > target in bdrv_replace_in_backing_chain() which is called from
> > > mirror_exit(). However, mirror_complete() already tries to open the
> > > target's backing chain with a call to bdrv_open_backing_file().
> > > 
> > > First, we should only set the target's backing BDS once. Second, the
> > > mirroring block job has a better idea of what to set it to than the
> > > generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
> > > conditions on when to move the backing BDS from source to target are not
> > > really correct).
> > > 
> > > Therefore, remove that code from bdrv_replace_in_backing_chain() and
> > > leave it to mirror_complete().
> > > 
> > > However, mirror_complete() in turn pursues a questionable strategy by
> > > employing bdrv_open_backing_file(): On the one hand, because this may
> > > open the wrong backing file with drive-mirror in "existing" mode, or
> > > because it will not override a possibly wrong backing file in the
> > > blockdev-mirror case.
> > 
> > Careful, this "wrong" backing file might actually be intended!
> > 
> > Consider a case where you want to move an image with its whole backing
> > chain to different storage. In that case, you would copy all of the
> > backing files (cp is good enough, they are read-only), create the
> > destination image which already points at the copied backing chain, and
> > then mirror in "existing" mode.
> > 
> > The intention is obviously that after the job completion the new backing
> > chain is used and not the old one.
> 
> Yes, this is the intention and it should not be changed.  In addition
> to what Kevin said, you can use drive-mirror to collapse the image to a
> single file; in this case, QEMU should not be using the backing files of
> the source.
> 
> bdrv_open_backing_file() is used because what we want to do is to
> "undo" the BDRV_O_NO_BACKING flag used by qmp_drive_mirror.
> 
> > If the contents change under the guest's feet, it's the layers above
> QEMU that have screwed up.

We should probably have test cases for both scenarios. They would make
it obvious that changing this behaviour is not okay. Actually, I'm
surprised that our existing cases don't seem to cover this.

Kevin
Max Reitz June 8, 2016, 2:38 p.m. UTC | #4
On 08.06.2016 11:32, Kevin Wolf wrote:
> On 06.06.2016 at 16:42, Max Reitz wrote:
>> Currently, we are trying to move the backing BDS from the source to the
>> target in bdrv_replace_in_backing_chain() which is called from
>> mirror_exit(). However, mirror_complete() already tries to open the
>> target's backing chain with a call to bdrv_open_backing_file().
>>
>> First, we should only set the target's backing BDS once. Second, the
>> mirroring block job has a better idea of what to set it to than the
>> generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
>> conditions on when to move the backing BDS from source to target are not
>> really correct).
>>
>> Therefore, remove that code from bdrv_replace_in_backing_chain() and
>> leave it to mirror_complete().
>>
>> However, mirror_complete() in turn pursues a questionable strategy by
>> employing bdrv_open_backing_file(): On the one hand, because this may
>> open the wrong backing file with drive-mirror in "existing" mode, or
>> because it will not override a possibly wrong backing file in the
>> blockdev-mirror case.
>>
>> On the other hand, we want to reuse the existing backing chain of the
>> source instead of opening everything anew, because the latter results in
>> having multiple BDSs for a single physical file and thus potentially
>> concurrent access which we should try to avoid.
> 
> Careful, this "wrong" backing file might actually be intended!

True.

I still consider completely opening the backing chain not correct,
though, at least in absolute-paths mode, because this will result in
having at least two BDSs for single physical image files (once for the
old chain, once for the new one).

So let's go through everything.

== drive-mirror with absolute-paths ==

We already have the backing chain open (around the source BDS), and it's
definitely the correct one. So I think we can always reuse it for the
target.

== drive-mirror with existing ==

You're right, we should probably keep doing bdrv_open_backing_file()
because we cannot check whether the existing image has the same backing
chain as a new absolute-paths image would have had.

This is prone to give you some issues if you actually do want to have
the "default" backing chain, though, because of the multiple BDS thing.
This case is basically guaranteed to break with sync=none and default
image locking.

== blockdev-mirror ==

In theory the simplest one: We just assume the backing chain of the
target has been opened already, and then we blame the user if they have
created multiple BDSs per physical file.

In practice, unfortunately, we require the target BDS to not have a
backing file at all. blockdev-mirror is just supposed to open the
backing chain after completion, which I really don't like (I don't think
a blockdev- command should do this kind of magic).

Maybe we should allow the target to have a backing file (I really don't
see why it shouldn't have one) and treat the non-backing case like
drive-mirror in existing mode.


Does that sound right?

Max


> Consider a case where you want to move an image with its whole backing
> chain to different storage. In that case, you would copy all of the
> backing files (cp is good enough, they are read-only), create the
> destination image which already points at the copied backing chain, and
> then mirror in "existing" mode.
> 
> The intention is obviously that after the job completion the new backing
> chain is used and not the old one.
> 
> I know that such cases were discussed when mirroring was introduced, I'm
> not sure whether it's actually used. We need some input there:
> 
> Eric, can you tell us whether libvirt makes use of such a setup?
> 
> Nir, I'm not sure who is the right person in oVirt these days, but do
> you either know yourself whether oVirt requires this to work, or do you
> know who else would know?
> 
>> Thus, instead of invoking bdrv_open_backing_file(), just set the correct
>> backing BDS directly via bdrv_set_backing_hd(). Also, do so only when
>> mirror_complete() is certain to succeed.
>>
>> In contrast to what bdrv_replace_in_backing_chain() did so far, we do
>> not need to drop the source's backing file.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> 
> Leaving the actual code review for later when we have decided what
> semantics we even want.
> 
> Kevin
>
Max Reitz June 8, 2016, 2:40 p.m. UTC | #5
On 08.06.2016 13:28, Paolo Bonzini wrote:
> 
> 
> ----- Original Message -----
>> From: "Kevin Wolf" <kwolf@redhat.com>
>> To: "Max Reitz" <mreitz@redhat.com>
>> Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, "Fam Zheng" <famz@redhat.com>, nsoffer@redhat.com,
>> eblake@redhat.com, pbonzini@redhat.com
>> Sent: Wednesday, June 8, 2016 11:32:29 AM
>> Subject: Re: [PATCH v2 2/3] block/mirror: Fix target backing BDS
>>
>> On 06.06.2016 at 16:42, Max Reitz wrote:
>>> Currently, we are trying to move the backing BDS from the source to the
>>> target in bdrv_replace_in_backing_chain() which is called from
>>> mirror_exit(). However, mirror_complete() already tries to open the
>>> target's backing chain with a call to bdrv_open_backing_file().
>>>
>>> First, we should only set the target's backing BDS once. Second, the
>>> mirroring block job has a better idea of what to set it to than the
>>> generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
>>> conditions on when to move the backing BDS from source to target are not
>>> really correct).
>>>
>>> Therefore, remove that code from bdrv_replace_in_backing_chain() and
>>> leave it to mirror_complete().
>>>
>>> However, mirror_complete() in turn pursues a questionable strategy by
>>> employing bdrv_open_backing_file(): On the one hand, because this may
>>> open the wrong backing file with drive-mirror in "existing" mode, or
>>> because it will not override a possibly wrong backing file in the
>>> blockdev-mirror case.
>>
>> Careful, this "wrong" backing file might actually be intended!
>>
>> Consider a case where you want to move an image with its whole backing
>> chain to different storage. In that case, you would copy all of the
>> backing files (cp is good enough, they are read-only), create the
>> destination image which already points at the copied backing chain, and
>> then mirror in "existing" mode.
>>
>> The intention is obviously that after the job completion the new backing
>> chain is used and not the old one.
> 
> Yes, this is the intention and it should not be changed.  In addition
> to what Kevin said, you can use drive-mirror to collapse the image to a
> single file; in this case, QEMU should not be using the backing files of
> the source.

That is an issue that we have right now. If you do drive-mirror in
absolute-paths mode with sync=full, the target will have the backing
chain of the source. This is something that this patch fixes.

In fact, I think if you do drive-mirror in existing mode or
blockdev-mirror and the target image does not have a backing file
(whatever sync mode you have used), the same will happen.

Max

> bdrv_open_backing_file() is used because what we want to do is to
> "undo" the BDRV_O_NO_BACKING flag used by qmp_drive_mirror.
> 
> If the contents change under the guest's feet, it's the layers above
> QEMU that have screwed up.
> 
> Paolo
>
Max Reitz June 8, 2016, 2:42 p.m. UTC | #6
On 08.06.2016 16:40, Max Reitz wrote:
> On 08.06.2016 13:28, Paolo Bonzini wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Kevin Wolf" <kwolf@redhat.com>
>>> To: "Max Reitz" <mreitz@redhat.com>
>>> Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, "Fam Zheng" <famz@redhat.com>, nsoffer@redhat.com,
>>> eblake@redhat.com, pbonzini@redhat.com
>>> Sent: Wednesday, June 8, 2016 11:32:29 AM
>>> Subject: Re: [PATCH v2 2/3] block/mirror: Fix target backing BDS
>>>
>>> On 06.06.2016 at 16:42, Max Reitz wrote:
>>>> Currently, we are trying to move the backing BDS from the source to the
>>>> target in bdrv_replace_in_backing_chain() which is called from
>>>> mirror_exit(). However, mirror_complete() already tries to open the
>>>> target's backing chain with a call to bdrv_open_backing_file().
>>>>
>>>> First, we should only set the target's backing BDS once. Second, the
>>>> mirroring block job has a better idea of what to set it to than the
>>>> generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
>>>> conditions on when to move the backing BDS from source to target are not
>>>> really correct).
>>>>
>>>> Therefore, remove that code from bdrv_replace_in_backing_chain() and
>>>> leave it to mirror_complete().
>>>>
>>>> However, mirror_complete() in turn pursues a questionable strategy by
>>>> employing bdrv_open_backing_file(): On the one hand, because this may
>>>> open the wrong backing file with drive-mirror in "existing" mode, or
>>>> because it will not override a possibly wrong backing file in the
>>>> blockdev-mirror case.
>>>
>>> Careful, this "wrong" backing file might actually be intended!
>>>
>>> Consider a case where you want to move an image with its whole backing
>>> chain to different storage. In that case, you would copy all of the
>>> backing files (cp is good enough, they are read-only), create the
>>> destination image which already points at the copied backing chain, and
>>> then mirror in "existing" mode.
>>>
>>> The intention is obviously that after the job completion the new backing
>>> chain is used and not the old one.
>>
>> Yes, this is the intention and it should not be changed.  In addition
>> to what Kevin said, you can use drive-mirror to collapse the image to a
>> single file; in this case, QEMU should not be using the backing files of
>> the source.
> 
> That is an issue that we have right now. If you do drive-mirror in
> absolute-paths mode with sync=full, the target will have the backing
> chain of the source. This is something that this patch fixes.

As a clarification: I mean the backing chain inside QEMU (in the BDS
graph), not the on-disk backing chain, i.e. how the physical image files
link to each other.

Max
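
To illustrate the distinction between the backing chain in the BDS graph
and the on-disk backing chain, here is a minimal, self-contained C sketch.
It is a toy model and not QEMU code: ToyNode, chain_contains() and
nodes_for_file() are hypothetical names invented for this illustration, and
any file names passed to them are made up. It only shows that sharing the
source's backing node keeps one graph node per physical file, whereas
opening the chain anew yields a second node for the same file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a BlockDriverState: a graph node referring to a file.
 * Hypothetical type, not a QEMU structure. */
typedef struct ToyNode {
    const char *filename;      /* physical image file this node refers to */
    struct ToyNode *backing;   /* next node in the in-memory backing chain */
} ToyNode;

/* Return 1 if `node` is reachable from `top` by following backing links. */
static int chain_contains(const ToyNode *top, const ToyNode *node)
{
    for (const ToyNode *n = top; n != NULL; n = n->backing) {
        if (n == node) {
            return 1;
        }
    }
    return 0;
}

/* Count the distinct graph nodes, reachable from either chain, that refer
 * to `filename`. A result above 1 means one physical file is represented
 * by several nodes, which is the situation the thread wants to avoid. */
static int nodes_for_file(const ToyNode *a, const ToyNode *b,
                          const char *filename)
{
    int count = 0;

    assert(filename != NULL);

    for (const ToyNode *n = a; n != NULL; n = n->backing) {
        if (strcmp(n->filename, filename) == 0) {
            count++;
        }
    }
    for (const ToyNode *n = b; n != NULL; n = n->backing) {
        /* Skip nodes already counted as part of the first chain. */
        if (!chain_contains(a, n) && strcmp(n->filename, filename) == 0) {
            count++;
        }
    }
    return count;
}
```

If the source and target tops both point at the same base node, the count
for the base file is 1. If the target instead gets a freshly created node
for the same file name, the count becomes 2, even though the on-disk chain
looks identical in both cases.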

> In fact, I think if you do drive-mirror in existing mode or
> blockdev-mirror and the target image does not have a backing file
> (whatever sync mode you have used), the same will happen.
> 
> Max
> 
>> bdrv_open_backing_file() is used because what we want to do is to
>> "undo" the BDRV_O_NO_BACKING flag used by qmp_drive_mirror.
>>
>> If the contents change under the guest's feet, it's the layers above
>> QEMU that have screwed up.
>>
>> Paolo
>>
> 
>
Nir Soffer June 8, 2016, 3:39 p.m. UTC | #7
On Wed, Jun 8, 2016 at 12:32 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 06.06.2016 at 16:42, Max Reitz wrote:
>> Currently, we are trying to move the backing BDS from the source to the
>> target in bdrv_replace_in_backing_chain() which is called from
>> mirror_exit(). However, mirror_complete() already tries to open the
>> target's backing chain with a call to bdrv_open_backing_file().
>>
>> First, we should only set the target's backing BDS once. Second, the
>> mirroring block job has a better idea of what to set it to than the
>> generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
>> conditions on when to move the backing BDS from source to target are not
>> really correct).
>>
>> Therefore, remove that code from bdrv_replace_in_backing_chain() and
>> leave it to mirror_complete().
>>
>> However, mirror_complete() in turn pursues a questionable strategy by
>> employing bdrv_open_backing_file(): On the one hand, because this may
>> open the wrong backing file with drive-mirror in "existing" mode, or
>> because it will not override a possibly wrong backing file in the
>> blockdev-mirror case.
>>
>> On the other hand, we want to reuse the existing backing chain of the
>> source instead of opening everything anew, because the latter results in
>> having multiple BDSs for a single physical file and thus potentially
>> concurrent access which we should try to avoid.
>
> Careful, this "wrong" backing file might actually be intended!
>
> Consider a case where you want to move an image with its whole backing
> chain to different storage. In that case, you would copy all of the
> backing files (cp is good enough, they are read-only), create the
> destination image which already points at the copied backing chain, and
> then mirror in "existing" mode.
>
> The intention is obviously that after the job completion the new backing
> chain is used and not the old one.
>
> I know that such cases were discussed when mirroring was introduced, I'm
> not sure whether it's actually used. We need some input there:
>
> Eric, can you tell us whether libvirt makes use of such a setup?
>
> Nir, I'm not sure who is the right person in oVirt these days, but do
> you either know yourself whether oVirt requires this to work, or do you
> know who else would know?

I'm the right person, thanks for keeping me in the loop.

What you describe is how we migrate a disk from one storage to another:

1. Create a vm snapshot
2. Create a volume on the destination storage for the snapshot
3. Start mirroring from the source snapshot to the destination snapshot
    using libvirt virDomainBlockCopy:
    https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockCopy
4. Copy the rest of the chain from source to destination using qemu-img convert
5. Pivot to the new chain using libvirt virDomainBlockJobAbort
    https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockJobAbort
6. Remove the old chain

Source and target can be files or block devices, and we also plan to support
rbd and gluster volumes as targets, maybe also as sources.

Nir

>
>> Thus, instead of invoking bdrv_open_backing_file(), just set the correct
>> backing BDS directly via bdrv_set_backing_hd(). Also, do so only when
>> mirror_complete() is certain to succeed.
>>
>> In contrast to what bdrv_replace_in_backing_chain() did so far, we do
>> not need to drop the source's backing file.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>
> Leaving the actual code review for later when we have decided what
> semantics we even want.
>
> Kevin
Max Reitz June 8, 2016, 4:54 p.m. UTC | #8
On 08.06.2016 16:38, Max Reitz wrote:
> On 08.06.2016 11:32, Kevin Wolf wrote:
>> On 06.06.2016 at 16:42, Max Reitz wrote:
>>> Currently, we are trying to move the backing BDS from the source to the
>>> target in bdrv_replace_in_backing_chain() which is called from
>>> mirror_exit(). However, mirror_complete() already tries to open the
>>> target's backing chain with a call to bdrv_open_backing_file().
>>>
>>> First, we should only set the target's backing BDS once. Second, the
>>> mirroring block job has a better idea of what to set it to than the
>>> generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
>>> conditions on when to move the backing BDS from source to target are not
>>> really correct).
>>>
>>> Therefore, remove that code from bdrv_replace_in_backing_chain() and
>>> leave it to mirror_complete().
>>>
>>> However, mirror_complete() in turn pursues a questionable strategy by
>>> employing bdrv_open_backing_file(): On the one hand, because this may
>>> open the wrong backing file with drive-mirror in "existing" mode, or
>>> because it will not override a possibly wrong backing file in the
>>> blockdev-mirror case.
>>>
>>> On the other hand, we want to reuse the existing backing chain of the
>>> source instead of opening everything anew, because the latter results in
>>> having multiple BDSs for a single physical file and thus potentially
>>> concurrent access which we should try to avoid.
>>
>> Careful, this "wrong" backing file might actually be intended!
> 
> True.
> 
> I still consider completely opening the backing chain not correct,
> though, at least in absolute-paths mode, because this will result in
> having at least two BDSs for single physical image files (once for the
> old chain, once for the new one).
> 
> So let's go through everything.
> 
> == drive-mirror with absolute-paths ==
> 
> We already have the backing chain open (around the source BDS), and it's
> definitely the correct one. So I think we can always reuse it for the
> target.
> 
> == drive-mirror with existing ==
> 
> You're right, we should probably keep doing bdrv_open_backing_file()
> because we cannot check whether the existing image has the same backing
> chain as a new absolute-paths image would have had.
> 
> This is prone to give you some issues if you actually do want to have
> the "default" backing chain, though, because of the multiple BDS thing.
> This case is basically guaranteed to break with sync=none and default
> image locking.
> 
> == blockdev-mirror ==
> 
> In theory the simplest one: We just assume the backing chain of the
> target has been opened already, and then we blame the user if they have
> created multiple BDSs per physical file.
> 
> In practice, unfortunately, we require the target BDS to not have a
> backing file at all. blockdev-mirror is just supposed to open the
> backing chain after completion, which I really don't like (I don't think
> a blockdev- command should do this kind of magic).

Good news: Turns out I was wrong. I was somehow mixing things up with
blockdev-snapshot (don't ask me why, I have no clue).

So I think it'd be fine to rely on the user that the backing chain of
the target is correct.

Max
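
The per-command proposal discussed in this thread can be summarised as a
small decision function. This is only a sketch of the semantics under
discussion, not QEMU code and not the final patch: MirrorCommand,
BackingStrategy and target_backing_strategy() are hypothetical names
invented here, and the sync mode is deliberately left out.

```c
#include <assert.h>

/* Hypothetical names for illustration; none of these are QEMU identifiers. */
typedef enum {
    MIRROR_CMD_DRIVE_ABSOLUTE_PATHS,  /* drive-mirror, mode=absolute-paths */
    MIRROR_CMD_DRIVE_EXISTING,        /* drive-mirror, mode=existing */
    MIRROR_CMD_BLOCKDEV               /* blockdev-mirror */
} MirrorCommand;

typedef enum {
    BACKING_REUSE_SOURCE_CHAIN,  /* share the source's already-open backing BDS */
    BACKING_OPEN_FROM_TARGET,    /* open whatever the existing target image names */
    BACKING_TRUST_USER           /* keep the chain the user attached to the target */
} BackingStrategy;

/* Map each mirror command to the backing-chain handling proposed for it. */
static BackingStrategy target_backing_strategy(MirrorCommand cmd)
{
    switch (cmd) {
    case MIRROR_CMD_DRIVE_ABSOLUTE_PATHS:
        /* The source's chain is open and known to be correct: reuse it. */
        return BACKING_REUSE_SOURCE_CHAIN;
    case MIRROR_CMD_DRIVE_EXISTING:
        /* The existing image may point at a copied chain on purpose. */
        return BACKING_OPEN_FROM_TARGET;
    case MIRROR_CMD_BLOCKDEV:
        /* The user opened the target, including its backing chain. */
        return BACKING_TRUST_USER;
    }
    assert(0 && "unknown MirrorCommand");
    return BACKING_TRUST_USER;
}
```

Keeping the mapping in one place makes the intended semantics easy to pin
down in a test before any change to mirror_complete() is reviewed.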

> Maybe we should allow the target to have a backing file (I really don't
> see why it shouldn't have one) and treat the non-backing case like
> drive-mirror in existing mode.
> 
> 
> Does that sound right?
> 
> Max
> 
> 
>> Consider a case where you want to move an image with its whole backing
>> chain to different storage. In that case, you would copy all of the
>> backing files (cp is good enough, they are read-only), create the
>> destination image which already points at the copied backing chain, and
>> then mirror in "existing" mode.
>>
>> The intention is obviously that after the job completion the new backing
>> chain is used and not the old one.
>>
>> I know that such cases were discussed when mirroring was introduced, I'm
>> not sure whether it's actually used. We need some input there:
>>
>> Eric, can you tell us whether libvirt makes use of such a setup?
>>
>> Nir, I'm not sure who is the right person in oVirt these days, but do
>> you either know yourself whether oVirt requires this to work, or do you
>> know who else would know?
>>
>>> Thus, instead of invoking bdrv_open_backing_file(), just set the correct
>>> backing BDS directly via bdrv_set_backing_hd(). Also, do so only when
>>> mirror_complete() is certain to succeed.
>>>
>>> In contrast to what bdrv_replace_in_backing_chain() did so far, we do
>>> not need to drop the source's backing file.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>
>> Leaving the actual code review for later when we have decided what
>> semantics we even want.
>>
>> Kevin
>>
> 
>
Kevin Wolf June 9, 2016, 8:58 a.m. UTC | #9
On 08.06.2016 at 17:39, Nir Soffer wrote:
> On Wed, Jun 8, 2016 at 12:32 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > On 06.06.2016 at 16:42, Max Reitz wrote:
> >> Currently, we are trying to move the backing BDS from the source to the
> >> target in bdrv_replace_in_backing_chain() which is called from
> >> mirror_exit(). However, mirror_complete() already tries to open the
> >> target's backing chain with a call to bdrv_open_backing_file().
> >>
> >> First, we should only set the target's backing BDS once. Second, the
> >> mirroring block job has a better idea of what to set it to than the
> >> generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
> >> conditions on when to move the backing BDS from source to target are not
> >> really correct).
> >>
> >> Therefore, remove that code from bdrv_replace_in_backing_chain() and
> >> leave it to mirror_complete().
> >>
> >> However, mirror_complete() in turn pursues a questionable strategy by
> >> employing bdrv_open_backing_file(): On the one hand, because this may
> >> open the wrong backing file with drive-mirror in "existing" mode, or
> >> because it will not override a possibly wrong backing file in the
> >> blockdev-mirror case.
> >>
> >> On the other hand, we want to reuse the existing backing chain of the
> >> source instead of opening everything anew, because the latter results in
> >> having multiple BDSs for a single physical file and thus potentially
> >> concurrent access which we should try to avoid.
> >
> > Careful, this "wrong" backing file might actually be intended!
> >
> > Consider a case where you want to move an image with its whole backing
> > chain to different storage. In that case, you would copy all of the
> > backing files (cp is good enough, they are read-only), create the
> > destination image which already points at the copied backing chain, and
> > then mirror in "existing" mode.
> >
> > The intention is obviously that after the job completion the new backing
> > chain is used and not the old one.
> >
> > I know that such cases were discussed when mirroring was introduced, I'm
> > not sure whether it's actually used. We need some input there:
> >
> > Eric, can you tell us whether libvirt makes use of such a setup?
> >
> > Nir, I'm not sure who is the right person in oVirt these days, but do
> > you either know yourself whether oVirt requires this to work, or do you
> > know who else would know?
> 
> I'm the right person, thanks for keeping me in the loop.
> 
> What you describe is how we migrate a disk from one storage to another:
> 
> 1. Create a vm snapshot
> 2. Create a volume on the destination storage for the snapshot
> 3. Start mirroring from the source snapshot to the destination snapshot
>     using libvirt virDomainBlockCopy:
>     https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockCopy

With VIR_DOMAIN_BLOCK_COPY_SHALLOW set, right? (That is, sync=top in QMP
terms.)

> 4. Copy the rest of the chain from source to destination using qemu-img convert
> 5. Pivot to the new chain using libvirt virDomainBlockJobAbort
>     https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockJobAbort
> 6. Remove the old chain
> 
> Source and target can be files or block devices, and we also plan to support
> rbd and gluster volumes as target, and maybe also as source.

Thanks, Nir, we should then do our best not to break it.

Max, maybe we can add a qemu-iotests case that does the exact same thing
as oVirt does?

Kevin
Nir Soffer June 9, 2016, 11:16 a.m. UTC | #10
On Thu, Jun 9, 2016 at 11:58 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 08.06.2016 um 17:39 hat Nir Soffer geschrieben:
>> On Wed, Jun 8, 2016 at 12:32 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> > Am 06.06.2016 um 16:42 hat Max Reitz geschrieben:
>> >> Currently, we are trying to move the backing BDS from the source to the
>> >> target in bdrv_replace_in_backing_chain() which is called from
>> >> mirror_exit(). However, mirror_complete() already tries to open the
>> >> target's backing chain with a call to bdrv_open_backing_file().
>> >>
>> >> First, we should only set the target's backing BDS once. Second, the
>> >> mirroring block job has a better idea of what to set it to than the
>> >> generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
>> >> conditions on when to move the backing BDS from source to target are not
>> >> really correct).
>> >>
>> >> Therefore, remove that code from bdrv_replace_in_backing_chain() and
>> >> leave it to mirror_complete().
>> >>
>> >> However, mirror_complete() in turn pursues a questionable strategy by
>> >> employing bdrv_open_backing_file(): On the one hand, because this may
>> >> open the wrong backing file with drive-mirror in "existing" mode, or
>> >> because it will not override a possibly wrong backing file in the
>> >> blockdev-mirror case.
>> >>
>> >> On the other hand, we want to reuse the existing backing chain of the
>> >> source instead of opening everything anew, because the latter results in
>> >> having multiple BDSs for a single physical file and thus potentially
>> >> concurrent access which we should try to avoid.
>> >
>> > Careful, this "wrong" backing file might actually be intended!
>> >
>> > Consider a case where you want to move an image with its whole backing
>> > chain to different storage. In that case, you would copy all of the
>> > backing files (cp is good enough, they are read-only), create the
>> > destination image which already points at the copied backing chain, and
>> > then mirror in "existing" mode.
>> >
>> > The intention is obviously that after the job completion the new backing
>> > chain is used and not the old one.
>> >
>> > I know that such cases were discussed when mirroring was introduced, but
>> > I'm not sure whether they are actually used. We need some input there:
>> >
>> > Eric, can you tell us whether libvirt makes use of such a setup?
>> >
>> > Nir, I'm not sure who is the right person in oVirt these days, but do
>> > you either know yourself whether oVirt requires this to work, or do you
>> > know who else would know?
>>
>> I'm the right person, thanks for keeping me in the loop.
>>
>> What you describe is how we migrate a disk from one storage to another:
>>
>> 1. Create a vm snapshot
>> 2. Create a volume on the destination storage for the snapshot
>> 3. Start mirroring from the source snapshot to the destination snapshot
>>     using libvirt virDomainBlockCopy:
>>     https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockCopy
>
> With VIR_DOMAIN_BLOCK_COPY_SHALLOW set, right? (That is, sync=top in QMP
> speak.)

Yes, actually we use:

VIR_DOMAIN_BLOCK_COPY_SHALLOW | VIR_DOMAIN_BLOCK_COPY_REUSE_EXT

>> 4. Copy the rest of the chain from source to destination using qemu-img convert
>> 5. Pivot to the new chain using libvirt virDomainBlockJobAbort
>>     https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockJobAbort
>> 6. Remove the old chain
>>
>> Source and target can be files or block devices, and we also plan to support
>> rbd and gluster volumes as target, and maybe also as source.
>
> Thanks, Nir, we should then do our best not to break it.
>
> Max, maybe we can add a qemu-iotests case that does the exact same thing
> as oVirt does?
>
> Kevin

Patch

diff --git a/block.c b/block.c
index 16463aa..792f5dd 100644
--- a/block.c
+++ b/block.c
@@ -2288,14 +2288,6 @@  void bdrv_replace_in_backing_chain(BlockDriverState *old, BlockDriverState *new)
 
     change_parent_backing_link(old, new);
 
-    /* Change backing files if a previously independent node is added to the
-     * chain. For active commit, we replace top by its own (indirect) backing
-     * file and don't do anything here so we don't build a loop. */
-    if (new->backing == NULL && !bdrv_chain_contains(backing_bs(old), new)) {
-        bdrv_set_backing_hd(new, backing_bs(old));
-        bdrv_set_backing_hd(old, NULL);
-    }
-
     bdrv_unref(old);
 }
 
diff --git a/block/mirror.c b/block/mirror.c
index 80fd3c7..217475b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -742,15 +742,11 @@  static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
 static void mirror_complete(BlockJob *job, Error **errp)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
-    Error *local_err = NULL;
-    int ret;
+    BlockDriverState *src, *target;
+
+    src = blk_bs(job->blk);
+    target = blk_bs(s->target);
 
-    ret = bdrv_open_backing_file(blk_bs(s->target), NULL, "backing",
-                                 &local_err);
-    if (ret < 0) {
-        error_propagate(errp, local_err);
-        return;
-    }
     if (!s->synced) {
         error_setg(errp, QERR_BLOCK_JOB_NOT_READY, job->id);
         return;
@@ -777,6 +773,15 @@  static void mirror_complete(BlockJob *job, Error **errp)
         aio_context_release(replace_aio_context);
     }
 
+    /* Now we need to adjust the target's backing BDS. This is not necessary
+     * when having performed a commit operation. */
+    if (!bdrv_chain_contains(backing_bs(src), target)) {
+        BlockDriverState *backing = s->is_none_mode ? src : s->base;
+        if (backing_bs(target) != backing) {
+            bdrv_set_backing_hd(target, backing);
+        }
+    }
+
     s->should_complete = true;
     block_job_enter(&s->common);
 }
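
To make the concern raised in the comments concrete, here is a minimal standalone sketch of the decision the patched mirror_complete() takes. It is a toy model, not QEMU code: Node, chain_contains(), adjust_target_backing() and check_scenarios() are hypothetical stand-ins for BlockDriverState, bdrv_chain_contains(), the new hunk above and a test driver, and real backing links carry reference counts and permissions that are ignored here. The sketch assumes that for a shallow (sync=top) copy s->base is the source's backing node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for BlockDriverState: just a name and a backing link. */
typedef struct Node {
    const char *name;
    struct Node *backing;
} Node;

/* Toy stand-in for bdrv_chain_contains(): true if 'needle' is 'top'
 * itself or is reachable from it by following backing links. */
static bool chain_contains(const Node *top, const Node *needle)
{
    while (top != NULL && top != needle) {
        top = top->backing;
    }
    return top != NULL;
}

/* Models the hunk added to mirror_complete(): unless the target already
 * sits below the source (the commit case), force the target's backing
 * node to 'src' (sync=none) or to 'base' (other modes). */
static void adjust_target_backing(Node *src, Node *target, Node *base,
                                  bool is_none_mode)
{
    if (!chain_contains(src->backing, target)) {
        Node *backing = is_none_mode ? src : base;
        if (target->backing != backing) {
            target->backing = backing;
        }
    }
}

/* Walks through two scenarios from the thread. */
static void check_scenarios(void)
{
    /* Shallow copy in "existing" mode: the target was created up front
     * and already points at a copied backing chain on the new storage. */
    Node base = { "base", NULL };
    Node src = { "snapshot", &base };
    Node copied_base = { "copied-base", NULL };
    Node target = { "target", &copied_base };

    adjust_target_backing(&src, &target, &base, false);
    /* The copied chain is replaced by the source's old backing node. */
    assert(target.backing == &base);

    /* Commit-like case: the target is already part of the source's
     * backing chain, so nothing is changed. */
    Node top = { "top", &base };
    adjust_target_backing(&top, &base, NULL, false);
    assert(base.backing == NULL);
}
```

Calling check_scenarios() from a main() exercises both cases. In the first one the model ends up with the target backed by the old chain rather than the copied one, which, if the real code behaves the same way, is the situation the oVirt workflow described above depends on avoiding.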