[PULL,08/17] block: Assert no write requests under BDRV_O_INCOMING
diff mbox

Message ID 1453307106-28330-9-git-send-email-kwolf@redhat.com
State New
Headers show

Commit Message

Kevin Wolf Jan. 20, 2016, 4:24 p.m. UTC
As long as BDRV_O_INCOMING is set, the image file is only opened so we
have a file descriptor for it. We're definitely not supposed to modify
the image, it's still owned by the migration source.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/io.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Laurent Vivier Feb. 18, 2016, 4:03 p.m. UTC | #1
Hi,

this commit breaks incoming migration case:

qemu-system-ppc64 XXX -incoming tcp:0:4444
qemu-system-ppc64: .../qemu/block/io.c:1304: bdrv_co_do_pwritev:
Assertion `!(bs->open_flags & 0x0800)' failed.

Without "-incoming", the same command line boots fine.

Are you aware of the problem?

Laurent

On 20/01/2016 17:24, Kevin Wolf wrote:
> As long as BDRV_O_INCOMING is set, the image file is only opened so we
> have a file descriptor for it. We're definitely not supposed to modify
> the image, it's still owned by the migration source.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>  block/io.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/block/io.c b/block/io.c
> index 707c04b..2372994 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1301,6 +1301,7 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
>      if (bs->read_only) {
>          return -EPERM;
>      }
> +    assert(!(bs->open_flags & BDRV_O_INCOMING));
>  
>      ret = bdrv_check_byte_request(bs, offset, bytes);
>      if (ret < 0) {
> @@ -2462,6 +2463,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
>      } else if (bs->read_only) {
>          return -EPERM;
>      }
> +    assert(!(bs->open_flags & BDRV_O_INCOMING));
>  
>      /* Do nothing if disabled.  */
>      if (!(bs->open_flags & BDRV_O_UNMAP)) {
>
Kevin Wolf Feb. 19, 2016, 9:17 a.m. UTC | #2
Am 18.02.2016 um 17:03 hat Laurent Vivier geschrieben:
> Hi,
> 
> this commit breaks incoming migration case:
> 
> qemu-system-ppc64 XXX -incoming tcp:0:4444
> qemu-system-ppc64: .../qemu/block/io.c:1304: bdrv_co_do_pwritev:
> Assertion `!(bs->open_flags & 0x0800)' failed.
> 
> Without "-incoming", the same command line boots fine.
> 
> Are you aware of the problem?

No, and with no other command line options it works for me:

    $ ppc64-softmmu/qemu-system-ppc64 -incoming tcp::4444

I have also tested somewhat more realistic cases on x86 (including
performing actual migrations) before committing the patch, so it's not
generally broken.

Can you please give the full command line and the stack backtrace? I
think you're seeing a bug where some PPC hardware tries to read (or
access anyway) the image before the migration has completed. This is for
obvious reasons wrong (the contents may still change until the VM is
handed over).

Kevin
Laurent Vivier Feb. 19, 2016, 10:37 a.m. UTC | #3
On 19/02/2016 10:17, Kevin Wolf wrote:
> Am 18.02.2016 um 17:03 hat Laurent Vivier geschrieben:
>> Hi,
>>
>> this commit breaks incoming migration case:
>>
>> qemu-system-ppc64 XXX -incoming tcp:0:4444
>> qemu-system-ppc64: .../qemu/block/io.c:1304: bdrv_co_do_pwritev:
>> Assertion `!(bs->open_flags & 0x0800)' failed.
>>
>> Without "-incoming", the same command line boots fine.
>>
>> Are you aware of the problem?
> 
> No, and with no other command line options it works for me:
> 
>     $ ppc64-softmmu/qemu-system-ppc64 -incoming tcp::4444
> 
> I have also tested somewhat more realistic cases on x86 (including
> performing actual migrations) before committing the patch, so it's not
> generally broken.
> 
> Can you please give the full command line and the stack backtrace? I
> think you're seeing a bug where some PPC hardware tries to read (or
> access anyway) the image before the migration has completed. This is for
> obvious reasons wrong (the contents may still change until the VM is
> handed over).

It happens before the migration has started, as soon as I start qemu
with "-incoming" parameter.

The simplified command line is:

qemu-system-ppc64
-drive file=rhel7.2-le.qcow2,format=qcow2,snapshot=on
-incoming tcp:0:4444

The problem seems coming from "snapshot=on".

Laurent
Kevin Wolf Feb. 19, 2016, 1:47 p.m. UTC | #4
Am 19.02.2016 um 11:37 hat Laurent Vivier geschrieben:
> 
> 
> On 19/02/2016 10:17, Kevin Wolf wrote:
> > Am 18.02.2016 um 17:03 hat Laurent Vivier geschrieben:
> >> Hi,
> >>
> >> this commit breaks incoming migration case:
> >>
> >> qemu-system-ppc64 XXX -incoming tcp:0:4444
> >> qemu-system-ppc64: .../qemu/block/io.c:1304: bdrv_co_do_pwritev:
> >> Assertion `!(bs->open_flags & 0x0800)' failed.
> >>
> >> Without "-incoming", the same command line boots fine.
> >>
> >> Are you aware of the problem?
> > 
> > No, and with no other command line options it works for me:
> > 
> >     $ ppc64-softmmu/qemu-system-ppc64 -incoming tcp::4444
> > 
> > I have also tested somewhat more realistic cases on x86 (including
> > performing actual migrations) before committing the patch, so it's not
> > generally broken.
> > 
> > Can you please give the full command line and the stack backtrace? I
> > think you're seeing a bug where some PPC hardware tries to read (or
> > access anyway) the image before the migration has completed. This is for
> > obvious reasons wrong (the contents may still change until the VM is
> > handed over).
> 
> It happens before the migration has started, as soon as I start qemu
> with "-incoming" parameter.
> 
> The simplified command line is:
> 
> qemu-system-ppc64
> -drive file=rhel7.2-le.qcow2,format=qcow2,snapshot=on
> -incoming tcp:0:4444
> 
> The problem seems coming from "snapshot=on".

Aha. The combination of migration with snapshot=on is obvious nonsense
(because on the destination the disk state would be reverted while the
guest OS thinks the disk is unchanged), so we should probably take a
migration blocker when an image with snapshot=on is opened.

Kevin
Paolo Bonzini Feb. 19, 2016, 2:11 p.m. UTC | #5
On 19/02/2016 14:47, Kevin Wolf wrote:
> > The problem seems coming from "snapshot=on".
> Aha. The combination of migration with snapshot=on is obvious nonsense
> (because on the destination the disk state would be reverted while the
> guest OS thinks the disk is unchanged),

Not really, on the destination the disk state will not be _further
updated_, but no reversion happens.

Migration with "snapshot=on" makes sense if you are resuming a VM from a
saved checkpoint ("-incoming exec").  You can use it to debug a
migration load, for example, because you want the resumed guest to
always start with the same disk contents.  It's somewhat weird, but it
definitely has its uses.  A blocker would be a pain in the ass...

Paolo
Kevin Wolf Feb. 19, 2016, 2:17 p.m. UTC | #6
Am 19.02.2016 um 15:11 hat Paolo Bonzini geschrieben:
> On 19/02/2016 14:47, Kevin Wolf wrote:
> > > The problem seems coming from "snapshot=on".
> > Aha. The combination of migration with snapshot=on is obvious nonsense
> > (because on the destination the disk state would be reverted while the
> > guest OS thinks the disk is unchanged),
> 
> Not really, on the destination the disk state will not be _further
> updated_, but no reversion happens.

Right, I started writing a patch when I noticed that that wasn't quite
right and -snapshot on the destination is somewhat unusual, but not in
all cases wrong.

> Migration with "snapshot=on" makes sense if you are resuming a VM from a
> saved checkpoint ("-incoming exec").  You can use it to debug a
> migration load, for example, because you want the resumed guest to
> always start with the same disk contents.  It's somewhat weird, but it
> definitely has its uses.  A blocker would be a pain in the ass...

My other thinko was that a blocker would help. In reality, blockers are
on the source host. They would make a little more sense than forbidding
-snapshot with -incoming, but we don't know how you migrate storage. If
you don't use shared storage but mirroring, then it could be okay again
(but probably even weirder than the other case).

Kevin

Patch
diff mbox

diff --git a/block/io.c b/block/io.c
index 707c04b..2372994 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1301,6 +1301,7 @@  static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
     if (bs->read_only) {
         return -EPERM;
     }
+    assert(!(bs->open_flags & BDRV_O_INCOMING));
 
     ret = bdrv_check_byte_request(bs, offset, bytes);
     if (ret < 0) {
@@ -2462,6 +2463,7 @@  int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
     } else if (bs->read_only) {
         return -EPERM;
     }
+    assert(!(bs->open_flags & BDRV_O_INCOMING));
 
     /* Do nothing if disabled.  */
     if (!(bs->open_flags & BDRV_O_UNMAP)) {