Message ID | 20170925132312.GA7999@localhost.localdomain (mailing list archive) |
---|---|
State | New, archived |
25.09.2017 16:23, Kevin Wolf wrote:
> On 20.09.2017 at 13:45, Juan Quintela wrote:
>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>>> * Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
>>>> ping for 1-3
>>>> Can we merge them?
>>> I see all of them have R-b's; so lets try and put them in the next
>>> migration merge.
>>>
>>> Quintela: Sound good?
>> Yeap.
> This patch broke qemu-iotests 181 ('Test postcopy live migration with
> shared storage'):
>
> --- /home/kwolf/source/qemu/tests/qemu-iotests/181.out 2017-06-16 19:19:53.000000000 +0200
> +++ 181.out.bad 2017-09-25 15:20:40.787582000 +0200
> @@ -21,18 +21,16 @@
> === Do some I/O on the destination ===
>
> QEMU X.Y.Z monitor - type 'help' for more information
> -(qemu) qemu-io disk "read -P 0x55 0 64k"
> +(qemu) QEMU_PROG: Expected vmdescription section, but got 0
> +QEMU_PROG: Failed to get "write" lock
> +Is another process using the image?
> +qemu-io disk "read -P 0x55 0 64k"
> read 65536/65536 bytes at offset 0
> 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> (qemu)
> (qemu) qemu-io disk "write -P 0x66 1M 64k"
> -wrote 65536/65536 bytes at offset 1048576
> -64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -
> -=== Shut down and check image ===
> -
> -(qemu) quit
> -(qemu)
> -(qemu) quit
> -No errors were found on the image.
> -*** done
> +QEMU_PROG: block/io.c:1359: bdrv_aligned_pwritev: Assertion `child->perm & BLK_PERM_WRITE' failed.
> +./common.config: Aborted (core dumped) ( if [ -n "${QEMU_NEED_PID}" ]; then
> +echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
> +fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )
> +Timeout waiting for ops/sec on handle 1

Not sure about the locking (I don't see this error on my old kernel without
OFD locking), but it looks like test 181 should be fixed to set the
postcopy-ram capability on the target too (which was considered the correct
way on the list).

On 25.09.2017 at 16:31, Vladimir Sementsov-Ogievskiy wrote:
> 25.09.2017 16:23, Kevin Wolf wrote:
> > On 20.09.2017 at 13:45, Juan Quintela wrote:
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > * Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> > > > > ping for 1-3
> > > > > Can we merge them?
> > > > I see all of them have R-b's; so lets try and put them in the next
> > > > migration merge.
> > > >
> > > > Quintela: Sound good?
> > > Yeap.
> > This patch broke qemu-iotests 181 ('Test postcopy live migration with
> > shared storage'):
> >
> > --- /home/kwolf/source/qemu/tests/qemu-iotests/181.out 2017-06-16 19:19:53.000000000 +0200
> > +++ 181.out.bad 2017-09-25 15:20:40.787582000 +0200
> > @@ -21,18 +21,16 @@
> > === Do some I/O on the destination ===
> > QEMU X.Y.Z monitor - type 'help' for more information
> > -(qemu) qemu-io disk "read -P 0x55 0 64k"
> > +(qemu) QEMU_PROG: Expected vmdescription section, but got 0
> > +QEMU_PROG: Failed to get "write" lock
> > +Is another process using the image?
> > +qemu-io disk "read -P 0x55 0 64k"
> > read 65536/65536 bytes at offset 0
> > 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > (qemu)
> > (qemu) qemu-io disk "write -P 0x66 1M 64k"
> > -wrote 65536/65536 bytes at offset 1048576
> > -64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > -
> > -=== Shut down and check image ===
> > -
> > -(qemu) quit
> > -(qemu)
> > -(qemu) quit
> > -No errors were found on the image.
> > -*** done
> > +QEMU_PROG: block/io.c:1359: bdrv_aligned_pwritev: Assertion `child->perm & BLK_PERM_WRITE' failed.
> > +./common.config: Aborted (core dumped) ( if [ -n "${QEMU_NEED_PID}" ]; then
> > +echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
> > +fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )
> > +Timeout waiting for ops/sec on handle 1
>
> Not sure about locking (don't see this error on my old kernel without OFD
> locking), but it looks like that
> 181 test should be fixed to set postcopy-ram capability on target too (which
> was considered as correct way on list)

Whatever you think the preferred way to set up postcopy migration is: If
something worked before this patch and doesn't after it, that's a
regression and breaks backwards compatibility.

If we were talking about a graceful failure, maybe we could discuss
whether carefully and deliberately breaking compatibility could be
justified in this specific case. But the breakage is neither mentioned
in the commit message nor is it graceful, so I can only call it a bug.

Kevin

25.09.2017 17:58, Kevin Wolf wrote:
> On 25.09.2017 at 16:31, Vladimir Sementsov-Ogievskiy wrote:
>> 25.09.2017 16:23, Kevin Wolf wrote:
>>> On 20.09.2017 at 13:45, Juan Quintela wrote:
>>>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>>>>> * Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
>>>>>> ping for 1-3
>>>>>> Can we merge them?
>>>>> I see all of them have R-b's; so lets try and put them in the next
>>>>> migration merge.
>>>>>
>>>>> Quintela: Sound good?
>>>> Yeap.
>>> This patch broke qemu-iotests 181 ('Test postcopy live migration with
>>> shared storage'):
>>>
>>> --- /home/kwolf/source/qemu/tests/qemu-iotests/181.out 2017-06-16 19:19:53.000000000 +0200
>>> +++ 181.out.bad 2017-09-25 15:20:40.787582000 +0200
>>> @@ -21,18 +21,16 @@
>>> === Do some I/O on the destination ===
>>> QEMU X.Y.Z monitor - type 'help' for more information
>>> -(qemu) qemu-io disk "read -P 0x55 0 64k"
>>> +(qemu) QEMU_PROG: Expected vmdescription section, but got 0
>>> +QEMU_PROG: Failed to get "write" lock
>>> +Is another process using the image?
>>> +qemu-io disk "read -P 0x55 0 64k"
>>> read 65536/65536 bytes at offset 0
>>> 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>> (qemu)
>>> (qemu) qemu-io disk "write -P 0x66 1M 64k"
>>> -wrote 65536/65536 bytes at offset 1048576
>>> -64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>> -
>>> -=== Shut down and check image ===
>>> -
>>> -(qemu) quit
>>> -(qemu)
>>> -(qemu) quit
>>> -No errors were found on the image.
>>> -*** done
>>> +QEMU_PROG: block/io.c:1359: bdrv_aligned_pwritev: Assertion `child->perm & BLK_PERM_WRITE' failed.
>>> +./common.config: Aborted (core dumped) ( if [ -n "${QEMU_NEED_PID}" ]; then
>>> +echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
>>> +fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )
>>> +Timeout waiting for ops/sec on handle 1
>> Not sure about locking (don't see this error on my old kernel without OFD
>> locking), but it looks like that
>> 181 test should be fixed to set postcopy-ram capability on target too (which
>> was considered as correct way on list)
> Whatever you think the preferred way to set up postcopy migration is: If
> something worked before this patch and doesn't after it, that's a
> regression and breaks backwards compatibility.
>
> If we were talking about a graceful failure, maybe we could discuss
> whether carefully and deliberately breaking compatibility could be
> justified in this specific case. But the breakage is neither mentioned
> in the commit message nor is it graceful, so I can only call it a bug.
>
> Kevin

It's my fault, of course; I didn't mean "the test is wrong, so it's not my
problem". And I've already sent a patch.

* Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> 25.09.2017 17:58, Kevin Wolf wrote:
> > On 25.09.2017 at 16:31, Vladimir Sementsov-Ogievskiy wrote:
> > > 25.09.2017 16:23, Kevin Wolf wrote:
> > > > On 20.09.2017 at 13:45, Juan Quintela wrote:
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > * Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> > > > > > > ping for 1-3
> > > > > > > Can we merge them?
> > > > > > I see all of them have R-b's; so lets try and put them in the next
> > > > > > migration merge.
> > > > > >
> > > > > > Quintela: Sound good?
> > > > > Yeap.
> > > > This patch broke qemu-iotests 181 ('Test postcopy live migration with
> > > > shared storage'):
> > > >
> > > > --- /home/kwolf/source/qemu/tests/qemu-iotests/181.out 2017-06-16 19:19:53.000000000 +0200
> > > > +++ 181.out.bad 2017-09-25 15:20:40.787582000 +0200
> > > > @@ -21,18 +21,16 @@
> > > > === Do some I/O on the destination ===
> > > > QEMU X.Y.Z monitor - type 'help' for more information
> > > > -(qemu) qemu-io disk "read -P 0x55 0 64k"
> > > > +(qemu) QEMU_PROG: Expected vmdescription section, but got 0
> > > > +QEMU_PROG: Failed to get "write" lock
> > > > +Is another process using the image?
> > > > +qemu-io disk "read -P 0x55 0 64k"
> > > > read 65536/65536 bytes at offset 0
> > > > 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > > > (qemu)
> > > > (qemu) qemu-io disk "write -P 0x66 1M 64k"
> > > > -wrote 65536/65536 bytes at offset 1048576
> > > > -64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > > > -
> > > > -=== Shut down and check image ===
> > > > -
> > > > -(qemu) quit
> > > > -(qemu)
> > > > -(qemu) quit
> > > > -No errors were found on the image.
> > > > -*** done
> > > > +QEMU_PROG: block/io.c:1359: bdrv_aligned_pwritev: Assertion `child->perm & BLK_PERM_WRITE' failed.
> > > > +./common.config: Aborted (core dumped) ( if [ -n "${QEMU_NEED_PID}" ]; then
> > > > +echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
> > > > +fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )
> > > > +Timeout waiting for ops/sec on handle 1
> > > Not sure about locking (don't see this error on my old kernel without OFD
> > > locking), but it looks like that
> > > 181 test should be fixed to set postcopy-ram capability on target too (which
> > > was considered as correct way on list)
> > Whatever you think the preferred way to set up postcopy migration is: If
> > something worked before this patch and doesn't after it, that's a
> > regression and breaks backwards compatibility.
> >
> > If we were talking about a graceful failure, maybe we could discuss
> > whether carefully and deliberately breaking compatibility could be
> > justified in this specific case. But the breakage is neither mentioned
> > in the commit message nor is it graceful, so I can only call it a bug.
> >
> > Kevin
>
> It's of course my fault, I don't mean "it's wrong test, so it's not my
> problem") And I've already sent a patch.

Why does this fail so badly (asserts etc.)? I was hoping for something
a bit more obvious from the migration code.

Postcopy did originally work without the destination having the flag on,
but setting the flag on the destination was always good practice because
it detected early on whether the host support was there.

Dave

> --
> Best regards,
> Vladimir
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

On 25.09.2017 at 17:27, Dr. David Alan Gilbert wrote:
> > > Whatever you think the preferred way to set up postcopy migration is: If
> > > something worked before this patch and doesn't after it, that's a
> > > regression and breaks backwards compatibility.
> > >
> > > If we were talking about a graceful failure, maybe we could discuss
> > > whether carefully and deliberately breaking compatibility could be
> > > justified in this specific case. But the breakage is neither mentioned
> > > in the commit message nor is it graceful, so I can only call it a bug.
> > >
> > > Kevin
> >
> > It's of course my fault, I don't mean "it's wrong test, so it's not my
> > problem") And I've already sent a patch.
>
> Why does this fail so badly, asserts etc - I was hoping for something
> a bit more obvious from the migration code.
>
> postcopy did originally work without the destination having the flag on
> but setting the flag on the destination was always good practice because
> it detected whether the host support was there early on.

So what does this mean for 2.11? Do you think it is acceptable to break
cases where the flag isn't set on the destination?

If so, just changing the test case is enough. But if not, I'd rather
keep the test case as it is and fix only the migration code.

Kevin

* Kevin Wolf (kwolf@redhat.com) wrote:
> On 25.09.2017 at 17:27, Dr. David Alan Gilbert wrote:
> > > > Whatever you think the preferred way to set up postcopy migration is: If
> > > > something worked before this patch and doesn't after it, that's a
> > > > regression and breaks backwards compatibility.
> > > >
> > > > If we were talking about a graceful failure, maybe we could discuss
> > > > whether carefully and deliberately breaking compatibility could be
> > > > justified in this specific case. But the breakage is neither mentioned
> > > > in the commit message nor is it graceful, so I can only call it a bug.
> > > >
> > > > Kevin
> > >
> > > It's of course my fault, I don't mean "it's wrong test, so it's not my
> > > problem") And I've already sent a patch.
> >
> > Why does this fail so badly, asserts etc - I was hoping for something
> > a bit more obvious from the migration code.
> >
> > postcopy did originally work without the destination having the flag on
> > but setting the flag on the destination was always good practice because
> > it detected whether the host support was there early on.
>
> So what does this mean for 2.11? Do you think it is acceptable breaking
> cases where the flag isn't set on the destination?

I think so, because we've always recommended setting it on the
destination for the early detection.

> If so, just changing the test case is enough. But if not, I'd rather
> keep the test case as it is and fix only the migration code.

I'd take the test case fix, but I also want to dig into why it fails so
badly; it would be nice just to have a clean failure telling you
that postcopy was expected.

Dave

>
> Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

On 26.09.2017 at 12:21, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kwolf@redhat.com) wrote:
> > On 25.09.2017 at 17:27, Dr. David Alan Gilbert wrote:
> > > > > Whatever you think the preferred way to set up postcopy migration is: If
> > > > > something worked before this patch and doesn't after it, that's a
> > > > > regression and breaks backwards compatibility.
> > > > >
> > > > > If we were talking about a graceful failure, maybe we could discuss
> > > > > whether carefully and deliberately breaking compatibility could be
> > > > > justified in this specific case. But the breakage is neither mentioned
> > > > > in the commit message nor is it graceful, so I can only call it a bug.
> > > > >
> > > > > Kevin
> > > >
> > > > It's of course my fault, I don't mean "it's wrong test, so it's not my
> > > > problem") And I've already sent a patch.
> > >
> > > Why does this fail so badly, asserts etc - I was hoping for something
> > > a bit more obvious from the migration code.
> > >
> > > postcopy did originally work without the destination having the flag on
> > > but setting the flag on the destination was always good practice because
> > > it detected whether the host support was there early on.
> >
> > So what does this mean for 2.11? Do you think it is acceptable breaking
> > cases where the flag isn't set on the destination?
>
> I think so, because we've always recommended setting it on the
> destination for the early detection.

Okay, I'll include the test case patch in my pull request today then.

> > If so, just changing the test case is enough. But if not, I'd rather
> > keep the test case as it is and fix only the migration code.
>
> I'd take the test case fix, but I also want to dig why it fails so
> badly; it would be nice just to have a clean failure telling you
> that postcopy was expected.

Yes, that would be nice.

Kevin

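The setup the thread settles on (enable postcopy-ram on the destination as well as on the source before migrating) can be sketched as a sequence of QMP commands. This is only an illustration, not the actual change made to test 181: the helper names `set_capability_cmd` and `postcopy_setup_plan` and the socket URI are hypothetical, and the sketch only builds the JSON payloads rather than talking to a running QEMU.

```python
import json


def set_capability_cmd(capability: str, enabled: bool = True) -> dict:
    """Build a QMP 'migrate-set-capabilities' command for one capability.

    Hypothetical helper for illustration; it only constructs the payload.
    """
    return {
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [{"capability": capability, "state": enabled}]
        },
    }


def postcopy_setup_plan(dest_uri: str) -> list:
    """Return (side, command) pairs in the order they would be sent.

    The destination gets postcopy-ram enabled first, so that missing host
    support would be noticed before any migration data is transferred.
    """
    return [
        ("destination", set_capability_cmd("postcopy-ram")),
        ("source", set_capability_cmd("postcopy-ram")),
        ("source", {"execute": "migrate", "arguments": {"uri": dest_uri}}),
        ("source", {"execute": "migrate-start-postcopy"}),
    ]


if __name__ == "__main__":
    # The URI below is an assumed example value, not taken from the thread.
    for side, cmd in postcopy_setup_plan("unix:/tmp/migrate.sock"):
        print(f"{side}: {json.dumps(cmd)}")
```

On the HMP monitor used by the iotests, the equivalent of the first two steps would be `migrate_set_capability postcopy-ram on`, issued on each side.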
--- /home/kwolf/source/qemu/tests/qemu-iotests/181.out 2017-06-16 19:19:53.000000000 +0200
+++ 181.out.bad 2017-09-25 15:20:40.787582000 +0200
@@ -21,18 +21,16 @@
 === Do some I/O on the destination ===
 
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) qemu-io disk "read -P 0x55 0 64k"
+(qemu) QEMU_PROG: Expected vmdescription section, but got 0
+QEMU_PROG: Failed to get "write" lock
+Is another process using the image?
+qemu-io disk "read -P 0x55 0 64k"
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 (qemu)
 (qemu) qemu-io disk "write -P 0x66 1M 64k"
-wrote 65536/65536 bytes at offset 1048576
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-
-=== Shut down and check image ===
-
-(qemu) quit
-(qemu)
-(qemu) quit
-No errors were found on the image.
-*** done
+QEMU_PROG: block/io.c:1359: bdrv_aligned_pwritev: Assertion `child->perm & BLK_PERM_WRITE' failed.
+./common.config: Aborted (core dumped) ( if [ -n "${QEMU_NEED_PID}" ]; then
+echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
+fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )
+Timeout waiting for ops/sec on handle 1