Message ID | 20171201055832.8392-1-fangying1@huawei.com (mailing list archive) |
---|---|
State | New, archived |
On 2017/12/1 22:39, Michael S. Tsirkin wrote:
> On Fri, Dec 01, 2017 at 01:58:32PM +0800, fangying wrote:
>> QEMU will abort when vhost-user process is restarted during migration
>> when vhost_log_global_start/stop is called. The reason is clear that
>> vhost_dev_set_log returns -1 because network connection is lost.
>>
>> To handle this situation, let's cancel migration by setting migrate
>> state to failure and report it to user.
>
> In fact I don't see this as the right way to fix it. Backend is dead so why
> not just proceed with migration? We just need to make sure we re-send
> migration data on re-connect.
>

This is where vhost starts/stops the migration dirty log. The original code
aborts qemu here because the vhost data stream may break down if we fail to
start/stop the vhost dirty log during migration. The backend may be active
again after vhost_log_global_start.

dirty log start ----------------- dirty log stop
      ^                  ^
      |                  |
----- backend dead ----- backend active

Currently we don't re-send migration data on re-connect in this situation.
Maybe we should work it out.

>> ---
>>  hw/virtio/vhost.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index ddc42f0..92725f7 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -26,6 +26,8 @@
>>  #include "hw/virtio/virtio-bus.h"
>>  #include "hw/virtio/virtio-access.h"
>>  #include "migration/blocker.h"
>> +#include "migration/migration.h"
>> +#include "migration/qemu-file.h"
>>  #include "sysemu/dma.h"
>>
>>  /* enabled until disconnected backend stabilizes */
>> @@ -885,7 +887,10 @@ static void vhost_log_global_start(MemoryListener *listener)
>>
>>      r = vhost_migration_log(listener, true);
>>      if (r < 0) {
>> -        abort();
>> +        error_report("Failed to start vhost dirty log");
>> +        if (migrate_get_current()->migration_thread_running) {
>> +            qemu_file_set_error(migrate_get_current()->to_dst_file, -ECHILD);
>> +        }
>>      }
>>  }
>>
>> @@ -895,7 +900,10 @@ static void vhost_log_global_stop(MemoryListener *listener)
>>
>>      r = vhost_migration_log(listener, false);
>>      if (r < 0) {
>> -        abort();
>> +        error_report("Failed to stop vhost dirty log");
>> +        if (migrate_get_current()->migration_thread_running) {
>> +            qemu_file_set_error(migrate_get_current()->to_dst_file, -ECHILD);
>> +        }
>>      }
>>  }
>>
>> --
>> 1.8.3.1
>>
>
> .
>

On Wed, Dec 06, 2017 at 09:30:27PM +0800, Ying Fang wrote:
>
> On 2017/12/1 22:39, Michael S. Tsirkin wrote:
> > On Fri, Dec 01, 2017 at 01:58:32PM +0800, fangying wrote:
> >> QEMU will abort when vhost-user process is restarted during migration
> >> when vhost_log_global_start/stop is called. The reason is clear that
> >> vhost_dev_set_log returns -1 because network connection is lost.
> >>
> >> To handle this situation, let's cancel migration by setting migrate
> >> state to failure and report it to user.
> >
> > In fact I don't see this as the right way to fix it. Backend is dead so why
> > not just proceed with migration? We just need to make sure we re-send
> > migration data on re-connect.
> >
> This is where vhost starts/stops the migration dirty log. The original code
> aborts qemu here because the vhost data stream may break down if we fail to
> start/stop the vhost dirty log during migration. The backend may be active
> again after vhost_log_global_start.
>
> dirty log start ----------------- dirty log stop
>       ^                  ^
>       |                  |
> ----- backend dead ----- backend active

I'm sorry, I don't understand yet. Backend is active after logging started -
why is this a problem?

> Currently we don't re-send migration data on re-connect in this situation.
> Maybe we should work it out.

So basically backend connects after logging started, and we
do not tell it to start logging and where - is that the issue?
I agree, that would be a bug then.

On 2017/12/7 0:34, Michael S. Tsirkin wrote:
> On Wed, Dec 06, 2017 at 09:30:27PM +0800, Ying Fang wrote:
>>
>> On 2017/12/1 22:39, Michael S. Tsirkin wrote:
>>> On Fri, Dec 01, 2017 at 01:58:32PM +0800, fangying wrote:
>>>> QEMU will abort when vhost-user process is restarted during migration
>>>> when vhost_log_global_start/stop is called. The reason is clear that
>>>> vhost_dev_set_log returns -1 because network connection is lost.
>>>>
>>>> To handle this situation, let's cancel migration by setting migrate
>>>> state to failure and report it to user.
>>>
>>> In fact I don't see this as the right way to fix it. Backend is dead so why
>>> not just proceed with migration? We just need to make sure we re-send
>>> migration data on re-connect.
>>>
>> This is where vhost starts/stops the migration dirty log. The original code
>> aborts qemu here because the vhost data stream may break down if we fail to
>> start/stop the vhost dirty log during migration. The backend may be active
>> again after vhost_log_global_start.
>>
>> dirty log start ----------------- dirty log stop
>>       ^                  ^
>>       |                  |
>> ----- backend dead ----- backend active
>
> I'm sorry, I don't understand yet. Backend is active after logging started -
> why is this a problem?

Sorry, I did not explain it well. If the backend is dead when dirty log start
is called, vhost_dev_set_log/vhost_dev_set_features may fail because the
connection is temporarily lost. So even if migration is in progress and the
vhost-user backend becomes active again later, vhost-user dirty memory is not
logged.

>
>> Currently we don't re-send migration data on re-connect in this situation.
>> Maybe we should work it out.
>
> So basically backend connects after logging started, and we
> do not tell it to start logging and where - is that the issue?
> I agree, that would be a bug then.
>

Yes, this is exactly the issue.

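The failure mode agreed on above (logging is requested while the backend is
disconnected, and nothing replays the request when it comes back) can be
illustrated with a small toy model. This is only a sketch and is not QEMU
code: struct toy_backend, struct toy_dev, toy_set_log() and toy_reconnect()
are hypothetical names invented for illustration, and the assumption that a
restarted backend comes up with logging disabled is part of the model, not
something the thread states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in QEMU. */
struct toy_backend {
    bool connected;   /* is the backend process reachable? */
    bool logging;     /* has the backend been told to log dirty memory? */
};

struct toy_dev {
    struct toy_backend backend;
    bool log_wanted;  /* what migration asked for, kept even on failure */
};

/*
 * Record the requested logging state, then try to push it to the backend.
 * Returns -1 when the backend is disconnected, loosely analogous to
 * vhost_dev_set_log failing in the scenario described in the thread.
 */
static int toy_set_log(struct toy_dev *dev, bool enable)
{
    assert(dev != NULL);
    dev->log_wanted = enable;
    if (!dev->backend.connected) {
        return -1;
    }
    dev->backend.logging = enable;
    return 0;
}

/*
 * On reconnect, replay the remembered request so the backend learns that
 * logging is (still) wanted. Model assumption: a restarted backend starts
 * with logging disabled.
 */
static int toy_reconnect(struct toy_dev *dev)
{
    assert(dev != NULL);
    dev->backend.connected = true;
    dev->backend.logging = false;
    return toy_set_log(dev, dev->log_wanted);
}
```

In this model, a toy_set_log(&dev, true) call made while the backend is down
fails but leaves log_wanted set, and a later toy_reconnect(&dev) turns
logging on in the backend. Without the replay step in toy_reconnect(), the
backend would stay active with logging off for the rest of the migration.
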
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddc42f0..92725f7 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -26,6 +26,8 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/blocker.h"
+#include "migration/migration.h"
+#include "migration/qemu-file.h"
 #include "sysemu/dma.h"
 
 /* enabled until disconnected backend stabilizes */
@@ -885,7 +887,10 @@ static void vhost_log_global_start(MemoryListener *listener)
 
     r = vhost_migration_log(listener, true);
     if (r < 0) {
-        abort();
+        error_report("Failed to start vhost dirty log");
+        if (migrate_get_current()->migration_thread_running) {
+            qemu_file_set_error(migrate_get_current()->to_dst_file, -ECHILD);
+        }
     }
 }
 
@@ -895,7 +900,10 @@ static void vhost_log_global_stop(MemoryListener *listener)
 
     r = vhost_migration_log(listener, false);
     if (r < 0) {
-        abort();
+        error_report("Failed to stop vhost dirty log");
+        if (migrate_get_current()->migration_thread_running) {
+            qemu_file_set_error(migrate_get_current()->to_dst_file, -ECHILD);
+        }
     }
 }
 