From patchwork Mon May 23 17:19:47 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amit Shah X-Patchwork-Id: 9132193 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 18B5160761 for ; Mon, 23 May 2016 17:24:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0CA132823A for ; Mon, 23 May 2016 17:24:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 00CCC28246; Mon, 23 May 2016 17:24:07 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 9BF4D2823A for ; Mon, 23 May 2016 17:24:07 +0000 (UTC) Received: from localhost ([::1]:49499 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b4taI-0001qk-Mi for patchwork-qemu-devel@patchwork.kernel.org; Mon, 23 May 2016 13:24:06 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:43005) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b4tWf-0007UI-8y for qemu-devel@nongnu.org; Mon, 23 May 2016 13:20:22 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1b4tWY-0005Uo-Pf for qemu-devel@nongnu.org; Mon, 23 May 2016 13:20:20 -0400 Received: from mx1.redhat.com ([209.132.183.28]:57913) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b4tWY-0005Uj-HK for qemu-devel@nongnu.org; Mon, 23 May 2016 13:20:14 -0400 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 24E227F368; Mon, 23 May 2016 17:20:14 +0000 (UTC) Received: from localhost (ovpn-116-17.sin2.redhat.com [10.67.116.17]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u4NHKCkN005190; Mon, 23 May 2016 13:20:13 -0400 From: Amit Shah To: gkurz@linux.vnet.ibm.com, Markus Armbruster , jjherne@linux.vnet.ibm.com, Peter Maydell Date: Mon, 23 May 2016 22:49:47 +0530 Message-Id: In-Reply-To: References: In-Reply-To: References: X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Mon, 23 May 2016 17:20:14 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PULL 5/5] migration: regain control of images when migration fails to complete X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amit Shah , qemu list , "Dr. David Alan Gilbert" , Juan Quintela Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Greg Kurz We currently have an error path during migration that can cause the source QEMU to abort: migration_thread() migration_completion() runstate_is_running() ----------------> true if guest is running bdrv_inactivate_all() ----------------> inactivate images qemu_savevm_state_complete_precopy() ... qemu_fflush() socket_writev_buffer() --------> error because destination fails qemu_fflush() -------------------> set error on migration stream migration_completion() -----------------> set migrate state to FAILED migration_thread() -----------------------> break migration loop vm_start() -----------------------------> restart guest with inactive images and you get: qemu-system-ppc64: socket_writev_buffer: Got err=104 for (32768/18446744073709551615) qemu-system-ppc64: /home/greg/Work/qemu/qemu-master/block/io.c:1342:bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed. Aborted (core dumped) If we try postcopy with a similar scenario, we also get the writev error message but QEMU leaves the guest paused because entered_postcopy is true. We could possibly do the same with precopy and leave the guest paused. But since the historical default for migration errors is to restart the source, this patch adds a call to bdrv_invalidate_cache_all() instead. Signed-off-by: Greg Kurz Message-Id: <146357896785.6003.11983081732454362715.stgit@bahia.huguette.org> Signed-off-by: Amit Shah --- migration/migration.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 721d010..f5327e8 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1606,19 +1606,32 @@ static void migration_completion(MigrationState *s, int current_active_state, rp_error = await_return_path_close_on_source(s); trace_migration_completion_postcopy_end_after_rp(rp_error); if (rp_error) { - goto fail; + goto fail_invalidate; } } if (qemu_file_get_error(s->to_dst_file)) { trace_migration_completion_file_err(); - goto fail; + goto fail_invalidate; } migrate_set_state(&s->state, current_active_state, MIGRATION_STATUS_COMPLETED); return; +fail_invalidate: + /* If not doing postcopy, vm_start() will be called: let's regain + * control on images. + */ + if (s->state == MIGRATION_STATUS_ACTIVE) { + Error *local_err = NULL; + + bdrv_invalidate_cache_all(&local_err); + if (local_err) { + error_report_err(local_err); + } + } + fail: migrate_set_state(&s->state, current_active_state, MIGRATION_STATUS_FAILED);