From patchwork Mon Feb 6 16:51:49 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Juan Quintela X-Patchwork-Id: 9558411 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 9A78960242 for ; Mon, 6 Feb 2017 16:53:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8C12326E76 for ; Mon, 6 Feb 2017 16:53:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 807FC27D5E; Mon, 6 Feb 2017 16:53:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 292D426E76 for ; Mon, 6 Feb 2017 16:53:20 +0000 (UTC) Received: from localhost ([::1]:49503 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1camXX-0004kT-3p for patchwork-qemu-devel@patchwork.kernel.org; Mon, 06 Feb 2017 11:53:19 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35285) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1camWU-0004IB-Aq for qemu-devel@nongnu.org; Mon, 06 Feb 2017 11:52:17 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1camWT-0005nm-Ek for qemu-devel@nongnu.org; Mon, 06 Feb 2017 11:52:14 -0500 Received: from mx1.redhat.com ([209.132.183.28]:33022) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1camWT-0005nO-8W for qemu-devel@nongnu.org; Mon, 06 Feb 2017 11:52:13 -0500 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 55CEE51469 for ; Mon, 6 Feb 2017 16:52:13 +0000 (UTC) Received: from emacs.mitica (ovpn-116-74.ams2.redhat.com [10.36.116.74]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id v16Gq2Mq021125; Mon, 6 Feb 2017 11:52:12 -0500 From: Juan Quintela To: qemu-devel@nongnu.org Date: Mon, 6 Feb 2017 17:51:49 +0100 Message-Id: <1486399909-16338-7-git-send-email-quintela@redhat.com> In-Reply-To: <1486399909-16338-1-git-send-email-quintela@redhat.com> References: <1486399909-16338-1-git-send-email-quintela@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Mon, 06 Feb 2017 16:52:13 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PULL 6/6] postcopy: Recover block devices on early failure X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: amit.shah@redhat.com, dgilbert@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: "Dr. David Alan Gilbert" An early postcopy failure can be recovered from as long as we know we haven't sent the command to run the destination. We have to undo the bdrv_inactivate_all by calling bdrv_invalidate_cache_all Note that I'm not using ms->block_inactive because once we've sent the postcopy package we dont want anything else to try and recover the block storage on the source; the destination might have started writing to it. Signed-off-by: Dr. David Alan Gilbert Message-Id: <20170202155909.31784-3-dgilbert@redhat.com> Signed-off-by: Juan Quintela --- migration/migration.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index 619ccc4..2b179c6 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1601,6 +1601,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running) QIOChannelBuffer *bioc; QEMUFile *fb; int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); + bool restart_block = false; migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_POSTCOPY_ACTIVE); @@ -1620,6 +1621,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running) if (ret < 0) { goto fail; } + restart_block = true; /* * Cause any non-postcopiable, but iterative devices to @@ -1676,6 +1678,18 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running) /* <><> end of stuff going into the package */ + /* Last point of recovery; as soon as we send the package the destination + * can open devices and potentially start running. + * Lets just check again we've not got any errors. + */ + ret = qemu_file_get_error(ms->to_dst_file); + if (ret) { + error_report("postcopy_start: Migration stream errored (pre package)"); + goto fail_closefb; + } + + restart_block = false; + /* Now send that blob */ if (qemu_savevm_send_packaged(ms->to_dst_file, bioc->data, bioc->usage)) { goto fail_closefb; @@ -1713,6 +1727,17 @@ fail_closefb: fail: migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, MIGRATION_STATUS_FAILED); + if (restart_block) { + /* A failure happened early enough that we know the destination hasn't + * accessed block devices, so we're safe to recover. + */ + Error *local_err = NULL; + + bdrv_invalidate_cache_all(&local_err); + if (local_err) { + error_report_err(local_err); + } + } qemu_mutex_unlock_iothread(); return -1; }