From patchwork Wed May 18 13:44:36 2016
X-Patchwork-Submitter: Greg Kurz
X-Patchwork-Id: 9118941
From: Greg Kurz
To: Amit Shah, Juan Quintela
Cc: Kevin Wolf, qemu-devel@nongnu.org, qemu-stable@nongnu.org,
 "Dr. David Alan Gilbert"
Date: Wed, 18 May 2016 15:44:36 +0200
Message-ID: <146357896785.6003.11983081732454362715.stgit@bahia.huguette.org>
Subject: [Qemu-devel] [PATCH v2] migration: regain control of images when
 migration fails to complete

We currently have an error path during migration that can cause
the source QEMU to abort:

migration_thread()
  migration_completion()
    runstate_is_running() ----------------> true if guest is running
    bdrv_inactivate_all() ----------------> inactivate images
    qemu_savevm_state_complete_precopy()
     ... qemu_fflush()
           socket_writev_buffer() --------> error because destination fails
         qemu_fflush() -------------------> set error on migration stream
  migration_completion() -----------------> set migrate state to FAILED
migration_thread() -----------------------> break migration loop
  vm_start() -----------------------------> restart guest with inactive
                                            images

and you get:

qemu-system-ppc64: socket_writev_buffer: Got err=104 for (32768/18446744073709551615)
qemu-system-ppc64: /home/greg/Work/qemu/qemu-master/block/io.c:1342:bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
Aborted (core dumped)

If we try postcopy with a similar scenario, we also get the writev error
message but QEMU leaves the guest paused because entered_postcopy is true.

We could possibly do the same with precopy and leave the guest paused.
But since the historical default for migration errors is to restart the
source, this patch adds a call to bdrv_invalidate_cache_all() instead.

Signed-off-by: Greg Kurz
---
v2: - follow the existing error handling patterns (Kevin)
---
 migration/migration.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 991313a8629a..0563b4c348e6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1597,19 +1597,32 @@ static void migration_completion(MigrationState *s, int current_active_state,
         rp_error = await_return_path_close_on_source(s);
         trace_migration_completion_postcopy_end_after_rp(rp_error);
         if (rp_error) {
-            goto fail;
+            goto fail_invalidate;
         }
     }
 
     if (qemu_file_get_error(s->to_dst_file)) {
         trace_migration_completion_file_err();
-        goto fail;
+        goto fail_invalidate;
     }
 
     migrate_set_state(&s->state, current_active_state,
                       MIGRATION_STATUS_COMPLETED);
     return;
 
+fail_invalidate:
+    /* If not doing postcopy, vm_start() will be called: let's regain
+     * control on images.
+     */
+    if (s->state == MIGRATION_STATUS_ACTIVE) {
+        Error *local_err = NULL;
+
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+        }
+    }
+
 fail:
     migrate_set_state(&s->state, current_active_state,
                       MIGRATION_STATUS_FAILED);
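
For readers who want to see the control flow in isolation, here is a
minimal standalone sketch of the two-label error path this patch
introduces. It is not QEMU code: struct model, the MIG_* values and the
model_*() functions are hypothetical stand-ins invented for illustration,
and image activation is reduced to a boolean. It only assumes what the
commit message states: images are inactivated before the final state is
sent, precopy restarts the guest on failure, and postcopy leaves it
paused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU state; none of these are QEMU APIs. */
enum mig_status { MIG_ACTIVE, MIG_POSTCOPY_ACTIVE, MIG_COMPLETED, MIG_FAILED };

struct model {
    enum mig_status state;  /* models s->state */
    bool images_active;     /* false once the images have been inactivated */
    bool guest_running;     /* true once the guest has been restarted */
};

/*
 * Models the tail of migration_completion() with the patch applied.
 * early_error:  a failure before the images are inactivated (assumed case).
 * stream_error: models qemu_file_get_error() reporting an error.
 */
void model_completion(struct model *m, bool early_error, bool stream_error)
{
    assert(m != NULL);

    if (early_error) {
        goto fail;              /* images untouched, nothing to regain */
    }

    m->images_active = false;   /* images are handed over to the destination */

    if (stream_error) {
        goto fail_invalidate;
    }

    m->state = MIG_COMPLETED;
    return;

fail_invalidate:
    /* Precopy only: the guest will be restarted, so take the images back. */
    if (m->state == MIG_ACTIVE) {
        m->images_active = true;
    }
    /* fall through */
fail:
    m->state = MIG_FAILED;
}

/* Models the caller: on failure, precopy restarts the guest and
 * postcopy leaves it paused. */
void model_thread_end(struct model *m, bool entered_postcopy)
{
    assert(m != NULL);
    if (m->state == MIG_FAILED && !entered_postcopy) {
        m->guest_running = true;
    }
}

/* A running guest may only write to active images. */
bool model_guest_write_ok(const struct model *m)
{
    assert(m != NULL);
    return !m->guest_running || m->images_active;
}
```

In this model, a precopy failure on the stream ends with the guest
running on active images, so model_guest_write_ok() holds. Dropping the
fail_invalidate block reproduces the situation described above, where the
guest is restarted with inactive images.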