From patchwork Fri Jun 3 07:52:33 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 9152087
From: zhanghailiang
Date: Fri, 3 Jun 2016 15:52:33 +0800
Message-ID: <1464940366-9880-22-git-send-email-zhang.zhanghailiang@huawei.com>
X-Mailer: git-send-email 2.7.2.windows.1
In-Reply-To: <1464940366-9880-1-git-send-email-zhang.zhanghailiang@huawei.com>
References: <1464940366-9880-1-git-send-email-zhang.zhanghailiang@huawei.com>
Subject: [Qemu-devel] [PATCH COLO-Frame v17 21/34] COLO failover: Shutdown related socket fd when do failover
Cc: xiecl.fnst@cn.fujitsu.com, lizhijian@cn.fujitsu.com,
 yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com,
 zhanghailiang, arei.gonglei@huawei.com, stefanha@redhat.com,
 zhangchen.fnst@cn.fujitsu.com, hongyang.yang@easystack.cn
Sender: "Qemu-devel"

If the network connection between the two COLO sides breaks while the
COLO thread or the COLO incoming thread is blocked in read()/write() on
a socket fd, the thread will not detect the error until the connection
times out.
That can take a long time. Here we shut down all the related socket
file descriptors in the failover BH to wake up the blocked operations.

Besides, the corresponding file descriptors must be closed only after
the failover BH has shut them down; otherwise the BH may shut down an fd
that has already been released and re-used by another thread.

Signed-off-by: zhanghailiang
Signed-off-by: Li Zhijian
Reviewed-by: Dr. David Alan Gilbert
Cc: Dr. David Alan Gilbert
---
v17:
- Rename colo_sem to colo_exit_sem.
v13:
- Add Reviewed-by tag
- Use semaphore to notify colo/colo incoming loop that failover work is
  finished.
v12:
- Shutdown both QEMUFile's fd though they may use the same fd.
  (Dave's suggestion)
v11:
- Only shut down the fd once
---
 include/migration/migration.h |  3 +++
 migration/colo.c              | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 566b2a5..74f49ee 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -113,6 +113,7 @@ struct MigrationIncomingState {
     QemuThread colo_incoming_thread;
     /* The coroutine we should enter (back) after failover */
     Coroutine *migration_incoming_co;
+    QemuSemaphore colo_incoming_sem;
 
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
@@ -181,6 +182,8 @@ struct MigrationState
     QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
     /* The RAMBlock used in the last src_page_request */
     RAMBlock *last_req_rb;
+    /* Semaphore used to notify the COLO thread that failover is finished */
+    QemuSemaphore colo_exit_sem;
 
     /* The last error that occurred */
     Error *error;
diff --git a/migration/colo.c b/migration/colo.c
index db6534a..ff7b77b 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -60,6 +60,18 @@ static void secondary_vm_do_failover(void)
         /* recover runstate to normal migration finish state */
         autostart = true;
     }
+    /*
+     * Make sure the COLO incoming thread is not blocked in recv() or send().
+     * If mis->from_src_file and mis->to_src_file use the same fd,
+     * the second shutdown() will return -1; we ignore this value,
+     * it is harmless.
+     */
+    if (mis->from_src_file) {
+        qemu_file_shutdown(mis->from_src_file);
+    }
+    if (mis->to_src_file) {
+        qemu_file_shutdown(mis->to_src_file);
+    }
 
     old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
                                    FAILOVER_STATUS_COMPLETED);
@@ -68,6 +80,8 @@ static void secondary_vm_do_failover(void)
                      "secondary VM", old_state);
         return;
     }
+    /* Notify COLO incoming thread that failover work is finished */
+    qemu_sem_post(&mis->colo_incoming_sem);
     /* For Secondary VM, jump to incoming co */
     if (mis->migration_incoming_co) {
         qemu_coroutine_enter(mis->migration_incoming_co, NULL);
@@ -82,6 +96,18 @@ static void primary_vm_do_failover(void)
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
+    /*
+     * Wake up the COLO thread, which may be blocked in recv() or send().
+     * s->rp_state.from_dst_file and s->to_dst_file may use the same fd,
+     * but shutting the fd down twice is harmless.
+     */
+    if (s->to_dst_file) {
+        qemu_file_shutdown(s->to_dst_file);
+    }
+    if (s->rp_state.from_dst_file) {
+        qemu_file_shutdown(s->rp_state.from_dst_file);
+    }
+
     old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
                                    FAILOVER_STATUS_COMPLETED);
     if (old_state != FAILOVER_STATUS_HANDLING) {
@@ -89,6 +115,8 @@ static void primary_vm_do_failover(void)
                      old_state);
         return;
     }
+    /* Notify COLO thread that failover work is finished */
+    qemu_sem_post(&s->colo_exit_sem);
 }
 
 void colo_do_failover(MigrationState *s)
@@ -374,6 +402,14 @@ out:
                                   COLO_EXIT_REASON_REQUEST, NULL);
     }
 
+    /* Hopefully the wait here is not too long */
+    qemu_sem_wait(&s->colo_exit_sem);
+    qemu_sem_destroy(&s->colo_exit_sem);
+    /*
+     * Must be called after the failover BH has completed,
+     * or the failover BH may shut down the wrong fd, one that
+     * was re-used by another thread after we released it here.
+     */
     if (s->rp_state.from_dst_file) {
         qemu_fclose(s->rp_state.from_dst_file);
     }
@@ -382,6 +418,7 @@ out:
 void migrate_start_colo_process(MigrationState *s)
 {
     qemu_mutex_unlock_iothread();
+    qemu_sem_init(&s->colo_exit_sem, 0);
     migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
     colo_process_checkpoint(s);
@@ -421,6 +458,8 @@ void *colo_process_incoming_thread(void *opaque)
     Error *local_err = NULL;
     int ret;
 
+    qemu_sem_init(&mis->colo_incoming_sem, 0);
+
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
 
@@ -551,6 +590,10 @@ out:
      */
     colo_release_ram_cache();
 
+    /* Hopefully the wait here is not too long */
+    qemu_sem_wait(&mis->colo_incoming_sem);
+    qemu_sem_destroy(&mis->colo_incoming_sem);
+    /* Must be called after the failover BH has completed */
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
     }
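
For readers who want to try the pattern outside QEMU, here is a minimal standalone sketch of the ordering this patch relies on: one thread blocks in recv(), another wakes it with shutdown(), and the blocked thread waits on a semaphore before it calls close(). It assumes Linux with POSIX threads and unnamed semaphores. All names (struct demo, worker, run_demo) are hypothetical and none of this is QEMU code; a socketpair stands in for the migration channel.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the state shared by the two sides. */
struct demo {
    int fd;            /* socket the worker blocks on */
    sem_t done;        /* posted once the shutdown side has finished */
    ssize_t recv_ret;  /* what recv() returned in the worker */
};

/* Plays the role of the COLO thread: blocks in recv(), then cleans up. */
static void *worker(void *opaque)
{
    struct demo *d = opaque;
    char buf[16];

    /* Blocks until data arrives or the socket is shut down. */
    d->recv_ret = recv(d->fd, buf, sizeof(buf), 0);

    /* Do not release the fd until the shutdown side says it is done. */
    sem_wait(&d->done);
    close(d->fd);
    return NULL;
}

/* Plays the role of the failover BH. Returns 0 if recv() was woken. */
int run_demo(void)
{
    int sv[2];
    struct demo d;
    pthread_t tid;
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    d.fd = sv[0];
    d.recv_ret = -2;
    sem_init(&d.done, 0, 0);

    if (pthread_create(&tid, NULL, worker, &d) != 0) {
        sem_destroy(&d.done);
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Give the worker time to block; not needed for correctness. */
    nanosleep(&ts, NULL);

    /* Wake the blocked recv(); the fd is still open at this point. */
    shutdown(d.fd, SHUT_RDWR);

    /* Only now may the worker close the fd. */
    sem_post(&d.done);

    pthread_join(tid, NULL);
    sem_destroy(&d.done);
    close(sv[1]);

    /* recv() on a shut-down stream socket returns 0 (end of stream). */
    return d.recv_ret == 0 ? 0 : -1;
}
```

Calling run_demo() from a main() and checking for a return value of 0 exercises the whole sequence. If the semaphore wait is dropped, the worker could close the fd before shutdown() runs, and the number might then refer to an unrelated descriptor opened by another thread, which is the hazard the comment in colo_process_checkpoint() describes.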