From patchwork Fri Dec 25 01:45:33 2015
X-Patchwork-Submitter: Wen Congyang
X-Patchwork-Id: 7920241
From: Wen Congyang <wency@cn.fujitsu.com>
To: Andrew Cooper
Cc: xen devel
Subject: Re: [Xen-devel] question about migration
Date: Fri, 25 Dec 2015 09:45:33 +0800
Message-ID: <567C9FBD.4000104@cn.fujitsu.com>
In-Reply-To: <567BE6B7.4030800@citrix.com>
References: <567B58A0.7010201@cn.fujitsu.com> <567BE6B7.4030800@citrix.com>

On 12/24/2015 08:36 PM, Andrew Cooper wrote:
> On 24/12/15 02:29, Wen Congyang wrote:
>> Hi Andrew Cooper:
>>
>> I rebased the COLO code onto the newest upstream xen and tested it. I found
>> a problem in the test, and I can reproduce this problem via migration.
>>
>> How to reproduce:
>> 1. xl cr -p hvm_nopv
>> 2. xl migrate hvm_nopv 192.168.3.1
>
> You are the very first person to try a usecase like this.
>
> It works as much as it does because of your changes to the uncooperative
> HVM domain logic. I have said repeatedly during review that this is not
> necessarily a safe change to make without an in-depth analysis of the
> knock-on effects; it looks as if you have found the first knock-on effect.
>
>> The migration succeeds, but the vm doesn't run on the target machine.
>> You can get the reason from 'xl dmesg':
>> (XEN) HVM2 restore: VMCE_VCPU 1
>> (XEN) HVM2 restore: TSC_ADJUST 0
>> (XEN) HVM2 restore: TSC_ADJUST 1
>> (d2) HVM Loader
>> (d2) Detected Xen v4.7-unstable
>> (d2) Get guest memory maps[128] failed. (-38)
>> (d2) *** HVMLoader bug at e820.c:39
>> (d2) *** HVMLoader crashed.
>>
>> The reason is that we don't call xc_domain_set_memory_map() on the target
>> machine. When we create a hvm domain:
>>   libxl__domain_build()
>>     libxl__build_hvm()
>>       libxl__arch_domain_construct_memmap()
>>         xc_domain_set_memory_map()
>>
>> Should we migrate the guest memory map from the source machine to the
>> target machine?
>
> This bug specifically is because HVMLoader is expected to have run and
> turned the hypercall information into an E820 table in the guest before a
> migration occurs.
>
> Unfortunately, the current codebase is riddled with such assumptions and
> expectations (e.g. the HVM save code assumes that FPU context is valid
> when it is saving register state), which is a direct side effect of how
> it was developed.
>
> Having said all of the above, I agree that your example is a usecase which
> should work. It is the ultimate test of whether the migration stream
> contains enough information to faithfully reproduce the domain on the far
> side. Clearly, at the moment, this is not the case.
>
> I have an upcoming project to work on the domain memory layout logic,
> because it is unsuitable for a number of XenServer usecases. Part of that
> will require moving it in the migration stream.

I found another migration problem in the test:
If the migration fails, we resume the guest on the source side, but the hvm
guest does not respond any more.

In my test environment, the migration always succeeds, so I used a hack to
reproduce the failure:
1. Modify the target xen tools so that the restore always fails (the patch
   is at the end of this mail).
2. xl cr hvm_nopv, and wait some time (you can log in to the guest).
3. xl migrate hvm_nopv 192.168.3.1

The reason is that we create a default ioreq server when we read the hvm
param HVM_PARAM_IOREQ_PFN. This means the problem occurs only when the
migration fails after we have read the hvm param HVM_PARAM_IOREQ_PFN.

In the function hvm_select_ioreq_server():
If the I/O will be handled by a non-default ioreq server, we return that
non-default ioreq server. In this case, it is handled by qemu.
If the I/O will not be handled by a non-default ioreq server, we return the
default ioreq server. Before the migration we return NULL, and after the
failed migration it is not NULL.
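To make the before/after difference concrete, here is a minimal standalone
model of the behaviour described above. It is NOT the real Xen code; every
name in it (model_domain, model_select_ioreq_server, and so on) is invented
for illustration, and it only assumes what is stated above: the access is
not claimed by qemu's non-default server, and the default ioreq server only
exists once HVM_PARAM_IOREQ_PFN has been read.

/* model_ioreq.c -- toy model, NOT real Xen code; all names are invented.
 * Build with: gcc -o model_ioreq model_ioreq.c
 */
#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

struct model_ioreq_server {
    bool backing_dm;            /* is a device model actually serving it? */
};

struct model_domain {
    /* Created as a side effect of reading HVM_PARAM_IOREQ_PFN. */
    struct model_ioreq_server *default_server;
};

/* Stand-in for hvm_select_ioreq_server() in the case described above:
 * no non-default (qemu) server claims the access, so we fall back to
 * the default server, which may or may not exist yet. */
static struct model_ioreq_server *
model_select_ioreq_server(struct model_domain *d)
{
    return d->default_server;   /* NULL until the param has been read */
}

/* Stand-in for the X86EMUL_UNHANDLEABLE path in hvmemul_do_io(). */
static void model_unhandleable_io(const char *when, struct model_domain *d)
{
    struct model_ioreq_server *s = model_select_ioreq_server(d);

    if ( s == NULL )
        printf("%s: no backing DM, access ignored, guest keeps running\n",
               when);
    else if ( !s->backing_dm )
        printf("%s: request sent to a default server nobody serves"
               " -> guest hangs\n", when);
    else
        printf("%s: request handled by the device model\n", when);
}

int main(void)
{
    struct model_ioreq_server orphan = { .backing_dm = false };
    struct model_domain before = { .default_server = NULL };
    struct model_domain after  = { .default_server = &orphan };

    model_unhandleable_io("before HVM_PARAM_IOREQ_PFN is read", &before);
    model_unhandleable_io("after a failed migration", &after);
    return 0;
}

The second output line is the hang: the request is forwarded to a server
that no device model will ever answer.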
See the caller, hvmemul_do_io():

    case X86EMUL_UNHANDLEABLE:
    {
        struct hvm_ioreq_server *s =
            hvm_select_ioreq_server(curr->domain, &p);

        /* If there is no suitable backing DM, just ignore accesses */
        if ( !s )
        {
            rc = hvm_process_io_intercept(&null_handler, &p);
            vio->io_req.state = STATE_IOREQ_NONE;
        }
        else
        {
            rc = hvm_send_ioreq(s, &p, 0);
            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
                vio->io_req.state = STATE_IOREQ_NONE;
            else if ( data_is_addr )
                rc = X86EMUL_OKAY;
        }
        break;
    }

We send the I/O request to the default ioreq server, but no backing DM
handles it. We will wait for the I/O forever...

Thanks
Wen Congyang

> ~Andrew

diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 258dec4..da95606 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
         goto err;
     }
+    rc = ERROR_FAIL;
+
 err:
     check_all_finished(egc, stream, rc);
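(The hunk above is the hack from step 1, not a proposed fix: it makes
libxl__xc_domain_restore_done() report ERROR_FAIL even when the restore
succeeded, so the migration always fails on the patched target at a point
where HVM_PARAM_IOREQ_PFN has already been read, and the source side then
resumes a guest whose default ioreq server has no device model behind it.)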