From patchwork Mon Jan 4 15:31:27 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ian Jackson X-Patchwork-Id: 7948521 Return-Path: X-Original-To: patchwork-xen-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 69B689F1C0 for ; Mon, 4 Jan 2016 15:33:53 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 6F59E2034C for ; Mon, 4 Jan 2016 15:33:52 +0000 (UTC) Received: from lists.xen.org (lists.xenproject.org [50.57.142.19]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 45DF120219 for ; Mon, 4 Jan 2016 15:33:51 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xen.org) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1aG76g-0004IC-B1; Mon, 04 Jan 2016 15:31:38 +0000 Received: from mail6.bemta5.messagelabs.com ([195.245.231.135]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1aG76e-0004I3-VK for xen-devel@lists.xen.org; Mon, 04 Jan 2016 15:31:37 +0000 Received: from [85.158.139.211] by server-16.bemta-5.messagelabs.com id 09/72-14422-7509A865; Mon, 04 Jan 2016 15:31:35 +0000 X-Env-Sender: prvs=80467dea4=Ian.Jackson@citrix.com X-Msg-Ref: server-7.tower-206.messagelabs.com!1451921490!13602995!1 X-Originating-IP: [66.165.176.89] X-SpamReason: No, hits=0.0 required=7.0 tests=sa_preprocessor: VHJ1c3RlZCBJUDogNjYuMTY1LjE3Ni44OSA9PiAyMDMwMDc=\n, received_headers: No Received headers X-StarScan-Received: X-StarScan-Version: 7.35.1; banners=-,-,- X-VirusChecked: Checked Received: (qmail 46890 invoked from network); 4 Jan 2016 15:31:31 -0000 Received: from smtp.citrix.com (HELO SMTP.CITRIX.COM) (66.165.176.89) by server-7.tower-206.messagelabs.com with RC4-SHA encrypted SMTP; 4 Jan 2016 15:31:31 -0000 X-IronPort-AV: E=Sophos;i="5.20,521,1444694400"; d="scan'208";a="322744406" From: Ian Jackson MIME-Version: 1.0 Message-ID: <22154.36943.266930.151060@mariner.uk.xensource.com> Date: Mon, 4 Jan 2016 15:31:27 +0000 To: Andrew Cooper In-Reply-To: <568280A1.80703@citrix.com> References: <567B58A0.7010201@cn.fujitsu.com> <567BE6B7.4030800@citrix.com> <567C9FBD.4000104@cn.fujitsu.com> <567CB2B7.9010808@cn.fujitsu.com> <568280A1.80703@citrix.com> X-Mailer: VM 8.1.0 under 23.4.1 (i486-pc-linux-gnu) X-DLP: MIA2 Cc: Wei Liu , "Ian.Campbell@citrix.com" , Wen Congyang , xen devel Subject: Re: [Xen-devel] question about migration X-BeenThere: xen-devel@lists.xen.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Andrew Cooper writes ("Re: [Xen-devel] question about migration"): > On 25/12/2015 03:06, Wen Congyang wrote: > > Another problem: > > If migration fails after the guest is suspended, we will resume it in the source. > > In this case, we cannot shutdown it. because no process hanlds the shutdown event. > > The log in /var/log/xen/xl-hvm_nopv.log: > > Waiting for domain hvm_nopv (domid 1) to die [pid 5508] > > Domain 1 has shut down, reason code 2 0x2 > > Domain has suspended. > > Done. Exiting now > > > > The xl has exited... ... > Hmm yes. This is a libxl bug in libxl_evenable_domain_death(). CC'ing > the toolstack maintainers. AIUI this is a response to Wen's comments above. > It waits for the @releasedomain watch, but doesn't interpret the results > correctly. In particular, if it can still make successful hypercalls > with the provided domid, that domain was not the subject of > @releasedomain. (I also observe that domain_death_xswatch_callback() is > very inefficient. It only needs to make a single hypercall, not query > the entire state of all domains.) I don't understand precisely what you allege this bug to be, but: * libxl_evenable_domain_death may generate two events, a DOMAIN_SHUTDOWN and a DOMAIN_DEATH, or only one, a DOMAIN_DEATH. This is documented in libxl.h (although it refers to DESTROY rather than DEATH - see patch below to fix the doc). * @releaseDomain usually triggers twice for each domain: once when it goes to SHUTDOWN and once when it is actually destroyed. (This is obviously necessary to implement the above.) * @releaseDomain does not have a specific domain which is the "subject of @releaseDomain". Arguably this is unhelpful, but it is not libxl's fault. It arises from the VIRQ generated by Xen. Note that xenstored needs to search its own list of active domains to see what has happened; it generates the @releaseDomain event and throws away the domid. * domain_death_xswatch_callback handles the reporting for all domains that this instance of libxl is interested in. * That libxl searches its own smaller list of domains it is interested in is I think not a huge performance problem in a context where xenstored has to search all the domains on the system. In the worst case where each libxl process is interested in one domain it makes O(number of domains) hypercalls requesting info for 200 domains each. If this `200' is a problem a better heuristic would be quite easy. * Wen is using xl. The domain death events in xl are handled in xl_cmdimpl.c in the big while loop near the end of create_domain. The message in the log comes from this (in `create_domain'): case LIBXL_EVENT_TYPE_DOMAIN_SHUTDOWN: LOG("Domain %d has shut down, reason code %d 0x%x", domid, event->u.domain_shutdown.shutdown_reason, event->u.domain_shutdown.shutdown_reason); and then this (in `handle_domain_death') case LIBXL_SHUTDOWN_REASON_SUSPEND: LOG("Domain has suspended."); return 0; This is anomalous coding style because handle_domain_death should return one of the DOMAIN_RESTART values. This anomaly was introduced in 1a0e1789 "xl: support on_{poweroff,reboot,crash} domain configuration options", which introduced ACTION_* but left the suspend case to `return 0'. 0 now means in this context DOMAIN_RESTART_NONE which means that the daemonic xl should simply exit. But this stylistic anomaly is of no functional import: even before that (eg in 0e89a4ce~), shutdown with the reason code suspend causes xl to exit. The daemonic xl exits in this situation because it expects the migration machinery to tidy up the domain. * It is not possible to resume the domain in the source after it has suspended. The domain is, in the hypervisor, in the lifecycle state SHUTDOWN. For this reason in general, it is a bad idea to suspend the guest until the migration is known to be likely to succeed. After a late migration failure the domain should probably be treated as crashed. This is slightly awkward to implement because the daemonic xl would have to hang around to see what happened next and perhaps be told what to do. Or maybe the migrating xl would have to know what the on_crash configuration was. In the meantime the failure is rather less graceful: debris is simply left for the user. This is not ideal but tolerable I think. Ian. From 00c4fc6495078026d68f3fdd88bfd29a8ad483d7 Mon Sep 17 00:00:00 2001 From: Ian Jackson Date: Mon, 4 Jan 2016 15:13:14 +0000 Subject: [PATCH] libxl: Fix doc comment ref to DOMAIN_DEATH The doc comment for libxl_evdisable_domain_death mistakenly referred to DOMAIN_DESTROY but the event type name is actually DOMAIN_DEATH. Signed-off-by: Ian Jackson CC: Andrew Cooper Acked-by: Ian Campbell --- tools/libxl/libxl_event.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h index fad4c14..1ea789e 100644 --- a/tools/libxl/libxl_event.h +++ b/tools/libxl/libxl_event.h @@ -179,9 +179,9 @@ typedef struct libxl__evgen_domain_death libxl_evgen_domain_death; int libxl_evenable_domain_death(libxl_ctx *ctx, uint32_t domid, libxl_ev_user, libxl_evgen_domain_death **evgen_out); void libxl_evdisable_domain_death(libxl_ctx *ctx, libxl_evgen_domain_death*); - /* Arranges for the generation of DOMAIN_SHUTDOWN and DOMAIN_DESTROY + /* Arranges for the generation of DOMAIN_SHUTDOWN and DOMAIN_DEATH * events. A domain which is destroyed before it shuts down - * may generate only a DESTROY event. + * may generate only a DEATH event. */ typedef struct libxl__evgen_disk_eject libxl_evgen_disk_eject;