From patchwork Mon May 22 11:03:05 2017
X-Patchwork-Submitter: George Dunlap
X-Patchwork-Id: 9739925
From: George Dunlap
Date: Mon, 22 May 2017 12:03:05 +0100
To: "Hao, Xudong", "xen-devel@lists.xen.org"
Cc: Lars Kurth, Andrew Cooper, Julien Grall, Paul Durrant, Jan Beulich, "Gao, Chao"
Subject: Re:
[Xen-devel] [BUG] repeated live migration for VM failed

On Mon, May 22, 2017 at 11:18 AM, George Dunlap wrote:
> On 22/05/17 07:35, Hao, Xudong wrote:
>> Bug detailed description:
>> ----------------
>> Create one RHEL7.3 HVM and do live migration continuously, while doing the 200+ or 300+ times live-migration, tool stack report error and migration failed.
>>
>> Environment :
>> ----------------
>> HW: Skylake server
>> Xen: Xen 4.9.0 RC4
>> Dom0: Linux 4.11.0
>>
>> Reproduce steps:
>> ----------------
>> 1. Compile Xen 4.9 Rc4 and dom0 kernel 4.11.0, boot to dom0
>> 2. Boot RHEL7.3 HVM guest
>> 3. Migrate guest to localhost, sleep 10 seconds
>> 4. Repeat doing the step3.
>>
>> Current result:
>> ----------------
>> VM Migration fail.
>>
>> Base error log:
>> ----------------
>> xl migrate 24hrs_lm_guest_2 localhost
>> root@localhost's password:
>> migration target: Ready to receive domain.
>> Saving to migration stream new xl format (info 0x3/0x0/1761)
>> Loading new save file (new xl fmt info 0x3/0x0/1761)
>> Savefile contains xl domain config in JSON format
>> Parsing config from
>> xc: info: Saving domain 273, type x86 HVM
>> xc: info: Found x86 HVM domain from Xen 4.9
>> xc: info: Restoring domain
>> xc: error: set HVM param 12 = 0x00000000feffe000 (85 = Interrupted system call should ): Internal error
>> xc: error: Restore failed (85 = Interrupted system call should ): Internal error
>
> Interesting -- it appears that setting HVM_PARAM_IDENT_PT (#12) can fail
> with -ERESTART.
> But the comment for ERESTART makes it explicit that it
> should be internal only -- it should cause a hypercall continuation (so
> that the hypercall restarts automatically), rather than returning to the
> guest.
>
> But the hypercall continuation code seems to have disappeared from
> do_hvm_op() at some point?
>
> /me digs a bit more...

The problem turns out to be commit ae20ccf ("dm_op: convert
HVMOP_set_mem_type"), which says:

    This patch removes the need for handling HVMOP restarts, so that
    infrastructure is removed.

While it's true that there are no more operations which need iteration
information restored, there are two operations which may still need to
be restarted to avoid deadlocks with other operations.

Attached is a patch which restores hypercall continuation checking.
Xudong, can you give it a test?

Thanks,
 -George

Reviewed-by: Andrew Cooper (with the final
Tested-by: Xudong Hao

From 3d4ce135ea3b396bb63752c39e6234366d590c16 Mon Sep 17 00:00:00 2001
From: George Dunlap
Date: Mon, 22 May 2017 11:38:31 +0100
Subject: [PATCH] Restore HVM_OP hypercall continuation (partial revert of
 ae20ccf)

Commit ae20ccf removed the hypercall continuation logic from the end
of do_hvm_op(), claiming:

"This patch removes the need for handling HVMOP restarts, so that
infrastructure is removed."

That turns out to be false.  The removal of HVMOP_set_mem_type removed
the need to store a start iteration value in the hypercall
continuation, but a grep through hvm.c for ERESTART turns up at least
two places where do_hvm_op() may still need a hypercall continuation:

* HVMOP_set_hvm_param can return -ERESTART when setting
  HVM_PARAM_IDENT_PT in the event that it fails to acquire the domctl
  lock

* HVMOP_flush_tlbs can return -ERESTART if several vcpus call it at
  the same time

In both cases, a simple restart (with no stored iteration information)
is necessary.

Add a check for -ERESTART again, along with a comment at the top of
the function regarding the lack of decoding any information from the
op value.

Reported-by: Xudong Hao
Signed-off-by: George Dunlap
---
CC: Andrew Cooper
CC: Jan Beulich
CC: Paul Durrant
---
 xen/arch/x86/hvm/hvm.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 81691e2..e3e817d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4544,6 +4544,13 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     long rc = 0;
 
+    /*
+     * NB: hvm_op can be part of a restarted hypercall; but at the
+     * moment the only hypercalls which do continuations don't need to
+     * store any iteration information (since they're just re-trying
+     * the acquisition of a lock).
+     */
+
     switch ( op )
     {
     case HVMOP_set_evtchn_upcall_vector:
@@ -4636,6 +4643,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
     }
     }
 
+    if ( rc == -ERESTART )
+        rc = hypercall_create_continuation(__HYPERVISOR_hvm_op, "lh",
+                                           op, arg);
+
     return rc;
 }
 
@@ -4869,4 +4880,3 @@ void hvm_set_segment_register(struct vcpu *v, enum x86_segment seg,
  * indent-tabs-mode: nil
  * End:
  */
-
-- 
2.1.4