From patchwork Thu Sep 22 11:32:10 2016
X-Patchwork-Submitter: George Dunlap
X-Patchwork-Id: 9345099
In-Reply-To: <57E3A06E.1020000@linux.intel.com>
References: <1472813240-11011-1-git-send-email-yu.c.zhang@linux.intel.com>
 <1472813240-11011-2-git-send-email-yu.c.zhang@linux.intel.com>
 <57D24730.2050904@linux.intel.com> <57D24DFF.3070901@linux.intel.com>
 <57E3A06E.1020000@linux.intel.com>
From: George Dunlap
Date: Thu, 22 Sep 2016 12:32:10 +0100
To: Yu Zhang
Cc: Paul Durrant, "Lv, Zhiyuan", Jan Beulich, "Xen-devel@lists.xen.org"
Subject: Re: [Xen-devel] [PATCH v6 1/4] x86/ioreq server: Add HVMOP to map
 guest ram with p2m_ioreq_server to an ioreq server.
List-Id: Xen developer discussion

On Thu, Sep 22, 2016 at 10:12 AM, Yu Zhang wrote:
>
>
> On 9/21/2016 9:04 PM, George Dunlap wrote:
>>
>> On Fri, Sep 9, 2016 at 6:51 AM, Yu Zhang
>> wrote:
>>>>
>>>> On 9/2/2016 6:47 PM, Yu Zhang wrote:
>>>>>
>>>>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
>>>>> let one ioreq server claim/disclaim its responsibility for the
>>>>> handling of guest pages with p2m type p2m_ioreq_server. Users
>>>>> of this HVMOP can specify which kind of operation is supposed
>>>>> to be emulated in a parameter named flags. Currently, this HVMOP
>>>>> only support the emulation of write operations. And it can be
>>>>> further extended to support the emulation of read ones if an
>>>>> ioreq server has such requirement in the future.
>>>>>
>>>>> For now, we only support one ioreq server for this p2m type, so
>>>>> once an ioreq server has claimed its ownership, subsequent calls
>>>>> of the HVMOP_map_mem_type_to_ioreq_server will fail. Users can also
>>>>> disclaim the ownership of guest ram pages with p2m_ioreq_server, by
>>>>> triggering this new HVMOP, with ioreq server id set to the current
>>>>> owner's and flags parameter set to 0.
>>>>>
>>>>> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
>>>>> are only supported for HVMs with HAP enabled.
>>>>>
>>>>> Also note that only after one ioreq server claims its ownership
>>>>> of p2m_ioreq_server, will the p2m type change to p2m_ioreq_server
>>>>> be allowed.
>>>>>
>>>>> Signed-off-by: Paul Durrant
>>>>> Signed-off-by: Yu Zhang
>>>>> Acked-by: Tim Deegan
>>>>> ---
>>>>> Cc: Paul Durrant
>>>>> Cc: Jan Beulich
>>>>> Cc: Andrew Cooper
>>>>> Cc: George Dunlap
>>>>> Cc: Jun Nakajima
>>>>> Cc: Kevin Tian
>>>>> Cc: Tim Deegan
>>>>>
>>>>> changes in v6:
>>>>>   - Clarify logic in hvmemul_do_io().
>>>>>   - Use recursive lock for ioreq server lock.
>>>>>   - Remove debug print when mapping ioreq server.
>>>>>   - Clarify code in ept_p2m_type_to_flags() for consistency.
>>>>>   - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
>>>>>   - Add comments for HVMMEM_ioreq_server to note only changes
>>>>>     to/from HVMMEM_ram_rw are permitted.
>>>>>   - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server()
>>>>>     to avoid the race condition when a vm exit happens on a write-
>>>>>     protected page, just to find the ioreq server has been unmapped
>>>>>     already.
>>>>>   - Introduce a seperate patch to delay the release of p2m
>>>>>     lock to avoid the race condition.
>>>>>   - Introduce a seperate patch to handle the read-modify-write
>>>>>     operations on a write protected page.
>>>>>
>>>> Why do we need to do this?  Won't the default case just DTRT if it finds
>>>> that the ioreq server has been unmapped?
>>>
>>>
>>> Well, patch 4 will either mark the remaining p2m_ioreq_server entries
>>> as "recal" or reset to p2m_ram_rw directly. So my understanding is
>>> that we do not wish to see a ept violation due to a p2m_ioreq_server
>>> access after the ioreq server is unmapped.
>>> Yet without this domain_pause/unpause() pair, VM accesses may trigger
>>> an ept violation during the hvmop hypercall
>>> (hvm_map_mem_type_to_ioreq_server), just to find the ioreq server is
>>> NULL. Then we would have to provide handlers which just do the copy
>>> to/from actions for the VM. This seems awkward to me.
>>
>> So the race you're worried about is this:
>>
>> 1. Guest fault happens
>> 2. ioreq server calls map_mem_type_to_ioreq_server, unhooking
>> 3. guest finds no ioreq server present
>>
>> I think in that case the easiest thing to do would be to simply assume
>> there was a race and re-execute the instruction.  Is that not possible
>> for some reason?
>>
>>  -George
>
>
> Thanks for your reply, George. :)
> Two reasons I'd like to use the domain_pause/unpause() to avoid the race
> condition:
>
> 1> Like my previous explanation, in the read-modify-write scenario, the
> ioreq server will be NULL for the read emulation. But in such case,
> hypervisor will not discard this trap, instead it is supposed to do the
> copy work for the read access. So it would be difficult for hypervisor
> to decide if the ioreq server was detached due to a race condition, or
> if the ioreq server should be a NULL because we are emulating a read
> operation first for a read-modify-write instruction.

Wouldn't a patch like the attached work (applied on top of the whole
series)?

 -George

From 4e4b14641cd94c0c9fe64606b329cdbbf9c6a92b Mon Sep 17 00:00:00 2001
From: George Dunlap
Date: Thu, 22 Sep 2016 12:30:26 +0100
Subject: [PATCH] x86/hvm: Handle both ioreq detach races and
 read-modify-write instructions

Compile-tested only.

Signed-off-by: George Dunlap
---
 xen/arch/x86/hvm/emulate.c | 68 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 56 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 564c117..120ef86 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -214,40 +214,84 @@ static int hvmemul_do_io(
         break;
     case X86EMUL_UNHANDLEABLE:
     {
+        /*
+         * Xen isn't emulating the instruction internally, so see if
+         * there's an ioreq server that can handle it.  Rules:
+         *
+         * - PIO and "normal" mmio run through
+         *   hvm_select_ioreq_server() to choose the ioreq server by
+         *   range.  If no server is found, the access is ignored.
+         *
+         * - p2m_ioreq_server accesses are handled by the current
+         *   ioreq_server for the domain, but there are some corner
+         *   cases:
+         *
+         *   - If the domain ioreq_server is NULL, assume this is a
+         *     race between unlooking the ioreq server and
+         *     p2m_type_change and re-try the instruction.
+         *
+         *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
+         *     like a normal PIO or MMIO that doesn't have an ioreq
+         *     server (i.e., by ignoring it).
+         *
+         *   - If the accesss is a read, this must be part of a
+         *     read-modify-write instruction; emulate the read so that we have
+         *     it
+         */
+
         struct hvm_ioreq_server *s = NULL;
         p2m_type_t p2mt = p2m_invalid;
-
+
         if ( is_mmio )
        {
             unsigned long gmfn = paddr_to_pfn(addr);

             (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);

-            if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE )
+            if ( p2mt == p2m_ioreq_server )
             {
                 unsigned int flags;

                 s = p2m_get_ioreq_server(currd, &flags);
+
+                /*
+                 * If p2mt is ioreq_server but ioreq_server is NULL,
+                 * we probably lost a race with p2m_type_change; just
+                 * retry the access.
+                 */
+                if ( s == NULL )
+                {
+                    rc = X86EMUL_RETRY;
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
+
+                /*
+                 * This is part of a read-modify-write instruction.
+                 * Emulate the read part so we have the value cached.
+                 */
+                if ( dir == IOREQ_READ )
+                {
+                    rc = hvm_process_io_intercept(&mem_handler, &p);
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
+
                 if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) )
+                {
                     s = NULL;
+                }
             }
         }

         if ( !s && p2mt != p2m_ioreq_server )
             s = hvm_select_ioreq_server(currd, &p);

-        /* If there is no suitable backing DM, just ignore accesses */
         if ( !s )
         {
-            /*
-             * For p2m_ioreq_server pages accessed with read-modify-write
-             * instructions, we provide a read handler to copy the data to
-             * the buffer.
-             */
-            if ( p2mt == p2m_ioreq_server )
-                rc = hvm_process_io_intercept(&mem_handler, &p);
-            else
-                rc = hvm_process_io_intercept(&null_handler, &p);
+            /* If there is no suitable backing DM, just ignore accesses */
+            rc = hvm_process_io_intercept(&null_handler, &p);

             vio->io_req.state = STATE_IOREQ_NONE;
         }
         else
-- 
2.1.4
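
For reference, a rough caller-side sketch of how a device model might drive
the interface discussed above: claim the type, tag a page, and later disclaim
it. This is only an illustration, not code from the series: the
xc_hvm_map_mem_type_to_ioreq_server() wrapper and its signature are assumed
here; HVMMEM_ioreq_server, XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE and the
flags-set-to-0 disclaim convention come from the patch description, and
xc_hvm_set_mem_type() is the existing libxc call for changing a page's type.

/* Hypothetical sketch; the wrapper below is assumed, not taken from the series. */
#include <xenctrl.h>
#include <stdint.h>

/* Claim write emulation for all p2m_ioreq_server pages of the guest. */
static int claim_ioreq_mem(xc_interface *xch, domid_t dom, ioservid_t id)
{
    return xc_hvm_map_mem_type_to_ioreq_server(xch, dom, id,
                                               HVMMEM_ioreq_server,
                                               XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE);
}

/*
 * Start write-protecting one guest page.  Per the changelog, only
 * HVMMEM_ram_rw <-> HVMMEM_ioreq_server changes are permitted, and only
 * after the type has been claimed above.
 */
static int track_gfn(xc_interface *xch, domid_t dom, uint64_t gfn)
{
    return xc_hvm_set_mem_type(xch, dom, HVMMEM_ioreq_server, gfn, 1);
}

/* Disclaim ownership: flags == 0 detaches the type from this ioreq server. */
static int disclaim_ioreq_mem(xc_interface *xch, domid_t dom, ioservid_t id)
{
    return xc_hvm_map_mem_type_to_ioreq_server(xch, dom, id,
                                               HVMMEM_ioreq_server, 0);
}

Once the disclaim has happened, any remaining p2m_ioreq_server entries are
reset or recalculated by a later patch in the series, which is exactly the
window the retry logic in the attached patch is meant to cover.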