From patchwork Wed Mar 29 15:04:47 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Razvan Cojocaru
X-Patchwork-Id: 9651805
To: Jan Beulich
References: <1490361899-18303-1-git-send-email-rcojocaru@bitdefender.com>
 <58DA510A0200007800148F6F@prv-mh.provo.novell.com>
 <925827a5-b346-1733-3c0a-64eaa7b3e251@bitdefender.com>
 <58DA5B7E020000780014900C@prv-mh.provo.novell.com>
 <58DBD8FF020000780014A113@prv-mh.provo.novell.com>
From: Razvan Cojocaru
Message-ID: <185bbccb-0156-07fe-d060-6135cae07caf@bitdefender.com>
Date: Wed, 29 Mar 2017 18:04:47 +0300
Cc: andrew.cooper3@citrix.com, paul.durrant@citrix.com, Tim Deegan,
 xen-devel@lists.xen.org
Subject: Re: [Xen-devel] [PATCH RFC] x86/emulate: implement
 hvmemul_cmpxchg() with an actual CMPXCHG

On 03/29/2017 05:00 PM, Razvan Cojocaru wrote:
> On 03/29/2017 04:55 PM, Jan Beulich wrote:
>>>>> On 28.03.17 at 12:50, wrote:
>>> On 03/28/2017 01:47 PM, Jan Beulich wrote:
>>>>>>> On 28.03.17 at 12:27, wrote:
>>>>> On 03/28/2017 01:03 PM, Jan Beulich wrote:
>>>>>>>>> On 28.03.17 at 11:14, wrote:
>>>>>>> I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a
>>>>>>> failed CMPXCHG should happen just once, with the proper registers and ZF
>>>>>>> set. The guest surely expects neither that the instruction resume until
>>>>>>> it succeeds, nor that some hidden loop goes on for an indeterminate
>>>>>>> amount of time until a CMPXCHG succeeds.
>>>>>>
>>>>>> The guest doesn't observe the CMPXCHG failing - RETRY leads to
>>>>>> the instruction being restarted instead of completed.
>>>>>
>>>>> Indeed, but it works differently with hvm_emulate_one_vm_event() where
>>>>> RETRY currently would have the instruction be re-executed (properly
>>>>> re-executed, not just re-emulated) by the guest.
>>>>
>>>> Right - see my other reply to Andrew: The function likely would
>>>> need to tell apart guest CMPXCHG uses from us using the insn to
>>>> carry out the write by some other one. That may involve
>>>> adjustments to the memory write logic in x86_emulate() itself, as
>>>> the late failure of the comparison then would also need to be
>>>> communicated back (via ZF clear) to the guest.
>>>
>>> Exactly, it would require quite some reworking of x86_emulate().
>>
>> I had imagined it to be less intrusive (outside of x86_emulate()),
>> but I've now learned why Andrew was able to get rid of
>> X86EMUL_CMPXCHG_FAILED - the apparently intended behavior
>> was never implemented. Attached a first take at it, which has
>> seen smoke testing, but nothing more. The way it ends up being
>> I don't think this can reasonably be considered for 4.9 at this
>> point in time. (Also Cc-ing Tim for the shadow code changes,
>> even if this isn't really a proper patch submission.)
>
> Thanks! I'll give it a spin with a modified version of my CMPXCHG patch as
> soon as possible.

With the attached patch, in which hvmemul_cmpxchg() now returns
X86EMUL_CMPXCHG_FAILED if __cmpxchg() fails, my (32-bit) Windows 7 guest
gets stuck at the "Starting Windows" screen. Its state appears to be:

# ./xenctx -a 3
cs:eip: 0008:8bcd85d6
flags: 00200246 cid i z p
ss:esp: 0010:82736b9c
eax: 00000000 ebx: 84f3a678 ecx: 84ee2610 edx: 001eb615
esi: 40008000 edi: 82739d20 ebp: 82736c20
 ds: 0023 es: 0023 fs: 0030 gs: 0000

cr0: 8001003b
cr2: 8fd94000
cr3: 00185000
cr4: 000406f9

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: fffe0ff0
dr7: 00000400
Code (instr addr 8bcd85d6)
47 fc 83 c7 14 4e 75 ef 5f 5e c3 cc cc cc cc cc cc 8b ff fb f4 cc cc cc cc cc 8b ff 55 8b ec

# ./xenctx -a 3
cs:eip: 0008:8bcd85d6
flags: 00200246 cid i z p
ss:esp: 0010:82736b9c
eax: 00000000 ebx: 84f3a678 ecx: 84ee2610 edx: 002ca60d
esi: 40008000 edi: 82739d20 ebp: 82736c20
 ds: 0023 es: 0023 fs: 0030 gs: 0000

cr0: 8001003b
cr2: 8fd94000
cr3: 00185000
cr4: 000406f9

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: fffe0ff0
dr7: 00000400
Code (instr addr 8bcd85d6)
47 fc 83 c7 14 4e 75 ef 5f 5e c3 cc cc cc cc cc cc 8b ff fb f4 cc cc cc cc cc 8b ff 55 8b ec

This only happens in SMP scenarios (my guest had 10 VCPUs for easy
reproduction). With a single VCPU, the guest booted fine. So something
is still not right when a CMPXCHG fails in a race (unless something's
obviously wrong with my patch, but I don't see it).

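For reference, the behaviour the guest would expect from a single failed
CMPXCHG can be sketched in plain C11. This is only an illustration of the
semantics discussed in this thread, not Xen code: cmpxchg_once(),
struct fake_regs and the EMUL_* codes are hypothetical names made up for
the example, and atomic_compare_exchange_strong() from <stdatomic.h>
stands in for the locked instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes, loosely modelled on the X86EMUL_* values. */
enum emul_rc { EMUL_OKAY, EMUL_CMPXCHG_FAILED };

/* Hypothetical stand-in for the guest-visible register state. */
struct fake_regs {
    uint32_t eax;   /* accumulator: the value the guest expects in memory */
    bool zf;        /* zero flag: set on success, clear on failure */
};

/*
 * Perform exactly one compare-and-exchange. There is no hidden retry loop:
 * a failed comparison is reported back once, with eax reloaded and ZF clear.
 */
static enum emul_rc cmpxchg_once(_Atomic uint32_t *mem, uint32_t new_val,
                                 struct fake_regs *regs)
{
    uint32_t expected = regs->eax;

    if ( atomic_compare_exchange_strong(mem, &expected, new_val) )
    {
        regs->zf = true;
        return EMUL_OKAY;
    }

    /* On failure C11 has stored the current memory value in 'expected'. */
    regs->eax = expected;
    regs->zf = false;
    return EMUL_CMPXCHG_FAILED;
}
```

Calling cmpxchg_once() with eax equal to the memory contents stores the
new value and sets ZF. Calling it with a stale eax leaves memory
untouched, reloads eax with the current value and clears ZF, which is
what a VCPU that lost the race would need to observe.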
Thanks,
Razvan

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 2d92957..b946ef7 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1029,6 +1030,77 @@ static int hvmemul_wbinvd_discard(
     return X86EMUL_OKAY;
 }
 
+static int hvmemul_vaddr_to_mfn(
+    unsigned long addr,
+    mfn_t *mfn,
+    uint32_t pfec,
+    struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t gpa = addr & ~PAGE_MASK;
+    struct page_info *page;
+    p2m_type_t p2mt;
+    unsigned long gfn;
+    struct vcpu *curr = current;
+    struct hvm_emulate_ctxt *hvmemul_ctxt =
+        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+
+    gfn = paging_gva_to_gfn(curr, addr, &pfec);
+
+    if ( gfn == gfn_x(INVALID_GFN) )
+    {
+        pagefault_info_t pfinfo = {};
+
+        if ( ( pfec & PFEC_page_paged ) || ( pfec & PFEC_page_shared ) )
+            return X86EMUL_RETRY;
+
+        pfinfo.linear = addr;
+        pfinfo.ec = pfec;
+
+        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    gpa |= (paddr_t)gfn << PAGE_SHIFT;
+
+    /*
+     * No need to do the P2M lookup for internally handled MMIO, benefiting
+     * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
+     * - newer Windows (like Server 2012) for HPET accesses.
+     */
+    if ( !nestedhvm_vcpu_in_guestmode(curr) && hvm_mmio_internal(gpa) )
+        return X86EMUL_UNHANDLEABLE;
+
+    page = get_page_from_gfn(curr->domain, gfn, &p2mt, P2M_UNSHARE);
+
+    if ( !page )
+        return X86EMUL_UNHANDLEABLE;
+
+    if ( p2m_is_paging(p2mt) )
+    {
+        put_page(page);
+        p2m_mem_paging_populate(curr->domain, gfn);
+        return X86EMUL_RETRY;
+    }
+
+    if ( p2m_is_shared(p2mt) )
+    {
+        put_page(page);
+        return X86EMUL_RETRY;
+    }
+
+    if ( p2m_is_grant(p2mt) )
+    {
+        put_page(page);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    *mfn = _mfn(page_to_mfn(page));
+
+    put_page(page);
+
+    return X86EMUL_OKAY;
+}
+
 static int hvmemul_cmpxchg(
     enum x86_segment seg,
     unsigned long offset,
@@ -1037,8 +1109,70 @@ static int hvmemul_cmpxchg(
     unsigned int bytes,
     struct x86_emulate_ctxt *ctxt)
 {
-    /* Fix this in case the guest is really relying on r-m-w atomicity. */
-    return hvmemul_write(seg, offset, p_new, bytes, ctxt);
+    unsigned long addr, reps = 1;
+    int rc = X86EMUL_OKAY;
+    unsigned long old = 0, new = 0;
+    uint32_t pfec = PFEC_page_present | PFEC_write_access;
+    struct hvm_emulate_ctxt *hvmemul_ctxt =
+        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+    mfn_t mfn[2];
+    void *map = NULL;
+    struct domain *currd = current->domain;
+
+    if ( is_x86_system_segment(seg) )
+        pfec |= PFEC_implicit;
+    else if ( hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3 )
+        pfec |= PFEC_user_mode;
+
+    rc = hvmemul_virtual_to_linear(
+        seg, offset, bytes, &reps, hvm_access_write, hvmemul_ctxt, &addr);
+
+    if ( rc != X86EMUL_OKAY || !bytes )
+        return rc;
+
+    rc = hvmemul_vaddr_to_mfn(addr, &mfn[0], pfec, ctxt);
+
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    if ( likely(((addr + bytes - 1) & PAGE_MASK) == (addr & PAGE_MASK)) )
+    {
+        /* Whole write fits on a single page. */
+        mfn[1] = INVALID_MFN;
+        map = map_domain_page(mfn[0]);
+    }
+    else
+    {
+        rc = hvmemul_vaddr_to_mfn((addr + bytes - 1) & PAGE_MASK, &mfn[1],
+                                  pfec, ctxt);
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+
+        map = vmap(mfn, 2);
+    }
+
+    if ( !map )
+        return X86EMUL_UNHANDLEABLE;
+
+    map += (addr & ~PAGE_MASK);
+
+    memcpy(&old, p_old, bytes);
+    memcpy(&new, p_new, bytes);
+
+    if ( __cmpxchg(map, old, new, bytes) != old )
+        rc = X86EMUL_CMPXCHG_FAILED;
+
+    paging_mark_dirty(currd, mfn[0]);
+
+    if ( unlikely(mfn_valid(mfn[1])) )
+    {
+        paging_mark_dirty(currd, mfn[1]);
+        vunmap((void *)((unsigned long)map & PAGE_MASK));
+    }
+    else
+        unmap_domain_page(map);
+
+    return rc;
 }
 
 static int hvmemul_validate(
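
The single-page test and the in-page offset used by the patch above can
be checked in isolation with a small standalone sketch. It assumes 4 KiB
pages and a mask of the form ~(PAGE_SIZE - 1); the SKETCH_* macros and
the two helper names are hypothetical and exist only for this example.
As in the patch, bytes is assumed to be non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed 4 KiB pages; these macros are local to this sketch. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/*
 * True if the access [addr, addr + bytes - 1] stays within one page,
 * i.e. its first and last byte share the same page-aligned base.
 * The caller must pass bytes > 0.
 */
static bool fits_on_one_page(unsigned long addr, unsigned int bytes)
{
    return ((addr + bytes - 1) & SKETCH_PAGE_MASK) ==
           (addr & SKETCH_PAGE_MASK);
}

/* Offset of addr inside its page, as added to the mapping base. */
static unsigned long page_offset(unsigned long addr)
{
    return addr & ~SKETCH_PAGE_MASK;
}
```

A 4-byte access at 0x1ffc ends at 0x1fff and stays on one page, while
the same access at 0x1ffd ends at 0x2000 and needs the two-page mapping
path; in both cases the offset into the first page is the low 12 bits of
the address.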