From patchwork Wed Mar 29 13:55:43 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 9651613 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 7A156602BE for ; Wed, 29 Mar 2017 13:58:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6519628485 for ; Wed, 29 Mar 2017 13:58:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 588682849F; Wed, 29 Mar 2017 13:58:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 582C428485 for ; Wed, 29 Mar 2017 13:58:25 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ctE4l-0003d0-Hr; Wed, 29 Mar 2017 13:55:51 +0000 Received: from mail6.bemta3.messagelabs.com ([195.245.230.39]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ctE4k-0003cu-4p for xen-devel@lists.xen.org; Wed, 29 Mar 2017 13:55:50 +0000 Received: from [85.158.137.68] by server-7.bemta-3.messagelabs.com id 76/F2-23854-5ECBBD85; Wed, 29 Mar 2017 13:55:49 +0000 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFjrAIsWRWlGSWpSXmKPExsXS6fjDS/fJnts RBm/WqVss+biYxYHR4+ju30wBjFGsmXlJ+RUJrBlbOx+zFVxsYazoXPeBqYHxUkIXIyeHkECe ROu7i8wgNq+AncSVrzvZQWwJAUOJp++vs4HYLAKqEhumP2YCsdkE1CXanm1nBbFFgGpOznjMC GIzCxRKvFvRDTZHWCBS4uDGH0C9XEDzbzJJHPt8EGwQp4CzxPPr64EGcQAtE5T4u0MYotdO4u aJ38wTGHlmIWRmIclA2FoSD3/dYoGwtSWWLXzNDFLOLCAtsfwfB0TYWmLnjk/MqEpAbDeJ/RP 3Mi5g5FjFqFGcWlSWWqRrZKKXVJSZnlGSm5iZo2toYKyXm1pcnJiempOYVKyXnJ+7iREYsvUM DIw7GF8d9zvEKMnBpCTKe8LwdoQQX1J+SmVGYnFGfFFpTmrxIUYZDg4lCd5ju4FygkWp6akVa Zk5wOiBSUtw8CiJ8D4GSfMWFyTmFmemQ6ROMSpKifM+B0kIgCQySvPg2mARe4lRVkqYl5GBgU GIpyC1KDezBFX+FaM4B6OSMO8ekCk8mXklcNNfAS1mAlosbnMLZHFJIkJKqoEx+gVjV+Del/M 3x0bXStVztnbPOp973iA1pCUx6Zqe9yfNPNWluaf3rlo1g32tuUiS997GtMqCxwwJgXdTdA8I K/87079bzXS7+/ZrJSXlmi+ydZO0lLv+T+WK/3xfXmvaWrETyjdX5TNdmbDt40zxKQGOHHKS/ Yd7wgTKNx9cf/9QXfql9U+UWIozEg21mIuKEwH6PPub0wIAAA== X-Env-Sender: JBeulich@suse.com X-Msg-Ref: server-3.tower-31.messagelabs.com!1490795746!92695595!1 X-Originating-IP: [137.65.248.74] X-SpamReason: No, hits=0.5 required=7.0 tests=BODY_RANDOM_LONG X-StarScan-Received: X-StarScan-Version: 9.2.3; banners=-,-,- X-VirusChecked: Checked Received: (qmail 31580 invoked from network); 29 Mar 2017 13:55:48 -0000 Received: from prv-mh.provo.novell.com (HELO prv-mh.provo.novell.com) (137.65.248.74) by server-3.tower-31.messagelabs.com with DHE-RSA-AES256-GCM-SHA384 encrypted SMTP; 29 Mar 2017 13:55:48 -0000 Received: from INET-PRV-MTA by prv-mh.provo.novell.com with Novell_GroupWise; Wed, 29 Mar 2017 07:55:45 -0600 Message-Id: <58DBD8FF020000780014A113@prv-mh.provo.novell.com> X-Mailer: Novell GroupWise Internet Agent 14.2.1 Date: Wed, 29 Mar 2017 07:55:43 -0600 From: "Jan Beulich" To: "Razvan Cojocaru" References: <1490361899-18303-1-git-send-email-rcojocaru@bitdefender.com> <58DA510A0200007800148F6F@prv-mh.provo.novell.com> <925827a5-b346-1733-3c0a-64eaa7b3e251@bitdefender.com> <58DA5B7E020000780014900C@prv-mh.provo.novell.com> In-Reply-To: Mime-Version: 1.0 Cc: andrew.cooper3@citrix.com, paul.durrant@citrix.com, Tim Deegan , xen-devel@lists.xen.org Subject: Re: [Xen-devel] [PATCH RFC] x86/emulate: implement hvmemul_cmpxchg() with an actual CMPXCHG X-BeenThere: xen-devel@lists.xen.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP >>> On 28.03.17 at 12:50, wrote: > On 03/28/2017 01:47 PM, Jan Beulich wrote: >>>>> On 28.03.17 at 12:27, wrote: >>> On 03/28/2017 01:03 PM, Jan Beulich wrote: >>>>>>> On 28.03.17 at 11:14, wrote: >>>>> I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a >>>>> failed CMPXCHG should happen just once, with the proper registers and ZF >>>>> set. The guest surely expects neither that the instruction resume until >>>>> it succeeds, nor that some hidden loop goes on for an undeterminate >>>>> ammount of time until a CMPXCHG succeeds. >>>> >>>> The guest doesn't observe the CMPXCHG failing - RETRY leads to >>>> the instruction being restarted instead of completed. >>> >>> Indeed, but it works differently with hvm_emulate_one_vm_event() where >>> RETRY currently would have the instruction be re-executed (properly >>> re-executed, not just re-emulated) by the guest. >> >> Right - see my other reply to Andrew: The function likely would >> need to tell apart guest CMPXCHG uses from us using the insn to >> carry out the write by some other one. That may involve >> adjustments to the memory write logic in x86_emulate() itself, as >> the late failure of the comparison then would also need to be >> communicated back (via ZF clear) to the guest. > > Exactly, it would require quite some reworking of x86_emulate(). I had imagined it to be less intrusive (outside of x86_emulate()), but I've now learned why Andrew was able to get rid of X86EMUL_CMPXCHG_FAILED - the apparently intended behavior was never implemented. Attached a first take at it, which has seen smoke testing, but nothing more. The way it ends up being I don't think this can reasonably be considered for 4.9 at this point in time. (Also Cc-ing Tim for the shadow code changes, even if this isn't really a proper patch submission.) Jan x86emul: correctly handle CMPXCHG* comparison failures If the ->cmpxchg() hook finds a mismatch, we should deal with this the same as when the "manual" comparison reports a mismatch. This involves reverting bfce0e62c3 ("x86/emul: Drop X86EMUL_CMPXCHG_FAILED"), albeit with X86EMUL_CMPXCHG_FAILED now becoming a value distinct from X86EMUL_RETRY. In order to not leave mixed code also fully switch affected functions from paddr_t to intpte_t. Signed-off-by: Jan Beulich --- The code could be further simplified if we could rely on all ->cmpxchg() hooks always using CMPXCHG, but for now we need to cope with them using plain write (and hence accept the double reads if CMPXCHG is actually being used). Note that the patch doesn't address the incorrectness of there not being a memory write even in the comparison-failed case. --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -5236,16 +5236,17 @@ static int ptwr_emulated_read( static int ptwr_emulated_update( unsigned long addr, - paddr_t old, - paddr_t val, + intpte_t *p_old, + intpte_t val, unsigned int bytes, - unsigned int do_cmpxchg, struct ptwr_emulate_ctxt *ptwr_ctxt) { unsigned long mfn; unsigned long unaligned_addr = addr; struct page_info *page; l1_pgentry_t pte, ol1e, nl1e, *pl1e; + intpte_t old = p_old ? *p_old : 0; + unsigned int offset = 0; struct vcpu *v = current; struct domain *d = v->domain; int ret; @@ -5259,28 +5260,30 @@ static int ptwr_emulated_update( } /* Turn a sub-word access into a full-word access. */ - if ( bytes != sizeof(paddr_t) ) + if ( bytes != sizeof(val) ) { - paddr_t full; - unsigned int rc, offset = addr & (sizeof(paddr_t)-1); + intpte_t full; + unsigned int rc; + + offset = addr & (sizeof(full) - 1); /* Align address; read full word. */ - addr &= ~(sizeof(paddr_t)-1); - if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 ) + addr &= ~(sizeof(full) - 1); + if ( (rc = copy_from_user(&full, (void *)addr, sizeof(full))) != 0 ) { x86_emul_pagefault(0, /* Read fault. */ - addr + sizeof(paddr_t) - rc, + addr + sizeof(full) - rc, &ptwr_ctxt->ctxt); return X86EMUL_EXCEPTION; } /* Mask out bits provided by caller. */ - full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8)); + full &= ~((((intpte_t)1 << (bytes * 8)) - 1) << (offset * 8)); /* Shift the caller value and OR in the missing bits. */ - val &= (((paddr_t)1 << (bytes*8)) - 1); + val &= (((intpte_t)1 << (bytes * 8)) - 1); val <<= (offset)*8; val |= full; /* Also fill in missing parts of the cmpxchg old value. */ - old &= (((paddr_t)1 << (bytes*8)) - 1); + old &= (((intpte_t)1 << (bytes * 8)) - 1); old <<= (offset)*8; old |= full; } @@ -5302,7 +5305,7 @@ static int ptwr_emulated_update( { default: if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) && - !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) ) + !p_old && (l1e_get_flags(nl1e) & _PAGE_PRESENT) ) { /* * If this is an upper-half write to a PAE PTE then we assume that @@ -5333,21 +5336,26 @@ static int ptwr_emulated_update( /* Checked successfully: do the update (write or cmpxchg). */ pl1e = map_domain_page(_mfn(mfn)); pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK)); - if ( do_cmpxchg ) + if ( p_old ) { - int okay; - intpte_t t = old; ol1e = l1e_from_intpte(old); - okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e), - &t, l1e_get_intpte(nl1e), _mfn(mfn)); - okay = (okay && t == old); + if ( !paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e), + &old, l1e_get_intpte(nl1e), _mfn(mfn)) ) + ret = X86EMUL_UNHANDLEABLE; + else if ( l1e_get_intpte(ol1e) == old ) + ret = X86EMUL_OKAY; + else + { + *p_old = old >> (offset * 8); + ret = X86EMUL_CMPXCHG_FAILED; + } - if ( !okay ) + if ( ret != X86EMUL_OKAY ) { unmap_domain_page(pl1e); put_page_from_l1e(nl1e, d); - return X86EMUL_RETRY; + return ret; } } else @@ -5374,9 +5382,9 @@ static int ptwr_emulated_write( unsigned int bytes, struct x86_emulate_ctxt *ctxt) { - paddr_t val = 0; + intpte_t val = 0; - if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes ) + if ( (bytes > sizeof(val)) || (bytes & (bytes - 1)) || !bytes ) { MEM_LOG("ptwr_emulate: bad write size (addr=%lx, bytes=%u)", offset, bytes); @@ -5385,9 +5393,9 @@ static int ptwr_emulated_write( memcpy(&val, p_data, bytes); - return ptwr_emulated_update( - offset, 0, val, bytes, 0, - container_of(ctxt, struct ptwr_emulate_ctxt, ctxt)); + return ptwr_emulated_update(offset, NULL, val, bytes, + container_of(ctxt, struct ptwr_emulate_ctxt, + ctxt)); } static int ptwr_emulated_cmpxchg( @@ -5398,21 +5406,20 @@ static int ptwr_emulated_cmpxchg( unsigned int bytes, struct x86_emulate_ctxt *ctxt) { - paddr_t old = 0, new = 0; + intpte_t new = 0; - if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes -1)) ) + if ( (bytes > sizeof(new)) || (bytes & (bytes -1)) ) { MEM_LOG("ptwr_emulate: bad cmpxchg size (addr=%lx, bytes=%u)", offset, bytes); return X86EMUL_UNHANDLEABLE; } - memcpy(&old, p_old, bytes); memcpy(&new, p_new, bytes); - return ptwr_emulated_update( - offset, old, new, bytes, 1, - container_of(ctxt, struct ptwr_emulate_ctxt, ctxt)); + return ptwr_emulated_update(offset, p_old, new, bytes, + container_of(ctxt, struct ptwr_emulate_ctxt, + ctxt)); } static int pv_emul_is_mem_write(const struct x86_emulate_state *state, --- a/xen/arch/x86/mm/shadow/common.c +++ b/xen/arch/x86/mm/shadow/common.c @@ -285,7 +285,7 @@ hvm_emulate_cmpxchg(enum x86_segment seg struct sh_emulate_ctxt *sh_ctxt = container_of(ctxt, struct sh_emulate_ctxt, ctxt); struct vcpu *v = current; - unsigned long addr, old, new; + unsigned long addr, new = 0; int rc; if ( bytes > sizeof(long) ) @@ -296,12 +296,10 @@ hvm_emulate_cmpxchg(enum x86_segment seg if ( rc ) return rc; - old = new = 0; - memcpy(&old, p_old, bytes); memcpy(&new, p_new, bytes); return v->arch.paging.mode->shadow.x86_emulate_cmpxchg( - v, addr, old, new, bytes, sh_ctxt); + v, addr, p_old, new, bytes, sh_ctxt); } static const struct x86_emulate_ops hvm_shadow_emulator_ops = { --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -4755,11 +4755,11 @@ sh_x86_emulate_write(struct vcpu *v, uns static int sh_x86_emulate_cmpxchg(struct vcpu *v, unsigned long vaddr, - unsigned long old, unsigned long new, - unsigned int bytes, struct sh_emulate_ctxt *sh_ctxt) + unsigned long *p_old, unsigned long new, + unsigned int bytes, struct sh_emulate_ctxt *sh_ctxt) { void *addr; - unsigned long prev; + unsigned long prev, old = *p_old; int rv = X86EMUL_OKAY; /* Unaligned writes are only acceptable on HVM */ @@ -4783,7 +4783,10 @@ sh_x86_emulate_cmpxchg(struct vcpu *v, u } if ( prev != old ) - rv = X86EMUL_RETRY; + { + *p_old = prev; + rv = X86EMUL_CMPXCHG_FAILED; + } SHADOW_DEBUG(EMULATE, "va %#lx was %#lx expected %#lx" " wanted %#lx now %#lx bytes %u\n", --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -1880,6 +1880,9 @@ protmode_load_seg( default: return rc; + + case X86EMUL_CMPXCHG_FAILED: + return X86EMUL_RETRY; } /* Force the Accessed flag in our local copy. */ @@ -6702,6 +6705,7 @@ x86_emulate( break; case X86EMUL_OPC(0x0f, 0xb0): case X86EMUL_OPC(0x0f, 0xb1): /* cmpxchg */ + fail_if(!ops->cmpxchg); /* Save real source value, then compare EAX against destination. */ src.orig_val = src.val; src.val = _regs.r(ax); @@ -6710,8 +6714,17 @@ x86_emulate( if ( _regs.eflags & X86_EFLAGS_ZF ) { /* Success: write back to memory. */ - dst.val = src.orig_val; + dst.val = src.val; + rc = ops->cmpxchg(dst.mem.seg, dst.mem.off, &dst.val, + &src.orig_val, dst.bytes, ctxt); + if ( rc == X86EMUL_CMPXCHG_FAILED ) + { + _regs.eflags &= ~X86_EFLAGS_ZF; + rc = X86EMUL_OKAY; + } } + if ( _regs.eflags & X86_EFLAGS_ZF ) + dst.type = OP_NONE; else { /* Failure: write the value we saw to EAX. */ @@ -7016,6 +7029,7 @@ x86_emulate( if ( memcmp(old, aux, op_bytes) ) { + cmpxchgNb_failed: /* Expected != actual: store actual to rDX:rAX and clear ZF. */ _regs.r(ax) = !(rex_prefix & REX_W) ? old->u32[0] : old->u64[0]; _regs.r(dx) = !(rex_prefix & REX_W) ? old->u32[1] : old->u64[1]; @@ -7025,7 +7039,7 @@ x86_emulate( { /* * Expected == actual: Get proposed value, attempt atomic cmpxchg - * and set ZF. + * and set ZF if successful. */ if ( !(rex_prefix & REX_W) ) { @@ -7038,10 +7052,20 @@ x86_emulate( aux->u64[1] = _regs.r(cx); } - if ( (rc = ops->cmpxchg(ea.mem.seg, ea.mem.off, old, aux, - op_bytes, ctxt)) != X86EMUL_OKAY ) + switch ( rc = ops->cmpxchg(ea.mem.seg, ea.mem.off, old, aux, + op_bytes, ctxt) ) + { + case X86EMUL_OKAY: + _regs.eflags |= X86_EFLAGS_ZF; + break; + + case X86EMUL_CMPXCHG_FAILED: + rc = X86EMUL_OKAY; + goto cmpxchgNb_failed; + + default: goto done; - _regs.eflags |= X86_EFLAGS_ZF; + } } break; } @@ -8049,6 +8073,8 @@ x86_emulate( rc = ops->cmpxchg( dst.mem.seg, dst.mem.off, &dst.orig_val, &dst.val, dst.bytes, ctxt); + if ( rc == X86EMUL_CMPXCHG_FAILED ) + rc = X86EMUL_RETRY; } else { --- a/xen/arch/x86/x86_emulate/x86_emulate.h +++ b/xen/arch/x86/x86_emulate/x86_emulate.h @@ -153,6 +153,8 @@ struct x86_emul_fpu_aux { #define X86EMUL_EXCEPTION 2 /* Retry the emulation for some reason. No state modified. */ #define X86EMUL_RETRY 3 + /* (cmpxchg accessor): CMPXCHG failed. */ +#define X86EMUL_CMPXCHG_FAILED 4 /* * Operation fully done by one of the hooks: * - validate(): operation completed (except common insn retire logic) @@ -160,7 +162,7 @@ struct x86_emul_fpu_aux { * - read_io() / write_io(): bypass GPR update (non-string insns only) * Undefined behavior when used anywhere else. */ -#define X86EMUL_DONE 4 +#define X86EMUL_DONE 5 /* FPU sub-types which may be requested via ->get_fpu(). */ enum x86_emulate_fpu_type { @@ -250,6 +252,8 @@ struct x86_emulate_ops /* * cmpxchg: Emulate an atomic (LOCKed) CMPXCHG operation. * @p_old: [IN ] Pointer to value expected to be current at @addr. + * [OUT] Pointer to value found at @addr (may always be + * updated, meaningful for X86EMUL_CMPXCHG_FAILED only). * @p_new: [IN ] Pointer to value to write to @addr. * @bytes: [IN ] Operation size (up to 8 (x86/32) or 16 (x86/64) bytes). */ --- a/xen/include/asm-x86/paging.h +++ b/xen/include/asm-x86/paging.h @@ -89,7 +89,7 @@ struct shadow_paging_mode { void *src, u32 bytes, struct sh_emulate_ctxt *sh_ctxt); int (*x86_emulate_cmpxchg )(struct vcpu *v, unsigned long va, - unsigned long old, + unsigned long *old, unsigned long new, unsigned int bytes, struct sh_emulate_ctxt *sh_ctxt);