From patchwork Fri Jan 31 16:45:00 2020
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 11360281
From: Jan Beulich
To: "xen-devel@lists.xenproject.org"
Cc: Wei Liu, Paul Durrant, George Dunlap, Andrew Cooper, Tim Deegan,
 Roger Pau Monné
Date: Fri, 31 Jan 2020 17:45:00 +0100
Message-ID: <2c9ef202-a878-ab7b-5bfc-f9738d52d291@suse.com>
Subject: [Xen-devel] [PATCH v4 5/7] x86/mm: use cache in guest_walk_tables()

Emulation requiring device model assistance uses a form of instruction
re-execution, assuming that the second (and any further) pass takes
exactly the same path. This is a valid assumption as far as use of CPU
registers goes (as those can't change without any other instruction
executing in between), but is wrong for memory accesses. In particular
it has been observed that Windows might page out buffers underneath an
instruction currently under emulation (hitting between two passes). If
the first pass translated a linear address successfully, any subsequent
pass needs to do so too, yielding the exact same translation. To
guarantee this, leverage the caching that now backs HVM insn emulation.

Signed-off-by: Jan Beulich
---
v4: Adjust for cache now (elsewhere) being transparent to callers.
    Provide inline stubs for the !HVM case.
v2: Don't wrongly use top_gfn for non-root gpa calculation. Re-write
    cache entries after setting A/D bits (an alternative would be to
    suppress their setting upon cache hits).
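[Editorial aside: the hunks below apply one and the same read-through pattern at
every paging level -- compute the guest physical address of the entry, serve it
from the cache if an earlier pass already read it, otherwise read the live table
and record the value. The standalone sketch below only models that pattern; the
names (entry_cache, cache_read, cache_write, walk_level) are made up for
illustration and are not the real hvmemul_read_cache()/hvmemul_write_cache()
implementation.]

/* Standalone model of the per-level read-through pattern (illustrative
 * stand-ins only; not Xen's real hvmemul_{read,write}_cache() code). */
#include <stdbool.h>
#include <stdint.h>

#define CACHE_ENTS 8

struct entry_cache {
    unsigned int num_ents;
    struct { uint64_t gpa; uint64_t data; } ents[CACHE_ENTS];
};

/* Return true and fill *data if an earlier pass recorded gpa. */
static bool cache_read(const struct entry_cache *c, uint64_t gpa,
                       uint64_t *data)
{
    for ( unsigned int i = 0; i < c->num_ents; i++ )
        if ( c->ents[i].gpa == gpa )
        {
            *data = c->ents[i].data;
            return true;
        }

    return false;
}

/* Record (or update) the value read for gpa so later passes reuse it. */
static void cache_write(struct entry_cache *c, uint64_t gpa, uint64_t data)
{
    for ( unsigned int i = 0; i < c->num_ents; i++ )
        if ( c->ents[i].gpa == gpa )
        {
            c->ents[i].data = data;
            return;
        }

    if ( c->num_ents < CACHE_ENTS )
    {
        c->ents[c->num_ents].gpa = gpa;
        c->ents[c->num_ents].data = data;
        c->num_ents++;
    }
}

/*
 * One walk step: serve the entry from the cache if a previous pass saw it,
 * otherwise read the live table and remember the value, so a re-executed
 * walk yields the exact same translation.
 */
static uint64_t walk_level(struct entry_cache *c, const uint64_t *table,
                           unsigned int idx, uint64_t table_gpa)
{
    uint64_t gpa = table_gpa + idx * sizeof(*table), e;

    if ( !cache_read(c, gpa, &e) )
    {
        e = table[idx];
        cache_write(c, gpa, e);
    }

    return e;
}

[With this, a page table entry paged out between two passes no longer changes
the walk: the second pass sees the value recorded by the first.]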
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -2918,7 +2918,7 @@ bool hvmemul_read_cache(const struct vcp
     unsigned int i;
 
     /* Cache unavailable? */
-    if ( cache->num_ents > cache->max_ents )
+    if ( !is_hvm_vcpu(v) || cache->num_ents > cache->max_ents )
         return false;
 
     while ( size > sizeof(cache->ents->data) )
@@ -2950,7 +2950,7 @@ void hvmemul_write_cache(const struct vc
     unsigned int i;
 
     /* Cache unavailable? */
-    if ( cache->num_ents > cache->max_ents )
+    if ( !is_hvm_vcpu(v) || cache->num_ents > cache->max_ents )
         return;
 
     while ( size > sizeof(cache->ents->data) )
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -31,6 +31,7 @@ asm(".file \"" __OBJECT_FILE__ "\"");
 #include 
 #include 
 #include 
+#include <asm/hvm/emulate.h>
 
 /*
  * Modify a guest pagetable entry to set the Accessed and Dirty bits.
@@ -80,9 +81,9 @@ static bool set_ad_bits(guest_intpte_t *
  * requested walk, to see whether the access is permitted.
  */
 bool
-guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
-                  unsigned long va, walk_t *gw,
-                  uint32_t walk, mfn_t top_mfn, void *top_map)
+guest_walk_tables(const struct vcpu *v, struct p2m_domain *p2m,
+                  unsigned long va, walk_t *gw, uint32_t walk,
+                  gfn_t top_gfn, mfn_t top_mfn, void *top_map)
 {
     struct domain *d = v->domain;
     p2m_type_t p2mt;
@@ -91,8 +92,13 @@ guest_walk_tables(struct vcpu *v, struct
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
     guest_l3e_t *l3p = NULL;
     guest_l4e_t *l4p;
+    paddr_t l4gpa;
+#endif
+#if GUEST_PAGING_LEVELS >= 3 /* PAE or 64... */
+    paddr_t l3gpa;
 #endif
     uint32_t gflags, rc;
+    paddr_t l1gpa = 0, l2gpa = 0;
     unsigned int leaf_level;
     p2m_query_t qt = P2M_ALLOC | P2M_UNSHARE;
 
@@ -133,7 +139,13 @@ guest_walk_tables(struct vcpu *v, struct
     /* Get the l4e from the top level table and check its flags*/
     gw->l4mfn = top_mfn;
     l4p = (guest_l4e_t *) top_map;
-    gw->l4e = l4p[guest_l4_table_offset(va)];
+    l4gpa = gfn_to_gaddr(top_gfn) +
+            guest_l4_table_offset(va) * sizeof(gw->l4e);
+    if ( !hvmemul_read_cache(v, l4gpa, &gw->l4e, sizeof(gw->l4e)) )
+    {
+        gw->l4e = l4p[guest_l4_table_offset(va)];
+        hvmemul_write_cache(v, l4gpa, &gw->l4e, sizeof(gw->l4e));
+    }
     gflags = guest_l4e_get_flags(gw->l4e);
     if ( !(gflags & _PAGE_PRESENT) )
         goto out;
@@ -163,7 +175,13 @@ guest_walk_tables(struct vcpu *v, struct
     }
 
     /* Get the l3e and check its flags*/
-    gw->l3e = l3p[guest_l3_table_offset(va)];
+    l3gpa = gfn_to_gaddr(guest_l4e_get_gfn(gw->l4e)) +
+            guest_l3_table_offset(va) * sizeof(gw->l3e);
+    if ( !hvmemul_read_cache(v, l3gpa, &gw->l3e, sizeof(gw->l3e)) )
+    {
+        gw->l3e = l3p[guest_l3_table_offset(va)];
+        hvmemul_write_cache(v, l3gpa, &gw->l3e, sizeof(gw->l3e));
+    }
     gflags = guest_l3e_get_flags(gw->l3e);
     if ( !(gflags & _PAGE_PRESENT) )
         goto out;
@@ -215,7 +233,14 @@ guest_walk_tables(struct vcpu *v, struct
 #else /* PAE only... */
 
     /* Get the l3e and check its flag */
-    gw->l3e = ((guest_l3e_t *) top_map)[guest_l3_table_offset(va)];
+    l3gpa = gfn_to_gaddr(top_gfn) + ((unsigned long)top_map & ~PAGE_MASK) +
+            guest_l3_table_offset(va) * sizeof(gw->l3e);
+    if ( !hvmemul_read_cache(v, l3gpa, &gw->l3e, sizeof(gw->l3e)) )
+    {
+        gw->l3e = ((guest_l3e_t *)top_map)[guest_l3_table_offset(va)];
+        hvmemul_write_cache(v, l3gpa, &gw->l3e, sizeof(gw->l3e));
+    }
+
     gflags = guest_l3e_get_flags(gw->l3e);
     if ( !(gflags & _PAGE_PRESENT) )
         goto out;
@@ -241,18 +266,24 @@ guest_walk_tables(struct vcpu *v, struct
         goto out;
     }
 
-    /* Get the l2e */
-    gw->l2e = l2p[guest_l2_table_offset(va)];
+    l2gpa = gfn_to_gaddr(guest_l3e_get_gfn(gw->l3e));
 
 #else /* 32-bit only... */
 
-    /* Get l2e from the top level table */
     gw->l2mfn = top_mfn;
     l2p = (guest_l2e_t *) top_map;
-    gw->l2e = l2p[guest_l2_table_offset(va)];
+    l2gpa = gfn_to_gaddr(top_gfn);
 
 #endif /* All levels... */
 
+    /* Get the l2e */
+    l2gpa += guest_l2_table_offset(va) * sizeof(gw->l2e);
+    if ( !hvmemul_read_cache(v, l2gpa, &gw->l2e, sizeof(gw->l2e)) )
+    {
+        gw->l2e = l2p[guest_l2_table_offset(va)];
+        hvmemul_write_cache(v, l2gpa, &gw->l2e, sizeof(gw->l2e));
+    }
+
     /* Check the l2e flags. */
     gflags = guest_l2e_get_flags(gw->l2e);
     if ( !(gflags & _PAGE_PRESENT) )
@@ -334,7 +365,15 @@ guest_walk_tables(struct vcpu *v, struct
         gw->pfec |= rc & PFEC_synth_mask;
         goto out;
     }
-    gw->l1e = l1p[guest_l1_table_offset(va)];
+
+    l1gpa = gfn_to_gaddr(guest_l2e_get_gfn(gw->l2e)) +
+            guest_l1_table_offset(va) * sizeof(gw->l1e);
+    if ( !hvmemul_read_cache(v, l1gpa, &gw->l1e, sizeof(gw->l1e)) )
+    {
+        gw->l1e = l1p[guest_l1_table_offset(va)];
+        hvmemul_write_cache(v, l1gpa, &gw->l1e, sizeof(gw->l1e));
+    }
+
     gflags = guest_l1e_get_flags(gw->l1e);
     if ( !(gflags & _PAGE_PRESENT) )
         goto out;
@@ -445,22 +484,34 @@ guest_walk_tables(struct vcpu *v, struct
     case 1:
         if ( set_ad_bits(&l1p[guest_l1_table_offset(va)].l1, &gw->l1e.l1,
                          (walk & PFEC_write_access)) )
+        {
             paging_mark_dirty(d, gw->l1mfn);
+            hvmemul_write_cache(v, l1gpa, &gw->l1e, sizeof(gw->l1e));
+        }
         /* Fallthrough */
     case 2:
         if ( set_ad_bits(&l2p[guest_l2_table_offset(va)].l2, &gw->l2e.l2,
                          (walk & PFEC_write_access) && leaf_level == 2) )
+        {
             paging_mark_dirty(d, gw->l2mfn);
+            hvmemul_write_cache(v, l2gpa, &gw->l2e, sizeof(gw->l2e));
+        }
         /* Fallthrough */
 #if GUEST_PAGING_LEVELS == 4 /* 64-bit only... */
     case 3:
         if ( set_ad_bits(&l3p[guest_l3_table_offset(va)].l3, &gw->l3e.l3,
                          (walk & PFEC_write_access) && leaf_level == 3) )
+        {
             paging_mark_dirty(d, gw->l3mfn);
+            hvmemul_write_cache(v, l3gpa, &gw->l3e, sizeof(gw->l3e));
+        }
 
         if ( set_ad_bits(&l4p[guest_l4_table_offset(va)].l4, &gw->l4e.l4,
                          false) )
+        {
             paging_mark_dirty(d, gw->l4mfn);
+            hvmemul_write_cache(v, l4gpa, &gw->l4e, sizeof(gw->l4e));
+        }
 #endif
     }
 
--- a/xen/arch/x86/mm/hap/guest_walk.c
+++ b/xen/arch/x86/mm/hap/guest_walk.c
@@ -91,7 +91,8 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PA
 #if GUEST_PAGING_LEVELS == 3
     top_map += (cr3 & ~(PAGE_MASK | 31));
 #endif
-    walk_ok = guest_walk_tables(v, p2m, ga, &gw, *pfec, top_mfn, top_map);
+    walk_ok = guest_walk_tables(v, p2m, ga, &gw, *pfec,
+                                top_gfn, top_mfn, top_map);
     unmap_domain_page(top_map);
     put_page(top_page);
 
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -175,9 +175,13 @@ static inline bool
 sh_walk_guest_tables(struct vcpu *v, unsigned long va, walk_t *gw,
                      uint32_t pfec)
 {
+    gfn_t root_gfn = _gfn(paging_mode_external(v->domain)
+                          ? cr3_pa(v->arch.hvm.guest_cr[3]) >> PAGE_SHIFT
+                          : pagetable_get_pfn(v->arch.guest_table));
+
 #if GUEST_PAGING_LEVELS == 3 /* PAE */
     return guest_walk_tables(v, p2m_get_hostp2m(v->domain), va, gw, pfec,
-                             INVALID_MFN, v->arch.paging.shadow.gl3e);
+                             root_gfn, INVALID_MFN, v->arch.paging.shadow.gl3e);
 #else /* 32 or 64 */
     const struct domain *d = v->domain;
     mfn_t root_mfn = (v->arch.flags & TF_kernel_mode
@@ -185,7 +189,7 @@ sh_walk_guest_tables(struct vcpu *v, uns
                      : pagetable_get_mfn(v->arch.guest_table_user));
     void *root_map = map_domain_page(root_mfn);
     bool ok = guest_walk_tables(v, p2m_get_hostp2m(d), va, gw, pfec,
-                                root_mfn, root_map);
+                                root_gfn, root_mfn, root_map);
 
     unmap_domain_page(root_map);
 
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -428,8 +428,9 @@ static inline unsigned int guest_walk_to
 #define guest_walk_tables GPT_RENAME(guest_walk_tables, GUEST_PAGING_LEVELS)
 
 bool
-guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m, unsigned long va,
-                  walk_t *gw, uint32_t pfec, mfn_t top_mfn, void *top_map);
+guest_walk_tables(const struct vcpu *v, struct p2m_domain *p2m,
+                  unsigned long va, walk_t *gw, uint32_t pfec,
+                  gfn_t top_gfn, mfn_t top_mfn, void *top_map);
 
 /* Pretty-print the contents of a guest-walk */
 static inline void print_gw(const walk_t *gw)
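[Editorial aside: the v4 note above mentions inline stubs for the !HVM case,
which are not part of the hunks shown here. A guess at their shape, with the
signatures taken from the call sites above -- the CONFIG_HVM guard and exact
placement are assumptions:]

/* Guessed shape of the !HVM stubs referred to in the v4 note (assumption). */
#ifndef CONFIG_HVM

static inline bool hvmemul_read_cache(const struct vcpu *v, paddr_t gpa,
                                      void *buffer, unsigned int size)
{
    return false; /* never a hit, so callers always read the live tables */
}

static inline void hvmemul_write_cache(const struct vcpu *v, paddr_t gpa,
                                       const void *buffer, unsigned int size)
{
    /* nothing to record without HVM emulation */
}

#endif /* !CONFIG_HVM */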