From patchwork Wed Apr 26 15:43:33 2017
X-Patchwork-Submitter: Olaf Hering
X-Patchwork-Id: 9701595
Date: Wed, 26 Apr 2017 17:43:33 +0200
From: Olaf Hering
To: Andrew Cooper, Jan Beulich
Cc: xen-devel@lists.xen.org
Message-ID: <20170426154333.GA21598@aepfle.de>
References: <20170420153523.GG4645@aepfle.de>
 <104ce757-2105-774e-f421-5ee07f5acb0b@citrix.com>
 <20170420160401.GH4645@aepfle.de>
 <58F8FE8D0200007800152928@prv-mh.provo.novell.com>
In-Reply-To: <58F8FE8D0200007800152928@prv-mh.provo.novell.com>
Subject: Re: [Xen-devel] superpages lost after migration of HVM domU
List-Id: Xen developer discussion

On Thu, Apr 20, Jan Beulich wrote:

> >>> On 20.04.17 at 18:04, wrote:
> > On Thu, Apr 20, Andrew Cooper wrote:
> >
> >> As it currently stands, the sending side iterates from 0 to p2m_size,
> >> and sends every frame on the first pass. This means we get PAGE_DATA
> >> records linearly, in batches of 1024, or two aligned 2M superpages.
> >
> > Is there a way to preserve 1G pages? This 380G domU I'm looking at is
> > built with 4k:461390 2M:2341 1G:365 pages.
>
> I think we've hashed out a possible way to deal with this, by
> speculatively allocating 1G pages as long as the allocation cap for
> the domain allows, subsequently punching holes into those pages
> if we can't allocate any new pages anymore (due to otherwise
> overrunning the cap).

The result is not pretty. This HVM-only approach appears to work for a
domU with "memory=3024" and localhost migration.
Holes have to be punched as soon as possible to avoid errors in
xenforeignmemory_map due to "Over-allocation". It would be nice if the
receiver got a memory map upfront to avoid all these stunts...

Olaf

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..36e7891dde 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -107,6 +107,9 @@ struct xc_sr_save_ops
  */
 struct xc_sr_restore_ops
 {
+    /* Allocate a MFN for the given PFN */
+    int (*allocate_pfn)(struct xc_sr_context *ctx, xen_pfn_t pfn);
+
     /* Convert a PFN to GFN. May return ~0UL for an invalid mapping. */
     xen_pfn_t (*pfn_to_gfn)(const struct xc_sr_context *ctx, xen_pfn_t pfn);
 
@@ -331,6 +334,14 @@ struct xc_sr_context
             /* HVM context blob. */
             void *context;
             size_t contextsz;
+
+            /* Bitmap of currently allocated PFNs during restore. */
+            xen_pfn_t *sp_extents;
+            unsigned long *attempted_1g;
+            unsigned long *attempted_2m;
+            unsigned long *allocated_pfns;
+            xen_pfn_t max_allocated_pfn;
+            unsigned long alloc_cnt;
         } restore;
     };
 } x86_hvm;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 3549f0a1ae..2e8d15307f 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -135,6 +135,7 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned count,
                   const xen_pfn_t *original_pfns, const uint32_t *types)
 {
     xc_interface *xch = ctx->xch;
+    xen_pfn_t min_pfn = original_pfns[0], max_pfn = original_pfns[0];
     xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
         *pfns = malloc(count * sizeof(*pfns));
     unsigned i, nr_pfns = 0;
@@ -149,11 +150,18 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned count,
 
     for ( i = 0; i < count; ++i )
     {
+        if (original_pfns[i] < min_pfn)
+            min_pfn = original_pfns[i];
+        if (original_pfns[i] > max_pfn)
+            max_pfn = original_pfns[i];
         if ( (!types ||
               (types && (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
                          types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
              !pfn_is_populated(ctx, original_pfns[i]) )
         {
+            rc = ctx->restore.ops.allocate_pfn(ctx, original_pfns[i]);
+            if ( rc )
+                goto err;
             rc = pfn_set_populated(ctx, original_pfns[i]);
             if ( rc )
                 goto err;
@@ -161,6 +169,16 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned count,
             ++nr_pfns;
         }
     }
+    IPRINTF("checking range %lx %lx\n", min_pfn, max_pfn);
+    while (min_pfn < max_pfn) {
+        if (!pfn_is_populated(ctx, min_pfn) &&
+            test_and_clear_bit(min_pfn, ctx->x86_hvm.restore.allocated_pfns)) {
+            xen_pfn_t pfn = min_pfn;
+            rc = xc_domain_decrease_reservation_exact(xch, ctx->domid, 1, 0, &pfn);
+            IPRINTF("free %lx %lx %d\n", min_pfn, pfn, rc);
+        }
+        min_pfn++;
+    }
+    nr_pfns = 0;
 
     if ( nr_pfns )
     {
@@ -723,6 +741,10 @@ static void cleanup(struct xc_sr_context *ctx)
                    NRPAGES(bitmap_size(ctx->restore.p2m_size)));
     free(ctx->restore.buffered_records);
     free(ctx->restore.populated_pfns);
+    free(ctx->x86_hvm.restore.sp_extents);
+    free(ctx->x86_hvm.restore.attempted_1g);
+    free(ctx->x86_hvm.restore.attempted_2m);
+    free(ctx->x86_hvm.restore.allocated_pfns);
     if ( ctx->restore.ops.cleanup(ctx) )
         PERROR("Failed to clean up");
 }
@@ -810,6 +832,17 @@ static int restore(struct xc_sr_context *ctx)
     saved_errno = errno;
     saved_rc = rc;
     PERROR("Restore failed");
+    {
+        unsigned long i;
+        bool a, p;
+        IPRINTF("alloc_cnt %lu\n", ctx->x86_hvm.restore.alloc_cnt);
+        for (i = 0; i < ctx->restore.p2m_size; i++) {
+            p = test_bit(i, ctx->restore.populated_pfns);
+            a = test_bit(i, ctx->x86_hvm.restore.allocated_pfns);
+            if (p != a)
+                IPRINTF("%lx a %x p %x\n", i, a, p);
+        }
+    }
 
  done:
     cleanup(ctx);
@@ -888,6 +921,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     }
 
     ctx.restore.p2m_size = nr_pfns;
+    IPRINTF("p2m_size %lx\n", ctx.restore.p2m_size);
 
     if ( ctx.dominfo.hvm )
     {
diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c
index 1dca85354a..fc441d2a6d 100644
--- a/tools/libxc/xc_sr_restore_x86_hvm.c
+++ b/tools/libxc/xc_sr_restore_x86_hvm.c
@@ -3,6 +3,10 @@
 
 #include "xc_sr_common_x86.h"
 
+#define SUPERPAGE_2MB_SHIFT   9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT   18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 /*
  * Process an HVM_CONTEXT record from the stream.
  */
@@ -149,6 +153,20 @@ static int x86_hvm_setup(struct xc_sr_context *ctx)
         return -1;
     }
 
+    ctx->x86_hvm.restore.sp_extents =
+        calloc(1UL << SUPERPAGE_1GB_SHIFT,
+               sizeof(*ctx->x86_hvm.restore.sp_extents));
+    ctx->x86_hvm.restore.attempted_1g =
+        bitmap_alloc((ctx->restore.p2m_size >> SUPERPAGE_1GB_SHIFT) + 1);
+    ctx->x86_hvm.restore.attempted_2m =
+        bitmap_alloc((ctx->restore.p2m_size >> SUPERPAGE_2MB_SHIFT) + 1);
+    ctx->x86_hvm.restore.max_allocated_pfn = ctx->restore.p2m_size;
+    ctx->x86_hvm.restore.allocated_pfns =
+        bitmap_alloc(ctx->x86_hvm.restore.max_allocated_pfn + 1);
+    if (!ctx->x86_hvm.restore.sp_extents ||
+        !ctx->x86_hvm.restore.allocated_pfns ||
+        !ctx->x86_hvm.restore.attempted_2m ||
+        !ctx->x86_hvm.restore.attempted_1g)
+    {
+        ERROR("Unable to allocate memory for allocated_pfns bitmaps");
+        return -1;
+    }
+    /* No superpage in 1st 2MB due to VGA hole */
+    set_bit(0, ctx->x86_hvm.restore.attempted_1g);
+    set_bit(0, ctx->x86_hvm.restore.attempted_2m);
+
     return 0;
 }
 
@@ -228,8 +246,139 @@ static int x86_hvm_cleanup(struct xc_sr_context *ctx)
     return 0;
 }
 
+static bool pfn_is_allocated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    if ( pfn > ctx->x86_hvm.restore.max_allocated_pfn )
+        return false;
+    return test_bit(pfn, ctx->x86_hvm.restore.allocated_pfns);
+}
+
+/*
+ * Set a pfn as allocated, expanding the tracking structures if needed. To
+ * avoid realloc()ing too excessively, the size increased to the nearest power
+ * of two large enough to contain the required pfn.
+ */
+static int pfn_set_allocated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    xc_interface *xch = ctx->xch;
+
+    if ( pfn > ctx->x86_hvm.restore.max_allocated_pfn )
+    {
+        xen_pfn_t new_max;
+        size_t old_sz, new_sz;
+        unsigned long *p;
+
+        /* Round up to the nearest power of two larger than pfn, less 1. */
+        new_max = pfn;
+        new_max |= new_max >> 1;
+        new_max |= new_max >> 2;
+        new_max |= new_max >> 4;
+        new_max |= new_max >> 8;
+        new_max |= new_max >> 16;
+#ifdef __x86_64__
+        new_max |= new_max >> 32;
+#endif
+
+        old_sz = bitmap_size(ctx->x86_hvm.restore.max_allocated_pfn + 1);
+        new_sz = bitmap_size(new_max + 1);
+        p = realloc(ctx->x86_hvm.restore.allocated_pfns, new_sz);
+        if ( !p )
+        {
+            ERROR("Failed to realloc allocated bitmap");
+            errno = ENOMEM;
+            return -1;
+        }
+
+        memset((uint8_t *)p + old_sz, 0x00, new_sz - old_sz);
+
+        ctx->x86_hvm.restore.allocated_pfns = p;
+        ctx->x86_hvm.restore.max_allocated_pfn = new_max;
+    }
+
+    assert(!test_bit(pfn, ctx->x86_hvm.restore.allocated_pfns));
+    set_bit(pfn, ctx->x86_hvm.restore.allocated_pfns);
+
+    return 0;
+}
+
+static int x86_hvm_allocate_pfn(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    xc_interface *xch = ctx->xch;
+    bool success = false;
+    int rc = -1;
+    long done;
+    unsigned long i, nr_extents;
+    unsigned long stat_1g = 0, stat_2m = 0, stat_4k = 0;
+    unsigned long idx_1g, idx_2m;
+    unsigned long count;
+    xen_pfn_t base_pfn = 0, *sp_extents = ctx->x86_hvm.restore.sp_extents;
+
+    IPRINTF("pfn %lx\n", (long)pfn);
+    if (pfn_is_allocated(ctx, pfn))
+        return 0;
+
+    idx_1g = pfn >> SUPERPAGE_1GB_SHIFT;
+    idx_2m = pfn >> SUPERPAGE_2MB_SHIFT;
+    IPRINTF("idx_1g %lu idx_2m %lu\n", idx_1g, idx_2m);
+    if (!test_and_set_bit(idx_1g, ctx->x86_hvm.restore.attempted_1g)) {
+        count = 1UL << SUPERPAGE_1GB_SHIFT;
+        base_pfn = (pfn >> SUPERPAGE_1GB_SHIFT) << SUPERPAGE_1GB_SHIFT;
+        nr_extents = count >> SUPERPAGE_1GB_SHIFT;
+        IPRINTF("base_pfn %lx count %lu nr_extents %lu\n", (long)base_pfn, count, nr_extents);
+        for ( i = 0; i < nr_extents; i++ )
+            sp_extents[i] = base_pfn + (i << SUPERPAGE_1GB_SHIFT);
+        done = xc_domain_populate_physmap(xch, ctx->domid, nr_extents,
+                                          SUPERPAGE_1GB_SHIFT, 0, sp_extents);
+        IPRINTF("1G %lu -> %ld\n", nr_extents, done);
+        if (done > 0) {
+            success = true;
+            ctx->x86_hvm.restore.alloc_cnt += count;
+            stat_1g = done;
+            for (i = 0; i < (count >> SUPERPAGE_2MB_SHIFT); i++)
+                set_bit((base_pfn >> SUPERPAGE_2MB_SHIFT) + i,
                        ctx->x86_hvm.restore.attempted_2m);
+        }
+    }
+
+    if (!test_and_set_bit(idx_2m, ctx->x86_hvm.restore.attempted_2m)) {
+        count = 1UL << SUPERPAGE_2MB_SHIFT;
+        base_pfn = (pfn >> SUPERPAGE_2MB_SHIFT) << SUPERPAGE_2MB_SHIFT;
+        nr_extents = count >> SUPERPAGE_2MB_SHIFT;
+        IPRINTF("base_pfn %lx count %lu nr_extents %lu\n", (long)base_pfn, count, nr_extents);
+        for ( i = 0; i < nr_extents; i++ )
+            sp_extents[i] = base_pfn + (i << SUPERPAGE_2MB_SHIFT);
+        done = xc_domain_populate_physmap(xch, ctx->domid, nr_extents,
+                                          SUPERPAGE_2MB_SHIFT, 0, sp_extents);
+        IPRINTF("2M %lu -> %ld\n", nr_extents, done);
+        if (done > 0) {
+            success = true;
+            ctx->x86_hvm.restore.alloc_cnt += count;
+            stat_2m = done;
+        }
+    }
+    if (success == false) {
+        count = 1;
+        sp_extents[0] = base_pfn = pfn;
+        done = xc_domain_populate_physmap(xch, ctx->domid, count, 0, 0, sp_extents);
+        if (done > 0) {
+            success = true;
+            ctx->x86_hvm.restore.alloc_cnt += count;
+            stat_4k = count;
+        }
+    }
+    IPRINTF("count %lu\n", count);
+    IPRINTF("1G %lu 2M %lu 4k %lu\n", stat_1g, stat_2m, stat_4k);
+    if (success == true) {
+        do {
+            count--;
+            rc = pfn_set_allocated(ctx, base_pfn + count);
+            if (rc)
+                break;
+        } while (count);
+    }
+    return rc;
+}
+
 struct xc_sr_restore_ops restore_ops_x86_hvm =
 {
+    .allocate_pfn = x86_hvm_allocate_pfn,
     .pfn_is_valid = x86_hvm_pfn_is_valid,
     .pfn_to_gfn = x86_hvm_pfn_to_gfn,
     .set_gfn = x86_hvm_set_gfn,
diff --git a/tools/libxc/xc_sr_restore_x86_pv.c b/tools/libxc/xc_sr_restore_x86_pv.c
index 50e25c162c..c426f14c73 100644
--- a/tools/libxc/xc_sr_restore_x86_pv.c
+++ b/tools/libxc/xc_sr_restore_x86_pv.c
@@ -1152,8 +1152,15 @@ static int x86_pv_cleanup(struct xc_sr_context *ctx)
     return 0;
 }
 
+static int x86_pv_allocate_pfn(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    errno = ENOMEM;
+    return -1;
+}
+
 struct xc_sr_restore_ops restore_ops_x86_pv =
 {
+    .allocate_pfn = x86_pv_allocate_pfn,
     .pfn_is_valid = x86_pv_pfn_is_valid,
     .pfn_to_gfn = pfn_to_mfn,
     .set_page_type = x86_pv_set_page_type,