From patchwork Tue Apr 5 01:57:28 2016
X-Patchwork-Submitter: Konrad Rzeszutek Wilk
X-Patchwork-Id: 8746411
Date: Mon, 4 Apr 2016 21:57:28 -0400
From: Konrad Rzeszutek Wilk
To: Jan Beulich
Cc: Keir Fraser, andrew.cooper3@citrix.com, mpohlack@amazon.de,
    ross.lagerwall@citrix.com, Julien Grall, Stefano Stabellini,
    xen-devel@lists.xenproject.org, Konrad Rzeszutek Wilk,
    sasha.levin@oracle.com
Subject: Re: [Xen-devel] [PATCH v5 10/28] xsplice: Implement payload loading
Message-ID: <20160405015728.GA8449@char.us.oracle.com>
In-Reply-To: <20160404194444.GA4474@char.us.oracle.com>
References: <1458849640-22588-1-git-send-email-konrad.wilk@oracle.com>
 <1458849640-22588-11-git-send-email-konrad.wilk@oracle.com>
 <56FD461602000078000E1C28@prv-mh.provo.novell.com>
 <20160331212604.GA24340@localhost.localdomain>
 <56FE591502000078000E1F2D@prv-mh.provo.novell.com>
 <20160404194444.GA4474@char.us.oracle.com>

On Mon, Apr 04, 2016 at 03:44:44PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Apr 01, 2016 at 03:18:45AM -0600, Jan Beulich wrote:
> > >>> On 31.03.16 at 23:26, wrote:
> > >> Also - how well will this O(n^2) lookup work once there are enough
> > >> payloads? I think this calls for the alternative vmap() extension
> > >> I've been suggesting earlier.
> > >
> > > Could you elaborate on the vmap extension a bit please?
> > >
> > > Your earlier email seems to say: drop the vmap API and just
> > > allocate the underlying pages yourself.
> >
> > Actually I had also said in that earlier mail: "If, otoh, you left that
> > VA management to (an extended version of) vmap(), by e.g.
> > allowing the caller to request allocation from a different VA range
> > (much like iirc x86-64 Linux handles its modules address range
> > allocation), things would be different. After all the VA
> > management is the important part here, while the backing
> > memory allocation is just a trivial auxiliary operation."
> >
> > I.e. elaboration here really just consists of the referral to the
> > respective Linux approach.
>
> I am in need of guidance here, I am afraid.
>
> Let me explain (I did this on IRC, but this will have a broader scope):
>
> In Linux we have 'struct vm_area', which internally contains the start
> and end address (amongst other things). Callers usually use
> __vmalloc_node_range to provide those addresses. Internally the vmalloc
> API allocates the 'struct vm_area' from the normal SLAB allocator. The
> vmalloc API also has a vmap block area (allocated within the vmalloc
> area), which is a red-black tree covering all users of the API. When
> the size of an allocation is looked up (the equivalent of our
> vm_size()), this tree is searched to find the 'vm_area' for the
> provided virtual address. There is a lot of code in this.
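
[For concreteness: x86-64 Linux names its module VA range roughly like
this in module_alloc() - reconstructed from memory of the 4.x sources,
so the exact GFP flags and the KASLR load offset are elided:

    void *module_alloc(unsigned long size)
    {
        /*
         * All of the VA management happens inside
         * __vmalloc_node_range(); the caller merely names the VA
         * window [MODULES_VADDR, MODULES_END) to carve from.
         */
        return __vmalloc_node_range(size, MODULE_ALIGN,
                                    MODULES_VADDR, MODULES_END,
                                    GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
                                    NUMA_NO_NODE,
                                    __builtin_return_address(0));
    }

This is the "caller supplies the range, vmalloc does the tracking"
model referred to above.]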
> Copying it wholesale and jamming it in does not look easy, so it would
> be better to take the concepts of this and implement them.
>
> On Xen we set up a bitmap that covers the full vmalloc area (128MB on
> my 4GB box, but it can go up to 64GB) - the 128MB vmalloc area requires
> about 32K bits.
>
> For every allocation we "waste" a page (and a bit) so that there is a
> gap. This gap is needed when trying to determine the size of the
> allocated region - when scanning the bitmap we can easily find the
> cleared bit, which is akin to a fencepost.
>
> To make Xen's vmalloc API generic I need to make it able to deal
> wholesale with virtual addresses that are not part of its space (as in,
> not in VMAP_VIRT_START to vm_top). To start with, vm_size() needs to be
> able to get the size of any virtual address (either one from the
> vmalloc area or one provided earlier via vmalloc_cb).
>
> One easy mechanism is to embed an array of simplified 'struct vm_area'
> structures:
>
>  struct vm_area {
>      unsigned long va;
>  };
>
> with an entry for every slot in the VMAP_VIRT_START area (that is, 32K
> entries). vm_size() and all the rest can check this array if the
> virtual address provided is not within the vmalloc virtual addresses.
> If there is a match we just need to consult vm_bitmap at the same index
> and figure out where the empty bit is set.
> The downside is that I have to walk the full array (32K entries).
>
> But when you think about it - most of the time we use normal vmalloc
> addresses, and only in exceptional cases do we need the alternate ones.
> And the only reason to keep track of them is to know their size.
>
> The easier way would be to track them via a linked list:
>
>  struct vm_area {
>      struct list_head list;
>      unsigned long va;
>      size_t nr;
>  };
>
> And vm_size, vm_index, etc. would consult this list for the virtual
> address and could get the proper information. (See the inline patch.)
>
> But if we are doing that, then why even put it in the vmalloc API? Why
> not track all of this in the user of the API (like it was done in v4 of
> this patch series)?
>
> Please advise.
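
[To make the fencepost scheme above concrete: Xen's existing vm_size()
in xen/common/vmap.c determines an allocation's size roughly as follows
(a simplified sketch, not the verbatim code):

    static unsigned int vm_size(const void *va)
    {
        unsigned int start = vm_index(va), end;

        if ( !start )      /* Not an address the vmap bitmap tracks. */
            return 0;

        /*
         * The guard page after each allocation keeps its bit clear,
         * so the first zero bit past 'start' marks the end of the
         * allocated region.
         */
        end = find_next_zero_bit(vm_bitmap, vm_top, start + 1);

        return min(end, vm_top) - start;
    }

The question being asked is how to answer the same query for addresses
outside VMAP_VIRT_START..vm_top, which the bitmap cannot describe.]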
I re-read your previous email and I think you were leaning towards not
even having a callback, but rather supplying the virtual address to the
vmalloc APIs and having them track it afterwards. Like this:

From 738ed247bf214a061c6822ad183c365a4f5731b9 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk
Date: Mon, 14 Mar 2016 12:02:05 -0400
Subject: [PATCH] vmap: Add vmalloc_range

For those users who want to supply their own virtual address at which
to allocate the underlying pages. The vmap API also keeps track of this
virtual address (along with the size) so that vunmap, vm_size, and
vm_free can operate on these virtual addresses.

This allows users (such as xSplice) to provide their own mechanism to
change the page flags, and also to use virtual addresses closer to the
hypervisor virtual addresses (at least on x86) while not having to deal
with the allocation of pages.

For an example of a user, see the patch titled "xsplice: Implement
payload loading".

Note that the displacement between the hypervisor virtual addresses and
the vmalloc area (on x86) is more than 32 bits - which means that ELF
relocations (which are limited to 32 bits) won't work, as the 33rd and
34th bits get truncated. Hence we cannot use vmalloc virtual addresses
and must supply our own ranges.

Signed-off-by: Konrad Rzeszutek Wilk
---
Cc: Ian Jackson
Cc: Jan Beulich
Cc: Keir Fraser
Cc: Tim Deegan

v4: New patch.
v5: Update per Jan's comments.
v6: Drop the stray parentheses on typedefs. Ditch the vunmap callback.
    Stash away the virtual addresses in lists. Ditch the vmap callback.
    Just provide the virtual address.
---
 xen/common/vmap.c      | 104 +++++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/vmap.h |  10 +++++
 2 files changed, 114 insertions(+)

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 134eda0..b63886b 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -19,6 +19,10 @@ static unsigned int __read_mostly vm_end;
 /* lowest known clear bit in the bitmap */
 static unsigned int vm_low;
 
+static LIST_HEAD(vm_area_list);
+
+static DEFINE_SPINLOCK(vm_area_lock);
+
 void __init vm_init(void)
 {
     unsigned int i, nr;
@@ -146,12 +150,34 @@ static unsigned int vm_index(const void *va)
            test_bit(idx, vm_bitmap) ? idx : 0;
 }
 
+static const struct vm_area *vm_find(const void *va)
+{
+    const struct vm_area *found = NULL, *vm;
+
+    spin_lock(&vm_area_lock);
+    list_for_each_entry( vm, &vm_area_list, list )
+    {
+        if ( vm->va != va )
+            continue;
+        found = vm;
+        break;
+    }
+    spin_unlock(&vm_area_lock);
+
+    return found;
+}
+
 static unsigned int vm_size(const void *va)
 {
     unsigned int start = vm_index(va), end;
 
     if ( !start )
+    {
+        const struct vm_area *vm = vm_find(va);
+        if ( vm )
+            return vm->pages;
         return 0;
+    }
 
     end = find_next_zero_bit(vm_bitmap, vm_top, start + 1);
 
@@ -164,6 +190,17 @@ void vm_free(const void *va)
 
     if ( !bit )
     {
+        struct vm_area *vm = (struct vm_area *)vm_find(va);
+
+        if ( vm )
+        {
+            spin_lock(&vm_area_lock);
+            list_del(&vm->list);
+            spin_unlock(&vm_area_lock);
+            xfree(vm->mfn);
+            xfree(vm);
+            return;
+        }
         WARN_ON(va != NULL);
         return;
     }
@@ -199,6 +236,23 @@ void *__vmap(const mfn_t *mfn, unsigned int granularity,
     return va;
 }
 
+static bool_t vmap_range(const mfn_t *mfn, unsigned long va, unsigned int nr)
+{
+    unsigned long cur = va;
+
+    for ( ; va && nr--; ++mfn, cur += PAGE_SIZE )
+    {
+        if ( map_pages_to_xen(cur, mfn_x(*mfn), 1, PAGE_HYPERVISOR) )
+        {
+            if ( cur != va )
+                destroy_xen_mappings(va, cur);
+            return 0;
+        }
+    }
+
+    return 1;
+}
+
 void *vmap(const mfn_t *mfn, unsigned int nr)
 {
     return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR);
@@ -216,6 +270,56 @@ void vunmap(const void *va)
     vm_free(va);
 }
 
+struct vm_area *vmalloc_range(size_t size, unsigned long start)
+{
+    mfn_t *mfn;
+    size_t pages, i;
+    struct page_info *pg;
+    struct vm_area *vm = NULL;
+
+    ASSERT(size);
+
+    pages = PFN_UP(size);
+    mfn = xmalloc_array(mfn_t, pages);
+    if ( mfn == NULL )
+        return NULL;
+
+    vm = xmalloc(struct vm_area);
+    if ( !vm )
+    {
+        xfree(mfn);
+        return NULL;
+    }
+    vm->mfn = mfn;
+
+    for ( i = 0; i < pages; i++ )
+    {
+        pg = alloc_domheap_page(NULL, 0);
+        if ( pg == NULL )
+            goto error;
+        mfn[i] = _mfn(page_to_mfn(pg));
+    }
+
+    if ( !vmap_range(mfn, start, pages) )
+        goto error;
+
+    vm->va = (void *)start;
+    vm->pages = pages;
+
+    spin_lock(&vm_area_lock);
+    list_add(&vm->list, &vm_area_list);
+    spin_unlock(&vm_area_lock);
+
+    return vm;
+
+ error:
+    while ( i-- )
+        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
+    xfree(vm->mfn);
+    xfree(vm);
+    return NULL;
+}
+
 void *vmalloc(size_t size)
 {
     mfn_t *mfn;
diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
index 5671ac8..4c9a350 100644
--- a/xen/include/xen/vmap.h
+++ b/xen/include/xen/vmap.h
@@ -12,6 +12,16 @@ void *__vmap(const mfn_t *mfn, unsigned int granularity,
              unsigned int nr, unsigned int align, unsigned int flags);
 void *vmap(const mfn_t *mfn, unsigned int nr);
 void vunmap(const void *);
 void *vmalloc(size_t size);
+
+struct vm_area {
+    struct list_head list;
+    mfn_t *mfn;
+    void *va;
+    unsigned int pages;
+};
+
+struct vm_area *vmalloc_range(size_t size, unsigned long start);
+
 void *vzalloc(size_t size);
 void vfree(void *va);
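
[For illustration, a hypothetical caller - e.g. the xsplice payload
loader - might use the new function along these lines. This is a sketch
only: 'payload', 'payload_size', and 'dest_va' are made-up names, and
'dest_va' stands for a VA picked from a module-style range near the
hypervisor image, which this patch does not itself provide:

    struct vm_area *vm;

    /* Allocate domheap pages and map them at the caller-chosen VA. */
    vm = vmalloc_range(payload_size, dest_va);
    if ( !vm )
        return -ENOMEM;

    memcpy(vm->va, payload, payload_size);

    /* ... apply ELF relocations, then tighten the page permissions ... */

    /* Teardown: vfree()/vunmap() can find the size via the new list. */
    vfree(vm->va);

Teardown works because vm_size() now consults the vm_area list for
addresses outside the vmap bitmap's range.]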