From patchwork Thu Jan 5 07:24:02 2017
From: Jitendra Kolhe <jitendra.kolhe@hpe.com>
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com, peter.maydell@linaro.org, armbru@redhat.com,
    renganathan.meenakshisundaram@hpe.com, mohan_parthasarathy@hpe.com,
    pbonzini@redhat.com, jitendra.kolhe@hpe.com
Date: Thu, 5 Jan 2017 12:54:02 +0530
Message-Id: <1483601042-6435-1-git-send-email-jitendra.kolhe@hpe.com>
Subject: [Qemu-devel] [PATCH RFC] mem-prealloc: Reduce large guest start-up and migration time.

Using the "-mem-prealloc" option for a very large guest leads to huge
guest start-up and migration times. This is because with "-mem-prealloc"
QEMU tries to map every guest page up front (creating the address
translations) to make sure the pages are available at runtime. By
default, virsh/libvirt appears to use the "-mem-prealloc" option
whenever the guest is configured to use huge pages. This patch instead
maps all guest pages simultaneously by spawning multiple threads; a
standalone sketch of the idea follows.
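To illustrate the idea outside of QEMU, here is a minimal, hypothetical
standalone sketch (plain pthreads and normal pages rather than QEMU's
thread wrappers and huge pages; all names here are made up for
illustration, not taken from the patch):

    /* Hypothetical sketch only -- not the patch itself. Fault in an
     * anonymous mapping by touching one byte per page from NTHREADS
     * pthreads, each covering a contiguous slice of the region. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define NTHREADS 8

    typedef struct {
        char *addr;        /* first byte of this thread's slice */
        size_t numpages;   /* number of pages in this slice */
        size_t pagesize;
    } Slice;

    static void *touch_slice(void *arg)
    {
        Slice *s = arg;
        size_t i;

        for (i = 0; i < s->numpages; i++) {
            /* writing a single byte forces the kernel to back the page */
            s->addr[i * s->pagesize] = 0;
        }
        return NULL;
    }

    int main(void)
    {
        size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
        size_t numpages = 4096;                 /* 16MB with 4K pages */
        size_t per_thread = numpages / NTHREADS;
        pthread_t tid[NTHREADS];
        Slice slice[NTHREADS];
        int i;

        char *area = mmap(NULL, numpages * pagesize,
                          PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (area == MAP_FAILED) {
            perror("mmap");
            return EXIT_FAILURE;
        }

        for (i = 0; i < NTHREADS; i++) {
            slice[i].addr = area + (size_t)i * per_thread * pagesize;
            /* the last thread also picks up any remainder pages */
            slice[i].numpages = (i == NTHREADS - 1)
                                ? numpages - (size_t)i * per_thread
                                : per_thread;
            slice[i].pagesize = pagesize;
            pthread_create(&tid[i], NULL, touch_slice, &slice[i]);
        }
        for (i = 0; i < NTHREADS; i++) {
            pthread_join(tid[i], NULL);
        }
        printf("touched %zu pages of %zu bytes\n", numpages, pagesize);
        return EXIT_SUCCESS;
    }

The patch itself uses the same split-join structure, but keyed on the
guest memory size and huge page size as below.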
Given the problem is more prominent for large guests, the patch limits
the change to guests with at least 64GB of memory. Currently the change
is limited to the QEMU utility functions on POSIX-compliant hosts only,
as we are not sure whether the problem exists on win32. Below are some
stats with the "-mem-prealloc" option for guests configured to use huge
pages.

------------------------------------------------------------------------
Idle Guest      | Start-up time | Migration time
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - single threaded (existing code)
------------------------------------------------------------------------
64 Core - 4TB   | 54m11.796s    | 75m43.843s
64 Core - 1TB   |  8m56.576s    | 14m29.049s
64 Core - 256GB |  2m11.245s    |  3m26.598s
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 8 threads
------------------------------------------------------------------------
64 Core - 4TB   |  5m1.027s     | 34m10.565s
64 Core - 1TB   |  1m10.366s    |  8m28.188s
64 Core - 256GB |  0m19.040s    |  2m10.148s
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 16 threads
------------------------------------------------------------------------
64 Core - 4TB   |  1m58.970s    | 31m43.400s
64 Core - 1TB   |  0m39.885s    |  7m55.289s
64 Core - 256GB |  0m11.960s    |  2m0.135s
------------------------------------------------------------------------

Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com>
---
 util/oslib-posix.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 61 insertions(+), 3 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index f631464..a8bd7c2 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -55,6 +55,13 @@
 #include "qemu/error-report.h"
 #endif
 
+#define PAGE_TOUCH_THREAD_COUNT 8
+typedef struct {
+    char *addr;
+    uint64_t numpages;
+    uint64_t hpagesize;
+} PageRange;
+
 int qemu_get_thread_id(void)
 {
 #if defined(__linux__)
@@ -323,6 +330,52 @@ static void sigbus_handler(int signal)
     siglongjmp(sigjump, 1);
 }
 
+static void *do_touch_pages(void *arg)
+{
+    PageRange *range = (PageRange *)arg;
+    char *start_addr = range->addr;
+    uint64_t numpages = range->numpages;
+    uint64_t hpagesize = range->hpagesize;
+    uint64_t i = 0;
+
+    for (i = 0; i < numpages; i++) {
+        memset(start_addr + (hpagesize * i), 0, 1);
+    }
+    qemu_thread_exit(NULL);
+
+    return NULL;
+}
+
+static int touch_all_pages(char *area, size_t hpagesize, size_t numpages)
+{
+    QemuThread page_threads[PAGE_TOUCH_THREAD_COUNT];
+    PageRange page_range[PAGE_TOUCH_THREAD_COUNT];
+    uint64_t numpage_per_thread, size_per_thread;
+    int i = 0, tcount = 0;
+
+    numpage_per_thread = (numpages / PAGE_TOUCH_THREAD_COUNT);
+    size_per_thread = (hpagesize * numpage_per_thread);
+    for (i = 0; i < (PAGE_TOUCH_THREAD_COUNT - 1); i++) {
+        page_range[i].addr = area;
+        page_range[i].numpages = numpage_per_thread;
+        page_range[i].hpagesize = hpagesize;
+
+        qemu_thread_create(page_threads + i, "touch_pages",
+                           do_touch_pages, (page_range + i),
+                           QEMU_THREAD_JOINABLE);
+        tcount++;
+        area += size_per_thread;
+        numpages -= numpage_per_thread;
+    }
+    for (i = 0; i < numpages; i++) {
+        memset(area + (hpagesize * i), 0, 1);
+    }
+    for (i = 0; i < tcount; i++) {
+        qemu_thread_join(page_threads + i);
+    }
+    return 0;
+}
+
 void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
 {
     int ret;
@@ -353,9 +406,14 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
         size_t hpagesize = qemu_fd_getpagesize(fd);
         size_t numpages = DIV_ROUND_UP(memory, hpagesize);
 
-        /* MAP_POPULATE silently ignores failures */
-        for (i = 0; i < numpages; i++) {
-            memset(area + (hpagesize * i), 0, 1);
+        /* touch pages simultaneously for memory >= 64G */
+        if (memory < (1ULL << 36)) {
+            /* MAP_POPULATE silently ignores failures */
+            for (i = 0; i < numpages; i++) {
+                memset(area + (hpagesize * i), 0, 1);
+            }
+        } else {
+            touch_all_pages(area, hpagesize, numpages);
         }
     }
 
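As a side note for reviewers (not part of the patch), a tiny standalone
check of the constants involved: the gate 1ULL << 36 is 64 GiB, and the
4TB case from the table above splits into 2097152 huge pages of 2M each,
or 262144 pages per thread at the default PAGE_TOUCH_THREAD_COUNT of 8.

    /* Sanity-check the constants above (illustrative only). */
    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t gate = 1ULL << 36;                  /* threading gate   */
        assert(gate == 64ULL * 1024 * 1024 * 1024);  /* = 64 GiB         */

        uint64_t guest = 4ULL << 40;                 /* 4TB guest        */
        uint64_t hpagesize = 2ULL << 20;             /* 2M huge pages    */
        uint64_t numpages = guest / hpagesize;
        assert(numpages == 2097152);                 /* pages to touch   */
        assert(numpages / 8 == 262144);              /* pages per thread */
        return 0;
    }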