From patchwork Thu Jul 22 12:36:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 12393997 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.8 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CCFE5C6377D for ; Thu, 22 Jul 2021 12:39:06 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7343D61278 for ; Thu, 22 Jul 2021 12:39:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7343D61278 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:43944 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m6Xyj-000623-JD for qemu-devel@archiver.kernel.org; Thu, 22 Jul 2021 08:39:05 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51018) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m6Xwg-000398-UC for qemu-devel@nongnu.org; Thu, 22 Jul 2021 08:37:00 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:22964) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m6Xwd-0004Th-6j for qemu-devel@nongnu.org; Thu, 22 Jul 2021 08:36:57 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626957414; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VaBCFUm5u8ChbQFJ9QE9XCxxc/MfcJcOIinlHfTKUUg=; b=Oqk2LBCgLRW1H9O9Nz2XBTO+dytwaod64FErcTKy8gN2zsoBF5NqURksSskpJYnlmEALjg GUKSfeP92PUr+KEzM9BlPkvEeGYiGmknmBIkDigZKSS9BY/etnQczG0tdSKiCmCu43Uewi f7Jbp/4Uadh8ZOhkuLsqchEwxwY39Gw= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-537-JL6u_c7PNtKTjsZsRGQHqA-1; Thu, 22 Jul 2021 08:36:53 -0400 X-MC-Unique: JL6u_c7PNtKTjsZsRGQHqA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 5CE941074667; Thu, 22 Jul 2021 12:36:52 +0000 (UTC) Received: from t480s.redhat.com (ovpn-112-116.ams2.redhat.com [10.36.112.116]) by smtp.corp.redhat.com (Postfix) with ESMTP id 64F9817A70; Thu, 22 Jul 2021 12:36:46 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Subject: [PATCH v2 1/6] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc() Date: Thu, 22 Jul 2021 14:36:30 +0200 Message-Id: <20210722123635.60608-2-david@redhat.com> In-Reply-To: <20210722123635.60608-1-david@redhat.com> References: <20210722123635.60608-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=david@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=216.205.24.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -42 X-Spam_score: -4.3 X-Spam_bar: ---- X-Spam_report: (-4.3 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.472, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , =?utf-8?q?Daniel_P_=2E_Berr?= =?utf-8?q?ang=C3=A9?= , Eduardo Habkost , "Michael S. Tsirkin" , David Hildenbrand , "Dr . David Alan Gilbert" , Pankaj Gupta , Igor Mammedov , Paolo Bonzini , Marek Kedzierski Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Let's sense support and use it for preallocation. MADV_POPULATE_WRITE does not require a SIGBUS handler, doesn't actually touch page content, and avoids context switches; it is, therefore, faster and easier to handle than our current approach. While MADV_POPULATE_WRITE is, in general, faster than manual prefaulting, and especially faster with 4k pages, there is still value in prefaulting using multiple threads to speed up preallocation. More details on MADV_POPULATE_WRITE can be found in the Linux commit 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables") and in the man page proposal [1]. [1] https://lkml.kernel.org/r/20210712083917.16361-1-david@redhat.com This resolves the TODO in do_touch_pages(). In the future, we might want to look into using fallocate(), eventually combined with MADV_POPULATE_READ, when dealing with shared file mappings. Reviewed-by: Pankaj Gupta Signed-off-by: David Hildenbrand --- include/qemu/osdep.h | 7 ++++ util/oslib-posix.c | 88 +++++++++++++++++++++++++++++++++----------- 2 files changed, 74 insertions(+), 21 deletions(-) diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h index 60718fc342..d1660d67fa 100644 --- a/include/qemu/osdep.h +++ b/include/qemu/osdep.h @@ -471,6 +471,11 @@ static inline void qemu_cleanup_generic_vfree(void *p) #else #define QEMU_MADV_REMOVE QEMU_MADV_DONTNEED #endif +#ifdef MADV_POPULATE_WRITE +#define QEMU_MADV_POPULATE_WRITE MADV_POPULATE_WRITE +#else +#define QEMU_MADV_POPULATE_WRITE QEMU_MADV_INVALID +#endif #elif defined(CONFIG_POSIX_MADVISE) @@ -484,6 +489,7 @@ static inline void qemu_cleanup_generic_vfree(void *p) #define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID #define QEMU_MADV_NOHUGEPAGE QEMU_MADV_INVALID #define QEMU_MADV_REMOVE QEMU_MADV_DONTNEED +#define QEMU_MADV_POPULATE_WRITE QEMU_MADV_INVALID #else /* no-op */ @@ -497,6 +503,7 @@ static inline void qemu_cleanup_generic_vfree(void *p) #define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID #define QEMU_MADV_NOHUGEPAGE QEMU_MADV_INVALID #define QEMU_MADV_REMOVE QEMU_MADV_INVALID +#define QEMU_MADV_POPULATE_WRITE QEMU_MADV_INVALID #endif diff --git a/util/oslib-posix.c b/util/oslib-posix.c index e8bdb02e1d..1cb80bf94c 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -484,10 +484,6 @@ static void *do_touch_pages(void *arg) * * 'volatile' to stop compiler optimizing this away * to a no-op - * - * TODO: get a better solution from kernel so we - * don't need to write at all so we don't cause - * wear on the storage backing the region... */ *(volatile char *)addr = *addr; addr += hpagesize; @@ -497,6 +493,31 @@ static void *do_touch_pages(void *arg) return NULL; } +static void *do_madv_populate_write_pages(void *arg) +{ + MemsetThread *memset_args = (MemsetThread *)arg; + const size_t size = memset_args->numpages * memset_args->hpagesize; + char * const addr = memset_args->addr; + int ret; + + if (!size) { + return NULL; + } + + /* See do_touch_pages(). */ + qemu_mutex_lock(&page_mutex); + while (!threads_created_flag) { + qemu_cond_wait(&page_cond, &page_mutex); + } + qemu_mutex_unlock(&page_mutex); + + ret = qemu_madvise(addr, size, QEMU_MADV_POPULATE_WRITE); + if (ret) { + memset_thread_failed = true; + } + return NULL; +} + static inline int get_memset_num_threads(int smp_cpus) { long host_procs = sysconf(_SC_NPROCESSORS_ONLN); @@ -510,10 +531,11 @@ static inline int get_memset_num_threads(int smp_cpus) } static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages, - int smp_cpus) + int smp_cpus, bool use_madv_populate_write) { static gsize initialized = 0; size_t numpages_per_thread, leftover; + void *(*touch_fn)(void *); char *addr = area; int i = 0; @@ -523,6 +545,12 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages, g_once_init_leave(&initialized, 1); } + if (use_madv_populate_write) { + touch_fn = do_madv_populate_write_pages; + } else { + touch_fn = do_touch_pages; + } + memset_thread_failed = false; threads_created_flag = false; memset_num_threads = get_memset_num_threads(smp_cpus); @@ -534,7 +562,7 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages, memset_thread[i].numpages = numpages_per_thread + (i < leftover); memset_thread[i].hpagesize = hpagesize; qemu_thread_create(&memset_thread[i].pgthread, "touch_pages", - do_touch_pages, &memset_thread[i], + touch_fn, &memset_thread[i], QEMU_THREAD_JOINABLE); addr += memset_thread[i].numpages * hpagesize; } @@ -553,6 +581,12 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages, return memset_thread_failed; } +static bool madv_populate_write_possible(char *area, size_t pagesize) +{ + return !qemu_madvise(area, pagesize, QEMU_MADV_POPULATE_WRITE) || + errno != EINVAL; +} + void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus, Error **errp) { @@ -560,29 +594,41 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus, struct sigaction act, oldact; size_t hpagesize = qemu_fd_getpagesize(fd); size_t numpages = DIV_ROUND_UP(memory, hpagesize); + bool use_madv_populate_write; - memset(&act, 0, sizeof(act)); - act.sa_handler = &sigbus_handler; - act.sa_flags = 0; - - ret = sigaction(SIGBUS, &act, &oldact); - if (ret) { - error_setg_errno(errp, errno, - "os_mem_prealloc: failed to install signal handler"); - return; + /* + * Sense on every invocation, as MADV_POPULATE_WRITE cannot be used for + * some special mappings, such as mapping /dev/mem. + */ + use_madv_populate_write = madv_populate_write_possible(area, hpagesize); + + if (!use_madv_populate_write) { + memset(&act, 0, sizeof(act)); + act.sa_handler = &sigbus_handler; + act.sa_flags = 0; + + ret = sigaction(SIGBUS, &act, &oldact); + if (ret) { + error_setg_errno(errp, errno, + "os_mem_prealloc: failed to install signal handler"); + return; + } } /* touch pages simultaneously */ - if (touch_all_pages(area, hpagesize, numpages, smp_cpus)) { + if (touch_all_pages(area, hpagesize, numpages, smp_cpus, + use_madv_populate_write)) { error_setg(errp, "os_mem_prealloc: Insufficient free host memory " "pages available to allocate guest RAM"); } - ret = sigaction(SIGBUS, &oldact, NULL); - if (ret) { - /* Terminate QEMU since it can't recover from error */ - perror("os_mem_prealloc: failed to reinstall signal handler"); - exit(1); + if (!use_madv_populate_write) { + ret = sigaction(SIGBUS, &oldact, NULL); + if (ret) { + /* Terminate QEMU since it can't recover from error */ + perror("os_mem_prealloc: failed to reinstall signal handler"); + exit(1); + } } }