From patchwork Tue Dec 7 07:06:06 2021
X-Patchwork-Submitter: Daniil Tatianin
X-Patchwork-Id: 12661137
From: Daniil Tatianin
To: qemu-devel@nongnu.org
Subject: [PATCH v1 1/2] hostmem: use a static size for maxnode, validate policy everywhere
Date: Tue, 7 Dec 2021 10:06:06 +0300
Message-Id: <20211207070607.1422670-1-d-tatianin@yandex-team.ru>
Cc: imammedo@redhat.com, sw@weilnetz.de, pbonzini@redhat.com, yc-core@yandex-team.ru, david@redhat.com

Previously, we would find the last set bit in the mask and add 2 to it to
get the maxnode value passed to mbind(). This is unnecessary, since mbind(2)
accepts a bitmap of any (reasonable) size as long as all unused bits are
clear, so we now always pass MAX_NODES + 1.
This also factors the host-nodes/policy consistency check out into
validate_policy(). The check is now called from the host-nodes and policy
setters (once the memory region has been initialized) as well as from
host_memory_backend_memory_complete(), so the combination is guaranteed to
be valid by the time we call mbind().

Signed-off-by: Daniil Tatianin
---
 backends/hostmem.c | 64 +++++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 4c05862ed5..392026efe6 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -38,6 +38,29 @@ host_memory_backend_get_name(HostMemoryBackend *backend)
     return object_get_canonical_path(OBJECT(backend));
 }
 
+static bool
+validate_policy(HostMemPolicy policy, bool nodes_empty, Error **errp)
+{
+    /*
+     * check for invalid host-nodes and policies and give more verbose
+     * error messages than mbind().
+     */
+    if (!nodes_empty && policy == MPOL_DEFAULT) {
+        error_setg(errp, "host-nodes must be empty for policy default,"
+                   " or you should explicitly specify a policy other"
+                   " than default");
+        return false;
+    }
+
+    if (nodes_empty && policy != MPOL_DEFAULT) {
+        error_setg(errp, "host-nodes must be set for policy %s",
+                   HostMemPolicy_str(policy));
+        return false;
+    }
+
+    return true;
+}
+
 static void
 host_memory_backend_get_size(Object *obj, Visitor *v, const char *name,
                              void *opaque, Error **errp)
@@ -110,6 +133,7 @@ host_memory_backend_set_host_nodes(Object *obj, Visitor *v, const char *name,
 #ifdef CONFIG_NUMA
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
     uint16List *l, *host_nodes = NULL;
+    bool nodes_empty = bitmap_empty(backend->host_nodes, MAX_NODES + 1);
 
     visit_type_uint16List(v, name, &host_nodes, errp);
 
@@ -118,6 +142,13 @@ host_memory_backend_set_host_nodes(Object *obj, Visitor *v, const char *name,
             error_setg(errp, "Invalid host-nodes value: %d", l->value);
             goto out;
         }
+
+        nodes_empty = false;
+    }
+
+    if (host_memory_backend_mr_inited(backend) &&
+        !validate_policy(backend->policy, nodes_empty, errp)) {
+        goto out;
     }
 
     for (l = host_nodes; l; l = l->next) {
@@ -142,6 +173,13 @@ static void
 host_memory_backend_set_policy(Object *obj, int policy, Error **errp)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
+    bool nodes_empty = bitmap_empty(backend->host_nodes, MAX_NODES + 1);
+
+    if (host_memory_backend_mr_inited(backend) &&
+        !validate_policy(policy, nodes_empty, errp)) {
+        return;
+    }
+
     backend->policy = policy;
 
 #ifndef CONFIG_NUMA
@@ -347,24 +385,9 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
         qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
     }
 #ifdef CONFIG_NUMA
-    unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
-    /* lastbit == MAX_NODES means maxnode = 0 */
-    unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
-    /* ensure policy won't be ignored in case memory is preallocated
-     * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
-     * this doesn't catch hugepage case. */
     unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
-
-    /* check for invalid host-nodes and policies and give more verbose
-     * error messages than mbind(). */
-    if (maxnode && backend->policy == MPOL_DEFAULT) {
-        error_setg(errp, "host-nodes must be empty for policy default,"
-                   " or you should explicitly specify a policy other"
-                   " than default");
-        return;
-    } else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
-        error_setg(errp, "host-nodes must be set for policy %s",
-                   HostMemPolicy_str(backend->policy));
+    bool nodes_empty = bitmap_empty(backend->host_nodes, MAX_NODES + 1);
+    if (!validate_policy(backend->policy, nodes_empty, errp)) {
         return;
     }
 
@@ -373,12 +396,11 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
      * cuts off the last specified node. This means backend->host_nodes
      * must have MAX_NODES+1 bits available.
      */
-    assert(sizeof(backend->host_nodes) >=
+    QEMU_BUILD_BUG_ON(sizeof(backend->host_nodes) <
            BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
-    assert(maxnode <= MAX_NODES);
 
-    if (maxnode &&
-        mbind(ptr, sz, backend->policy, backend->host_nodes, maxnode + 1,
+    if (!nodes_empty &&
+        mbind(ptr, sz, backend->policy, backend->host_nodes, MAX_NODES + 1,
               flags)) {
         if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
             error_setg_errno(errp, errno,

From patchwork Tue Dec 7 07:06:07 2021
X-Patchwork-Submitter: Daniil Tatianin
X-Patchwork-Id: 12661135
From: Daniil Tatianin
To: qemu-devel@nongnu.org
Subject: [PATCH v1 2/2] osdep: support mempolicy for preallocation in os_mem_prealloc
Date: Tue, 7 Dec 2021 10:06:07 +0300
Message-Id: <20211207070607.1422670-2-d-tatianin@yandex-team.ru>
In-Reply-To: <20211207070607.1422670-1-d-tatianin@yandex-team.ru>
References: <20211207070607.1422670-1-d-tatianin@yandex-team.ru>
Cc: imammedo@redhat.com, sw@weilnetz.de, pbonzini@redhat.com, yc-core@yandex-team.ru, david@redhat.com

This is needed when we want to make sure that a shared memory region is
allocated from a specific NUMA node. mbind(2) cannot do this on its own,
because it ignores the policy for memory mapped with MAP_SHARED. Such pages
are instead allocated according to the memory policy of the thread that
faults them in. We work around this by calling set_mempolicy(2) from the
preallocation threads.

Signed-off-by: Daniil Tatianin
---
 backends/hostmem.c   |  6 ++++--
 include/qemu/osdep.h |  3 ++-
 util/meson.build     |  2 ++
 util/oslib-posix.c   | 29 ++++++++++++++++++++++++++---
 util/oslib-win32.c   |  3 ++-
 5 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 392026efe6..0c508ed9df 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -269,7 +269,8 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
         void *ptr = memory_region_get_ram_ptr(&backend->mr);
         uint64_t sz = memory_region_size(&backend->mr);
 
-        os_mem_prealloc(fd, ptr, sz, backend->prealloc_threads, &local_err);
+        os_mem_prealloc(fd, ptr, sz, backend->prealloc_threads, backend->policy,
+                        backend->host_nodes, MAX_NODES + 1, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
@@ -415,7 +416,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
          */
         if (backend->prealloc) {
             os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz,
-                            backend->prealloc_threads, &local_err);
+                            backend->prealloc_threads, backend->policy,
+                            backend->host_nodes, MAX_NODES + 1, &local_err);
             if (local_err) {
                 goto out;
             }
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 60718fc342..abf88aeb0e 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -688,7 +688,8 @@ unsigned long qemu_getauxval(unsigned long type);
 void qemu_set_tty_echo(int fd, bool echo);
 
 void os_mem_prealloc(int fd, char *area, size_t sz, int smp_cpus,
-                     Error **errp);
+                     int policy, unsigned long *node_bitmap,
+                     unsigned long max_node, Error **errp);
 
 /**
  * qemu_get_pid_name:
diff --git a/util/meson.build b/util/meson.build
index 05b593055a..25f9fca379 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -87,3 +87,5 @@ if have_block
                                         if_false: files('filemonitor-stub.c'))
   util_ss.add(when: 'CONFIG_LINUX', if_true: files('vfio-helpers.c'))
 endif
+
+util_ss.add(when: 'CONFIG_NUMA', if_true: numa)
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index e8bdb02e1d..bca25698c5 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -38,11 +38,13 @@
 #include "qemu/sockets.h"
 #include "qemu/thread.h"
 #include
+#include "qemu/bitmap.h"
 #include "qemu/cutils.h"
 #include "qemu/compiler.h"
 
 #ifdef CONFIG_LINUX
 #include
+#include
 #endif
 
 #ifdef __FreeBSD__
@@ -79,6 +81,9 @@ struct MemsetThread {
     size_t hpagesize;
     QemuThread pgthread;
     sigjmp_buf env;
+    int policy;
+    unsigned long *node_bitmap;
+    unsigned long max_node;
 };
 typedef struct MemsetThread MemsetThread;
 
@@ -464,6 +469,18 @@ static void *do_touch_pages(void *arg)
     }
     qemu_mutex_unlock(&page_mutex);
 
+#ifdef CONFIG_NUMA
+    if (memset_args->max_node &&
+        !bitmap_empty(memset_args->node_bitmap, memset_args->max_node)) {
+        long ret = set_mempolicy(memset_args->policy, memset_args->node_bitmap,
+                                 memset_args->max_node);
+        if (ret < 0) {
+            memset_thread_failed = true;
+            return NULL;
+        }
+    }
+#endif
+
     /* unblock SIGBUS */
     sigemptyset(&set);
     sigaddset(&set, SIGBUS);
@@ -510,7 +527,8 @@ static inline int get_memset_num_threads(int smp_cpus)
 }
 
 static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
-                            int smp_cpus)
+                            int smp_cpus, int policy,
+                            unsigned long *node_bitmap, unsigned long max_node)
 {
     static gsize initialized = 0;
     size_t numpages_per_thread, leftover;
@@ -533,6 +551,9 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
         memset_thread[i].addr = addr;
         memset_thread[i].numpages = numpages_per_thread + (i < leftover);
         memset_thread[i].hpagesize = hpagesize;
+        memset_thread[i].policy = policy;
+        memset_thread[i].node_bitmap = node_bitmap;
+        memset_thread[i].max_node = max_node;
         qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
                            do_touch_pages, &memset_thread[i],
                            QEMU_THREAD_JOINABLE);
@@ -554,7 +575,8 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
 }
 
 void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
-                     Error **errp)
+                     int policy, unsigned long *node_bitmap,
+                     unsigned long max_node, Error **errp)
 {
     int ret;
     struct sigaction act, oldact;
@@ -573,7 +595,8 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
     }
 
     /* touch pages simultaneously */
-    if (touch_all_pages(area, hpagesize, numpages, smp_cpus)) {
+    if (touch_all_pages(area, hpagesize, numpages, smp_cpus, policy,
+                        node_bitmap, max_node)) {
         error_setg(errp, "os_mem_prealloc: Insufficient free host memory "
             "pages available to allocate guest RAM");
     }
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index af559ef339..3e56bf9f09 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -371,7 +371,8 @@ int getpagesize(void)
 }
 
 void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
-                     Error **errp)
+                     int policy, unsigned long *node_bitmap,
+                     unsigned long max_node, Error **errp)
 {
     int i;
     size_t pagesize = qemu_real_host_page_size;