From patchwork Wed Sep 28 16:45:36 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 12992610
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, Michal Privoznik, Igor Mammedov,
    "Michael S. Tsirkin", Paolo Bonzini, Daniel P. Berrangé,
    Eduardo Habkost, "Dr. David Alan Gilbert", Eric Blake,
    Markus Armbruster, Richard Henderson, Stefan Weil
Subject: [PATCH v1 1/7] util: Cleanup and rename os_mem_prealloc()
Date: Wed, 28 Sep 2022 18:45:36 +0200
Message-Id: <20220928164542.117952-2-david@redhat.com>
In-Reply-To: <20220928164542.117952-1-david@redhat.com>
References: <20220928164542.117952-1-david@redhat.com>

Let's
* give the function a "qemu_*" style name
* make sure the parameters in the implementation match the prototype
* rename smp_cpus to max_threads, which makes the semantics of that
  parameter clearer

... and add function documentation.

Reviewed-by: Michal Privoznik
Signed-off-by: David Hildenbrand
---
 backends/hostmem.c     |  6 +++---
 hw/virtio/virtio-mem.c |  2 +-
 include/qemu/osdep.h   | 17 +++++++++++++++--
 softmmu/cpus.c         |  2 +-
 util/oslib-posix.c     | 24 ++++++++++++------------
 util/oslib-win32.c     |  8 ++++----
 6 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 4428e06738..491cb10b97 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -232,7 +232,7 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
         void *ptr = memory_region_get_ram_ptr(&backend->mr);
         uint64_t sz = memory_region_size(&backend->mr);
 
-        os_mem_prealloc(fd, ptr, sz, backend->prealloc_threads, &local_err);
+        qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
@@ -383,8 +383,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
      * specified NUMA policy in place.
      */
     if (backend->prealloc) {
-        os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz,
-                        backend->prealloc_threads, &local_err);
+        qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz,
+                          backend->prealloc_threads, &local_err);
         if (local_err) {
             goto out;
         }
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 30d03e987a..0e9ef4ff19 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -467,7 +467,7 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
         int fd = memory_region_get_fd(&vmem->memdev->mr);
         Error *local_err = NULL;
 
-        os_mem_prealloc(fd, area, size, 1, &local_err);
+        qemu_prealloc_mem(fd, area, size, 1, &local_err);
         if (local_err) {
             static bool warned;
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index b1c161c035..e556e45143 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -568,8 +568,21 @@ unsigned long qemu_getauxval(unsigned long type);
 
 void qemu_set_tty_echo(int fd, bool echo);
 
-void os_mem_prealloc(int fd, char *area, size_t sz, int smp_cpus,
-                     Error **errp);
+/**
+ * qemu_prealloc_mem:
+ * @fd: the fd mapped into the area, -1 for anonymous memory
+ * @area: start address of the area to preallocate
+ * @sz: the size of the area to preallocate
+ * @max_threads: maximum number of threads to use
+ * @errp: returns an error if this function fails
+ *
+ * Preallocate memory (populate/prefault page tables writable) for the virtual
+ * memory area starting at @area with the size of @sz. After a successful call,
+ * each page in the area was faulted in writable at least once, for example,
+ * after allocating file blocks for mapped files.
+ */
+void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
+                       Error **errp);
 
 /**
  * qemu_get_pid_name:
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 23b30484b2..cf8aa70ca5 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -354,7 +354,7 @@ static void qemu_init_sigbus(void)
 
     /*
      * ALERT: when modifying this, take care that SIGBUS forwarding in
-     * os_mem_prealloc() will continue working as expected.
+     * qemu_prealloc_mem() will continue working as expected.
      */
     memset(&action, 0, sizeof(action));
     action.sa_flags = SA_SIGINFO;
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index d55af69c11..5a2ae4ef3f 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -310,7 +310,7 @@ static void sigbus_handler(int signal)
         return;
     }
 #endif /* CONFIG_LINUX */
-    warn_report("os_mem_prealloc: unrelated SIGBUS detected and ignored");
+    warn_report("qemu_prealloc_mem: unrelated SIGBUS detected and ignored");
 }
 
 static void *do_touch_pages(void *arg)
@@ -380,13 +380,13 @@ static void *do_madv_populate_write_pages(void *arg)
 }
 
 static inline int get_memset_num_threads(size_t hpagesize, size_t numpages,
-                                         int smp_cpus)
+                                         int max_threads)
 {
     long host_procs = sysconf(_SC_NPROCESSORS_ONLN);
     int ret = 1;
 
     if (host_procs > 0) {
-        ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT), smp_cpus);
+        ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT), max_threads);
     }
 
     /* Especially with gigantic pages, don't create more threads than pages. */
@@ -399,11 +399,11 @@ static inline int get_memset_num_threads(size_t hpagesize, size_t numpages,
 }
 
 static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
-                           int smp_cpus, bool use_madv_populate_write)
+                           int max_threads, bool use_madv_populate_write)
 {
     static gsize initialized = 0;
     MemsetContext context = {
-        .num_threads = get_memset_num_threads(hpagesize, numpages, smp_cpus),
+        .num_threads = get_memset_num_threads(hpagesize, numpages, max_threads),
     };
     size_t numpages_per_thread, leftover;
     void *(*touch_fn)(void *);
@@ -475,13 +475,13 @@ static bool madv_populate_write_possible(char *area, size_t pagesize)
            errno != EINVAL;
 }
 
-void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
-                     Error **errp)
+void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
+                       Error **errp)
 {
     static gsize initialized;
     int ret;
     size_t hpagesize = qemu_fd_getpagesize(fd);
-    size_t numpages = DIV_ROUND_UP(memory, hpagesize);
+    size_t numpages = DIV_ROUND_UP(sz, hpagesize);
     bool use_madv_populate_write;
     struct sigaction act;
 
@@ -511,24 +511,24 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
         if (ret) {
             qemu_mutex_unlock(&sigbus_mutex);
             error_setg_errno(errp, errno,
-                "os_mem_prealloc: failed to install signal handler");
+                "qemu_prealloc_mem: failed to install signal handler");
             return;
         }
     }
 
     /* touch pages simultaneously */
-    ret = touch_all_pages(area, hpagesize, numpages, smp_cpus,
+    ret = touch_all_pages(area, hpagesize, numpages, max_threads,
                           use_madv_populate_write);
     if (ret) {
         error_setg_errno(errp, -ret,
-                         "os_mem_prealloc: preallocating memory failed");
+                         "qemu_prealloc_mem: preallocating memory failed");
     }
 
     if (!use_madv_populate_write) {
         ret = sigaction(SIGBUS, &sigbus_oldact, NULL);
         if (ret) {
             /* Terminate QEMU since it can't recover from error */
-            perror("os_mem_prealloc: failed to reinstall signal handler");
+            perror("qemu_prealloc_mem: failed to reinstall signal handler");
             exit(1);
         }
         qemu_mutex_unlock(&sigbus_mutex);
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index 5723d3eb4c..e1cb725ecc 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -268,14 +268,14 @@ int getpagesize(void)
     return system_info.dwPageSize;
 }
 
-void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
-                     Error **errp)
+void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
+                       Error **errp)
 {
     int i;
     size_t pagesize = qemu_real_host_page_size();
 
-    memory = (memory + pagesize - 1) & -pagesize;
-    for (i = 0; i < memory / pagesize; i++) {
+    sz = (sz + pagesize - 1) & -pagesize;
+    for (i = 0; i < sz / pagesize; i++) {
         memset(area + pagesize * i, 0, 1);
     }
 }

From patchwork Wed Sep 28 16:45:37 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 12992644
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, Michal Privoznik, Igor Mammedov,
    "Michael S. Tsirkin", Paolo Bonzini, Daniel P. Berrangé,
    Eduardo Habkost, "Dr. David Alan Gilbert", Eric Blake,
    Markus Armbruster, Richard Henderson, Stefan Weil
Subject: [PATCH v1 2/7] util: Introduce qemu_thread_set_affinity() and qemu_thread_get_affinity()
Date: Wed, 28 Sep 2022 18:45:37 +0200
Message-Id: <20220928164542.117952-3-david@redhat.com>
In-Reply-To: <20220928164542.117952-1-david@redhat.com>
References: <20220928164542.117952-1-david@redhat.com>

Usually, we let upper layers handle CPU pinning, because
pthread_setaffinity_np() (-> sched_setaffinity()) is blocked via seccomp
when starting QEMU with

    -sandbox enable=on,resourcecontrol=deny

However, in some cases we want to configure and observe the CPU affinity
of threads from QEMU directly, namely when the sandbox option is either
not enabled or not active yet.

So let's add a way to configure CPU pinning via qemu_thread_set_affinity()
and to obtain the CPU affinity via qemu_thread_get_affinity(), and
implement both under POSIX using pthread_setaffinity_np() and
pthread_getaffinity_np().

An implementation under Windows is possible using SetProcessAffinityMask()
and GetProcessAffinityMask(); however, that is left as future work.
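As an illustration only (not part of this patch), the following standalone sketch shows the POSIX mechanism described above, assuming Linux with glibc and _GNU_SOURCE. The helpers set_affinity() and get_affinity_count() are hypothetical names. The first converts a list of host CPU numbers into a dynamically sized cpu_set_t for pthread_setaffinity_np(); the second reads the mask back with pthread_getaffinity_np(), doubling the set size while the call fails with EINVAL because the set is smaller than the kernel's CPU mask. Both pthread functions return 0 or a positive error number rather than setting errno, so the retry compares against EINVAL, not -EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: restrict @thread to the @n host CPUs listed in @cpus,
 * using a dynamically sized CPU set that can hold @nbits CPUs.
 * Returns 0 or a positive error number, like pthread_setaffinity_np().
 */
static int set_affinity(pthread_t thread, const int *cpus, int n, int nbits)
{
    size_t setsize = CPU_ALLOC_SIZE(nbits);
    cpu_set_t *set = CPU_ALLOC(nbits);
    int err;

    if (!set) {
        return ENOMEM;
    }
    CPU_ZERO_S(setsize, set);
    for (int i = 0; i < n; i++) {
        assert(cpus[i] >= 0 && cpus[i] < nbits);
        CPU_SET_S(cpus[i], setsize, set);
    }
    err = pthread_setaffinity_np(thread, setsize, set);
    CPU_FREE(set);
    return err;
}

/*
 * Hypothetical helper: store the number of CPUs in @thread's affinity mask
 * in @count. The set starts at CPU_SETSIZE bits and is doubled while the
 * call reports EINVAL (set too small for the kernel's CPU mask).
 */
static int get_affinity_count(pthread_t thread, int *count)
{
    int nbits = CPU_SETSIZE;

    for (;;) {
        size_t setsize = CPU_ALLOC_SIZE(nbits);
        cpu_set_t *set = CPU_ALLOC(nbits);
        int err;

        if (!set) {
            return ENOMEM;
        }
        err = pthread_getaffinity_np(thread, setsize, set);
        if (err == 0) {
            *count = CPU_COUNT_S(setsize, set);
        }
        CPU_FREE(set);
        if (err != EINVAL) {
            return err;   /* 0 on success, otherwise a positive error number */
        }
        nbits *= 2;
    }
}
```

For example, pinning pthread_self() to the first CPU reported by sched_getaffinity() and then calling get_affinity_count() on it should yield a count of 1, unless a sandbox rejects sched_setaffinity().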
Reviewed-by: Michal Privoznik
Signed-off-by: David Hildenbrand
---
 include/qemu/thread.h    |  4 +++
 meson.build              | 16 +++++++++
 util/qemu-thread-posix.c | 70 ++++++++++++++++++++++++++++++++++++++++
 util/qemu-thread-win32.c | 12 +++++++
 4 files changed, 102 insertions(+)

diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index af19f2b3fc..79e507c7f0 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -185,6 +185,10 @@ void qemu_event_destroy(QemuEvent *ev);
 void qemu_thread_create(QemuThread *thread, const char *name,
                         void *(*start_routine)(void *),
                         void *arg, int mode);
+int qemu_thread_set_affinity(QemuThread *thread, unsigned long *host_cpus,
+                             unsigned long nbits);
+int qemu_thread_get_affinity(QemuThread *thread, unsigned long **host_cpus,
+                             unsigned long *nbits);
 void *qemu_thread_join(QemuThread *thread);
 void qemu_thread_get_self(QemuThread *thread);
 bool qemu_thread_is_self(QemuThread *thread);
diff --git a/meson.build b/meson.build
index 8dc661363f..9121f99e71 100644
--- a/meson.build
+++ b/meson.build
@@ -2088,7 +2088,23 @@ config_host_data.set('CONFIG_PTHREAD_CONDATTR_SETCLOCK', cc.links(gnu_source_pre
     pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
     return 0;
   }''', dependencies: threads))
+config_host_data.set('CONFIG_PTHREAD_AFFINITY_NP', cc.links(gnu_source_prefix + '''
+  #include
 
+  static void *f(void *p) { return NULL; }
+  int main(void)
+  {
+    int setsize = CPU_ALLOC_SIZE(64);
+    pthread_t thread;
+    cpu_set_t *cpuset;
+    pthread_create(&thread, 0, f, 0);
+    cpuset = CPU_ALLOC(64);
+    CPU_ZERO_S(setsize, cpuset);
+    pthread_setaffinity_np(thread, setsize, cpuset);
+    pthread_getaffinity_np(thread, setsize, cpuset);
+    CPU_FREE(cpuset);
+    return 0;
+  }''', dependencies: threads))
 config_host_data.set('CONFIG_SIGNALFD', cc.links(gnu_source_prefix + '''
   #include
   #include
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index ac1d56e673..bae938c670 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -16,6 +16,7 @@
 #include "qemu/notify.h"
 #include "qemu-thread-common.h"
 #include "qemu/tsan.h"
+#include "qemu/bitmap.h"
 
 static bool name_threads;
 
@@ -552,6 +553,75 @@ void qemu_thread_create(QemuThread *thread, const char *name,
     pthread_attr_destroy(&attr);
 }
 
+int qemu_thread_set_affinity(QemuThread *thread, unsigned long *host_cpus,
+                             unsigned long nbits)
+{
+#if defined(CONFIG_PTHREAD_AFFINITY_NP)
+    const size_t setsize = CPU_ALLOC_SIZE(nbits);
+    unsigned long value;
+    cpu_set_t *cpuset;
+    int err;
+
+    cpuset = CPU_ALLOC(nbits);
+    g_assert(cpuset);
+
+    CPU_ZERO_S(setsize, cpuset);
+    value = find_first_bit(host_cpus, nbits);
+    while (value < nbits) {
+        CPU_SET_S(value, setsize, cpuset);
+        value = find_next_bit(host_cpus, nbits, value + 1);
+    }
+
+    err = pthread_setaffinity_np(thread->thread, setsize, cpuset);
+    CPU_FREE(cpuset);
+    return err;
+#else
+    return -ENOSYS;
+#endif
+}
+
+int qemu_thread_get_affinity(QemuThread *thread, unsigned long **host_cpus,
+                             unsigned long *nbits)
+{
+#if defined(CONFIG_PTHREAD_AFFINITY_NP)
+    unsigned long tmpbits;
+    cpu_set_t *cpuset;
+    size_t setsize;
+    int i, err;
+
+    tmpbits = CPU_SETSIZE;
+    while (true) {
+        setsize = CPU_ALLOC_SIZE(tmpbits);
+        cpuset = CPU_ALLOC(tmpbits);
+        g_assert(cpuset);
+
+        err = pthread_getaffinity_np(thread->thread, setsize, cpuset);
+        if (err) {
+            CPU_FREE(cpuset);
+            if (err != EINVAL) {
+                return err;
+            }
+            tmpbits *= 2;
+        } else {
+            break;
+        }
+    }
+
+    /* Convert the result into a proper bitmap. */
+    *nbits = tmpbits;
+    *host_cpus = bitmap_new(tmpbits);
+    for (i = 0; i < tmpbits; i++) {
+        if (CPU_ISSET(i, cpuset)) {
+            set_bit(i, *host_cpus);
+        }
+    }
+    CPU_FREE(cpuset);
+    return 0;
+#else
+    return -ENOSYS;
+#endif
+}
+
 void qemu_thread_get_self(QemuThread *thread)
 {
     thread->thread = pthread_self();
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index a2d5a6e825..72338148bd 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -427,6 +427,18 @@ void qemu_thread_create(QemuThread *thread, const char *name,
     thread->data = data;
 }
 
+int qemu_thread_set_affinity(QemuThread *thread, unsigned long *host_cpus,
+                             unsigned long nbits)
+{
+    return -ENOSYS;
+}
+
+int qemu_thread_get_affinity(QemuThread *thread, unsigned long **host_cpus,
+                             unsigned long *nbits)
+{
+    return -ENOSYS;
+}
+
 void qemu_thread_get_self(QemuThread *thread)
 {
     thread->data = qemu_thread_data;

From patchwork Wed Sep 28 16:45:38 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 12992599
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, Michal Privoznik, Igor Mammedov,
    "Michael S. Tsirkin", Paolo Bonzini, Daniel P. Berrangé,
    Eduardo Habkost, "Dr. David Alan Gilbert", Eric Blake,
    Markus Armbruster, Richard Henderson, Stefan Weil
Subject: [PATCH v1 3/7] util: Introduce ThreadContext user-creatable object
Date: Wed, 28 Sep 2022 18:45:38 +0200
Message-Id: <20220928164542.117952-4-david@redhat.com>
In-Reply-To: <20220928164542.117952-1-david@redhat.com>
References: <20220928164542.117952-1-david@redhat.com>

Setting the CPU affinity of QEMU threads is a bit problematic, because
QEMU doesn't always have permissions to set the CPU affinity itself,
for example, with seccomp after it was initialized by QEMU:

    -sandbox enable=on,resourcecontrol=deny

While upper layers are already aware of how to handle CPU affinities for
long-lived threads like iothreads or vcpu threads, especially short-lived
threads, such as the ones used for memory-backend preallocation, are more
involved to handle. These threads are created on demand, and upper layers
are not even able to identify and configure them.

Introduce the concept of a ThreadContext, which is essentially a thread
used for creating new threads. All threads created via that context
thread inherit the configured CPU affinity.
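The inheritance rule this design relies on can be demonstrated outside QEMU with a small Linux-only sketch (glibc, _GNU_SOURCE): on Linux, a thread created with pthread_create() starts with a copy of its creator's CPU affinity mask. Here a hypothetical context thread is pinned from the outside with pthread_setaffinity_np(), much as a management tool would pin the context thread via its thread id, and only then creates a worker, which observes the same single-CPU mask. The names worker_fn(), context_fn() and spawn_via_pinned_context() are made up for this illustration and are not QEMU APIs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t go;              /* posted once the context thread is pinned */
static cpu_set_t worker_mask; /* affinity the worker observed at startup */

static void *worker_fn(void *arg)
{
    (void)arg;
    /* Record the mask this thread started with (inherited from its creator). */
    pthread_getaffinity_np(pthread_self(), sizeof(worker_mask), &worker_mask);
    return NULL;
}

/* Hypothetical context thread: waits to be configured, then creates a worker. */
static void *context_fn(void *arg)
{
    pthread_t worker;

    (void)arg;
    sem_wait(&go);
    if (pthread_create(&worker, NULL, worker_fn, NULL) == 0) {
        pthread_join(worker, NULL);
    }
    return NULL;
}

/*
 * Start a context thread, pin it to @cpu from the outside, let it create a
 * worker, and return the number of CPUs in the worker's mask (-1 on error).
 */
static int spawn_via_pinned_context(int cpu)
{
    pthread_t ctx;
    cpu_set_t mask;
    int err;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&worker_mask);
    sem_init(&go, 0, 0);
    if (pthread_create(&ctx, NULL, context_fn, NULL) != 0) {
        sem_destroy(&go);
        return -1;
    }

    /* Pin only the context thread; the worker does not exist yet. */
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    err = pthread_setaffinity_np(ctx, sizeof(mask), &mask);

    sem_post(&go);            /* now let the context thread create the worker */
    pthread_join(ctx, NULL);
    sem_destroy(&go);
    return err ? -1 : CPU_COUNT(&worker_mask);
}
```

Called with the first CPU from sched_getaffinity(0, ...), spawn_via_pinned_context() is expected to return 1. The sketch serves a single request per context thread, whereas the ThreadContext object in this patch keeps its context thread alive and serves repeated thread-creation requests.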
Consequently, it's sufficient to create a ThreadContext and configure it
once, and have all threads created via that ThreadContext inherit the
same CPU affinity.

The CPU affinity of a ThreadContext can be configured in two ways:

(1) Obtaining the thread id via the "thread-id" property and setting the
    CPU affinity manually.

(2) Setting the "cpu-affinity" property and letting QEMU try to set the
    CPU affinity itself. This will fail if QEMU no longer has the
    permissions to do so after seccomp was initialized.

A ThreadContext can be reused, simply by reconfiguring the CPU affinity.

Reviewed-by: Michal Privoznik
Signed-off-by: David Hildenbrand
---
 include/qemu/thread-context.h |  58 +++++++
 qapi/qom.json                 |  16 ++
 util/meson.build              |   1 +
 util/oslib-posix.c            |   1 +
 util/thread-context.c         | 279 ++++++++++++++++++++++++++++++++++
 5 files changed, 355 insertions(+)
 create mode 100644 include/qemu/thread-context.h
 create mode 100644 util/thread-context.c

diff --git a/include/qemu/thread-context.h b/include/qemu/thread-context.h
new file mode 100644
index 0000000000..c799cbe7a1
--- /dev/null
+++ b/include/qemu/thread-context.h
@@ -0,0 +1,58 @@
+/*
+ * QEMU Thread Context
+ *
+ * Copyright Red Hat Inc., 2022
+ *
+ * Authors:
+ *  David Hildenbrand
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef SYSEMU_THREAD_CONTEXT_H
+#define SYSEMU_THREAD_CONTEXT_H
+
+#include "qapi/qapi-types-machine.h"
+#include "qemu/thread.h"
+#include "qom/object.h"
+
+#define TYPE_THREAD_CONTEXT "thread-context"
+OBJECT_DECLARE_TYPE(ThreadContext, ThreadContextClass,
+                    THREAD_CONTEXT)
+
+struct ThreadContextClass {
+    ObjectClass parent_class;
+};
+
+struct ThreadContext {
+    /* private */
+    Object parent;
+
+    /* private */
+    unsigned int thread_id;
+    QemuThread thread;
+
+    /* Semaphore to wait for context thread action. */
+    QemuSemaphore sem;
+    /* Semaphore to wait for action in context thread. */
+    QemuSemaphore sem_thread;
+    /* Mutex to synchronize requests. */
+    QemuMutex mutex;
+
+    /* Commands for the thread to execute. */
+    int thread_cmd;
+    void *thread_cmd_data;
+
+    /* CPU affinity bitmap used for initialization. */
+    unsigned long *init_cpu_bitmap;
+    int init_cpu_nbits;
+};
+
+void thread_context_create_thread(ThreadContext *tc, QemuThread *thread,
+                                  const char *name,
+                                  void *(*start_routine)(void *), void *arg,
+                                  int mode);
+
+#endif /* SYSEMU_THREAD_CONTEXT_H */
diff --git a/qapi/qom.json b/qapi/qom.json
index 80dd419b39..4775a333ed 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -830,6 +830,20 @@
             'reduced-phys-bits': 'uint32',
             '*kernel-hashes': 'bool' } }
 
+##
+# @ThreadContextProperties:
+#
+# Properties for thread context objects.
+#
+# @cpu-affinity: the CPU affinity for all threads created in the thread
+#                context (default: QEMU main thread affinity)
+#
+# Since: 7.2
+##
+{ 'struct': 'ThreadContextProperties',
+  'data': { '*cpu-affinity': ['uint16'] } }
+
+
 ##
 # @ObjectType:
 #
@@ -882,6 +896,7 @@
     { 'name': 'secret_keyring',
       'if': 'CONFIG_SECRET_KEYRING' },
     'sev-guest',
+    'thread-context',
     's390-pv-guest',
     'throttle-group',
     'tls-creds-anon',
@@ -948,6 +963,7 @@
       'secret_keyring':             { 'type': 'SecretKeyringProperties',
                                       'if': 'CONFIG_SECRET_KEYRING' },
       'sev-guest':                  'SevGuestProperties',
+      'thread-context':             'ThreadContextProperties',
       'throttle-group':             'ThrottleGroupProperties',
       'tls-creds-anon':             'TlsCredsAnonProperties',
       'tls-creds-psk':              'TlsCredsPskProperties',
diff --git a/util/meson.build b/util/meson.build
index 5e282130df..e97cd2d779 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -1,4 +1,5 @@
 util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
+util_ss.add(files('thread-context.c'))
 if not config_host_data.get('CONFIG_ATOMIC64')
   util_ss.add(files('atomic64.c'))
 endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 5a2ae4ef3f..46f3def893 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -42,6 +42,7 @@
 #include "qemu/cutils.h"
 #include "qemu/compiler.h"
 #include "qemu/units.h"
+#include "qemu/thread-context.h"
 
 #ifdef CONFIG_LINUX
 #include
diff --git a/util/thread-context.c b/util/thread-context.c
new file mode 100644
index 0000000000..dcd607c532
--- /dev/null
+++ b/util/thread-context.c
@@ -0,0 +1,279 @@
+/*
+ * QEMU Thread Context
+ *
+ * Copyright Red Hat Inc., 2022
+ *
+ * Authors:
+ *  David Hildenbrand
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/thread-context.h"
+#include "qapi/error.h"
+#include "qapi/qapi-builtin-visit.h"
+#include "qapi/visitor.h"
+#include "qemu/config-file.h"
+#include "qapi/qapi-builtin-visit.h"
+#include "qom/object_interfaces.h"
+#include "qemu/module.h"
+#include "qemu/bitmap.h"
+
+enum {
+    TC_CMD_NONE = 0,
+    TC_CMD_STOP,
+    TC_CMD_NEW,
+};
+
+typedef struct ThreadContextCmdNew {
+    QemuThread *thread;
+    const char *name;
+    void *(*start_routine)(void *);
+    void *arg;
+    int mode;
+} ThreadContextCmdNew;
+
+static void *thread_context_run(void *opaque)
+{
+    ThreadContext *tc = opaque;
+
+    tc->thread_id = qemu_get_thread_id();
+    qemu_sem_post(&tc->sem);
+
+    while (true) {
+        /*
+         * Threads inherit the CPU affinity of the creating thread. For this
+         * reason, we create new (especially short-lived) threads from our
+         * persistent context thread.
+         *
+         * Especially when QEMU is not allowed to set the affinity itself,
+         * management tools can simply set the affinity of the context thread
+         * after creating the context, to have new threads created via
+         * the context inherit the CPU affinity automatically.
+         */
+        switch (tc->thread_cmd) {
+        case TC_CMD_NONE:
+            break;
+        case TC_CMD_STOP:
+            tc->thread_cmd = TC_CMD_NONE;
+            qemu_sem_post(&tc->sem);
+            return NULL;
+        case TC_CMD_NEW: {
+            ThreadContextCmdNew *cmd_new = tc->thread_cmd_data;
+
+            qemu_thread_create(cmd_new->thread, cmd_new->name,
+                               cmd_new->start_routine, cmd_new->arg,
+                               cmd_new->mode);
+            tc->thread_cmd = TC_CMD_NONE;
+            tc->thread_cmd_data = NULL;
+            qemu_sem_post(&tc->sem);
+            break;
+        }
+        default:
+            g_assert_not_reached();
+        }
+        qemu_sem_wait(&tc->sem_thread);
+    }
+}
+
+static void thread_context_set_cpu_affinity(Object *obj, Visitor *v,
+                                            const char *name, void *opaque,
+                                            Error **errp)
+{
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+    uint16List *l, *host_cpus = NULL;
+    unsigned long *bitmap = NULL;
+    int nbits = 0, ret;
+    Error *err = NULL;
+
+    visit_type_uint16List(v, name, &host_cpus, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    if (!host_cpus) {
+        error_setg(errp, "CPU list is empty");
+        goto out;
+    }
+
+    for (l = host_cpus; l; l = l->next) {
+        nbits = MAX(nbits, l->value + 1);
+    }
+    bitmap = bitmap_new(nbits);
+    for (l = host_cpus; l; l = l->next) {
+        set_bit(l->value, bitmap);
+    }
+
+    if (tc->thread_id != -1) {
+        /*
+         * Note: we won't be adjusting the affinity of any thread that is still
+         * around, but only the affinity of the context thread.
+         */
+        ret = qemu_thread_set_affinity(&tc->thread, bitmap, nbits);
+        if (ret) {
+            error_setg(errp, "Setting CPU affinity failed: %s", strerror(ret));
+        }
+    } else {
+        tc->init_cpu_bitmap = bitmap;
+        bitmap = NULL;
+        tc->init_cpu_nbits = nbits;
+    }
+out:
+    g_free(bitmap);
+    qapi_free_uint16List(host_cpus);
+}
+
+static void thread_context_get_cpu_affinity(Object *obj, Visitor *v,
+                                            const char *name, void *opaque,
+                                            Error **errp)
+{
+    unsigned long *bitmap, nbits, value;
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+    uint16List *host_cpus = NULL;
+    uint16List **tail = &host_cpus;
+    int ret;
+
+    if (tc->thread_id == -1) {
+        error_setg(errp, "Object not initialized yet");
+        return;
+    }
+
+    ret = qemu_thread_get_affinity(&tc->thread, &bitmap, &nbits);
+    if (ret) {
+        error_setg(errp, "Getting CPU affinity failed: %s", strerror(ret));
+        return;
+    }
+
+    value = find_first_bit(bitmap, nbits);
+    while (value < nbits) {
+        QAPI_LIST_APPEND(tail, value);
+
+        value = find_next_bit(bitmap, nbits, value + 1);
+    }
+    g_free(bitmap);
+
+    visit_type_uint16List(v, name, &host_cpus, errp);
+    qapi_free_uint16List(host_cpus);
+}
+
+static void thread_context_get_thread_id(Object *obj, Visitor *v,
+                                         const char *name, void *opaque,
+                                         Error **errp)
+{
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+    uint64_t value = tc->thread_id;
+
+    visit_type_uint64(v, name, &value, errp);
+}
+
+static void thread_context_instance_complete(UserCreatable *uc, Error **errp)
+{
+    ThreadContext *tc = THREAD_CONTEXT(uc);
+    char *thread_name;
+    int ret;
+
+    thread_name = g_strdup_printf("TC %s",
+                               object_get_canonical_path_component(OBJECT(uc)));
+    qemu_thread_create(&tc->thread, thread_name, thread_context_run, tc,
+                       QEMU_THREAD_JOINABLE);
+    g_free(thread_name);
+
+    /* Wait until initialization of the thread is done. */
+    while (tc->thread_id == -1) {
+        qemu_sem_wait(&tc->sem);
+    }
+
+    if (tc->init_cpu_bitmap) {
+        ret = qemu_thread_set_affinity(&tc->thread, tc->init_cpu_bitmap,
+                                       tc->init_cpu_nbits);
+        if (ret) {
+            error_setg(errp, "Setting CPU affinity failed: %s", strerror(ret));
+        }
+        g_free(tc->init_cpu_bitmap);
+        tc->init_cpu_bitmap = NULL;
+    }
+}
+
+static void thread_context_class_init(ObjectClass *oc, void *data)
+{
+    UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
+
+    ucc->complete = thread_context_instance_complete;
+    object_class_property_add(oc, "thread-id", "int",
+                              thread_context_get_thread_id, NULL, NULL,
+                              NULL);
+    object_class_property_add(oc, "cpu-affinity", "int",
+                              thread_context_get_cpu_affinity,
+                              thread_context_set_cpu_affinity, NULL, NULL);
+}
+
+static void thread_context_instance_init(Object *obj)
+{
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+
+    tc->thread_id = -1;
+    qemu_sem_init(&tc->sem, 0);
+    qemu_sem_init(&tc->sem_thread, 0);
+    qemu_mutex_init(&tc->mutex);
+}
+
+static void thread_context_instance_finalize(Object *obj)
+{
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+
+    if (tc->thread_id != -1) {
+        tc->thread_cmd = TC_CMD_STOP;
+        qemu_sem_post(&tc->sem_thread);
+        qemu_thread_join(&tc->thread);
+    }
+    qemu_sem_destroy(&tc->sem);
+    qemu_sem_destroy(&tc->sem_thread);
+    qemu_mutex_destroy(&tc->mutex);
+}
+
+static const TypeInfo thread_context_info = {
+    .name = TYPE_THREAD_CONTEXT,
+    .parent = TYPE_OBJECT,
+    .class_init = thread_context_class_init,
+    .instance_size = sizeof(ThreadContext),
+    .instance_init = thread_context_instance_init,
+    .instance_finalize = thread_context_instance_finalize,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void thread_context_register_types(void)
+{
+    type_register_static(&thread_context_info);
+}
+type_init(thread_context_register_types)
+
+void thread_context_create_thread(ThreadContext *tc, QemuThread *thread,
+                                  const char *name,
+                                  void *(*start_routine)(void *), void *arg,
+                                  int mode)
+{
+    ThreadContextCmdNew data = {
+        .thread = thread,
+        .name = name,
+        .start_routine = start_routine,
+        .arg = arg,
+        .mode = mode,
+    };
+
+    qemu_mutex_lock(&tc->mutex);
+    tc->thread_cmd = TC_CMD_NEW;
+    tc->thread_cmd_data = &data;
+    qemu_sem_post(&tc->sem_thread);
+
+    while (tc->thread_cmd != TC_CMD_NONE) {
+        qemu_sem_wait(&tc->sem);
+    }
+    qemu_mutex_unlock(&tc->mutex);
+}

From patchwork Wed Sep 28 16:45:39 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 12992685
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, Michal Privoznik, Igor Mammedov, "Michael S. Tsirkin", Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, "Dr. David Alan Gilbert", Eric Blake, Markus Armbruster, Richard Henderson, Stefan Weil
Subject: [PATCH v1 4/7] util: Add write-only "node-affinity" property for ThreadContext
Date: Wed, 28 Sep 2022 18:45:39 +0200
Message-Id: <20220928164542.117952-5-david@redhat.com>
In-Reply-To: <20220928164542.117952-1-david@redhat.com>
References: <20220928164542.117952-1-david@redhat.com>

Let's make it easier to pin threads created via a ThreadContext to
all current CPUs belonging to given NUMA nodes.

As "node-affinity" is simply a shortcut for setting "cpu-affinity", that
property cannot be read, and if the CPUs for a node change due to CPU
hotplug, the CPU affinity will not get updated.
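As a rough illustration of what the shortcut computes: the CPUs of every requested node are OR-ed into one mask, nodes that do not exist are silently skipped, and an empty result is an error, as in thread_context_set_node_affinity() below. This standalone sketch assumes a hypothetical, hard-coded topology (example_node_cpus) and a 64-bit mask in place of libnuma's numa_node_to_cpus() and QEMU's bitmap helpers; all names in it are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical host topology, indexed by NUMA node: bit i set means CPU i
 * belongs to that node. QEMU instead asks libnuma (numa_node_to_cpus()).
 */
#define EXAMPLE_NUM_NODES 3

static const uint64_t example_node_cpus[EXAMPLE_NUM_NODES] = {
    0x0fULL,  /* node 0: CPUs 0-3 */
    0xf0ULL,  /* node 1: CPUs 4-7 */
    0x00ULL,  /* node 2: a node without CPUs */
};

/* Look up the CPUs of one node; fails for nodes that do not exist. */
static bool example_node_to_cpus(unsigned int node, uint64_t *cpus)
{
    if (node >= EXAMPLE_NUM_NODES) {
        return false;
    }
    *cpus = example_node_cpus[node];
    return true;
}

/*
 * Union of the CPUs of all requested nodes. Impossible nodes are skipped;
 * a result of 0 corresponds to the "The nodes select no CPUs" error.
 */
static uint64_t nodes_to_cpu_mask(const uint16_t *nodes, size_t n)
{
    uint64_t mask = 0;

    assert(n == 0 || nodes != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t cpus;

        if (!example_node_to_cpus(nodes[i], &cpus)) {
            continue;   /* ignore impossible nodes */
        }
        mask |= cpus;
    }
    return mask;
}
```

Because the mask is computed once when the property is set, later CPU hotplug on those nodes is not reflected, which is why the property is write-only.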
Reviewed-by: Michal Privoznik
Signed-off-by: David Hildenbrand
---
 qapi/qom.json         |  7 +++-
 util/meson.build      |  2 +-
 util/thread-context.c | 84 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 4775a333ed..d36bf3355f 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -838,10 +838,15 @@
 # @cpu-affinity: the CPU affinity for all threads created in the thread
 #                context (default: QEMU main thread affinity)
 #
+# @node-affinity: shortcut for looking up the current CPUs for the given nodes
+#                 and setting @cpu-affinity (default: QEMU main thread
+#                 affinity)
+#
 # Since: 7.2
 ##
 { 'struct': 'ThreadContextProperties',
-  'data': { '*cpu-affinity': ['uint16'] } }
+  'data': { '*cpu-affinity': ['uint16'],
+            '*node-affinity': ['uint16'] } }
 
 
 ##
diff --git a/util/meson.build b/util/meson.build
index e97cd2d779..c0a7bc54d4 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -1,5 +1,5 @@
 util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
-util_ss.add(files('thread-context.c'))
+util_ss.add(files('thread-context.c'), numa)
 if not config_host_data.get('CONFIG_ATOMIC64')
   util_ss.add(files('atomic64.c'))
 endif
diff --git a/util/thread-context.c b/util/thread-context.c
index dcd607c532..880f0441be 100644
--- a/util/thread-context.c
+++ b/util/thread-context.c
@@ -22,6 +22,10 @@
 #include "qemu/module.h"
 #include "qemu/bitmap.h"
 
+#ifdef CONFIG_NUMA
+#include 
+#endif
+
 enum {
     TC_CMD_NONE = 0,
     TC_CMD_STOP,
@@ -89,6 +93,11 @@ static void thread_context_set_cpu_affinity(Object *obj, Visitor *v,
     int nbits = 0, ret;
     Error *err = NULL;
 
+    if (tc->init_cpu_bitmap) {
+        error_setg(errp, "Mixing CPU and node affinity not supported");
+        return;
+    }
+
     visit_type_uint16List(v, name, &host_cpus, &err);
     if (err) {
         error_propagate(errp, err);
@@ -160,6 +169,79 @@ static void thread_context_get_cpu_affinity(Object *obj, Visitor *v,
     qapi_free_uint16List(host_cpus);
 }
 
+static void thread_context_set_node_affinity(Object *obj, Visitor *v,
+                                             const char *name, void *opaque,
+                                             Error **errp)
+{
+#ifdef CONFIG_NUMA
+    const int nbits = numa_num_possible_cpus();
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+    uint16List *l, *host_nodes = NULL;
+    unsigned long *bitmap = NULL;
+    struct bitmask *tmp_cpus;
+    Error *err = NULL;
+    int ret, i;
+
+    if (tc->init_cpu_bitmap) {
+        error_setg(errp, "Mixing CPU and node affinity not supported");
+        return;
+    }
+
+    visit_type_uint16List(v, name, &host_nodes, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    if (!host_nodes) {
+        error_setg(errp, "Node list is empty");
+        goto out;
+    }
+
+    bitmap = bitmap_new(nbits);
+    tmp_cpus = numa_allocate_cpumask();
+    for (l = host_nodes; l; l = l->next) {
+        numa_bitmask_clearall(tmp_cpus);
+        ret = numa_node_to_cpus(l->value, tmp_cpus);
+        if (ret) {
+            /* We ignore any errors, such as impossible nodes. */
+            continue;
+        }
+        for (i = 0; i < nbits; i++) {
+            if (numa_bitmask_isbitset(tmp_cpus, i)) {
+                set_bit(i, bitmap);
+            }
+        }
+    }
+    numa_free_cpumask(tmp_cpus);
+
+    if (bitmap_empty(bitmap, nbits)) {
+        error_setg(errp, "The nodes select no CPUs");
+        goto out;
+    }
+
+    if (tc->thread_id != -1) {
+        /*
+         * Note: we won't be adjusting the affinity of any thread that is still
+         * around for now, but only the affinity of the context thread.
+         */
+        ret = qemu_thread_set_affinity(&tc->thread, bitmap, nbits);
+        if (ret) {
+            error_setg(errp, "Setting CPU affinity failed: %s", strerror(ret));
+        }
+    } else {
+        tc->init_cpu_bitmap = bitmap;
+        bitmap = NULL;
+        tc->init_cpu_nbits = nbits;
+    }
+out:
+    g_free(bitmap);
+    qapi_free_uint16List(host_nodes);
+#else
+    error_setg(errp, "NUMA node affinity is not supported by this QEMU");
+#endif
+}
+
 static void thread_context_get_thread_id(Object *obj, Visitor *v,
                                          const char *name, void *opaque,
                                          Error **errp)
@@ -209,6 +291,8 @@ static void thread_context_class_init(ObjectClass *oc, void *data)
     object_class_property_add(oc, "cpu-affinity", "int",
                               thread_context_get_cpu_affinity,
                               thread_context_set_cpu_affinity, NULL, NULL);
+    object_class_property_add(oc, "node-affinity", "int", NULL,
+                              thread_context_set_node_affinity, NULL, NULL);
 }
 
 static void thread_context_instance_init(Object *obj)

From patchwork Wed Sep 28 16:45:40 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 12992666
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, Michal Privoznik, Igor Mammedov, "Michael S. Tsirkin", Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, "Dr. David Alan Gilbert", Eric Blake, Markus Armbruster, Richard Henderson, Stefan Weil
Subject: [PATCH v1 5/7] util: Make qemu_prealloc_mem() optionally consume a ThreadContext
Date: Wed, 28 Sep 2022 18:45:40 +0200
Message-Id: <20220928164542.117952-6-david@redhat.com>
In-Reply-To: <20220928164542.117952-1-david@redhat.com>
References: <20220928164542.117952-1-david@redhat.com>

... and implement it under POSIX. When a ThreadContext is provided,
create new threads via the context such that these new threads obtain a
properly configured CPU affinity.
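Creating the threads via the context is sufficient because, on Linux, a new thread inherits a copy of the creating thread's CPU affinity mask (see pthread_create(3)). The following standalone sketch demonstrates only that inheritance; it uses plain pthreads rather than QemuThread or the ThreadContext API, all helper names are hypothetical, and it assumes Linux with glibc, since pthread_setaffinity_np() is a GNU extension. A "context" thread pins itself to the first CPU it is allowed to run on, then creates a worker without passing any affinity, and the worker records the mask it starts with.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

struct inherit_demo {
    int pinned_cpu;   /* CPU the "context" thread pinned itself to */
    cpu_set_t seen;   /* affinity mask observed by the worker thread */
    int worker_err;   /* error from the worker's pthread_getaffinity_np() */
    int err;          /* first error seen by the context thread */
};

/* Worker: record the affinity mask it was started with. */
static void *demo_worker(void *opaque)
{
    struct inherit_demo *d = opaque;

    d->worker_err = pthread_getaffinity_np(pthread_self(), sizeof(d->seen),
                                           &d->seen);
    return NULL;
}

/* Context thread: pin itself to one allowed CPU, then create a worker. */
static void *demo_context(void *opaque)
{
    struct inherit_demo *d = opaque;
    cpu_set_t allowed, pinned;
    pthread_t worker;
    int cpu = 0;

    d->err = pthread_getaffinity_np(pthread_self(), sizeof(allowed), &allowed);
    if (d->err) {
        return NULL;
    }
    while (cpu < CPU_SETSIZE && !CPU_ISSET(cpu, &allowed)) {
        cpu++;
    }
    if (cpu == CPU_SETSIZE) {
        d->err = EINVAL;
        return NULL;
    }
    CPU_ZERO(&pinned);
    CPU_SET(cpu, &pinned);
    d->err = pthread_setaffinity_np(pthread_self(), sizeof(pinned), &pinned);
    if (d->err) {
        return NULL;
    }
    d->pinned_cpu = cpu;

    /* No affinity is passed here: the worker inherits the creator's mask. */
    d->err = pthread_create(&worker, NULL, demo_worker, d);
    if (d->err) {
        return NULL;
    }
    pthread_join(worker, NULL);
    d->err = d->worker_err;
    return NULL;
}

/* Run the demo; returns 0 on success or an errno-style error code. */
static int run_inherit_demo(struct inherit_demo *d)
{
    pthread_t context;
    int err;

    assert(d != NULL);
    d->pinned_cpu = -1;
    d->worker_err = 0;
    d->err = 0;
    CPU_ZERO(&d->seen);

    err = pthread_create(&context, NULL, demo_context, d);
    if (err) {
        return err;
    }
    pthread_join(context, NULL);
    return d->err;
}
```

After a successful run, d.seen contains exactly d.pinned_cpu. This mirrors why touch_all_pages() only has to route thread creation through thread_context_create_thread() rather than setting an affinity on each preallocation thread.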
Reviewed-by: Michal Privoznik
Signed-off-by: David Hildenbrand
---
 backends/hostmem.c     |  5 +++--
 hw/virtio/virtio-mem.c |  2 +-
 include/qemu/osdep.h   |  4 +++-
 util/oslib-posix.c     | 20 ++++++++++++++------
 util/oslib-win32.c     |  2 +-
 5 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 491cb10b97..76f0394490 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -232,7 +232,8 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
         void *ptr = memory_region_get_ram_ptr(&backend->mr);
         uint64_t sz = memory_region_size(&backend->mr);
 
-        qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, &local_err);
+        qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, NULL,
+                          &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
@@ -384,7 +385,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
          */
         if (backend->prealloc) {
             qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz,
-                              backend->prealloc_threads, &local_err);
+                              backend->prealloc_threads, NULL, &local_err);
             if (local_err) {
                 goto out;
             }
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 0e9ef4ff19..ed170def48 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -467,7 +467,7 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
             int fd = memory_region_get_fd(&vmem->memdev->mr);
             Error *local_err = NULL;
 
-            qemu_prealloc_mem(fd, area, size, 1, &local_err);
+            qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err);
             if (local_err) {
                 static bool warned;
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index e556e45143..625298c8bc 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -568,6 +568,8 @@ unsigned long qemu_getauxval(unsigned long type);
 
 void qemu_set_tty_echo(int fd, bool echo);
 
+typedef struct ThreadContext ThreadContext;
+
 /**
  * qemu_prealloc_mem:
  * @fd: the fd mapped into the area, -1 for anonymous memory
@@ -582,7 +584,7 @@ void qemu_set_tty_echo(int fd, bool echo);
  * after allocating file blocks for mapped files.
  */
 void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
-                       Error **errp);
+                       ThreadContext *tc, Error **errp);
 
 /**
  * qemu_get_pid_name:
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 46f3def893..22980a11b1 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -400,7 +400,8 @@ static inline int get_memset_num_threads(size_t hpagesize, size_t numpages,
 }
 
 static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
-                           int max_threads, bool use_madv_populate_write)
+                           int max_threads, ThreadContext *tc,
+                           bool use_madv_populate_write)
 {
     static gsize initialized = 0;
     MemsetContext context = {
@@ -439,9 +440,16 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
         context.threads[i].numpages = numpages_per_thread + (i < leftover);
         context.threads[i].hpagesize = hpagesize;
         context.threads[i].context = &context;
-        qemu_thread_create(&context.threads[i].pgthread, "touch_pages",
-                           touch_fn, &context.threads[i],
-                           QEMU_THREAD_JOINABLE);
+        if (tc) {
+            thread_context_create_thread(tc, &context.threads[i].pgthread,
+                                         "touch_pages",
+                                         touch_fn, &context.threads[i],
+                                         QEMU_THREAD_JOINABLE);
+        } else {
+            qemu_thread_create(&context.threads[i].pgthread, "touch_pages",
+                               touch_fn, &context.threads[i],
+                               QEMU_THREAD_JOINABLE);
+        }
         addr += context.threads[i].numpages * hpagesize;
     }
 
@@ -477,7 +485,7 @@ static bool madv_populate_write_possible(char *area, size_t pagesize)
 }
 
 void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
-                       Error **errp)
+                       ThreadContext *tc, Error **errp)
 {
     static gsize initialized;
     int ret;
@@ -518,7 +526,7 @@ void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
     }
 
     /* touch pages simultaneously */
-    ret = touch_all_pages(area, hpagesize, numpages, max_threads,
+    ret = touch_all_pages(area, hpagesize, numpages, max_threads, tc,
                           use_madv_populate_write);
     if (ret) {
         error_setg_errno(errp, -ret,
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index e1cb725ecc..a67cb3822e 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -269,7 +269,7 @@ int getpagesize(void)
 }
 
 void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
-                       Error **errp)
+                       ThreadContext *tc, Error **errp)
 {
     int i;
     size_t pagesize = qemu_real_host_page_size();

From patchwork Wed Sep 28 16:45:41 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 12992615
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, Michal Privoznik, Igor Mammedov, "Michael S. Tsirkin", Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, "Dr. David Alan Gilbert", Eric Blake, Markus Armbruster, Richard Henderson, Stefan Weil
Subject: [PATCH v1 6/7] hostmem: Allow for specifying a ThreadContext for preallocation
Date: Wed, 28 Sep 2022 18:45:41 +0200
Message-Id: <20220928164542.117952-7-david@redhat.com>
In-Reply-To: <20220928164542.117952-1-david@redhat.com>
References: <20220928164542.117952-1-david@redhat.com>

Let's allow for specifying a thread context via the "prealloc-context"
property. When set, preallocation threads will be created via the thread
context, inheriting the same CPU affinity as the thread context.

Pinning preallocation threads to CPUs can heavily increase performance in
NUMA setups, because preallocation from a CPU close to the target NUMA
node(s) is faster than preallocation from a CPU further away, simply
because of the memory bandwidth available for initializing memory with
zeroes. This is especially relevant for very large VMs backed by
huge/gigantic pages, whereby preallocation is mandatory.
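For reference, touch_all_pages() gives each preallocation thread one contiguous run of pages, and the first leftover threads take one extra page (numpages_per_thread + (i < leftover) in the diff of patch 5/7). A "prealloc-context" only changes which CPUs those threads run on, not how the work is divided. The sketch below reproduces that split in isolation; split_pages() and struct page_range are hypothetical names, and computing numpages_per_thread and leftover as quotient and remainder is an assumption, since that part of touch_all_pages() is not shown in the diff.

```c
#include <assert.h>
#include <stddef.h>

struct page_range {
    size_t first_page;   /* index of the first page this thread touches */
    size_t numpages;     /* number of pages this thread touches */
};

/*
 * Split numpages into num_threads contiguous ranges. The first "leftover"
 * threads each take one extra page so that every page is covered once.
 */
static void split_pages(size_t numpages, size_t num_threads,
                        struct page_range *ranges)
{
    size_t numpages_per_thread, leftover, next = 0;

    assert(num_threads > 0);
    numpages_per_thread = numpages / num_threads;
    leftover = numpages % num_threads;
    for (size_t i = 0; i < num_threads; i++) {
        ranges[i].first_page = next;
        ranges[i].numpages = numpages_per_thread + (i < leftover);
        next += ranges[i].numpages;
    }
}
```

For example, 10 pages over 3 threads yields ranges starting at pages 0, 4 and 7 with 4, 3 and 3 pages respectively.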
Reviewed-by: Michal Privoznik
Signed-off-by: David Hildenbrand
---
 backends/hostmem.c       | 12 +++++++++---
 include/sysemu/hostmem.h |  2 ++
 qapi/qom.json            |  4 ++++
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 76f0394490..8640294c10 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -232,8 +232,8 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
         void *ptr = memory_region_get_ram_ptr(&backend->mr);
         uint64_t sz = memory_region_size(&backend->mr);
 
-        qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, NULL,
-                          &local_err);
+        qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
+                          backend->prealloc_context, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
@@ -385,7 +385,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
          */
         if (backend->prealloc) {
             qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz,
-                              backend->prealloc_threads, NULL, &local_err);
+                              backend->prealloc_threads,
+                              backend->prealloc_context, &local_err);
             if (local_err) {
                 goto out;
             }
@@ -493,6 +494,11 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
         NULL, NULL);
     object_class_property_set_description(oc, "prealloc-threads",
         "Number of CPU threads to use for prealloc");
+    object_class_property_add_link(oc, "prealloc-context",
+        TYPE_THREAD_CONTEXT, offsetof(HostMemoryBackend, prealloc_context),
+        object_property_allow_set_link, OBJ_PROP_LINK_STRONG);
+    object_class_property_set_description(oc, "prealloc-context",
+        "Context to use for creating CPU threads for preallocation");
     object_class_property_add(oc, "size", "int",
         host_memory_backend_get_size,
         host_memory_backend_set_size,
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 9ff5c16963..39326f1d4f 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -18,6 +18,7 @@
 #include "qom/object.h"
 #include "exec/memory.h"
 #include "qemu/bitmap.h"
+#include "qemu/thread-context.h"
 
 #define TYPE_MEMORY_BACKEND "memory-backend"
 OBJECT_DECLARE_TYPE(HostMemoryBackend, HostMemoryBackendClass,
@@ -66,6 +67,7 @@ struct HostMemoryBackend {
     bool merge, dump, use_canonical_path;
     bool prealloc, is_mapped, share, reserve;
     uint32_t prealloc_threads;
+    ThreadContext *prealloc_context;
     DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
     HostMemPolicy policy;
 
diff --git a/qapi/qom.json b/qapi/qom.json
index d36bf3355f..9caa1a60e3 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -578,6 +578,9 @@
 #
 # @prealloc-threads: number of CPU threads to use for prealloc (default: 1)
 #
+# @prealloc-context: context to use for creation of preallocation threads
+#                    (default: none) (since 7.2)
+#
 # @share: if false, the memory is private to QEMU; if true, it is shared
 #         (default: false)
 #
@@ -608,6 +611,7 @@
             '*policy': 'HostMemPolicy',
             '*prealloc': 'bool',
             '*prealloc-threads': 'uint32',
+            '*prealloc-context': 'str',
             '*share': 'bool',
             '*reserve': 'bool',
             'size': 'size',

From patchwork Wed Sep 28 16:45:42 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 12992633
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, Michal Privoznik, Igor Mammedov, "Michael S. Tsirkin", Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, "Dr. David Alan Gilbert", Eric Blake, Markus Armbruster, Richard Henderson, Stefan Weil
Subject: [PATCH v1 7/7] vl: Allow ThreadContext objects to be created before the sandbox option
Date: Wed, 28 Sep 2022 18:45:42 +0200
Message-Id: <20220928164542.117952-8-david@redhat.com>
In-Reply-To: <20220928164542.117952-1-david@redhat.com>
References: <20220928164542.117952-1-david@redhat.com>

Currently, there is no way to configure a CPU affinity inside QEMU when
the sandbox option disables it for QEMU as a whole, for example, via:
    -sandbox enable=on,resourcecontrol=deny

While ThreadContext objects can be created on the QEMU command line and
the CPU affinity can be configured externally via the thread-id, this is
insufficient if a ThreadContext with a certain CPU affinity is already
required during QEMU startup, before we can intercept QEMU and
configure the CPU affinity.

Blocking sched_setaffinity() was introduced in 24f8cdc57224 ("seccomp:
add resourcecontrol argument to command line"), "to avoid any bigger of the
process". However, we only care about blocking it once QEMU is running,
not when the instance starting QEMU explicitly requests a certain CPU
affinity on the QEMU command line.
Right now, for NUMA-aware preallocation of memory backends used for initial
machine RAM, one has to:

1) Start QEMU with the memory-backend with "prealloc=off"
2) Pause QEMU before it starts the guest (-S)
3) Create ThreadContext, configure the CPU affinity using the thread-id
4) Configure the ThreadContext as "prealloc-context" of the memory
   backend
5) Trigger preallocation by setting "prealloc=on"

To simplify this handling, especially for initial machine RAM, allow
creation of ThreadContext objects before parsing sandbox options, such
that the CPU affinity requested on the QEMU command line alongside the
sandbox option can be set. As ThreadContext objects essentially only
create a persistent context thread and set the CPU affinity, this is
easily possible.

To make "-name debug-threads=on" keep working as expected for the context
threads, perform earlier parsing of "-name".

With this change, we can create a ThreadContext with a CPU affinity on
the QEMU command line and use it for preallocation of memory backends
glued to the machine (simplified example):
    qemu-system-x86_64 -m 1G \
     -object thread-context,id=tc1,cpu-affinity=3-4 \
     -object memory-backend-ram,id=pc.ram,size=1G,prealloc=on,prealloc-threads=2,prealloc-context=tc1 \
     -machine memory-backend=pc.ram \
     -S -monitor stdio -sandbox enable=on,resourcecontrol=deny

And while we can query the current CPU affinity:

    (qemu) qom-get tc1 cpu-affinity
    [
        3,
        4
    ]

We can no longer change it from QEMU directly:

    (qemu) qom-set tc1 cpu-affinity 1-2
    Error: Setting CPU affinity failed: Operation not permitted

Reviewed-by: Michal Privoznik
Signed-off-by: David Hildenbrand
---
 softmmu/vl.c | 36 ++++++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index 9abadcc150..27488e32d0 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1761,6 +1761,27 @@ static void object_option_parse(const char *optarg)
     visit_free(v);
 }
 
+/*
+ * Very early object creation, before the sandbox options have been activated.
+ */
+static bool object_create_pre_sandbox(const char *type)
+{
+    /*
+     * Objects should in general not get initialized "too early" without
+     * a reason. If you add one, state the reason in a comment!
+     */
+
+    /*
+     * Reason: -sandbox on,resourcecontrol=deny disallows setting CPU
+     * affinity of threads.
+     */
+    if (g_str_equal(type, "thread-context")) {
+        return true;
+    }
+
+    return false;
+}
+
 /*
  * Initial object creation happens before all other
  * QEMU data types are created. The majority of objects
@@ -1775,6 +1796,11 @@ static bool object_create_early(const char *type)
      * add one, state the reason in a comment!
      */
 
+    /* Reason: already created. */
+    if (object_create_pre_sandbox(type)) {
+        return false;
+    }
+
     /* Reason: property "chardev" */
     if (g_str_equal(type, "rng-egd") ||
         g_str_equal(type, "qtest")) {
@@ -1897,7 +1923,7 @@ static void qemu_create_early_backends(void)
  */
 static bool object_create_late(const char *type)
 {
-    return !object_create_early(type);
+    return !object_create_early(type) && !object_create_pre_sandbox(type);
 }
 
 static void qemu_create_late_backends(void)
@@ -2359,6 +2385,11 @@ static int process_runstate_actions(void *opaque, QemuOpts *opts, Error **errp)
 
 static void qemu_process_early_options(void)
 {
+    qemu_opts_foreach(qemu_find_opts("name"),
+                      parse_name, NULL, &error_fatal);
+
+    object_option_foreach_add(object_create_pre_sandbox);
+
 #ifdef CONFIG_SECCOMP
     QemuOptsList *olist = qemu_find_opts_err("sandbox", NULL);
     if (olist) {
@@ -2366,9 +2397,6 @@ static void qemu_process_early_options(void)
     }
 #endif
 
-    qemu_opts_foreach(qemu_find_opts("name"),
-                      parse_name, NULL, &error_fatal);
-
     if (qemu_opts_foreach(qemu_find_opts("action"),
                           process_runstate_actions, NULL, &error_fatal)) {
         exit(1);
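The patch splits object creation into three phases: pre-sandbox, early and late. The dependency-free C sketch below shows why object_create_late() needs the extra !object_create_pre_sandbox() check. It is not the QEMU code itself. Plain strcmp() replaces GLib's g_str_equal(), and the names create_pre_sandbox(), create_early(), create_late() and creation_count() are hypothetical. Only the "rng-egd"/"qtest" entries of the real early-creation list are modeled. Every object type should be picked up by exactly one phase.

```c
#include <stdbool.h>
#include <string.h>

/* strcmp()-based stand-in for GLib's g_str_equal(). */
static bool type_is(const char *type, const char *name)
{
    return strcmp(type, name) == 0;
}

/* Phase 1: created before -sandbox is applied. */
static bool create_pre_sandbox(const char *type)
{
    /* Reason: resourcecontrol=deny blocks setting a CPU affinity later. */
    return type_is(type, "thread-context");
}

/* Phase 2: created early, before most other QEMU data types. */
static bool create_early(const char *type)
{
    /* Reason: already created in phase 1. */
    if (create_pre_sandbox(type)) {
        return false;
    }
    /* Reason: property "chardev" (illustrative subset of the real list). */
    if (type_is(type, "rng-egd") || type_is(type, "qtest")) {
        return false;
    }
    return true;
}

/* Phase 3: everything not created in phase 1 or 2. */
static bool create_late(const char *type)
{
    /*
     * create_early() returns false for pre-sandbox types, so without the
     * second check those types would be created a second time here.
     */
    return !create_early(type) && !create_pre_sandbox(type);
}

/* Number of phases that would create an object of @type; should be 1. */
static int creation_count(const char *type)
{
    return create_pre_sandbox(type) + create_early(type) + create_late(type);
}
```

In this sketch, "thread-context" is created only in the pre-sandbox phase and "rng-egd" only late. Any unlisted type, such as "iothread", is created early. Dropping the second condition from create_late() would make creation_count("thread-context") return 2.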