From patchwork Tue Aug 27 17:54:20 2024
X-Patchwork-Submitter: "Maciej S. Szmigiero"
X-Patchwork-Id: 13779930
From: "Maciej S. Szmigiero"
To: Peter Xu, Fabiano Rosas
Cc: Alex Williamson, Cédric Le Goater, Eric Blake, Markus Armbruster,
 Daniel P. Berrangé, Avihai Horon, Joao Martins, qemu-devel@nongnu.org
Subject: [PATCH v2 01/17] vfio/migration: Add save_{iterate, complete_precopy}_started trace events
Date: Tue, 27 Aug 2024 19:54:20 +0200
Message-ID: <3c43bf662842e579c0009dfc5135024e45166987.1724701542.git.maciej.szmigiero@oracle.com>

From: "Maciej S. Szmigiero"

This way, both the start and end points of migrating a particular VFIO
device are known.

Also add a vfio_save_iterate_empty_hit trace event so it is known when
there is no more data to send for that device.

Signed-off-by: Maciej S.
Szmigiero --- hw/vfio/migration.c | 13 +++++++++++++ hw/vfio/trace-events | 3 +++ include/hw/vfio/vfio-common.h | 3 +++ 3 files changed, 19 insertions(+) diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 262d42a46e58..24679d8c5034 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -472,6 +472,9 @@ static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp) return -ENOMEM; } + migration->save_iterate_run = false; + migration->save_iterate_empty_hit = false; + if (vfio_precopy_supported(vbasedev)) { switch (migration->device_state) { case VFIO_DEVICE_STATE_RUNNING: @@ -605,9 +608,17 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque) VFIOMigration *migration = vbasedev->migration; ssize_t data_size; + if (!migration->save_iterate_run) { + trace_vfio_save_iterate_started(vbasedev->name); + migration->save_iterate_run = true; + } + data_size = vfio_save_block(f, migration); if (data_size < 0) { return data_size; + } else if (data_size == 0 && !migration->save_iterate_empty_hit) { + trace_vfio_save_iterate_empty_hit(vbasedev->name); + migration->save_iterate_empty_hit = true; } vfio_update_estimated_pending_data(migration, data_size); @@ -633,6 +644,8 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) int ret; Error *local_err = NULL; + trace_vfio_save_complete_precopy_started(vbasedev->name); + /* We reach here with device state STOP or STOP_COPY only */ ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY, VFIO_DEVICE_STATE_STOP, &local_err); diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index 98bd4dcceadc..013c602f30fa 100644 --- a/hw/vfio/trace-events +++ b/hw/vfio/trace-events @@ -159,8 +159,11 @@ vfio_migration_state_notifier(const char *name, int state) " (%s) state %d" vfio_save_block(const char *name, int data_size) " (%s) data_size %d" vfio_save_cleanup(const char *name) " (%s)" vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d" +vfio_save_complete_precopy_started(const char *name) " (%s)" vfio_save_device_config_state(const char *name) " (%s)" vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64 +vfio_save_iterate_started(const char *name) " (%s)" +vfio_save_iterate_empty_hit(const char *name) " (%s)" vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64 vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64 vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64 diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index fed499b199f0..32d58e3e025b 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -73,6 +73,9 @@ typedef struct VFIOMigration { uint64_t precopy_init_size; uint64_t precopy_dirty_size; bool initial_data_sent; + + bool save_iterate_run; + bool save_iterate_empty_hit; } VFIOMigration; struct VFIOGroup; From patchwork Tue Aug 27 17:54:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13779928 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2F7B5C54731 for ; Tue, 27 Aug 2024 17:55:58 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0Q0-0005T1-Ot; Tue, 27 Aug 2024 13:55:48 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Pv-00059H-JH for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:43 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Ph-0001Yt-U4 for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:43 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0PH-0002La-Fn; Tue, 27 Aug 2024 19:55:03 +0200 From: "Maciej S. Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 02/17] migration/ram: Add load start trace event Date: Tue, 27 Aug 2024 19:54:21 +0200 Message-ID: <3e6af826a953ac88765064620d92b3797ea743f9.1724701542.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" There's a RAM load complete trace event but there wasn't its start equivalent. Signed-off-by: Maciej S. 
Szmigiero --- migration/ram.c | 1 + migration/trace-events | 1 + 2 files changed, 2 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 67ca3d5d51a1..7997bd830b9c 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4127,6 +4127,7 @@ static int ram_load_precopy(QEMUFile *f) RAM_SAVE_FLAG_ZERO); } + trace_ram_load_start(); while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) { ram_addr_t addr; void *host = NULL, *host_bak = NULL; diff --git a/migration/trace-events b/migration/trace-events index c65902f042bd..2a99a7baaea6 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -115,6 +115,7 @@ colo_flush_ram_cache_end(void) "" save_xbzrle_page_skipping(void) "" save_xbzrle_page_overflow(void) "" ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations" +ram_load_start(void) "" ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64 ram_write_tracking_ramblock_start(const char *block_id, size_t page_size, void *addr, size_t length) "%s: page_size: %zu addr: %p length: %zu" ram_write_tracking_ramblock_stop(const char *block_id, size_t page_size, void *addr, size_t length) "%s: page_size: %zu addr: %p length: %zu" From patchwork Tue Aug 27 17:54:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13779949 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5FDF1C5473A for ; Tue, 27 Aug 2024 17:58:10 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0Q2-0005ab-47; Tue, 27 Aug 2024 13:55:50 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Pv-00059N-KC for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:43 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Ph-0001Yk-Gt for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:42 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0PM-0002Lk-Ve; Tue, 27 Aug 2024 19:55:09 +0200 From: "Maciej S. 
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 03/17] migration/multifd: Zero p->flags before starting filling a packet Date: Tue, 27 Aug 2024 19:54:22 +0200 Message-ID: <6230f70a9d11f5b5efd2811ebdebc722318c6b15.1724701542.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" This way there aren't stale flags there. p->flags can't contain SYNC to be sent at the next RAM packet since syncs are now handled separately in multifd_send_thread. Signed-off-by: Maciej S. Szmigiero Reviewed-by: Fabiano Rosas Reviewed-by: Peter Xu --- migration/multifd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/migration/multifd.c b/migration/multifd.c index 0c07a2040ba8..b06a9fab500e 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -601,6 +601,7 @@ static void *multifd_send_thread(void *opaque) * qatomic_store_release() in multifd_send(). */ if (qatomic_load_acquire(&p->pending_job)) { + p->flags = 0; p->iovs_num = 0; assert(!multifd_payload_empty(p->data)); @@ -652,7 +653,6 @@ static void *multifd_send_thread(void *opaque) } /* p->next_packet_size will always be zero for a SYNC packet */ stat64_add(&mig_stats.multifd_bytes, p->packet_len); - p->flags = 0; } qatomic_set(&p->pending_sync, false); From patchwork Tue Aug 27 17:54:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. 
Szmigiero" X-Patchwork-Id: 13779929 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3474DC54734 for ; Tue, 27 Aug 2024 17:55:58 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0Pz-0005JV-GT; Tue, 27 Aug 2024 13:55:47 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Pv-00059R-KM for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:43 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Ph-0001ZH-UZ for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:43 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0PS-0002Lw-F1; Tue, 27 Aug 2024 19:55:14 +0200 From: "Maciej S. Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 04/17] thread-pool: Add a DestroyNotify parameter to thread_pool_submit{, _aio)() Date: Tue, 27 Aug 2024 19:54:23 +0200 Message-ID: <7e14c1ea1ca1797ac5658033398f2bde22c6a364.1724701542.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" Automatic memory management is less prone to mistakes or confusion who is responsible for freeing the memory backing the "arg" parameter or dropping a strong reference to object pointed by this parameter. Signed-off-by: Maciej S. 
Szmigiero --- backends/tpm/tpm_backend.c | 2 +- block/file-win32.c | 2 +- hw/9pfs/coth.c | 3 ++- hw/ppc/spapr_nvdimm.c | 4 ++-- hw/virtio/virtio-pmem.c | 2 +- include/block/thread-pool.h | 6 ++++-- tests/unit/test-thread-pool.c | 8 ++++---- util/thread-pool.c | 16 ++++++++++++---- 8 files changed, 27 insertions(+), 16 deletions(-) diff --git a/backends/tpm/tpm_backend.c b/backends/tpm/tpm_backend.c index 485a20b9e09f..65ef961b59ae 100644 --- a/backends/tpm/tpm_backend.c +++ b/backends/tpm/tpm_backend.c @@ -107,7 +107,7 @@ void tpm_backend_deliver_request(TPMBackend *s, TPMBackendCmd *cmd) s->cmd = cmd; object_ref(OBJECT(s)); - thread_pool_submit_aio(tpm_backend_worker_thread, s, + thread_pool_submit_aio(tpm_backend_worker_thread, s, NULL, tpm_backend_request_completed, s); } diff --git a/block/file-win32.c b/block/file-win32.c index 7e1baa1ece6a..9b99ae2f89e1 100644 --- a/block/file-win32.c +++ b/block/file-win32.c @@ -167,7 +167,7 @@ static BlockAIOCB *paio_submit(BlockDriverState *bs, HANDLE hfile, acb->aio_offset = offset; trace_file_paio_submit(acb, opaque, offset, count, type); - return thread_pool_submit_aio(aio_worker, acb, cb, opaque); + return thread_pool_submit_aio(aio_worker, acb, NULL, cb, opaque); } int qemu_ftruncate64(int fd, int64_t length) diff --git a/hw/9pfs/coth.c b/hw/9pfs/coth.c index 598f46add993..fe5bfa6920fe 100644 --- a/hw/9pfs/coth.c +++ b/hw/9pfs/coth.c @@ -41,5 +41,6 @@ static int coroutine_enter_func(void *arg) void co_run_in_worker_bh(void *opaque) { Coroutine *co = opaque; - thread_pool_submit_aio(coroutine_enter_func, co, coroutine_enter_cb, co); + thread_pool_submit_aio(coroutine_enter_func, co, NULL, + coroutine_enter_cb, co); } diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c index 7d2dfe5e3d2f..f9ee45935d1d 100644 --- a/hw/ppc/spapr_nvdimm.c +++ b/hw/ppc/spapr_nvdimm.c @@ -517,7 +517,7 @@ static int spapr_nvdimm_flush_post_load(void *opaque, int version_id) } QLIST_FOREACH(state, &s_nvdimm->pending_nvdimm_flush_states, node) { - thread_pool_submit_aio(flush_worker_cb, state, + thread_pool_submit_aio(flush_worker_cb, state, NULL, spapr_nvdimm_flush_completion_cb, state); } @@ -698,7 +698,7 @@ static target_ulong h_scm_flush(PowerPCCPU *cpu, SpaprMachineState *spapr, state->drcidx = drc_index; - thread_pool_submit_aio(flush_worker_cb, state, + thread_pool_submit_aio(flush_worker_cb, state, NULL, spapr_nvdimm_flush_completion_cb, state); continue_token = state->continue_token; diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c index c3512c2dae3f..f1331c03f474 100644 --- a/hw/virtio/virtio-pmem.c +++ b/hw/virtio/virtio-pmem.c @@ -87,7 +87,7 @@ static void virtio_pmem_flush(VirtIODevice *vdev, VirtQueue *vq) req_data->fd = memory_region_get_fd(&backend->mr); req_data->pmem = pmem; req_data->vdev = vdev; - thread_pool_submit_aio(worker_cb, req_data, done_cb, req_data); + thread_pool_submit_aio(worker_cb, req_data, NULL, done_cb, req_data); } static void virtio_pmem_get_config(VirtIODevice *vdev, uint8_t *config) diff --git a/include/block/thread-pool.h b/include/block/thread-pool.h index 948ff5f30c31..b484c4780ea6 100644 --- a/include/block/thread-pool.h +++ b/include/block/thread-pool.h @@ -33,10 +33,12 @@ void thread_pool_free(ThreadPool *pool); * thread_pool_submit* API: submit I/O requests in the thread's * current AioContext. 
*/ -BlockAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func, void *arg, +BlockAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func, + void *arg, GDestroyNotify arg_destroy, BlockCompletionFunc *cb, void *opaque); int coroutine_fn thread_pool_submit_co(ThreadPoolFunc *func, void *arg); -void thread_pool_submit(ThreadPoolFunc *func, void *arg); +void thread_pool_submit(ThreadPoolFunc *func, + void *arg, GDestroyNotify arg_destroy); void thread_pool_update_params(ThreadPool *pool, struct AioContext *ctx); diff --git a/tests/unit/test-thread-pool.c b/tests/unit/test-thread-pool.c index 1483e53473db..e4afb9e36292 100644 --- a/tests/unit/test-thread-pool.c +++ b/tests/unit/test-thread-pool.c @@ -46,7 +46,7 @@ static void done_cb(void *opaque, int ret) static void test_submit(void) { WorkerTestData data = { .n = 0 }; - thread_pool_submit(worker_cb, &data); + thread_pool_submit(worker_cb, &data, NULL); while (data.n == 0) { aio_poll(ctx, true); } @@ -56,7 +56,7 @@ static void test_submit(void) static void test_submit_aio(void) { WorkerTestData data = { .n = 0, .ret = -EINPROGRESS }; - data.aiocb = thread_pool_submit_aio(worker_cb, &data, + data.aiocb = thread_pool_submit_aio(worker_cb, &data, NULL, done_cb, &data); /* The callbacks are not called until after the first wait. */ @@ -121,7 +121,7 @@ static void test_submit_many(void) for (i = 0; i < 100; i++) { data[i].n = 0; data[i].ret = -EINPROGRESS; - thread_pool_submit_aio(worker_cb, &data[i], done_cb, &data[i]); + thread_pool_submit_aio(worker_cb, &data[i], NULL, done_cb, &data[i]); } active = 100; @@ -149,7 +149,7 @@ static void do_test_cancel(bool sync) for (i = 0; i < 100; i++) { data[i].n = 0; data[i].ret = -EINPROGRESS; - data[i].aiocb = thread_pool_submit_aio(long_cb, &data[i], + data[i].aiocb = thread_pool_submit_aio(long_cb, &data[i], NULL, done_cb, &data[i]); } diff --git a/util/thread-pool.c b/util/thread-pool.c index 27eb777e855b..69a87ee79252 100644 --- a/util/thread-pool.c +++ b/util/thread-pool.c @@ -38,6 +38,7 @@ struct ThreadPoolElement { ThreadPool *pool; ThreadPoolFunc *func; void *arg; + GDestroyNotify arg_destroy; /* Moving state out of THREAD_QUEUED is protected by lock. After * that, only the worker thread can write to it. Reads and writes @@ -188,6 +189,10 @@ restart: elem->ret); QLIST_REMOVE(elem, all); + if (elem->arg_destroy) { + elem->arg_destroy(elem->arg); + } + if (elem->common.cb) { /* Read state before ret. 
*/ smp_rmb(); @@ -238,7 +243,8 @@ static const AIOCBInfo thread_pool_aiocb_info = { .cancel_async = thread_pool_cancel, }; -BlockAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func, void *arg, +BlockAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func, + void *arg, GDestroyNotify arg_destroy, BlockCompletionFunc *cb, void *opaque) { ThreadPoolElement *req; @@ -251,6 +257,7 @@ BlockAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func, void *arg, req = qemu_aio_get(&thread_pool_aiocb_info, NULL, cb, opaque); req->func = func; req->arg = arg; + req->arg_destroy = arg_destroy; req->state = THREAD_QUEUED; req->pool = pool; @@ -285,14 +292,15 @@ int coroutine_fn thread_pool_submit_co(ThreadPoolFunc *func, void *arg) { ThreadPoolCo tpc = { .co = qemu_coroutine_self(), .ret = -EINPROGRESS }; assert(qemu_in_coroutine()); - thread_pool_submit_aio(func, arg, thread_pool_co_cb, &tpc); + thread_pool_submit_aio(func, arg, NULL, thread_pool_co_cb, &tpc); qemu_coroutine_yield(); return tpc.ret; } -void thread_pool_submit(ThreadPoolFunc *func, void *arg) +void thread_pool_submit(ThreadPoolFunc *func, + void *arg, GDestroyNotify arg_destroy) { - thread_pool_submit_aio(func, arg, NULL, NULL); + thread_pool_submit_aio(func, arg, arg_destroy, NULL, NULL); } void thread_pool_update_params(ThreadPool *pool, AioContext *ctx) From patchwork Tue Aug 27 17:54:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13779952 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C13D5C54731 for ; Tue, 27 Aug 2024 17:58:26 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0Q5-0005ns-EN; Tue, 27 Aug 2024 13:55:53 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Px-0005FF-20 for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:45 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Pl-0001Zc-3H for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:44 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0PX-0002MW-Nt; Tue, 27 Aug 2024 19:55:19 +0200 From: "Maciej S. 
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 05/17] thread-pool: Implement non-AIO (generic) pool support Date: Tue, 27 Aug 2024 19:54:24 +0200 Message-ID: <54947c3a1df713f5b69d8296938f3da41116ffe0.1724701542.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" Migration code wants to manage device data sending threads in one place. QEMU has an existing thread pool implementation, however it was limited to queuing AIO operations only and essentially had a 1:1 mapping between the current AioContext and the ThreadPool in use. Implement what is necessary to queue generic (non-AIO) work on a ThreadPool too. This brings a few new operations on a pool: * thread_pool_set_minmax_threads() explicitly sets the minimum and maximum thread count in the pool. * thread_pool_join() operation waits until all the submitted work requests have finished. * thread_pool_poll() lets the new thread and / or thread completion bottom halves run (if they are indeed scheduled to be run). It is useful for thread pool users that need to launch or terminate new threads without returning to the QEMU main loop. Signed-off-by: Maciej S. 
Szmigiero --- include/block/thread-pool.h | 10 ++++- tests/unit/test-thread-pool.c | 2 +- util/thread-pool.c | 77 ++++++++++++++++++++++++++++++----- 3 files changed, 76 insertions(+), 13 deletions(-) diff --git a/include/block/thread-pool.h b/include/block/thread-pool.h index b484c4780ea6..1769496056cd 100644 --- a/include/block/thread-pool.h +++ b/include/block/thread-pool.h @@ -37,9 +37,15 @@ BlockAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func, void *arg, GDestroyNotify arg_destroy, BlockCompletionFunc *cb, void *opaque); int coroutine_fn thread_pool_submit_co(ThreadPoolFunc *func, void *arg); -void thread_pool_submit(ThreadPoolFunc *func, - void *arg, GDestroyNotify arg_destroy); +BlockAIOCB *thread_pool_submit(ThreadPool *pool, ThreadPoolFunc *func, + void *arg, GDestroyNotify arg_destroy, + BlockCompletionFunc *cb, void *opaque); +void thread_pool_join(ThreadPool *pool); +void thread_pool_poll(ThreadPool *pool); + +void thread_pool_set_minmax_threads(ThreadPool *pool, + int min_threads, int max_threads); void thread_pool_update_params(ThreadPool *pool, struct AioContext *ctx); #endif diff --git a/tests/unit/test-thread-pool.c b/tests/unit/test-thread-pool.c index e4afb9e36292..469c0f7057b6 100644 --- a/tests/unit/test-thread-pool.c +++ b/tests/unit/test-thread-pool.c @@ -46,7 +46,7 @@ static void done_cb(void *opaque, int ret) static void test_submit(void) { WorkerTestData data = { .n = 0 }; - thread_pool_submit(worker_cb, &data, NULL); + thread_pool_submit(NULL, worker_cb, &data, NULL, NULL, NULL); while (data.n == 0) { aio_poll(ctx, true); } diff --git a/util/thread-pool.c b/util/thread-pool.c index 69a87ee79252..2bf3be875a51 100644 --- a/util/thread-pool.c +++ b/util/thread-pool.c @@ -60,6 +60,7 @@ struct ThreadPool { QemuMutex lock; QemuCond worker_stopped; QemuCond request_cond; + QemuCond no_requests_cond; QEMUBH *new_thread_bh; /* The following variables are only accessed from one AioContext. 
*/ @@ -73,6 +74,7 @@ struct ThreadPool { int pending_threads; /* threads created but not running yet */ int min_threads; int max_threads; + size_t requests_executing; }; static void *worker_thread(void *opaque) @@ -107,6 +109,10 @@ static void *worker_thread(void *opaque) req = QTAILQ_FIRST(&pool->request_list); QTAILQ_REMOVE(&pool->request_list, req, reqs); req->state = THREAD_ACTIVE; + + assert(pool->requests_executing < SIZE_MAX); + pool->requests_executing++; + qemu_mutex_unlock(&pool->lock); ret = req->func(req->arg); @@ -118,6 +124,14 @@ static void *worker_thread(void *opaque) qemu_bh_schedule(pool->completion_bh); qemu_mutex_lock(&pool->lock); + + assert(pool->requests_executing > 0); + pool->requests_executing--; + + if (pool->requests_executing == 0 && + QTAILQ_EMPTY(&pool->request_list)) { + qemu_cond_signal(&pool->no_requests_cond); + } } pool->cur_threads--; @@ -243,13 +257,16 @@ static const AIOCBInfo thread_pool_aiocb_info = { .cancel_async = thread_pool_cancel, }; -BlockAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func, - void *arg, GDestroyNotify arg_destroy, - BlockCompletionFunc *cb, void *opaque) +BlockAIOCB *thread_pool_submit(ThreadPool *pool, ThreadPoolFunc *func, + void *arg, GDestroyNotify arg_destroy, + BlockCompletionFunc *cb, void *opaque) { ThreadPoolElement *req; AioContext *ctx = qemu_get_current_aio_context(); - ThreadPool *pool = aio_get_thread_pool(ctx); + + if (!pool) { + pool = aio_get_thread_pool(ctx); + } /* Assert that the thread submitting work is the same running the pool */ assert(pool->ctx == qemu_get_current_aio_context()); @@ -275,6 +292,18 @@ BlockAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func, return &req->common; } +BlockAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func, + void *arg, GDestroyNotify arg_destroy, + BlockCompletionFunc *cb, void *opaque) +{ + return thread_pool_submit(NULL, func, arg, arg_destroy, cb, opaque); +} + +void thread_pool_poll(ThreadPool *pool) +{ + aio_bh_poll(pool->ctx); +} + typedef struct ThreadPoolCo { Coroutine *co; int ret; @@ -297,18 +326,38 @@ int coroutine_fn thread_pool_submit_co(ThreadPoolFunc *func, void *arg) return tpc.ret; } -void thread_pool_submit(ThreadPoolFunc *func, - void *arg, GDestroyNotify arg_destroy) +void thread_pool_join(ThreadPool *pool) { - thread_pool_submit_aio(func, arg, arg_destroy, NULL, NULL); + /* Assert that the thread waiting is the same running the pool */ + assert(pool->ctx == qemu_get_current_aio_context()); + + qemu_mutex_lock(&pool->lock); + + if (pool->requests_executing > 0 || + !QTAILQ_EMPTY(&pool->request_list)) { + qemu_cond_wait(&pool->no_requests_cond, &pool->lock); + } + assert(pool->requests_executing == 0 && + QTAILQ_EMPTY(&pool->request_list)); + + qemu_mutex_unlock(&pool->lock); + + aio_bh_poll(pool->ctx); + + assert(QLIST_EMPTY(&pool->head)); } -void thread_pool_update_params(ThreadPool *pool, AioContext *ctx) +void thread_pool_set_minmax_threads(ThreadPool *pool, + int min_threads, int max_threads) { + assert(min_threads >= 0); + assert(max_threads > 0); + assert(max_threads >= min_threads); + qemu_mutex_lock(&pool->lock); - pool->min_threads = ctx->thread_pool_min; - pool->max_threads = ctx->thread_pool_max; + pool->min_threads = min_threads; + pool->max_threads = max_threads; /* * We either have to: @@ -330,6 +379,12 @@ void thread_pool_update_params(ThreadPool *pool, AioContext *ctx) qemu_mutex_unlock(&pool->lock); } +void thread_pool_update_params(ThreadPool *pool, AioContext *ctx) +{ + thread_pool_set_minmax_threads(pool, + ctx->thread_pool_min, 
ctx->thread_pool_max); +} + static void thread_pool_init_one(ThreadPool *pool, AioContext *ctx) { if (!ctx) { @@ -342,6 +397,7 @@ static void thread_pool_init_one(ThreadPool *pool, AioContext *ctx) qemu_mutex_init(&pool->lock); qemu_cond_init(&pool->worker_stopped); qemu_cond_init(&pool->request_cond); + qemu_cond_init(&pool->no_requests_cond); pool->new_thread_bh = aio_bh_new(ctx, spawn_thread_bh_fn, pool); QLIST_INIT(&pool->head); @@ -382,6 +438,7 @@ void thread_pool_free(ThreadPool *pool) qemu_mutex_unlock(&pool->lock); qemu_bh_delete(pool->completion_bh); + qemu_cond_destroy(&pool->no_requests_cond); qemu_cond_destroy(&pool->request_cond); qemu_cond_destroy(&pool->worker_stopped); qemu_mutex_destroy(&pool->lock); From patchwork Tue Aug 27 17:54:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13779939 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E3D69C54731 for ; Tue, 27 Aug 2024 17:57:56 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0Q2-0005dX-RH; Tue, 27 Aug 2024 13:55:50 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Px-0005FC-21 for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:45 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Pr-0001Zm-Q9 for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:44 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0Pc-0002Ml-VW; Tue, 27 Aug 2024 19:55:25 +0200 From: "Maciej S. Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 06/17] migration: Add save_live_complete_precopy_{begin, end} handlers Date: Tue, 27 Aug 2024 19:54:25 +0200 Message-ID: X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" These SaveVMHandlers help device provide its own asynchronous transmission of the remaining data at the end of a precopy phase. 
In this use case the save_live_complete_precopy_begin handler might be used to mark the stream boundary before proceeding with asynchronous transmission of the remaining data while the save_live_complete_precopy_end handler might be used to mark the stream boundary after performing the asynchronous transmission. Signed-off-by: Maciej S. Szmigiero Reviewed-by: Fabiano Rosas --- include/migration/register.h | 36 ++++++++++++++++++++++++++++++++++++ migration/savevm.c | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 71 insertions(+) diff --git a/include/migration/register.h b/include/migration/register.h index f60e797894e5..9de123252edf 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -103,6 +103,42 @@ typedef struct SaveVMHandlers { */ int (*save_live_complete_precopy)(QEMUFile *f, void *opaque); + /** + * @save_live_complete_precopy_begin + * + * Called at the end of a precopy phase, before all + * @save_live_complete_precopy handlers and before launching + * all @save_live_complete_precopy_thread threads. + * The handler might, for example, mark the stream boundary before + * proceeding with asynchronous transmission of the remaining data via + * @save_live_complete_precopy_thread. + * When postcopy is enabled, devices that support postcopy will skip this step. + * + * @f: QEMUFile where the handler can synchronously send data before returning + * @idstr: this device section idstr + * @instance_id: this device section instance_id + * @opaque: data pointer passed to register_savevm_live() + * + * Returns zero to indicate success and negative for error + */ + int (*save_live_complete_precopy_begin)(QEMUFile *f, + char *idstr, uint32_t instance_id, + void *opaque); + /** + * @save_live_complete_precopy_end + * + * Called at the end of a precopy phase, after @save_live_complete_precopy + * handlers and after all @save_live_complete_precopy_thread threads have + * finished. When postcopy is enabled, devices that support postcopy will + * skip this step. + * + * @f: QEMUFile where the handler can synchronously send data before returning + * @opaque: data pointer passed to register_savevm_live() + * + * Returns zero to indicate success and negative for error + */ + int (*save_live_complete_precopy_end)(QEMUFile *f, void *opaque); + /* This runs both outside and inside the BQL. 
*/ /** diff --git a/migration/savevm.c b/migration/savevm.c index 6bb404b9c86f..d43acbbf20cf 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1496,6 +1496,27 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy) SaveStateEntry *se; int ret; + QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { + if (!se->ops || (in_postcopy && se->ops->has_postcopy && + se->ops->has_postcopy(se->opaque)) || + !se->ops->save_live_complete_precopy_begin) { + continue; + } + + save_section_header(f, se, QEMU_VM_SECTION_END); + + ret = se->ops->save_live_complete_precopy_begin(f, + se->idstr, se->instance_id, + se->opaque); + + save_section_footer(f, se); + + if (ret < 0) { + qemu_file_set_error(f, ret); + return -1; + } + } + QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { if (!se->ops || (in_postcopy && se->ops->has_postcopy && @@ -1527,6 +1548,20 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy) end_ts_each - start_ts_each); } + QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { + if (!se->ops || (in_postcopy && se->ops->has_postcopy && + se->ops->has_postcopy(se->opaque)) || + !se->ops->save_live_complete_precopy_end) { + continue; + } + + ret = se->ops->save_live_complete_precopy_end(f, se->opaque); + if (ret < 0) { + qemu_file_set_error(f, ret); + return -1; + } + } + trace_vmstate_downtime_checkpoint("src-iterable-saved"); return 0; From patchwork Tue Aug 27 17:54:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13779935 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 750D6C54731 for ; Tue, 27 Aug 2024 17:57:29 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0Q3-0005hK-QQ; Tue, 27 Aug 2024 13:55:51 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Px-0005FK-2c for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:45 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Pv-0001Zz-IJ for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:44 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0Pi-0002N6-7C; Tue, 27 Aug 2024 19:55:30 +0200 From: "Maciej S. 
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 07/17] migration: Add qemu_loadvm_load_state_buffer() and its handler Date: Tue, 27 Aug 2024 19:54:26 +0200 Message-ID: X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" qemu_loadvm_load_state_buffer() and its load_state_buffer SaveVMHandler allow providing device state buffer to explicitly specified device via its idstr and instance id. Signed-off-by: Maciej S. Szmigiero Reviewed-by: Fabiano Rosas --- include/migration/register.h | 15 +++++++++++++++ migration/savevm.c | 25 +++++++++++++++++++++++++ migration/savevm.h | 3 +++ 3 files changed, 43 insertions(+) diff --git a/include/migration/register.h b/include/migration/register.h index 9de123252edf..4a578f140713 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -263,6 +263,21 @@ typedef struct SaveVMHandlers { */ int (*load_state)(QEMUFile *f, void *opaque, int version_id); + /** + * @load_state_buffer + * + * Load device state buffer provided to qemu_loadvm_load_state_buffer(). + * + * @opaque: data pointer passed to register_savevm_live() + * @data: the data buffer to load + * @data_size: the data length in buffer + * @errp: pointer to Error*, to store an error if it happens. 
+ * + * Returns zero to indicate success and negative for error + */ + int (*load_state_buffer)(void *opaque, char *data, size_t data_size, + Error **errp); + /** * @load_setup * diff --git a/migration/savevm.c b/migration/savevm.c index d43acbbf20cf..3fde5ca8c26b 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -3101,6 +3101,31 @@ int qemu_loadvm_approve_switchover(void) return migrate_send_rp_switchover_ack(mis); } +int qemu_loadvm_load_state_buffer(const char *idstr, uint32_t instance_id, + char *buf, size_t len, Error **errp) +{ + SaveStateEntry *se; + + se = find_se(idstr, instance_id); + if (!se) { + error_setg(errp, "Unknown idstr %s or instance id %u for load state buffer", + idstr, instance_id); + return -1; + } + + if (!se->ops || !se->ops->load_state_buffer) { + error_setg(errp, "idstr %s / instance %u has no load state buffer operation", + idstr, instance_id); + return -1; + } + + if (se->ops->load_state_buffer(se->opaque, buf, len, errp) != 0) { + return -1; + } + + return 0; +} + bool save_snapshot(const char *name, bool overwrite, const char *vmstate, bool has_devices, strList *devices, Error **errp) { diff --git a/migration/savevm.h b/migration/savevm.h index 9ec96a995c93..d388f1bfca98 100644 --- a/migration/savevm.h +++ b/migration/savevm.h @@ -70,4 +70,7 @@ int qemu_loadvm_approve_switchover(void); int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f, bool in_postcopy, bool inactivate_disks); +int qemu_loadvm_load_state_buffer(const char *idstr, uint32_t instance_id, + char *buf, size_t len, Error **errp); + #endif From patchwork Tue Aug 27 17:54:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13779938 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B8428C54731 for ; Tue, 27 Aug 2024 17:57:52 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0Q2-0005c0-E7; Tue, 27 Aug 2024 13:55:50 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Q1-0005X3-Au for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:49 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Pz-0001bG-Kb for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:55:49 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0Pn-0002NS-G0; Tue, 27 Aug 2024 19:55:35 +0200 From: "Maciej S. 
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 08/17] migration: Add load_finish handler and associated functions Date: Tue, 27 Aug 2024 19:54:27 +0200 Message-ID: <1a7599896decdbae61cee385739dc0badc9b4364.1724701542.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" load_finish SaveVMHandler allows migration code to poll whether a device-specific asynchronous device state loading operation had finished. In order to avoid calling this handler needlessly the device is supposed to notify the migration code of its possible readiness via a call to qemu_loadvm_load_finish_ready_broadcast() while holding qemu_loadvm_load_finish_ready_lock. Signed-off-by: Maciej S. Szmigiero Reviewed-by: Fabiano Rosas --- include/migration/register.h | 21 +++++++++++++++ migration/migration.c | 6 +++++ migration/migration.h | 3 +++ migration/savevm.c | 52 ++++++++++++++++++++++++++++++++++++ migration/savevm.h | 4 +++ 5 files changed, 86 insertions(+) diff --git a/include/migration/register.h b/include/migration/register.h index 4a578f140713..44d8cf5192ae 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -278,6 +278,27 @@ typedef struct SaveVMHandlers { int (*load_state_buffer)(void *opaque, char *data, size_t data_size, Error **errp); + /** + * @load_finish + * + * Poll whether all asynchronous device state loading had finished. + * Not called on the load failure path. + * + * Called while holding the qemu_loadvm_load_finish_ready_lock. + * + * If this method signals "not ready" then it might not be called + * again until qemu_loadvm_load_finish_ready_broadcast() is invoked + * while holding qemu_loadvm_load_finish_ready_lock. + * + * @opaque: data pointer passed to register_savevm_live() + * @is_finished: whether the loading had finished (output parameter) + * @errp: pointer to Error*, to store an error if it happens. + * + * Returns zero to indicate success and negative for error + * It's not an error that the loading still hasn't finished. 
+ */ + int (*load_finish)(void *opaque, bool *is_finished, Error **errp); + /** * @load_setup * diff --git a/migration/migration.c b/migration/migration.c index 3dea06d57732..d61e7b055e07 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -259,6 +259,9 @@ void migration_object_init(void) current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR; + qemu_mutex_init(¤t_incoming->load_finish_ready_mutex); + qemu_cond_init(¤t_incoming->load_finish_ready_cond); + migration_object_check(current_migration, &error_fatal); ram_mig_init(); @@ -410,6 +413,9 @@ void migration_incoming_state_destroy(void) mis->postcopy_qemufile_dst = NULL; } + qemu_mutex_destroy(&mis->load_finish_ready_mutex); + qemu_cond_destroy(&mis->load_finish_ready_cond); + yank_unregister_instance(MIGRATION_YANK_INSTANCE); } diff --git a/migration/migration.h b/migration/migration.h index 38aa1402d516..4e2443e6c8ec 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -230,6 +230,9 @@ struct MigrationIncomingState { /* Do exit on incoming migration failure */ bool exit_on_error; + + QemuCond load_finish_ready_cond; + QemuMutex load_finish_ready_mutex; }; MigrationIncomingState *migration_incoming_get_current(void); diff --git a/migration/savevm.c b/migration/savevm.c index 3fde5ca8c26b..33c9200d1e78 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -3022,6 +3022,37 @@ int qemu_loadvm_state(QEMUFile *f) return ret; } + qemu_loadvm_load_finish_ready_lock(); + while (!ret) { /* Don't call load_finish() handlers on the load failure path */ + bool all_ready = true; + SaveStateEntry *se = NULL; + + QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { + bool this_ready; + + if (!se->ops || !se->ops->load_finish) { + continue; + } + + ret = se->ops->load_finish(se->opaque, &this_ready, &local_err); + if (ret) { + error_report_err(local_err); + + qemu_loadvm_load_finish_ready_unlock(); + return -EINVAL; + } else if (!this_ready) { + all_ready = false; + } + } + + if (all_ready) { + break; + } + + qemu_cond_wait(&mis->load_finish_ready_cond, &mis->load_finish_ready_mutex); + } + qemu_loadvm_load_finish_ready_unlock(); + if (ret == 0) { ret = qemu_file_get_error(f); } @@ -3126,6 +3157,27 @@ int qemu_loadvm_load_state_buffer(const char *idstr, uint32_t instance_id, return 0; } +void qemu_loadvm_load_finish_ready_lock(void) +{ + MigrationIncomingState *mis = migration_incoming_get_current(); + + qemu_mutex_lock(&mis->load_finish_ready_mutex); +} + +void qemu_loadvm_load_finish_ready_unlock(void) +{ + MigrationIncomingState *mis = migration_incoming_get_current(); + + qemu_mutex_unlock(&mis->load_finish_ready_mutex); +} + +void qemu_loadvm_load_finish_ready_broadcast(void) +{ + MigrationIncomingState *mis = migration_incoming_get_current(); + + qemu_cond_broadcast(&mis->load_finish_ready_cond); +} + bool save_snapshot(const char *name, bool overwrite, const char *vmstate, bool has_devices, strList *devices, Error **errp) { diff --git a/migration/savevm.h b/migration/savevm.h index d388f1bfca98..69ae22cded7a 100644 --- a/migration/savevm.h +++ b/migration/savevm.h @@ -73,4 +73,8 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f, int qemu_loadvm_load_state_buffer(const char *idstr, uint32_t instance_id, char *buf, size_t len, Error **errp); +void qemu_loadvm_load_finish_ready_lock(void); +void qemu_loadvm_load_finish_ready_unlock(void); +void qemu_loadvm_load_finish_ready_broadcast(void); + #endif From patchwork Tue Aug 27 17:54:28 2024 Content-Type: text/plain; charset="utf-8" 
X-Patchwork-Submitter: "Maciej S. Szmigiero"
X-Patchwork-Id: 13779951
From: "Maciej S. Szmigiero"
To: Peter Xu, Fabiano Rosas
Cc: Alex Williamson, Cédric Le Goater, Eric Blake, Markus Armbruster,
 Daniel P. Berrangé, Avihai Horon, Joao Martins, qemu-devel@nongnu.org
Subject: [PATCH v2 09/17] migration/multifd: Device state transfer support - receive side
Date: Tue, 27 Aug 2024 19:54:28 +0200
Message-ID: <84141182083a8417c25b4d82a9c4b6228b22ac67.1724701542.git.maciej.szmigiero@oracle.com>

From: "Maciej S. Szmigiero"

Add basic support for receiving device state via multifd channels -
channels that are shared with RAM transfers.

To differentiate between a device state packet and a RAM packet, the
packet header is read first.

Depending on whether the MULTIFD_FLAG_DEVICE_STATE flag is present in the
packet header, either device state (MultiFDPacketDeviceState_t) or RAM
data (the existing MultiFDPacket_t) is then read.

The received device state data is provided to the
qemu_loadvm_load_state_buffer() function for processing in the device's
load_state_buffer handler.

Signed-off-by: Maciej S.
Szmigiero --- migration/multifd.c | 127 +++++++++++++++++++++++++++++++++++++------- migration/multifd.h | 31 ++++++++++- 2 files changed, 138 insertions(+), 20 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index b06a9fab500e..d5a8e5a9c9b5 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -21,6 +21,7 @@ #include "file.h" #include "migration.h" #include "migration-stats.h" +#include "savevm.h" #include "socket.h" #include "tls.h" #include "qemu-file.h" @@ -209,10 +210,10 @@ void multifd_send_fill_packet(MultiFDSendParams *p) memset(packet, 0, p->packet_len); - packet->magic = cpu_to_be32(MULTIFD_MAGIC); - packet->version = cpu_to_be32(MULTIFD_VERSION); + packet->hdr.magic = cpu_to_be32(MULTIFD_MAGIC); + packet->hdr.version = cpu_to_be32(MULTIFD_VERSION); - packet->flags = cpu_to_be32(p->flags); + packet->hdr.flags = cpu_to_be32(p->flags); packet->next_packet_size = cpu_to_be32(p->next_packet_size); packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num); @@ -228,31 +229,49 @@ void multifd_send_fill_packet(MultiFDSendParams *p) p->flags, p->next_packet_size); } -static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) +static int multifd_recv_unfill_packet_header(MultiFDRecvParams *p, + MultiFDPacketHdr_t *hdr, + Error **errp) { - MultiFDPacket_t *packet = p->packet; - int ret = 0; - - packet->magic = be32_to_cpu(packet->magic); - if (packet->magic != MULTIFD_MAGIC) { + hdr->magic = be32_to_cpu(hdr->magic); + if (hdr->magic != MULTIFD_MAGIC) { error_setg(errp, "multifd: received packet " "magic %x and expected magic %x", - packet->magic, MULTIFD_MAGIC); + hdr->magic, MULTIFD_MAGIC); return -1; } - packet->version = be32_to_cpu(packet->version); - if (packet->version != MULTIFD_VERSION) { + hdr->version = be32_to_cpu(hdr->version); + if (hdr->version != MULTIFD_VERSION) { error_setg(errp, "multifd: received packet " "version %u and expected version %u", - packet->version, MULTIFD_VERSION); + hdr->version, MULTIFD_VERSION); return -1; } - p->flags = be32_to_cpu(packet->flags); + p->flags = be32_to_cpu(hdr->flags); + + return 0; +} + +static int multifd_recv_unfill_packet_device_state(MultiFDRecvParams *p, + Error **errp) +{ + MultiFDPacketDeviceState_t *packet = p->packet_dev_state; + + packet->instance_id = be32_to_cpu(packet->instance_id); + p->next_packet_size = be32_to_cpu(packet->next_packet_size); + + return 0; +} + +static int multifd_recv_unfill_packet_ram(MultiFDRecvParams *p, Error **errp) +{ + MultiFDPacket_t *packet = p->packet; + int ret = 0; + p->next_packet_size = be32_to_cpu(packet->next_packet_size); p->packet_num = be64_to_cpu(packet->packet_num); - p->packets_recved++; if (!(p->flags & MULTIFD_FLAG_SYNC)) { ret = multifd_ram_unfill_packet(p, errp); @@ -264,6 +283,19 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) return ret; } +static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) +{ + p->packets_recved++; + + if (p->flags & MULTIFD_FLAG_DEVICE_STATE) { + return multifd_recv_unfill_packet_device_state(p, errp); + } else { + return multifd_recv_unfill_packet_ram(p, errp); + } + + g_assert_not_reached(); +} + static bool multifd_send_should_exit(void) { return qatomic_read(&multifd_send_state->exiting); @@ -1014,6 +1046,7 @@ static void multifd_recv_cleanup_channel(MultiFDRecvParams *p) p->packet_len = 0; g_free(p->packet); p->packet = NULL; + g_clear_pointer(&p->packet_dev_state, g_free); g_free(p->normal); p->normal = NULL; g_free(p->zero); @@ -1126,8 +1159,13 @@ 
static void *multifd_recv_thread(void *opaque) rcu_register_thread(); while (true) { + MultiFDPacketHdr_t hdr; uint32_t flags = 0; + bool is_device_state = false; bool has_data = false; + uint8_t *pkt_buf; + size_t pkt_len; + p->normal_num = 0; if (use_packets) { @@ -1135,8 +1173,28 @@ static void *multifd_recv_thread(void *opaque) break; } - ret = qio_channel_read_all_eof(p->c, (void *)p->packet, - p->packet_len, &local_err); + ret = qio_channel_read_all_eof(p->c, (void *)&hdr, + sizeof(hdr), &local_err); + if (ret == 0 || ret == -1) { /* 0: EOF -1: Error */ + break; + } + + ret = multifd_recv_unfill_packet_header(p, &hdr, &local_err); + if (ret) { + break; + } + + is_device_state = p->flags & MULTIFD_FLAG_DEVICE_STATE; + if (is_device_state) { + pkt_buf = (uint8_t *)p->packet_dev_state + sizeof(hdr); + pkt_len = sizeof(*p->packet_dev_state) - sizeof(hdr); + } else { + pkt_buf = (uint8_t *)p->packet + sizeof(hdr); + pkt_len = p->packet_len - sizeof(hdr); + } + + ret = qio_channel_read_all_eof(p->c, (char *)pkt_buf, pkt_len, + &local_err); if (ret == 0 || ret == -1) { /* 0: EOF -1: Error */ break; } @@ -1181,8 +1239,33 @@ static void *multifd_recv_thread(void *opaque) has_data = !!p->data->size; } - if (has_data) { - ret = multifd_recv_state->ops->recv(p, &local_err); + if (!is_device_state) { + if (has_data) { + ret = multifd_recv_state->ops->recv(p, &local_err); + if (ret != 0) { + break; + } + } + } else { + g_autofree char *idstr = NULL; + g_autofree char *dev_state_buf = NULL; + + assert(use_packets); + + if (p->next_packet_size > 0) { + dev_state_buf = g_malloc(p->next_packet_size); + + ret = qio_channel_read_all(p->c, dev_state_buf, p->next_packet_size, &local_err); + if (ret != 0) { + break; + } + } + + idstr = g_strndup(p->packet_dev_state->idstr, sizeof(p->packet_dev_state->idstr)); + ret = qemu_loadvm_load_state_buffer(idstr, + p->packet_dev_state->instance_id, + dev_state_buf, p->next_packet_size, + &local_err); if (ret != 0) { break; } @@ -1190,6 +1273,11 @@ static void *multifd_recv_thread(void *opaque) if (use_packets) { if (flags & MULTIFD_FLAG_SYNC) { + if (is_device_state) { + error_setg(&local_err, "multifd: received SYNC device state packet"); + break; + } + qemu_sem_post(&multifd_recv_state->sem_sync); qemu_sem_wait(&p->sem_sync); } @@ -1258,6 +1346,7 @@ int multifd_recv_setup(Error **errp) p->packet_len = sizeof(MultiFDPacket_t) + sizeof(uint64_t) * page_count; p->packet = g_malloc0(p->packet_len); + p->packet_dev_state = g_malloc0(sizeof(*p->packet_dev_state)); } p->name = g_strdup_printf("mig/dst/recv_%d", i); p->normal = g_new0(ram_addr_t, page_count); diff --git a/migration/multifd.h b/migration/multifd.h index a3e35196d179..a8f3e4838c01 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -45,6 +45,12 @@ MultiFDRecvData *multifd_get_recv_data(void); #define MULTIFD_FLAG_QPL (4 << 1) #define MULTIFD_FLAG_UADK (8 << 1) +/* + * If set it means that this packet contains device state + * (MultiFDPacketDeviceState_t), not RAM data (MultiFDPacket_t). 
+ */ +#define MULTIFD_FLAG_DEVICE_STATE (1 << 4) + /* This value needs to be a multiple of qemu_target_page_size() */ #define MULTIFD_PACKET_SIZE (512 * 1024) @@ -52,6 +58,11 @@ typedef struct { uint32_t magic; uint32_t version; uint32_t flags; +} __attribute__((packed)) MultiFDPacketHdr_t; + +typedef struct { + MultiFDPacketHdr_t hdr; + /* maximum number of allocated pages */ uint32_t pages_alloc; /* non zero pages */ @@ -72,6 +83,16 @@ typedef struct { uint64_t offset[]; } __attribute__((packed)) MultiFDPacket_t; +typedef struct { + MultiFDPacketHdr_t hdr; + + char idstr[256] QEMU_NONSTRING; + uint32_t instance_id; + + /* size of the next packet that contains the actual data */ + uint32_t next_packet_size; +} __attribute__((packed)) MultiFDPacketDeviceState_t; + typedef struct { /* number of used pages */ uint32_t num; @@ -89,6 +110,13 @@ struct MultiFDRecvData { off_t file_offset; }; +typedef struct { + char *idstr; + uint32_t instance_id; + char *buf; + size_t buf_len; +} MultiFDDeviceState_t; + typedef enum { MULTIFD_PAYLOAD_NONE, MULTIFD_PAYLOAD_RAM, @@ -204,8 +232,9 @@ typedef struct { /* thread local variables. No locking required */ - /* pointer to the packet */ + /* pointers to the possible packet types */ MultiFDPacket_t *packet; + MultiFDPacketDeviceState_t *packet_dev_state; /* size of the next packet that contains pages */ uint32_t next_packet_size; /* packets received through this channel */ From patchwork Tue Aug 27 17:54:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13779931 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 787B6C5472F for ; Tue, 27 Aug 2024 17:56:42 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0QK-0006h9-CJ; Tue, 27 Aug 2024 13:56:09 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0QD-0006Nv-GC for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:02 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0QB-0001cG-Tc for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:01 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0Px-0002O7-TR; Tue, 27 Aug 2024 19:55:45 +0200 From: "Maciej S. 
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 10/17] migration/multifd: Convert multifd_send()::next_channel to atomic Date: Tue, 27 Aug 2024 19:54:29 +0200 Message-ID: <76dc3ad69fa457fd1e358ad3de874474f9f64716.1724701542.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" This is necessary for multifd_send() to be able to be called from multiple threads. Signed-off-by: Maciej S. Szmigiero --- migration/multifd.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index d5a8e5a9c9b5..b25789dde0b3 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -343,26 +343,38 @@ bool multifd_send(MultiFDSendData **send_data) return false; } - /* We wait here, until at least one channel is ready */ - qemu_sem_wait(&multifd_send_state->channels_ready); - /* * next_channel can remain from a previous migration that was * using more channels, so ensure it doesn't overflow if the * limit is lower now. */ - next_channel %= migrate_multifd_channels(); - for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) { + i = qatomic_load_acquire(&next_channel); + if (unlikely(i >= migrate_multifd_channels())) { + qatomic_cmpxchg(&next_channel, i, 0); + } + + /* We wait here, until at least one channel is ready */ + qemu_sem_wait(&multifd_send_state->channels_ready); + + while (true) { + int i_next; + if (multifd_send_should_exit()) { return false; } + + i = qatomic_load_acquire(&next_channel); + i_next = (i + 1) % migrate_multifd_channels(); + if (qatomic_cmpxchg(&next_channel, i, i_next) != i) { + continue; + } + p = &multifd_send_state->params[i]; /* * Lockless read to p->pending_job is safe, because only multifd * sender thread can clear it. */ if (qatomic_read(&p->pending_job) == false) { - next_channel = (i + 1) % migrate_multifd_channels(); break; } } From patchwork Tue Aug 27 17:54:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. 
Szmigiero" X-Patchwork-Id: 13779936 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 00B47C5472F for ; Tue, 27 Aug 2024 17:57:31 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0Qi-00088F-C3; Tue, 27 Aug 2024 13:56:33 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0QS-0007Rc-CT for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:18 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0QH-0001cY-AE for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:16 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0Q3-0002OX-3d; Tue, 27 Aug 2024 19:55:51 +0200 From: "Maciej S. Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 11/17] migration/multifd: Add an explicit MultiFDSendData destructor Date: Tue, 27 Aug 2024 19:54:30 +0200 Message-ID: <59483d382d5453e69c6c762eb0a006a1f7ac228f.1724701542.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" This way if there are fields there that needs explicit disposal (like, for example, some attached buffers) they will be handled appropriately. Add a related assert to multifd_set_payload_type() in order to make sure that this function is only used to fill a previously empty MultiFDSendData with some payload, not the other way around. Signed-off-by: Maciej S. 
Szmigiero Reviewed-by: Fabiano Rosas --- migration/multifd-nocomp.c | 3 +-- migration/multifd.c | 31 ++++++++++++++++++++++++++++--- migration/multifd.h | 5 +++++ 3 files changed, 34 insertions(+), 5 deletions(-) diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c index 53ea9f9c8371..39eb77c9b3b7 100644 --- a/migration/multifd-nocomp.c +++ b/migration/multifd-nocomp.c @@ -40,8 +40,7 @@ void multifd_ram_save_setup(void) void multifd_ram_save_cleanup(void) { - g_free(multifd_ram_send); - multifd_ram_send = NULL; + g_clear_pointer(&multifd_ram_send, multifd_send_data_free); } static void multifd_set_file_bitmap(MultiFDSendParams *p) diff --git a/migration/multifd.c b/migration/multifd.c index b25789dde0b3..a74e8a5cc891 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -119,6 +119,32 @@ MultiFDSendData *multifd_send_data_alloc(void) return g_malloc0(size_minus_payload + max_payload_size); } +void multifd_send_data_clear(MultiFDSendData *data) +{ + if (multifd_payload_empty(data)) { + return; + } + + switch (data->type) { + default: + /* Nothing to do */ + break; + } + + data->type = MULTIFD_PAYLOAD_NONE; +} + +void multifd_send_data_free(MultiFDSendData *data) +{ + if (!data) { + return; + } + + multifd_send_data_clear(data); + + g_free(data); +} + static bool multifd_use_packets(void) { return !migrate_mapped_ram(); @@ -506,8 +532,7 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp) qemu_sem_destroy(&p->sem_sync); g_free(p->name); p->name = NULL; - g_free(p->data); - p->data = NULL; + g_clear_pointer(&p->data, multifd_send_data_free); p->packet_len = 0; g_free(p->packet); p->packet = NULL; @@ -671,7 +696,7 @@ static void *multifd_send_thread(void *opaque) p->next_packet_size + p->packet_len); p->next_packet_size = 0; - multifd_set_payload_type(p->data, MULTIFD_PAYLOAD_NONE); + multifd_send_data_clear(p->data); /* * Making sure p->data is published before saying "we're diff --git a/migration/multifd.h b/migration/multifd.h index a8f3e4838c01..a0853622153e 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -139,6 +139,9 @@ static inline bool multifd_payload_empty(MultiFDSendData *data) static inline void multifd_set_payload_type(MultiFDSendData *data, MultiFDPayloadType type) { + assert(multifd_payload_empty(data)); + assert(type != MULTIFD_PAYLOAD_NONE); + data->type = type; } @@ -288,6 +291,8 @@ static inline void multifd_send_prepare_header(MultiFDSendParams *p) void multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc); bool multifd_send(MultiFDSendData **send_data); MultiFDSendData *multifd_send_data_alloc(void); +void multifd_send_data_clear(MultiFDSendData *data); +void multifd_send_data_free(MultiFDSendData *data); static inline uint32_t multifd_ram_page_size(void) { From patchwork Tue Aug 27 17:54:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. 
Szmigiero" X-Patchwork-Id: 13779937 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 08627C5472F for ; Tue, 27 Aug 2024 17:57:43 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0R1-0001Cw-T4; Tue, 27 Aug 2024 13:56:52 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0QW-0007gx-Aw for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:21 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0QM-0001cn-Tf for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:18 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0Q8-0002Oq-A6; Tue, 27 Aug 2024 19:55:56 +0200 From: "Maciej S. Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 12/17] migration/multifd: Device state transfer support - send side Date: Tue, 27 Aug 2024 19:54:31 +0200 Message-ID: X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" A new function multifd_queue_device_state() is provided for device to queue its state for transmission via a multifd channel. Signed-off-by: Maciej S. 
Szmigiero Signed-off-by: Peter Xu --- include/migration/misc.h | 4 ++ migration/meson.build | 1 + migration/multifd-device-state.c | 99 ++++++++++++++++++++++++++++++++ migration/multifd-nocomp.c | 6 +- migration/multifd-qpl.c | 2 +- migration/multifd-uadk.c | 2 +- migration/multifd-zlib.c | 2 +- migration/multifd-zstd.c | 2 +- migration/multifd.c | 65 +++++++++++++++------ migration/multifd.h | 29 +++++++++- 10 files changed, 184 insertions(+), 28 deletions(-) create mode 100644 migration/multifd-device-state.c diff --git a/include/migration/misc.h b/include/migration/misc.h index bfadc5613bac..7266b1b77d1f 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -111,4 +111,8 @@ bool migration_in_bg_snapshot(void); /* migration/block-dirty-bitmap.c */ void dirty_bitmap_mig_init(void); +/* migration/multifd-device-state.c */ +bool multifd_queue_device_state(char *idstr, uint32_t instance_id, + char *data, size_t len); + #endif diff --git a/migration/meson.build b/migration/meson.build index 77f3abf08eb1..00853595894f 100644 --- a/migration/meson.build +++ b/migration/meson.build @@ -21,6 +21,7 @@ system_ss.add(files( 'migration-hmp-cmds.c', 'migration.c', 'multifd.c', + 'multifd-device-state.c', 'multifd-nocomp.c', 'multifd-zlib.c', 'multifd-zero-page.c', diff --git a/migration/multifd-device-state.c b/migration/multifd-device-state.c new file mode 100644 index 000000000000..c9b44f0b5ab9 --- /dev/null +++ b/migration/multifd-device-state.c @@ -0,0 +1,99 @@ +/* + * Multifd device state migration + * + * Copyright (C) 2024 Oracle and/or its affiliates. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "qemu/lockable.h" +#include "migration/misc.h" +#include "multifd.h" + +static QemuMutex queue_job_mutex; + +static MultiFDSendData *device_state_send; + +size_t multifd_device_state_payload_size(void) +{ + return sizeof(MultiFDDeviceState_t); +} + +void multifd_device_state_save_setup(void) +{ + qemu_mutex_init(&queue_job_mutex); + + device_state_send = multifd_send_data_alloc(); +} + +void multifd_device_state_clear(MultiFDDeviceState_t *device_state) +{ + g_clear_pointer(&device_state->idstr, g_free); + g_clear_pointer(&device_state->buf, g_free); +} + +void multifd_device_state_save_cleanup(void) +{ + g_clear_pointer(&device_state_send, multifd_send_data_free); + + qemu_mutex_destroy(&queue_job_mutex); +} + +static void multifd_device_state_fill_packet(MultiFDSendParams *p) +{ + MultiFDDeviceState_t *device_state = &p->data->u.device_state; + MultiFDPacketDeviceState_t *packet = p->packet_device_state; + + packet->hdr.flags = cpu_to_be32(p->flags); + strncpy(packet->idstr, device_state->idstr, sizeof(packet->idstr)); + packet->instance_id = cpu_to_be32(device_state->instance_id); + packet->next_packet_size = cpu_to_be32(p->next_packet_size); +} + +void multifd_device_state_send_prepare(MultiFDSendParams *p) +{ + MultiFDDeviceState_t *device_state = &p->data->u.device_state; + + assert(multifd_payload_device_state(p->data)); + + multifd_send_prepare_header_device_state(p); + + assert(!(p->flags & MULTIFD_FLAG_SYNC)); + + p->next_packet_size = device_state->buf_len; + if (p->next_packet_size > 0) { + p->iov[p->iovs_num].iov_base = device_state->buf; + p->iov[p->iovs_num].iov_len = p->next_packet_size; + p->iovs_num++; + } + + p->flags |= MULTIFD_FLAG_NOCOMP | MULTIFD_FLAG_DEVICE_STATE; + + multifd_device_state_fill_packet(p); +} + +bool 
multifd_queue_device_state(char *idstr, uint32_t instance_id, + char *data, size_t len) +{ + /* Device state submissions can come from multiple threads */ + QEMU_LOCK_GUARD(&queue_job_mutex); + MultiFDDeviceState_t *device_state; + + assert(multifd_payload_empty(device_state_send)); + + multifd_set_payload_type(device_state_send, MULTIFD_PAYLOAD_DEVICE_STATE); + device_state = &device_state_send->u.device_state; + device_state->idstr = g_strdup(idstr); + device_state->instance_id = instance_id; + device_state->buf = g_memdup2(data, len); + device_state->buf_len = len; + + if (!multifd_send(&device_state_send)) { + multifd_send_data_clear(device_state_send); + return false; + } + + return true; +} diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c index 39eb77c9b3b7..0b7b543f44db 100644 --- a/migration/multifd-nocomp.c +++ b/migration/multifd-nocomp.c @@ -116,13 +116,13 @@ static int multifd_nocomp_send_prepare(MultiFDSendParams *p, Error **errp) * Only !zerocopy needs the header in IOV; zerocopy will * send it separately. */ - multifd_send_prepare_header(p); + multifd_send_prepare_header_ram(p); } multifd_send_prepare_iovs(p); p->flags |= MULTIFD_FLAG_NOCOMP; - multifd_send_fill_packet(p); + multifd_send_fill_packet_ram(p); if (use_zero_copy_send) { /* Send header first, without zerocopy */ @@ -371,7 +371,7 @@ bool multifd_send_prepare_common(MultiFDSendParams *p) return false; } - multifd_send_prepare_header(p); + multifd_send_prepare_header_ram(p); return true; } diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c index 75041a4c4dfe..bd6b5b6a3868 100644 --- a/migration/multifd-qpl.c +++ b/migration/multifd-qpl.c @@ -490,7 +490,7 @@ static int multifd_qpl_send_prepare(MultiFDSendParams *p, Error **errp) out: p->flags |= MULTIFD_FLAG_QPL; - multifd_send_fill_packet(p); + multifd_send_fill_packet_ram(p); return 0; } diff --git a/migration/multifd-uadk.c b/migration/multifd-uadk.c index db2549f59bfe..6e2d26010742 100644 --- a/migration/multifd-uadk.c +++ b/migration/multifd-uadk.c @@ -198,7 +198,7 @@ static int multifd_uadk_send_prepare(MultiFDSendParams *p, Error **errp) } out: p->flags |= MULTIFD_FLAG_UADK; - multifd_send_fill_packet(p); + multifd_send_fill_packet_ram(p); return 0; } diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c index 6787538762d2..62a1fe59ad3e 100644 --- a/migration/multifd-zlib.c +++ b/migration/multifd-zlib.c @@ -156,7 +156,7 @@ static int multifd_zlib_send_prepare(MultiFDSendParams *p, Error **errp) out: p->flags |= MULTIFD_FLAG_ZLIB; - multifd_send_fill_packet(p); + multifd_send_fill_packet_ram(p); return 0; } diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c index 1576b1e2adc6..f98b07e7f9f5 100644 --- a/migration/multifd-zstd.c +++ b/migration/multifd-zstd.c @@ -143,7 +143,7 @@ static int multifd_zstd_send_prepare(MultiFDSendParams *p, Error **errp) out: p->flags |= MULTIFD_FLAG_ZSTD; - multifd_send_fill_packet(p); + multifd_send_fill_packet_ram(p); return 0; } diff --git a/migration/multifd.c b/migration/multifd.c index a74e8a5cc891..bebe5b5a9b9c 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qemu/cutils.h" +#include "qemu/iov.h" #include "qemu/rcu.h" #include "exec/target_page.h" #include "sysemu/sysemu.h" @@ -19,6 +20,7 @@ #include "qemu/error-report.h" #include "qapi/error.h" #include "file.h" +#include "migration/misc.h" #include "migration.h" #include "migration-stats.h" #include "savevm.h" @@ -107,7 +109,9 @@ MultiFDSendData 
*multifd_send_data_alloc(void) * added to the union in the future are larger than * (MultiFDPages_t + flex array). */ - max_payload_size = MAX(multifd_ram_payload_size(), sizeof(MultiFDPayload)); + max_payload_size = MAX(multifd_ram_payload_size(), + multifd_device_state_payload_size()); + max_payload_size = MAX(max_payload_size, sizeof(MultiFDPayload)); /* * Account for any holes the compiler might insert. We can't pack @@ -126,6 +130,9 @@ void multifd_send_data_clear(MultiFDSendData *data) } switch (data->type) { + case MULTIFD_PAYLOAD_DEVICE_STATE: + multifd_device_state_clear(&data->u.device_state); + break; default: /* Nothing to do */ break; @@ -228,7 +235,7 @@ static int multifd_recv_initial_packet(QIOChannel *c, Error **errp) return msg.id; } -void multifd_send_fill_packet(MultiFDSendParams *p) +void multifd_send_fill_packet_ram(MultiFDSendParams *p) { MultiFDPacket_t *packet = p->packet; uint64_t packet_num; @@ -397,20 +404,16 @@ bool multifd_send(MultiFDSendData **send_data) p = &multifd_send_state->params[i]; /* - * Lockless read to p->pending_job is safe, because only multifd - * sender thread can clear it. + * Lockless RMW on p->pending_job_preparing is safe, because only multifd + * sender thread can clear it after it had seen p->pending_job being set. + * + * Pairs with qatomic_store_release() in multifd_send_thread(). */ - if (qatomic_read(&p->pending_job) == false) { + if (qatomic_cmpxchg(&p->pending_job_preparing, false, true) == false) { break; } } - /* - * Make sure we read p->pending_job before all the rest. Pairs with - * qatomic_store_release() in multifd_send_thread(). - */ - smp_mb_acquire(); - assert(multifd_payload_empty(p->data)); /* @@ -534,6 +537,7 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp) p->name = NULL; g_clear_pointer(&p->data, multifd_send_data_free); p->packet_len = 0; + g_clear_pointer(&p->packet_device_state, g_free); g_free(p->packet); p->packet = NULL; multifd_send_state->ops->send_cleanup(p, errp); @@ -545,6 +549,7 @@ static void multifd_send_cleanup_state(void) { file_cleanup_outgoing_migration(); socket_cleanup_outgoing_migration(); + multifd_device_state_save_cleanup(); qemu_sem_destroy(&multifd_send_state->channels_created); qemu_sem_destroy(&multifd_send_state->channels_ready); g_free(multifd_send_state->params); @@ -670,19 +675,29 @@ static void *multifd_send_thread(void *opaque) * qatomic_store_release() in multifd_send(). 
*/ if (qatomic_load_acquire(&p->pending_job)) { + bool is_device_state = multifd_payload_device_state(p->data); + size_t total_size; + p->flags = 0; p->iovs_num = 0; assert(!multifd_payload_empty(p->data)); - ret = multifd_send_state->ops->send_prepare(p, &local_err); - if (ret != 0) { - break; + if (is_device_state) { + multifd_device_state_send_prepare(p); + } else { + ret = multifd_send_state->ops->send_prepare(p, &local_err); + if (ret != 0) { + break; + } } if (migrate_mapped_ram()) { + assert(!is_device_state); + ret = file_write_ramblock_iov(p->c, p->iov, p->iovs_num, &p->data->u.ram, &local_err); } else { + total_size = iov_size(p->iov, p->iovs_num); ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL, 0, p->write_flags, &local_err); @@ -692,18 +707,27 @@ static void *multifd_send_thread(void *opaque) break; } - stat64_add(&mig_stats.multifd_bytes, - p->next_packet_size + p->packet_len); + if (is_device_state) { + stat64_add(&mig_stats.multifd_bytes, total_size); + } else { + /* + * Can't just always add total_size since IOVs do not include + * packet header in the zerocopy RAM case. + */ + stat64_add(&mig_stats.multifd_bytes, + p->next_packet_size + p->packet_len); + } p->next_packet_size = 0; multifd_send_data_clear(p->data); /* * Making sure p->data is published before saying "we're - * free". Pairs with the smp_mb_acquire() in + * free". Pairs with the qatomic_cmpxchg() in * multifd_send(). */ qatomic_store_release(&p->pending_job, false); + qatomic_store_release(&p->pending_job_preparing, false); } else { /* * If not a normal job, must be a sync request. Note that @@ -714,7 +738,7 @@ static void *multifd_send_thread(void *opaque) if (use_packets) { p->flags = MULTIFD_FLAG_SYNC; - multifd_send_fill_packet(p); + multifd_send_fill_packet_ram(p); ret = qio_channel_write_all(p->c, (void *)p->packet, p->packet_len, &local_err); if (ret != 0) { @@ -910,6 +934,9 @@ bool multifd_send_setup(void) p->packet_len = sizeof(MultiFDPacket_t) + sizeof(uint64_t) * page_count; p->packet = g_malloc0(p->packet_len); + p->packet_device_state = g_malloc0(sizeof(*p->packet_device_state)); + p->packet_device_state->hdr.magic = cpu_to_be32(MULTIFD_MAGIC); + p->packet_device_state->hdr.version = cpu_to_be32(MULTIFD_VERSION); } p->name = g_strdup_printf("mig/src/send_%d", i); p->write_flags = 0; @@ -944,6 +971,8 @@ bool multifd_send_setup(void) } } + multifd_device_state_save_setup(); + return true; err: diff --git a/migration/multifd.h b/migration/multifd.h index a0853622153e..c15c83104c8b 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -120,10 +120,12 @@ typedef struct { typedef enum { MULTIFD_PAYLOAD_NONE, MULTIFD_PAYLOAD_RAM, + MULTIFD_PAYLOAD_DEVICE_STATE, } MultiFDPayloadType; typedef union MultiFDPayload { MultiFDPages_t ram; + MultiFDDeviceState_t device_state; } MultiFDPayload; struct MultiFDSendData { @@ -136,6 +138,11 @@ static inline bool multifd_payload_empty(MultiFDSendData *data) return data->type == MULTIFD_PAYLOAD_NONE; } +static inline bool multifd_payload_device_state(MultiFDSendData *data) +{ + return data->type == MULTIFD_PAYLOAD_DEVICE_STATE; +} + static inline void multifd_set_payload_type(MultiFDSendData *data, MultiFDPayloadType type) { @@ -182,13 +189,15 @@ typedef struct { * cleared by the multifd sender threads. */ bool pending_job; + bool pending_job_preparing; bool pending_sync; MultiFDSendData *data; /* thread local variables. 
No locking required */ - /* pointer to the packet */ + /* pointers to the possible packet types */ MultiFDPacket_t *packet; + MultiFDPacketDeviceState_t *packet_device_state; /* size of the next packet that contains pages */ uint32_t next_packet_size; /* packets sent through this channel */ @@ -276,18 +285,25 @@ typedef struct { } MultiFDMethods; void multifd_register_ops(int method, MultiFDMethods *ops); -void multifd_send_fill_packet(MultiFDSendParams *p); +void multifd_send_fill_packet_ram(MultiFDSendParams *p); bool multifd_send_prepare_common(MultiFDSendParams *p); void multifd_send_zero_page_detect(MultiFDSendParams *p); void multifd_recv_zero_page_process(MultiFDRecvParams *p); -static inline void multifd_send_prepare_header(MultiFDSendParams *p) +static inline void multifd_send_prepare_header_ram(MultiFDSendParams *p) { p->iov[0].iov_len = p->packet_len; p->iov[0].iov_base = p->packet; p->iovs_num++; } +static inline void multifd_send_prepare_header_device_state(MultiFDSendParams *p) +{ + p->iov[0].iov_len = sizeof(*p->packet_device_state); + p->iov[0].iov_base = p->packet_device_state; + p->iovs_num++; +} + void multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc); bool multifd_send(MultiFDSendData **send_data); MultiFDSendData *multifd_send_data_alloc(void); @@ -310,4 +326,11 @@ int multifd_ram_flush_and_sync(void); size_t multifd_ram_payload_size(void); void multifd_ram_fill_packet(MultiFDSendParams *p); int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp); + +size_t multifd_device_state_payload_size(void); +void multifd_device_state_save_setup(void); +void multifd_device_state_clear(MultiFDDeviceState_t *device_state); +void multifd_device_state_save_cleanup(void); +void multifd_device_state_send_prepare(MultiFDSendParams *p); + #endif From patchwork Tue Aug 27 17:54:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13779932 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5CF0CC5472F for ; Tue, 27 Aug 2024 17:56:48 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0Qp-0000X6-64; Tue, 27 Aug 2024 13:56:39 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0QT-0007WS-Kj for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:21 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0QK-0001ci-SI for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:17 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0QD-0002PG-Gc; Tue, 27 Aug 2024 19:56:01 +0200 From: "Maciej S. 
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 13/17] migration/multifd: Add migration_has_device_state_support() Date: Tue, 27 Aug 2024 19:54:32 +0200 Message-ID: <8407eb455dfc1dea3cabf065f90833fab337eb98.1724701542.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" Since device state transfer via multifd channels requires multifd channels with packets and is currently not compatible with multifd compression add an appropriate query function so device can learn whether it can actually make use of it. Signed-off-by: Maciej S. Szmigiero Reviewed-by: Fabiano Rosas --- include/migration/misc.h | 1 + migration/multifd-device-state.c | 7 +++++++ 2 files changed, 8 insertions(+) diff --git a/include/migration/misc.h b/include/migration/misc.h index 7266b1b77d1f..189de6d02ad6 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -114,5 +114,6 @@ void dirty_bitmap_mig_init(void); /* migration/multifd-device-state.c */ bool multifd_queue_device_state(char *idstr, uint32_t instance_id, char *data, size_t len); +bool migration_has_device_state_support(void); #endif diff --git a/migration/multifd-device-state.c b/migration/multifd-device-state.c index c9b44f0b5ab9..7b34fe736c7f 100644 --- a/migration/multifd-device-state.c +++ b/migration/multifd-device-state.c @@ -11,6 +11,7 @@ #include "qemu/lockable.h" #include "migration/misc.h" #include "multifd.h" +#include "options.h" static QemuMutex queue_job_mutex; @@ -97,3 +98,9 @@ bool multifd_queue_device_state(char *idstr, uint32_t instance_id, return true; } + +bool migration_has_device_state_support(void) +{ + return migrate_multifd() && !migrate_mapped_ram() && + migrate_multifd_compression() == MULTIFD_COMPRESSION_NONE; +} From patchwork Tue Aug 27 17:54:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. 
Szmigiero" X-Patchwork-Id: 13779933 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4C7FDC5472F for ; Tue, 27 Aug 2024 17:57:18 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0Qk-0008QX-0F; Tue, 27 Aug 2024 13:56:34 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0QU-0007YF-3k for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:21 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0QR-0001cy-3o for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:17 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0QI-0002PY-NN; Tue, 27 Aug 2024 19:56:06 +0200 From: "Maciej S. Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 14/17] migration: Add save_live_complete_precopy_thread handler Date: Tue, 27 Aug 2024 19:54:33 +0200 Message-ID: X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" This SaveVMHandler helps device provide its own asynchronous transmission of the remaining data at the end of a precopy phase via multifd channels, in parallel with the transfer done by save_live_complete_precopy handlers. These threads are launched only when multifd device state transfer is supported, after all save_live_complete_precopy_begin handlers have already finished (for stream synchronization purposes). Management of these threads in done in the multifd migration code, wrapping them in the generic thread pool. Signed-off-by: Maciej S. 
Szmigiero --- include/migration/misc.h | 10 ++++ include/migration/register.h | 25 +++++++++ include/qemu/typedefs.h | 4 ++ migration/multifd-device-state.c | 87 ++++++++++++++++++++++++++++++++ migration/savevm.c | 40 ++++++++++++++- 5 files changed, 165 insertions(+), 1 deletion(-) diff --git a/include/migration/misc.h b/include/migration/misc.h index 189de6d02ad6..26f7f3140f03 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -116,4 +116,14 @@ bool multifd_queue_device_state(char *idstr, uint32_t instance_id, char *data, size_t len); bool migration_has_device_state_support(void); +void +multifd_spawn_device_state_save_thread(SaveLiveCompletePrecopyThreadHandler hdlr, + char *idstr, uint32_t instance_id, + void *opaque); + +void multifd_launch_device_state_save_threads(int max_count); + +void multifd_abort_device_state_save_threads(void); +int multifd_join_device_state_save_threads(void); + #endif diff --git a/include/migration/register.h b/include/migration/register.h index 44d8cf5192ae..ace2cfc0f75e 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -139,6 +139,31 @@ typedef struct SaveVMHandlers { */ int (*save_live_complete_precopy_end)(QEMUFile *f, void *opaque); + /* This runs in a separate thread. */ + + /** + * @save_live_complete_precopy_thread + * + * Called at the end of a precopy phase from a separate worker thread + * in configurations where multifd device state transfer is supported + * in order to perform asynchronous transmission of the remaining data in + * parallel with @save_live_complete_precopy handlers. + * The call happens after all @save_live_complete_precopy_begin handlers + * have finished. + * When postcopy is enabled, devices that support postcopy will skip this + * step. + * + * @idstr: this device section idstr + * @instance_id: this device section instance_id + * @abort_flag: flag indicating that the migration core wants to abort + * the transmission and so the handler should exit ASAP. To be read by + * qatomic_read() or similar. + * @opaque: data pointer passed to register_savevm_live() + * + * Returns zero to indicate success and negative for error + */ + SaveLiveCompletePrecopyThreadHandler save_live_complete_precopy_thread; + /* This runs both outside and inside the BQL. 
*/ /** diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h index 9d222dc37628..edd6e7b9c116 100644 --- a/include/qemu/typedefs.h +++ b/include/qemu/typedefs.h @@ -130,5 +130,9 @@ typedef struct IRQState *qemu_irq; * Function types */ typedef void (*qemu_irq_handler)(void *opaque, int n, int level); +typedef int (*SaveLiveCompletePrecopyThreadHandler)(char *idstr, + uint32_t instance_id, + bool *abort_flag, + void *opaque); #endif /* QEMU_TYPEDEFS_H */ diff --git a/migration/multifd-device-state.c b/migration/multifd-device-state.c index 7b34fe736c7f..9b364e8ef33c 100644 --- a/migration/multifd-device-state.c +++ b/migration/multifd-device-state.c @@ -9,12 +9,17 @@ #include "qemu/osdep.h" #include "qemu/lockable.h" +#include "block/thread-pool.h" #include "migration/misc.h" #include "multifd.h" #include "options.h" static QemuMutex queue_job_mutex; +ThreadPool *send_threads; +int send_threads_ret; +bool send_threads_abort; + static MultiFDSendData *device_state_send; size_t multifd_device_state_payload_size(void) @@ -27,6 +32,10 @@ void multifd_device_state_save_setup(void) qemu_mutex_init(&queue_job_mutex); device_state_send = multifd_send_data_alloc(); + + send_threads = thread_pool_new(NULL); + send_threads_ret = 0; + send_threads_abort = false; } void multifd_device_state_clear(MultiFDDeviceState_t *device_state) @@ -37,6 +46,7 @@ void multifd_device_state_clear(MultiFDDeviceState_t *device_state) void multifd_device_state_save_cleanup(void) { + g_clear_pointer(&send_threads, thread_pool_free); g_clear_pointer(&device_state_send, multifd_send_data_free); qemu_mutex_destroy(&queue_job_mutex); @@ -104,3 +114,80 @@ bool migration_has_device_state_support(void) return migrate_multifd() && !migrate_mapped_ram() && migrate_multifd_compression() == MULTIFD_COMPRESSION_NONE; } + +static void multifd_device_state_save_thread_complete(void *opaque, int ret) +{ + if (ret && !send_threads_ret) { + send_threads_ret = ret; + } +} + +struct MultiFDDSSaveThreadData { + SaveLiveCompletePrecopyThreadHandler hdlr; + char *idstr; + uint32_t instance_id; + void *opaque; +}; + +static void multifd_device_state_save_thread_data_free(void *opaque) +{ + struct MultiFDDSSaveThreadData *data = opaque; + + g_clear_pointer(&data->idstr, g_free); + g_free(data); +} + +static int multifd_device_state_save_thread(void *opaque) +{ + struct MultiFDDSSaveThreadData *data = opaque; + + return data->hdlr(data->idstr, data->instance_id, &send_threads_abort, + data->opaque); +} + +void +multifd_spawn_device_state_save_thread(SaveLiveCompletePrecopyThreadHandler hdlr, + char *idstr, uint32_t instance_id, + void *opaque) +{ + struct MultiFDDSSaveThreadData *data; + + assert(migration_has_device_state_support()); + + data = g_new(struct MultiFDDSSaveThreadData, 1); + data->hdlr = hdlr; + data->idstr = g_strdup(idstr); + data->instance_id = instance_id; + data->opaque = opaque; + + thread_pool_submit(send_threads, + multifd_device_state_save_thread, + data, multifd_device_state_save_thread_data_free, + multifd_device_state_save_thread_complete, NULL); +} + +void multifd_launch_device_state_save_threads(int max_count) +{ + assert(migration_has_device_state_support()); + + thread_pool_set_minmax_threads(send_threads, + 0, max_count); + + thread_pool_poll(send_threads); +} + +void multifd_abort_device_state_save_threads(void) +{ + assert(migration_has_device_state_support()); + + qatomic_set(&send_threads_abort, true); +} + +int multifd_join_device_state_save_threads(void) +{ + 
assert(migration_has_device_state_support()); + + thread_pool_join(send_threads); + + return send_threads_ret; +} diff --git a/migration/savevm.c b/migration/savevm.c index 33c9200d1e78..a70f6ed006f2 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1495,6 +1495,7 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy) int64_t start_ts_each, end_ts_each; SaveStateEntry *se; int ret; + bool multifd_device_state = migration_has_device_state_support(); QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { if (!se->ops || (in_postcopy && se->ops->has_postcopy && @@ -1517,6 +1518,27 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy) } } + if (multifd_device_state) { + int thread_count = 0; + + QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { + SaveLiveCompletePrecopyThreadHandler hdlr; + + if (!se->ops || (in_postcopy && se->ops->has_postcopy && + se->ops->has_postcopy(se->opaque)) || + !se->ops->save_live_complete_precopy_thread) { + continue; + } + + hdlr = se->ops->save_live_complete_precopy_thread; + multifd_spawn_device_state_save_thread(hdlr, + se->idstr, se->instance_id, + se->opaque); + thread_count++; + } + multifd_launch_device_state_save_threads(thread_count); + } + QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { if (!se->ops || (in_postcopy && se->ops->has_postcopy && @@ -1541,13 +1563,21 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy) save_section_footer(f, se); if (ret < 0) { qemu_file_set_error(f, ret); - return -1; + goto ret_fail_abort_threads; } end_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME); trace_vmstate_downtime_save("iterable", se->idstr, se->instance_id, end_ts_each - start_ts_each); } + if (multifd_device_state) { + ret = multifd_join_device_state_save_threads(); + if (ret) { + qemu_file_set_error(f, ret); + return -1; + } + } + QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { if (!se->ops || (in_postcopy && se->ops->has_postcopy && se->ops->has_postcopy(se->opaque)) || @@ -1565,6 +1595,14 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy) trace_vmstate_downtime_checkpoint("src-iterable-saved"); return 0; + +ret_fail_abort_threads: + if (multifd_device_state) { + multifd_abort_device_state_save_threads(); + multifd_join_device_state_save_threads(); + } + + return -1; } int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f, From patchwork Tue Aug 27 17:54:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. 
Szmigiero" X-Patchwork-Id: 13779934 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B1C61C54731 for ; Tue, 27 Aug 2024 17:57:22 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0RF-00030C-TS; Tue, 27 Aug 2024 13:57:05 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0QZ-0007xy-Vi for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:24 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0QX-0001da-8j for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:23 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0QN-0002Q1-Tw; Tue, 27 Aug 2024 19:56:11 +0200 From: "Maciej S. Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 15/17] vfio/migration: Multifd device state transfer support - receive side Date: Tue, 27 Aug 2024 19:54:34 +0200 Message-ID: <4133ce80174fa3b81070adaeeb068554beba2854.1724701542.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" The multifd received data needs to be reassembled since device state packets sent via different multifd channels can arrive out-of-order. Therefore, each VFIO device state packet carries a header indicating its position in the stream. The last such VFIO device state packet should have VFIO_DEVICE_STATE_CONFIG_STATE flag set and carry the device config state. Since it's important to finish loading device state transferred via the main migration channel (via save_live_iterate handler) before starting loading the data asynchronously transferred via multifd a new VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE flag is introduced to mark the end of the main migration channel data. The device state loading process waits until that flag is seen before commencing loading of the multifd-transferred device state. Signed-off-by: Maciej S. 
Szmigiero --- hw/vfio/migration.c | 338 +++++++++++++++++++++++++++++++++- hw/vfio/pci.c | 2 + hw/vfio/trace-events | 9 +- include/hw/vfio/vfio-common.h | 17 ++ 4 files changed, 362 insertions(+), 4 deletions(-) diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 24679d8c5034..57c1542528dc 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -15,6 +15,7 @@ #include #include +#include "io/channel-buffer.h" #include "sysemu/runstate.h" #include "hw/vfio/vfio-common.h" #include "migration/misc.h" @@ -47,6 +48,7 @@ #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL) #define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL) #define VFIO_MIG_FLAG_DEV_INIT_DATA_SENT (0xffffffffef100005ULL) +#define VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE (0xffffffffef100006ULL) /* * This is an arbitrary size based on migration of mlx5 devices, where typically @@ -55,6 +57,15 @@ */ #define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB) +#define VFIO_DEVICE_STATE_CONFIG_STATE (1) + +typedef struct VFIODeviceStatePacket { + uint32_t version; + uint32_t idx; + uint32_t flags; + uint8_t data[0]; +} QEMU_PACKED VFIODeviceStatePacket; + static int64_t bytes_transferred; static const char *mig_state_to_str(enum vfio_device_mig_state state) @@ -254,6 +265,188 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev, return ret; } +typedef struct LoadedBuffer { + bool is_present; + char *data; + size_t len; +} LoadedBuffer; + +static void loaded_buffer_clear(gpointer data) +{ + LoadedBuffer *lb = data; + + if (!lb->is_present) { + return; + } + + g_clear_pointer(&lb->data, g_free); + lb->is_present = false; +} + +static int vfio_load_state_buffer(void *opaque, char *data, size_t data_size, + Error **errp) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + VFIODeviceStatePacket *packet = (VFIODeviceStatePacket *)data; + QEMU_LOCK_GUARD(&migration->load_bufs_mutex); + LoadedBuffer *lb; + + if (data_size < sizeof(*packet)) { + error_setg(errp, "packet too short at %zu (min is %zu)", + data_size, sizeof(*packet)); + return -1; + } + + if (packet->version != 0) { + error_setg(errp, "packet has unknown version %" PRIu32, + packet->version); + return -1; + } + + if (packet->idx == UINT32_MAX) { + error_setg(errp, "packet has too high idx %" PRIu32, + packet->idx); + return -1; + } + + trace_vfio_load_state_device_buffer_incoming(vbasedev->name, packet->idx); + + /* config state packet should be the last one in the stream */ + if (packet->flags & VFIO_DEVICE_STATE_CONFIG_STATE) { + migration->load_buf_idx_last = packet->idx; + } + + assert(migration->load_bufs); + if (packet->idx >= migration->load_bufs->len) { + g_array_set_size(migration->load_bufs, packet->idx + 1); + } + + lb = &g_array_index(migration->load_bufs, typeof(*lb), packet->idx); + if (lb->is_present) { + error_setg(errp, "state buffer %" PRIu32 " already filled", packet->idx); + return -1; + } + + assert(packet->idx >= migration->load_buf_idx); + + migration->load_buf_queued_pending_buffers++; + if (migration->load_buf_queued_pending_buffers > + vbasedev->migration_max_queued_buffers) { + error_setg(errp, + "queuing state buffer %" PRIu32 " would exceed the max of %" PRIu64, + packet->idx, vbasedev->migration_max_queued_buffers); + return -1; + } + + lb->data = g_memdup2(&packet->data, data_size - sizeof(*packet)); + lb->len = data_size - sizeof(*packet); + lb->is_present = true; + + qemu_cond_broadcast(&migration->load_bufs_buffer_ready_cond); + + return 0; +} + +static void 
*vfio_load_bufs_thread(void *opaque) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + Error **errp = &migration->load_bufs_thread_errp; + g_autoptr(QemuLockable) locker = qemu_lockable_auto_lock( + QEMU_MAKE_LOCKABLE(&migration->load_bufs_mutex)); + LoadedBuffer *lb; + + while (!migration->load_bufs_device_ready && + !migration->load_bufs_thread_want_exit) { + qemu_cond_wait(&migration->load_bufs_device_ready_cond, &migration->load_bufs_mutex); + } + + while (!migration->load_bufs_thread_want_exit) { + bool starved; + ssize_t ret; + + assert(migration->load_buf_idx <= migration->load_buf_idx_last); + + if (migration->load_buf_idx >= migration->load_bufs->len) { + assert(migration->load_buf_idx == migration->load_bufs->len); + starved = true; + } else { + lb = &g_array_index(migration->load_bufs, typeof(*lb), migration->load_buf_idx); + starved = !lb->is_present; + } + + if (starved) { + trace_vfio_load_state_device_buffer_starved(vbasedev->name, migration->load_buf_idx); + qemu_cond_wait(&migration->load_bufs_buffer_ready_cond, &migration->load_bufs_mutex); + continue; + } + + if (migration->load_buf_idx == migration->load_buf_idx_last) { + break; + } + + if (migration->load_buf_idx == 0) { + trace_vfio_load_state_device_buffer_start(vbasedev->name); + } + + if (lb->len) { + g_autofree char *buf = NULL; + size_t buf_len; + int errno_save; + + trace_vfio_load_state_device_buffer_load_start(vbasedev->name, + migration->load_buf_idx); + + /* lb might become re-allocated when we drop the lock */ + buf = g_steal_pointer(&lb->data); + buf_len = lb->len; + + /* Loading data to the device takes a while, drop the lock during this process */ + qemu_mutex_unlock(&migration->load_bufs_mutex); + ret = write(migration->data_fd, buf, buf_len); + errno_save = errno; + qemu_mutex_lock(&migration->load_bufs_mutex); + + if (ret < 0) { + error_setg(errp, "write to state buffer %" PRIu32 " failed with %d", + migration->load_buf_idx, errno_save); + break; + } else if (ret < buf_len) { + error_setg(errp, "write to state buffer %" PRIu32 " incomplete %zd / %zu", + migration->load_buf_idx, ret, buf_len); + break; + } + + trace_vfio_load_state_device_buffer_load_end(vbasedev->name, + migration->load_buf_idx); + } + + assert(migration->load_buf_queued_pending_buffers > 0); + migration->load_buf_queued_pending_buffers--; + + if (migration->load_buf_idx == migration->load_buf_idx_last - 1) { + trace_vfio_load_state_device_buffer_end(vbasedev->name); + } + + migration->load_buf_idx++; + } + + if (migration->load_bufs_thread_want_exit && + !*errp) { + error_setg(errp, "load bufs thread asked to quit"); + } + + g_clear_pointer(&locker, qemu_lockable_auto_unlock); + + qemu_loadvm_load_finish_ready_lock(); + migration->load_bufs_thread_finished = true; + qemu_loadvm_load_finish_ready_broadcast(); + qemu_loadvm_load_finish_ready_unlock(); + + return NULL; +} + static int vfio_save_device_config_state(QEMUFile *f, void *opaque, Error **errp) { @@ -285,6 +478,8 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque) VFIODevice *vbasedev = opaque; uint64_t data; + trace_vfio_load_device_config_state_start(vbasedev->name); + if (vbasedev->ops && vbasedev->ops->vfio_load_config) { int ret; @@ -303,7 +498,7 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque) return -EINVAL; } - trace_vfio_load_device_config_state(vbasedev->name); + trace_vfio_load_device_config_state_end(vbasedev->name); return qemu_file_get_error(f); } @@ -687,16 +882,70 @@ static void 
vfio_save_state(QEMUFile *f, void *opaque) static int vfio_load_setup(QEMUFile *f, void *opaque, Error **errp) { VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + int ret; + + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING, + vbasedev->migration->device_state, errp); + if (ret) { + return ret; + } + + assert(!migration->load_bufs); + migration->load_bufs = g_array_new(FALSE, TRUE, sizeof(LoadedBuffer)); + g_array_set_clear_func(migration->load_bufs, loaded_buffer_clear); + + qemu_mutex_init(&migration->load_bufs_mutex); + + migration->load_bufs_device_ready = false; + qemu_cond_init(&migration->load_bufs_device_ready_cond); + + migration->load_buf_idx = 0; + migration->load_buf_idx_last = UINT32_MAX; + migration->load_buf_queued_pending_buffers = 0; + qemu_cond_init(&migration->load_bufs_buffer_ready_cond); + + migration->config_state_loaded_to_dev = false; + + assert(!migration->load_bufs_thread_started); - return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING, - vbasedev->migration->device_state, errp); + migration->load_bufs_thread_finished = false; + migration->load_bufs_thread_want_exit = false; + qemu_thread_create(&migration->load_bufs_thread, "vfio-load-bufs", + vfio_load_bufs_thread, opaque, QEMU_THREAD_JOINABLE); + + migration->load_bufs_thread_started = true; + + return 0; } static int vfio_load_cleanup(void *opaque) { VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + + if (migration->load_bufs_thread_started) { + qemu_mutex_lock(&migration->load_bufs_mutex); + migration->load_bufs_thread_want_exit = true; + qemu_mutex_unlock(&migration->load_bufs_mutex); + + qemu_cond_broadcast(&migration->load_bufs_device_ready_cond); + qemu_cond_broadcast(&migration->load_bufs_buffer_ready_cond); + + qemu_thread_join(&migration->load_bufs_thread); + + assert(migration->load_bufs_thread_finished); + + migration->load_bufs_thread_started = false; + } vfio_migration_cleanup(vbasedev); + + g_clear_pointer(&migration->load_bufs, g_array_unref); + qemu_cond_destroy(&migration->load_bufs_buffer_ready_cond); + qemu_cond_destroy(&migration->load_bufs_device_ready_cond); + qemu_mutex_destroy(&migration->load_bufs_mutex); + trace_vfio_load_cleanup(vbasedev->name); return 0; @@ -705,6 +954,7 @@ static int vfio_load_cleanup(void *opaque) static int vfio_load_state(QEMUFile *f, void *opaque, int version_id) { VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; int ret = 0; uint64_t data; @@ -716,6 +966,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id) switch (data) { case VFIO_MIG_FLAG_DEV_CONFIG_STATE: { + migration->config_state_loaded_to_dev = true; return vfio_load_device_config_state(f, opaque); } case VFIO_MIG_FLAG_DEV_SETUP_STATE: @@ -742,6 +993,15 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id) } break; } + case VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE: + { + QEMU_LOCK_GUARD(&migration->load_bufs_mutex); + + migration->load_bufs_device_ready = true; + qemu_cond_broadcast(&migration->load_bufs_device_ready_cond); + + break; + } case VFIO_MIG_FLAG_DEV_INIT_DATA_SENT: { if (!vfio_precopy_supported(vbasedev) || @@ -774,6 +1034,76 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id) return ret; } +static int vfio_load_finish(void *opaque, bool *is_finished, Error **errp) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + g_autoptr(QemuLockable) locker = NULL; + 
LoadedBuffer *lb; + g_autoptr(QIOChannelBuffer) bioc = NULL; + QEMUFile *f_out = NULL, *f_in = NULL; + uint64_t mig_header; + int ret; + + if (migration->config_state_loaded_to_dev) { + *is_finished = true; + return 0; + } + + if (!migration->load_bufs_thread_finished) { + assert(migration->load_bufs_thread_started); + *is_finished = false; + return 0; + } + + if (migration->load_bufs_thread_errp) { + error_propagate(errp, g_steal_pointer(&migration->load_bufs_thread_errp)); + return -1; + } + + locker = qemu_lockable_auto_lock(QEMU_MAKE_LOCKABLE(&migration->load_bufs_mutex)); + + assert(migration->load_buf_idx == migration->load_buf_idx_last); + lb = &g_array_index(migration->load_bufs, typeof(*lb), migration->load_buf_idx); + assert(lb->is_present); + + bioc = qio_channel_buffer_new(lb->len); + qio_channel_set_name(QIO_CHANNEL(bioc), "vfio-device-config-load"); + + f_out = qemu_file_new_output(QIO_CHANNEL(bioc)); + qemu_put_buffer(f_out, (uint8_t *)lb->data, lb->len); + + ret = qemu_fflush(f_out); + if (ret) { + error_setg(errp, "load device config state file flush failed with %d", ret); + g_clear_pointer(&f_out, qemu_fclose); + return -1; + } + + qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL); + f_in = qemu_file_new_input(QIO_CHANNEL(bioc)); + + mig_header = qemu_get_be64(f_in); + if (mig_header != VFIO_MIG_FLAG_DEV_CONFIG_STATE) { + error_setg(errp, "load device config state invalid header %"PRIu64, mig_header); + g_clear_pointer(&f_out, qemu_fclose); + g_clear_pointer(&f_in, qemu_fclose); + return -1; + } + + ret = vfio_load_device_config_state(f_in, opaque); + g_clear_pointer(&f_out, qemu_fclose); + g_clear_pointer(&f_in, qemu_fclose); + if (ret < 0) { + error_setg(errp, "load device config state failed with %d", ret); + return -1; + } + + migration->config_state_loaded_to_dev = true; + *is_finished = true; + return 0; +} + static bool vfio_switchover_ack_needed(void *opaque) { VFIODevice *vbasedev = opaque; @@ -794,6 +1124,8 @@ static const SaveVMHandlers savevm_vfio_handlers = { .load_setup = vfio_load_setup, .load_cleanup = vfio_load_cleanup, .load_state = vfio_load_state, + .load_state_buffer = vfio_load_state_buffer, + .load_finish = vfio_load_finish, .switchover_ack_needed = vfio_switchover_ack_needed, }; diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 2407720c3530..08cb56d27a05 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -3378,6 +3378,8 @@ static Property vfio_pci_dev_properties[] = { VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false), DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice, vbasedev.enable_migration, ON_OFF_AUTO_AUTO), + DEFINE_PROP_UINT64("x-migration-max-queued-buffers", VFIOPCIDevice, + vbasedev.migration_max_queued_buffers, UINT64_MAX), DEFINE_PROP_BOOL("migration-events", VFIOPCIDevice, vbasedev.migration_events, false), DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false), diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index 013c602f30fa..9d2519a28a7e 100644 --- a/hw/vfio/trace-events +++ b/hw/vfio/trace-events @@ -149,9 +149,16 @@ vfio_display_edid_write_error(void) "" # migration.c vfio_load_cleanup(const char *name) " (%s)" -vfio_load_device_config_state(const char *name) " (%s)" +vfio_load_device_config_state_start(const char *name) " (%s)" +vfio_load_device_config_state_end(const char *name) " (%s)" vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64 vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d" 
+vfio_load_state_device_buffer_incoming(const char *name, uint32_t idx) " (%s) idx %"PRIu32 +vfio_load_state_device_buffer_start(const char *name) " (%s)" +vfio_load_state_device_buffer_starved(const char *name, uint32_t idx) " (%s) idx %"PRIu32 +vfio_load_state_device_buffer_load_start(const char *name, uint32_t idx) " (%s) idx %"PRIu32 +vfio_load_state_device_buffer_load_end(const char *name, uint32_t idx) " (%s) idx %"PRIu32 +vfio_load_state_device_buffer_end(const char *name) " (%s)" vfio_migration_realize(const char *name) " (%s)" vfio_migration_set_device_state(const char *name, const char *state) " (%s) state %s" vfio_migration_set_state(const char *name, const char *new_state, const char *recover_state) " (%s) new state %s, recover state %s" diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 32d58e3e025b..ba5b9464e79a 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -76,6 +76,22 @@ typedef struct VFIOMigration { bool save_iterate_run; bool save_iterate_empty_hit; + + QemuThread load_bufs_thread; + Error *load_bufs_thread_errp; + bool load_bufs_thread_started; + bool load_bufs_thread_finished; + bool load_bufs_thread_want_exit; + + GArray *load_bufs; + bool load_bufs_device_ready; + QemuCond load_bufs_device_ready_cond; + QemuCond load_bufs_buffer_ready_cond; + QemuMutex load_bufs_mutex; + uint32_t load_buf_idx; + uint32_t load_buf_idx_last; + uint32_t load_buf_queued_pending_buffers; + bool config_state_loaded_to_dev; } VFIOMigration; struct VFIOGroup; @@ -134,6 +150,7 @@ typedef struct VFIODevice { bool ram_block_discard_allowed; OnOffAuto enable_migration; bool migration_events; + uint64_t migration_max_queued_buffers; VFIODeviceOps *ops; unsigned int num_irqs; unsigned int num_regions;
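The receive-side flow added above is easier to see outside of the diff context, so here is a minimal, self-contained C sketch of the ordering rule it implements: device state packets tagged with an idx may arrive out of order over the multifd channels, are parked in a per-idx slot (load_bufs), and are written to the device strictly in ascending idx order, with the config state packet known to come last. This is illustration only, not QEMU code; the SketchPacket/slots names are invented, and where the sketch simply skips a missing buffer, the real vfio_load_bufs_thread() waits on load_bufs_buffer_ready_cond instead.

    /*
     * Standalone sketch of the buffer ordering used by the receive side.
     * Not QEMU code; names are invented for illustration.
     */
    #include <inttypes.h>
    #include <stdio.h>

    enum { SKETCH_FLAG_CONFIG_STATE = 1 };

    typedef struct SketchPacket {      /* mirrors the packet header: version, idx, flags */
        uint32_t version;
        uint32_t idx;
        uint32_t flags;
        const char *payload;           /* stands in for the trailing data[] */
    } SketchPacket;

    int main(void)
    {
        /* Arrival order is arbitrary, as if delivered by independent channels. */
        SketchPacket arrived[] = {
            { 0, 2, 0, "device chunk 2" },
            { 0, 0, 0, "device chunk 0" },
            { 0, 3, SKETCH_FLAG_CONFIG_STATE, "config state" },
            { 0, 1, 0, "device chunk 1" },
        };
        const char *slots[16] = { 0 };   /* plays the role of load_bufs */
        uint32_t idx_last = UINT32_MAX;  /* like load_buf_idx_last */
        size_t i;
        uint32_t idx;

        /* Receive side: park each packet in its idx slot as it arrives. */
        for (i = 0; i < sizeof(arrived) / sizeof(arrived[0]); i++) {
            slots[arrived[i].idx] = arrived[i].payload;
            if (arrived[i].flags & SKETCH_FLAG_CONFIG_STATE) {
                idx_last = arrived[i].idx;   /* config state packet marks the end */
            }
        }

        /* Loader side: consume strictly in idx order; a missing slot means
         * "starved", i.e. the real thread sleeps on a condition variable. */
        for (idx = 0; idx < idx_last; idx++) {
            if (!slots[idx]) {
                printf("idx %" PRIu32 " not received yet, would wait\n", idx);
                continue;
            }
            printf("loading idx %" PRIu32 ": %s\n", idx, slots[idx]);
        }
        printf("loading config state last (idx %" PRIu32 "): %s\n",
               idx_last, slots[idx_last]);
        return 0;
    }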
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 16/17] vfio/migration: Add x-migration-multifd-transfer VFIO property Date: Tue, 27 Aug 2024 19:54:35 +0200 Message-ID: <49b797459ace788bce2de757df3a722c3678cb0c.1724701542.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" This property allows configuring at runtime whether to send the particular device state via multifd channels when live migrating that device. It is ignored on the receive side and defaults to "false" for bit stream compatibility with older QEMU versions. Signed-off-by: Maciej S. Szmigiero --- hw/vfio/pci.c | 7 +++++++ include/hw/vfio/vfio-common.h | 1 + 2 files changed, 8 insertions(+) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 08cb56d27a05..b68f08ba8a4f 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -3354,6 +3354,8 @@ static void vfio_instance_init(Object *obj) pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS; } +static PropertyInfo qdev_prop_bool_mutable; + static Property vfio_pci_dev_properties[] = { DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host), DEFINE_PROP_UUID_NODEFAULT("vf-token", VFIOPCIDevice, vf_token), @@ -3378,6 +3380,8 @@ static Property vfio_pci_dev_properties[] = { VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false), DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice, vbasedev.enable_migration, ON_OFF_AUTO_AUTO), + DEFINE_PROP("x-migration-multifd-transfer", VFIOPCIDevice, + vbasedev.migration_multifd_transfer, qdev_prop_bool_mutable, bool), DEFINE_PROP_UINT64("x-migration-max-queued-buffers", VFIOPCIDevice, vbasedev.migration_max_queued_buffers, UINT64_MAX), DEFINE_PROP_BOOL("migration-events", VFIOPCIDevice, @@ -3477,6 +3481,9 @@ static const TypeInfo vfio_pci_nohotplug_dev_info = { static void register_vfio_pci_dev_type(void) { + qdev_prop_bool_mutable = qdev_prop_bool; + qdev_prop_bool_mutable.realized_set_allowed = true; + type_register_static(&vfio_pci_dev_info); type_register_static(&vfio_pci_nohotplug_dev_info); } diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index ba5b9464e79a..fe05acb9a5d1 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -149,6 +149,7 @@ typedef struct VFIODevice { bool no_mmap; bool ram_block_discard_allowed; OnOffAuto enable_migration; + bool migration_multifd_transfer; bool migration_events; uint64_t migration_max_queued_buffers; VFIODeviceOps *ops; From patchwork Tue Aug 27 17:54:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. 
Szmigiero" X-Patchwork-Id: 13779947 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3D453C5472F for ; Tue, 27 Aug 2024 17:58:07 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sj0RP-0003YW-RE; Tue, 27 Aug 2024 13:57:17 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Qj-00006n-0W for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:33 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sj0Qg-0001eo-PZ for qemu-devel@nongnu.org; Tue, 27 Aug 2024 13:56:32 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sj0QY-0002Qh-Ai; Tue, 27 Aug 2024 19:56:22 +0200 From: "Maciej S. Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v2 17/17] vfio/migration: Multifd device state transfer support - send side Date: Tue, 27 Aug 2024 19:54:36 +0200 Message-ID: <1429fc59079d99ca035b31303892d807868dc6c0.1724701542.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" Implement the multifd device state transfer via additional per-device thread inside save_live_complete_precopy_thread handler. Switch between doing the data transfer in the new handler and doing it in the old save_state handler depending on the x-migration-multifd-transfer device property value. Signed-off-by: Maciej S. 
Szmigiero --- hw/vfio/migration.c | 169 ++++++++++++++++++++++++++++++++++ hw/vfio/trace-events | 2 + include/hw/vfio/vfio-common.h | 1 + 3 files changed, 172 insertions(+) diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 57c1542528dc..67996aa2df8b 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -655,6 +655,16 @@ static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp) uint64_t stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE; int ret; + /* Make a copy of this setting at the start in case it is changed mid-migration */ + migration->multifd_transfer = vbasedev->migration_multifd_transfer; + + if (migration->multifd_transfer && !migration_has_device_state_support()) { + error_setg(errp, + "%s: Multifd device transfer requested but unsupported in the current config", + vbasedev->name); + return -EINVAL; + } + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE); vfio_query_stop_copy_size(vbasedev, &stop_copy_size); @@ -835,10 +845,20 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque) static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) { VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; ssize_t data_size; int ret; Error *local_err = NULL; + if (migration->multifd_transfer) { + /* + * Emit dummy NOP data, vfio_save_complete_precopy_thread() + * does the actual transfer. + */ + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); + return 0; + } + trace_vfio_save_complete_precopy_started(vbasedev->name); /* We reach here with device state STOP or STOP_COPY only */ @@ -864,12 +884,159 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) return ret; } +static int vfio_save_complete_precopy_async_thread_config_state(VFIODevice *vbasedev, + char *idstr, + uint32_t instance_id, + uint32_t idx) +{ + g_autoptr(QIOChannelBuffer) bioc = NULL; + QEMUFile *f = NULL; + int ret; + g_autofree VFIODeviceStatePacket *packet = NULL; + size_t packet_len; + + bioc = qio_channel_buffer_new(0); + qio_channel_set_name(QIO_CHANNEL(bioc), "vfio-device-config-save"); + + f = qemu_file_new_output(QIO_CHANNEL(bioc)); + + ret = vfio_save_device_config_state(f, vbasedev, NULL); + if (ret) { + return ret; + } + + ret = qemu_fflush(f); + if (ret) { + goto ret_close_file; + } + + packet_len = sizeof(*packet) + bioc->usage; + packet = g_malloc0(packet_len); + packet->idx = idx; + packet->flags = VFIO_DEVICE_STATE_CONFIG_STATE; + memcpy(&packet->data, bioc->data, bioc->usage); + + if (!multifd_queue_device_state(idstr, instance_id, + (char *)packet, packet_len)) { + ret = -1; + } + + bytes_transferred += packet_len; + +ret_close_file: + g_clear_pointer(&f, qemu_fclose); + return ret; +} + +static int vfio_save_complete_precopy_thread(char *idstr, + uint32_t instance_id, + bool *abort_flag, + void *opaque) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + int ret; + g_autofree VFIODeviceStatePacket *packet = NULL; + uint32_t idx; + + if (!migration->multifd_transfer) { + /* Nothing to do, vfio_save_complete_precopy() does the transfer. 
*/ + return 0; + } + + trace_vfio_save_complete_precopy_thread_started(vbasedev->name, + idstr, instance_id); + + /* We reach here with device state STOP or STOP_COPY only */ + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY, + VFIO_DEVICE_STATE_STOP, NULL); + if (ret) { + goto ret_finish; + } + + packet = g_malloc0(sizeof(*packet) + migration->data_buffer_size); + + for (idx = 0; ; idx++) { + ssize_t data_size; + size_t packet_size; + + if (qatomic_read(abort_flag)) { + ret = -ECANCELED; + goto ret_finish; + } + + data_size = read(migration->data_fd, &packet->data, + migration->data_buffer_size); + if (data_size < 0) { + if (errno != ENOMSG) { + ret = -errno; + goto ret_finish; + } + + /* + * Pre-copy emptied all the device state for now. For more information, + * please refer to the Linux kernel VFIO uAPI. + */ + data_size = 0; + } + + if (data_size == 0) + break; + + packet->idx = idx; + packet_size = sizeof(*packet) + data_size; + + if (!multifd_queue_device_state(idstr, instance_id, + (char *)packet, packet_size)) { + ret = -1; + goto ret_finish; + } + + bytes_transferred += packet_size; + } + + ret = vfio_save_complete_precopy_async_thread_config_state(vbasedev, idstr, + instance_id, + idx); + +ret_finish: + trace_vfio_save_complete_precopy_thread_finished(vbasedev->name, ret); + + return ret; +} + +static int vfio_save_complete_precopy_begin(QEMUFile *f, + char *idstr, uint32_t instance_id, + void *opaque) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + + if (!migration->multifd_transfer) { + /* Emit dummy NOP data */ + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); + return 0; + } + + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE); + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); + + return qemu_fflush(f); +} + static void vfio_save_state(QEMUFile *f, void *opaque) { VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; Error *local_err = NULL; int ret; + if (migration->multifd_transfer) { + /* Emit dummy NOP data */ + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); + return; + } + ret = vfio_save_device_config_state(f, opaque, &local_err); if (ret) { error_prepend(&local_err, @@ -1119,7 +1286,9 @@ static const SaveVMHandlers savevm_vfio_handlers = { .state_pending_exact = vfio_state_pending_exact, .is_active_iterate = vfio_is_active_iterate, .save_live_iterate = vfio_save_iterate, + .save_live_complete_precopy_begin = vfio_save_complete_precopy_begin, .save_live_complete_precopy = vfio_save_complete_precopy, + .save_live_complete_precopy_thread = vfio_save_complete_precopy_thread, .save_state = vfio_save_state, .load_setup = vfio_load_setup, .load_cleanup = vfio_load_cleanup, diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index 9d2519a28a7e..b1d9c9d5f2e1 100644 --- a/hw/vfio/trace-events +++ b/hw/vfio/trace-events @@ -167,6 +167,8 @@ vfio_save_block(const char *name, int data_size) " (%s) data_size %d" vfio_save_cleanup(const char *name) " (%s)" vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d" vfio_save_complete_precopy_started(const char *name) " (%s)" +vfio_save_complete_precopy_thread_started(const char *name, const char *idstr, uint32_t instance_id) " (%s) idstr %s instance %"PRIu32 +vfio_save_complete_precopy_thread_finished(const char *name, int ret) " (%s) ret %d" vfio_save_device_config_state(const char *name) " (%s)" vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy initial size 0x%"PRIx64" 
precopy dirty size 0x%"PRIx64 vfio_save_iterate_started(const char *name) " (%s)" diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index fe05acb9a5d1..4578a0ca6a5c 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -72,6 +72,7 @@ typedef struct VFIOMigration { uint64_t mig_flags; uint64_t precopy_init_size; uint64_t precopy_dirty_size; + bool multifd_transfer; bool initial_data_sent; bool save_iterate_run;