From patchwork Fri Jun 30 05:22:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297599 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DEFD5EB64D7 for ; Fri, 30 Jun 2023 05:25:07 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bG-0008E0-PR; Fri, 30 Jun 2023 01:23:18 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6b1-0008Ah-QU for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:04 -0400 Received: from mail.ozlabs.org ([2404:9400:2221:ea00::3] helo=gandalf.ozlabs.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6at-0003Ha-FY for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:22:58 -0400 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskGt6NQRz4wgk; Fri, 30 Jun 2023 15:22:46 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskGr0T6Nz4wZy; Fri, 30 Jun 2023 15:22:43 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , Avihai Horon , Peter Xu , Markus Armbruster , YangHang Liu , =?utf-8?q?C=C3=A9dric_Le_Goater?= Subject: [PULL 01/16] migration: Add switchover ack capability Date: Fri, 30 Jun 2023 07:22:20 +0200 Message-ID: <20230630052235.1934154-2-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2404:9400:2221:ea00::3; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -39 X-Spam_score: -4.0 X-Spam_bar: ---- X-Spam_report: (-4.0 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Avihai Horon Migration downtime estimation is calculated based on bandwidth and remaining migration data. This assumes that loading of migration data in the destination takes a negligible amount of time and that downtime depends only on network speed. While this may be true for RAM, it's not necessarily true for other migrated devices. For example, loading the data of a VFIO device in the destination might require from the device to allocate resources, prepare internal data structures and so on. These operations can take a significant amount of time which can increase migration downtime. This patch adds a new capability "switchover ack" that prevents the source from stopping the VM and completing the migration until an ACK is received from the destination that it's OK to do so. This can be used by migrated devices in various ways to reduce downtime. For example, a device can send initial precopy metadata to pre-allocate resources in the destination and use this capability to make sure that the pre-allocation is completed before the source VM is stopped, so it will have full effect. This new capability relies on the return path capability to communicate from the destination back to the source. The actual implementation of the capability will be added in the following patches. Signed-off-by: Avihai Horon Reviewed-by: Peter Xu Acked-by: Markus Armbruster Tested-by: YangHang Liu Acked-by: Alex Williamson Signed-off-by: Cédric Le Goater --- qapi/migration.json | 12 +++++++++++- migration/options.h | 1 + migration/options.c | 21 +++++++++++++++++++++ 3 files changed, 33 insertions(+), 1 deletion(-) diff --git a/qapi/migration.json b/qapi/migration.json index 5bb5ab82a0cf..47dfef02780f 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -487,6 +487,16 @@ # and should not affect the correctness of postcopy migration. # (since 7.1) # +# @switchover-ack: If enabled, migration will not stop the source VM +# and complete the migration until an ACK is received from the +# destination that it's OK to do so. Exactly when this ACK is +# sent depends on the migrated devices that use this feature. +# For example, a device can use it to make sure some of its data +# is sent and loaded in the destination before doing switchover. +# This can reduce downtime if devices that support this capability +# are present. 'return-path' capability must be enabled to use +# it. (since 8.1) +# # Features: # # @unstable: Members @x-colo and @x-ignore-shared are experimental. @@ -502,7 +512,7 @@ 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, 'validate-uuid', 'background-snapshot', - 'zero-copy-send', 'postcopy-preempt'] } + 'zero-copy-send', 'postcopy-preempt', 'switchover-ack'] } ## # @MigrationCapabilityStatus: diff --git a/migration/options.h b/migration/options.h index 45991af3c208..9aaf363322b4 100644 --- a/migration/options.h +++ b/migration/options.h @@ -40,6 +40,7 @@ bool migrate_postcopy_ram(void); bool migrate_rdma_pin_all(void); bool migrate_release_ram(void); bool migrate_return_path(void); +bool migrate_switchover_ack(void); bool migrate_validate_uuid(void); bool migrate_xbzrle(void); bool migrate_zero_blocks(void); diff --git a/migration/options.c b/migration/options.c index b62ab30cd585..16007afca662 100644 --- a/migration/options.c +++ b/migration/options.c @@ -185,6 +185,8 @@ Property migration_properties[] = { DEFINE_PROP_MIG_CAP("x-zero-copy-send", MIGRATION_CAPABILITY_ZERO_COPY_SEND), #endif + DEFINE_PROP_MIG_CAP("x-switchover-ack", + MIGRATION_CAPABILITY_SWITCHOVER_ACK), DEFINE_PROP_END_OF_LIST(), }; @@ -308,6 +310,13 @@ bool migrate_return_path(void) return s->capabilities[MIGRATION_CAPABILITY_RETURN_PATH]; } +bool migrate_switchover_ack(void) +{ + MigrationState *s = migrate_get_current(); + + return s->capabilities[MIGRATION_CAPABILITY_SWITCHOVER_ACK]; +} + bool migrate_validate_uuid(void) { MigrationState *s = migrate_get_current(); @@ -547,6 +556,18 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp) } } + if (new_caps[MIGRATION_CAPABILITY_SWITCHOVER_ACK]) { + if (!new_caps[MIGRATION_CAPABILITY_RETURN_PATH]) { + error_setg(errp, "Capability 'switchover-ack' requires capability " + "'return-path'"); + return false; + } + + /* Disable this capability until it's implemented */ + error_setg(errp, "'switchover-ack' is not implemented yet"); + return false; + } + return true; } From patchwork Fri Jun 30 05:22:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297600 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9363FEB64DD for ; Fri, 30 Jun 2023 05:25:09 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bR-0008GT-Cq; Fri, 30 Jun 2023 01:23:29 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6b1-0008Ag-QK for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:04 -0400 Received: from gandalf.ozlabs.org ([150.107.74.76]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6at-0003NU-FV for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:22:59 -0400 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskGy02ZJz4wp2; Fri, 30 Jun 2023 15:22:50 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskGv3HWtz4wZy; Fri, 30 Jun 2023 15:22:47 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , Avihai Horon , Peter Xu , YangHang Liu , =?utf-8?q?C=C3=A9dric_Le_Goater?= Subject: [PULL 02/16] migration: Implement switchover ack logic Date: Fri, 30 Jun 2023 07:22:21 +0200 Message-ID: <20230630052235.1934154-3-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=150.107.74.76; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -16 X-Spam_score: -1.7 X-Spam_bar: - X-Spam_report: (-1.7 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Avihai Horon Implement switchover ack logic. This prevents the source from stopping the VM and completing the migration until an ACK is received from the destination that it's OK to do so. To achieve this, a new SaveVMHandlers handler switchover_ack_needed() and a new return path message MIG_RP_MSG_SWITCHOVER_ACK are added. The switchover_ack_needed() handler is called during migration setup in the destination to check if switchover ack is used by the migrated device. When switchover is approved by all migrated devices in the destination that support this capability, the MIG_RP_MSG_SWITCHOVER_ACK return path message is sent to the source to notify it that it's OK to do switchover. Signed-off-by: Avihai Horon Reviewed-by: Peter Xu Tested-by: YangHang Liu Acked-by: Alex Williamson Signed-off-by: Cédric Le Goater --- include/migration/register.h | 2 ++ migration/migration.h | 14 ++++++++++ migration/savevm.h | 1 + migration/migration.c | 32 +++++++++++++++++++-- migration/savevm.c | 54 ++++++++++++++++++++++++++++++++++++ migration/trace-events | 3 ++ 6 files changed, 104 insertions(+), 2 deletions(-) diff --git a/include/migration/register.h b/include/migration/register.h index a8dfd8fefd0a..90914f32f50c 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -71,6 +71,8 @@ typedef struct SaveVMHandlers { int (*load_cleanup)(void *opaque); /* Called when postcopy migration wants to resume from failure */ int (*resume_prepare)(MigrationState *s, void *opaque); + /* Checks if switchover ack should be used. Called only in dest */ + bool (*switchover_ack_needed)(void *opaque); } SaveVMHandlers; int register_savevm_live(const char *idstr, diff --git a/migration/migration.h b/migration/migration.h index 30c3e97635b1..c859a0d35eb7 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -210,6 +210,13 @@ struct MigrationIncomingState { * contains valid information. */ QemuMutex page_request_mutex; + + /* + * Number of devices that have yet to approve switchover. When this reaches + * zero an ACK that it's OK to do switchover is sent to the source. No lock + * is needed as this field is updated serially. + */ + unsigned int switchover_ack_pending_num; }; MigrationIncomingState *migration_incoming_get_current(void); @@ -440,6 +447,12 @@ struct MigrationState { /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. */ JSONWriter *vmdesc; + + /* + * Indicates whether an ACK from the destination that it's OK to do + * switchover has been received. + */ + bool switchover_acked; }; void migrate_set_state(int *state, int old_state, int new_state); @@ -480,6 +493,7 @@ int migrate_send_rp_message_req_pages(MigrationIncomingState *mis, void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis, char *block_name); void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value); +int migrate_send_rp_switchover_ack(MigrationIncomingState *mis); void dirty_bitmap_mig_before_vm_start(void); void dirty_bitmap_mig_cancel_outgoing(void); diff --git a/migration/savevm.h b/migration/savevm.h index fb636735f0af..e894bbc14331 100644 --- a/migration/savevm.h +++ b/migration/savevm.h @@ -65,6 +65,7 @@ int qemu_loadvm_state(QEMUFile *f); void qemu_loadvm_state_cleanup(void); int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis); int qemu_load_device_state(QEMUFile *f); +int qemu_loadvm_approve_switchover(void); int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f, bool in_postcopy, bool inactivate_disks); diff --git a/migration/migration.c b/migration/migration.c index dc05c6f6ea94..7653787f745d 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -78,6 +78,7 @@ enum mig_rp_message_type { MIG_RP_MSG_REQ_PAGES, /* data (start: be64, len: be32) */ MIG_RP_MSG_RECV_BITMAP, /* send recved_bitmap back to source */ MIG_RP_MSG_RESUME_ACK, /* tell source that we are ready to resume */ + MIG_RP_MSG_SWITCHOVER_ACK, /* Tell source it's OK to do switchover */ MIG_RP_MSG_MAX }; @@ -760,6 +761,11 @@ bool migration_has_all_channels(void) return true; } +int migrate_send_rp_switchover_ack(MigrationIncomingState *mis) +{ + return migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, NULL); +} + /* * Send a 'SHUT' message on the return channel with the given value * to indicate that we've finished with the RP. Non-0 value indicates @@ -1405,6 +1411,7 @@ void migrate_init(MigrationState *s) s->vm_old_state = -1; s->iteration_initial_bytes = 0; s->threshold_size = 0; + s->switchover_acked = false; } int migrate_add_blocker_internal(Error *reason, Error **errp) @@ -1721,6 +1728,7 @@ static struct rp_cmd_args { [MIG_RP_MSG_REQ_PAGES_ID] = { .len = -1, .name = "REQ_PAGES_ID" }, [MIG_RP_MSG_RECV_BITMAP] = { .len = -1, .name = "RECV_BITMAP" }, [MIG_RP_MSG_RESUME_ACK] = { .len = 4, .name = "RESUME_ACK" }, + [MIG_RP_MSG_SWITCHOVER_ACK] = { .len = 0, .name = "SWITCHOVER_ACK" }, [MIG_RP_MSG_MAX] = { .len = -1, .name = "MAX" }, }; @@ -1959,6 +1967,11 @@ retry: } break; + case MIG_RP_MSG_SWITCHOVER_ACK: + ms->switchover_acked = true; + trace_source_return_path_thread_switchover_acked(); + break; + default: break; } @@ -2693,6 +2706,20 @@ static void migration_update_counters(MigrationState *s, bandwidth, s->threshold_size); } +static bool migration_can_switchover(MigrationState *s) +{ + if (!migrate_switchover_ack()) { + return true; + } + + /* No reason to wait for switchover ACK if VM is stopped */ + if (!runstate_is_running()) { + return true; + } + + return s->switchover_acked; +} + /* Migration thread iteration status */ typedef enum { MIG_ITERATE_RESUME, /* Resume current iteration */ @@ -2708,6 +2735,7 @@ static MigIterateState migration_iteration_run(MigrationState *s) { uint64_t must_precopy, can_postcopy; bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE; + bool can_switchover = migration_can_switchover(s); qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy); uint64_t pending_size = must_precopy + can_postcopy; @@ -2720,14 +2748,14 @@ static MigIterateState migration_iteration_run(MigrationState *s) trace_migrate_pending_exact(pending_size, must_precopy, can_postcopy); } - if (!pending_size || pending_size < s->threshold_size) { + if ((!pending_size || pending_size < s->threshold_size) && can_switchover) { trace_migration_thread_low_pending(pending_size); migration_completion(s); return MIG_ITERATE_BREAK; } /* Still a significant amount to transfer */ - if (!in_postcopy && must_precopy <= s->threshold_size && + if (!in_postcopy && must_precopy <= s->threshold_size && can_switchover && qatomic_read(&s->start_postcopy)) { if (postcopy_start(s)) { error_report("%s: postcopy failed to start", __func__); diff --git a/migration/savevm.c b/migration/savevm.c index bc284087f9b9..cdf47939244d 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2360,6 +2360,21 @@ static int loadvm_process_command(QEMUFile *f) error_report("CMD_OPEN_RETURN_PATH failed"); return -1; } + + /* + * Switchover ack is enabled but no device uses it, so send an ACK to + * source that it's OK to switchover. Do it here, after return path has + * been created. + */ + if (migrate_switchover_ack() && !mis->switchover_ack_pending_num) { + int ret = migrate_send_rp_switchover_ack(mis); + if (ret) { + error_report( + "Could not send switchover ack RP MSG, err %d (%s)", ret, + strerror(-ret)); + return ret; + } + } break; case MIG_CMD_PING: @@ -2586,6 +2601,23 @@ static int qemu_loadvm_state_header(QEMUFile *f) return 0; } +static void qemu_loadvm_state_switchover_ack_needed(MigrationIncomingState *mis) +{ + SaveStateEntry *se; + + QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { + if (!se->ops || !se->ops->switchover_ack_needed) { + continue; + } + + if (se->ops->switchover_ack_needed(se->opaque)) { + mis->switchover_ack_pending_num++; + } + } + + trace_loadvm_state_switchover_ack_needed(mis->switchover_ack_pending_num); +} + static int qemu_loadvm_state_setup(QEMUFile *f) { SaveStateEntry *se; @@ -2789,6 +2821,10 @@ int qemu_loadvm_state(QEMUFile *f) return -EINVAL; } + if (migrate_switchover_ack()) { + qemu_loadvm_state_switchover_ack_needed(mis); + } + cpu_synchronize_all_pre_loadvm(); ret = qemu_loadvm_state_main(f, mis); @@ -2862,6 +2898,24 @@ int qemu_load_device_state(QEMUFile *f) return 0; } +int qemu_loadvm_approve_switchover(void) +{ + MigrationIncomingState *mis = migration_incoming_get_current(); + + if (!mis->switchover_ack_pending_num) { + return -EINVAL; + } + + mis->switchover_ack_pending_num--; + trace_loadvm_approve_switchover(mis->switchover_ack_pending_num); + + if (mis->switchover_ack_pending_num) { + return 0; + } + + return migrate_send_rp_switchover_ack(mis); +} + bool save_snapshot(const char *name, bool overwrite, const char *vmstate, bool has_devices, strList *devices, Error **errp) { diff --git a/migration/trace-events b/migration/trace-events index cdaef7a1ea41..5259c1044b08 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -7,6 +7,7 @@ qemu_loadvm_state_section_partend(uint32_t section_id) "%u" qemu_loadvm_state_post_main(int ret) "%d" qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u" qemu_savevm_send_packaged(void) "" +loadvm_state_switchover_ack_needed(unsigned int switchover_ack_pending_num) "Switchover ack pending num=%u" loadvm_state_setup(void) "" loadvm_state_cleanup(void) "" loadvm_handle_cmd_packaged(unsigned int length) "%u" @@ -23,6 +24,7 @@ loadvm_postcopy_ram_handle_discard_end(void) "" loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud" loadvm_process_command(const char *s, uint16_t len) "com=%s len=%d" loadvm_process_command_ping(uint32_t val) "0x%x" +loadvm_approve_switchover(unsigned int switchover_ack_pending_num) "Switchover ack pending num=%u" postcopy_ram_listen_thread_exit(void) "" postcopy_ram_listen_thread_start(void) "" qemu_savevm_send_postcopy_advise(void) "" @@ -180,6 +182,7 @@ source_return_path_thread_loop_top(void) "" source_return_path_thread_pong(uint32_t val) "0x%x" source_return_path_thread_shut(uint32_t val) "0x%x" source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32 +source_return_path_thread_switchover_acked(void) "" migration_thread_low_pending(uint64_t pending) "%" PRIu64 migrate_transferred(uint64_t tranferred, uint64_t time_spent, uint64_t bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " max_size %" PRId64 process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d" From patchwork Fri Jun 30 05:22:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297592 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 277E3EB64DA for ; Fri, 30 Jun 2023 05:23:57 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6b7-0008B9-Ak; Fri, 30 Jun 2023 01:23:11 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6av-0008A8-Mx for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:22:57 -0400 Received: from gandalf.ozlabs.org ([150.107.74.76]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6at-0003ax-FW for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:22:57 -0400 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskH12rHjz4wp4; Fri, 30 Jun 2023 15:22:53 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskGy433Hz4wZy; Fri, 30 Jun 2023 15:22:50 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , Avihai Horon , Juan Quintela , Peter Xu , YangHang Liu , =?utf-8?q?C=C3=A9dric_Le_Goater?= Subject: [PULL 03/16] migration: Enable switchover ack capability Date: Fri, 30 Jun 2023 07:22:22 +0200 Message-ID: <20230630052235.1934154-4-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=150.107.74.76; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -16 X-Spam_score: -1.7 X-Spam_bar: - X-Spam_report: (-1.7 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Avihai Horon Now that switchover ack logic has been implemented, enable the capability. Signed-off-by: Avihai Horon Reviewed-by: Juan Quintela Reviewed-by: Peter Xu Tested-by: YangHang Liu Acked-by: Alex Williamson Signed-off-by: Cédric Le Goater --- migration/options.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/migration/options.c b/migration/options.c index 16007afca662..5a9505adf7a8 100644 --- a/migration/options.c +++ b/migration/options.c @@ -562,10 +562,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp) "'return-path'"); return false; } - - /* Disable this capability until it's implemented */ - error_setg(errp, "'switchover-ack' is not implemented yet"); - return false; } return true; From patchwork Fri Jun 30 05:22:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297596 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AC002EB64D7 for ; Fri, 30 Jun 2023 05:24:09 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bG-0008Dm-KH; Fri, 30 Jun 2023 01:23:18 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6b3-0008B8-Sm for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:07 -0400 Received: from mail.ozlabs.org ([2404:9400:2221:ea00::3] helo=gandalf.ozlabs.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6b1-0003gM-Fm for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:05 -0400 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskH45cKGz4wp5; Fri, 30 Jun 2023 15:22:56 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskH16qkvz4wZy; Fri, 30 Jun 2023 15:22:53 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , Avihai Horon , Juan Quintela , Peter Xu , YangHang Liu , =?utf-8?q?C=C3=A9dric_Le_Goater?= Subject: [PULL 04/16] tests: Add migration switchover ack capability test Date: Fri, 30 Jun 2023 07:22:23 +0200 Message-ID: <20230630052235.1934154-5-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2404:9400:2221:ea00::3; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -39 X-Spam_score: -4.0 X-Spam_bar: ---- X-Spam_report: (-4.0 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Avihai Horon Add migration switchover ack capability test. The test runs without devices that support this capability, but is still useful to make sure it didn't break anything. Signed-off-by: Avihai Horon Reviewed-by: Juan Quintela Reviewed-by: Peter Xu Tested-by: YangHang Liu Acked-by: Alex Williamson Signed-off-by: Cédric Le Goater --- tests/qtest/migration-test.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index b0c355bbd9ca..b9cc194100b5 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -1693,6 +1693,33 @@ static void test_precopy_tcp_plain(void) test_precopy_common(&args); } +static void *test_migrate_switchover_ack_start(QTestState *from, QTestState *to) +{ + + migrate_set_capability(from, "return-path", true); + migrate_set_capability(to, "return-path", true); + + migrate_set_capability(from, "switchover-ack", true); + migrate_set_capability(to, "switchover-ack", true); + + return NULL; +} + +static void test_precopy_tcp_switchover_ack(void) +{ + MigrateCommon args = { + .listen_uri = "tcp:127.0.0.1:0", + .start_hook = test_migrate_switchover_ack_start, + /* + * Source VM must be running in order to consider the switchover ACK + * when deciding to do switchover or not. + */ + .live = true, + }; + + test_precopy_common(&args); +} + #ifdef CONFIG_GNUTLS static void test_precopy_tcp_tls_psk_match(void) { @@ -2737,6 +2764,10 @@ int main(int argc, char **argv) #endif /* CONFIG_GNUTLS */ qtest_add_func("/migration/precopy/tcp/plain", test_precopy_tcp_plain); + + qtest_add_func("/migration/precopy/tcp/plain/switchover-ack", + test_precopy_tcp_switchover_ack); + #ifdef CONFIG_GNUTLS qtest_add_func("/migration/precopy/tcp/tls/psk/match", test_precopy_tcp_tls_psk_match); From patchwork Fri Jun 30 05:22:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297598 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E319AEB64DA for ; Fri, 30 Jun 2023 05:25:07 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bR-0008GV-Fv; Fri, 30 Jun 2023 01:23:29 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6b3-0008B7-RF for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:07 -0400 Received: from gandalf.ozlabs.org ([150.107.74.76]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6b1-0003rS-GL for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:05 -0400 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskH76P99z4wp7; Fri, 30 Jun 2023 15:22:59 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskH52Vyqz4wZy; Fri, 30 Jun 2023 15:22:57 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , Avihai Horon , =?utf-8?q?C=C3=A9dric_Le_Goater?= , Juan Quintela , YangHang Liu Subject: [PULL 05/16] vfio/migration: Refactor vfio_save_block() to return saved data size Date: Fri, 30 Jun 2023 07:22:24 +0200 Message-ID: <20230630052235.1934154-6-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=150.107.74.76; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -16 X-Spam_score: -1.7 X-Spam_bar: - X-Spam_report: (-1.7 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Avihai Horon Refactor vfio_save_block() to return the size of saved data on success and -errno on error. This will be used in next patch to implement VFIO migration pre-copy support. Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Reviewed-by: Juan Quintela Tested-by: YangHang Liu Acked-by: Alex Williamson Signed-off-by: Cédric Le Goater --- hw/vfio/migration.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 6b58dddb8859..235978fd6805 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -241,8 +241,8 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev, return 0; } -/* Returns 1 if end-of-stream is reached, 0 if more data and -errno if error */ -static int vfio_save_block(QEMUFile *f, VFIOMigration *migration) +/* Returns the size of saved data on success and -errno on error */ +static ssize_t vfio_save_block(QEMUFile *f, VFIOMigration *migration) { ssize_t data_size; @@ -252,7 +252,7 @@ static int vfio_save_block(QEMUFile *f, VFIOMigration *migration) return -errno; } if (data_size == 0) { - return 1; + return 0; } qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE); @@ -262,7 +262,7 @@ static int vfio_save_block(QEMUFile *f, VFIOMigration *migration) trace_vfio_save_block(migration->vbasedev->name, data_size); - return qemu_file_get_error(f); + return qemu_file_get_error(f) ?: data_size; } /* ---------------------------------------------------------------------- */ @@ -335,6 +335,7 @@ static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy, static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) { VFIODevice *vbasedev = opaque; + ssize_t data_size; int ret; /* We reach here with device state STOP only */ @@ -345,11 +346,11 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) } do { - ret = vfio_save_block(f, vbasedev->migration); - if (ret < 0) { - return ret; + data_size = vfio_save_block(f, vbasedev->migration); + if (data_size < 0) { + return data_size; } - } while (!ret); + } while (data_size); qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); ret = qemu_file_get_error(f); From patchwork Fri Jun 30 05:22:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297593 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 15845EB64D7 for ; Fri, 30 Jun 2023 05:23:57 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bG-0008Dx-OZ; Fri, 30 Jun 2023 01:23:18 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6b5-0008BE-0X for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:07 -0400 Received: from gandalf.ozlabs.org ([150.107.74.76]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6b3-0003uX-16 for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:06 -0400 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskHB58vcz4wp8; Fri, 30 Jun 2023 15:23:02 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskH83HjXz4wZy; Fri, 30 Jun 2023 15:23:00 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , Avihai Horon , =?utf-8?q?C=C3=A9dric_Le_Goater?= , YangHang Liu Subject: [PULL 06/16] vfio/migration: Store VFIO migration flags in VFIOMigration Date: Fri, 30 Jun 2023 07:22:25 +0200 Message-ID: <20230630052235.1934154-7-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=150.107.74.76; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -16 X-Spam_score: -1.7 X-Spam_bar: - X-Spam_report: (-1.7 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Avihai Horon VFIO migration flags are queried once in vfio_migration_init(). Store them in VFIOMigration so they can be used later to check the device's migration capabilities without re-querying them. This will be used in the next patch to check if the device supports precopy migration. Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Tested-by: YangHang Liu Acked-by: Alex Williamson Signed-off-by: Cédric Le Goater --- include/hw/vfio/vfio-common.h | 1 + hw/vfio/migration.c | 1 + 2 files changed, 2 insertions(+) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index eed244f25f34..5f29dab83913 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -66,6 +66,7 @@ typedef struct VFIOMigration { int data_fd; void *data_buffer; size_t data_buffer_size; + uint64_t mig_flags; } VFIOMigration; typedef struct VFIOAddressSpace { diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 235978fd6805..8d3341437926 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -603,6 +603,7 @@ static int vfio_migration_init(VFIODevice *vbasedev) migration->vbasedev = vbasedev; migration->device_state = VFIO_DEVICE_STATE_RUNNING; migration->data_fd = -1; + migration->mig_flags = mig_flags; vbasedev->dirty_pages_supported = vfio_dma_logging_supported(vbasedev); From patchwork Fri Jun 30 05:22:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297603 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 86992EB64D7 for ; Fri, 30 Jun 2023 05:26:07 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bS-0008Hi-SQ; Fri, 30 Jun 2023 01:23:30 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6b9-0008DJ-Db for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:15 -0400 Received: from mail.ozlabs.org ([2404:9400:2221:ea00::3] helo=gandalf.ozlabs.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6b6-0003yQ-0G for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:10 -0400 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskHF3xH8z4wpD; Fri, 30 Jun 2023 15:23:05 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskHC23xrz4wZy; Fri, 30 Jun 2023 15:23:03 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , Avihai Horon , =?utf-8?q?C=C3=A9dric_Le_Goater?= , YangHang Liu Subject: [PULL 07/16] vfio/migration: Add VFIO migration pre-copy support Date: Fri, 30 Jun 2023 07:22:26 +0200 Message-ID: <20230630052235.1934154-8-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2404:9400:2221:ea00::3; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -39 X-Spam_score: -4.0 X-Spam_bar: ---- X-Spam_report: (-4.0 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Avihai Horon Pre-copy support allows the VFIO device data to be transferred while the VM is running. This helps to accommodate VFIO devices that have a large amount of data that needs to be transferred, and it can reduce migration downtime. Pre-copy support is optional in VFIO migration protocol v2. Implement pre-copy of VFIO migration protocol v2 and use it for devices that support it. Full description of it can be found in the following Linux commit: 4db52602a607 ("vfio: Extend the device migration protocol with PRE_COPY"). Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Tested-by: YangHang Liu Acked-by: Alex Williamson Signed-off-by: Cédric Le Goater --- docs/devel/vfio-migration.rst | 35 +++++--- include/hw/vfio/vfio-common.h | 2 + hw/vfio/common.c | 6 +- hw/vfio/migration.c | 165 ++++++++++++++++++++++++++++++++-- hw/vfio/trace-events | 4 +- 5 files changed, 190 insertions(+), 22 deletions(-) diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst index 1b68ccf11529..e896b2a6734b 100644 --- a/docs/devel/vfio-migration.rst +++ b/docs/devel/vfio-migration.rst @@ -7,12 +7,14 @@ the guest is running on source host and restoring this saved state on the destination host. This document details how saving and restoring of VFIO devices is done in QEMU. -Migration of VFIO devices currently consists of a single stop-and-copy phase. -During the stop-and-copy phase the guest is stopped and the entire VFIO device -data is transferred to the destination. - -The pre-copy phase of migration is currently not supported for VFIO devices. -Support for VFIO pre-copy will be added later on. +Migration of VFIO devices consists of two phases: the optional pre-copy phase, +and the stop-and-copy phase. The pre-copy phase is iterative and allows to +accommodate VFIO devices that have a large amount of data that needs to be +transferred. The iterative pre-copy phase of migration allows for the guest to +continue whilst the VFIO device state is transferred to the destination, this +helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy +support by reporting the VFIO_MIGRATION_PRE_COPY flag in the +VFIO_DEVICE_FEATURE_MIGRATION ioctl. Note that currently VFIO migration is supported only for a single device. This is due to VFIO migration's lack of P2P support. However, P2P support is planned @@ -29,10 +31,20 @@ VFIO implements the device hooks for the iterative approach as follows: * A ``load_setup`` function that sets the VFIO device on the destination in _RESUMING state. +* A ``state_pending_estimate`` function that reports an estimate of the + remaining pre-copy data that the vendor driver has yet to save for the VFIO + device. + * A ``state_pending_exact`` function that reads pending_bytes from the vendor driver, which indicates the amount of data that the vendor driver has yet to save for the VFIO device. +* An ``is_active_iterate`` function that indicates ``save_live_iterate`` is + active only when the VFIO device is in pre-copy states. + +* A ``save_live_iterate`` function that reads the VFIO device's data from the + vendor driver during iterative pre-copy phase. + * A ``save_state`` function to save the device config space if it is present. * A ``save_live_complete_precopy`` function that sets the VFIO device in @@ -111,8 +123,10 @@ Flow of state changes during Live migration =========================================== Below is the flow of state change during live migration. -The values in the brackets represent the VM state, the migration state, and +The values in the parentheses represent the VM state, the migration state, and the VFIO device state, respectively. +The text in the square brackets represents the flow if the VFIO device supports +pre-copy. Live migration save path ------------------------ @@ -124,11 +138,12 @@ Live migration save path | migrate_init spawns migration_thread Migration thread then calls each device's .save_setup() - (RUNNING, _SETUP, _RUNNING) + (RUNNING, _SETUP, _RUNNING [_PRE_COPY]) | - (RUNNING, _ACTIVE, _RUNNING) - If device is active, get pending_bytes by .state_pending_exact() + (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY]) + If device is active, get pending_bytes by .state_pending_{estimate,exact}() If total pending_bytes >= threshold_size, call .save_live_iterate() + [Data of VFIO device for pre-copy phase is copied] Iterate till total pending bytes converge and are less than threshold | On migration completion, vCPU stops and calls .save_live_complete_precopy for diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 5f29dab83913..1db901c1941f 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -67,6 +67,8 @@ typedef struct VFIOMigration { void *data_buffer; size_t data_buffer_size; uint64_t mig_flags; + uint64_t precopy_init_size; + uint64_t precopy_dirty_size; } VFIOMigration; typedef struct VFIOAddressSpace { diff --git a/hw/vfio/common.c b/hw/vfio/common.c index fa8fd949b1cf..25801de173c4 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -492,7 +492,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container) } if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF && - migration->device_state == VFIO_DEVICE_STATE_RUNNING) { + (migration->device_state == VFIO_DEVICE_STATE_RUNNING || + migration->device_state == VFIO_DEVICE_STATE_PRE_COPY)) { return false; } } @@ -537,7 +538,8 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container) return false; } - if (migration->device_state == VFIO_DEVICE_STATE_RUNNING) { + if (migration->device_state == VFIO_DEVICE_STATE_RUNNING || + migration->device_state == VFIO_DEVICE_STATE_PRE_COPY) { continue; } else { return false; diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 8d3341437926..d8f6a22ae14e 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -68,6 +68,8 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state) return "STOP_COPY"; case VFIO_DEVICE_STATE_RESUMING: return "RESUMING"; + case VFIO_DEVICE_STATE_PRE_COPY: + return "PRE_COPY"; default: return "UNKNOWN STATE"; } @@ -241,6 +243,25 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev, return 0; } +static int vfio_query_precopy_size(VFIOMigration *migration) +{ + struct vfio_precopy_info precopy = { + .argsz = sizeof(precopy), + }; + + migration->precopy_init_size = 0; + migration->precopy_dirty_size = 0; + + if (ioctl(migration->data_fd, VFIO_MIG_GET_PRECOPY_INFO, &precopy)) { + return -errno; + } + + migration->precopy_init_size = precopy.initial_bytes; + migration->precopy_dirty_size = precopy.dirty_bytes; + + return 0; +} + /* Returns the size of saved data on success and -errno on error */ static ssize_t vfio_save_block(QEMUFile *f, VFIOMigration *migration) { @@ -249,6 +270,14 @@ static ssize_t vfio_save_block(QEMUFile *f, VFIOMigration *migration) data_size = read(migration->data_fd, migration->data_buffer, migration->data_buffer_size); if (data_size < 0) { + /* + * Pre-copy emptied all the device state for now. For more information, + * please refer to the Linux kernel VFIO uAPI. + */ + if (errno == ENOMSG) { + return 0; + } + return -errno; } if (data_size == 0) { @@ -265,6 +294,38 @@ static ssize_t vfio_save_block(QEMUFile *f, VFIOMigration *migration) return qemu_file_get_error(f) ?: data_size; } +static void vfio_update_estimated_pending_data(VFIOMigration *migration, + uint64_t data_size) +{ + if (!data_size) { + /* + * Pre-copy emptied all the device state for now, update estimated sizes + * accordingly. + */ + migration->precopy_init_size = 0; + migration->precopy_dirty_size = 0; + + return; + } + + if (migration->precopy_init_size) { + uint64_t init_size = MIN(migration->precopy_init_size, data_size); + + migration->precopy_init_size -= init_size; + data_size -= init_size; + } + + migration->precopy_dirty_size -= MIN(migration->precopy_dirty_size, + data_size); +} + +static bool vfio_precopy_supported(VFIODevice *vbasedev) +{ + VFIOMigration *migration = vbasedev->migration; + + return migration->mig_flags & VFIO_MIGRATION_PRE_COPY; +} + /* ---------------------------------------------------------------------- */ static int vfio_save_setup(QEMUFile *f, void *opaque) @@ -285,6 +346,28 @@ static int vfio_save_setup(QEMUFile *f, void *opaque) return -ENOMEM; } + if (vfio_precopy_supported(vbasedev)) { + int ret; + + switch (migration->device_state) { + case VFIO_DEVICE_STATE_RUNNING: + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_PRE_COPY, + VFIO_DEVICE_STATE_RUNNING); + if (ret) { + return ret; + } + + vfio_query_precopy_size(migration); + + break; + case VFIO_DEVICE_STATE_STOP: + /* vfio_save_complete_precopy() will go to STOP_COPY */ + break; + default: + return -EINVAL; + } + } + trace_vfio_save_setup(vbasedev->name, migration->data_buffer_size); qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); @@ -299,26 +382,42 @@ static void vfio_save_cleanup(void *opaque) g_free(migration->data_buffer); migration->data_buffer = NULL; + migration->precopy_init_size = 0; + migration->precopy_dirty_size = 0; vfio_migration_cleanup(vbasedev); trace_vfio_save_cleanup(vbasedev->name); } +static void vfio_state_pending_estimate(void *opaque, uint64_t *must_precopy, + uint64_t *can_postcopy) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + + if (migration->device_state != VFIO_DEVICE_STATE_PRE_COPY) { + return; + } + + *must_precopy += + migration->precopy_init_size + migration->precopy_dirty_size; + + trace_vfio_state_pending_estimate(vbasedev->name, *must_precopy, + *can_postcopy, + migration->precopy_init_size, + migration->precopy_dirty_size); +} + /* * Migration size of VFIO devices can be as little as a few KBs or as big as * many GBs. This value should be big enough to cover the worst case. */ #define VFIO_MIG_STOP_COPY_SIZE (100 * GiB) -/* - * Only exact function is implemented and not estimate function. The reason is - * that during pre-copy phase of migration the estimate function is called - * repeatedly while pending RAM size is over the threshold, thus migration - * can't converge and querying the VFIO device pending data size is useless. - */ static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy, uint64_t *can_postcopy) { VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE; /* @@ -328,8 +427,48 @@ static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy, vfio_query_stop_copy_size(vbasedev, &stop_copy_size); *must_precopy += stop_copy_size; + if (migration->device_state == VFIO_DEVICE_STATE_PRE_COPY) { + vfio_query_precopy_size(migration); + + *must_precopy += + migration->precopy_init_size + migration->precopy_dirty_size; + } + trace_vfio_state_pending_exact(vbasedev->name, *must_precopy, *can_postcopy, - stop_copy_size); + stop_copy_size, migration->precopy_init_size, + migration->precopy_dirty_size); +} + +static bool vfio_is_active_iterate(void *opaque) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + + return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY; +} + +static int vfio_save_iterate(QEMUFile *f, void *opaque) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + ssize_t data_size; + + data_size = vfio_save_block(f, migration); + if (data_size < 0) { + return data_size; + } + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); + + vfio_update_estimated_pending_data(migration, data_size); + + trace_vfio_save_iterate(vbasedev->name, migration->precopy_init_size, + migration->precopy_dirty_size); + + /* + * A VFIO device's pre-copy dirty_bytes is not guaranteed to reach zero. + * Return 1 so following handlers will not be potentially blocked. + */ + return 1; } static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) @@ -338,7 +477,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) ssize_t data_size; int ret; - /* We reach here with device state STOP only */ + /* We reach here with device state STOP or STOP_COPY only */ ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY, VFIO_DEVICE_STATE_STOP); if (ret) { @@ -457,7 +596,10 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id) static const SaveVMHandlers savevm_vfio_handlers = { .save_setup = vfio_save_setup, .save_cleanup = vfio_save_cleanup, + .state_pending_estimate = vfio_state_pending_estimate, .state_pending_exact = vfio_state_pending_exact, + .is_active_iterate = vfio_is_active_iterate, + .save_live_iterate = vfio_save_iterate, .save_live_complete_precopy = vfio_save_complete_precopy, .save_state = vfio_save_state, .load_setup = vfio_load_setup, @@ -470,13 +612,18 @@ static const SaveVMHandlers savevm_vfio_handlers = { static void vfio_vmstate_change(void *opaque, bool running, RunState state) { VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; enum vfio_device_mig_state new_state; int ret; if (running) { new_state = VFIO_DEVICE_STATE_RUNNING; } else { - new_state = VFIO_DEVICE_STATE_STOP; + new_state = + (migration->device_state == VFIO_DEVICE_STATE_PRE_COPY && + (state == RUN_STATE_FINISH_MIGRATE || state == RUN_STATE_PAUSED)) ? + VFIO_DEVICE_STATE_STOP_COPY : + VFIO_DEVICE_STATE_STOP; } /* diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index cfb60c354de3..e328d644d289 100644 --- a/hw/vfio/trace-events +++ b/hw/vfio/trace-events @@ -162,6 +162,8 @@ vfio_save_block(const char *name, int data_size) " (%s) data_size %d" vfio_save_cleanup(const char *name) " (%s)" vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d" vfio_save_device_config_state(const char *name) " (%s)" +vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64 vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64 -vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64 +vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64 +vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64 vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s" From patchwork Fri Jun 30 05:22:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297605 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C7643EB64D7 for ; Fri, 30 Jun 2023 05:27:55 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bR-0008GA-6b; Fri, 30 Jun 2023 01:23:29 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bB-0008DK-Og for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:15 -0400 Received: from mail.ozlabs.org ([2404:9400:2221:ea00::3] helo=gandalf.ozlabs.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6b9-00046I-5o for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:13 -0400 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskHJ2gznz4wpF; Fri, 30 Jun 2023 15:23:08 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskHG0p9Vz4wZy; Fri, 30 Jun 2023 15:23:05 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , Avihai Horon , =?utf-8?q?C=C3=A9dric_Le_Goater?= , YangHang Liu Subject: [PULL 08/16] vfio/migration: Add support for switchover ack capability Date: Fri, 30 Jun 2023 07:22:27 +0200 Message-ID: <20230630052235.1934154-9-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2404:9400:2221:ea00::3; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -39 X-Spam_score: -4.0 X-Spam_bar: ---- X-Spam_report: (-4.0 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Avihai Horon Loading of a VFIO device's data can take a substantial amount of time as the device may need to allocate resources, prepare internal data structures, etc. This can increase migration downtime, especially for VFIO devices with a lot of resources. To solve this, VFIO migration uAPI defines "initial bytes" as part of its precopy data stream. Initial bytes can be used in various ways to improve VFIO migration performance. For example, it can be used to transfer device metadata to pre-allocate resources in the destination. However, for this to work we need to make sure that all initial bytes are sent and loaded in the destination before the source VM is stopped. Use migration switchover ack capability to make sure a VFIO device's initial bytes are sent and loaded in the destination before the source stops the VM and attempts to complete the migration. This can significantly reduce migration downtime for some devices. Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Tested-by: YangHang Liu Acked-by: Alex Williamson Signed-off-by: Cédric Le Goater --- docs/devel/vfio-migration.rst | 10 +++++++++ include/hw/vfio/vfio-common.h | 1 + hw/vfio/migration.c | 39 ++++++++++++++++++++++++++++++++++- 3 files changed, 49 insertions(+), 1 deletion(-) diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst index e896b2a6734b..b433cb5bb2c8 100644 --- a/docs/devel/vfio-migration.rst +++ b/docs/devel/vfio-migration.rst @@ -16,6 +16,13 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy support by reporting the VFIO_MIGRATION_PRE_COPY flag in the VFIO_DEVICE_FEATURE_MIGRATION ioctl. +When pre-copy is supported, it's possible to further reduce downtime by +enabling "switchover-ack" migration capability. +VFIO migration uAPI defines "initial bytes" as part of its pre-copy data stream +and recommends that the initial bytes are sent and loaded in the destination +before stopping the source VM. Enabling this migration capability will +guarantee that and thus, can potentially reduce downtime even further. + Note that currently VFIO migration is supported only for a single device. This is due to VFIO migration's lack of P2P support. However, P2P support is planned to be added later on. @@ -45,6 +52,9 @@ VFIO implements the device hooks for the iterative approach as follows: * A ``save_live_iterate`` function that reads the VFIO device's data from the vendor driver during iterative pre-copy phase. +* A ``switchover_ack_needed`` function that checks if the VFIO device uses + "switchover-ack" migration capability when this capability is enabled. + * A ``save_state`` function to save the device config space if it is present. * A ``save_live_complete_precopy`` function that sets the VFIO device in diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 1db901c1941f..3dc5f2104c86 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -69,6 +69,7 @@ typedef struct VFIOMigration { uint64_t mig_flags; uint64_t precopy_init_size; uint64_t precopy_dirty_size; + bool initial_data_sent; } VFIOMigration; typedef struct VFIOAddressSpace { diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index d8f6a22ae14e..acbf0bb7ab3c 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -18,6 +18,8 @@ #include "sysemu/runstate.h" #include "hw/vfio/vfio-common.h" #include "migration/migration.h" +#include "migration/options.h" +#include "migration/savevm.h" #include "migration/vmstate.h" #include "migration/qemu-file.h" #include "migration/register.h" @@ -45,6 +47,7 @@ #define VFIO_MIG_FLAG_DEV_CONFIG_STATE (0xffffffffef100002ULL) #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL) #define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL) +#define VFIO_MIG_FLAG_DEV_INIT_DATA_SENT (0xffffffffef100005ULL) /* * This is an arbitrary size based on migration of mlx5 devices, where typically @@ -384,6 +387,7 @@ static void vfio_save_cleanup(void *opaque) migration->data_buffer = NULL; migration->precopy_init_size = 0; migration->precopy_dirty_size = 0; + migration->initial_data_sent = false; vfio_migration_cleanup(vbasedev); trace_vfio_save_cleanup(vbasedev->name); } @@ -457,10 +461,17 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque) if (data_size < 0) { return data_size; } - qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); vfio_update_estimated_pending_data(migration, data_size); + if (migrate_switchover_ack() && !migration->precopy_init_size && + !migration->initial_data_sent) { + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_INIT_DATA_SENT); + migration->initial_data_sent = true; + } else { + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); + } + trace_vfio_save_iterate(vbasedev->name, migration->precopy_init_size, migration->precopy_dirty_size); @@ -579,6 +590,24 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id) } break; } + case VFIO_MIG_FLAG_DEV_INIT_DATA_SENT: + { + if (!vfio_precopy_supported(vbasedev) || + !migrate_switchover_ack()) { + error_report("%s: Received INIT_DATA_SENT but switchover ack " + "is not used", vbasedev->name); + return -EINVAL; + } + + ret = qemu_loadvm_approve_switchover(); + if (ret) { + error_report( + "%s: qemu_loadvm_approve_switchover failed, err=%d (%s)", + vbasedev->name, ret, strerror(-ret)); + } + + return ret; + } default: error_report("%s: Unknown tag 0x%"PRIx64, vbasedev->name, data); return -EINVAL; @@ -593,6 +622,13 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id) return ret; } +static bool vfio_switchover_ack_needed(void *opaque) +{ + VFIODevice *vbasedev = opaque; + + return vfio_precopy_supported(vbasedev); +} + static const SaveVMHandlers savevm_vfio_handlers = { .save_setup = vfio_save_setup, .save_cleanup = vfio_save_cleanup, @@ -605,6 +641,7 @@ static const SaveVMHandlers savevm_vfio_handlers = { .load_setup = vfio_load_setup, .load_cleanup = vfio_load_cleanup, .load_state = vfio_load_state, + .switchover_ack_needed = vfio_switchover_ack_needed, }; /* ---------------------------------------------------------------------- */ From patchwork Fri Jun 30 05:22:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297597 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 10D46EB64D7 for ; Fri, 30 Jun 2023 05:24:27 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bU-0008Iz-NS; Fri, 30 Jun 2023 01:23:32 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bG-0008Du-Jc for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:18 -0400 Received: from mail.ozlabs.org ([2404:9400:2221:ea00::3] helo=gandalf.ozlabs.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bD-0004LM-AQ for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:18 -0400 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskHN0wbSz4wpJ; Fri, 30 Jun 2023 15:23:12 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskHJ6hLPz4wZy; Fri, 30 Jun 2023 15:23:08 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , =?utf-8?q?Philippe_Mathieu-Da?= =?utf-8?q?ud=C3=A9?= , =?utf-8?q?C=C3=A9dric_Le_Goater?= , Robin Voetter Subject: [PULL 09/16] vfio: Implement a common device info helper Date: Fri, 30 Jun 2023 07:22:28 +0200 Message-ID: <20230630052235.1934154-10-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2404:9400:2221:ea00::3; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -39 X-Spam_score: -4.0 X-Spam_bar: ---- X-Spam_report: (-4.0 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Alex Williamson A common helper implementing the realloc algorithm for handling capabilities. Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Cédric Le Goater Signed-off-by: Alex Williamson Reviewed-by: Robin Voetter Signed-off-by: Cédric Le Goater --- include/hw/vfio/vfio-common.h | 1 + hw/s390x/s390-pci-vfio.c | 37 ++++------------------------ hw/vfio/common.c | 46 ++++++++++++++++++++++++++--------- 3 files changed, 41 insertions(+), 43 deletions(-) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 3dc5f2104c86..6d1b8487c374 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -216,6 +216,7 @@ void vfio_region_finalize(VFIORegion *region); void vfio_reset_handler(void *opaque); VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp); void vfio_put_group(VFIOGroup *group); +struct vfio_device_info *vfio_get_device_info(int fd); int vfio_get_device(VFIOGroup *group, const char *name, VFIODevice *vbasedev, Error **errp); diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c index f51190d4662f..59a2e03873bd 100644 --- a/hw/s390x/s390-pci-vfio.c +++ b/hw/s390x/s390-pci-vfio.c @@ -289,38 +289,11 @@ static void s390_pci_read_pfip(S390PCIBusDevice *pbdev, memcpy(pbdev->zpci_fn.pfip, cap->pfip, CLP_PFIP_NR_SEGMENTS); } -static struct vfio_device_info *get_device_info(S390PCIBusDevice *pbdev, - uint32_t argsz) +static struct vfio_device_info *get_device_info(S390PCIBusDevice *pbdev) { - struct vfio_device_info *info = g_malloc0(argsz); - VFIOPCIDevice *vfio_pci; - int fd; + VFIOPCIDevice *vfio_pci = container_of(pbdev->pdev, VFIOPCIDevice, pdev); - vfio_pci = container_of(pbdev->pdev, VFIOPCIDevice, pdev); - fd = vfio_pci->vbasedev.fd; - - /* - * If the specified argsz is not large enough to contain all capabilities - * it will be updated upon return from the ioctl. Retry until we have - * a big enough buffer to hold the entire capability chain. On error, - * just exit and rely on CLP defaults. - */ -retry: - info->argsz = argsz; - - if (ioctl(fd, VFIO_DEVICE_GET_INFO, info)) { - trace_s390_pci_clp_dev_info(vfio_pci->vbasedev.name); - g_free(info); - return NULL; - } - - if (info->argsz > argsz) { - argsz = info->argsz; - info = g_realloc(info, argsz); - goto retry; - } - - return info; + return vfio_get_device_info(vfio_pci->vbasedev.fd); } /* @@ -335,7 +308,7 @@ bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh) assert(fh); - info = get_device_info(pbdev, sizeof(*info)); + info = get_device_info(pbdev); if (!info) { return false; } @@ -356,7 +329,7 @@ void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { g_autofree struct vfio_device_info *info = NULL; - info = get_device_info(pbdev, sizeof(*info)); + info = get_device_info(pbdev); if (!info) { return; } diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 25801de173c4..28ec9e999c09 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -2846,11 +2846,35 @@ void vfio_put_group(VFIOGroup *group) } } +struct vfio_device_info *vfio_get_device_info(int fd) +{ + struct vfio_device_info *info; + uint32_t argsz = sizeof(*info); + + info = g_malloc0(argsz); + +retry: + info->argsz = argsz; + + if (ioctl(fd, VFIO_DEVICE_GET_INFO, info)) { + g_free(info); + return NULL; + } + + if (info->argsz > argsz) { + argsz = info->argsz; + info = g_realloc(info, argsz); + goto retry; + } + + return info; +} + int vfio_get_device(VFIOGroup *group, const char *name, VFIODevice *vbasedev, Error **errp) { - struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) }; - int ret, fd; + g_autofree struct vfio_device_info *info = NULL; + int fd; fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name); if (fd < 0) { @@ -2862,11 +2886,11 @@ int vfio_get_device(VFIOGroup *group, const char *name, return fd; } - ret = ioctl(fd, VFIO_DEVICE_GET_INFO, &dev_info); - if (ret) { + info = vfio_get_device_info(fd); + if (!info) { error_setg_errno(errp, errno, "error getting device info"); close(fd); - return ret; + return -1; } /* @@ -2894,14 +2918,14 @@ int vfio_get_device(VFIOGroup *group, const char *name, vbasedev->group = group; QLIST_INSERT_HEAD(&group->device_list, vbasedev, next); - vbasedev->num_irqs = dev_info.num_irqs; - vbasedev->num_regions = dev_info.num_regions; - vbasedev->flags = dev_info.flags; + vbasedev->num_irqs = info->num_irqs; + vbasedev->num_regions = info->num_regions; + vbasedev->flags = info->flags; + + trace_vfio_get_device(name, info->flags, info->num_regions, info->num_irqs); - trace_vfio_get_device(name, dev_info.flags, dev_info.num_regions, - dev_info.num_irqs); + vbasedev->reset_works = !!(info->flags & VFIO_DEVICE_FLAGS_RESET); - vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET); return 0; } From patchwork Fri Jun 30 05:22:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297601 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 874B1EB64DA for ; Fri, 30 Jun 2023 05:25:11 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bV-0008JG-NU; Fri, 30 Jun 2023 01:23:33 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bI-0008Ee-0J for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:20 -0400 Received: from mail.ozlabs.org ([2404:9400:2221:ea00::3] helo=gandalf.ozlabs.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bE-0004SY-PI for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:19 -0400 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskHQ2nwSz4wpM; Fri, 30 Jun 2023 15:23:14 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskHN4wXrz4wZy; Fri, 30 Jun 2023 15:23:12 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_Goat?= =?utf-8?q?er?= Subject: [PULL 10/16] hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques Date: Fri, 30 Jun 2023 07:22:29 +0200 Message-ID: <20230630052235.1934154-11-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2404:9400:2221:ea00::3; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -39 X-Spam_score: -4.0 X-Spam_bar: ---- X-Spam_report: (-4.0 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Alex Williamson NVIDIA Turing and newer GPUs implement the MSI-X capability at the offset previously reserved for use by hypervisors to implement the GPUDirect Cliques capability. A revised specification provides an alternate location. Add a config space walk to the quirk to check for conflicts, allowing us to fall back to the new location or generate an error at the quirk setup rather than when the real conflicting capability is added should there be no available location. Signed-off-by: Alex Williamson Reviewed-by: Cédric Le Goater Signed-off-by: Cédric Le Goater --- hw/vfio/pci-quirks.c | 41 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c index f0147a050aaa..0ed2fcd53152 100644 --- a/hw/vfio/pci-quirks.c +++ b/hw/vfio/pci-quirks.c @@ -1490,6 +1490,9 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev) * +---------------------------------+---------------------------------+ * * https://lists.gnu.org/archive/html/qemu-devel/2017-08/pdfUda5iEpgOS.pdf + * + * Specification for Turning and later GPU architectures: + * https://lists.gnu.org/archive/html/qemu-devel/2023-06/pdf142OR4O4c2.pdf */ static void get_nv_gpudirect_clique_id(Object *obj, Visitor *v, const char *name, void *opaque, @@ -1530,7 +1533,9 @@ const PropertyInfo qdev_prop_nv_gpudirect_clique = { static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp) { PCIDevice *pdev = &vdev->pdev; - int ret, pos = 0xC8; + int ret, pos; + bool c8_conflict = false, d4_conflict = false; + uint8_t tmp; if (vdev->nv_gpudirect_clique == 0xFF) { return 0; @@ -1547,6 +1552,40 @@ static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp) return -EINVAL; } + /* + * Per the updated specification above, it's recommended to use offset + * D4h for Turing and later GPU architectures due to a conflict of the + * MSI-X capability at C8h. We don't know how to determine the GPU + * architecture, instead we walk the capability chain to mark conflicts + * and choose one or error based on the result. + * + * NB. Cap list head in pdev->config is already cleared, read from device. + */ + ret = pread(vdev->vbasedev.fd, &tmp, 1, + vdev->config_offset + PCI_CAPABILITY_LIST); + if (ret != 1 || !tmp) { + error_setg(errp, "NVIDIA GPUDirect Clique ID: error getting cap list"); + return -EINVAL; + } + + do { + if (tmp == 0xC8) { + c8_conflict = true; + } else if (tmp == 0xD4) { + d4_conflict = true; + } + tmp = pdev->config[tmp + PCI_CAP_LIST_NEXT]; + } while (tmp); + + if (!c8_conflict) { + pos = 0xC8; + } else if (!d4_conflict) { + pos = 0xD4; + } else { + error_setg(errp, "NVIDIA GPUDirect Clique ID: invalid config space"); + return -EINVAL; + } + ret = pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, 8, errp); if (ret < 0) { error_prepend(errp, "Failed to add NVIDIA GPUDirect cap: "); From patchwork Fri Jun 30 05:22:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297602 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4D860EB64D7 for ; Fri, 30 Jun 2023 05:25:58 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bW-0008K1-L5; Fri, 30 Jun 2023 01:23:34 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bK-0008Ej-5W for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:24 -0400 Received: from mail.ozlabs.org ([2404:9400:2221:ea00::3] helo=gandalf.ozlabs.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bI-0004am-Dx for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:21 -0400 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskHV0SHfz4wpQ; Fri, 30 Jun 2023 15:23:18 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskHQ6p4Pz4wZy; Fri, 30 Jun 2023 15:23:14 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , Shameer Kolothum , Longpeng , =?utf-8?q?C=C3=A9dric_Le_Goater?= Subject: [PULL 11/16] vfio/pci: Call vfio_prepare_kvm_msi_virq_batch() in MSI retry path Date: Fri, 30 Jun 2023 07:22:30 +0200 Message-ID: <20230630052235.1934154-12-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2404:9400:2221:ea00::3; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -39 X-Spam_score: -4.0 X-Spam_bar: ---- X-Spam_report: (-4.0 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Shameer Kolothum When vfio_enable_vectors() returns with less than requested nr_vectors we retry with what kernel reported back. But the retry path doesn't call vfio_prepare_kvm_msi_virq_batch() and this results in, qemu-system-aarch64: vfio: Error: Failed to enable 4 MSI vectors, retry with 1 qemu-system-aarch64: ../hw/vfio/pci.c:602: vfio_commit_kvm_msi_virq_batch: Assertion `vdev->defer_kvm_irq_routing' failed Fixes: dc580d51f7dd ("vfio: defer to commit kvm irq routing when enable msi/msix") Reviewed-by: Longpeng Signed-off-by: Shameer Kolothum Reviewed-by: Cédric Le Goater Signed-off-by: Cédric Le Goater --- hw/vfio/pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 73874a94de12..8fb2c53a63bf 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -663,6 +663,8 @@ static void vfio_msi_enable(VFIOPCIDevice *vdev) vfio_disable_interrupts(vdev); + vdev->nr_vectors = msi_nr_vectors_allocated(&vdev->pdev); +retry: /* * Setting vector notifiers needs to enable route for each vector. * Deferring to commit the KVM routes once rather than per vector @@ -670,8 +672,6 @@ static void vfio_msi_enable(VFIOPCIDevice *vdev) */ vfio_prepare_kvm_msi_virq_batch(vdev); - vdev->nr_vectors = msi_nr_vectors_allocated(&vdev->pdev); -retry: vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->nr_vectors); for (i = 0; i < vdev->nr_vectors; i++) { From patchwork Fri Jun 30 05:22:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297607 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D844BEB64D7 for ; Fri, 30 Jun 2023 05:29:04 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bY-0008K7-2j; Fri, 30 Jun 2023 01:23:36 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bL-0008F9-CR for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:24 -0400 Received: from mail.ozlabs.org ([2404:9400:2221:ea00::3] helo=gandalf.ozlabs.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bJ-0004SY-Ah for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:23 -0400 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskHX4HmDz4wpP; Fri, 30 Jun 2023 15:23:20 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskHV4RtYz4wZy; Fri, 30 Jun 2023 15:23:18 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , Avihai Horon , =?utf-8?q?C=C3=A9dric_Le_Goater?= Subject: [PULL 12/16] vfio/migration: Reset bytes_transferred properly Date: Fri, 30 Jun 2023 07:22:31 +0200 Message-ID: <20230630052235.1934154-13-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2404:9400:2221:ea00::3; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -39 X-Spam_score: -4.0 X-Spam_bar: ---- X-Spam_report: (-4.0 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Avihai Horon Currently, VFIO bytes_transferred is not reset properly: 1. bytes_transferred is not reset after a VM snapshot (so a migration following a snapshot will report incorrect value). 2. bytes_transferred is a single counter for all VFIO devices, however upon migration failure it is reset multiple times, by each VFIO device. Fix it by introducing a new function vfio_reset_bytes_transferred() and calling it during migration and snapshot start. Remove existing bytes_transferred reset in VFIO migration state notifier, which is not needed anymore. Fixes: 3710586caa5d ("qapi: Add VFIO devices migration stats in Migration stats") Signed-off-by: Avihai Horon Reviewed-by: Cédric Le Goater Reviewed-by: Alex Williamson Signed-off-by: Cédric Le Goater --- include/hw/vfio/vfio-common.h | 1 + migration/migration.h | 1 + hw/vfio/migration.c | 6 +++++- migration/migration.c | 1 + migration/savevm.c | 1 + migration/target.c | 17 +++++++++++++++-- 6 files changed, 24 insertions(+), 3 deletions(-) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 6d1b8487c374..1d19c6f251c1 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -229,6 +229,7 @@ int vfio_block_multiple_devices_migration(Error **errp); void vfio_unblock_multiple_devices_migration(void); int vfio_block_giommu_migration(Error **errp); int64_t vfio_mig_bytes_transferred(void); +void vfio_reset_bytes_transferred(void); #ifdef CONFIG_LINUX int vfio_get_region_info(VFIODevice *vbasedev, int index, diff --git a/migration/migration.h b/migration/migration.h index c859a0d35eb7..a80b22b703cd 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -514,6 +514,7 @@ bool migration_rate_limit(void); void migration_cancel(const Error *error); void populate_vfio_info(MigrationInfo *info); +void reset_vfio_bytes_transferred(void); void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page); #endif diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index acbf0bb7ab3c..7cf143926ce9 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -697,7 +697,6 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data) case MIGRATION_STATUS_CANCELLING: case MIGRATION_STATUS_CANCELLED: case MIGRATION_STATUS_FAILED: - bytes_transferred = 0; /* * If setting the device in RUNNING state fails, the device should * be reset. To do so, use ERROR state as a recover state. @@ -818,6 +817,11 @@ int64_t vfio_mig_bytes_transferred(void) return bytes_transferred; } +void vfio_reset_bytes_transferred(void) +{ + bytes_transferred = 0; +} + int vfio_migration_realize(VFIODevice *vbasedev, Error **errp) { int ret = -ENOTSUP; diff --git a/migration/migration.c b/migration/migration.c index 7653787f745d..096e8191d15c 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1628,6 +1628,7 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc, */ memset(&mig_stats, 0, sizeof(mig_stats)); memset(&compression_counters, 0, sizeof(compression_counters)); + reset_vfio_bytes_transferred(); return true; } diff --git a/migration/savevm.c b/migration/savevm.c index cdf47939244d..95c2abf47c10 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1622,6 +1622,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp) migrate_init(ms); memset(&mig_stats, 0, sizeof(mig_stats)); memset(&compression_counters, 0, sizeof(compression_counters)); + reset_vfio_bytes_transferred(); ms->to_dst_file = f; qemu_mutex_unlock_iothread(); diff --git a/migration/target.c b/migration/target.c index 00ca007f9784..f39c9a8d8877 100644 --- a/migration/target.c +++ b/migration/target.c @@ -14,12 +14,25 @@ #include "hw/vfio/vfio-common.h" #endif +#ifdef CONFIG_VFIO void populate_vfio_info(MigrationInfo *info) { -#ifdef CONFIG_VFIO if (vfio_mig_active()) { info->vfio = g_malloc0(sizeof(*info->vfio)); info->vfio->transferred = vfio_mig_bytes_transferred(); } -#endif } + +void reset_vfio_bytes_transferred(void) +{ + vfio_reset_bytes_transferred(); +} +#else +void populate_vfio_info(MigrationInfo *info) +{ +} + +void reset_vfio_bytes_transferred(void) +{ +} +#endif From patchwork Fri Jun 30 05:22:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297608 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3EF7EEB64D7 for ; Fri, 30 Jun 2023 05:29:39 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bZ-0008L7-Gk; Fri, 30 Jun 2023 01:23:37 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bT-0008IB-1G for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:31 -0400 Received: from gandalf.ozlabs.org ([150.107.74.76]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bQ-0004i2-LV for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:30 -0400 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskHb4TlQz4wpS; Fri, 30 Jun 2023 15:23:23 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskHY1B11z4wpR; Fri, 30 Jun 2023 15:23:20 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , Avihai Horon , Joao Martins , =?utf-8?q?C=C3=A9dric_Le_Goater?= Subject: [PULL 13/16] vfio/migration: Make VFIO migration non-experimental Date: Fri, 30 Jun 2023 07:22:32 +0200 Message-ID: <20230630052235.1934154-14-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=150.107.74.76; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -16 X-Spam_score: -1.7 X-Spam_bar: - X-Spam_report: (-1.7 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Avihai Horon The major parts of VFIO migration are supported today in QEMU. This includes basic VFIO migration, device dirty page tracking and precopy support. Thus, at this point in time, it seems appropriate to make VFIO migration non-experimental: remove the x prefix from enable_migration property, change it to ON_OFF_AUTO and let the default value be AUTO. In addition, make the following adjustments: 1. When enable_migration is ON and migration is not supported, fail VFIO device realization. 2. When enable_migration is AUTO (i.e., not explicitly enabled), require device dirty tracking support. This is because device dirty tracking is currently the only method to do dirty page tracking, which is essential for migrating in a reasonable downtime. Setting enable_migration to ON will not require device dirty tracking. 3. Make migration error and blocker messages more elaborate. 4. Remove error prints in vfio_migration_query_flags(). 5. Rename trace_vfio_migration_probe() to trace_vfio_migration_realize(). Signed-off-by: Avihai Horon Reviewed-by: Joao Martins Reviewed-by: Cédric Le Goater Reviewed-by: Alex Williamson Signed-off-by: Cédric Le Goater --- include/hw/vfio/vfio-common.h | 6 +-- hw/vfio/common.c | 16 ++++++- hw/vfio/migration.c | 79 +++++++++++++++++++++++------------ hw/vfio/pci.c | 4 +- hw/vfio/trace-events | 2 +- 5 files changed, 73 insertions(+), 34 deletions(-) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 1d19c6f251c1..93429b9abba0 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -139,7 +139,7 @@ typedef struct VFIODevice { bool needs_reset; bool no_mmap; bool ram_block_discard_allowed; - bool enable_migration; + OnOffAuto enable_migration; VFIODeviceOps *ops; unsigned int num_irqs; unsigned int num_regions; @@ -225,9 +225,9 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList; extern VFIOGroupList vfio_group_list; bool vfio_mig_active(void); -int vfio_block_multiple_devices_migration(Error **errp); +int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp); void vfio_unblock_multiple_devices_migration(void); -int vfio_block_giommu_migration(Error **errp); +int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp); int64_t vfio_mig_bytes_transferred(void); void vfio_reset_bytes_transferred(void); diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 28ec9e999c09..77e2ee0e5c6e 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -381,7 +381,7 @@ static unsigned int vfio_migratable_device_num(void) return device_num; } -int vfio_block_multiple_devices_migration(Error **errp) +int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp) { int ret; @@ -390,6 +390,12 @@ int vfio_block_multiple_devices_migration(Error **errp) return 0; } + if (vbasedev->enable_migration == ON_OFF_AUTO_ON) { + error_setg(errp, "Migration is currently not supported with multiple " + "VFIO devices"); + return -EINVAL; + } + error_setg(&multiple_devices_migration_blocker, "Migration is currently not supported with multiple " "VFIO devices"); @@ -427,7 +433,7 @@ static bool vfio_viommu_preset(void) return false; } -int vfio_block_giommu_migration(Error **errp) +int vfio_block_giommu_migration(VFIODevice *vbasedev, Error **errp) { int ret; @@ -436,6 +442,12 @@ int vfio_block_giommu_migration(Error **errp) return 0; } + if (vbasedev->enable_migration == ON_OFF_AUTO_ON) { + error_setg(errp, + "Migration is currently not supported with vIOMMU enabled"); + return -EINVAL; + } + error_setg(&giommu_migration_blocker, "Migration is currently not supported with vIOMMU enabled"); ret = migrate_add_blocker(giommu_migration_blocker, errp); diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 7cf143926ce9..1db7d52ab2c1 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -724,14 +724,6 @@ static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags) feature->argsz = sizeof(buf); feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION; if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) { - if (errno == ENOTTY) { - error_report("%s: VFIO migration is not supported in kernel", - vbasedev->name); - } else { - error_report("%s: Failed to query VFIO migration support, err: %s", - vbasedev->name, strerror(errno)); - } - return -errno; } @@ -810,6 +802,27 @@ static int vfio_migration_init(VFIODevice *vbasedev) return 0; } +static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp) +{ + int ret; + + if (vbasedev->enable_migration == ON_OFF_AUTO_ON) { + error_propagate(errp, err); + return -EINVAL; + } + + vbasedev->migration_blocker = error_copy(err); + error_free(err); + + ret = migrate_add_blocker(vbasedev->migration_blocker, errp); + if (ret < 0) { + error_free(vbasedev->migration_blocker); + vbasedev->migration_blocker = NULL; + } + + return ret; +} + /* ---------------------------------------------------------------------- */ int64_t vfio_mig_bytes_transferred(void) @@ -824,40 +837,54 @@ void vfio_reset_bytes_transferred(void) int vfio_migration_realize(VFIODevice *vbasedev, Error **errp) { - int ret = -ENOTSUP; + Error *err = NULL; + int ret; - if (!vbasedev->enable_migration) { - goto add_blocker; + if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) { + error_setg(&err, "%s: Migration is disabled for VFIO device", + vbasedev->name); + return vfio_block_migration(vbasedev, err, errp); } ret = vfio_migration_init(vbasedev); if (ret) { - goto add_blocker; + if (ret == -ENOTTY) { + error_setg(&err, "%s: VFIO migration is not supported in kernel", + vbasedev->name); + } else { + error_setg(&err, + "%s: Migration couldn't be initialized for VFIO device, " + "err: %d (%s)", + vbasedev->name, ret, strerror(-ret)); + } + + return vfio_block_migration(vbasedev, err, errp); + } + + if (!vbasedev->dirty_pages_supported) { + if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) { + error_setg(&err, + "%s: VFIO device doesn't support device dirty tracking", + vbasedev->name); + return vfio_block_migration(vbasedev, err, errp); + } + + warn_report("%s: VFIO device doesn't support device dirty tracking", + vbasedev->name); } - ret = vfio_block_multiple_devices_migration(errp); + ret = vfio_block_multiple_devices_migration(vbasedev, errp); if (ret) { return ret; } - ret = vfio_block_giommu_migration(errp); + ret = vfio_block_giommu_migration(vbasedev, errp); if (ret) { return ret; } - trace_vfio_migration_probe(vbasedev->name); + trace_vfio_migration_realize(vbasedev->name); return 0; - -add_blocker: - error_setg(&vbasedev->migration_blocker, - "VFIO device doesn't support migration"); - - ret = migrate_add_blocker(vbasedev->migration_blocker, errp); - if (ret < 0) { - error_free(vbasedev->migration_blocker); - vbasedev->migration_blocker = NULL; - } - return ret; } void vfio_migration_exit(VFIODevice *vbasedev) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 8fb2c53a63bf..73e19a04b2bf 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -3347,8 +3347,8 @@ static Property vfio_pci_dev_properties[] = { VFIO_FEATURE_ENABLE_REQ_BIT, true), DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features, VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false), - DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice, - vbasedev.enable_migration, false), + DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice, + vbasedev.enable_migration, ON_OFF_AUTO_AUTO), DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false), DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice, vbasedev.ram_block_discard_allowed, false), diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index e328d644d289..ee7509e68e4f 100644 --- a/hw/vfio/trace-events +++ b/hw/vfio/trace-events @@ -155,7 +155,7 @@ vfio_load_cleanup(const char *name) " (%s)" vfio_load_device_config_state(const char *name) " (%s)" vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64 vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d" -vfio_migration_probe(const char *name) " (%s)" +vfio_migration_realize(const char *name) " (%s)" vfio_migration_set_state(const char *name, const char *state) " (%s) state %s" vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s" vfio_save_block(const char *name, int data_size) " (%s) data_size %d" From patchwork Fri Jun 30 05:22:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297604 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 897E7EB64D7 for ; Fri, 30 Jun 2023 05:27:07 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bV-0008JE-MA; Fri, 30 Jun 2023 01:23:33 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bS-0008Hk-PX for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:30 -0400 Received: from mail.ozlabs.org ([2404:9400:2221:ea00::3] helo=gandalf.ozlabs.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bQ-0004jv-Sg for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:30 -0400 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskHf1L7wz4wpV; Fri, 30 Jun 2023 15:23:26 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskHc1VdMz4wpR; Fri, 30 Jun 2023 15:23:23 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , =?utf-8?q?Philippe_Mathieu-Da?= =?utf-8?q?ud=C3=A9?= , =?utf-8?q?C=C3=A9dric_Le_Goater?= Subject: [PULL 14/16] =?utf-8?q?MAINTAINERS=3A_Promote_C=C3=A9dric_to_VFIO_c?= =?utf-8?q?o-maintainer?= Date: Fri, 30 Jun 2023 07:22:33 +0200 Message-ID: <20230630052235.1934154-15-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2404:9400:2221:ea00::3; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -39 X-Spam_score: -4.0 X-Spam_bar: ---- X-Spam_report: (-4.0 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Alex Williamson Cédric has stepped up involvement in vfio, reviewing and managing patches, as well as pull requests. This work deserves gratitude and punishment with a promotion to co-maintainer ;) Signed-off-by: Alex Williamson Reviewed-by: Philippe Mathieu-Daudé Acked-by: Cédric Le Goater Signed-off-by: Cédric Le Goater --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index aba07722f64f..4feea49a6e95 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2051,7 +2051,7 @@ F: hw/usb/dev-serial.c VFIO M: Alex Williamson -R: Cédric Le Goater +M: Cédric Le Goater S: Supported F: hw/vfio/* F: include/hw/vfio/ From patchwork Fri Jun 30 05:22:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297606 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 73CBBEB64DD for ; Fri, 30 Jun 2023 05:28:32 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6ba-0008LJ-4P; Fri, 30 Jun 2023 01:23:38 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bV-0008JF-Av for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:33 -0400 Received: from mail.ozlabs.org ([2404:9400:2221:ea00::3] helo=gandalf.ozlabs.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bT-0004qX-Mw for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:33 -0400 Received: from gandalf.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskHj2LcVz4wpX; Fri, 30 Jun 2023 15:23:29 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskHf5M1xz4wpR; Fri, 30 Jun 2023 15:23:26 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , Zhenzhong Duan , =?utf-8?q?C=C3=A9dric_Le_Goater?= , Joao Martins Subject: [PULL 15/16] vfio/pci: Fix a segfault in vfio_realize Date: Fri, 30 Jun 2023 07:22:34 +0200 Message-ID: <20230630052235.1934154-16-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2404:9400:2221:ea00::3; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -39 X-Spam_score: -4.0 X-Spam_bar: ---- X-Spam_report: (-4.0 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Zhenzhong Duan The kvm irqchip notifier is only registered if the device supports INTx, however it's unconditionally removed in vfio realize error path. If the assigned device does not support INTx, this will cause QEMU to crash when vfio realize fails. Change it to conditionally remove the notifier only if the notify hook is setup. Before fix: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1 Connection closed by foreign host. After fix: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1 Error: vfio 0000:81:11.1: xres and yres properties require display=on (qemu) Fixes: c5478fea27ac ("vfio/pci: Respond to KVM irqchip change notifier") Signed-off-by: Zhenzhong Duan Reviewed-by: Cédric Le Goater Reviewed-by: Joao Martins Signed-off-by: Cédric Le Goater --- hw/vfio/pci.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 73e19a04b2bf..48df517f79ee 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -3221,7 +3221,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) out_deregister: pci_device_set_intx_routing_notifier(&vdev->pdev, NULL); - kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier); + if (vdev->irqchip_change_notifier.notify) { + kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier); + } out_teardown: vfio_teardown_msi(vdev); vfio_bars_exit(vdev); From patchwork Fri Jun 30 05:22:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?C=C3=A9dric_Le_Goater?= X-Patchwork-Id: 13297594 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E81E8C001B0 for ; Fri, 30 Jun 2023 05:23:57 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qF6bd-0008LN-RP; Fri, 30 Jun 2023 01:23:41 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bY-0008KR-EW for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:36 -0400 Received: from mail.ozlabs.org ([2404:9400:2221:ea00::3] helo=gandalf.ozlabs.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qF6bW-0004xk-Gm for qemu-devel@nongnu.org; Fri, 30 Jun 2023 01:23:36 -0400 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4QskHm17Wqz4wpK; Fri, 30 Jun 2023 15:23:32 +1000 (AEST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4QskHj6MdBz4wpR; Fri, 30 Jun 2023 15:23:29 +1000 (AEST) From: =?utf-8?q?C=C3=A9dric_Le_Goater?= To: qemu-devel@nongnu.org Cc: Richard Henderson , Alex Williamson , Zhenzhong Duan , =?utf-8?q?C=C3=A9dric_Le_Goater?= , Joao Martins Subject: [PULL 16/16] vfio/pci: Free leaked timer in vfio_realize error path Date: Fri, 30 Jun 2023 07:22:35 +0200 Message-ID: <20230630052235.1934154-17-clg@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230630052235.1934154-1-clg@redhat.com> References: <20230630052235.1934154-1-clg@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2404:9400:2221:ea00::3; envelope-from=SRS0=mf8A=CS=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -39 X-Spam_score: -4.0 X-Spam_bar: ---- X-Spam_report: (-4.0 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Zhenzhong Duan When vfio_realize fails, the mmap_timer used for INTx optimization isn't freed. As this timer isn't activated yet, the potential impact is just a piece of leaked memory. Fixes: ea486926b07d ("vfio-pci: Update slow path INTx algorithm timer related") Signed-off-by: Zhenzhong Duan Reviewed-by: Cédric Le Goater Reviewed-by: Joao Martins Signed-off-by: Cédric Le Goater --- hw/vfio/pci.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 48df517f79ee..ab6645ba60af 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -3224,6 +3224,9 @@ out_deregister: if (vdev->irqchip_change_notifier.notify) { kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier); } + if (vdev->intx.mmap_timer) { + timer_free(vdev->intx.mmap_timer); + } out_teardown: vfio_teardown_msi(vdev); vfio_bars_exit(vdev);