From patchwork Tue Jan 16 03:19:28 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13520347
From: peterx@redhat.com
To: qemu-devel@nongnu.org, Peter Maydell
Cc: peterx@redhat.com, Fabiano Rosas, Het Gala, Markus Armbruster
Subject: [PULL 01/20] migration: Simplify initial conditionals in migration for better readability
Date: Tue, 16 Jan 2024 11:19:28 +0800
Message-ID: <20240116031947.69017-2-peterx@redhat.com>
In-Reply-To: <20240116031947.69017-1-peterx@redhat.com>
References: <20240116031947.69017-1-peterx@redhat.com>

From: Het Gala

The initial conditional statements in the QMP migration functions are harder
to understand than necessary. It is better to get all errors out of the way
at the beginning, for better readability and error handling.

Signed-off-by: Het Gala
Suggested-by: Markus Armbruster
Reviewed-by: Fabiano Rosas
Link: https://lore.kernel.org/r/20231205080039.197615-1-het.gala@nutanix.com
Signed-off-by: Peter Xu
---
 migration/migration.c | 36 ++++++++++++++++--------------------
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 98c5c3e140..2365a3a13c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -523,28 +523,26 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
     /*
      * Having preliminary checks for uri and channel
      */
-    if (uri && has_channels) {
-        error_setg(errp, "'uri' and 'channels' arguments are mutually "
-                   "exclusive; exactly one of the two should be present in "
-                   "'migrate-incoming' qmp command ");
+    if (!uri == !channels) {
+        error_setg(errp, "need either 'uri' or 'channels' argument");
         return;
-    } else if (channels) {
+    }
+
+    if (channels) {
         /* To verify that Migrate channel list has only item */
         if (channels->next) {
             error_setg(errp, "Channel list has more than one entries");
             return;
         }
         addr = channels->value->addr;
-    } else if (uri) {
+    }
+
+    if (uri) {
         /* caller uses the old URI syntax */
         if (!migrate_uri_parse(uri, &channel, errp)) {
             return;
         }
         addr = channel->addr;
-    } else {
-        error_setg(errp, "neither 'uri' or 'channels' argument are "
-                   "specified in 'migrate-incoming' qmp command ");
-        return;
     }
 
     /* transport mechanism not suitable for migration? */
@@ -1924,28 +1922,26 @@ void qmp_migrate(const char *uri, bool has_channels,
     /*
      * Having preliminary checks for uri and channel
      */
-    if (uri && has_channels) {
-        error_setg(errp, "'uri' and 'channels' arguments are mutually "
-                   "exclusive; exactly one of the two should be present in "
-                   "'migrate' qmp command ");
+    if (!uri == !channels) {
+        error_setg(errp, "need either 'uri' or 'channels' argument");
         return;
-    } else if (channels) {
+    }
+
+    if (channels) {
         /* To verify that Migrate channel list has only item */
         if (channels->next) {
             error_setg(errp, "Channel list has more than one entries");
             return;
         }
         addr = channels->value->addr;
-    } else if (uri) {
+    }
+
+    if (uri) {
         /* caller uses the old URI syntax */
         if (!migrate_uri_parse(uri, &channel, errp)) {
             return;
         }
         addr = channel->addr;
-    } else {
-        error_setg(errp, "neither 'uri' or 'channels' argument are "
-                   "specified in 'migrate' qmp command ");
-        return;
     }
 
     /* transport mechanism not suitable for migration? */

From patchwork Tue Jan 16 03:19:29 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13520362
From: peterx@redhat.com
To: qemu-devel@nongnu.org, Peter Maydell
Cc: peterx@redhat.com, Fabiano Rosas
Subject: [PULL 02/20] migration/multifd: Remove MultiFDPages_t::packet_num
Date: Tue, 16 Jan 2024 11:19:29 +0800
Message-ID: <20240116031947.69017-3-peterx@redhat.com>
In-Reply-To: <20240116031947.69017-1-peterx@redhat.com>
References: <20240116031947.69017-1-peterx@redhat.com>

From: Fabiano Rosas

This was introduced by commit 34c55a94b1 ("migration: Create multipage
support") and never used.

Signed-off-by: Fabiano Rosas
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/20240104142144.9680-2-farosas@suse.de
Signed-off-by: Peter Xu
---
 migration/multifd.h | 2 --
 migration/multifd.c | 1 -
 2 files changed, 3 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index a835643b48..b0ff610c37 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -58,8 +58,6 @@ typedef struct {
     uint32_t num;
     /* number of allocated pages */
     uint32_t allocated;
-    /* global number of generated multifd packets */
-    uint64_t packet_num;
     /* offset of each page */
     ram_addr_t *offset;
     RAMBlock *block;
diff --git a/migration/multifd.c b/migration/multifd.c
index 9f353aecfa..3e650f5da0 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -250,7 +250,6 @@ static void multifd_pages_clear(MultiFDPages_t *pages)
 {
     pages->num = 0;
     pages->allocated = 0;
-    pages->packet_num = 0;
     pages->block = NULL;
     g_free(pages->offset);
     pages->offset = NULL;

From patchwork Tue Jan 16 03:19:30 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13520349
From: peterx@redhat.com
To: qemu-devel@nongnu.org, Peter Maydell
Cc: peterx@redhat.com, Fabiano Rosas
Subject: [PULL 03/20] migration/multifd: Remove QEMUFile from where it is not needed
Date: Tue, 16 Jan 2024 11:19:30 +0800
Message-ID: <20240116031947.69017-4-peterx@redhat.com>
In-Reply-To: <20240116031947.69017-1-peterx@redhat.com>
References: <20240116031947.69017-1-peterx@redhat.com>

From: Fabiano Rosas

Signed-off-by: Fabiano Rosas
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/20240104142144.9680-3-farosas@suse.de
Signed-off-by: Peter Xu
---
 migration/multifd.h |  4 ++--
 migration/multifd.c | 12 ++++++------
 migration/ram.c     | 15 +++++++--------
 3 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index b0ff610c37..35d11f103c 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -21,8 +21,8 @@ void multifd_load_shutdown(void);
 bool multifd_recv_all_channels_created(void);
 void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
-int multifd_send_sync_main(QEMUFile *f);
-int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
+int multifd_send_sync_main(void);
+int multifd_queue_page(RAMBlock *block, ram_addr_t offset);
 
 /* Multifd Compression flags */
 #define MULTIFD_FLAG_SYNC (1 << 0)
diff --git a/migration/multifd.c b/migration/multifd.c
index 3e650f5da0..2dbc3ba836 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -390,7 +390,7 @@ struct {
  * false.
  */
 
-static int multifd_send_pages(QEMUFile *f)
+static int multifd_send_pages(void)
 {
     int i;
     static int next_channel;
@@ -436,7 +436,7 @@ static int multifd_send_pages(QEMUFile *f)
     return 1;
 }
 
-int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
 {
     MultiFDPages_t *pages = multifd_send_state->pages;
     bool changed = false;
@@ -456,12 +456,12 @@ int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
         changed = true;
     }
 
-    if (multifd_send_pages(f) < 0) {
+    if (multifd_send_pages() < 0) {
         return -1;
     }
 
     if (changed) {
-        return multifd_queue_page(f, block, offset);
+        return multifd_queue_page(block, offset);
     }
 
     return 1;
@@ -583,7 +583,7 @@ static int multifd_zero_copy_flush(QIOChannel *c)
     return ret;
 }
 
-int multifd_send_sync_main(QEMUFile *f)
+int multifd_send_sync_main(void)
 {
     int i;
     bool flush_zero_copy;
@@ -592,7 +592,7 @@ int multifd_send_sync_main(QEMUFile *f)
         return 0;
     }
     if (multifd_send_state->pages->num) {
-        if (multifd_send_pages(f) < 0) {
+        if (multifd_send_pages() < 0) {
             error_report("%s: multifd_send_pages fail", __func__);
             return -1;
         }
diff --git a/migration/ram.c b/migration/ram.c
index 890f31cf66..c0cdcccb75 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1250,10 +1250,9 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
     return pages;
 }
 
-static int ram_save_multifd_page(QEMUFile *file, RAMBlock *block,
-                                 ram_addr_t offset)
+static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
 {
-    if (multifd_queue_page(file, block, offset) < 0) {
+    if (multifd_queue_page(block, offset) < 0) {
         return -1;
     }
     stat64_add(&mig_stats.normal_pages, 1);
@@ -1336,7 +1335,7 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
             if (migrate_multifd() &&
                 !migrate_multifd_flush_after_each_section()) {
                 QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
-                int ret = multifd_send_sync_main(f);
+                int ret = multifd_send_sync_main();
                 if (ret < 0) {
                     return ret;
                 }
@@ -2067,7 +2066,7 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
      * still see partially copied pages which is data corruption.
      */
     if (migrate_multifd() && !migration_in_postcopy()) {
-        return ram_save_multifd_page(pss->pss_channel, block, offset);
+        return ram_save_multifd_page(block, offset);
     }
 
     return ram_save_page(rs, pss);
@@ -2985,7 +2984,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     migration_ops->ram_save_target_page = ram_save_target_page_legacy;
 
     bql_unlock();
-    ret = multifd_send_sync_main(f);
+    ret = multifd_send_sync_main();
     bql_lock();
     if (ret < 0) {
         return ret;
@@ -3109,7 +3108,7 @@ out:
     if (ret >= 0
         && migration_is_setup_or_active(migrate_get_current()->state)) {
         if (migrate_multifd() && migrate_multifd_flush_after_each_section()) {
-            ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel);
+            ret = multifd_send_sync_main();
             if (ret < 0) {
                 return ret;
             }
@@ -3183,7 +3182,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         }
     }
 
-    ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel);
+    ret = multifd_send_sync_main();
     if (ret < 0) {
         return ret;
     }

From patchwork Tue Jan 16 03:19:31 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13520359
From: peterx@redhat.com
To: qemu-devel@nongnu.org, Peter Maydell
Cc: peterx@redhat.com, Fabiano Rosas
Subject: [PULL 04/20] migration/multifd: Change multifd_pages_init argument
Date: Tue, 16 Jan 2024 11:19:31 +0800
Message-ID: <20240116031947.69017-5-peterx@redhat.com>
In-Reply-To: <20240116031947.69017-1-peterx@redhat.com>
References: <20240116031947.69017-1-peterx@redhat.com>

From: Fabiano Rosas

The 'size' argument is actually the number of pages that fit in a
multifd packet. Change it to uint32_t and rename.

Signed-off-by: Fabiano Rosas
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/20240104142144.9680-4-farosas@suse.de
Signed-off-by: Peter Xu
---
 migration/multifd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 2dbc3ba836..25cbc6dc6b 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -236,12 +236,12 @@ static int multifd_recv_initial_packet(QIOChannel *c, Error **errp)
     return msg.id;
 }
 
-static MultiFDPages_t *multifd_pages_init(size_t size)
+static MultiFDPages_t *multifd_pages_init(uint32_t n)
 {
     MultiFDPages_t *pages = g_new0(MultiFDPages_t, 1);
 
-    pages->allocated = size;
-    pages->offset = g_new0(ram_addr_t, size);
+    pages->allocated = n;
+    pages->offset = g_new0(ram_addr_t, n);
 
     return pages;
 }

From patchwork Tue Jan 16 03:19:32 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13520364
From: peterx@redhat.com
To: qemu-devel@nongnu.org, Peter Maydell
Cc: peterx@redhat.com, Fabiano Rosas
Subject: [PULL 05/20] migration: Report error in incoming migration
Date: Tue, 16 Jan 2024 11:19:32 +0800
Message-ID: <20240116031947.69017-6-peterx@redhat.com>
In-Reply-To: <20240116031947.69017-1-peterx@redhat.com>
References: <20240116031947.69017-1-peterx@redhat.com>

From: Fabiano Rosas

We're not currently reporting the errors set with migrate_set_error()
when incoming migration fails.

Signed-off-by: Fabiano Rosas
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/20240104142144.9680-5-farosas@suse.de
Signed-off-by: Peter Xu
---
 migration/migration.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 2365a3a13c..219447dea1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -697,6 +697,13 @@ process_incoming_migration_co(void *opaque)
     }
 
     if (ret < 0) {
+        MigrationState *s = migrate_get_current();
+
+        if (migrate_has_error(s)) {
+            WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
+                error_report_err(s->error);
+            }
+        }
         error_report("load of migration failed: %s", strerror(-ret));
         goto fail;
     }

From patchwork Tue Jan 16 03:19:33 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13520365
From: peterx@redhat.com
To: qemu-devel@nongnu.org, Peter Maydell
Cc: peterx@redhat.com, Fabiano Rosas
Subject: [PULL 06/20] tests/qtest/migration: Print migration incoming errors
Date: Tue, 16 Jan 2024 11:19:33 +0800
Message-ID: <20240116031947.69017-7-peterx@redhat.com>
In-Reply-To: <20240116031947.69017-1-peterx@redhat.com>
References: <20240116031947.69017-1-peterx@redhat.com>

From: Fabiano Rosas

We're currently just asserting when incoming migration fails. Let's
print the error message from QMP as well.

Signed-off-by: Fabiano Rosas
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/20240104142144.9680-6-farosas@suse.de
Signed-off-by: Peter Xu
---
 tests/qtest/migration-helpers.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 37e8e812c5..19384e3fa6 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -111,6 +111,12 @@ void migrate_incoming_qmp(QTestState *to, const char *uri, const char *fmt, ...)
 
     rsp = qtest_qmp(to, "{ 'execute': 'migrate-incoming', 'arguments': %p}",
                     args);
+
+    if (!qdict_haskey(rsp, "return")) {
+        g_autoptr(GString) s = qobject_to_json_pretty(QOBJECT(rsp), true);
+        g_test_message("%s", s->str);
+    }
+
     g_assert(qdict_haskey(rsp, "return"));
     qobject_unref(rsp);
 

From patchwork Tue Jan 16 03:19:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13520352 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 29864C47DA2 for ; Tue, 16 Jan 2024 03:22:02 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rPa0B-0005k9-Cy; Mon, 15 Jan 2024 22:20:35 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPZzy-0005gp-1K for qemu-devel@nongnu.org; Mon, 15 Jan 2024 22:20:26 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPZzv-0002xe-3F for qemu-devel@nongnu.org; Mon, 15 Jan 2024 22:20:21 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1705375218; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uPgp0tmgiDc/OMvGTjwfLVUX42d/xPWCsacOI2JAgcQ=; b=PCUeM29ub6s64g8piVcn35CQWY3xOBSTaTmQzTtM2pqKgZc1lrDTIOIynr5kGL8dJNy9YP 
MyoAtE/p86n3dkruxh74DKmOOgLuCFhiO0TZBKu5Og5VWeW4BsOylcHSh5Z0+jfWkLUFXH EtXwYg9l1cOJXF2xwBnlYcSWr/QD3FM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-356-XWED1zs0MfmvLDoh3bip2g-1; Mon, 15 Jan 2024 22:20:14 -0500 X-MC-Unique: XWED1zs0MfmvLDoh3bip2g-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6FCEC800076; Tue, 16 Jan 2024 03:20:14 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2A8D93C25; Tue, 16 Jan 2024 03:20:11 +0000 (UTC) From: peterx@redhat.com To: qemu-devel@nongnu.org, Peter Maydell Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 07/20] tests/qtest/migration: Add a wrapper to print test names Date: Tue, 16 Jan 2024 11:19:34 +0800 Message-ID: <20240116031947.69017-8-peterx@redhat.com> In-Reply-To: <20240116031947.69017-1-peterx@redhat.com> References: <20240116031947.69017-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.1 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -35 X-Spam_score: -3.6 X-Spam_bar: --- X-Spam_report: (-3.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.531, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Fabiano Rosas Our usage of gtest results in us losing the very basic functionality of "knowing which test failed". The issue is that gtest only prints test names ("paths" in gtest parlance) once the test has finished, but we use asserts in the tests and crash gtest itself before it can print anything. We also use a final abort when the result of g_test_run is not 0. Depending on how the test failed/broke we can see the function that triggered the abort, which may be representative of the test, but it could also just be some generic function. We have been relying on the primitive method of looking at the name of the previous successful test and then looking at the code to figure out which test should have come next. Add a wrapper to the test registration that does the job of printing the test name before running. 
Signed-off-by: Fabiano Rosas Reviewed-by: Peter Xu Link: https://lore.kernel.org/r/20240104142144.9680-7-farosas@suse.de Signed-off-by: Peter Xu --- tests/qtest/migration-helpers.h | 1 + tests/qtest/migration-helpers.c | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 33 insertions(+) diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h index b478549096..3bf7ded1b9 100644 --- a/tests/qtest/migration-helpers.h +++ b/tests/qtest/migration-helpers.h @@ -52,4 +52,5 @@ char *find_common_machine_version(const char *mtype, const char *var1, const char *var2); char *resolve_machine_version(const char *alias, const char *var1, const char *var2); +void migration_test_add(const char *path, void (*fn)(void)); #endif /* MIGRATION_HELPERS_H */ diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c index 19384e3fa6..e451dbdbed 100644 --- a/tests/qtest/migration-helpers.c +++ b/tests/qtest/migration-helpers.c @@ -291,3 +291,35 @@ char *resolve_machine_version(const char *alias, const char *var1, return find_common_machine_version(machine_name, var1, var2); } + +typedef struct { + char *name; + void (*func)(void); +} MigrationTest; + +static void migration_test_destroy(gpointer data) +{ + MigrationTest *test = (MigrationTest *)data; + + g_free(test->name); + g_free(test); +} + +static void migration_test_wrapper(const void *data) +{ + MigrationTest *test = (MigrationTest *)data; + + g_test_message("Running /%s%s", qtest_get_arch(), test->name); + test->func(); +} + +void migration_test_add(const char *path, void (*fn)(void)) +{ + MigrationTest *test = g_new0(MigrationTest, 1); + + test->func = fn; + test->name = g_strdup(path); + + qtest_add_data_func_full(path, test, migration_test_wrapper, + migration_test_destroy); +} From patchwork Tue Jan 16 03:19:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13520348 Return-Path: 
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 73BB7C4706C for ; Tue, 16 Jan 2024 03:20:53 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rPa0C-0005kY-4Z; Mon, 15 Jan 2024 22:20:36 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPZzz-0005gu-V1 for qemu-devel@nongnu.org; Mon, 15 Jan 2024 22:20:26 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPZzw-0002xm-Sl for qemu-devel@nongnu.org; Mon, 15 Jan 2024 22:20:23 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1705375219; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0+o+MbL1QzF9Ussvv07NBtO6Oxt/PgpFeA0JIPJt2Z4=; b=dlIc1XR6g7oggaa00nYhg3yiH9QyXBLGzChav2o4Vfa2T5BC6DcSKD+04WNMBBNWxwUAHA imYG0GZl1Gpbv1tELbGeu9ZWBqTIJmODOGr4I0i8bfpYwpoSGPzWqssOPObu+gnep9VEpp W3MpxDdouZQ5uUHljUk6v0nSRgdM2Dc= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-34-NEooLRA6PoaS3pm6BOJiew-1; Mon, 15 Jan 2024 22:20:17 -0500 X-MC-Unique: NEooLRA6PoaS3pm6BOJiew-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 
bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9A44785A58C; Tue, 16 Jan 2024 03:20:17 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 26F843C25; Tue, 16 Jan 2024 03:20:14 +0000 (UTC) From: peterx@redhat.com To: qemu-devel@nongnu.org, Peter Maydell Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 08/20] tests/qtest/migration: Use the new migration_test_add Date: Tue, 16 Jan 2024 11:19:35 +0800 Message-ID: <20240116031947.69017-9-peterx@redhat.com> In-Reply-To: <20240116031947.69017-1-peterx@redhat.com> References: <20240116031947.69017-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.1 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -35 X-Spam_score: -3.6 X-Spam_bar: --- X-Spam_report: (-3.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.531, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Fabiano Rosas Replace the test registration calls with the new function that prints test names. 
Signed-off-by: Fabiano Rosas Reviewed-by: Peter Xu Link: https://lore.kernel.org/r/20240104142144.9680-8-farosas@suse.de Signed-off-by: Peter Xu --- tests/qtest/migration-test.c | 215 ++++++++++++++++++----------------- 1 file changed, 112 insertions(+), 103 deletions(-) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 136e5df06c..21da140aea 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -3404,70 +3404,75 @@ int main(int argc, char **argv) module_call_init(MODULE_INIT_QOM); if (is_x86) { - qtest_add_func("/migration/precopy/unix/suspend/live", - test_precopy_unix_suspend_live); - qtest_add_func("/migration/precopy/unix/suspend/notlive", - test_precopy_unix_suspend_notlive); + migration_test_add("/migration/precopy/unix/suspend/live", + test_precopy_unix_suspend_live); + migration_test_add("/migration/precopy/unix/suspend/notlive", + test_precopy_unix_suspend_notlive); } if (has_uffd) { - qtest_add_func("/migration/postcopy/plain", test_postcopy); - qtest_add_func("/migration/postcopy/recovery/plain", - test_postcopy_recovery); - qtest_add_func("/migration/postcopy/preempt/plain", test_postcopy_preempt); - qtest_add_func("/migration/postcopy/preempt/recovery/plain", - test_postcopy_preempt_recovery); + migration_test_add("/migration/postcopy/plain", test_postcopy); + migration_test_add("/migration/postcopy/recovery/plain", + test_postcopy_recovery); + migration_test_add("/migration/postcopy/preempt/plain", + test_postcopy_preempt); + migration_test_add("/migration/postcopy/preempt/recovery/plain", + test_postcopy_preempt_recovery); if (getenv("QEMU_TEST_FLAKY_TESTS")) { - qtest_add_func("/migration/postcopy/compress/plain", - test_postcopy_compress); - qtest_add_func("/migration/postcopy/recovery/compress/plain", - test_postcopy_recovery_compress); + migration_test_add("/migration/postcopy/compress/plain", + test_postcopy_compress); + migration_test_add("/migration/postcopy/recovery/compress/plain", + 
test_postcopy_recovery_compress); } #ifndef _WIN32 - qtest_add_func("/migration/postcopy/recovery/double-failures", - test_postcopy_recovery_double_fail); + migration_test_add("/migration/postcopy/recovery/double-failures", + test_postcopy_recovery_double_fail); #endif /* _WIN32 */ if (is_x86) { - qtest_add_func("/migration/postcopy/suspend", - test_postcopy_suspend); + migration_test_add("/migration/postcopy/suspend", + test_postcopy_suspend); } } - qtest_add_func("/migration/bad_dest", test_baddest); + migration_test_add("/migration/bad_dest", test_baddest); #ifndef _WIN32 - qtest_add_func("/migration/analyze-script", test_analyze_script); + if (!g_str_equal(arch, "s390x")) { + migration_test_add("/migration/analyze-script", test_analyze_script); + } #endif - qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain); - qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle); + migration_test_add("/migration/precopy/unix/plain", + test_precopy_unix_plain); + migration_test_add("/migration/precopy/unix/xbzrle", + test_precopy_unix_xbzrle); /* * Compression fails from time to time. * Put test here but don't enable it until everything is fixed. 
*/ if (getenv("QEMU_TEST_FLAKY_TESTS")) { - qtest_add_func("/migration/precopy/unix/compress/wait", - test_precopy_unix_compress); - qtest_add_func("/migration/precopy/unix/compress/nowait", - test_precopy_unix_compress_nowait); + migration_test_add("/migration/precopy/unix/compress/wait", + test_precopy_unix_compress); + migration_test_add("/migration/precopy/unix/compress/nowait", + test_precopy_unix_compress_nowait); } - qtest_add_func("/migration/precopy/file", - test_precopy_file); - qtest_add_func("/migration/precopy/file/offset", - test_precopy_file_offset); - qtest_add_func("/migration/precopy/file/offset/bad", - test_precopy_file_offset_bad); + migration_test_add("/migration/precopy/file", + test_precopy_file); + migration_test_add("/migration/precopy/file/offset", + test_precopy_file_offset); + migration_test_add("/migration/precopy/file/offset/bad", + test_precopy_file_offset_bad); /* * Our CI system has problems with shared memory. * Don't run this test until we find a workaround. */ if (getenv("QEMU_TEST_FLAKY_TESTS")) { - qtest_add_func("/migration/mode/reboot", test_mode_reboot); + migration_test_add("/migration/mode/reboot", test_mode_reboot); } #ifdef CONFIG_GNUTLS - qtest_add_func("/migration/precopy/unix/tls/psk", - test_precopy_unix_tls_psk); + migration_test_add("/migration/precopy/unix/tls/psk", + test_precopy_unix_tls_psk); if (has_uffd) { /* @@ -3475,110 +3480,114 @@ int main(int argc, char **argv) * channels are tested under precopy. Here what we want to test is the * general postcopy path that has TLS channel enabled. 
*/ - qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk); - qtest_add_func("/migration/postcopy/recovery/tls/psk", - test_postcopy_recovery_tls_psk); - qtest_add_func("/migration/postcopy/preempt/tls/psk", - test_postcopy_preempt_tls_psk); - qtest_add_func("/migration/postcopy/preempt/recovery/tls/psk", - test_postcopy_preempt_all); + migration_test_add("/migration/postcopy/tls/psk", + test_postcopy_tls_psk); + migration_test_add("/migration/postcopy/recovery/tls/psk", + test_postcopy_recovery_tls_psk); + migration_test_add("/migration/postcopy/preempt/tls/psk", + test_postcopy_preempt_tls_psk); + migration_test_add("/migration/postcopy/preempt/recovery/tls/psk", + test_postcopy_preempt_all); } #ifdef CONFIG_TASN1 - qtest_add_func("/migration/precopy/unix/tls/x509/default-host", - test_precopy_unix_tls_x509_default_host); - qtest_add_func("/migration/precopy/unix/tls/x509/override-host", - test_precopy_unix_tls_x509_override_host); + migration_test_add("/migration/precopy/unix/tls/x509/default-host", + test_precopy_unix_tls_x509_default_host); + migration_test_add("/migration/precopy/unix/tls/x509/override-host", + test_precopy_unix_tls_x509_override_host); #endif /* CONFIG_TASN1 */ #endif /* CONFIG_GNUTLS */ - qtest_add_func("/migration/precopy/tcp/plain", test_precopy_tcp_plain); + migration_test_add("/migration/precopy/tcp/plain", test_precopy_tcp_plain); - qtest_add_func("/migration/precopy/tcp/plain/switchover-ack", - test_precopy_tcp_switchover_ack); + migration_test_add("/migration/precopy/tcp/plain/switchover-ack", + test_precopy_tcp_switchover_ack); #ifdef CONFIG_GNUTLS - qtest_add_func("/migration/precopy/tcp/tls/psk/match", - test_precopy_tcp_tls_psk_match); - qtest_add_func("/migration/precopy/tcp/tls/psk/mismatch", - test_precopy_tcp_tls_psk_mismatch); + migration_test_add("/migration/precopy/tcp/tls/psk/match", + test_precopy_tcp_tls_psk_match); + migration_test_add("/migration/precopy/tcp/tls/psk/mismatch", + 
test_precopy_tcp_tls_psk_mismatch); #ifdef CONFIG_TASN1 - qtest_add_func("/migration/precopy/tcp/tls/x509/default-host", - test_precopy_tcp_tls_x509_default_host); - qtest_add_func("/migration/precopy/tcp/tls/x509/override-host", - test_precopy_tcp_tls_x509_override_host); - qtest_add_func("/migration/precopy/tcp/tls/x509/mismatch-host", - test_precopy_tcp_tls_x509_mismatch_host); - qtest_add_func("/migration/precopy/tcp/tls/x509/friendly-client", - test_precopy_tcp_tls_x509_friendly_client); - qtest_add_func("/migration/precopy/tcp/tls/x509/hostile-client", - test_precopy_tcp_tls_x509_hostile_client); - qtest_add_func("/migration/precopy/tcp/tls/x509/allow-anon-client", - test_precopy_tcp_tls_x509_allow_anon_client); - qtest_add_func("/migration/precopy/tcp/tls/x509/reject-anon-client", - test_precopy_tcp_tls_x509_reject_anon_client); + migration_test_add("/migration/precopy/tcp/tls/x509/default-host", + test_precopy_tcp_tls_x509_default_host); + migration_test_add("/migration/precopy/tcp/tls/x509/override-host", + test_precopy_tcp_tls_x509_override_host); + migration_test_add("/migration/precopy/tcp/tls/x509/mismatch-host", + test_precopy_tcp_tls_x509_mismatch_host); + migration_test_add("/migration/precopy/tcp/tls/x509/friendly-client", + test_precopy_tcp_tls_x509_friendly_client); + migration_test_add("/migration/precopy/tcp/tls/x509/hostile-client", + test_precopy_tcp_tls_x509_hostile_client); + migration_test_add("/migration/precopy/tcp/tls/x509/allow-anon-client", + test_precopy_tcp_tls_x509_allow_anon_client); + migration_test_add("/migration/precopy/tcp/tls/x509/reject-anon-client", + test_precopy_tcp_tls_x509_reject_anon_client); #endif /* CONFIG_TASN1 */ #endif /* CONFIG_GNUTLS */ - /* qtest_add_func("/migration/ignore_shared", test_ignore_shared); */ + /* migration_test_add("/migration/ignore_shared", test_ignore_shared); */ #ifndef _WIN32 - qtest_add_func("/migration/fd_proto", test_migrate_fd_proto); + migration_test_add("/migration/fd_proto", 
test_migrate_fd_proto); #endif - qtest_add_func("/migration/validate_uuid", test_validate_uuid); - qtest_add_func("/migration/validate_uuid_error", test_validate_uuid_error); - qtest_add_func("/migration/validate_uuid_src_not_set", - test_validate_uuid_src_not_set); - qtest_add_func("/migration/validate_uuid_dst_not_set", - test_validate_uuid_dst_not_set); + migration_test_add("/migration/validate_uuid", test_validate_uuid); + migration_test_add("/migration/validate_uuid_error", + test_validate_uuid_error); + migration_test_add("/migration/validate_uuid_src_not_set", + test_validate_uuid_src_not_set); + migration_test_add("/migration/validate_uuid_dst_not_set", + test_validate_uuid_dst_not_set); /* * See explanation why this test is slow on function definition */ if (g_test_slow()) { - qtest_add_func("/migration/auto_converge", test_migrate_auto_converge); + migration_test_add("/migration/auto_converge", + test_migrate_auto_converge); if (g_str_equal(arch, "x86_64") && has_kvm && kvm_dirty_ring_supported()) { - qtest_add_func("/migration/dirty_limit", test_migrate_dirty_limit); + migration_test_add("/migration/dirty_limit", + test_migrate_dirty_limit); } } - qtest_add_func("/migration/multifd/tcp/plain/none", - test_multifd_tcp_none); + migration_test_add("/migration/multifd/tcp/plain/none", + test_multifd_tcp_none); /* * This test is flaky and sometimes fails in CI and otherwise: * don't run unless user opts in via environment variable. 
*/ if (getenv("QEMU_TEST_FLAKY_TESTS")) { - qtest_add_func("/migration/multifd/tcp/plain/cancel", - test_multifd_tcp_cancel); + migration_test_add("/migration/multifd/tcp/plain/cancel", + test_multifd_tcp_cancel); } - qtest_add_func("/migration/multifd/tcp/plain/zlib", - test_multifd_tcp_zlib); + migration_test_add("/migration/multifd/tcp/plain/zlib", + test_multifd_tcp_zlib); #ifdef CONFIG_ZSTD - qtest_add_func("/migration/multifd/tcp/plain/zstd", - test_multifd_tcp_zstd); + migration_test_add("/migration/multifd/tcp/plain/zstd", + test_multifd_tcp_zstd); #endif #ifdef CONFIG_GNUTLS - qtest_add_func("/migration/multifd/tcp/tls/psk/match", - test_multifd_tcp_tls_psk_match); - qtest_add_func("/migration/multifd/tcp/tls/psk/mismatch", - test_multifd_tcp_tls_psk_mismatch); + migration_test_add("/migration/multifd/tcp/tls/psk/match", + test_multifd_tcp_tls_psk_match); + migration_test_add("/migration/multifd/tcp/tls/psk/mismatch", + test_multifd_tcp_tls_psk_mismatch); #ifdef CONFIG_TASN1 - qtest_add_func("/migration/multifd/tcp/tls/x509/default-host", - test_multifd_tcp_tls_x509_default_host); - qtest_add_func("/migration/multifd/tcp/tls/x509/override-host", - test_multifd_tcp_tls_x509_override_host); - qtest_add_func("/migration/multifd/tcp/tls/x509/mismatch-host", - test_multifd_tcp_tls_x509_mismatch_host); - qtest_add_func("/migration/multifd/tcp/tls/x509/allow-anon-client", - test_multifd_tcp_tls_x509_allow_anon_client); - qtest_add_func("/migration/multifd/tcp/tls/x509/reject-anon-client", - test_multifd_tcp_tls_x509_reject_anon_client); + migration_test_add("/migration/multifd/tcp/tls/x509/default-host", + test_multifd_tcp_tls_x509_default_host); + migration_test_add("/migration/multifd/tcp/tls/x509/override-host", + test_multifd_tcp_tls_x509_override_host); + migration_test_add("/migration/multifd/tcp/tls/x509/mismatch-host", + test_multifd_tcp_tls_x509_mismatch_host); + migration_test_add("/migration/multifd/tcp/tls/x509/allow-anon-client", + 
test_multifd_tcp_tls_x509_allow_anon_client); + migration_test_add("/migration/multifd/tcp/tls/x509/reject-anon-client", + test_multifd_tcp_tls_x509_reject_anon_client); #endif /* CONFIG_TASN1 */ #endif /* CONFIG_GNUTLS */ if (g_str_equal(arch, "x86_64") && has_kvm && kvm_dirty_ring_supported()) { - qtest_add_func("/migration/dirty_ring", - test_precopy_unix_dirty_ring); - qtest_add_func("/migration/vcpu_dirty_limit", - test_vcpu_dirty_limit); + migration_test_add("/migration/dirty_ring", + test_precopy_unix_dirty_ring); + migration_test_add("/migration/vcpu_dirty_limit", + test_vcpu_dirty_limit); } ret = g_test_run(); From patchwork Tue Jan 16 03:19:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13520366 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 23D88C47077 for ; Tue, 16 Jan 2024 03:23:08 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rPa0A-0005jV-Bz; Mon, 15 Jan 2024 22:20:34 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPa04-0005hF-Le for qemu-devel@nongnu.org; Mon, 15 Jan 2024 22:20:30 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPa03-00031r-2f for qemu-devel@nongnu.org; Mon, 15 Jan 2024 22:20:28 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1705375226; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=B4ztfFQKtdyoTioWuEEZqbPD89iK8Lssgp2zMsxTx+E=; b=CYEpr6Xg4s4WPlaTrJs5ZIbeJ0qLOHe4+Je/GxUUw6GZygR5rgHCPTbyYdfYrQYVsNwA/7 Zddrg+lXj+OT08/G69sYRDRpN1BODIxobSifwnrUz6Lq0LDt1oAI65HKdNylI88BXy1Wvb l3LZX915jlrcz6y/qoYX+h9l9/dSnIk= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-427-oDtXrr5yOOa9fBmZRUoI4Q-1; Mon, 15 Jan 2024 22:20:21 -0500 X-MC-Unique: oDtXrr5yOOa9fBmZRUoI4Q-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1E42F811E86; Tue, 16 Jan 2024 03:20:21 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3C3BA3C25; Tue, 16 Jan 2024 03:20:17 +0000 (UTC) From: peterx@redhat.com To: qemu-devel@nongnu.org, Peter Maydell Cc: peterx@redhat.com, Fabiano Rosas , Juan Quintela Subject: [PULL 09/20] tests/qtest: Re-enable multifd cancel test Date: Tue, 16 Jan 2024 11:19:36 +0800 Message-ID: <20240116031947.69017-10-peterx@redhat.com> In-Reply-To: <20240116031947.69017-1-peterx@redhat.com> References: <20240116031947.69017-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.1 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -35 X-Spam_score: -3.6 X-Spam_bar: --- X-Spam_report: (-3.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.531, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, 
DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Fabiano Rosas We've found the source of flakiness in this test, so re-enable it. Reviewed-by: Juan Quintela Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20230606144551.24367-4-farosas@suse.de [peterx: rebase to 2a61a6964c, to use migration_test_add()] Signed-off-by: Peter Xu --- tests/qtest/migration-test.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 21da140aea..d3066e119f 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -3550,14 +3550,8 @@ int main(int argc, char **argv) } migration_test_add("/migration/multifd/tcp/plain/none", test_multifd_tcp_none); - /* - * This test is flaky and sometimes fails in CI and otherwise: - * don't run unless user opts in via environment variable. 
- */ - if (getenv("QEMU_TEST_FLAKY_TESTS")) { - migration_test_add("/migration/multifd/tcp/plain/cancel", - test_multifd_tcp_cancel); - } + migration_test_add("/migration/multifd/tcp/plain/cancel", + test_multifd_tcp_cancel); migration_test_add("/migration/multifd/tcp/plain/zlib", test_multifd_tcp_zlib); #ifdef CONFIG_ZSTD
From patchwork Tue Jan 16 03:19:37 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13520350
From: peterx@redhat.com
To: qemu-devel@nongnu.org, Peter Maydell
Cc: peterx@redhat.com, Fabiano Rosas, "Michael S. Tsirkin", Jason Wang, Alex Williamson, Cédric Le Goater
Subject: [PULL 10/20] docs/migration: Create migration/ directory
Date: Tue, 16 Jan 2024 11:19:37 +0800
Message-ID: <20240116031947.69017-11-peterx@redhat.com>
In-Reply-To: <20240116031947.69017-1-peterx@redhat.com>
References: <20240116031947.69017-1-peterx@redhat.com>
From: Peter Xu

Migration documentation is growing into a single file too large. Create a sub-directory for it for a split. We also already have separate vfio/virtio documentations, move it all over into the directory. Note that the virtio one is still not yet converted to rST. That is a job for later. Cc: "Michael S.
Tsirkin" Cc: Jason Wang Cc: Alex Williamson Cc: Cédric Le Goater Reviewed-by: Cédric Le Goater Link: https://lore.kernel.org/r/20240109064628.595453-2-peterx@redhat.com Signed-off-by: Peter Xu --- docs/devel/index-internals.rst | 2 +- docs/devel/{migration.rst => migration/main.rst} | 0 docs/devel/{vfio-migration.rst => migration/vfio.rst} | 0 docs/devel/{virtio-migration.txt => migration/virtio.txt} | 0 4 files changed, 1 insertion(+), 1 deletion(-) rename docs/devel/{migration.rst => migration/main.rst} (100%) rename docs/devel/{vfio-migration.rst => migration/vfio.rst} (100%) rename docs/devel/{virtio-migration.txt => migration/virtio.txt} (100%) diff --git a/docs/devel/index-internals.rst b/docs/devel/index-internals.rst index 3def4a138b..a41d62c1eb 100644 --- a/docs/devel/index-internals.rst +++ b/docs/devel/index-internals.rst @@ -11,7 +11,7 @@ Details about QEMU's various subsystems including how to add features to them. block-coroutine-wrapper clocks ebpf_rss - migration + migration/main multi-process reset s390-cpu-topology diff --git a/docs/devel/migration.rst b/docs/devel/migration/main.rst similarity index 100% rename from docs/devel/migration.rst rename to docs/devel/migration/main.rst diff --git a/docs/devel/vfio-migration.rst b/docs/devel/migration/vfio.rst similarity index 100% rename from docs/devel/vfio-migration.rst rename to docs/devel/migration/vfio.rst diff --git a/docs/devel/virtio-migration.txt b/docs/devel/migration/virtio.txt similarity index 100% rename from docs/devel/virtio-migration.txt rename to docs/devel/migration/virtio.txt
From patchwork Tue Jan 16 03:19:38 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13520357
From: peterx@redhat.com
To: qemu-devel@nongnu.org, Peter Maydell
Cc: peterx@redhat.com, Fabiano Rosas, Alex Williamson, Cédric Le Goater
Subject: [PULL 11/20] docs/migration: Create index page
Date: Tue, 16 Jan 2024 11:19:38 +0800
Message-ID: <20240116031947.69017-12-peterx@redhat.com>
In-Reply-To: <20240116031947.69017-1-peterx@redhat.com>
References: <20240116031947.69017-1-peterx@redhat.com>
From: Peter Xu

Create an index page for migration module. Move VFIO migration there too. A trivial touch-up on the title to use lower case there. Since then we'll have "migration" as the top title, make the main doc file renamed to "migration framework".
Cc: Alex Williamson Cc: Cédric Le Goater Reviewed-by: Cédric Le Goater Link: https://lore.kernel.org/r/20240109064628.595453-3-peterx@redhat.com Signed-off-by: Peter Xu --- docs/devel/index-internals.rst | 3 +-- docs/devel/migration/index.rst | 11 +++++++++++ docs/devel/migration/main.rst | 6 +++--- docs/devel/migration/vfio.rst | 2 +- 4 files changed, 16 insertions(+), 6 deletions(-) create mode 100644 docs/devel/migration/index.rst diff --git a/docs/devel/index-internals.rst b/docs/devel/index-internals.rst index a41d62c1eb..5636e9cf1d 100644 --- a/docs/devel/index-internals.rst +++ b/docs/devel/index-internals.rst @@ -11,13 +11,12 @@ Details about QEMU's various subsystems including how to add features to them. block-coroutine-wrapper clocks ebpf_rss - migration/main + migration/index multi-process reset s390-cpu-topology s390-dasd-ipl tracing - vfio-migration vfio-iommufd writing-monitor-commands virtio-backends diff --git a/docs/devel/migration/index.rst b/docs/devel/migration/index.rst new file mode 100644 index 0000000000..02cfdcc969 --- /dev/null +++ b/docs/devel/migration/index.rst @@ -0,0 +1,11 @@ +Migration +========= + +This is the main entry for QEMU migration documentations. It explains how +QEMU live migration works. + +.. toctree:: + :maxdepth: 2 + + main + vfio diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst index 95351ba51f..62bf027fb4 100644 --- a/docs/devel/migration/main.rst +++ b/docs/devel/migration/main.rst @@ -1,6 +1,6 @@ -========= -Migration -========= +=================== +Migration framework +=================== QEMU has code to load/save the state of the guest that it is running. These are two complementary operations. 
Saving the state just does diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst index 605fe60e96..c49482eab6 100644 --- a/docs/devel/migration/vfio.rst +++ b/docs/devel/migration/vfio.rst @@ -1,5 +1,5 @@ ===================== -VFIO device Migration +VFIO device migration ===================== Migration of virtual machine involves saving the state for each device that
From patchwork Tue Jan 16 03:19:39 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13520358
From: peterx@redhat.com
To: qemu-devel@nongnu.org, Peter Maydell
Cc: peterx@redhat.com, Fabiano Rosas, Cédric Le Goater
Subject: [PULL 12/20] docs/migration: Convert virtio.txt into rST
Date: Tue, 16 Jan 2024 11:19:39 +0800
Message-ID: <20240116031947.69017-13-peterx@redhat.com>
In-Reply-To: <20240116031947.69017-1-peterx@redhat.com>
References: <20240116031947.69017-1-peterx@redhat.com>
From: Peter Xu

Convert the plain old .txt into .rst, add it into migration/index.rst. Reviewed-by: Cédric Le Goater Link: https://lore.kernel.org/r/20240109064628.595453-4-peterx@redhat.com Signed-off-by: Peter Xu --- docs/devel/migration/index.rst | 1 + docs/devel/migration/virtio.rst | 115 ++++++++++++++++++++++++++++++++ docs/devel/migration/virtio.txt | 108 ------------------------------ 3 files changed, 116 insertions(+), 108 deletions(-) create mode 100644 docs/devel/migration/virtio.rst delete mode 100644 docs/devel/migration/virtio.txt diff --git a/docs/devel/migration/index.rst b/docs/devel/migration/index.rst index 02cfdcc969..2cb701c77c 100644 --- a/docs/devel/migration/index.rst +++ b/docs/devel/migration/index.rst @@ -9,3 +9,4 @@ QEMU live migration works. main vfio + virtio diff --git a/docs/devel/migration/virtio.rst b/docs/devel/migration/virtio.rst new file mode 100644 index 0000000000..611a18b821 --- /dev/null +++ b/docs/devel/migration/virtio.rst @@ -0,0 +1,115 @@ +======================= +Virtio device migration +======================= + +Copyright 2015 IBM Corp. + +This work is licensed under the terms of the GNU GPL, version 2 or later. See +the COPYING file in the top-level directory. + +Saving and restoring the state of virtio devices is a bit of a twisty maze, +for several reasons: + +- state is distributed between several parts: + + - virtio core, for common fields like features, number of queues, ... + + - virtio transport (pci, ccw, ...), for the different proxy devices and + transport specific state (msix vectors, indicators, ...)
+ + - virtio device (net, blk, ...), for the different device types and their + state (mac address, request queue, ...) + +- most fields are saved via the stream interface; subsequently, subsections + have been added to make cross-version migration possible + +This file attempts to document the current procedure and point out some +caveats. + +Save state procedure +==================== + +:: + + virtio core virtio transport virtio device + ----------- ---------------- ------------- + + save() function registered + via VMState wrapper on + device class + virtio_save() <---------- + ------> save_config() + - save proxy device + - save transport-specific + device fields + - save common device + fields + - save common virtqueue + fields + ------> save_queue() + - save transport-specific + virtqueue fields + ------> save_device() + - save device-specific + fields + - save subsections + - device endianness, + if changed from + default endianness + - 64 bit features, if + any high feature bit + is set + - virtio-1 virtqueue + fields, if VERSION_1 + is set + +Load state procedure +==================== + +:: + + virtio core virtio transport virtio device + ----------- ---------------- ------------- + + load() function registered + via VMState wrapper on + device class + virtio_load() <---------- + ------> load_config() + - load proxy device + - load transport-specific + device fields + - load common device + fields + - load common virtqueue + fields + ------> load_queue() + - load transport-specific + virtqueue fields + - notify guest + ------> load_device() + - load device-specific + fields + - load subsections + - device endianness + - 64 bit features + - virtio-1 virtqueue + fields + - sanitize endianness + - sanitize features + - virtqueue index sanity + check + - feature-dependent setup + +Implications of this setup +========================== + +Devices need to be careful in their state processing during load: The +load_device() procedure is invoked by the core before 
subsections have +been loaded. Any code that depends on information transmitted in subsections +therefore has to be invoked in the device's load() function _after_ +virtio_load() returned (like e.g. code depending on features). + +Any extension of the state being migrated should be done in subsections +added to the core for compatibility reasons. If transport or device specific +state is added, core needs to invoke a callback from the new subsection. diff --git a/docs/devel/migration/virtio.txt b/docs/devel/migration/virtio.txt deleted file mode 100644 index 98a6b0ffb5..0000000000 --- a/docs/devel/migration/virtio.txt +++ /dev/null @@ -1,108 +0,0 @@ -Virtio devices and migration -============================ - -Copyright 2015 IBM Corp. - -This work is licensed under the terms of the GNU GPL, version 2 or later. See -the COPYING file in the top-level directory. - -Saving and restoring the state of virtio devices is a bit of a twisty maze, -for several reasons: -- state is distributed between several parts: - - virtio core, for common fields like features, number of queues, ... - - virtio transport (pci, ccw, ...), for the different proxy devices and - transport specific state (msix vectors, indicators, ...) - - virtio device (net, blk, ...), for the different device types and their - state (mac address, request queue, ...) -- most fields are saved via the stream interface; subsequently, subsections - have been added to make cross-version migration possible - -This file attempts to document the current procedure and point out some -caveats. 
- - -Save state procedure -==================== - -virtio core virtio transport virtio device ------------ ---------------- ------------- - - save() function registered - via VMState wrapper on - device class -virtio_save() <---------- - ------> save_config() - - save proxy device - - save transport-specific - device fields -- save common device - fields -- save common virtqueue - fields - ------> save_queue() - - save transport-specific - virtqueue fields - ------> save_device() - - save device-specific - fields -- save subsections - - device endianness, - if changed from - default endianness - - 64 bit features, if - any high feature bit - is set - - virtio-1 virtqueue - fields, if VERSION_1 - is set - - -Load state procedure -==================== - -virtio core virtio transport virtio device ------------ ---------------- ------------- - - load() function registered - via VMState wrapper on - device class -virtio_load() <---------- - ------> load_config() - - load proxy device - - load transport-specific - device fields -- load common device - fields -- load common virtqueue - fields - ------> load_queue() - - load transport-specific - virtqueue fields -- notify guest - ------> load_device() - - load device-specific - fields -- load subsections - - device endianness - - 64 bit features - - virtio-1 virtqueue - fields -- sanitize endianness -- sanitize features -- virtqueue index sanity - check - - feature-dependent setup - - -Implications of this setup -========================== - -Devices need to be careful in their state processing during load: The -load_device() procedure is invoked by the core before subsections have -been loaded. Any code that depends on information transmitted in subsections -therefore has to be invoked in the device's load() function _after_ -virtio_load() returned (like e.g. code depending on features). - -Any extension of the state being migrated should be done in subsections -added to the core for compatibility reasons. 
If transport or device specific -state is added, core needs to invoke a callback from the new subsection.
From patchwork Tue Jan 16 03:19:40 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13520363
From: peterx@redhat.com
To: qemu-devel@nongnu.org, Peter Maydell
Cc: peterx@redhat.com, Fabiano Rosas, Cédric Le Goater
Subject: [PULL 13/20] docs/migration: Split "Backwards compatibility" separately
Date: Tue, 16 Jan 2024 11:19:40 +0800
Message-ID: <20240116031947.69017-14-peterx@redhat.com>
In-Reply-To: <20240116031947.69017-1-peterx@redhat.com>
References: <20240116031947.69017-1-peterx@redhat.com>
From: Peter Xu

Split the section from main.rst into a separate file. Reference it in the index.rst. Reviewed-by: Cédric Le Goater Link: https://lore.kernel.org/r/20240109064628.595453-5-peterx@redhat.com Signed-off-by: Peter Xu --- docs/devel/migration/compatibility.rst | 517 ++++++++++++++++++++++++ docs/devel/migration/index.rst | 1 + docs/devel/migration/main.rst | 519 ------------------------- 3 files changed, 518 insertions(+), 519 deletions(-) create mode 100644 docs/devel/migration/compatibility.rst diff --git a/docs/devel/migration/compatibility.rst b/docs/devel/migration/compatibility.rst new file mode 100644 index 0000000000..5a5417ef06 --- /dev/null +++ b/docs/devel/migration/compatibility.rst @@ -0,0 +1,517 @@ +Backwards compatibility +======================= + +How backwards compatibility works +--------------------------------- + +When we do migration, we have two QEMU processes: the source and the +target. There are two cases, they are the same version or they are +different versions. The easy case is when they are the same version. +The difficult one is when they are different versions. + +There are two things that are different, but they have very similar +names and sometimes get confused: + +- QEMU version +- machine type version + +Let's start with a practical example, we start with: + +- qemu-system-x86_64 (v5.2), from now on qemu-5.2. +- qemu-system-x86_64 (v5.1), from now on qemu-5.1. + +Related to this are the "latest" machine types defined on each of +them: + +- pc-q35-5.2 (newer one in qemu-5.2) from now on pc-5.2 +- pc-q35-5.1 (newer one in qemu-5.1) from now on pc-5.1 + +First of all, migration is only supposed to work if you use the same +machine type in both source and destination. The QEMU hardware +configuration needs to be the same also on source and destination.
+Most aspects of the backend configuration can be changed at will, +except for a few cases where the backend features influence frontend +device feature exposure. But that is not relevant for this section. + +I am going to list the number of combinations that we can have. Let's +start with the trivial ones, QEMU is the same on source and +destination: + +1 - qemu-5.2 -M pc-5.2 -> migrates to -> qemu-5.2 -M pc-5.2 + + This is the latest QEMU with the latest machine type. + This have to work, and if it doesn't work it is a bug. + +2 - qemu-5.1 -M pc-5.1 -> migrates to -> qemu-5.1 -M pc-5.1 + + Exactly the same case than the previous one, but for 5.1. + Nothing to see here either. + +This are the easiest ones, we will not talk more about them in this +section. + +Now we start with the more interesting cases. Consider the case where +we have the same QEMU version in both sides (qemu-5.2) but we are using +the latest machine type for that version (pc-5.2) but one of an older +QEMU version, in this case pc-5.1. + +3 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1 + + It needs to use the definition of pc-5.1 and the devices as they + were configured on 5.1, but this should be easy in the sense that + both sides are the same QEMU and both sides have exactly the same + idea of what the pc-5.1 machine is. + +4 - qemu-5.1 -M pc-5.2 -> migrates to -> qemu-5.1 -M pc-5.2 + + This combination is not possible as the qemu-5.1 doesn't understand + pc-5.2 machine type. So nothing to worry here. + +Now it comes the interesting ones, when both QEMU processes are +different. Notice also that the machine type needs to be pc-5.1, +because we have the limitation than qemu-5.1 doesn't know pc-5.2. So +the possible cases are: + +5 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.1 -M pc-5.1 + + This migration is known as newer to older. We need to make sure + when we are developing 5.2 we need to take care about not to break + migration to qemu-5.1. 
Notice that we can't make updates to + qemu-5.1 to understand whatever qemu-5.2 decides to change, so it is + in qemu-5.2 side to make the relevant changes. + +6 - qemu-5.1 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1 + + This migration is known as older to newer. We need to make sure + than we are able to receive migrations from qemu-5.1. The problem is + similar to the previous one. + +If qemu-5.1 and qemu-5.2 were the same, there will not be any +compatibility problems. But the reason that we create qemu-5.2 is to +get new features, devices, defaults, etc. + +If we get a device that has a new feature, or change a default value, +we have a problem when we try to migrate between different QEMU +versions. + +So we need a way to tell qemu-5.2 that when we are using machine type +pc-5.1, it needs to **not** use the feature, to be able to migrate to +real qemu-5.1. + +And the equivalent part when migrating from qemu-5.1 to qemu-5.2. +qemu-5.2 has to expect that it is not going to get data for the new +feature, because qemu-5.1 doesn't know about it. + +How do we tell QEMU about these device feature changes? In +hw/core/machine.c:hw_compat_X_Y arrays. + +If we change a default value, we need to put back the old value on +that array. And the device, during initialization needs to look at +that array to see what value it needs to get for that feature. And +what are we going to put in that array, the value of a property. + +To create a property for a device, we need to use one of the +DEFINE_PROP_*() macros. See include/hw/qdev-properties.h to find the +macros that exist. With it, we set the default value for that +property, and that is what it is going to get in the latest released +version. But if we want a different value for a previous version, we +can change that in the hw_compat_X_Y arrays. + +hw_compat_X_Y is an array of registers that have the format: + +- name_device +- name_property +- value + +Let's see a practical example. 
+ +In qemu-5.2 virtio-blk-device got multi queue support. This is a +change that is not backward compatible. In qemu-5.1 it has one +queue. In qemu-5.2 it has the same number of queues as the number of +cpus in the system. + +When we are doing migration, if we migrate from a device that has 4 +queues to a device that have only one queue, we don't know where to +put the extra information for the other 3 queues, and we fail +migration. + +Similar problem when we migrate from qemu-5.1 that has only one queue +to qemu-5.2, we only sent information for one queue, but destination +has 4, and we have 3 queues that are not properly initialized and +anything can happen. + +So, how can we address this problem. Easy, just convince qemu-5.2 +that when it is running pc-5.1, it needs to set the number of queues +for virtio-blk-devices to 1. + +That way we fix the cases 5 and 6. + +5 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.1 -M pc-5.1 + + qemu-5.2 -M pc-5.1 sets number of queues to be 1. + qemu-5.1 -M pc-5.1 expects number of queues to be 1. + + correct. migration works. + +6 - qemu-5.1 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1 + + qemu-5.1 -M pc-5.1 sets number of queues to be 1. + qemu-5.2 -M pc-5.1 expects number of queues to be 1. + + correct. migration works. + +And now the other interesting case, case 3. In this case we have: + +3 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1 + + Here we have the same QEMU in both sides. So it doesn't matter a + lot if we have set the number of queues to 1 or not, because + they are the same. + + WRONG! + + Think what happens if we do one of this double migrations: + + A -> migrates -> B -> migrates -> C + + where: + + A: qemu-5.1 -M pc-5.1 + B: qemu-5.2 -M pc-5.1 + C: qemu-5.2 -M pc-5.1 + + migration A -> B is case 6, so number of queues needs to be 1. + + migration B -> C is case 3, so we don't care. But actually we + care because we haven't started the guest in qemu-5.2, it came + migrated from qemu-5.1. 
So to be on the safe side, we need to
+  always use number of queues 1 when we are using pc-5.1.
+
+Now, how was this done in reality? The following commit shows how it
+was done::
+
+    commit 9445e1e15e66c19e42bea942ba810db28052cd05
+    Author: Stefan Hajnoczi
+    Date: Tue Aug 18 15:33:47 2020 +0100
+
+        virtio-blk-pci: default num_queues to -smp N
+
+The relevant parts for migration are::
+
+    @@ -1281,7 +1284,8 @@ static Property virtio_blk_properties[] = {
+     #endif
+         DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
+                         true),
+    -    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
+    +    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues,
+    +                       VIRTIO_BLK_AUTO_NUM_QUEUES),
+         DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 256),
+
+It changes the default value of num_queues. But it fixes it for old
+machine types to have the right value::
+
+    @@ -31,6 +31,7 @@
+     GlobalProperty hw_compat_5_1[] = {
+         ...
+    +    { "virtio-blk-device", "num-queues", "1"},
+         ...
+     };
+
+A device with different features on both sides
+----------------------------------------------
+
+Let's assume that we are using the same QEMU binary on both sides,
+just to make things easier. But we have a device that has
+different features on both sides of the migration. That can be
+because the devices are different, because the kernel drivers of both
+devices have different features, or for any other reason.
+
+How can we get this to work with migration? The way to do that is
+"theoretically" easy. You have to get the features that the device
+has on the source of the migration and the features that the device has
+on the target of the migration. Then you take the intersection of the
+features of both sides, and that is the way that you should launch
+QEMU.
+
+Notice that this is not completely related to QEMU. The most
+important thing here is that this should be handled by the managing
+application that launches QEMU.
If QEMU is configured correctly, the
+migration will succeed.
+
+That said, actually doing it is complicated. Almost all devices are
+bad at being able to be launched with only some features enabled.
+With one big exception: cpus.
+
+You can read the documentation for QEMU x86 cpu models here:
+
+https://qemu-project.gitlab.io/qemu/system/qemu-cpu-models.html
+
+Note that when it talks about migration, it recommends choosing the
+newest cpu model that is supported by all cpus.
+
+Let's say that we have:
+
+Host A:
+
+Device X has the feature Y
+
+Host B:
+
+Device X does not have the feature Y
+
+If we try to migrate without any care from host A to host B, it will
+fail because when migration tries to load the feature Y on
+destination, it will find that the hardware is not there.
+
+With cpus, this would be the equivalent of doing:
+
+Host A:
+
+$ qemu-system-x86_64 -cpu host
+
+Host B:
+
+$ qemu-system-x86_64 -cpu host
+
+When both hosts have different cpu features this is guaranteed to
+fail. Especially if Host B has fewer features than host A. If host A
+has fewer features than host B, sometimes it works. The important word
+in the last sentence is "sometimes".
+
+So, forgetting about cpu models and continuing with the -cpu host
+example, let's say that the difference between the cpus is that Host A
+and B have the following features:
+
+Features:   'pcid'  'stibp' 'taa-no'
+Host A:        X       X
+Host B:                         X
+
+If we want to migrate between them, the way to configure both QEMU cpus
+will be:
+
+Host A:
+
+$ qemu-system-x86_64 -cpu host,pcid=off,stibp=off
+
+Host B:
+
+$ qemu-system-x86_64 -cpu host,taa-no=off
+
+And you would be able to migrate between them. It is the responsibility
+of the management application or of the user to make sure that the
+configuration is correct. QEMU doesn't know how to look at this kind
+of features in general.
+
+Notice that we don't recommend using -cpu host for migration. It is
+used in this example because it makes the example simpler.
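As a rough illustration of the intersection approach described above, here is a minimal Python sketch that computes, for each host, the `-cpu host` argument that turns off every feature not shared by all hosts. The helper name `cpu_argument` and the `hosts` dictionary are hypothetical, and the feature sets are simply the ones from the example table; a real management application would have to obtain them from the hosts themselves.

```python
def cpu_argument(host, hosts):
    """Build a -cpu value for `host` that disables features not common to all hosts.

    `hosts` maps a host name to the set of cpu feature names it exposes.
    Both the function and the data layout are illustrative assumptions.
    """
    # Features that every host supports: only these can stay enabled.
    common = set.intersection(*hosts.values())
    # Features this host has but some other host lacks must be turned off.
    to_disable = sorted(hosts[host] - common)
    return ",".join(["host"] + [f"{name}=off" for name in to_disable])


# Feature sets taken from the example table above.
hosts = {
    "A": {"pcid", "stibp"},
    "B": {"taa-no"},
}

for name in sorted(hosts):
    print(f"Host {name}: qemu-system-x86_64 -cpu {cpu_argument(name, hosts)}")
```

With the example data this prints the same two command lines shown above. The sketch only covers the set arithmetic; discovering the features of each host and keeping the configuration consistent across later migrations remains the job of the management application.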
+
+Other devices have worse control over individual features. If they
+want to be able to migrate between hosts that show different features,
+the device needs a way to configure which ones it is going to use.
+
+In this section we have considered that we are using the same QEMU
+binary on both sides of the migration. If we use different QEMU
+versions, then we need to take into account all other
+differences and the examples become even more complicated.
+
+How to mitigate when we have a backward compatibility error
+-----------------------------------------------------------
+
+We break migration for old machine types continuously during
+development. But as soon as we find that there is a problem, we fix
+it. The problem is what happens when we detect after we have done a
+release that something has gone wrong.
+
+Let's see how it worked with one example.
+
+After the release of qemu-8.0 we found a problem when doing migration
+of the machine type pc-7.2.
+
+- $ qemu-7.2 -M pc-7.2 -> qemu-7.2 -M pc-7.2
+
+  This migration works
+
+- $ qemu-8.0 -M pc-7.2 -> qemu-8.0 -M pc-7.2
+
+  This migration works
+
+- $ qemu-8.0 -M pc-7.2 -> qemu-7.2 -M pc-7.2
+
+  This migration fails
+
+- $ qemu-7.2 -M pc-7.2 -> qemu-8.0 -M pc-7.2
+
+  This migration fails
+
+So clearly something fails when migrating between qemu-7.2 and
+qemu-8.0 with machine type pc-7.2. The error messages and git bisect
+pointed to this commit.
+
+In qemu-8.0 we got this commit::
+
+    commit 010746ae1db7f52700cb2e2c46eb94f299cfa0d2
+    Author: Jonathan Cameron
+    Date: Thu Mar 2 13:37:02 2023 +0000
+
+        hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register
+
+
+The relevant bits of the commit for our example are these::
+
+    --- a/hw/pci/pcie_aer.c
+    +++ b/hw/pci/pcie_aer.c
+    @@ -112,6 +112,10 @@ int pcie_aer_init(PCIDevice *dev,
+
+         pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
+                      PCI_ERR_UNC_SUPPORTED);
+    +    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
+    +                 PCI_ERR_UNC_MASK_DEFAULT);
+    +    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
+    +                 PCI_ERR_UNC_SUPPORTED);
+
+         pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
+                      PCI_ERR_UNC_SEVERITY_DEFAULT);
+
+The patch changes how we configure PCI space for AER. But migration
+fails when the PCI space configuration is different between source and
+destination.
+
+The following commit shows how this got fixed::
+
+    commit 5ed3dabe57dd9f4c007404345e5f5bf0e347317f
+    Author: Leonardo Bras
+    Date: Tue May 2 21:27:02 2023 -0300
+
+        hw/pci: Disable PCI_ERR_UNCOR_MASK register for machine type < 8.0
+
+        [...]
+
+The relevant parts of the fix in QEMU are as follows:
+
+First, we create a new property for the device to be able to configure
+the old behaviour or the new behaviour::
+
+    diff --git a/hw/pci/pci.c b/hw/pci/pci.c
+    index 8a87ccc8b0..5153ad63d6 100644
+    --- a/hw/pci/pci.c
+    +++ b/hw/pci/pci.c
+    @@ -79,6 +79,8 @@ static Property pci_props[] = {
+         DEFINE_PROP_STRING("failover_pair_id", PCIDevice,
+                            failover_pair_id),
+         DEFINE_PROP_UINT32("acpi-index", PCIDevice, acpi_index, 0),
+    +    DEFINE_PROP_BIT("x-pcie-err-unc-mask", PCIDevice, cap_present,
+    +                    QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
+         DEFINE_PROP_END_OF_LIST()
+     };
+
+Notice that we enable the feature for new machine types.
+
+Now we see how the fix is done.
This is going to depend on what kind
+of breakage happens, but in this case it is quite simple::
+
+    diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
+    index 103667c368..374d593ead 100644
+    --- a/hw/pci/pcie_aer.c
+    +++ b/hw/pci/pcie_aer.c
+    @@ -112,10 +112,13 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver,
+    uint16_t offset,
+
+         pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
+                      PCI_ERR_UNC_SUPPORTED);
+    -    pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
+    -                 PCI_ERR_UNC_MASK_DEFAULT);
+    -    pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
+    -                 PCI_ERR_UNC_SUPPORTED);
+    +
+    +    if (dev->cap_present & QEMU_PCIE_ERR_UNC_MASK) {
+    +        pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
+    +                     PCI_ERR_UNC_MASK_DEFAULT);
+    +        pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
+    +                     PCI_ERR_UNC_SUPPORTED);
+    +    }
+
+         pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER,
+                      PCI_ERR_UNC_SEVERITY_DEFAULT);
+
+That is, if the property bit is enabled, we configure it as we did for
+qemu-8.0. If the property bit is not set, we configure it as it was in 7.2.
+
+And now, all that is missing is disabling the feature for old
+machine types::
+
+    diff --git a/hw/core/machine.c b/hw/core/machine.c
+    index 47a34841a5..07f763eb2e 100644
+    --- a/hw/core/machine.c
+    +++ b/hw/core/machine.c
+    @@ -48,6 +48,7 @@ GlobalProperty hw_compat_7_2[] = {
+         { "e1000e", "migrate-timadj", "off" },
+         { "virtio-mem", "x-early-migration", "false" },
+         { "migration", "x-preempt-pre-7-2", "true" },
+    +    { TYPE_PCI_DEVICE, "x-pcie-err-unc-mask", "off" },
+     };
+     const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
+
+And now, when qemu-8.0.1 is released with this fix, all combinations
+are going to work as expected.
+
+- $ qemu-7.2 -M pc-7.2 -> qemu-7.2 -M pc-7.2 (works)
+- $ qemu-8.0.1 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 (works)
+- $ qemu-8.0.1 -M pc-7.2 -> qemu-7.2 -M pc-7.2 (works)
+- $ qemu-7.2 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 (works)
+
+So normality has been restored and everything is ok, no?
+
+Not really, now our matrix is much bigger. We started with the easy
+cases, migration from the same version to the same version always
+works:
+
+- $ qemu-7.2 -M pc-7.2 -> qemu-7.2 -M pc-7.2
+- $ qemu-8.0 -M pc-7.2 -> qemu-8.0 -M pc-7.2
+- $ qemu-8.0.1 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2
+
+Now the interesting ones: when the QEMU process versions are
+different. For the 1st set, they fail and we can do nothing; both
+versions are released and we can't change anything.
+
+- $ qemu-7.2 -M pc-7.2 -> qemu-8.0 -M pc-7.2
+- $ qemu-8.0 -M pc-7.2 -> qemu-7.2 -M pc-7.2
+
+These two are the ones that work. The whole point of making the
+change in the qemu-8.0.1 release was to fix this issue:
+
+- $ qemu-7.2 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2
+- $ qemu-8.0.1 -M pc-7.2 -> qemu-7.2 -M pc-7.2
+
+But now we find that qemu-8.0 can migrate neither to qemu-7.2 nor to
+qemu-8.0.1.
+
+- $ qemu-8.0 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2
+- $ qemu-8.0.1 -M pc-7.2 -> qemu-8.0 -M pc-7.2
+
+So, if we start a pc-7.2 machine in qemu-8.0 we can't migrate it to
+anything except to qemu-8.0.
+
+Can we do better?
+
+Yes. If we know that we are going to do this migration:
+
+- $ qemu-8.0 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2
+
+We can launch the appropriate devices with::
+
+    --device...,x-pcie-err-unc-mask=on
+
+And now we can receive a migration from 8.0. And from now on, we can
+do that migration to newer QEMU versions if we remember to enable that
+property for pc-7.2. Notice that we need to remember; it is not
+enough to know that the source of the migration is qemu-8.0.
Think of
+this example:
+
+$ qemu-8.0 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 -> qemu-8.2 -M pc-7.2
+
+In the second migration, the source is not qemu-8.0, but we still have
+that "problem" and have that property enabled. Notice that we need to
+continue having this mark/property until we have this machine
+rebooted. But it is not a normal reboot (that doesn't reload QEMU); we
+need the machine to poweroff/poweron on a fixed QEMU. And from now
+on we can use the proper real machine.
diff --git a/docs/devel/migration/index.rst b/docs/devel/migration/index.rst
index 2cb701c77c..7fc02b9520 100644
--- a/docs/devel/migration/index.rst
+++ b/docs/devel/migration/index.rst
@@ -8,5 +8,6 @@ QEMU live migration works.
    :maxdepth: 2
 
    main
+   compatibility
    vfio
    virtio
diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
index 62bf027fb4..b3e31bb52f 100644
--- a/docs/devel/migration/main.rst
+++ b/docs/devel/migration/main.rst
@@ -993,522 +993,3 @@ In some cases it may be best to tie specific firmware versions to specific
 versioned machine types to cut down on the combinations that will need
 support. This is also useful when newer versions of firmware outgrow
 the padding.
-
-
-Backwards compatibility
-=======================
-
-How backwards compatibility works
----------------------------------
-
-When we do migration, we have two QEMU processes: the source and the
-target. There are two cases, they are the same version or they are
-different versions. The easy case is when they are the same version.
-The difficult one is when they are different versions.
-
-There are two things that are different, but they have very similar
-names and sometimes get confused:
-
-- QEMU version
-- machine type version
-
-Let's start with a practical example, we start with:
-
-- qemu-system-x86_64 (v5.2), from now on qemu-5.2.
-- qemu-system-x86_64 (v5.1), from now on qemu-5.1.
- -Related to this are the "latest" machine types defined on each of -them: - -- pc-q35-5.2 (newer one in qemu-5.2) from now on pc-5.2 -- pc-q35-5.1 (newer one in qemu-5.1) from now on pc-5.1 - -First of all, migration is only supposed to work if you use the same -machine type in both source and destination. The QEMU hardware -configuration needs to be the same also on source and destination. -Most aspects of the backend configuration can be changed at will, -except for a few cases where the backend features influence frontend -device feature exposure. But that is not relevant for this section. - -I am going to list the number of combinations that we can have. Let's -start with the trivial ones, QEMU is the same on source and -destination: - -1 - qemu-5.2 -M pc-5.2 -> migrates to -> qemu-5.2 -M pc-5.2 - - This is the latest QEMU with the latest machine type. - This have to work, and if it doesn't work it is a bug. - -2 - qemu-5.1 -M pc-5.1 -> migrates to -> qemu-5.1 -M pc-5.1 - - Exactly the same case than the previous one, but for 5.1. - Nothing to see here either. - -This are the easiest ones, we will not talk more about them in this -section. - -Now we start with the more interesting cases. Consider the case where -we have the same QEMU version in both sides (qemu-5.2) but we are using -the latest machine type for that version (pc-5.2) but one of an older -QEMU version, in this case pc-5.1. - -3 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1 - - It needs to use the definition of pc-5.1 and the devices as they - were configured on 5.1, but this should be easy in the sense that - both sides are the same QEMU and both sides have exactly the same - idea of what the pc-5.1 machine is. - -4 - qemu-5.1 -M pc-5.2 -> migrates to -> qemu-5.1 -M pc-5.2 - - This combination is not possible as the qemu-5.1 doesn't understand - pc-5.2 machine type. So nothing to worry here. - -Now it comes the interesting ones, when both QEMU processes are -different. 
Notice also that the machine type needs to be pc-5.1, -because we have the limitation than qemu-5.1 doesn't know pc-5.2. So -the possible cases are: - -5 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.1 -M pc-5.1 - - This migration is known as newer to older. We need to make sure - when we are developing 5.2 we need to take care about not to break - migration to qemu-5.1. Notice that we can't make updates to - qemu-5.1 to understand whatever qemu-5.2 decides to change, so it is - in qemu-5.2 side to make the relevant changes. - -6 - qemu-5.1 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1 - - This migration is known as older to newer. We need to make sure - than we are able to receive migrations from qemu-5.1. The problem is - similar to the previous one. - -If qemu-5.1 and qemu-5.2 were the same, there will not be any -compatibility problems. But the reason that we create qemu-5.2 is to -get new features, devices, defaults, etc. - -If we get a device that has a new feature, or change a default value, -we have a problem when we try to migrate between different QEMU -versions. - -So we need a way to tell qemu-5.2 that when we are using machine type -pc-5.1, it needs to **not** use the feature, to be able to migrate to -real qemu-5.1. - -And the equivalent part when migrating from qemu-5.1 to qemu-5.2. -qemu-5.2 has to expect that it is not going to get data for the new -feature, because qemu-5.1 doesn't know about it. - -How do we tell QEMU about these device feature changes? In -hw/core/machine.c:hw_compat_X_Y arrays. - -If we change a default value, we need to put back the old value on -that array. And the device, during initialization needs to look at -that array to see what value it needs to get for that feature. And -what are we going to put in that array, the value of a property. - -To create a property for a device, we need to use one of the -DEFINE_PROP_*() macros. See include/hw/qdev-properties.h to find the -macros that exist. 
With it, we set the default value for that -property, and that is what it is going to get in the latest released -version. But if we want a different value for a previous version, we -can change that in the hw_compat_X_Y arrays. - -hw_compat_X_Y is an array of registers that have the format: - -- name_device -- name_property -- value - -Let's see a practical example. - -In qemu-5.2 virtio-blk-device got multi queue support. This is a -change that is not backward compatible. In qemu-5.1 it has one -queue. In qemu-5.2 it has the same number of queues as the number of -cpus in the system. - -When we are doing migration, if we migrate from a device that has 4 -queues to a device that have only one queue, we don't know where to -put the extra information for the other 3 queues, and we fail -migration. - -Similar problem when we migrate from qemu-5.1 that has only one queue -to qemu-5.2, we only sent information for one queue, but destination -has 4, and we have 3 queues that are not properly initialized and -anything can happen. - -So, how can we address this problem. Easy, just convince qemu-5.2 -that when it is running pc-5.1, it needs to set the number of queues -for virtio-blk-devices to 1. - -That way we fix the cases 5 and 6. - -5 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.1 -M pc-5.1 - - qemu-5.2 -M pc-5.1 sets number of queues to be 1. - qemu-5.1 -M pc-5.1 expects number of queues to be 1. - - correct. migration works. - -6 - qemu-5.1 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1 - - qemu-5.1 -M pc-5.1 sets number of queues to be 1. - qemu-5.2 -M pc-5.1 expects number of queues to be 1. - - correct. migration works. - -And now the other interesting case, case 3. In this case we have: - -3 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1 - - Here we have the same QEMU in both sides. So it doesn't matter a - lot if we have set the number of queues to 1 or not, because - they are the same. - - WRONG! 
- - Think what happens if we do one of this double migrations: - - A -> migrates -> B -> migrates -> C - - where: - - A: qemu-5.1 -M pc-5.1 - B: qemu-5.2 -M pc-5.1 - C: qemu-5.2 -M pc-5.1 - - migration A -> B is case 6, so number of queues needs to be 1. - - migration B -> C is case 3, so we don't care. But actually we - care because we haven't started the guest in qemu-5.2, it came - migrated from qemu-5.1. So to be in the safe place, we need to - always use number of queues 1 when we are using pc-5.1. - -Now, how was this done in reality? The following commit shows how it -was done:: - - commit 9445e1e15e66c19e42bea942ba810db28052cd05 - Author: Stefan Hajnoczi - Date: Tue Aug 18 15:33:47 2020 +0100 - - virtio-blk-pci: default num_queues to -smp N - -The relevant parts for migration are:: - - @@ -1281,7 +1284,8 @@ static Property virtio_blk_properties[] = { - #endif - DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0, - true), - - DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1), - + DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, - + VIRTIO_BLK_AUTO_NUM_QUEUES), - DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 256), - -It changes the default value of num_queues. But it fishes it for old -machine types to have the right value:: - - @@ -31,6 +31,7 @@ - GlobalProperty hw_compat_5_1[] = { - ... - + { "virtio-blk-device", "num-queues", "1"}, - ... - }; - -A device with different features on both sides ----------------------------------------------- - -Let's assume that we are using the same QEMU binary on both sides, -just to make the things easier. But we have a device that has -different features on both sides of the migration. That can be -because the devices are different, because the kernel driver of both -devices have different features, whatever. - -How can we get this to work with migration. The way to do that is -"theoretically" easy. 
You have to get the features that the device -has in the source of the migration. The features that the device has -on the target of the migration, you get the intersection of the -features of both sides, and that is the way that you should launch -QEMU. - -Notice that this is not completely related to QEMU. The most -important thing here is that this should be handled by the managing -application that launches QEMU. If QEMU is configured correctly, the -migration will succeed. - -That said, actually doing it is complicated. Almost all devices are -bad at being able to be launched with only some features enabled. -With one big exception: cpus. - -You can read the documentation for QEMU x86 cpu models here: - -https://qemu-project.gitlab.io/qemu/system/qemu-cpu-models.html - -See when they talk about migration they recommend that one chooses the -newest cpu model that is supported for all cpus. - -Let's say that we have: - -Host A: - -Device X has the feature Y - -Host B: - -Device X has not the feature Y - -If we try to migrate without any care from host A to host B, it will -fail because when migration tries to load the feature Y on -destination, it will find that the hardware is not there. - -Doing this would be the equivalent of doing with cpus: - -Host A: - -$ qemu-system-x86_64 -cpu host - -Host B: - -$ qemu-system-x86_64 -cpu host - -When both hosts have different cpu features this is guaranteed to -fail. Especially if Host B has less features than host A. If host A -has less features than host B, sometimes it works. Important word of -last sentence is "sometimes". 
- -So, forgetting about cpu models and continuing with the -cpu host -example, let's see that the differences of the cpus is that Host A and -B have the following features: - -Features: 'pcid' 'stibp' 'taa-no' -Host A: X X -Host B: X - -And we want to migrate between them, the way configure both QEMU cpu -will be: - -Host A: - -$ qemu-system-x86_64 -cpu host,pcid=off,stibp=off - -Host B: - -$ qemu-system-x86_64 -cpu host,taa-no=off - -And you would be able to migrate between them. It is responsibility -of the management application or of the user to make sure that the -configuration is correct. QEMU doesn't know how to look at this kind -of features in general. - -Notice that we don't recommend to use -cpu host for migration. It is -used in this example because it makes the example simpler. - -Other devices have worse control about individual features. If they -want to be able to migrate between hosts that show different features, -the device needs a way to configure which ones it is going to use. - -In this section we have considered that we are using the same QEMU -binary in both sides of the migration. If we use different QEMU -versions process, then we need to have into account all other -differences and the examples become even more complicated. - -How to mitigate when we have a backward compatibility error ------------------------------------------------------------ - -We broke migration for old machine types continuously during -development. But as soon as we find that there is a problem, we fix -it. The problem is what happens when we detect after we have done a -release that something has gone wrong. - -Let see how it worked with one example. - -After the release of qemu-8.0 we found a problem when doing migration -of the machine type pc-7.2. 
- -- $ qemu-7.2 -M pc-7.2 -> qemu-7.2 -M pc-7.2 - - This migration works - -- $ qemu-8.0 -M pc-7.2 -> qemu-8.0 -M pc-7.2 - - This migration works - -- $ qemu-8.0 -M pc-7.2 -> qemu-7.2 -M pc-7.2 - - This migration fails - -- $ qemu-7.2 -M pc-7.2 -> qemu-8.0 -M pc-7.2 - - This migration fails - -So clearly something fails when migration between qemu-7.2 and -qemu-8.0 with machine type pc-7.2. The error messages, and git bisect -pointed to this commit. - -In qemu-8.0 we got this commit:: - - commit 010746ae1db7f52700cb2e2c46eb94f299cfa0d2 - Author: Jonathan Cameron - Date: Thu Mar 2 13:37:02 2023 +0000 - - hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register - - -The relevant bits of the commit for our example are this ones:: - - --- a/hw/pci/pcie_aer.c - +++ b/hw/pci/pcie_aer.c - @@ -112,6 +112,10 @@ int pcie_aer_init(PCIDevice *dev, - - pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS, - PCI_ERR_UNC_SUPPORTED); - + pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK, - + PCI_ERR_UNC_MASK_DEFAULT); - + pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK, - + PCI_ERR_UNC_SUPPORTED); - - pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER, - PCI_ERR_UNC_SEVERITY_DEFAULT); - -The patch changes how we configure PCI space for AER. But QEMU fails -when the PCI space configuration is different between source and -destination. - -The following commit shows how this got fixed:: - - commit 5ed3dabe57dd9f4c007404345e5f5bf0e347317f - Author: Leonardo Bras - Date: Tue May 2 21:27:02 2023 -0300 - - hw/pci: Disable PCI_ERR_UNCOR_MASK register for machine type < 8.0 - - [...] 
- -The relevant parts of the fix in QEMU are as follow: - -First, we create a new property for the device to be able to configure -the old behaviour or the new behaviour:: - - diff --git a/hw/pci/pci.c b/hw/pci/pci.c - index 8a87ccc8b0..5153ad63d6 100644 - --- a/hw/pci/pci.c - +++ b/hw/pci/pci.c - @@ -79,6 +79,8 @@ static Property pci_props[] = { - DEFINE_PROP_STRING("failover_pair_id", PCIDevice, - failover_pair_id), - DEFINE_PROP_UINT32("acpi-index", PCIDevice, acpi_index, 0), - + DEFINE_PROP_BIT("x-pcie-err-unc-mask", PCIDevice, cap_present, - + QEMU_PCIE_ERR_UNC_MASK_BITNR, true), - DEFINE_PROP_END_OF_LIST() - }; - -Notice that we enable the feature for new machine types. - -Now we see how the fix is done. This is going to depend on what kind -of breakage happens, but in this case it is quite simple:: - - diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c - index 103667c368..374d593ead 100644 - --- a/hw/pci/pcie_aer.c - +++ b/hw/pci/pcie_aer.c - @@ -112,10 +112,13 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, - uint16_t offset, - - pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS, - PCI_ERR_UNC_SUPPORTED); - - pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK, - - PCI_ERR_UNC_MASK_DEFAULT); - - pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK, - - PCI_ERR_UNC_SUPPORTED); - + - + if (dev->cap_present & QEMU_PCIE_ERR_UNC_MASK) { - + pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK, - + PCI_ERR_UNC_MASK_DEFAULT); - + pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK, - + PCI_ERR_UNC_SUPPORTED); - + } - - pci_set_long(dev->config + offset + PCI_ERR_UNCOR_SEVER, - PCI_ERR_UNC_SEVERITY_DEFAULT); - -I.e. If the property bit is enabled, we configure it as we did for -qemu-8.0. If the property bit is not set, we configure it as it was in 7.2. 
- -And now, everything that is missing is disabling the feature for old -machine types:: - - diff --git a/hw/core/machine.c b/hw/core/machine.c - index 47a34841a5..07f763eb2e 100644 - --- a/hw/core/machine.c - +++ b/hw/core/machine.c - @@ -48,6 +48,7 @@ GlobalProperty hw_compat_7_2[] = { - { "e1000e", "migrate-timadj", "off" }, - { "virtio-mem", "x-early-migration", "false" }, - { "migration", "x-preempt-pre-7-2", "true" }, - + { TYPE_PCI_DEVICE, "x-pcie-err-unc-mask", "off" }, - }; - const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2); - -And now, when qemu-8.0.1 is released with this fix, all combinations -are going to work as supposed. - -- $ qemu-7.2 -M pc-7.2 -> qemu-7.2 -M pc-7.2 (works) -- $ qemu-8.0.1 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 (works) -- $ qemu-8.0.1 -M pc-7.2 -> qemu-7.2 -M pc-7.2 (works) -- $ qemu-7.2 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 (works) - -So the normality has been restored and everything is ok, no? - -Not really, now our matrix is much bigger. We started with the easy -cases, migration from the same version to the same version always -works: - -- $ qemu-7.2 -M pc-7.2 -> qemu-7.2 -M pc-7.2 -- $ qemu-8.0 -M pc-7.2 -> qemu-8.0 -M pc-7.2 -- $ qemu-8.0.1 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 - -Now the interesting ones. When the QEMU processes versions are -different. For the 1st set, their fail and we can do nothing, both -versions are released and we can't change anything. - -- $ qemu-7.2 -M pc-7.2 -> qemu-8.0 -M pc-7.2 -- $ qemu-8.0 -M pc-7.2 -> qemu-7.2 -M pc-7.2 - -This two are the ones that work. The whole point of making the -change in qemu-8.0.1 release was to fix this issue: - -- $ qemu-7.2 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 -- $ qemu-8.0.1 -M pc-7.2 -> qemu-7.2 -M pc-7.2 - -But now we found that qemu-8.0 neither can migrate to qemu-7.2 not -qemu-8.0.1. 
- -- $ qemu-8.0 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 -- $ qemu-8.0.1 -M pc-7.2 -> qemu-8.0 -M pc-7.2 - -So, if we start a pc-7.2 machine in qemu-8.0 we can't migrate it to -anything except to qemu-8.0. - -Can we do better? - -Yeap. If we know that we are going to do this migration: - -- $ qemu-8.0 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 - -We can launch the appropriate devices with:: - - --device...,x-pci-e-err-unc-mask=on - -And now we can receive a migration from 8.0. And from now on, we can -do that migration to new machine types if we remember to enable that -property for pc-7.2. Notice that we need to remember, it is not -enough to know that the source of the migration is qemu-8.0. Think of -this example: - -$ qemu-8.0 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 -> qemu-8.2 -M pc-7.2 - -In the second migration, the source is not qemu-8.0, but we still have -that "problem" and have that property enabled. Notice that we need to -continue having this mark/property until we have this machine -rebooted. But it is not a normal reboot (that don't reload QEMU) we -need the machine to poweroff/poweron on a fixed QEMU. And from now -on we can use the proper real machine. 
From patchwork Tue Jan 16 03:19:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13520361 From: peterx@redhat.com To: qemu-devel@nongnu.org, Peter Maydell Cc: peterx@redhat.com, Fabiano Rosas , =?utf-8?q?C=C3=A9dri?= =?utf-8?q?c_Le_Goater?= Subject: [PULL 14/20] docs/migration: Split "Debugging" and "Firmware" Date: Tue, 16 Jan 2024 11:19:41 +0800 Message-ID: <20240116031947.69017-15-peterx@redhat.com> In-Reply-To: <20240116031947.69017-1-peterx@redhat.com> References: <20240116031947.69017-1-peterx@redhat.com> From: Peter Xu Move the two sections into a separate
file called "best-practices.rst". Add the entry into index. Reviewed-by: Cédric Le Goater Link: https://lore.kernel.org/r/20240109064628.595453-6-peterx@redhat.com Signed-off-by: Peter Xu --- docs/devel/migration/best-practices.rst | 48 +++++++++++++++++++++++++ docs/devel/migration/index.rst | 1 + docs/devel/migration/main.rst | 44 ----------------------- 3 files changed, 49 insertions(+), 44 deletions(-) create mode 100644 docs/devel/migration/best-practices.rst diff --git a/docs/devel/migration/best-practices.rst b/docs/devel/migration/best-practices.rst new file mode 100644 index 0000000000..d7c34a3014 --- /dev/null +++ b/docs/devel/migration/best-practices.rst @@ -0,0 +1,48 @@ +============== +Best practices +============== + +Debugging +========= + +The migration stream can be analyzed thanks to ``scripts/analyze-migration.py``. + +Example usage: + +.. code-block:: shell + + $ qemu-system-x86_64 -display none -monitor stdio + (qemu) migrate "exec:cat > mig" + (qemu) q + $ ./scripts/analyze-migration.py -f mig + { + "ram (3)": { + "section sizes": { + "pc.ram": "0x0000000008000000", + ... + +See also ``analyze-migration.py -h`` help for more options. + +Firmware +======== + +Migration migrates the copies of RAM and ROM, and thus when running +on the destination it includes the firmware from the source. Even after +resetting a VM, the old firmware is used. Only once QEMU has been restarted +is the new firmware in use. + +- Changes in firmware size can cause changes in the required RAMBlock size + to hold the firmware and thus migration can fail. In practice it's best + to pad firmware images to convenient powers of 2 with plenty of space + for growth. + +- Care should be taken with device emulation code so that newer + emulation code can work with older firmware to allow forward migration. + +- Care should be taken with newer firmware so that backward migration + to older systems with older device emulation code will work. 
+ +In some cases it may be best to tie specific firmware versions to specific +versioned machine types to cut down on the combinations that will need +support. This is also useful when newer versions of firmware outgrow +the padding. diff --git a/docs/devel/migration/index.rst b/docs/devel/migration/index.rst index 7fc02b9520..9a8fd1ead7 100644 --- a/docs/devel/migration/index.rst +++ b/docs/devel/migration/index.rst @@ -11,3 +11,4 @@ QEMU live migration works. compatibility vfio virtio + best-practices diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst index b3e31bb52f..97811ce371 100644 --- a/docs/devel/migration/main.rst +++ b/docs/devel/migration/main.rst @@ -52,27 +52,6 @@ All these migration protocols use the same infrastructure to save/restore state devices. This infrastructure is shared with the savevm/loadvm functionality. -Debugging -========= - -The migration stream can be analyzed thanks to ``scripts/analyze-migration.py``. - -Example usage: - -.. code-block:: shell - - $ qemu-system-x86_64 -display none -monitor stdio - (qemu) migrate "exec:cat > mig" - (qemu) q - $ ./scripts/analyze-migration.py -f mig - { - "ram (3)": { - "section sizes": { - "pc.ram": "0x0000000008000000", - ... - -See also ``analyze-migration.py -h`` help for more options. - Common infrastructure ===================== @@ -970,26 +949,3 @@ the background migration channel. Anyone who cares about latencies of page faults during a postcopy migration should enable this feature. By default, it's not enabled. -Firmware -======== - -Migration migrates the copies of RAM and ROM, and thus when running -on the destination it includes the firmware from the source. Even after -resetting a VM, the old firmware is used. Only once QEMU has been restarted -is the new firmware in use. - -- Changes in firmware size can cause changes in the required RAMBlock size - to hold the firmware and thus migration can fail. 
In practice it's best - to pad firmware images to convenient powers of 2 with plenty of space - for growth. - -- Care should be taken with device emulation code so that newer - emulation code can work with older firmware to allow forward migration. - -- Care should be taken with newer firmware so that backward migration - to older systems with older device emulation code will work. - -In some cases it may be best to tie specific firmware versions to specific -versioned machine types to cut down on the combinations that will need -support. This is also useful when newer versions of firmware outgrow -the padding.
From patchwork Tue Jan 16 03:19:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13520353 From: peterx@redhat.com To: qemu-devel@nongnu.org, Peter Maydell Cc: peterx@redhat.com, Fabiano Rosas , =?utf-8?q?C=C3=A9dri?= =?utf-8?q?c_Le_Goater?= Subject: [PULL 15/20] docs/migration: Split "Postcopy" Date: Tue, 16 Jan 2024 11:19:42 +0800 Message-ID: <20240116031947.69017-16-peterx@redhat.com> In-Reply-To: <20240116031947.69017-1-peterx@redhat.com> References: <20240116031947.69017-1-peterx@redhat.com> From: Peter Xu Split postcopy into a separate file. Introduce a head page "features.rst" to keep all the features on top of migration framework. Reviewed-by: Cédric Le Goater Link: https://lore.kernel.org/r/20240109064628.595453-7-peterx@redhat.com Signed-off-by: Peter Xu --- docs/devel/migration/features.rst | 9 + docs/devel/migration/index.rst | 1 + docs/devel/migration/main.rst | 305 ------------------------------ docs/devel/migration/postcopy.rst | 304 +++++++++++++++++++++++++++++ 4 files changed, 314 insertions(+), 305 deletions(-) create mode 100644 docs/devel/migration/features.rst create mode 100644 docs/devel/migration/postcopy.rst diff --git a/docs/devel/migration/features.rst b/docs/devel/migration/features.rst new file mode 100644 index 0000000000..0054e0c900 --- /dev/null +++ b/docs/devel/migration/features.rst @@ -0,0 +1,9 @@ +Migration features +================== + +Migration has plenty of features to support different use cases. + +.. toctree:: + :maxdepth: 2 + + postcopy diff --git a/docs/devel/migration/index.rst b/docs/devel/migration/index.rst index 9a8fd1ead7..21ad58b189 100644 --- a/docs/devel/migration/index.rst +++ b/docs/devel/migration/index.rst @@ -8,6 +8,7 @@ QEMU live migration works.
:maxdepth: 2 main + features compatibility vfio virtio diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst index 97811ce371..051ea43f0e 100644 --- a/docs/devel/migration/main.rst +++ b/docs/devel/migration/main.rst @@ -644,308 +644,3 @@ algorithm will restrict virtual CPUs as needed to keep their dirty page rate inside the limit. This leads to more steady reading performance during live migration and can aid in improving large guest responsiveness. -Postcopy -======== - -'Postcopy' migration is a way to deal with migrations that refuse to converge -(or take too long to converge) its plus side is that there is an upper bound on -the amount of migration traffic and time it takes, the down side is that during -the postcopy phase, a failure of *either* side causes the guest to be lost. - -In postcopy the destination CPUs are started before all the memory has been -transferred, and accesses to pages that are yet to be transferred cause -a fault that's translated by QEMU into a request to the source QEMU. - -Postcopy can be combined with precopy (i.e. normal migration) so that if precopy -doesn't finish in a given time the switch is made to postcopy. - -Enabling postcopy ------------------ - -To enable postcopy, issue this command on the monitor (both source and -destination) prior to the start of migration: - -``migrate_set_capability postcopy-ram on`` - -The normal commands are then used to start a migration, which is still -started in precopy mode. Issuing: - -``migrate_start_postcopy`` - -will now cause the transition from precopy to postcopy. -It can be issued immediately after migration is started or any -time later on. Issuing it after the end of a migration is harmless. - -Blocktime is a postcopy live migration metric, intended to show how -long the vCPU was in state of interruptible sleep due to pagefault. -That metric is calculated both for all vCPUs as overlapped value, and -separately for each vCPU. 
These values are calculated on destination -side. To enable postcopy blocktime calculation, enter following -command on destination monitor: - -``migrate_set_capability postcopy-blocktime on`` - -Postcopy blocktime can be retrieved by query-migrate qmp command. -postcopy-blocktime value of qmp command will show overlapped blocking -time for all vCPU, postcopy-vcpu-blocktime will show list of blocking -time per vCPU. - -.. note:: - During the postcopy phase, the bandwidth limits set using - ``migrate_set_parameter`` is ignored (to avoid delaying requested pages that - the destination is waiting for). - -Postcopy device transfer ------------------------- - -Loading of device data may cause the device emulation to access guest RAM -that may trigger faults that have to be resolved by the source, as such -the migration stream has to be able to respond with page data *during* the -device load, and hence the device data has to be read from the stream completely -before the device load begins to free the stream up. This is achieved by -'packaging' the device data into a blob that's read in one go. - -Source behaviour ----------------- - -Until postcopy is entered the migration stream is identical to normal -precopy, except for the addition of a 'postcopy advise' command at -the beginning, to tell the destination that postcopy might happen. -When postcopy starts the source sends the page discard data and then -forms the 'package' containing: - - - Command: 'postcopy listen' - - The device state - - A series of sections, identical to the precopy streams device state stream - containing everything except postcopiable devices (i.e. RAM) - - Command: 'postcopy run' - -The 'package' is sent as the data part of a Command: ``CMD_PACKAGED``, and the -contents are formatted in the same way as the main migration stream. 
- -During postcopy the source scans the list of dirty pages and sends them -to the destination without being requested (in much the same way as precopy), -however when a page request is received from the destination, the dirty page -scanning restarts from the requested location. This causes requested pages -to be sent quickly, and also causes pages directly after the requested page -to be sent quickly in the hope that those pages are likely to be used -by the destination soon. - -Destination behaviour ---------------------- - -Initially the destination looks the same as precopy, with a single thread -reading the migration stream; the 'postcopy advise' and 'discard' commands -are processed to change the way RAM is managed, but don't affect the stream -processing. - -:: - - ------------------------------------------------------------------------------ - 1 2 3 4 5 6 7 - main -----DISCARD-CMD_PACKAGED ( LISTEN DEVICE DEVICE DEVICE RUN ) - thread | | - | (page request) - | \___ - v \ - listen thread: --- page -- page -- page -- page -- page -- - - a b c - ------------------------------------------------------------------------------ - -- On receipt of ``CMD_PACKAGED`` (1) - - All the data associated with the package - the ( ... ) section in the diagram - - is read into memory, and the main thread recurses into qemu_loadvm_state_main - to process the contents of the package (2) which contains commands (3,6) and - devices (4...) - -- On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package) - - a new thread (a) is started that takes over servicing the migration stream, - while the main thread carries on loading the package. It loads normal - background page data (b) but if during a device load a fault happens (5) - the returned page (c) is loaded by the listen thread allowing the main - threads device load to carry on. - -- The last thing in the ``CMD_PACKAGED`` is a 'RUN' command (6) - - letting the destination CPUs start running. 
At the end of the - ``CMD_PACKAGED`` (7) the main thread returns to normal running behaviour and - is no longer used by migration, while the listen thread carries on servicing - page data until the end of migration. - -Postcopy Recovery ------------------ - -Comparing to precopy, postcopy is special on error handlings. When any -error happens (in this case, mostly network errors), QEMU cannot easily -fail a migration because VM data resides in both source and destination -QEMU instances. On the other hand, when issue happens QEMU on both sides -will go into a paused state. It'll need a recovery phase to continue a -paused postcopy migration. - -The recovery phase normally contains a few steps: - - - When network issue occurs, both QEMU will go into PAUSED state - - - When the network is recovered (or a new network is provided), the admin - can setup the new channel for migration using QMP command - 'migrate-recover' on destination node, preparing for a resume. - - - On source host, the admin can continue the interrupted postcopy - migration using QMP command 'migrate' with resume=true flag set. - - - After the connection is re-established, QEMU will continue the postcopy - migration on both sides. - -During a paused postcopy migration, the VM can logically still continue -running, and it will not be impacted from any page access to pages that -were already migrated to destination VM before the interruption happens. -However, if any of the missing pages got accessed on destination VM, the VM -thread will be halted waiting for the page to be migrated, it means it can -be halted until the recovery is complete. - -The impact of accessing missing pages can be relevant to different -configurations of the guest. For example, when with async page fault -enabled, logically the guest can proactively schedule out the threads -accessing missing pages. 
- -Postcopy states ---------------- - -Postcopy moves through a series of states (see postcopy_state) from -ADVISE->DISCARD->LISTEN->RUNNING->END - - - Advise - - Set at the start of migration if postcopy is enabled, even - if it hasn't had the start command; here the destination - checks that its OS has the support needed for postcopy, and performs - setup to ensure the RAM mappings are suitable for later postcopy. - The destination will fail early in migration at this point if the - required OS support is not present. - (Triggered by reception of POSTCOPY_ADVISE command) - - - Discard - - Entered on receipt of the first 'discard' command; prior to - the first Discard being performed, hugepages are switched off - (using madvise) to ensure that no new huge pages are created - during the postcopy phase, and to cause any huge pages that - have discards on them to be broken. - - - Listen - - The first command in the package, POSTCOPY_LISTEN, switches - the destination state to Listen, and starts a new thread - (the 'listen thread') which takes over the job of receiving - pages off the migration stream, while the main thread carries - on processing the blob. With this thread able to process page - reception, the destination now 'sensitises' the RAM to detect - any access to missing pages (on Linux using the 'userfault' - system). - - - Running - - POSTCOPY_RUN causes the destination to synchronise all - state and start the CPUs and IO devices running. The main - thread now finishes processing the migration package and - now carries on as it would for normal precopy migration - (although it can't do the cleanup it would do as it - finishes a normal migration). - - - Paused - - Postcopy can run into a paused state (normally on both sides when - happens), where all threads will be temporarily halted mostly due to - network errors. When reaching paused state, migration will make sure - the qemu binary on both sides maintain the data without corrupting - the VM. 
To continue the migration, the admin needs to fix the - migration channel using the QMP command 'migrate-recover' on the - destination node, then resume the migration using QMP command 'migrate' - again on source node, with resume=true flag set. - - - End - - The listen thread can now quit, and perform the cleanup of migration - state, the migration is now complete. - -Source side page map --------------------- - -The 'migration bitmap' in postcopy is basically the same as in the precopy, -where each of the bit to indicate that page is 'dirty' - i.e. needs -sending. During the precopy phase this is updated as the CPU dirties -pages, however during postcopy the CPUs are stopped and nothing should -dirty anything any more. Instead, dirty bits are cleared when the relevant -pages are sent during postcopy. - -Postcopy with hugepages ------------------------ - -Postcopy now works with hugetlbfs backed memory: - - a) The linux kernel on the destination must support userfault on hugepages. - b) The huge-page configuration on the source and destination VMs must be - identical; i.e. RAMBlocks on both sides must use the same page size. - c) Note that ``-mem-path /dev/hugepages`` will fall back to allocating normal - RAM if it doesn't have enough hugepages, triggering (b) to fail. - Using ``-mem-prealloc`` enforces the allocation using hugepages. - d) Care should be taken with the size of hugepage used; postcopy with 2MB - hugepages works well, however 1GB hugepages are likely to be problematic - since it takes ~1 second to transfer a 1GB hugepage across a 10Gbps link, - and until the full page is transferred the destination thread is blocked. - -Postcopy with shared memory ---------------------------- - -Postcopy migration with shared memory needs explicit support from the other -processes that share memory and from QEMU. There are restrictions on the type of -memory that userfault can support shared. 
- -The Linux kernel userfault support works on ``/dev/shm`` memory and on ``hugetlbfs`` -(although the kernel doesn't provide an equivalent to ``madvise(MADV_DONTNEED)`` -for hugetlbfs which may be a problem in some configurations). - -The vhost-user code in QEMU supports clients that have Postcopy support, -and the ``vhost-user-bridge`` (in ``tests/``) and the DPDK package have changes -to support postcopy. - -The client needs to open a userfaultfd and register the areas -of memory that it maps with userfault. The client must then pass the -userfaultfd back to QEMU together with a mapping table that allows -fault addresses in the clients address space to be converted back to -RAMBlock/offsets. The client's userfaultfd is added to the postcopy -fault-thread and page requests are made on behalf of the client by QEMU. -QEMU performs 'wake' operations on the client's userfaultfd to allow it -to continue after a page has arrived. - -.. note:: - There are two future improvements that would be nice: - a) Some way to make QEMU ignorant of the addresses in the clients - address space - b) Avoiding the need for QEMU to perform ufd-wake calls after the - pages have arrived - -Retro-fitting postcopy to existing clients is possible: - a) A mechanism is needed for the registration with userfault as above, - and the registration needs to be coordinated with the phases of - postcopy. In vhost-user extra messages are added to the existing - control channel. - b) Any thread that can block due to guest memory accesses must be - identified and the implication understood; for example if the - guest memory access is made while holding a lock then all other - threads waiting for that lock will also be blocked. 
- -Postcopy Preemption Mode ------------------------- - -Postcopy preempt is a new capability introduced in 8.0 QEMU release, it -allows urgent pages (those got page fault requested from destination QEMU -explicitly) to be sent in a separate preempt channel, rather than queued in -the background migration channel. Anyone who cares about latencies of page -faults during a postcopy migration should enable this feature. By default, -it's not enabled. - diff --git a/docs/devel/migration/postcopy.rst b/docs/devel/migration/postcopy.rst new file mode 100644 index 0000000000..d60eec06ab --- /dev/null +++ b/docs/devel/migration/postcopy.rst @@ -0,0 +1,304 @@ +Postcopy +======== + +'Postcopy' migration is a way to deal with migrations that refuse to converge +(or take too long to converge) its plus side is that there is an upper bound on +the amount of migration traffic and time it takes, the down side is that during +the postcopy phase, a failure of *either* side causes the guest to be lost. + +In postcopy the destination CPUs are started before all the memory has been +transferred, and accesses to pages that are yet to be transferred cause +a fault that's translated by QEMU into a request to the source QEMU. + +Postcopy can be combined with precopy (i.e. normal migration) so that if precopy +doesn't finish in a given time the switch is made to postcopy. + +Enabling postcopy +----------------- + +To enable postcopy, issue this command on the monitor (both source and +destination) prior to the start of migration: + +``migrate_set_capability postcopy-ram on`` + +The normal commands are then used to start a migration, which is still +started in precopy mode. Issuing: + +``migrate_start_postcopy`` + +will now cause the transition from precopy to postcopy. +It can be issued immediately after migration is started or any +time later on. Issuing it after the end of a migration is harmless. 
+ +Blocktime is a postcopy live migration metric, intended to show how +long the vCPU was in state of interruptible sleep due to pagefault. +That metric is calculated both for all vCPUs as overlapped value, and +separately for each vCPU. These values are calculated on destination +side. To enable postcopy blocktime calculation, enter following +command on destination monitor: + +``migrate_set_capability postcopy-blocktime on`` + +Postcopy blocktime can be retrieved by query-migrate qmp command. +postcopy-blocktime value of qmp command will show overlapped blocking +time for all vCPU, postcopy-vcpu-blocktime will show list of blocking +time per vCPU. + +.. note:: + During the postcopy phase, the bandwidth limits set using + ``migrate_set_parameter`` is ignored (to avoid delaying requested pages that + the destination is waiting for). + +Postcopy device transfer +------------------------ + +Loading of device data may cause the device emulation to access guest RAM +that may trigger faults that have to be resolved by the source, as such +the migration stream has to be able to respond with page data *during* the +device load, and hence the device data has to be read from the stream completely +before the device load begins to free the stream up. This is achieved by +'packaging' the device data into a blob that's read in one go. + +Source behaviour +---------------- + +Until postcopy is entered the migration stream is identical to normal +precopy, except for the addition of a 'postcopy advise' command at +the beginning, to tell the destination that postcopy might happen. +When postcopy starts the source sends the page discard data and then +forms the 'package' containing: + + - Command: 'postcopy listen' + - The device state + + A series of sections, identical to the precopy streams device state stream + containing everything except postcopiable devices (i.e. 
RAM) + - Command: 'postcopy run' + +The 'package' is sent as the data part of a Command: ``CMD_PACKAGED``, and the +contents are formatted in the same way as the main migration stream. + +During postcopy the source scans the list of dirty pages and sends them +to the destination without being requested (in much the same way as precopy), +however when a page request is received from the destination, the dirty page +scanning restarts from the requested location. This causes requested pages +to be sent quickly, and also causes pages directly after the requested page +to be sent quickly in the hope that those pages are likely to be used +by the destination soon. + +Destination behaviour +--------------------- + +Initially the destination looks the same as precopy, with a single thread +reading the migration stream; the 'postcopy advise' and 'discard' commands +are processed to change the way RAM is managed, but don't affect the stream +processing. + +:: + + ------------------------------------------------------------------------------ + 1 2 3 4 5 6 7 + main -----DISCARD-CMD_PACKAGED ( LISTEN DEVICE DEVICE DEVICE RUN ) + thread | | + | (page request) + | \___ + v \ + listen thread: --- page -- page -- page -- page -- page -- + + a b c + ------------------------------------------------------------------------------ + +- On receipt of ``CMD_PACKAGED`` (1) + + All the data associated with the package - the ( ... ) section in the diagram - + is read into memory, and the main thread recurses into qemu_loadvm_state_main + to process the contents of the package (2) which contains commands (3,6) and + devices (4...) + +- On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package) + + a new thread (a) is started that takes over servicing the migration stream, + while the main thread carries on loading the package. 
It loads normal + background page data (b) but if during a device load a fault happens (5) + the returned page (c) is loaded by the listen thread allowing the main + threads device load to carry on. + +- The last thing in the ``CMD_PACKAGED`` is a 'RUN' command (6) + + letting the destination CPUs start running. At the end of the + ``CMD_PACKAGED`` (7) the main thread returns to normal running behaviour and + is no longer used by migration, while the listen thread carries on servicing + page data until the end of migration. + +Postcopy Recovery +----------------- + +Comparing to precopy, postcopy is special on error handlings. When any +error happens (in this case, mostly network errors), QEMU cannot easily +fail a migration because VM data resides in both source and destination +QEMU instances. On the other hand, when issue happens QEMU on both sides +will go into a paused state. It'll need a recovery phase to continue a +paused postcopy migration. + +The recovery phase normally contains a few steps: + + - When network issue occurs, both QEMU will go into PAUSED state + + - When the network is recovered (or a new network is provided), the admin + can setup the new channel for migration using QMP command + 'migrate-recover' on destination node, preparing for a resume. + + - On source host, the admin can continue the interrupted postcopy + migration using QMP command 'migrate' with resume=true flag set. + + - After the connection is re-established, QEMU will continue the postcopy + migration on both sides. + +During a paused postcopy migration, the VM can logically still continue +running, and it will not be impacted from any page access to pages that +were already migrated to destination VM before the interruption happens. +However, if any of the missing pages got accessed on destination VM, the VM +thread will be halted waiting for the page to be migrated, it means it can +be halted until the recovery is complete. 
+ +The impact of accessing missing pages can be relevant to different +configurations of the guest. For example, when with async page fault +enabled, logically the guest can proactively schedule out the threads +accessing missing pages. + +Postcopy states +--------------- + +Postcopy moves through a series of states (see postcopy_state) from +ADVISE->DISCARD->LISTEN->RUNNING->END + + - Advise + + Set at the start of migration if postcopy is enabled, even + if it hasn't had the start command; here the destination + checks that its OS has the support needed for postcopy, and performs + setup to ensure the RAM mappings are suitable for later postcopy. + The destination will fail early in migration at this point if the + required OS support is not present. + (Triggered by reception of POSTCOPY_ADVISE command) + + - Discard + + Entered on receipt of the first 'discard' command; prior to + the first Discard being performed, hugepages are switched off + (using madvise) to ensure that no new huge pages are created + during the postcopy phase, and to cause any huge pages that + have discards on them to be broken. + + - Listen + + The first command in the package, POSTCOPY_LISTEN, switches + the destination state to Listen, and starts a new thread + (the 'listen thread') which takes over the job of receiving + pages off the migration stream, while the main thread carries + on processing the blob. With this thread able to process page + reception, the destination now 'sensitises' the RAM to detect + any access to missing pages (on Linux using the 'userfault' + system). + + - Running + + POSTCOPY_RUN causes the destination to synchronise all + state and start the CPUs and IO devices running. The main + thread now finishes processing the migration package and + now carries on as it would for normal precopy migration + (although it can't do the cleanup it would do as it + finishes a normal migration). 
+ + - Paused + + Postcopy can run into a paused state (normally on both sides when + happens), where all threads will be temporarily halted mostly due to + network errors. When reaching paused state, migration will make sure + the qemu binary on both sides maintain the data without corrupting + the VM. To continue the migration, the admin needs to fix the + migration channel using the QMP command 'migrate-recover' on the + destination node, then resume the migration using QMP command 'migrate' + again on source node, with resume=true flag set. + + - End + + The listen thread can now quit, and perform the cleanup of migration + state, the migration is now complete. + +Source side page map +-------------------- + +The 'migration bitmap' in postcopy is basically the same as in the precopy, +where each of the bit to indicate that page is 'dirty' - i.e. needs +sending. During the precopy phase this is updated as the CPU dirties +pages, however during postcopy the CPUs are stopped and nothing should +dirty anything any more. Instead, dirty bits are cleared when the relevant +pages are sent during postcopy. + +Postcopy with hugepages +----------------------- + +Postcopy now works with hugetlbfs backed memory: + + a) The linux kernel on the destination must support userfault on hugepages. + b) The huge-page configuration on the source and destination VMs must be + identical; i.e. RAMBlocks on both sides must use the same page size. + c) Note that ``-mem-path /dev/hugepages`` will fall back to allocating normal + RAM if it doesn't have enough hugepages, triggering (b) to fail. + Using ``-mem-prealloc`` enforces the allocation using hugepages. + d) Care should be taken with the size of hugepage used; postcopy with 2MB + hugepages works well, however 1GB hugepages are likely to be problematic + since it takes ~1 second to transfer a 1GB hugepage across a 10Gbps link, + and until the full page is transferred the destination thread is blocked. 
+ +Postcopy with shared memory +--------------------------- + +Postcopy migration with shared memory needs explicit support from the other +processes that share memory and from QEMU. There are restrictions on the type of +memory that userfault can support shared. + +The Linux kernel userfault support works on ``/dev/shm`` memory and on ``hugetlbfs`` +(although the kernel doesn't provide an equivalent to ``madvise(MADV_DONTNEED)`` +for hugetlbfs which may be a problem in some configurations). + +The vhost-user code in QEMU supports clients that have Postcopy support, +and the ``vhost-user-bridge`` (in ``tests/``) and the DPDK package have changes +to support postcopy. + +The client needs to open a userfaultfd and register the areas +of memory that it maps with userfault. The client must then pass the +userfaultfd back to QEMU together with a mapping table that allows +fault addresses in the clients address space to be converted back to +RAMBlock/offsets. The client's userfaultfd is added to the postcopy +fault-thread and page requests are made on behalf of the client by QEMU. +QEMU performs 'wake' operations on the client's userfaultfd to allow it +to continue after a page has arrived. + +.. note:: + There are two future improvements that would be nice: + a) Some way to make QEMU ignorant of the addresses in the clients + address space + b) Avoiding the need for QEMU to perform ufd-wake calls after the + pages have arrived + +Retro-fitting postcopy to existing clients is possible: + a) A mechanism is needed for the registration with userfault as above, + and the registration needs to be coordinated with the phases of + postcopy. In vhost-user extra messages are added to the existing + control channel. + b) Any thread that can block due to guest memory accesses must be + identified and the implication understood; for example if the + guest memory access is made while holding a lock then all other + threads waiting for that lock will also be blocked. 
+ +Postcopy Preemption Mode +------------------------ + +Postcopy preempt is a new capability introduced in 8.0 QEMU release, it +allows urgent pages (those got page fault requested from destination QEMU +explicitly) to be sent in a separate preempt channel, rather than queued in +the background migration channel. Anyone who cares about latencies of page +faults during a postcopy migration should enable this feature. By default, +it's not enabled. From patchwork Tue Jan 16 03:19:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13520351 From: peterx@redhat.com To: qemu-devel@nongnu.org, Peter Maydell Cc: peterx@redhat.com, Fabiano Rosas, Yong Huang, Cédric Le Goater Subject: [PULL 16/20] docs/migration: Split "dirty limit" Date: Tue, 16 Jan 2024 11:19:43 +0800 Message-ID: <20240116031947.69017-17-peterx@redhat.com> In-Reply-To: <20240116031947.69017-1-peterx@redhat.com> References: <20240116031947.69017-1-peterx@redhat.com> From: Peter Xu Split that into a separate file, put under "features". Cc: Yong Huang Reviewed-by: Cédric Le Goater Link: https://lore.kernel.org/r/20240109064628.595453-8-peterx@redhat.com Signed-off-by: Peter Xu --- docs/devel/migration/dirty-limit.rst | 71 ++++++++++++++++++++++++++++ docs/devel/migration/features.rst | 1 + docs/devel/migration/main.rst | 71 ---------------------------- 3 files changed, 72 insertions(+), 71 deletions(-) create mode 100644 docs/devel/migration/dirty-limit.rst diff --git a/docs/devel/migration/dirty-limit.rst b/docs/devel/migration/dirty-limit.rst new file mode 100644 index 0000000000..8f32329d5f --- /dev/null +++ b/docs/devel/migration/dirty-limit.rst @@ -0,0 +1,71 @@ +Dirty limit +=========== + +The dirty limit, short for dirty page rate upper limit, is a new capability +introduced in the 8.1 QEMU release that uses a new algorithm based on the KVM +dirty ring to throttle down the guest during live migration. + +The algorithm framework is as follows: + +:: + + ------------------------------------------------------------------------------ + main --------------> throttle thread ------------> PREPARE(1) <-------- + thread \ | | + \ | | + \ V | + -\ CALCULATE(2) | + \ | | + \ | | + \ V | + \ SET PENALTY(3) ----- + -\ | + \ | + \ V + -> virtual CPU thread -------> ACCEPT PENALTY(4) + ------------------------------------------------------------------------------ + +When the qmp command qmp_set_vcpu_dirty_limit is called for the first time, +the QEMU main thread starts the throttle thread.
The throttle thread, once +launched, executes the loop, which consists of three steps: + + - PREPARE (1) + + The entire work of PREPARE (1) is preparation for the second stage, + CALCULATE(2), as the name implies. It involves preparing the dirty + page rate value and the corresponding upper limit of the VM: + The dirty page rate is calculated via the KVM dirty ring mechanism, + which tells QEMU how many dirty pages a virtual CPU has had since the + last KVM_EXIT_DIRTY_RING_FULL exception; The dirty page rate upper + limit is specified by caller, therefore fetch it directly. + + - CALCULATE (2) + + Calculate a suitable sleep period for each virtual CPU, which will be + used to determine the penalty for the target virtual CPU. The + computation must be done carefully in order to reduce the dirty page + rate progressively down to the upper limit without oscillation. To + achieve this, two strategies are provided: the first is to add or + subtract sleep time based on the ratio of the current dirty page rate + to the limit, which is used when the current dirty page rate is far + from the limit; the second is to add or subtract a fixed time when + the current dirty page rate is close to the limit. + + - SET PENALTY (3) + + Set the sleep time for each virtual CPU that should be penalized based + on the results of the calculation supplied by step CALCULATE (2). + +After completing the three above stages, the throttle thread loops back +to step PREPARE (1) until the dirty limit is reached. + +On the other hand, each virtual CPU thread reads the sleep duration and +sleeps in the path of the KVM_EXIT_DIRTY_RING_FULL exception handler, that +is ACCEPT PENALTY (4). Virtual CPUs tied with writing processes will +obviously exit to the path and get penalized, whereas virtual CPUs involved +with read processes will not. 
+ +In summary, thanks to the KVM dirty ring technology, the dirty limit +algorithm will restrict virtual CPUs as needed to keep their dirty page +rate inside the limit. This leads to more steady reading performance during +live migration and can aid in improving large guest responsiveness. diff --git a/docs/devel/migration/features.rst b/docs/devel/migration/features.rst index 0054e0c900..e257d0d100 100644 --- a/docs/devel/migration/features.rst +++ b/docs/devel/migration/features.rst @@ -7,3 +7,4 @@ Migration has plenty of features to support different use cases. :maxdepth: 2 postcopy + dirty-limit diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst index 051ea43f0e..00b9c3d32f 100644 --- a/docs/devel/migration/main.rst +++ b/docs/devel/migration/main.rst @@ -573,74 +573,3 @@ path. Return path - opened by main thread, written by main thread AND postcopy thread (protected by rp_mutex) -Dirty limit -===================== -The dirty limit, short for dirty page rate upper limit, is a new capability -introduced in the 8.1 QEMU release that uses a new algorithm based on the KVM -dirty ring to throttle down the guest during live migration. - -The algorithm framework is as follows: - -:: - - ------------------------------------------------------------------------------ - main --------------> throttle thread ------------> PREPARE(1) <-------- - thread \ | | - \ | | - \ V | - -\ CALCULATE(2) | - \ | | - \ | | - \ V | - \ SET PENALTY(3) ----- - -\ | - \ | - \ V - -> virtual CPU thread -------> ACCEPT PENALTY(4) - ------------------------------------------------------------------------------ - -When the qmp command qmp_set_vcpu_dirty_limit is called for the first time, -the QEMU main thread starts the throttle thread. The throttle thread, once -launched, executes the loop, which consists of three steps: - - - PREPARE (1) - - The entire work of PREPARE (1) is preparation for the second stage, - CALCULATE(2), as the name implies. 
It involves preparing the dirty - page rate value and the corresponding upper limit of the VM: - The dirty page rate is calculated via the KVM dirty ring mechanism, - which tells QEMU how many dirty pages a virtual CPU has had since the - last KVM_EXIT_DIRTY_RING_FULL exception; The dirty page rate upper - limit is specified by caller, therefore fetch it directly. - - - CALCULATE (2) - - Calculate a suitable sleep period for each virtual CPU, which will be - used to determine the penalty for the target virtual CPU. The - computation must be done carefully in order to reduce the dirty page - rate progressively down to the upper limit without oscillation. To - achieve this, two strategies are provided: the first is to add or - subtract sleep time based on the ratio of the current dirty page rate - to the limit, which is used when the current dirty page rate is far - from the limit; the second is to add or subtract a fixed time when - the current dirty page rate is close to the limit. - - - SET PENALTY (3) - - Set the sleep time for each virtual CPU that should be penalized based - on the results of the calculation supplied by step CALCULATE (2). - -After completing the three above stages, the throttle thread loops back -to step PREPARE (1) until the dirty limit is reached. - -On the other hand, each virtual CPU thread reads the sleep duration and -sleeps in the path of the KVM_EXIT_DIRTY_RING_FULL exception handler, that -is ACCEPT PENALTY (4). Virtual CPUs tied with writing processes will -obviously exit to the path and get penalized, whereas virtual CPUs involved -with read processes will not. - -In summary, thanks to the KVM dirty ring technology, the dirty limit -algorithm will restrict virtual CPUs as needed to keep their dirty page -rate inside the limit. This leads to more steady reading performance during -live migration and can aid in improving large guest responsiveness. 
- From patchwork Tue Jan 16 03:19:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13520355 From: peterx@redhat.com To: qemu-devel@nongnu.org, Peter Maydell Cc: peterx@redhat.com, Fabiano Rosas, Cédric Le Goater Subject: [PULL 17/20] docs/migration: Organize "Postcopy" page Date: Tue, 16 Jan 2024 11:19:44 +0800 Message-ID: <20240116031947.69017-18-peterx@redhat.com> In-Reply-To: <20240116031947.69017-1-peterx@redhat.com> References: <20240116031947.69017-1-peterx@redhat.com> From: Peter Xu Reorganize the page, moving things around, and
add a few headlines ("Postcopy internals", "Postcopy features") to cover sub-areas. Reviewed-by: Cédric Le Goater Link: https://lore.kernel.org/r/20240109064628.595453-9-peterx@redhat.com Signed-off-by: Peter Xu --- docs/devel/migration/postcopy.rst | 159 ++++++++++++++++-------------- 1 file changed, 84 insertions(+), 75 deletions(-) diff --git a/docs/devel/migration/postcopy.rst b/docs/devel/migration/postcopy.rst index d60eec06ab..6c51e96d79 100644 --- a/docs/devel/migration/postcopy.rst +++ b/docs/devel/migration/postcopy.rst @@ -1,6 +1,9 @@ +======== Postcopy ======== +.. contents:: + 'Postcopy' migration is a way to deal with migrations that refuse to converge (or take too long to converge) its plus side is that there is an upper bound on the amount of migration traffic and time it takes, the down side is that during @@ -14,7 +17,7 @@ Postcopy can be combined with precopy (i.e. normal migration) so that if precopy doesn't finish in a given time the switch is made to postcopy. Enabling postcopy ------------------ +================= To enable postcopy, issue this command on the monitor (both source and destination) prior to the start of migration: @@ -49,8 +52,71 @@ time per vCPU. ``migrate_set_parameter`` is ignored (to avoid delaying requested pages that the destination is waiting for). -Postcopy device transfer ------------------------- +Postcopy internals +================== + +State machine +------------- + +Postcopy moves through a series of states (see postcopy_state) from +ADVISE->DISCARD->LISTEN->RUNNING->END + + - Advise + + Set at the start of migration if postcopy is enabled, even + if it hasn't had the start command; here the destination + checks that its OS has the support needed for postcopy, and performs + setup to ensure the RAM mappings are suitable for later postcopy. + The destination will fail early in migration at this point if the + required OS support is not present. 
+ (Triggered by reception of POSTCOPY_ADVISE command) + + - Discard + + Entered on receipt of the first 'discard' command; prior to + the first Discard being performed, hugepages are switched off + (using madvise) to ensure that no new huge pages are created + during the postcopy phase, and to cause any huge pages that + have discards on them to be broken. + + - Listen + + The first command in the package, POSTCOPY_LISTEN, switches + the destination state to Listen, and starts a new thread + (the 'listen thread') which takes over the job of receiving + pages off the migration stream, while the main thread carries + on processing the blob. With this thread able to process page + reception, the destination now 'sensitises' the RAM to detect + any access to missing pages (on Linux using the 'userfault' + system). + + - Running + + POSTCOPY_RUN causes the destination to synchronise all + state and start the CPUs and IO devices running. The main + thread now finishes processing the migration package and + now carries on as it would for normal precopy migration + (although it can't do the cleanup it would do as it + finishes a normal migration). + + - Paused + + Postcopy can run into a paused state (normally on both sides when + happens), where all threads will be temporarily halted mostly due to + network errors. When reaching paused state, migration will make sure + the qemu binary on both sides maintain the data without corrupting + the VM. To continue the migration, the admin needs to fix the + migration channel using the QMP command 'migrate-recover' on the + destination node, then resume the migration using QMP command 'migrate' + again on source node, with resume=true flag set. + + - End + + The listen thread can now quit, and perform the cleanup of migration + state, the migration is now complete. 
+ +Device transfer +--------------- Loading of device data may cause the device emulation to access guest RAM that may trigger faults that have to be resolved by the source, as such @@ -130,7 +196,20 @@ processing. is no longer used by migration, while the listen thread carries on servicing page data until the end of migration. -Postcopy Recovery +Source side page bitmap +----------------------- + +The 'migration bitmap' in postcopy is basically the same as in the precopy, +where each of the bit to indicate that page is 'dirty' - i.e. needs +sending. During the precopy phase this is updated as the CPU dirties +pages, however during postcopy the CPUs are stopped and nothing should +dirty anything any more. Instead, dirty bits are cleared when the relevant +pages are sent during postcopy. + +Postcopy features +================= + +Postcopy recovery ----------------- Comparing to precopy, postcopy is special on error handlings. When any @@ -166,76 +245,6 @@ configurations of the guest. For example, when with async page fault enabled, logically the guest can proactively schedule out the threads accessing missing pages. -Postcopy states ---------------- - -Postcopy moves through a series of states (see postcopy_state) from -ADVISE->DISCARD->LISTEN->RUNNING->END - - - Advise - - Set at the start of migration if postcopy is enabled, even - if it hasn't had the start command; here the destination - checks that its OS has the support needed for postcopy, and performs - setup to ensure the RAM mappings are suitable for later postcopy. - The destination will fail early in migration at this point if the - required OS support is not present. 
- (Triggered by reception of POSTCOPY_ADVISE command) - - - Discard - - Entered on receipt of the first 'discard' command; prior to - the first Discard being performed, hugepages are switched off - (using madvise) to ensure that no new huge pages are created - during the postcopy phase, and to cause any huge pages that - have discards on them to be broken. - - - Listen - - The first command in the package, POSTCOPY_LISTEN, switches - the destination state to Listen, and starts a new thread - (the 'listen thread') which takes over the job of receiving - pages off the migration stream, while the main thread carries - on processing the blob. With this thread able to process page - reception, the destination now 'sensitises' the RAM to detect - any access to missing pages (on Linux using the 'userfault' - system). - - - Running - - POSTCOPY_RUN causes the destination to synchronise all - state and start the CPUs and IO devices running. The main - thread now finishes processing the migration package and - now carries on as it would for normal precopy migration - (although it can't do the cleanup it would do as it - finishes a normal migration). - - - Paused - - Postcopy can run into a paused state (normally on both sides when - happens), where all threads will be temporarily halted mostly due to - network errors. When reaching paused state, migration will make sure - the qemu binary on both sides maintain the data without corrupting - the VM. To continue the migration, the admin needs to fix the - migration channel using the QMP command 'migrate-recover' on the - destination node, then resume the migration using QMP command 'migrate' - again on source node, with resume=true flag set. - - - End - - The listen thread can now quit, and perform the cleanup of migration - state, the migration is now complete. 
- -Source side page map --------------------- - -The 'migration bitmap' in postcopy is basically the same as in the precopy, -where each of the bit to indicate that page is 'dirty' - i.e. needs -sending. During the precopy phase this is updated as the CPU dirties -pages, however during postcopy the CPUs are stopped and nothing should -dirty anything any more. Instead, dirty bits are cleared when the relevant -pages are sent during postcopy. - Postcopy with hugepages ----------------------- @@ -293,7 +302,7 @@ Retro-fitting postcopy to existing clients is possible: guest memory access is made while holding a lock then all other threads waiting for that lock will also be blocked. -Postcopy Preemption Mode +Postcopy preemption mode ------------------------ Postcopy preempt is a new capability introduced in 8.0 QEMU release, it From patchwork Tue Jan 16 03:19:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13520354 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E6DCDC4706C for ; Tue, 16 Jan 2024 03:22:08 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rPa0s-0006eu-Iu; Mon, 15 Jan 2024 22:21:20 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPa0d-00068D-ML for qemu-devel@nongnu.org; Mon, 15 Jan 2024 22:21:06 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPa0Z-000373-4q 
for qemu-devel@nongnu.org; Mon, 15 Jan 2024 22:21:03 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1705375258; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PezdJDREJ1dW9K2++rC3TKhv48O3th3VaLjq7R8Jl+A=; b=ESY2EoL51Ttmq4WQovNiasTc1HU5kXwrf7AQfNyD6osgoul3zoczg2D0wPrIrFKQzjy+LH tlhBOnNti4RrhixaENcdC1OisWNyjApJG8aB0ujSViQGUf263xBuwA/3EzAgPVOQylua0/ G4eilR9OonnH1zqCrjLjH2/0XvY8AHI= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-493-loJwZuK3PxCYFs_pbent-g-1; Mon, 15 Jan 2024 22:20:55 -0500 X-MC-Unique: loJwZuK3PxCYFs_pbent-g-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 689C6185A782; Tue, 16 Jan 2024 03:20:55 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 505E73C25; Tue, 16 Jan 2024 03:20:51 +0000 (UTC) From: peterx@redhat.com To: qemu-devel@nongnu.org, Peter Maydell Cc: peterx@redhat.com, Fabiano Rosas , Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_Goat?= =?utf-8?q?er?= Subject: [PULL 18/20] docs/migration: Further move vfio to be feature of migration Date: Tue, 16 Jan 2024 11:19:45 +0800 Message-ID: <20240116031947.69017-19-peterx@redhat.com> In-Reply-To: <20240116031947.69017-1-peterx@redhat.com> References: <20240116031947.69017-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.1 Received-SPF: pass 
client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -35 X-Spam_score: -3.6 X-Spam_bar: --- X-Spam_report: (-3.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.531, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Move it one layer down, so that VFIO migration is treated as a feature of migration. Cc: Alex Williamson Cc: Cédric Le Goater Reviewed-by: Cédric Le Goater Link: https://lore.kernel.org/r/20240109064628.595453-10-peterx@redhat.com Signed-off-by: Peter Xu --- docs/devel/migration/features.rst | 1 + docs/devel/migration/index.rst | 1 - 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/devel/migration/features.rst b/docs/devel/migration/features.rst index e257d0d100..dea016f707 100644 --- a/docs/devel/migration/features.rst +++ b/docs/devel/migration/features.rst @@ -8,3 +8,4 @@ Migration has plenty of features to support different use cases. postcopy dirty-limit + vfio diff --git a/docs/devel/migration/index.rst b/docs/devel/migration/index.rst index 21ad58b189..b1357309e1 100644 --- a/docs/devel/migration/index.rst +++ b/docs/devel/migration/index.rst @@ -10,6 +10,5 @@ QEMU live migration works. 
main features compatibility - vfio virtio best-practices From patchwork Tue Jan 16 03:19:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13520356 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B4100C47077 for ; Tue, 16 Jan 2024 03:22:13 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rPa1F-0007Ha-V4; Mon, 15 Jan 2024 22:21:43 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPa0g-00068X-Pz for qemu-devel@nongnu.org; Mon, 15 Jan 2024 22:21:07 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPa0d-00037M-Ex for qemu-devel@nongnu.org; Mon, 15 Jan 2024 22:21:06 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1705375262; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=S9B+GC/64ftXMp1oBDb/4V1+ff0KZ4ftzjfhlK0N8YU=; b=dwchI1fqBuJbzxhnRgAgsVmbK+RjTU6+5dtyE2HMtLYnXHAZJ0IbvOKoBhn6xuj1G55gJo d6yT6P4lKVnpLFD0ptR51vBCw2Wn6RNC+2g9sYLdKI08DKRrJkYlFUZd3UXc0piZwjZeNu aB30HSdYSrFGOtueqUKK7MN1/B9SzY4= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, 
cipher=TLS_AES_256_GCM_SHA384) id us-mta-594-MTxvF23uNcytZZF0yEb2iA-1; Mon, 15 Jan 2024 22:20:59 -0500 X-MC-Unique: MTxvF23uNcytZZF0yEb2iA-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 64ABA1C04B76; Tue, 16 Jan 2024 03:20:59 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 07D1A3C25; Tue, 16 Jan 2024 03:20:55 +0000 (UTC) From: peterx@redhat.com To: qemu-devel@nongnu.org, Peter Maydell Cc: peterx@redhat.com, Fabiano Rosas , "Michael S. Tsirkin" , Jason Wang , =?utf-8?q?C=C3=A9dric_Le_Goater?= Subject: [PULL 19/20] docs/migration: Further move virtio to be feature of migration Date: Tue, 16 Jan 2024 11:19:46 +0800 Message-ID: <20240116031947.69017-20-peterx@redhat.com> In-Reply-To: <20240116031947.69017-1-peterx@redhat.com> References: <20240116031947.69017-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.1 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -35 X-Spam_score: -3.6 X-Spam_bar: --- X-Spam_report: (-3.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.531, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: 
qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Move it one layer down, so that virtio migration is treated as a feature of migration. Cc: "Michael S. Tsirkin" Cc: Jason Wang Reviewed-by: Cédric Le Goater Link: https://lore.kernel.org/r/20240109064628.595453-11-peterx@redhat.com Signed-off-by: Peter Xu --- docs/devel/migration/features.rst | 1 + docs/devel/migration/index.rst | 1 - 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/devel/migration/features.rst b/docs/devel/migration/features.rst index dea016f707..a9acaf618e 100644 --- a/docs/devel/migration/features.rst +++ b/docs/devel/migration/features.rst @@ -9,3 +9,4 @@ Migration has plenty of features to support different use cases. postcopy dirty-limit vfio + virtio diff --git a/docs/devel/migration/index.rst b/docs/devel/migration/index.rst index b1357309e1..2aa294d631 100644 --- a/docs/devel/migration/index.rst +++ b/docs/devel/migration/index.rst @@ -10,5 +10,4 @@ QEMU live migration works. main features compatibility - virtio best-practices From patchwork Tue Jan 16 03:19:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13520360 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id EB171C47077 for ; Tue, 16 Jan 2024 03:22:32 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rPa1R-0007gc-81; Mon, 15 Jan 2024 22:21:53 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPa0i-0006HG-Q0 for qemu-devel@nongnu.org; Mon, 
15 Jan 2024 22:21:10 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rPa0g-00037X-NP for qemu-devel@nongnu.org; Mon, 15 Jan 2024 22:21:08 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1705375266; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gicgabJmF07S0wCWqnYSu+RO4ouf2Ed+JzSkPhuW3+g=; b=AwBr2+XgLAlMmzM7CnOgv+1jD6mIfATWUeKiJMQDf08ycrMky+ogdQ2x72dn9irzxJaYh8 7Pf/l7j4BcepNNd85pYMekOWw9vWrDhhRrkTPYcrGIo8nvW77MZ6X0jUu3aogj5WoDhxrZ 7zzKneOO2e459GkKToYB/E+FoPjm+t0= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-590-j2tHyg1VNJCg_xFO9vc-Zg-1; Mon, 15 Jan 2024 22:21:03 -0500 X-MC-Unique: j2tHyg1VNJCg_xFO9vc-Zg-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id F3C663C025C4; Tue, 16 Jan 2024 03:21:02 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 345463C25; Tue, 16 Jan 2024 03:20:59 +0000 (UTC) From: peterx@redhat.com To: qemu-devel@nongnu.org, Peter Maydell Cc: peterx@redhat.com, Fabiano Rosas , Nick Briggs Subject: [PULL 20/20] migration/rdma: define htonll/ntohll only if not predefined Date: Tue, 16 Jan 2024 11:19:47 +0800 Message-ID: <20240116031947.69017-21-peterx@redhat.com> In-Reply-To: <20240116031947.69017-1-peterx@redhat.com> References: 
<20240116031947.69017-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.1 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -35 X-Spam_score: -3.6 X-Spam_bar: --- X-Spam_report: (-3.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.531, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Nick Briggs Solaris has #defines for htonll and ntohll, which cause syntax errors when compiling code that attempts to (re)define these functions. Signed-off-by: Nick Briggs Link: https://lore.kernel.org/r/65a04a7d.497ab3.3e7bef1f@gateway.sonic.net Signed-off-by: Peter Xu --- migration/rdma.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/migration/rdma.c b/migration/rdma.c index 94c0f871f0..a355dcea89 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -238,6 +238,7 @@ static const char *control_desc(unsigned int rdma_control) return strs[rdma_control]; } +#if !defined(htonll) static uint64_t htonll(uint64_t v) { union { uint32_t lv[2]; uint64_t llv; } u; @@ -245,13 +246,16 @@ static uint64_t htonll(uint64_t v) u.lv[1] = htonl(v & 0xFFFFFFFFULL); return u.llv; } +#endif +#if !defined(ntohll) static uint64_t ntohll(uint64_t v) { union { uint32_t lv[2]; uint64_t llv; } u; u.llv = v; return ((uint64_t)ntohl(u.lv[0]) << 32) | (uint64_t) ntohl(u.lv[1]);
 } +#endif static void dest_block_to_network(RDMADestBlock *db) {
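For anyone who wants to try the guard from [PULL 20/20] outside of QEMU, the following is a minimal standalone sketch of the same pattern, assuming a POSIX system that provides htonl()/ntohl() in <arpa/inet.h>. It is not a verbatim copy of migration/rdma.c: the `static inline` qualifier is added here only so the sketch compiles without unused-function warnings, and the `u.lv[0]` assignment in htonll() falls between the two hunks of the patch, so that line is an assumption based on the visible `u.lv[1]` line. Note also that `#if !defined(...)` detects macros only; a platform that declares htonll/ntohll as real functions rather than macros would still see a conflict.

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htonl(), ntohl(); may also define htonll/ntohll as macros */

/*
 * Define the 64-bit helpers only when the platform has not already
 * supplied them as macros (which, per the commit message, Solaris does).
 */
#if !defined(htonll)
static inline uint64_t htonll(uint64_t v)
{
    union { uint32_t lv[2]; uint64_t llv; } u;
    u.lv[0] = htonl((uint32_t)(v >> 32));            /* assumed: high word goes first */
    u.lv[1] = htonl((uint32_t)(v & 0xFFFFFFFFULL));  /* low word second */
    return u.llv;
}
#endif

#if !defined(ntohll)
static inline uint64_t ntohll(uint64_t v)
{
    union { uint32_t lv[2]; uint64_t llv; } u;
    u.llv = v;
    /* Reassemble the host-order value from the two network-order words. */
    return ((uint64_t)ntohl(u.lv[0]) << 32) | (uint64_t)ntohl(u.lv[1]);
}
#endif
```

Whichever branch of each guard is taken, callers use htonll()/ntohll() the same way, which is why the patch does not need to touch any call site.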