From patchwork Wed Jun 12 14:42:25 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13695153
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Jiri Denemark, Prasad Pandit, Fabiano Rosas, Bandan Das, peterx@redhat.com
Subject: [PATCH 1/4] migration/multifd: Avoid the final FLUSH in complete()
Date: Wed, 12 Jun 2024 10:42:25 -0400
Message-ID: <20240612144228.1179240-2-peterx@redhat.com>
In-Reply-To: <20240612144228.1179240-1-peterx@redhat.com>
References: <20240612144228.1179240-1-peterx@redhat.com>

We always do the flush when finishing one round of scan, and during the
complete() phase we scan one more round to make sure no dirty page
remains. In that case we shouldn't need an explicit FLUSH at the end of
complete(), because by the time we get there all pages should already
have been flushed.

Signed-off-by: Peter Xu
Reviewed-by: Fabiano Rosas
Tested-by: Fabiano Rosas
---
 migration/ram.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ceea586b06..edec1a2d07 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3300,10 +3300,6 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         }
     }
 
-    if (migrate_multifd() && !migrate_multifd_flush_after_each_section() &&
-        !migrate_mapped_ram()) {
-        qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
-    }
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     return qemu_fflush(f);
 }

From patchwork Wed Jun 12 14:42:26 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13695156
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Jiri Denemark, Prasad Pandit, Fabiano Rosas, Bandan Das, peterx@redhat.com
Subject: [PATCH 2/4] migration: Rename thread debug names
Date: Wed, 12 Jun 2024 10:42:26 -0400
Message-ID: <20240612144228.1179240-3-peterx@redhat.com>
In-Reply-To: <20240612144228.1179240-1-peterx@redhat.com>
References: <20240612144228.1179240-1-peterx@redhat.com>

The postcopy thread names on dest QEMU are slightly confusing, and I
partly have myself to blame for that, in 36f62f11e4 ("migration: Postcopy
preemption preparation on channel creation"). E.g., "fault-fast" reads
like a fast version of "fault-default", but it's actually the fast
version of "postcopy/listen".
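
As background on thread-name length: on Linux, pthread_setname_np()
accepts at most 16 bytes including the terminating NUL, which leaves 15
visible characters. The sketch below is not part of this series; it only
checks a few thread names that appear in this patch against that limit.
SKETCH_THREAD_NAME_MAX, sketch_thread_name_fits() and
sketch_send_name_len() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Visible characters available for a Linux thread name (16 bytes with NUL). */
#define SKETCH_THREAD_NAME_MAX 15

/* Hypothetical helper: true if the name fits without truncation. */
static bool sketch_thread_name_fits(const char *name)
{
    assert(name != NULL);
    return strlen(name) <= SKETCH_THREAD_NAME_MAX;
}

/* Hypothetical helper: length of a "mig/src/send_%d" name for channel idx. */
static int sketch_send_name_len(int idx)
{
    /* snprintf(NULL, 0, ...) returns the length the output would have. */
    return snprintf(NULL, 0, "mig/src/send_%d", idx);
}
```

With these helpers, "mig/dst/preempt" is exactly 15 characters and fits,
while the old "multifd-tls-handshake-worker" does not. A name built from
"mig/src/send_%d" stays within the limit only up to a two-digit index.
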

Take this chance to rename all the migration threads following
consistent rules. Considering we only have 15 chars usable, prefix all
threads with "mig/", and this time identify src/dst threads properly. So
now most thread names will look like "mig/DIR/xxx", where DIR is "src"
or "dst", except the bg-snapshot thread, which doesn't have a direction.

For multifd threads, make them "mig/{src|dst}/{send|recv}_%d".

We have had the "live_migration" thread for a very long time; now it's
called "mig/src/main". We may hope to have "mig/dst/main" soon, but not
yet.

Signed-off-by: Peter Xu
Reviewed-by: Fabiano Rosas
---
 migration/colo.c         | 2 +-
 migration/migration.c    | 6 +++---
 migration/multifd.c      | 6 +++---
 migration/postcopy-ram.c | 4 ++--
 migration/savevm.c       | 2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index f96c2ee069..6449490221 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -935,7 +935,7 @@ void coroutine_fn colo_incoming_co(void)
     assert(bql_locked());
     assert(migration_incoming_colo_enabled());
 
-    qemu_thread_create(&th, "COLO incoming", colo_process_incoming_thread,
+    qemu_thread_create(&th, "mig/dst/colo", colo_process_incoming_thread,
                        mis, QEMU_THREAD_JOINABLE);
 
     mis->colo_incoming_co = qemu_coroutine_self();
diff --git a/migration/migration.c b/migration/migration.c
index e1b269624c..d41e00ed4c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2408,7 +2408,7 @@ static int open_return_path_on_source(MigrationState *ms)
 
     trace_open_return_path_on_source();
 
-    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
+    qemu_thread_create(&ms->rp_state.rp_thread, "mig/src/rp-thr",
                        source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
     ms->rp_state.rp_thread_created = true;
 
@@ -3747,10 +3747,10 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
     }
 
     if (migrate_background_snapshot()) {
-        qemu_thread_create(&s->thread, "bg_snapshot",
+        qemu_thread_create(&s->thread, "mig/snapshot",
                 bg_migration_thread, s, QEMU_THREAD_JOINABLE);
     } else {
-        qemu_thread_create(&s->thread, "live_migration",
+        qemu_thread_create(&s->thread, "mig/src/main",
                 migration_thread, s, QEMU_THREAD_JOINABLE);
     }
     s->migration_thread_running = true;
diff --git a/migration/multifd.c b/migration/multifd.c
index f317bff077..7afc0965f6 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1059,7 +1059,7 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p,
     args->p = p;
 
     p->tls_thread_created = true;
-    qemu_thread_create(&p->tls_thread, "multifd-tls-handshake-worker",
+    qemu_thread_create(&p->tls_thread, "mig/src/tls",
                        multifd_tls_handshake_thread, args,
                        QEMU_THREAD_JOINABLE);
     return true;
@@ -1185,7 +1185,7 @@ bool multifd_send_setup(void)
         } else {
             p->iov = g_new0(struct iovec, page_count);
         }
-        p->name = g_strdup_printf("multifdsend_%d", i);
+        p->name = g_strdup_printf("mig/src/send_%d", i);
         p->page_size = qemu_target_page_size();
         p->page_count = page_count;
         p->write_flags = 0;
@@ -1601,7 +1601,7 @@ int multifd_recv_setup(Error **errp)
                 + sizeof(uint64_t) * page_count;
             p->packet = g_malloc0(p->packet_len);
         }
-        p->name = g_strdup_printf("multifdrecv_%d", i);
+        p->name = g_strdup_printf("mig/dst/recv_%d", i);
         p->iov = g_new0(struct iovec, page_count);
         p->normal = g_new0(ram_addr_t, page_count);
         p->zero = g_new0(ram_addr_t, page_count);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 3419779548..97701e6bb2 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1238,7 +1238,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
         return -1;
     }
 
-    postcopy_thread_create(mis, &mis->fault_thread, "fault-default",
+    postcopy_thread_create(mis, &mis->fault_thread, "mig/dst/fault",
                            postcopy_ram_fault_thread, QEMU_THREAD_JOINABLE);
     mis->have_fault_thread = true;
 
@@ -1258,7 +1258,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
          * This thread needs to be created after the temp pages because
          * it'll fetch RAM_CHANNEL_POSTCOPY PostcopyTmpPage immediately.
          */
-        postcopy_thread_create(mis, &mis->postcopy_prio_thread, "fault-fast",
+        postcopy_thread_create(mis, &mis->postcopy_prio_thread, "mig/dst/preempt",
                                postcopy_preempt_thread, QEMU_THREAD_JOINABLE);
         mis->preempt_thread_status = PREEMPT_THREAD_CREATED;
     }
diff --git a/migration/savevm.c b/migration/savevm.c
index c621f2359b..e71410d8c1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2129,7 +2129,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
     }
 
     mis->have_listen_thread = true;
-    postcopy_thread_create(mis, &mis->listen_thread, "postcopy/listen",
+    postcopy_thread_create(mis, &mis->listen_thread, "mig/dst/listen",
                            postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
     trace_loadvm_postcopy_handle_listen("return");
 

From patchwork Wed Jun 12 14:42:27 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13695154
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Jiri Denemark, Prasad Pandit, Fabiano Rosas, Bandan Das, peterx@redhat.com
Subject: [PATCH 3/4] migration: Use MigrationStatus instead of int
Date: Wed, 12 Jun 2024 10:42:27 -0400
Message-ID: <20240612144228.1179240-4-peterx@redhat.com>
In-Reply-To: <20240612144228.1179240-1-peterx@redhat.com>
References: <20240612144228.1179240-1-peterx@redhat.com>

QEMU uses "int" in most cases even if it stores MigrationStatus.
I don't know why, so let's try to do that right and see what blows up...

Signed-off-by: Peter Xu
---
 migration/migration.h |  9 +++++----
 migration/migration.c | 13 +++++++------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 6af01362d4..38aa1402d5 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -160,7 +160,7 @@ struct MigrationIncomingState {
     /* PostCopyFD's for external userfaultfds & handlers of shared memory */
     GArray *postcopy_remote_fds;
 
-    int state;
+    MigrationStatus state;
 
     /*
      * The incoming migration coroutine, non-NULL during qemu_loadvm_state().
@@ -301,7 +301,7 @@ struct MigrationState {
     /* params from 'migrate-set-parameters' */
     MigrationParameters parameters;
 
-    int state;
+    MigrationStatus state;
 
     /* State related to return path */
     struct {
@@ -459,7 +459,8 @@ struct MigrationState {
     bool rdma_migration;
 };
 
-void migrate_set_state(int *state, int old_state, int new_state);
+void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
+                       MigrationStatus new_state);
 
 void migration_fd_process_incoming(QEMUFile *f);
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
@@ -479,7 +480,7 @@ int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
-bool migration_postcopy_is_alive(int state);
+bool migration_postcopy_is_alive(MigrationStatus state);
 MigrationState *migrate_get_current(void);
 bool migration_has_failed(MigrationState *);
 bool migrate_mode_is_cpr(MigrationState *);
diff --git a/migration/migration.c b/migration/migration.c
index d41e00ed4c..bfbd657035 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -390,7 +390,7 @@ void migration_incoming_state_destroy(void)
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
-static void migrate_generate_event(int new_state)
+static void migrate_generate_event(MigrationStatus new_state)
 {
     if (migrate_events()) {
         qapi_event_send_migration(new_state);
@@ -1273,8 +1273,6 @@ static void fill_destination_migration_info(MigrationInfo *info)
     }
 
     switch (mis->state) {
-    case MIGRATION_STATUS_NONE:
-        return;
     case MIGRATION_STATUS_SETUP:
     case MIGRATION_STATUS_CANCELLING:
     case MIGRATION_STATUS_CANCELLED:
@@ -1290,6 +1288,8 @@ static void fill_destination_migration_info(MigrationInfo *info)
         info->has_status = true;
         fill_destination_postcopy_migration_info(info);
         break;
+    default:
+        return;
     }
     info->status = mis->state;
 
@@ -1337,7 +1337,8 @@ void qmp_migrate_start_postcopy(Error **errp)
 
 /* shared migration helpers */
 
-void migrate_set_state(int *state, int old_state, int new_state)
+void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
+                       MigrationStatus new_state)
 {
     assert(new_state < MIGRATION_STATUS__MAX);
     if (qatomic_cmpxchg(state, old_state, new_state) == old_state) {
@@ -1544,7 +1545,7 @@ bool migration_in_postcopy(void)
     }
 }
 
-bool migration_postcopy_is_alive(int state)
+bool migration_postcopy_is_alive(MigrationStatus state)
 {
     switch (state) {
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
@@ -1598,7 +1599,7 @@ bool migration_is_idle(void)
     case MIGRATION_STATUS_DEVICE:
     case MIGRATION_STATUS_WAIT_UNPLUG:
         return false;
-    case MIGRATION_STATUS__MAX:
+    default:
         g_assert_not_reached();
     }
 

From patchwork Wed Jun 12 14:42:28 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13695155
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Jiri Denemark, Prasad Pandit, Fabiano Rosas, Bandan Das, peterx@redhat.com
Subject: [PATCH 4/4] migration/postcopy: Add postcopy-recover-setup phase
Date: Wed, 12 Jun 2024 10:42:28 -0400
Message-ID: <20240612144228.1179240-5-peterx@redhat.com>
In-Reply-To: <20240612144228.1179240-1-peterx@redhat.com>
References: <20240612144228.1179240-1-peterx@redhat.com>

This patch adds a migration state on src called "postcopy-recover-setup".
The new state describes the intermediate step that starts when the src
QEMU starts a postcopy recovery request and lasts until the migration
channels are properly established, but before the recovery process takes
place.

The request came from Libvirt, which currently relies on the migration
state events to detect migration state changes.
That works for most of the migration process, except for postcopy
recovery failures at the beginning.

Currently postcopy recovery only has two major states:

  - postcopy-paused: this is the state that both sides of QEMU will be in
    for a long time, for as long as the migration channel is interrupted.

  - postcopy-recover: this is the state where both sides of QEMU
    handshake with each other, preparing to continue the postcopy that
    was interrupted.

The issue here is that when the recovery port is invalid, the src QEMU
will take the URI/channels, notice that the ports are not valid, and
silently stay in the postcopy-paused state, with no event sent to
Libvirt. In this case, the only thing Libvirt can do is poll the
migration status at a proper interval, which is less than optimal.

Considering that this is the only case where Libvirt won't get a
notification from QEMU on such events, let's add a
postcopy-recover-setup state to mimic what we already have with the
"setup" state of a newly initialized migration, describing the phase of
connection establishment.

With that, postcopy recovery now has two paths to take, and either path
guarantees that an event is generated. Now the events will look like
this during a recovery process on src QEMU:

  - Initially, when the recovery is initiated on src, QEMU will go from
    "postcopy-paused" -> "postcopy-recover-setup". Old QEMUs don't have
    this event.

  - Depending on whether the channel re-establishment succeeds:

    - On success, src QEMU will move from "postcopy-recover-setup" to
      "postcopy-recover". Old QEMUs also have this event.

    - On failure, src QEMU will move from "postcopy-recover-setup" back
      to "postcopy-paused". Old QEMUs don't have this event.

This guarantees that Libvirt will always receive a notification for the
recovery process.

One thing to mention is that the new status is only needed on src QEMU,
not on both sides. On dest QEMU, the state machine doesn't change, hence
the events don't change either.
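
The src-side transitions described above can be sketched as a small
standalone model. This is only an illustration under stated assumptions,
not the code in this patch: SketchStatus, sketch_set_state(),
sketch_recover_src() and the channel_ok flag are hypothetical names, and
the real implementation goes through migrate_set_state() and the QAPI
migration event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-in for MigrationStatus (illustration only). */
typedef enum {
    SKETCH_POSTCOPY_PAUSED,
    SKETCH_POSTCOPY_RECOVER_SETUP,
    SKETCH_POSTCOPY_RECOVER,
} SketchStatus;

static const char *sketch_status_str(SketchStatus s)
{
    switch (s) {
    case SKETCH_POSTCOPY_PAUSED:
        return "postcopy-paused";
    case SKETCH_POSTCOPY_RECOVER_SETUP:
        return "postcopy-recover-setup";
    case SKETCH_POSTCOPY_RECOVER:
        return "postcopy-recover";
    }
    return "unknown";
}

/* Counts state-change notifications, standing in for events sent to Libvirt. */
static int sketch_events_sent;

static void sketch_set_state(SketchStatus *state, SketchStatus new_state)
{
    assert(*state != new_state);    /* every call is a real transition */
    printf("event: %s -> %s\n",
           sketch_status_str(*state), sketch_status_str(new_state));
    *state = new_state;
    sketch_events_sent++;
}

/*
 * Model one recovery attempt on src. channel_ok is a hypothetical input
 * saying whether the channels could be re-established.
 */
static SketchStatus sketch_recover_src(bool channel_ok)
{
    SketchStatus state = SKETCH_POSTCOPY_PAUSED;

    sketch_set_state(&state, SKETCH_POSTCOPY_RECOVER_SETUP);
    sketch_set_state(&state, channel_ok ? SKETCH_POSTCOPY_RECOVER
                                        : SKETCH_POSTCOPY_PAUSED);
    return state;
}
```

In this model both sketch_recover_src(true) and sketch_recover_src(false)
emit two events, which is the property the new state is meant to provide:
a failed channel setup no longer ends silently in postcopy-paused.
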
It's done like so because dest QEMU may not have an explicit point of setup start. E.g., it can happen that when dest QEMUs doesn't use migrate-recover command to use a new URI/channel, but the old URI/channels can be reused in recovery, in which case the old ports simply can work again after the network routes are fixed up. The patch has some touch-ups in the dest path too, but it's because there's some unclearness on using migrate_set_state(), so the change should make it crystal clear now by checking current status always. The next step from that POV would be making migrate_set_state() not using cmpxchg() but always update the status, but that's for later. Cc: Jiri Denemark Cc: Fabiano Rosas Cc: Prasad Pandit Buglink: https://issues.redhat.com/browse/RHEL-38485 Signed-off-by: Peter Xu --- qapi/migration.json | 4 +++ migration/postcopy-ram.h | 3 ++ migration/migration.c | 66 +++++++++++++++++++++++++++++++++++----- migration/postcopy-ram.c | 6 ++++ migration/savevm.c | 4 +-- 5 files changed, 73 insertions(+), 10 deletions(-) diff --git a/qapi/migration.json b/qapi/migration.json index a351fd3714..a135bbcd96 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -142,6 +142,9 @@ # # @postcopy-paused: during postcopy but paused. (since 3.0) # +# @postcopy-recover-setup: setup phase for a postcopy recover process, +# preparing for a recover phase to start. (since 9.1) +# # @postcopy-recover: trying to recover from a paused postcopy. 
(since # 3.0) # @@ -166,6 +169,7 @@ { 'enum': 'MigrationStatus', 'data': [ 'none', 'setup', 'cancelling', 'cancelled', 'active', 'postcopy-active', 'postcopy-paused', + 'postcopy-recover-setup', 'postcopy-recover', 'completed', 'failed', 'colo', 'pre-switchover', 'device', 'wait-unplug' ] } ## diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h index ecae941211..a6df1b2811 100644 --- a/migration/postcopy-ram.h +++ b/migration/postcopy-ram.h @@ -13,6 +13,8 @@ #ifndef QEMU_POSTCOPY_RAM_H #define QEMU_POSTCOPY_RAM_H +#include "qapi/qapi-types-migration.h" + /* Return true if the host supports everything we need to do postcopy-ram */ bool postcopy_ram_supported_by_host(MigrationIncomingState *mis, Error **errp); @@ -193,5 +195,6 @@ enum PostcopyChannels { void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file); void postcopy_preempt_setup(MigrationState *s); int postcopy_preempt_establish_channel(MigrationState *s); +bool postcopy_is_paused(MigrationStatus status); #endif diff --git a/migration/migration.c b/migration/migration.c index bfbd657035..9475dce7dc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -595,6 +595,26 @@ bool migrate_uri_parse(const char *uri, MigrationChannel **channel, return true; } +static bool +migration_incoming_state_setup(MigrationIncomingState *mis, Error **errp) +{ + MigrationStatus current = mis->state; + + if (current == MIGRATION_STATUS_POSTCOPY_PAUSED) { + /* Postcopy paused state doesn't change when setup new ports */ + return true; + } + + if (current != MIGRATION_STATUS_NONE) { + error_setg(errp, "Illegal migration incoming state: %s", + MigrationStatus_str(current)); + return false; + } + + migrate_set_state(&mis->state, current, MIGRATION_STATUS_SETUP); + return true; +} + static void qemu_start_incoming_migration(const char *uri, bool has_channels, MigrationChannelList *channels, Error **errp) @@ -633,8 +653,9 @@ static void qemu_start_incoming_migration(const char *uri, bool 
has_channels, return; } - migrate_set_state(&mis->state, MIGRATION_STATUS_NONE, - MIGRATION_STATUS_SETUP); + if (!migration_incoming_state_setup(mis, errp)) { + return; + } if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) { SocketAddress *saddr = &addr->u.socket; @@ -1070,6 +1091,7 @@ bool migration_is_setup_or_active(void) case MIGRATION_STATUS_ACTIVE: case MIGRATION_STATUS_POSTCOPY_ACTIVE: case MIGRATION_STATUS_POSTCOPY_PAUSED: + case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP: case MIGRATION_STATUS_POSTCOPY_RECOVER: case MIGRATION_STATUS_SETUP: case MIGRATION_STATUS_PRE_SWITCHOVER: @@ -1092,6 +1114,7 @@ bool migration_is_running(void) case MIGRATION_STATUS_ACTIVE: case MIGRATION_STATUS_POSTCOPY_ACTIVE: case MIGRATION_STATUS_POSTCOPY_PAUSED: + case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP: case MIGRATION_STATUS_POSTCOPY_RECOVER: case MIGRATION_STATUS_SETUP: case MIGRATION_STATUS_PRE_SWITCHOVER: @@ -1229,6 +1252,7 @@ static void fill_source_migration_info(MigrationInfo *info) case MIGRATION_STATUS_PRE_SWITCHOVER: case MIGRATION_STATUS_DEVICE: case MIGRATION_STATUS_POSTCOPY_PAUSED: + case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP: case MIGRATION_STATUS_POSTCOPY_RECOVER: /* TODO add some postcopy stats */ populate_time_info(info, s); @@ -1279,6 +1303,7 @@ static void fill_destination_migration_info(MigrationInfo *info) case MIGRATION_STATUS_ACTIVE: case MIGRATION_STATUS_POSTCOPY_ACTIVE: case MIGRATION_STATUS_POSTCOPY_PAUSED: + case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP: case MIGRATION_STATUS_POSTCOPY_RECOVER: case MIGRATION_STATUS_FAILED: case MIGRATION_STATUS_COLO: @@ -1435,9 +1460,30 @@ static void migrate_error_free(MigrationState *s) static void migrate_fd_error(MigrationState *s, const Error *error) { + MigrationStatus current = s->state; + MigrationStatus next; + assert(s->to_dst_file == NULL); - migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, - MIGRATION_STATUS_FAILED); + + switch (current) { + case MIGRATION_STATUS_SETUP: + next = 
MIGRATION_STATUS_FAILED; + break; + case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP: + /* Never fail a postcopy migration; switch back to PAUSED instead */ + next = MIGRATION_STATUS_POSTCOPY_PAUSED; + break; + default: + /* + * This really shouldn't happen. Just be careful to not crash a VM + * just for this. Instead, dump something. + */ + error_report("%s: Illegal migration status (%s) detected", + __func__, MigrationStatus_str(current)); + return; + } + + migrate_set_state(&s->state, current, next); migrate_set_error(s, error); } @@ -1538,6 +1584,7 @@ bool migration_in_postcopy(void) switch (s->state) { case MIGRATION_STATUS_POSTCOPY_ACTIVE: case MIGRATION_STATUS_POSTCOPY_PAUSED: + case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP: case MIGRATION_STATUS_POSTCOPY_RECOVER: return true; default: @@ -1936,6 +1983,9 @@ static bool migrate_prepare(MigrationState *s, bool resume, Error **errp) return false; } + migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED, + MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP); + /* This is a resume, skip init status */ return true; } @@ -2968,9 +3018,9 @@ static MigThrError postcopy_pause(MigrationState *s) * We wait until things fixed up. Then someone will setup the * status back for us. */ - while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { + do { qemu_sem_wait(&s->postcopy_pause_sem); - } + } while (postcopy_is_paused(s->state)); if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { /* Woken up by a recover procedure. 
Give it a shot */ @@ -3666,7 +3716,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) { Error *local_err = NULL; uint64_t rate_limit; - bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED; + bool resume = migration_in_postcopy(); int ret; /* @@ -3733,7 +3783,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) if (resume) { /* Wakeup the main migration thread to do the recovery */ - migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED, + migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP, MIGRATION_STATUS_POSTCOPY_RECOVER); qemu_sem_post(&s->postcopy_pause_sem); return; diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index 97701e6bb2..1c374b7ea1 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -1770,3 +1770,9 @@ void *postcopy_preempt_thread(void *opaque) return NULL; } + +bool postcopy_is_paused(MigrationStatus status) +{ + return status == MIGRATION_STATUS_POSTCOPY_PAUSED || + status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP; +} diff --git a/migration/savevm.c b/migration/savevm.c index e71410d8c1..deb57833f8 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2864,9 +2864,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis) error_report("Detected IO failure for postcopy. " "Migration paused."); - while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { + do { qemu_sem_wait(&mis->postcopy_pause_sem_dst); - } + } while (postcopy_is_paused(mis->state)); trace_postcopy_pause_incoming_continued();