From patchwork Wed Oct 4 22:02:31 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13409483
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: peterx@redhat.com, Fabiano Rosas, Juan Quintela
Subject: [PATCH v3 01/10] migration: Display error in query-migrate irrelevant of status
Date: Wed, 4 Oct 2023 18:02:31 -0400
Message-ID: <20231004220240.167175-2-peterx@redhat.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231004220240.167175-1-peterx@redhat.com>
References: <20231004220240.167175-1-peterx@redhat.com>

Display the error whenever it is set, regardless of whether the status
is FAILED. For example, it can also apply to the PAUSED stage of
postcopy, providing a hint about what went wrong.

The error_mutex seems to have been overlooked when referencing the
error; take it to be safe.

This changes QAPI behavior by showing the error message outside the
FAILED status, but that is intended and is not expected to break anyone.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404
Reviewed-by: Fabiano Rosas
Signed-off-by: Peter Xu
Reviewed-by: Juan Quintela
---
 qapi/migration.json   | 5 ++---
 migration/migration.c | 8 +++++---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 8843e74b59..c241b6d318 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -230,9 +230,8 @@
 #     throttled during auto-converge. This is only present when
 #     auto-converge has started throttling guest cpus. (Since 2.7)
 #
-# @error-desc: the human readable error description string, when
-#     @status is 'failed'. Clients should not attempt to parse the
-#     error strings. (Since 2.7)
+# @error-desc: the human readable error description string. Clients
+#     should not attempt to parse the error strings. (Since 2.7)
 #
 # @postcopy-blocktime: total time when all vCPU were blocked during
 #     postcopy live migration. This is only present when the
diff --git a/migration/migration.c b/migration/migration.c
index 585d3c8f55..010056d6f3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1057,9 +1057,6 @@ static void fill_source_migration_info(MigrationInfo *info)
         break;
     case MIGRATION_STATUS_FAILED:
         info->has_status = true;
-        if (s->error) {
-            info->error_desc = g_strdup(error_get_pretty(s->error));
-        }
         break;
     case MIGRATION_STATUS_CANCELLED:
         info->has_status = true;
@@ -1069,6 +1066,11 @@ static void fill_source_migration_info(MigrationInfo *info)
         break;
     }
     info->status = state;
+
+    QEMU_LOCK_GUARD(&s->error_mutex);
+    if (s->error) {
+        info->error_desc = g_strdup(error_get_pretty(s->error));
+    }
 }
 
 static void fill_destination_migration_info(MigrationInfo *info)

From patchwork Wed Oct 4 22:02:32 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13409480
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: peterx@redhat.com, Fabiano Rosas, Juan Quintela
Subject: [PATCH v3 02/10] migration: Introduce migrate_has_error()
Date: Wed, 4 Oct 2023 18:02:32 -0400
Message-ID: <20231004220240.167175-3-peterx@redhat.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231004220240.167175-1-peterx@redhat.com>
References: <20231004220240.167175-1-peterx@redhat.com>

Introduce a helper to detect whether MigrationState.error is set for
whatever reason.

This is preparation for letting any thread (e.g. the source return path
thread) set errors on MigrationState in a unified way, rather than
relying on its own mechanism (mark_source_rp_bad()).

Reviewed-by: Fabiano Rosas
Signed-off-by: Peter Xu
Reviewed-by: Juan Quintela
---
 migration/migration.h | 1 +
 migration/migration.c | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index 972597f4de..4106a1dc54 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -476,6 +476,7 @@ bool migration_has_all_channels(void);
 uint64_t migrate_max_downtime(void);
 
 void migrate_set_error(MigrationState *s, const Error *error);
+bool migrate_has_error(MigrationState *s);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
 
diff --git a/migration/migration.c b/migration/migration.c
index 010056d6f3..4c6de8c2dd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1231,6 +1231,13 @@ void migrate_set_error(MigrationState *s, const Error *error)
     }
 }
 
+bool migrate_has_error(MigrationState *s)
+{
+    /* The lock is not helpful here, but still follow the rule */
+    QEMU_LOCK_GUARD(&s->error_mutex);
+    return qatomic_read(&s->error);
+}
+
 static void migrate_error_free(MigrationState *s)
 {
     QEMU_LOCK_GUARD(&s->error_mutex);

From patchwork Wed Oct 4 22:02:33 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13409478
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: peterx@redhat.com, Fabiano Rosas, Juan Quintela
Subject: [PATCH v3 03/10] migration: Refactor error handling in source return path
Date: Wed, 4 Oct 2023 18:02:33 -0400
Message-ID: <20231004220240.167175-4-peterx@redhat.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231004220240.167175-1-peterx@redhat.com>
References: <20231004220240.167175-1-peterx@redhat.com>

rp_state.error was a boolean used to indicate that an error happened in
the return path thread. That not only duplicates error reporting
(migrate_set_error()), but is also insufficient: we only call
error_report() and set the flag to true, so we can never keep a record
of the exact error and show it in query-migrate.

To improve this, a few things are done:

  - Use error_setg() rather than error_report() across the whole
    lifecycle of the return path thread, keeping the error in an Error*.

  - Use migrate_set_error() to apply that captured error to the global
    migration object when an error occurred in this thread.

  - With the above, mark_source_rp_bad() is no longer needed; remove it,
    along with rp_state.error itself.

Signed-off-by: Peter Xu
Reviewed-by: Philippe Mathieu-Daudé
---
 migration/migration.h  |   1 -
 migration/ram.h        |   5 +-
 migration/migration.c  | 123 ++++++++++++++++++-----------------------
 migration/ram.c        |  41 +++++++-------
 migration/trace-events |   4 +-
 5 files changed, 79 insertions(+), 95 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 4106a1dc54..33a7831da4 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -308,7 +308,6 @@ struct MigrationState {
         /* Protected by qemu_file_lock */
         QEMUFile *from_dst_file;
         QemuThread rp_thread;
-        bool error;
         /*
          * We can also check non-zero of rp_thread, but there's no "official"
          * way to do this, so this bool makes it slightly more elegant.
diff --git a/migration/ram.h b/migration/ram.h
index 145c915ca7..14ed666d58 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -51,7 +51,8 @@ uint64_t ram_bytes_total(void);
 void mig_throttle_counter_reset(void);
 
 uint64_t ram_pagesize_summary(void);
-int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
+int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
+                         Error **errp);
 void ram_postcopy_migrated_memory_release(MigrationState *ms);
 /* For outgoing discard bitmap */
 void ram_postcopy_send_discard_bitmap(MigrationState *ms);
@@ -71,7 +72,7 @@ void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
 int64_t ramblock_recv_bitmap_send(QEMUFile *file,
                                   const char *block_name);
-int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
 void postcopy_preempt_shutdown_file(MigrationState *s);
 void *postcopy_preempt_thread(void *opaque);
diff --git a/migration/migration.c b/migration/migration.c
index 4c6de8c2dd..e821e80094 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -99,7 +99,7 @@ static int migration_maybe_pause(MigrationState *s,
                                  int *current_active_state,
                                  int new_state);
 static void migrate_fd_cancel(MigrationState *s);
-static int await_return_path_close_on_source(MigrationState *s);
+static void await_return_path_close_on_source(MigrationState *s);
 
 static bool migration_needs_multiple_sockets(void)
 {
@@ -1427,7 +1427,6 @@ int migrate_init(MigrationState *s, Error **errp)
     s->to_dst_file = NULL;
     s->state = MIGRATION_STATUS_NONE;
     s->rp_state.from_dst_file = NULL;
-    s->rp_state.error = false;
     s->mbps = 0.0;
     s->pages_per_second = 0.0;
     s->downtime = 0;
@@ -1750,16 +1749,6 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp)
     qemu_sem_post(&s->pause_sem);
 }
 
-/* migration thread support */
-/*
- * Something bad happened to the RP stream, mark an error
- * The caller shall print or trace something to indicate why
- */
-static void mark_source_rp_bad(MigrationState *s)
-{
-    s->rp_state.error = true;
-}
-
 static struct rp_cmd_args {
     ssize_t len; /* -1 = variable */
     const char *name;
@@ -1781,7 +1770,7 @@ static struct rp_cmd_args {
  * and we don't need to send pages that have already been sent.
  */
 static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
-                                       ram_addr_t start, size_t len)
+                                        ram_addr_t start, size_t len, Error **errp)
 {
     long our_host_ps = qemu_real_host_page_size();
 
@@ -1793,37 +1782,36 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
      */
     if (!QEMU_IS_ALIGNED(start, our_host_ps) ||
         !QEMU_IS_ALIGNED(len, our_host_ps)) {
-        error_report("%s: Misaligned page request, start: " RAM_ADDR_FMT
-                     " len: %zd", __func__, start, len);
-        mark_source_rp_bad(ms);
+        error_setg(errp, "MIG_RP_MSG_REQ_PAGES: Misaligned page request, start:"
+                   RAM_ADDR_FMT " len: %zd", start, len);
         return;
     }
 
-    if (ram_save_queue_pages(rbname, start, len)) {
-        mark_source_rp_bad(ms);
-    }
+    ram_save_queue_pages(rbname, start, len, errp);
 }
 
-static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
+static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name,
+                                         Error **errp)
 {
     RAMBlock *block = qemu_ram_block_by_name(block_name);
 
     if (!block) {
-        error_report("%s: invalid block name '%s'", __func__, block_name);
+        error_setg(errp, "MIG_RP_MSG_RECV_BITMAP has invalid block name '%s'",
+                   block_name);
         return -EINVAL;
     }
 
     /* Fetch the received bitmap and refresh the dirty bitmap */
-    return ram_dirty_bitmap_reload(s, block);
+    return ram_dirty_bitmap_reload(s, block, errp);
 }
 
-static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
+static int migrate_handle_rp_resume_ack(MigrationState *s,
+                                        uint32_t value, Error **errp)
 {
     trace_source_return_path_thread_resume_ack(value);
 
     if (value != MIGRATION_RESUME_ACK_VALUE) {
-        error_report("%s: illegal resume_ack value %"PRIu32,
-                     __func__, value);
+        error_setg(errp, "illegal resume_ack value %"PRIu32, value);
         return -1;
     }
 
@@ -1882,48 +1870,46 @@ static void *source_return_path_thread(void *opaque)
     uint32_t tmp32, sibling_error;
     ram_addr_t start = 0; /* =0 to silence warning */
     size_t len = 0, expected_len;
+    Error *err = NULL;
     int res;
 
     trace_source_return_path_thread_entry();
     rcu_register_thread();
 
-    while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
+    while (!migrate_has_error(ms) && !qemu_file_get_error(rp) &&
            migration_is_setup_or_active(ms->state)) {
         trace_source_return_path_thread_loop_top();
+
         header_type = qemu_get_be16(rp);
         header_len = qemu_get_be16(rp);
 
         if (qemu_file_get_error(rp)) {
-            mark_source_rp_bad(ms);
             goto out;
         }
 
         if (header_type >= MIG_RP_MSG_MAX ||
             header_type == MIG_RP_MSG_INVALID) {
-            error_report("RP: Received invalid message 0x%04x length 0x%04x",
-                         header_type, header_len);
-            mark_source_rp_bad(ms);
+            error_setg(&err, "Received invalid message 0x%04x length 0x%04x",
+                       header_type, header_len);
             goto out;
         }
 
         if ((rp_cmd_args[header_type].len != -1 &&
             header_len != rp_cmd_args[header_type].len) ||
             header_len > sizeof(buf)) {
-            error_report("RP: Received '%s' message (0x%04x) with"
-                         "incorrect length %d expecting %zu",
-                         rp_cmd_args[header_type].name, header_type, header_len,
-                         (size_t)rp_cmd_args[header_type].len);
-            mark_source_rp_bad(ms);
+            error_setg(&err, "Received '%s' message (0x%04x) with"
+                       "incorrect length %d expecting %zu",
+                       rp_cmd_args[header_type].name, header_type, header_len,
+                       (size_t)rp_cmd_args[header_type].len);
             goto out;
         }
 
         /* We know we've got a valid header by this point */
         res = qemu_get_buffer(rp, buf, header_len);
         if (res != header_len) {
-            error_report("RP: Failed reading data for message 0x%04x"
-                         " read %d expected %d",
-                         header_type, res, header_len);
-            mark_source_rp_bad(ms);
+            error_setg(&err, "Failed reading data for message 0x%04x"
+                       " read %d expected %d",
+                       header_type, res, header_len);
             goto out;
         }
 
@@ -1933,8 +1919,7 @@ static void *source_return_path_thread(void *opaque)
             sibling_error = ldl_be_p(buf);
             trace_source_return_path_thread_shut(sibling_error);
             if (sibling_error) {
-                error_report("RP: Sibling indicated error %d", sibling_error);
-                mark_source_rp_bad(ms);
+                error_setg(&err, "Sibling indicated error %d", sibling_error);
             }
             /*
              * We'll let the main thread deal with closing the RP
@@ -1952,7 +1937,10 @@ static void *source_return_path_thread(void *opaque)
         case MIG_RP_MSG_REQ_PAGES:
             start = ldq_be_p(buf);
             len = ldl_be_p(buf + 8);
-            migrate_handle_rp_req_pages(ms, NULL, start, len);
+            migrate_handle_rp_req_pages(ms, NULL, start, len, &err);
+            if (err) {
+                goto out;
+            }
             break;
 
         case MIG_RP_MSG_REQ_PAGES_ID:
@@ -1967,32 +1955,32 @@ static void *source_return_path_thread(void *opaque)
                 expected_len += tmp32;
             }
             if (header_len != expected_len) {
-                error_report("RP: Req_Page_id with length %d expecting %zd",
-                             header_len, expected_len);
-                mark_source_rp_bad(ms);
+                error_setg(&err, "Req_Page_id with length %d expecting %zd",
+                           header_len, expected_len);
+                goto out;
+            }
+            migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len,
+                                        &err);
+            if (err) {
                 goto out;
             }
-            migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len);
             break;
 
         case MIG_RP_MSG_RECV_BITMAP:
             if (header_len < 1) {
-                error_report("%s: missing block name", __func__);
-                mark_source_rp_bad(ms);
+                error_setg(&err, "MIG_RP_MSG_RECV_BITMAP missing block name");
                 goto out;
             }
             /* Format: len (1B) + idstr (<255B). This ends the idstr. */
             buf[buf[0] + 1] = '\0';
-            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1))) {
-                mark_source_rp_bad(ms);
+            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1), &err)) {
                 goto out;
             }
             break;
 
         case MIG_RP_MSG_RESUME_ACK:
             tmp32 = ldl_be_p(buf);
-            if (migrate_handle_rp_resume_ack(ms, tmp32)) {
-                mark_source_rp_bad(ms);
+            if (migrate_handle_rp_resume_ack(ms, tmp32, &err)) {
                 goto out;
             }
             break;
@@ -2008,9 +1996,14 @@ static void *source_return_path_thread(void *opaque)
     }
 
 out:
-    if (qemu_file_get_error(rp)) {
+    if (err) {
+        /*
+         * Collect any error in return-path thread and report it to the
+         * migration state object.
+         */
+        migrate_set_error(ms, err);
+        error_free(err);
         trace_source_return_path_thread_bad_end();
-        mark_source_rp_bad(ms);
     }
 
     trace_source_return_path_thread_end();
@@ -2036,13 +2029,10 @@ static int open_return_path_on_source(MigrationState *ms)
     return 0;
 }
 
-/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
-static int await_return_path_close_on_source(MigrationState *ms)
+static void await_return_path_close_on_source(MigrationState *ms)
 {
-    int ret;
-
     if (!ms->rp_state.rp_thread_created) {
-        return 0;
+        return;
     }
 
     trace_migration_return_path_end_before();
@@ -2060,18 +2050,10 @@ static int await_return_path_close_on_source(MigrationState *ms)
         }
     }
 
-    trace_await_return_path_close_on_source_joining();
     qemu_thread_join(&ms->rp_state.rp_thread);
     ms->rp_state.rp_thread_created = false;
-    trace_await_return_path_close_on_source_close();
-
-    ret = ms->rp_state.error;
-    ms->rp_state.error = false;
-
     migration_release_dst_files(ms);
-
-    trace_migration_return_path_end_after(ret);
-    return ret;
+    trace_migration_return_path_end_after();
 }
 
 static inline void
@@ -2367,7 +2349,10 @@ static void migration_completion(MigrationState *s)
         goto fail;
     }
 
-    if (await_return_path_close_on_source(s)) {
+    await_return_path_close_on_source(s);
+
+    /* If return path has error, should have been set here */
+    if (migrate_has_error(s)) {
         goto fail;
     }
 
diff --git a/migration/ram.c b/migration/ram.c
index e4bfd39f08..c54e071ea3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1951,7 +1951,8 @@ static void migration_page_queue_free(RAMState *rs)
  * @start: starting address from the start of the RAMBlock
  * @len: length (in bytes) to send
  */
-int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
+int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
+                         Error **errp)
 {
     RAMBlock *ramblock;
     RAMState *rs = ram_state;
@@ -1968,7 +1969,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
              * Shouldn't happen, we can't reuse the last RAMBlock if
              * it's the 1st request.
              */
-            error_report("ram_save_queue_pages no previous block");
+            error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no previous block");
             return -1;
         }
     } else {
@@ -1976,16 +1977,17 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
 
         if (!ramblock) {
             /* We shouldn't be asked for a non-existent RAMBlock */
-            error_report("ram_save_queue_pages no block '%s'", rbname);
+            error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no block '%s'", rbname);
             return -1;
         }
         rs->last_req_rb = ramblock;
     }
     trace_ram_save_queue_pages(ramblock->idstr, start, len);
     if (!offset_in_ramblock(ramblock, start + len - 1)) {
-        error_report("%s request overrun start=" RAM_ADDR_FMT " len="
-                     RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT,
-                     __func__, start, len, ramblock->used_length);
+        error_setg(errp, "MIG_RP_MSG_REQ_PAGES request overrun, "
+                   "start=" RAM_ADDR_FMT " len="
+                   RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT,
+                   start, len, ramblock->used_length);
         return -1;
     }
 
@@ -2017,9 +2019,9 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
         assert(len % page_size == 0);
         while (len) {
             if (ram_save_host_page_urgent(pss)) {
-                error_report("%s: ram_save_host_page_urgent() failed: "
-                             "ramblock=%s, start_addr=0x"RAM_ADDR_FMT,
-                             __func__, ramblock->idstr, start);
+                error_setg(errp, "ram_save_host_page_urgent() failed: "
+                           "ramblock=%s, start_addr=0x"RAM_ADDR_FMT,
+                           ramblock->idstr, start);
                 ret = -1;
                 break;
             }
@@ -4151,7 +4153,7 @@ static void ram_dirty_bitmap_reload_notify(MigrationState *s)
  * This is only used when the postcopy migration is paused but wants
  * to resume from a middle point.
  */
-int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
 {
     int ret = -EINVAL;
     /* from_dst_file is always valid because we're within rp_thread */
@@ -4163,8 +4165,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
     trace_ram_dirty_bitmap_reload_begin(block->idstr);
 
     if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
-        error_report("%s: incorrect state %s", __func__,
-                     MigrationStatus_str(s->state));
+        error_setg(errp, "Reload bitmap in incorrect state %s",
+                   MigrationStatus_str(s->state));
         return -EINVAL;
     }
 
@@ -4181,9 +4183,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 
     /* The size of the bitmap should match with our ramblock */
     if (size != local_size) {
-        error_report("%s: ramblock '%s' bitmap size mismatch "
-                     "(0x%"PRIx64" != 0x%"PRIx64")", __func__,
-                     block->idstr, size, local_size);
+        error_setg(errp, "ramblock '%s' bitmap size mismatch (0x%"PRIx64
+                   " != 0x%"PRIx64")", block->idstr, size, local_size);
         ret = -EINVAL;
         goto out;
     }
@@ -4193,16 +4194,16 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 
     ret = qemu_file_get_error(file);
     if (ret || size != local_size) {
-        error_report("%s: read bitmap failed for ramblock '%s': %d"
-                     " (size 0x%"PRIx64", got: 0x%"PRIx64")",
-                     __func__, block->idstr, ret, local_size, size);
+        error_setg(errp, "read bitmap failed for ramblock '%s': %d"
+                   " (size 0x%"PRIx64", got: 0x%"PRIx64")",
+                   block->idstr, ret, local_size, size);
         ret = -EIO;
         goto out;
     }
 
     if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
-        error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIx64,
-                     __func__, block->idstr, end_mark);
+        error_setg(errp, "ramblock '%s' end mark incorrect: 0x%"PRIx64,
+                   block->idstr, end_mark);
         ret = -EINVAL;
         goto out;
     }
diff --git a/migration/trace-events b/migration/trace-events
index 002abe3a4e..5739f6b266 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -147,8 +147,6 @@ multifd_tls_outgoing_handshake_complete(void *ioc) "ioc=%p"
 multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname, void *err) "ioc=%p ioctype=%s hostname=%s err=%p"
 
 # migration.c
-await_return_path_close_on_source_close(void) ""
-await_return_path_close_on_source_joining(void) ""
 migrate_set_state(const char *new_state) "new state %s"
 migrate_fd_cleanup(void) ""
 migrate_fd_error(const char *error_desc) "error=%s"
@@ -165,7 +163,7 @@ migration_completion_postcopy_end_after_complete(void) ""
 migration_rate_limit_pre(int ms) "%d ms"
 migration_rate_limit_post(int urgent) "urgent: %d"
 migration_return_path_end_before(void) ""
-migration_return_path_end_after(int rp_error) "%d"
+migration_return_path_end_after(void) ""
 migration_thread_after_loop(void) ""
 migration_thread_file_err(void) ""
 migration_thread_setup_complete(void) ""

From patchwork Wed Oct 4 22:02:34 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13409485
UYgnIlLUuTVKqE4hGqLEt4dKRBStugYWTNsQohoj06WztaWz41QT1h2X6q6PMFOj4DUp mriA== X-Gm-Message-State: AOJu0YzL4YP80UWGEFDodNhb6lYSEWatPmiigmwo+V+6IkWb0d0Stswk QBgjGDBMi1JYeEmtDs+o6DBZtbpnB0miKVyydvsr8lLGb8ayXyVNmJPUL6uqvR5pzKDd9pkWZye 0G8oUA4cSS4A+81dEnxOCM6OYFeKJDnjwYNSFw98oILmdeJp0CPg2AHG3xKaEyMM289Q3wAo4 X-Received: by 2002:a05:6214:2303:b0:668:e31b:5576 with SMTP id gc3-20020a056214230300b00668e31b5576mr3691704qvb.1.1696456967134; Wed, 04 Oct 2023 15:02:47 -0700 (PDT) X-Google-Smtp-Source: AGHT+IE3ordlFWGONG9+oZ5uQcWyu4LiOP3ozr+k0GBZvOeh4cqLLTnkzitp9brpVOEZCa52mkqECQ== X-Received: by 2002:a05:6214:2303:b0:668:e31b:5576 with SMTP id gc3-20020a056214230300b00668e31b5576mr3691658qvb.1.1696456966347; Wed, 04 Oct 2023 15:02:46 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id w17-20020a0cdf91000000b0063d162a8b8bsm10821qvl.19.2023.10.04.15.02.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 Oct 2023 15:02:45 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas , Juan Quintela Subject: [PATCH v3 04/10] migration: Deliver return path file error to migrate state too Date: Wed, 4 Oct 2023 18:02:34 -0400 Message-ID: <20231004220240.167175-5-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231004220240.167175-1-peterx@redhat.com> References: <20231004220240.167175-1-peterx@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: 
qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org We've already did this for most of the return path thread errors, but not yet for the IO errors happened on the return path qemufile. Do that too. Re-export qemu_file_get_error_obj(). Signed-off-by: Peter Xu Reviewed-by: Juan Quintela --- migration/qemu-file.h | 1 + migration/migration.c | 1 + migration/qemu-file.c | 2 +- 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 03e718c264..75efe503c4 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -120,6 +120,7 @@ int coroutine_mixed_fn qemu_peek_byte(QEMUFile *f, int offset); void qemu_file_skip(QEMUFile *f, int size); int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp); void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err); +int qemu_file_get_error_obj(QEMUFile *f, Error **errp); void qemu_file_set_error(QEMUFile *f, int ret); int qemu_file_shutdown(QEMUFile *f); QEMUFile *qemu_file_get_return_path(QEMUFile *f); diff --git a/migration/migration.c b/migration/migration.c index e821e80094..b28b504b4c 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1884,6 +1884,7 @@ static void *source_return_path_thread(void *opaque) header_len = qemu_get_be16(rp); if (qemu_file_get_error(rp)) { + qemu_file_get_error_obj(rp, &err); goto out; } diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 5e8207dae4..ffa9c0a48a 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -146,7 +146,7 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks) * is not 0. 
* */ -static int qemu_file_get_error_obj(QEMUFile *f, Error **errp) +int qemu_file_get_error_obj(QEMUFile *f, Error **errp) { if (errp) { *errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL; From patchwork Wed Oct 4 22:02:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13409489 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 889A0E936EA for ; Wed, 4 Oct 2023 22:04:30 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qo9xL-0004ID-E5; Wed, 04 Oct 2023 18:02:59 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qo9xJ-0004Hk-JF for qemu-devel@nongnu.org; Wed, 04 Oct 2023 18:02:57 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qo9xH-0003vE-9E for qemu-devel@nongnu.org; Wed, 04 Oct 2023 18:02:57 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1696456974; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VxO3OdtMyeeaxOri15W9iSHPGH8x5n6UYkaAADCs9aY=; b=PU4BTCNgpmB/2q7FUjc+QPP/RxiDoiKdrsJinNp2AdrgeOPPpWaLBjic9LUWYzoCSpkDYm M6gVQvzs7yFT9on9A9xicvc5/8TpIZf4e1VkTgP808goHaErxGBoenbJlsa+/FZdGVchvv GTdP+8Fcf9ZIYeCcjMwSjpe4Z6eKiE8= Received: from 
mail-yb1-f197.google.com (mail-yb1-f197.google.com [209.85.219.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-77-KMXCoxP9MuiO117iMjonug-1; Wed, 04 Oct 2023 18:02:48 -0400 X-MC-Unique: KMXCoxP9MuiO117iMjonug-1 Received: by mail-yb1-f197.google.com with SMTP id 3f1490d57ef6-d868c33252fso84847276.0 for ; Wed, 04 Oct 2023 15:02:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696456967; x=1697061767; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VxO3OdtMyeeaxOri15W9iSHPGH8x5n6UYkaAADCs9aY=; b=UoHXlYDuB2Prprzy2nFvBqSbSM77qHUVZCaTsRY2zHGN66nDnSMXXu8R7J2lAPg8p4 U5vS911yR2zXL9+24NGAQoiRetHTeRso98prw0ORUNtIJdxfvZq2kJ2R5z1X/GnWbjpD vG91+kDEVzQZGgapLUtiTsRgm6Awjn1rV5AySwIFsdxe9Xj2Sd5rqsM1jccwFYfljcXT rbhEIFQrDPBquK1gVh2+Sy+AYAfLmCSCZhc/0CSmI21DakWzyVHJPVuDd/qoqeK6syVR UspX1Msw5VgT5rLZz0NYZ2eeM2MXgTO8AEaRa/5GLi0sF860UESeQHtNCdLwEup2EVsu Qvow== X-Gm-Message-State: AOJu0YzzebdfSEBVciO1pd1RN+9RCoVuVtW1BTWOzKXvjlG4f/KLlMqS WPs9pURZFq8LBhDs67qGGYqSeIVQhG5C2KIoQDhNoSmKxWWX6EZyzDMIjlZbvRK+2xZ1wyBgOCm XHOxBit5iFfI2/Bk5fjdKfOsrwg2irjpUzWUPFMpSYemF2dHbwBhrvZxAFuPoXgtFXn6zUypk X-Received: by 2002:a25:ab89:0:b0:d89:49a4:448c with SMTP id v9-20020a25ab89000000b00d8949a4448cmr3045214ybi.2.1696456967596; Wed, 04 Oct 2023 15:02:47 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEnxSiwfWYLW/ZHB7DxsuTQvAij8OoTSqeyXHt9U/MpvsqSowozeRhP8Ag3z849Hsx1Eq3b2Q== X-Received: by 2002:a25:ab89:0:b0:d89:49a4:448c with SMTP id v9-20020a25ab89000000b00d8949a4448cmr3045195ybi.2.1696456967275; Wed, 04 Oct 2023 15:02:47 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. 
[99.254.144.39]) by smtp.gmail.com with ESMTPSA id w17-20020a0cdf91000000b0063d162a8b8bsm10821qvl.19.2023.10.04.15.02.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 Oct 2023 15:02:46 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas , Juan Quintela Subject: [PATCH v3 05/10] qemufile: Always return a verbose error Date: Wed, 4 Oct 2023 18:02:35 -0400 Message-ID: <20231004220240.167175-6-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231004220240.167175-1-peterx@redhat.com> References: <20231004220240.167175-1-peterx@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org There're a lot of cases where we only have an errno set in last_error but without a detailed error description. When this happens, try to generate an error contains the errno as a descriptive error. This will be helpful in cases where one relies on the Error*. E.g., migration state only caches Error* in MigrationState.error. With this, we'll display correct error messages in e.g. query-migrate when the error was only set by qemu_file_set_error(). 
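
The fallback described above can be sketched outside QEMU in plain C. This is only an illustration under stated assumptions: `Channel` and `channel_get_error()` are hypothetical stand-ins (not QEMU APIs), and the "message: strerror" layout merely mimics the style of an errno-based error string.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for QEMUFile; only the two error fields matter. */
typedef struct Channel {
    int last_error;             /* 0, or a negative errno value */
    const char *last_error_msg; /* detailed message, may be NULL */
} Channel;

/*
 * Return the stored error code. When there is an error and a buffer is
 * supplied, always fill it: copy the detailed message if one exists,
 * otherwise synthesize one from the errno value.
 */
int channel_get_error(const Channel *c, char *buf, size_t len)
{
    assert(c != NULL);

    if (!c->last_error) {
        return 0;
    }

    if (buf && len) {
        if (c->last_error_msg) {
            snprintf(buf, len, "%s", c->last_error_msg);
        } else {
            snprintf(buf, len, "Channel error: %s",
                     strerror(-c->last_error));
        }
    }

    return c->last_error;
}
```

With this shape, a caller that only keeps the message still gets something printable when just the errno was recorded.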
Reviewed-by: Fabiano Rosas
Signed-off-by: Peter Xu
Reviewed-by: Juan Quintela
---
 migration/qemu-file.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index ffa9c0a48a..c12a905a34 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -142,15 +142,24 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
  *
  * Return negative error value if there has been an error on previous
  * operations, return 0 if no error happened.
- * Optional, it returns Error* in errp, but it may be NULL even if return value
- * is not 0.
  *
+ * If errp is specified, a verbose error message will be copied over.
  */
 int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
 {
+    if (!f->last_error) {
+        return 0;
+    }
+
+    /* There is an error */
     if (errp) {
-        *errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL;
+        if (f->last_error_obj) {
+            *errp = error_copy(f->last_error_obj);
+        } else {
+            error_setg_errno(errp, -f->last_error, "Channel error");
+        }
     }
+
     return f->last_error;
 }
 

From patchwork Wed Oct 4 22:02:36 2023
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13409484
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: peterx@redhat.com, Fabiano Rosas, Juan Quintela
Subject: [PATCH v3 06/10] migration: Remember num of ramblocks to sync during recovery
Date: Wed, 4 Oct 2023 18:02:36 -0400
Message-ID: <20231004220240.167175-7-peterx@redhat.com>
In-Reply-To: <20231004220240.167175-1-peterx@redhat.com>
References: <20231004220240.167175-1-peterx@redhat.com>

Instead of only relying on the count of rp_sem, make the counter be part
of RAMState so it can be used in both threads to synchronize on the
process.

rp_sem will be further reused in follow-up patches, as a way to kick the
main thread, e.g., on recovery failures.

Reviewed-by: Fabiano Rosas
Signed-off-by: Peter Xu
Reviewed-by: Juan Quintela
---
 migration/ram.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index c54e071ea3..ef4af3fbce 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -394,6 +394,14 @@ struct RAMState {
     /* Queue of outstanding page requests from the destination */
     QemuMutex src_page_req_mutex;
     QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests;
+
+    /*
+     * This is only used when postcopy is in recovery phase, to communicate
+     * between the migration thread and the return path thread on dirty
+     * bitmap synchronizations. This field is unused in other stages of
+     * RAM migration.
+     */
+    unsigned int postcopy_bmap_sync_requested;
 };
 typedef struct RAMState RAMState;
 
@@ -4121,20 +4129,20 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
 {
     RAMBlock *block;
     QEMUFile *file = s->to_dst_file;
-    int ramblock_count = 0;
 
     trace_ram_dirty_bitmap_sync_start();
 
+    qatomic_set(&rs->postcopy_bmap_sync_requested, 0);
     RAMBLOCK_FOREACH_NOT_IGNORED(block) {
         qemu_savevm_send_recv_bitmap(file, block->idstr);
         trace_ram_dirty_bitmap_request(block->idstr);
-        ramblock_count++;
+        qatomic_inc(&rs->postcopy_bmap_sync_requested);
     }
 
     trace_ram_dirty_bitmap_sync_wait();
 
     /* Wait until all the ramblocks' dirty bitmap synced */
-    while (ramblock_count--) {
+    while (qatomic_read(&rs->postcopy_bmap_sync_requested)) {
         qemu_sem_wait(&s->rp_state.rp_sem);
     }
 
@@ -4161,6 +4169,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
     unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
     uint64_t local_size = DIV_ROUND_UP(nbits, 8);
     uint64_t size, end_mark;
+    RAMState *rs = ram_state;
 
     trace_ram_dirty_bitmap_reload_begin(block->idstr);
 
@@ -4226,6 +4235,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
     /* We'll recalculate migration_dirty_pages in ram_state_resume_prepare(). */
     trace_ram_dirty_bitmap_reload_complete(block->idstr);
 
+    qatomic_dec(&rs->postcopy_bmap_sync_requested);
+
     /*
      * We succeeded to sync bitmap for current ramblock. If this is
      * the last one to sync, we need to notify the main send thread.
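
The counter-plus-semaphore pattern used by the patch above can be sketched standalone. This is a minimal illustration, not QEMU code: `SyncState`, `sync_kick()`, `sync_wait()` and `run_sync()` are hypothetical names, and a mutex/condition pair emulates the semaphore so that the sketch only assumes POSIX threads and C11 atomics.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared state between the waiting and the replying thread. */
typedef struct SyncState {
    atomic_uint requested;  /* requests still waiting for a reply */
    unsigned int total;     /* replies the replier thread will produce */
    unsigned int posts;     /* mutex + cond + posts emulate a semaphore */
    pthread_mutex_t lock;
    pthread_cond_t cond;
} SyncState;

/* Wake the waiter once (semaphore "post"). */
static void sync_kick(SyncState *s)
{
    pthread_mutex_lock(&s->lock);
    s->posts++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Block until at least one kick is pending, then consume it. */
static void sync_wait(SyncState *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->posts == 0) {
        pthread_cond_wait(&s->cond, &s->lock);
    }
    s->posts--;
    pthread_mutex_unlock(&s->lock);
}

/* Replier: drop the counter for every reply, and always kick afterwards. */
static void *replier(void *opaque)
{
    SyncState *s = opaque;

    for (unsigned int i = 0; i < s->total; i++) {
        atomic_fetch_sub(&s->requested, 1);
        sync_kick(s);
    }
    return NULL;
}

/* Issue n requests, wait for all replies, return what is left pending. */
unsigned int run_sync(unsigned int n)
{
    SyncState s;
    pthread_t tid;
    unsigned int left;
    int rc;

    atomic_init(&s.requested, 0);
    s.total = n;
    s.posts = 0;
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cond, NULL);

    for (unsigned int i = 0; i < n; i++) {
        atomic_fetch_add(&s.requested, 1);  /* one per request sent */
    }

    rc = pthread_create(&tid, NULL, replier, &s);
    assert(rc == 0);
    (void)rc;

    /* The shared counter, not the number of wakeups, decides when to stop. */
    while (atomic_load(&s.requested)) {
        sync_wait(&s);
    }

    pthread_join(tid, NULL);
    left = atomic_load(&s.requested);
    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.lock);
    return left;
}
```

Because the loop condition is the shared counter, an extra kick for some other reason (for example an error path) would only cause one more re-check instead of an early exit.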

From patchwork Wed Oct 4 22:02:37 2023
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13409490
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: peterx@redhat.com, Fabiano Rosas, Juan Quintela
Subject: [PATCH v3 07/10] migration: Add migration_rp_wait|kick()
Date: Wed, 4 Oct 2023 18:02:37 -0400
Message-ID: <20231004220240.167175-8-peterx@redhat.com>
In-Reply-To: <20231004220240.167175-1-peterx@redhat.com>
References: <20231004220240.167175-1-peterx@redhat.com>

These are just simple wrappers around rp_sem for wait() and kick(), which
make it clearer how the semaphore is used. They are prepared to be used
for other things, too.
Reviewed-by: Fabiano Rosas Signed-off-by: Peter Xu Reviewed-by: Juan Quintela --- migration/migration.h | 15 +++++++++++++++ migration/migration.c | 14 ++++++++++++-- migration/ram.c | 16 +++++++--------- 3 files changed, 34 insertions(+), 11 deletions(-) diff --git a/migration/migration.h b/migration/migration.h index 33a7831da4..573aa69f19 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -315,6 +315,12 @@ struct MigrationState { * be cleared in the rp_thread! */ bool rp_thread_created; + /* + * Used to synchronize between migration main thread and return + * path thread. The migration thread can wait() on this sem, while + * other threads (e.g., return path thread) can kick it using a + * post(). + */ QemuSemaphore rp_sem; /* * We post to this when we got one PONG from dest. So far it's an @@ -526,4 +532,13 @@ void migration_populate_vfio_info(MigrationInfo *info); void migration_reset_vfio_bytes_transferred(void); void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page); +/* Migration thread waiting for return path thread. */ +void migration_rp_wait(MigrationState *s); +/* + * Kick the migration thread waiting for return path messages. NOTE: the + * name can be slightly confusing (when read as "kick the rp thread"), just + * to remember the target is always the migration thread. 
+ */ +void migration_rp_kick(MigrationState *s); + #endif diff --git a/migration/migration.c b/migration/migration.c index b28b504b4c..1b7ed2d35a 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1749,6 +1749,16 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp) qemu_sem_post(&s->pause_sem); } +void migration_rp_wait(MigrationState *s) +{ + qemu_sem_wait(&s->rp_state.rp_sem); +} + +void migration_rp_kick(MigrationState *s) +{ + qemu_sem_post(&s->rp_state.rp_sem); +} + static struct rp_cmd_args { ssize_t len; /* -1 = variable */ const char *name; @@ -1820,7 +1830,7 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, MIGRATION_STATUS_POSTCOPY_ACTIVE); /* Notify send thread that time to continue send pages */ - qemu_sem_post(&s->rp_state.rp_sem); + migration_rp_kick(s); return 0; } @@ -2447,7 +2457,7 @@ static int postcopy_resume_handshake(MigrationState *s) qemu_savevm_send_postcopy_resume(s->to_dst_file); while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { - qemu_sem_wait(&s->rp_state.rp_sem); + migration_rp_wait(s); } if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { diff --git a/migration/ram.c b/migration/ram.c index ef4af3fbce..43ba62be83 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4143,7 +4143,7 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs) /* Wait until all the ramblocks' dirty bitmap synced */ while (qatomic_read(&rs->postcopy_bmap_sync_requested)) { - qemu_sem_wait(&s->rp_state.rp_sem); + migration_rp_wait(s); } trace_ram_dirty_bitmap_sync_complete(); @@ -4151,11 +4151,6 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs) return 0; } -static void ram_dirty_bitmap_reload_notify(MigrationState *s) -{ - qemu_sem_post(&s->rp_state.rp_sem); -} - /* * Read the received bitmap, revert it as the initial dirty bitmap. 
  * This is only used when the postcopy migration is paused but wants
@@ -4238,10 +4233,13 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
     qatomic_dec(&rs->postcopy_bmap_sync_requested);
 
     /*
-     * We succeeded to sync bitmap for current ramblock. If this is
-     * the last one to sync, we need to notify the main send thread.
+     * We succeeded to sync bitmap for current ramblock. Always kick the
+     * migration thread to check whether all requested bitmaps are
+     * reloaded. NOTE: it's racy to only kick when requested==0, because
+     * we don't know whether the migration thread may still be increasing
+     * it.
      */
-    ram_dirty_bitmap_reload_notify(s);
+    migration_rp_kick(s);
 
     ret = 0;
 out:

From patchwork Wed Oct 4 22:02:38 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13409481
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: peterx@redhat.com, Fabiano Rosas, Juan Quintela, Xiaohui Li
Subject: [PATCH v3 08/10] migration: Allow network to fail even during recovery
Date: Wed, 4 Oct 2023 18:02:38 -0400
Message-ID: <20231004220240.167175-9-peterx@redhat.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231004220240.167175-1-peterx@redhat.com>
References: <20231004220240.167175-1-peterx@redhat.com>

Normally the postcopy recover phase should only exist for a very short
period: it is the window in which QEMU is trying to recover from an
interrupted postcopy migration, during which a handshake is carried out to
continue the procedure, with the state changing from PAUSED -> RECOVER ->
POSTCOPY_ACTIVE again.

The RECOVER phase should be very short; it happens right after the admin
has specified a new, working network link for QEMU to reconnect to the
dest QEMU.

However, the channel can still break within this small RECOVER window.

If that happens, with the current code there is no way for the src QEMU to
get kicked out of the RECOVER stage. There is also no way to retry the
recovery over another channel once one is established.

This patch allows the RECOVER phase itself to fail too. We're mostly
ready, with just some small things missing, e.g. properly kicking the main
migration thread out when it is sleeping on rp_sem and we find that we're
in the RECOVER stage. When this happens, the RECOVER itself fails and the
state rolls back to PAUSED. The user can then retry another round of
recovery.

To make it even stronger, teach the QMP command migrate-pause to explicitly
kick the src/dst QEMU out when needed, so that even if for some reason the
migration thread didn't get kicked out already by a failing return-path
thread, the admin can also kick it out.

This will be an extreme corner case, but still try to cover it.

One can test this with two proxy channels for migration:

  (a) socat unix-listen:/tmp/src.sock,reuseaddr,fork tcp:localhost:10000
  (b) socat tcp-listen:10000,reuseaddr,fork unix:/tmp/dst.sock

So the migration channel will be:

                      (a)          (b)
  src -> /tmp/src.sock -> tcp:10000 -> /tmp/dst.sock -> dst

Then, to make QEMU hang at the RECOVER stage, one can do the following:

  (1) stop the postcopy using the QMP command migrate-pause
  (2) kill the 2nd proxy (b)
  (3) try to recover the postcopy using /tmp/src.sock on src
  (4) src QEMU will go into the RECOVER stage but won't be able to continue
      from there, because the channel is actually broken at (b)

Before this patch, step (4) leaves the src QEMU stuck in the RECOVER stage,
with no way to kick QEMU out or to continue the postcopy again.
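
As an illustration only (this is not QEMU code), the state flow described
above can be sketched as a small transition function: PAUSED -> RECOVER ->
POSTCOPY_ACTIVE on a successful resume, plus the RECOVER -> PAUSED fallback
that this patch is meant to make possible. The SketchState and SketchEvent
types and the sketch_next() function are hypothetical names that exist only
for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the three states named above. */
typedef enum {
    ST_POSTCOPY_ACTIVE,
    ST_POSTCOPY_PAUSED,
    ST_POSTCOPY_RECOVER,
} SketchState;

/* Hypothetical events; the real code reacts to channel errors and QMP. */
typedef enum {
    EV_CHANNEL_BROKEN,   /* network failure, or the admin ran migrate-pause */
    EV_RESUME_REQUESTED, /* the admin supplied a new channel */
    EV_HANDSHAKE_DONE,   /* the resume handshake completed */
} SketchEvent;

/* Same idea as migration_postcopy_is_alive(): both states can be paused. */
static bool sketch_postcopy_is_alive(SketchState s)
{
    return s == ST_POSTCOPY_ACTIVE || s == ST_POSTCOPY_RECOVER;
}

/* Return the state that follows 's' when 'ev' happens. */
static SketchState sketch_next(SketchState s, SketchEvent ev)
{
    switch (ev) {
    case EV_CHANNEL_BROKEN:
        /* With the patch, RECOVER can fall back to PAUSED as well. */
        return sketch_postcopy_is_alive(s) ? ST_POSTCOPY_PAUSED : s;
    case EV_RESUME_REQUESTED:
        return s == ST_POSTCOPY_PAUSED ? ST_POSTCOPY_RECOVER : s;
    case EV_HANDSHAKE_DONE:
        return s == ST_POSTCOPY_RECOVER ? ST_POSTCOPY_ACTIVE : s;
    }
    assert(0 && "unknown event");
    return s;
}
```

Starting from ST_POSTCOPY_ACTIVE and feeding EV_CHANNEL_BROKEN,
EV_RESUME_REQUESTED and EV_CHANNEL_BROKEN in that order ends in
ST_POSTCOPY_PAUSED again, which is the double-failure path; from there a
new resume request followed by a completed handshake leads back to
ST_POSTCOPY_ACTIVE.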

After this patch, (4) will quickly fail the recovery and bounce back to the
PAUSED stage.

The admin can also kick QEMU from (4) into PAUSED using migrate-pause when
needed.

After bouncing back to the PAUSED stage, one can recover again.

Reported-by: Xiaohui Li
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111332
Signed-off-by: Peter Xu
Reviewed-by: Fabiano Rosas
---
 migration/migration.h |  8 ++++--
 migration/migration.c | 63 +++++++++++++++++++++++++++++++++++++++----
 migration/ram.c       |  4 ++-
 3 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 573aa69f19..f985d3dedb 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -492,6 +492,7 @@ int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
+bool migration_postcopy_is_alive(int state);
 MigrationState *migrate_get_current(void);
 
 uint64_t ram_get_total_transferred_pages(void);
@@ -532,8 +533,11 @@ void migration_populate_vfio_info(MigrationInfo *info);
 void migration_reset_vfio_bytes_transferred(void);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
-/* Migration thread waiting for return path thread. */
-void migration_rp_wait(MigrationState *s);
+/*
+ * Migration thread waiting for return path thread. Return non-zero if an
+ * error is detected.
+ */
+int migration_rp_wait(MigrationState *s);
 /*
  * Kick the migration thread waiting for return path messages. NOTE: the
  * name can be slightly confusing (when read as "kick the rp thread"), just
diff --git a/migration/migration.c b/migration/migration.c
index 1b7ed2d35a..1a7f214fcf 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1345,6 +1345,17 @@ bool migration_in_postcopy(void)
     }
 }
 
+bool migration_postcopy_is_alive(int state)
+{
+    switch (state) {
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
+        return true;
+    default:
+        return false;
+    }
+}
+
 bool migration_in_postcopy_after_devices(MigrationState *s)
 {
     return migration_in_postcopy() && s->postcopy_after_devices;
@@ -1552,8 +1563,15 @@ void qmp_migrate_pause(Error **errp)
     MigrationIncomingState *mis = migration_incoming_get_current();
     int ret = 0;
 
-    if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+    if (migration_postcopy_is_alive(ms->state)) {
         /* Source side, during postcopy */
+        Error *error = NULL;
+
+        /* Tell the core migration that we're pausing */
+        error_setg(&error, "Postcopy migration is paused by the user");
+        migrate_set_error(ms, error);
+        error_free(error);
+
         qemu_mutex_lock(&ms->qemu_file_lock);
         if (ms->to_dst_file) {
             ret = qemu_file_shutdown(ms->to_dst_file);
@@ -1562,10 +1580,17 @@ void qmp_migrate_pause(Error **errp)
         if (ret) {
             error_setg(errp, "Failed to pause source migration");
         }
+
+        /*
+         * Kick the migration thread out of any waiting windows (on behalf
+         * of the rp thread).
+         */
+        migration_rp_kick(ms);
+
         return;
     }
 
-    if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+    if (migration_postcopy_is_alive(mis->state)) {
         ret = qemu_file_shutdown(mis->from_src_file);
         if (ret) {
             error_setg(errp, "Failed to pause destination migration");
@@ -1574,7 +1599,7 @@ void qmp_migrate_pause(Error **errp)
     }
 
     error_setg(errp, "migrate-pause is currently only supported "
-               "during postcopy-active state");
+               "during postcopy-active or postcopy-recover state");
 }
 
 bool migration_is_blocked(Error **errp)
@@ -1749,9 +1774,21 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp)
     qemu_sem_post(&s->pause_sem);
 }
 
-void migration_rp_wait(MigrationState *s)
+int migration_rp_wait(MigrationState *s)
 {
+    /* If migration has failure already, ignore the wait */
+    if (migrate_has_error(s)) {
+        return -1;
+    }
+
     qemu_sem_wait(&s->rp_state.rp_sem);
+
+    /* After wait, double check that there's no failure */
+    if (migrate_has_error(s)) {
+        return -1;
+    }
+
+    return 0;
 }
 
 void migration_rp_kick(MigrationState *s)
@@ -2017,6 +2054,20 @@ out:
         trace_source_return_path_thread_bad_end();
     }
 
+    if (ms->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        /*
+         * this will be extremely unlikely: that we got yet another network
+         * issue during recovering of the 1st network failure.. during this
+         * period the main migration thread can be waiting on rp_sem for
+         * this thread to sync with the other side.
+         *
+         * When this happens, explicitly kick the migration thread out of
+         * RECOVER stage and back to PAUSED, so the admin can try
+         * everything again.
+         */
+        migration_rp_kick(ms);
+    }
+
     trace_source_return_path_thread_end();
     rcu_unregister_thread();
     return NULL;
@@ -2457,7 +2508,9 @@ static int postcopy_resume_handshake(MigrationState *s)
     qemu_savevm_send_postcopy_resume(s->to_dst_file);
 
     while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
-        migration_rp_wait(s);
+        if (migration_rp_wait(s)) {
+            return -1;
+        }
     }
 
     if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
diff --git a/migration/ram.c b/migration/ram.c
index 43ba62be83..2565f53f5c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4143,7 +4143,9 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
 
     /* Wait until all the ramblocks' dirty bitmap synced */
     while (qatomic_read(&rs->postcopy_bmap_sync_requested)) {
-        migration_rp_wait(s);
+        if (migration_rp_wait(s)) {
+            return -1;
+        }
     }
 
     trace_ram_dirty_bitmap_sync_complete();

From patchwork Wed Oct 4 22:02:39 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13409482
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: peterx@redhat.com, Fabiano Rosas, Juan Quintela
Subject: [PATCH v3 09/10] migration: Allow RECOVER->PAUSED convertion for dest qemu
Date: Wed, 4 Oct 2023 18:02:39 -0400
Message-ID: <20231004220240.167175-10-peterx@redhat.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231004220240.167175-1-peterx@redhat.com>
References: <20231004220240.167175-1-peterx@redhat.com>

There's a bug on dest: if a double fault is triggered on the dest QEMU (a
network issue during postcopy-recover), we won't set PAUSED correctly,
because we assumed we always came from ACTIVE.

Fix that by always overwriting the state to PAUSED.

We could also check for these two states explicitly, but that may be
overkill. We already do the same on the src QEMU, which unconditionally
switches to PAUSED anyway.

Reviewed-by: Fabiano Rosas
Signed-off-by: Peter Xu
Reviewed-by: Juan Quintela
---
 migration/savevm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 60eec7c31f..497ce02bd7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2734,7 +2734,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
         qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
     }
 
-    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+    /* Current state can be either ACTIVE or RECOVER */
+    migrate_set_state(&mis->state, mis->state,
                       MIGRATION_STATUS_POSTCOPY_PAUSED);
 
     /* Notify the fault thread for the invalidated file handle */

From patchwork Wed Oct 4 22:02:40 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13409491
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: peterx@redhat.com, Fabiano Rosas, Juan Quintela
Subject: [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs during RECOVER
Date: Wed, 4 Oct 2023 18:02:40 -0400
Message-ID: <20231004220240.167175-11-peterx@redhat.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231004220240.167175-1-peterx@redhat.com>
References: <20231004220240.167175-1-peterx@redhat.com>

From: Fabiano Rosas

To do so, create two paired sockets, but make them not provide real data.

Feed those fake sockets to the src/dst QEMUs for recovery, so that they go
into the RECOVER stage and stay there. Test that we can always kick them
out and recover again with the right ports.

This patch is based on Fabiano's version here:

https://lore.kernel.org/r/877cowmdu0.fsf@suse.de

Signed-off-by: Fabiano Rosas
[peterx: write commit message, remove case 1, fix bugs, and more]
Signed-off-by: Peter Xu
Signed-off-by: Peter Xu
---
 tests/qtest/migration-test.c | 94 ++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 46f1c275a2..fb7a3765e4 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -729,6 +729,7 @@ typedef struct {
     /* Postcopy specific fields */
     void *postcopy_data;
     bool postcopy_preempt;
+    bool postcopy_recovery_test_fail;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1381,6 +1382,78 @@ static void test_postcopy_preempt_tls_psk(void)
 }
 #endif
 
+static void wait_for_postcopy_status(QTestState *one, const char *status)
+{
+    wait_for_migration_status(one, status,
+                              (const char * []) { "failed", "active",
+                                                  "completed", NULL });
+}
+
+static void postcopy_recover_fail(QTestState *from, QTestState *to)
+{
+    int ret, pair1[2], pair2[2];
+    char c;
+
+    /* Create two unrelated socketpairs */
+    ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair1);
+    g_assert_cmpint(ret, ==, 0);
+
+    ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair2);
+    g_assert_cmpint(ret, ==, 0);
+
+    /*
+     * Give the guests unpaired ends of the sockets, so they'll all blocked
+     * at reading. This mimics a wrong channel established.
+     */
+    qtest_qmp_fds_assert_success(from, &pair1[0], 1,
+                                 "{ 'execute': 'getfd',"
+                                 " 'arguments': { 'fdname': 'fd-mig' }}");
+    qtest_qmp_fds_assert_success(to, &pair2[0], 1,
+                                 "{ 'execute': 'getfd',"
+                                 " 'arguments': { 'fdname': 'fd-mig' }}");
+
+    /*
+     * Write the 1st byte as QEMU_VM_COMMAND (0x8) for the dest socket, to
+     * emulate the 1st byte of a real recovery, but stops from there to
+     * keep dest QEMU in RECOVER. This is needed so that we can kick off
+     * the recover process on dest QEMU (by triggering the G_IO_IN event).
+     *
+     * NOTE: this trick is not needed on src QEMUs, because src doesn't
+     * rely on an pre-existing G_IO_IN event, so it will always trigger the
+     * upcoming recovery anyway even if it can read nothing.
+     */
+#define QEMU_VM_COMMAND 0x08
+    c = QEMU_VM_COMMAND;
+    ret = send(pair2[1], &c, 1, 0);
+    g_assert_cmpint(ret, ==, 1);
+
+    migrate_recover(to, "fd:fd-mig");
+    migrate_qmp(from, "fd:fd-mig", "{'resume': true}");
+
+    /*
+     * Make sure both QEMU instances will go into RECOVER stage, then test
+     * kicking them out using migrate-pause.
+     */
+    wait_for_postcopy_status(from, "postcopy-recover");
+    wait_for_postcopy_status(to, "postcopy-recover");
+
+    /*
+     * This would be issued by the admin upon noticing the hang, we should
+     * make sure we're able to kick this out.
+     */
+    migrate_pause(from);
+    wait_for_postcopy_status(from, "postcopy-paused");
+
+    /* Do the same test on dest */
+    migrate_pause(to);
+    wait_for_postcopy_status(to, "postcopy-paused");
+
+    close(pair1[0]);
+    close(pair1[1]);
+    close(pair2[0]);
+    close(pair2[1]);
+}
+
 static void test_postcopy_recovery_common(MigrateCommon *args)
 {
     QTestState *from, *to;
@@ -1420,6 +1493,15 @@ static void test_postcopy_recovery_common(MigrateCommon *args)
                               (const char * []) { "failed", "active",
                                                   "completed", NULL });
 
+    if (args->postcopy_recovery_test_fail) {
+        /*
+         * Test when a wrong socket specified for recover, and then the
+         * ability to kick it out, and continue with a correct socket.
+         */
+        postcopy_recover_fail(from, to);
+        /* continue with a good recovery */
+    }
+
     /*
      * Create a new socket to emulate a new channel that is different
      * from the broken migration channel; tell the destination to
@@ -1459,6 +1541,15 @@ static void test_postcopy_recovery_compress(void)
     test_postcopy_recovery_common(&args);
 }
 
+static void test_postcopy_recovery_double_fail(void)
+{
+    MigrateCommon args = {
+        .postcopy_recovery_test_fail = true,
+    };
+
+    test_postcopy_recovery_common(&args);
+}
+
 #ifdef CONFIG_GNUTLS
 static void test_postcopy_recovery_tls_psk(void)
 {
@@ -2841,6 +2932,9 @@ int main(int argc, char **argv)
             qtest_add_func("/migration/postcopy/recovery/compress/plain",
                            test_postcopy_recovery_compress);
         }
+        qtest_add_func("/migration/postcopy/recovery/double-failures",
+                       test_postcopy_recovery_double_fail);
+
     }
 
     qtest_add_func("/migration/bad_dest", test_baddest);
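
The socket trick used by postcopy_recover_fail() above can be tried outside
of QEMU. The following is a minimal POSIX sketch, not part of the patch: it
creates two unrelated socketpairs, treats pair1[0] and pair2[0] as the ends
that would be handed to src and dst, and checks with poll() that neither
end has anything to read until a single byte is written into the far end of
pair2. The helper names fd_is_readable() and sketch_unpaired_sockets() are
hypothetical, and plain socketpair() stands in for qemu_socketpair().

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return true if 'fd' has data that can be read right now (no blocking). */
static bool fd_is_readable(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };

    assert(fd >= 0);
    return poll(&p, 1, 0) == 1 && (p.revents & POLLIN) != 0;
}

/*
 * Return 0 if the unpaired ends behave as the test expects: nothing to
 * read on either end at first, then only the "dst" end becomes readable
 * after one byte is sent to it. Return -1 otherwise.
 */
static int sketch_unpaired_sockets(void)
{
    int pair1[2], pair2[2];
    char c = 0x08; /* the value the test defines as QEMU_VM_COMMAND */
    bool ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair1) != 0) {
        return -1;
    }
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair2) != 0) {
        close(pair1[0]);
        close(pair1[1]);
        return -1;
    }

    /* pair1[0] ("src") and pair2[0] ("dst") are not connected to each other */
    ok = !fd_is_readable(pair1[0]) && !fd_is_readable(pair2[0]);

    /* One byte on the far end of pair2 wakes up the "dst" end only */
    ok = ok && send(pair2[1], &c, 1, 0) == 1;
    ok = ok && fd_is_readable(pair2[0]) && !fd_is_readable(pair1[0]);

    close(pair1[0]);
    close(pair1[1]);
    close(pair2[0]);
    close(pair2[1]);

    return ok ? 0 : -1;
}
```

A reader blocked on either of those ends would wait forever for the rest of
the stream, which is the kind of hang the test relies on to keep both QEMUs
in the RECOVER stage until migrate-pause is issued.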