From patchwork Tue Oct 17 20:26:29 2023
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13426018
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Fabiano Rosas, peterx@redhat.com, Philippe Mathieu-Daudé, Juan Quintela
Subject: [PATCH v4 1/5] migration: Refactor error handling in source return path
Date: Tue, 17 Oct 2023 16:26:29 -0400
Message-ID: <20231017202633.296756-2-peterx@redhat.com>
In-Reply-To: <20231017202633.296756-1-peterx@redhat.com>
References: <20231017202633.296756-1-peterx@redhat.com>

rp_state.error was a boolean used to indicate that an error happened in the
return path thread.  That not only duplicates error reporting
(migrate_set_error), but is also not good enough: we only do error_report()
and set it to true, so we can never keep the exact error around and show it
in query-migrate.

To make this better, a few things are done:

- Use error_setg() rather than error_report() across the whole lifecycle
  of the return path thread, keeping the error in an Error*.

- With the above, there is no need for mark_source_rp_bad(); remove it,
  along with rp_state.error itself.

- Use migrate_set_error() to apply that captured error to the global
  migration object when an error occurred in this thread.

- Do the same when a QEMUFile error is detected on the source return path.

We need to re-export qemu_file_get_error_obj() to do the last one.

Signed-off-by: Peter Xu
Reviewed-by: Fabiano Rosas
Reviewed-by: Juan Quintela
---
 migration/migration.h  |   1 -
 migration/qemu-file.h  |   1 +
 migration/ram.h        |   5 +-
 migration/migration.c  | 121 ++++++++++++++++++-----------------
 migration/qemu-file.c  |   2 +-
 migration/ram.c        |  41 +++++++-------
 migration/trace-events |   4 +-
 7 files changed, 80 insertions(+), 95 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index ae82004892..1d1ac13adb 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -308,7 +308,6 @@ struct MigrationState {
     /* Protected by qemu_file_lock */
     QEMUFile *from_dst_file;
     QemuThread rp_thread;
-    bool error;
     /*
      * We can also check non-zero of rp_thread, but there's no "official"
      * way to do this, so this bool makes it slightly more elegant.
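For readers skimming the series, the following standalone sketch shows the
shape of the error flow this patch moves to: handlers fill an Error object
through an errp out-parameter, the thread loop stops at the first failure,
and a single exit path records the error in the global migration state.
The Error type, set_error() and publish_error() below are simplified
stand-ins for QEMU's Error, error_setg() and migrate_set_error(); this is
an illustration, not the patch code.

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for QEMU's Error object and error_setg(). */
typedef struct Error { char msg[128]; } Error;

static void set_error(Error **errp, const char *msg)
{
    Error *err = calloc(1, sizeof(*err));
    snprintf(err->msg, sizeof(err->msg), "%s", msg);
    *errp = err;
}

/* Stand-in for migrate_set_error(): keep only the first error globally. */
static Error *migration_error;

static void publish_error(Error *err)
{
    if (!migration_error) {
        migration_error = err;
    }
}

/* A message handler fills *errp instead of printing and setting a flag. */
static int handle_message(int msg, Error **errp)
{
    if (msg < 0) {
        set_error(errp, "received invalid message");
        return -1;
    }
    return 0;
}

int main(void)
{
    int stream[] = { 1, 2, -1, 3 };   /* -1 emulates a malformed message */
    Error *err = NULL;

    for (int i = 0; i < 4; i++) {
        if (handle_message(stream[i], &err) < 0) {
            goto out;                 /* single exit path, like the thread */
        }
    }
out:
    if (err) {
        publish_error(err);           /* the one place the error is recorded */
        printf("return path failed: %s\n", migration_error->msg);
    }
    return 0;
}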
diff --git a/migration/qemu-file.h b/migration/qemu-file.h index a29c37b0d0..25a1a8052f 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -87,6 +87,7 @@ int coroutine_mixed_fn qemu_peek_byte(QEMUFile *f, int offset); void qemu_file_skip(QEMUFile *f, int size); int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp); void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err); +int qemu_file_get_error_obj(QEMUFile *f, Error **errp); void qemu_file_set_error(QEMUFile *f, int ret); int qemu_file_shutdown(QEMUFile *f); QEMUFile *qemu_file_get_return_path(QEMUFile *f); diff --git a/migration/ram.h b/migration/ram.h index 145c915ca7..14ed666d58 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -51,7 +51,8 @@ uint64_t ram_bytes_total(void); void mig_throttle_counter_reset(void); uint64_t ram_pagesize_summary(void); -int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len); +int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len, + Error **errp); void ram_postcopy_migrated_memory_release(MigrationState *ms); /* For outgoing discard bitmap */ void ram_postcopy_send_discard_bitmap(MigrationState *ms); @@ -71,7 +72,7 @@ void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr); void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr); int64_t ramblock_recv_bitmap_send(QEMUFile *file, const char *block_name); -int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb); +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp); bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start); void postcopy_preempt_shutdown_file(MigrationState *s); void *postcopy_preempt_thread(void *opaque); diff --git a/migration/migration.c b/migration/migration.c index 6ba5e145ac..3123c4a873 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -99,7 +99,7 @@ static int migration_maybe_pause(MigrationState *s, int *current_active_state, int new_state); static void migrate_fd_cancel(MigrationState *s); -static int close_return_path_on_source(MigrationState *s); +static bool close_return_path_on_source(MigrationState *s); static bool migration_needs_multiple_sockets(void) { @@ -1430,7 +1430,6 @@ int migrate_init(MigrationState *s, Error **errp) s->to_dst_file = NULL; s->state = MIGRATION_STATUS_NONE; s->rp_state.from_dst_file = NULL; - s->rp_state.error = false; s->mbps = 0.0; s->pages_per_second = 0.0; s->downtime = 0; @@ -1755,16 +1754,6 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp) qemu_sem_post(&s->pause_sem); } -/* migration thread support */ -/* - * Something bad happened to the RP stream, mark an error - * The caller shall print or trace something to indicate why - */ -static void mark_source_rp_bad(MigrationState *s) -{ - s->rp_state.error = true; -} - void migration_rp_wait(MigrationState *s) { qemu_sem_wait(&s->rp_state.rp_sem); @@ -1795,8 +1784,9 @@ static struct rp_cmd_args { * We're allowed to send more than requested (e.g. to round to our page size) * and we don't need to send pages that have already been sent. 
*/ -static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname, - ram_addr_t start, size_t len) +static void +migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname, + ram_addr_t start, size_t len, Error **errp) { long our_host_ps = qemu_real_host_page_size(); @@ -1808,37 +1798,36 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname, */ if (!QEMU_IS_ALIGNED(start, our_host_ps) || !QEMU_IS_ALIGNED(len, our_host_ps)) { - error_report("%s: Misaligned page request, start: " RAM_ADDR_FMT - " len: %zd", __func__, start, len); - mark_source_rp_bad(ms); + error_setg(errp, "MIG_RP_MSG_REQ_PAGES: Misaligned page request, start:" + RAM_ADDR_FMT " len: %zd", start, len); return; } - if (ram_save_queue_pages(rbname, start, len)) { - mark_source_rp_bad(ms); - } + ram_save_queue_pages(rbname, start, len, errp); } -static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name) +static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name, + Error **errp) { RAMBlock *block = qemu_ram_block_by_name(block_name); if (!block) { - error_report("%s: invalid block name '%s'", __func__, block_name); + error_setg(errp, "MIG_RP_MSG_RECV_BITMAP has invalid block name '%s'", + block_name); return -EINVAL; } /* Fetch the received bitmap and refresh the dirty bitmap */ - return ram_dirty_bitmap_reload(s, block); + return ram_dirty_bitmap_reload(s, block, errp); } -static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value) +static int migrate_handle_rp_resume_ack(MigrationState *s, + uint32_t value, Error **errp) { trace_source_return_path_thread_resume_ack(value); if (value != MIGRATION_RESUME_ACK_VALUE) { - error_report("%s: illegal resume_ack value %"PRIu32, - __func__, value); + error_setg(errp, "illegal resume_ack value %"PRIu32, value); return -1; } @@ -1897,48 +1886,46 @@ static void *source_return_path_thread(void *opaque) uint32_t tmp32, sibling_error; ram_addr_t start = 0; /* =0 to silence warning */ size_t len = 0, expected_len; + Error *err = NULL; int res; trace_source_return_path_thread_entry(); rcu_register_thread(); - while (!ms->rp_state.error && !qemu_file_get_error(rp) && - migration_is_setup_or_active(ms->state)) { + while (migration_is_setup_or_active(ms->state)) { trace_source_return_path_thread_loop_top(); + header_type = qemu_get_be16(rp); header_len = qemu_get_be16(rp); if (qemu_file_get_error(rp)) { - mark_source_rp_bad(ms); + qemu_file_get_error_obj(rp, &err); goto out; } if (header_type >= MIG_RP_MSG_MAX || header_type == MIG_RP_MSG_INVALID) { - error_report("RP: Received invalid message 0x%04x length 0x%04x", - header_type, header_len); - mark_source_rp_bad(ms); + error_setg(&err, "Received invalid message 0x%04x length 0x%04x", + header_type, header_len); goto out; } if ((rp_cmd_args[header_type].len != -1 && header_len != rp_cmd_args[header_type].len) || header_len > sizeof(buf)) { - error_report("RP: Received '%s' message (0x%04x) with" - "incorrect length %d expecting %zu", - rp_cmd_args[header_type].name, header_type, header_len, - (size_t)rp_cmd_args[header_type].len); - mark_source_rp_bad(ms); + error_setg(&err, "Received '%s' message (0x%04x) with" + "incorrect length %d expecting %zu", + rp_cmd_args[header_type].name, header_type, header_len, + (size_t)rp_cmd_args[header_type].len); goto out; } /* We know we've got a valid header by this point */ res = qemu_get_buffer(rp, buf, header_len); if (res != header_len) { - error_report("RP: Failed reading data for message 0x%04x" - " 
read %d expected %d", - header_type, res, header_len); - mark_source_rp_bad(ms); + error_setg(&err, "Failed reading data for message 0x%04x" + " read %d expected %d", + header_type, res, header_len); goto out; } @@ -1948,8 +1935,7 @@ static void *source_return_path_thread(void *opaque) sibling_error = ldl_be_p(buf); trace_source_return_path_thread_shut(sibling_error); if (sibling_error) { - error_report("RP: Sibling indicated error %d", sibling_error); - mark_source_rp_bad(ms); + error_setg(&err, "Sibling indicated error %d", sibling_error); } /* * We'll let the main thread deal with closing the RP @@ -1967,7 +1953,10 @@ static void *source_return_path_thread(void *opaque) case MIG_RP_MSG_REQ_PAGES: start = ldq_be_p(buf); len = ldl_be_p(buf + 8); - migrate_handle_rp_req_pages(ms, NULL, start, len); + migrate_handle_rp_req_pages(ms, NULL, start, len, &err); + if (err) { + goto out; + } break; case MIG_RP_MSG_REQ_PAGES_ID: @@ -1982,32 +1971,32 @@ static void *source_return_path_thread(void *opaque) expected_len += tmp32; } if (header_len != expected_len) { - error_report("RP: Req_Page_id with length %d expecting %zd", - header_len, expected_len); - mark_source_rp_bad(ms); + error_setg(&err, "Req_Page_id with length %d expecting %zd", + header_len, expected_len); + goto out; + } + migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len, + &err); + if (err) { goto out; } - migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len); break; case MIG_RP_MSG_RECV_BITMAP: if (header_len < 1) { - error_report("%s: missing block name", __func__); - mark_source_rp_bad(ms); + error_setg(&err, "MIG_RP_MSG_RECV_BITMAP missing block name"); goto out; } /* Format: len (1B) + idstr (<255B). This ends the idstr. */ buf[buf[0] + 1] = '\0'; - if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1))) { - mark_source_rp_bad(ms); + if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1), &err)) { goto out; } break; case MIG_RP_MSG_RESUME_ACK: tmp32 = ldl_be_p(buf); - if (migrate_handle_rp_resume_ack(ms, tmp32)) { - mark_source_rp_bad(ms); + if (migrate_handle_rp_resume_ack(ms, tmp32, &err)) { goto out; } break; @@ -2023,13 +2012,15 @@ static void *source_return_path_thread(void *opaque) } out: - if (qemu_file_get_error(rp)) { + if (err) { + migrate_set_error(ms, err); + error_free(err); trace_source_return_path_thread_bad_end(); - mark_source_rp_bad(ms); } trace_source_return_path_thread_end(); rcu_unregister_thread(); + return NULL; } @@ -2051,12 +2042,11 @@ static int open_return_path_on_source(MigrationState *ms) return 0; } -static int close_return_path_on_source(MigrationState *ms) +/* Return true if error detected, or false otherwise */ +static bool close_return_path_on_source(MigrationState *ms) { - int ret; - if (!ms->rp_state.rp_thread_created) { - return 0; + return false; } trace_migration_return_path_end_before(); @@ -2074,18 +2064,13 @@ static int close_return_path_on_source(MigrationState *ms) } } - trace_await_return_path_close_on_source_joining(); qemu_thread_join(&ms->rp_state.rp_thread); ms->rp_state.rp_thread_created = false; - trace_await_return_path_close_on_source_close(); - - ret = ms->rp_state.error; - ms->rp_state.error = false; - migration_release_dst_files(ms); + trace_migration_return_path_end_after(); - trace_migration_return_path_end_after(ret); - return ret; + /* Return path will persist the error in MigrationState when quit */ + return migrate_has_error(ms); } static inline void diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 3fb25148d1..5b8fd8bee1 100644 
--- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -140,7 +140,7 @@ QEMUFile *qemu_file_new_input(QIOChannel *ioc) * * If errp is specified, a verbose error message will be copied over. */ -static int qemu_file_get_error_obj(QEMUFile *f, Error **errp) +int qemu_file_get_error_obj(QEMUFile *f, Error **errp) { if (!f->last_error) { return 0; diff --git a/migration/ram.c b/migration/ram.c index c844151ee9..544c5b63cf 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1955,7 +1955,8 @@ static void migration_page_queue_free(RAMState *rs) * @start: starting address from the start of the RAMBlock * @len: length (in bytes) to send */ -int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len) +int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len, + Error **errp) { RAMBlock *ramblock; RAMState *rs = ram_state; @@ -1972,7 +1973,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len) * Shouldn't happen, we can't reuse the last RAMBlock if * it's the 1st request. */ - error_report("ram_save_queue_pages no previous block"); + error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no previous block"); return -1; } } else { @@ -1980,16 +1981,17 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len) if (!ramblock) { /* We shouldn't be asked for a non-existent RAMBlock */ - error_report("ram_save_queue_pages no block '%s'", rbname); + error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no block '%s'", rbname); return -1; } rs->last_req_rb = ramblock; } trace_ram_save_queue_pages(ramblock->idstr, start, len); if (!offset_in_ramblock(ramblock, start + len - 1)) { - error_report("%s request overrun start=" RAM_ADDR_FMT " len=" - RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT, - __func__, start, len, ramblock->used_length); + error_setg(errp, "MIG_RP_MSG_REQ_PAGES request overrun, " + "start=" RAM_ADDR_FMT " len=" + RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT, + start, len, ramblock->used_length); return -1; } @@ -2021,9 +2023,9 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len) assert(len % page_size == 0); while (len) { if (ram_save_host_page_urgent(pss)) { - error_report("%s: ram_save_host_page_urgent() failed: " - "ramblock=%s, start_addr=0x"RAM_ADDR_FMT, - __func__, ramblock->idstr, start); + error_setg(errp, "ram_save_host_page_urgent() failed: " + "ramblock=%s, start_addr=0x"RAM_ADDR_FMT, + ramblock->idstr, start); ret = -1; break; } @@ -4186,7 +4188,7 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs) * This is only used when the postcopy migration is paused but wants * to resume from a middle point. 
*/ -int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp) { int ret = -EINVAL; /* from_dst_file is always valid because we're within rp_thread */ @@ -4200,8 +4202,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) trace_ram_dirty_bitmap_reload_begin(block->idstr); if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) { - error_report("%s: incorrect state %s", __func__, - MigrationStatus_str(s->state)); + error_setg(errp, "Reload bitmap in incorrect state %s", + MigrationStatus_str(s->state)); return -EINVAL; } @@ -4218,9 +4220,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) /* The size of the bitmap should match with our ramblock */ if (size != local_size) { - error_report("%s: ramblock '%s' bitmap size mismatch " - "(0x%"PRIx64" != 0x%"PRIx64")", __func__, - block->idstr, size, local_size); + error_setg(errp, "ramblock '%s' bitmap size mismatch (0x%"PRIx64 + " != 0x%"PRIx64")", block->idstr, size, local_size); return -EINVAL; } @@ -4229,15 +4230,15 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) ret = qemu_file_get_error(file); if (ret || size != local_size) { - error_report("%s: read bitmap failed for ramblock '%s': %d" - " (size 0x%"PRIx64", got: 0x%"PRIx64")", - __func__, block->idstr, ret, local_size, size); + error_setg(errp, "read bitmap failed for ramblock '%s': %d" + " (size 0x%"PRIx64", got: 0x%"PRIx64")", + block->idstr, ret, local_size, size); return -EIO; } if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) { - error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIx64, - __func__, block->idstr, end_mark); + error_setg(errp, "ramblock '%s' end mark incorrect: 0x%"PRIx64, + block->idstr, end_mark); return -EINVAL; } diff --git a/migration/trace-events b/migration/trace-events index fa9486dffe..d50e8ed87e 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -148,8 +148,6 @@ multifd_tls_outgoing_handshake_complete(void *ioc) "ioc=%p" multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname) "ioc=%p ioctype=%s hostname=%s" # migration.c -await_return_path_close_on_source_close(void) "" -await_return_path_close_on_source_joining(void) "" migrate_set_state(const char *new_state) "new state %s" migrate_fd_cleanup(void) "" migrate_fd_error(const char *error_desc) "error=%s" @@ -166,7 +164,7 @@ migration_completion_postcopy_end_after_complete(void) "" migration_rate_limit_pre(int ms) "%d ms" migration_rate_limit_post(int urgent) "urgent: %d" migration_return_path_end_before(void) "" -migration_return_path_end_after(int rp_error) "%d" +migration_return_path_end_after(void) "" migration_thread_after_loop(void) "" migration_thread_file_err(void) "" migration_thread_setup_complete(void) "" From patchwork Tue Oct 17 20:26:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13426003 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id BCD6DCDB474 for ; Tue, 17 Oct 2023 20:27:03 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qsqeR-0000Wz-TI; Tue, 17 Oct 2023 
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Fabiano Rosas, peterx@redhat.com, Philippe Mathieu-Daudé, Juan Quintela, Xiaohui Li
Subject: [PATCH v4 2/5] migration: Allow network to fail even during recovery
Date: Tue, 17 Oct 2023 16:26:30 -0400
Message-ID: <20231017202633.296756-3-peterx@redhat.com>
In-Reply-To: <20231017202633.296756-1-peterx@redhat.com>
References: <20231017202633.296756-1-peterx@redhat.com>

Normally the postcopy recover phase should only exist for a very short
period: the time during which QEMU is trying to recover from an interrupted
postcopy migration, when a handshake is carried out to continue the
procedure, with state changes from PAUSED -> RECOVER -> POSTCOPY_ACTIVE
again.

Here the RECOVER phase should be very small; it happens right after the
admin specifies a new, working network link for QEMU to reconnect to the
destination QEMU.

However there can still be cases where the channel is broken within this
small RECOVER window.  If that happens, with the current code there is no
way the source QEMU can be kicked out of the RECOVER stage, and no way to
retry the recovery on another channel once one is established.

This patch allows the RECOVER phase to fail itself too - we're mostly
ready, just a few small things are missing, e.g. properly kicking the main
migration thread out when it is sleeping on rp_sem while we are in the
RECOVER stage.  When this happens, the RECOVER itself fails and rolls back
to the PAUSED stage, and the user can then retry another round of recovery.

To make it even stronger, teach the QMP command migrate-pause to explicitly
kick the src/dst QEMU out when needed, so that even if for some reason the
migration thread was not already kicked out by a failing return-path
thread, the admin can still kick it out.  This is a very unlikely corner
case, but cover it anyway.
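The "kick" described above boils down to this: whoever detects the failure
records the error first and then posts the semaphore the migration thread
sleeps on, while the waiter re-checks the error both before and after
waiting.  Below is a standalone sketch of that pattern, with POSIX
semaphores and a plain flag standing in for QEMU's qemu_sem_*() and
migrate_has_error(); it is an illustration, not the patch code.

#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>

static sem_t rp_sem;
static bool has_error;                    /* stand-in for migrate_has_error() */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static bool error_detected(void)
{
    pthread_mutex_lock(&lock);
    bool ret = has_error;
    pthread_mutex_unlock(&lock);
    return ret;
}

/* Like the patched migration_rp_wait(): non-zero means "give up". */
static int rp_wait(void)
{
    if (error_detected()) {
        return -1;                        /* don't even start waiting */
    }
    sem_wait(&rp_sem);
    return error_detected() ? -1 : 0;     /* re-check after being woken */
}

/* Like the rp thread or qmp_migrate_pause(): record the error, then kick. */
static void *fail_and_kick(void *arg)
{
    pthread_mutex_lock(&lock);
    has_error = true;
    pthread_mutex_unlock(&lock);
    sem_post(&rp_sem);
    return NULL;
}

int main(void)
{
    pthread_t t;

    sem_init(&rp_sem, 0, 0);
    pthread_create(&t, NULL, fail_and_kick, NULL);
    printf("rp_wait() -> %d\n", rp_wait());   /* prints -1: kicked out */
    pthread_join(t, NULL);
    return 0;
}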
One can try to test this with two proxy channels for migration:

  (a) socat unix-listen:/tmp/src.sock,reuseaddr,fork tcp:localhost:10000
  (b) socat tcp-listen:10000,reuseaddr,fork unix:/tmp/dst.sock

So the migration channel will be:

        (a)                   (b)
  src -> /tmp/src.sock -> tcp:10000 -> /tmp/dst.sock -> dst

Then to make QEMU hang at the RECOVER stage, one can do the following:

  (1) stop the postcopy using the QMP command postcopy-pause
  (2) kill the 2nd proxy (b)
  (3) try to recover the postcopy using /tmp/src.sock on src
  (4) src QEMU will go into RECOVER stage but won't be able to continue
      from there, because the channel is actually broken at (b)

Before this patch, step (4) will leave the src QEMU stuck in the RECOVER
stage, with no way to kick QEMU out or continue the postcopy again.  After
this patch, (4) will quickly fail QEMU and bounce back to the PAUSED stage.
The admin can also kick QEMU from (4) into PAUSED when needed, using
migrate-pause.

After bouncing back to the PAUSED stage, one can recover again.

Reported-by: Xiaohui Li
Reviewed-by: Fabiano Rosas
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111332
Signed-off-by: Peter Xu
Reviewed-by: Juan Quintela
---
 migration/migration.h |  8 ++++--
 migration/migration.c | 63 +++++++++++++++++++++++++++++++++++++++----
 migration/ram.c       |  4 ++-
 3 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 1d1ac13adb..0b695bbfb1 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -494,6 +494,7 @@ int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
+bool migration_postcopy_is_alive(int state);
 MigrationState *migrate_get_current(void);
 uint64_t ram_get_total_transferred_pages(void);
@@ -534,8 +535,11 @@ void migration_populate_vfio_info(MigrationInfo *info);
 void migration_reset_vfio_bytes_transferred(void);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
-/* Migration thread waiting for return path thread. */
-void migration_rp_wait(MigrationState *s);
+/*
+ * Migration thread waiting for return path thread. Return non-zero if an
+ * error is detected.
+ */
+int migration_rp_wait(MigrationState *s);
 /*
  * Kick the migration thread waiting for return path messages.
NOTE: the * name can be slightly confusing (when read as "kick the rp thread"), just diff --git a/migration/migration.c b/migration/migration.c index 3123c4a873..0661dad953 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1348,6 +1348,17 @@ bool migration_in_postcopy(void) } } +bool migration_postcopy_is_alive(int state) +{ + switch (state) { + case MIGRATION_STATUS_POSTCOPY_ACTIVE: + case MIGRATION_STATUS_POSTCOPY_RECOVER: + return true; + default: + return false; + } +} + bool migration_in_postcopy_after_devices(MigrationState *s) { return migration_in_postcopy() && s->postcopy_after_devices; @@ -1557,8 +1568,15 @@ void qmp_migrate_pause(Error **errp) MigrationIncomingState *mis = migration_incoming_get_current(); int ret = 0; - if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { + if (migration_postcopy_is_alive(ms->state)) { /* Source side, during postcopy */ + Error *error = NULL; + + /* Tell the core migration that we're pausing */ + error_setg(&error, "Postcopy migration is paused by the user"); + migrate_set_error(ms, error); + error_free(error); + qemu_mutex_lock(&ms->qemu_file_lock); if (ms->to_dst_file) { ret = qemu_file_shutdown(ms->to_dst_file); @@ -1567,10 +1585,17 @@ void qmp_migrate_pause(Error **errp) if (ret) { error_setg(errp, "Failed to pause source migration"); } + + /* + * Kick the migration thread out of any waiting windows (on behalf + * of the rp thread). + */ + migration_rp_kick(ms); + return; } - if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { + if (migration_postcopy_is_alive(mis->state)) { ret = qemu_file_shutdown(mis->from_src_file); if (ret) { error_setg(errp, "Failed to pause destination migration"); @@ -1579,7 +1604,7 @@ void qmp_migrate_pause(Error **errp) } error_setg(errp, "migrate-pause is currently only supported " - "during postcopy-active state"); + "during postcopy-active or postcopy-recover state"); } bool migration_is_blocked(Error **errp) @@ -1754,9 +1779,21 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp) qemu_sem_post(&s->pause_sem); } -void migration_rp_wait(MigrationState *s) +int migration_rp_wait(MigrationState *s) { + /* If migration has failure already, ignore the wait */ + if (migrate_has_error(s)) { + return -1; + } + qemu_sem_wait(&s->rp_state.rp_sem); + + /* After wait, double check that there's no failure */ + if (migrate_has_error(s)) { + return -1; + } + + return 0; } void migration_rp_kick(MigrationState *s) @@ -2018,6 +2055,20 @@ out: trace_source_return_path_thread_bad_end(); } + if (ms->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { + /* + * this will be extremely unlikely: that we got yet another network + * issue during recovering of the 1st network failure.. during this + * period the main migration thread can be waiting on rp_sem for + * this thread to sync with the other side. + * + * When this happens, explicitly kick the migration thread out of + * RECOVER stage and back to PAUSED, so the admin can try + * everything again. 
+ */ + migration_rp_kick(ms); + } + trace_source_return_path_thread_end(); rcu_unregister_thread(); @@ -2482,7 +2533,9 @@ static int postcopy_resume_handshake(MigrationState *s) qemu_savevm_send_postcopy_resume(s->to_dst_file); while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { - migration_rp_wait(s); + if (migration_rp_wait(s)) { + return -1; + } } if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { diff --git a/migration/ram.c b/migration/ram.c index 544c5b63cf..973e872284 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4175,7 +4175,9 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs) /* Wait until all the ramblocks' dirty bitmap synced */ while (qatomic_read(&rs->postcopy_bmap_sync_requested)) { - migration_rp_wait(s); + if (migration_rp_wait(s)) { + return -1; + } } trace_ram_dirty_bitmap_sync_complete(); From patchwork Tue Oct 17 20:26:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13426012 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E6F4ACDB483 for ; Tue, 17 Oct 2023 20:28:10 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qsqeO-0000Og-1I; Tue, 17 Oct 2023 16:26:48 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qsqeM-0000Jj-DI for qemu-devel@nongnu.org; Tue, 17 Oct 2023 16:26:46 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qsqeK-0000lS-FE for qemu-devel@nongnu.org; Tue, 17 Oct 2023 16:26:46 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1697574403; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fDK+2dORsbSKQM++BPpMxFWX+JtgmThkw7TH+ydtup0=; b=V+nioVTKfrhQvUxNc/fkIRrtLCB6ZsW74t2octUtZNLu+HV1SwVgXBYF7m3Aqosq8F+4zA VIZkfyrxyw3CGiB0C5go3pBBR3OWaD9evkYyG6Yqa5/SOTa5x5FGS2selyBx0SdVuscMxq mhBe+UsdAyINKkRHlAKppPf+jOR/BjY= Received: from mail-oi1-f198.google.com (mail-oi1-f198.google.com [209.85.167.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-160-3JsdwfttMyKNOONIaHd6oA-1; Tue, 17 Oct 2023 16:26:39 -0400 X-MC-Unique: 3JsdwfttMyKNOONIaHd6oA-1 Received: by mail-oi1-f198.google.com with SMTP id 5614622812f47-3ae112530b7so1900707b6e.0 for ; Tue, 17 Oct 2023 13:26:39 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697574399; x=1698179199; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fDK+2dORsbSKQM++BPpMxFWX+JtgmThkw7TH+ydtup0=; b=xAkZF+FUzHst6iHVerGTGduKAXzAu7jZ9q8X0FVAKVLVBvtMx48hz9cU7n1P/5OrSx p3h+VDpeLwsk1eI4ufBfYRdXlfs+8XvU0ou+tmAFpXSTd87QPYg2tpNk+tnhVOarrNUY 
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Fabiano Rosas, peterx@redhat.com, Philippe Mathieu-Daudé, Juan Quintela
Subject: [PATCH v4 3/5] tests/migration-test: Add a test for postcopy hangs during RECOVER
Date: Tue, 17 Oct 2023 16:26:31 -0400
Message-ID: <20231017202633.296756-4-peterx@redhat.com>
In-Reply-To: <20231017202633.296756-1-peterx@redhat.com>
References: <20231017202633.296756-1-peterx@redhat.com>

From: Fabiano Rosas

To do so, create two paired sockets, but make them provide no real data.
Feed those fake sockets to the src/dst QEMUs for recovery, to let them go
into the RECOVER stage without ever leaving it.  Test that we can always
kick them out and recover again with the right ports.
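The fixture the test relies on can be pictured with the standalone sketch
below: two unrelated socketpairs whose "QEMU" ends are never
cross-connected (so a reader blocks forever), plus a single QEMU_VM_COMMAND
byte written to the destination side so its pre-existing G_IO_IN watch
fires.  Plain socketpair() and poll() stand in for qemu_socketpair() and
for the QEMUs blocking on the fds; this is an illustration, not the test
code itself.

#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define QEMU_VM_COMMAND 0x08

int main(void)
{
    int pair1[2], pair2[2];     /* pair1[0] -> "src", pair2[0] -> "dst" */

    if (socketpair(PF_LOCAL, SOCK_STREAM, 0, pair1) < 0 ||
        socketpair(PF_LOCAL, SOCK_STREAM, 0, pair2) < 0) {
        perror("socketpair");
        return 1;
    }

    /* Only the destination gets one byte, enough to trigger G_IO_IN. */
    char c = QEMU_VM_COMMAND;
    if (write(pair2[1], &c, 1) != 1) {
        perror("write");
        return 1;
    }

    struct pollfd fds[2] = {
        { .fd = pair1[0], .events = POLLIN },  /* src end: never readable */
        { .fd = pair2[0], .events = POLLIN },  /* dst end: readable once  */
    };
    poll(fds, 2, 100 /* ms */);

    printf("src readable: %d, dst readable: %d\n",
           !!(fds[0].revents & POLLIN), !!(fds[1].revents & POLLIN));
    /* Prints "src readable: 0, dst readable: 1". */
    return 0;
}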
This patch is based on Fabiano's version here: https://lore.kernel.org/r/877cowmdu0.fsf@suse.de Signed-off-by: Fabiano Rosas [peterx: write commit message, remove case 1, fix bugs, and more] Signed-off-by: Peter Xu Signed-off-by: Fabiano Rosas --- tests/qtest/migration-test.c | 102 ++++++++++++++++++++++++++++++++--- 1 file changed, 96 insertions(+), 6 deletions(-) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index e1c110537b..e81d6de000 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -726,6 +726,7 @@ typedef struct { /* Postcopy specific fields */ void *postcopy_data; bool postcopy_preempt; + bool postcopy_recovery_test_fail; } MigrateCommon; static int test_migrate_start(QTestState **from, QTestState **to, @@ -1379,6 +1380,78 @@ static void test_postcopy_preempt_tls_psk(void) } #endif +static void wait_for_postcopy_status(QTestState *one, const char *status) +{ + wait_for_migration_status(one, status, + (const char * []) { "failed", "active", + "completed", NULL }); +} + +static void postcopy_recover_fail(QTestState *from, QTestState *to) +{ + int ret, pair1[2], pair2[2]; + char c; + + /* Create two unrelated socketpairs */ + ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair1); + g_assert_cmpint(ret, ==, 0); + + ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair2); + g_assert_cmpint(ret, ==, 0); + + /* + * Give the guests unpaired ends of the sockets, so they'll all blocked + * at reading. This mimics a wrong channel established. + */ + qtest_qmp_fds_assert_success(from, &pair1[0], 1, + "{ 'execute': 'getfd'," + " 'arguments': { 'fdname': 'fd-mig' }}"); + qtest_qmp_fds_assert_success(to, &pair2[0], 1, + "{ 'execute': 'getfd'," + " 'arguments': { 'fdname': 'fd-mig' }}"); + + /* + * Write the 1st byte as QEMU_VM_COMMAND (0x8) for the dest socket, to + * emulate the 1st byte of a real recovery, but stops from there to + * keep dest QEMU in RECOVER. This is needed so that we can kick off + * the recover process on dest QEMU (by triggering the G_IO_IN event). + * + * NOTE: this trick is not needed on src QEMUs, because src doesn't + * rely on an pre-existing G_IO_IN event, so it will always trigger the + * upcoming recovery anyway even if it can read nothing. + */ +#define QEMU_VM_COMMAND 0x08 + c = QEMU_VM_COMMAND; + ret = send(pair2[1], &c, 1, 0); + g_assert_cmpint(ret, ==, 1); + + migrate_recover(to, "fd:fd-mig"); + migrate_qmp(from, "fd:fd-mig", "{'resume': true}"); + + /* + * Make sure both QEMU instances will go into RECOVER stage, then test + * kicking them out using migrate-pause. + */ + wait_for_postcopy_status(from, "postcopy-recover"); + wait_for_postcopy_status(to, "postcopy-recover"); + + /* + * This would be issued by the admin upon noticing the hang, we should + * make sure we're able to kick this out. 
+ */ + migrate_pause(from); + wait_for_postcopy_status(from, "postcopy-paused"); + + /* Do the same test on dest */ + migrate_pause(to); + wait_for_postcopy_status(to, "postcopy-paused"); + + close(pair1[0]); + close(pair1[1]); + close(pair2[0]); + close(pair2[1]); +} + static void test_postcopy_recovery_common(MigrateCommon *args) { QTestState *from, *to; @@ -1414,9 +1487,17 @@ static void test_postcopy_recovery_common(MigrateCommon *args) * migrate-recover command can only succeed if destination machine * is in the paused state */ - wait_for_migration_status(to, "postcopy-paused", - (const char * []) { "failed", "active", - "completed", NULL }); + wait_for_postcopy_status(to, "postcopy-paused"); + wait_for_postcopy_status(from, "postcopy-paused"); + + if (args->postcopy_recovery_test_fail) { + /* + * Test when a wrong socket specified for recover, and then the + * ability to kick it out, and continue with a correct socket. + */ + postcopy_recover_fail(from, to); + /* continue with a good recovery */ + } /* * Create a new socket to emulate a new channel that is different @@ -1430,9 +1511,6 @@ static void test_postcopy_recovery_common(MigrateCommon *args) * Try to rebuild the migration channel using the resume flag and * the newly created channel */ - wait_for_migration_status(from, "postcopy-paused", - (const char * []) { "failed", "active", - "completed", NULL }); migrate_qmp(from, uri, "{'resume': true}"); /* Restore the postcopy bandwidth to unlimited */ @@ -1457,6 +1535,15 @@ static void test_postcopy_recovery_compress(void) test_postcopy_recovery_common(&args); } +static void test_postcopy_recovery_double_fail(void) +{ + MigrateCommon args = { + .postcopy_recovery_test_fail = true, + }; + + test_postcopy_recovery_common(&args); +} + #ifdef CONFIG_GNUTLS static void test_postcopy_recovery_tls_psk(void) { @@ -3030,6 +3117,9 @@ int main(int argc, char **argv) qtest_add_func("/migration/postcopy/recovery/compress/plain", test_postcopy_recovery_compress); } + qtest_add_func("/migration/postcopy/recovery/double-failures", + test_postcopy_recovery_double_fail); + } qtest_add_func("/migration/bad_dest", test_baddest); From patchwork Tue Oct 17 20:26:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13426005 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 50591CDB474 for ; Tue, 17 Oct 2023 20:27:22 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qsqeO-0000Qt-Oy; Tue, 17 Oct 2023 16:26:48 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qsqeM-0000IQ-6o for qemu-devel@nongnu.org; Tue, 17 Oct 2023 16:26:46 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qsqeK-0000lT-E4 for qemu-devel@nongnu.org; Tue, 17 Oct 2023 16:26:45 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1697574403; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: 
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Fabiano Rosas, peterx@redhat.com, Philippe Mathieu-Daudé, Juan Quintela
Subject: [PATCH v4 4/5] migration: Change ram_dirty_bitmap_reload() retval to bool
Date: Tue, 17 Oct 2023 16:26:32 -0400
Message-ID: <20231017202633.296756-5-peterx@redhat.com>
In-Reply-To: <20231017202633.296756-1-peterx@redhat.com>
References: <20231017202633.296756-1-peterx@redhat.com>

Now we have an Error** passed into the return path thread stack, which is
even clearer than an int retval.  Change ram_dirty_bitmap_reload() and its
callers to use a bool instead of errno values.

Suggested-by: Philippe Mathieu-Daudé
Signed-off-by: Peter Xu
Reviewed-by: Fabiano Rosas
Reviewed-by: Juan Quintela
---
 migration/ram.h       |  2 +-
 migration/migration.c | 18 +++++++++---------
 migration/ram.c       | 24 ++++++++++++------------
 3 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/migration/ram.h b/migration/ram.h
index 14ed666d58..af0290f8ab 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -72,7 +72,7 @@ void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
 int64_t ramblock_recv_bitmap_send(QEMUFile *file, const char *block_name);
-int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
+bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
 void postcopy_preempt_shutdown_file(MigrationState *s);
 void *postcopy_preempt_thread(void *opaque);
diff --git a/migration/migration.c b/migration/migration.c
index 0661dad953..dfb8b48dcb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1843,29 +1843,29 @@ migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
     ram_save_queue_pages(rbname, start, len, errp);
 }
-static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name,
-                                         Error **errp)
+static bool migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name,
+                                          Error **errp)
 {
     RAMBlock *block = qemu_ram_block_by_name(block_name);
     if (!block) {
         error_setg(errp, "MIG_RP_MSG_RECV_BITMAP has invalid block name '%s'",
                    block_name);
-        return -EINVAL;
+        return false;
     }
     /* Fetch the received bitmap and refresh the dirty bitmap */
     return ram_dirty_bitmap_reload(s, block, errp);
 }
-static int migrate_handle_rp_resume_ack(MigrationState *s,
-                                        uint32_t value, Error **errp)
+static
bool migrate_handle_rp_resume_ack(MigrationState *s, + uint32_t value, Error **errp) { trace_source_return_path_thread_resume_ack(value); if (value != MIGRATION_RESUME_ACK_VALUE) { error_setg(errp, "illegal resume_ack value %"PRIu32, value); - return -1; + return false; } /* Now both sides are active. */ @@ -1875,7 +1875,7 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, /* Notify send thread that time to continue send pages */ migration_rp_kick(s); - return 0; + return true; } /* @@ -2026,14 +2026,14 @@ static void *source_return_path_thread(void *opaque) } /* Format: len (1B) + idstr (<255B). This ends the idstr. */ buf[buf[0] + 1] = '\0'; - if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1), &err)) { + if (!migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1), &err)) { goto out; } break; case MIG_RP_MSG_RESUME_ACK: tmp32 = ldl_be_p(buf); - if (migrate_handle_rp_resume_ack(ms, tmp32, &err)) { + if (!migrate_handle_rp_resume_ack(ms, tmp32, &err)) { goto out; } break; diff --git a/migration/ram.c b/migration/ram.c index 973e872284..ca77444e18 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4189,10 +4189,11 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs) * Read the received bitmap, revert it as the initial dirty bitmap. * This is only used when the postcopy migration is paused but wants * to resume from a middle point. + * + * Returns true if succeeded, false for errors. */ -int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp) +bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp) { - int ret = -EINVAL; /* from_dst_file is always valid because we're within rp_thread */ QEMUFile *file = s->rp_state.from_dst_file; g_autofree unsigned long *le_bitmap = NULL; @@ -4206,7 +4207,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp) if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) { error_setg(errp, "Reload bitmap in incorrect state %s", MigrationStatus_str(s->state)); - return -EINVAL; + return false; } /* @@ -4224,24 +4225,23 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp) if (size != local_size) { error_setg(errp, "ramblock '%s' bitmap size mismatch (0x%"PRIx64 " != 0x%"PRIx64")", block->idstr, size, local_size); - return -EINVAL; + return false; } size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size); end_mark = qemu_get_be64(file); - ret = qemu_file_get_error(file); - if (ret || size != local_size) { - error_setg(errp, "read bitmap failed for ramblock '%s': %d" - " (size 0x%"PRIx64", got: 0x%"PRIx64")", - block->idstr, ret, local_size, size); - return -EIO; + if (qemu_file_get_error(file) || size != local_size) { + error_setg(errp, "read bitmap failed for ramblock '%s': " + "(size 0x%"PRIx64", got: 0x%"PRIx64")", + block->idstr, local_size, size); + return false; } if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) { error_setg(errp, "ramblock '%s' end mark incorrect: 0x%"PRIx64, block->idstr, end_mark); - return -EINVAL; + return false; } /* @@ -4273,7 +4273,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp) */ migration_rp_kick(s); - return 0; + return true; } static int ram_resume_prepare(MigrationState *s, void *opaque) From patchwork Tue Oct 17 20:26:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13426017 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
From patchwork Tue Oct 17 20:26:33 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13426017
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Fabiano Rosas, peterx@redhat.com, Philippe Mathieu-Daudé, Juan Quintela
Subject: [PATCH v4 5/5] migration: Change ram_save_queue_pages() retval to bool
Date: Tue, 17 Oct 2023 16:26:33 -0400
Message-ID: <20231017202633.296756-6-peterx@redhat.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231017202633.296756-1-peterx@redhat.com>
References: <20231017202633.296756-1-peterx@redhat.com>

After we have errp which contains the more detailed error message, make
ram_save_queue_pages() return a bool.

Signed-off-by: Peter Xu
Reviewed-by: Fabiano Rosas
---
 migration/ram.h       |  4 ++--
 migration/migration.c | 16 ++++++++--------
 migration/ram.c       | 18 +++++++++---------
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/migration/ram.h b/migration/ram.h
index af0290f8ab..e22a6b0d1c 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -51,8 +51,8 @@ uint64_t ram_bytes_total(void);
 void mig_throttle_counter_reset(void);
 uint64_t ram_pagesize_summary(void);
-int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
-                         Error **errp);
+bool ram_save_queue_pages(const char *rbname, ram_addr_t start,
+                          ram_addr_t len, Error **errp);
 void ram_postcopy_migrated_memory_release(MigrationState *ms);
 /* For outgoing discard bitmap */
 void ram_postcopy_send_discard_bitmap(MigrationState *ms);
diff --git a/migration/migration.c b/migration/migration.c
index dfb8b48dcb..50bf8422c7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1820,8 +1820,10 @@ static struct rp_cmd_args {
  * Process a request for pages received on the return path,
  * We're allowed to send more than requested (e.g. to round to our page size)
  * and we don't need to send pages that have already been sent.
+ *
+ * Returns true if succeed, false otherwise.
  */
-static void
+static bool
 migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
                             ram_addr_t start, size_t len, Error **errp)
 {
@@ -1837,10 +1839,10 @@ migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
         !QEMU_IS_ALIGNED(len, our_host_ps)) {
         error_setg(errp, "MIG_RP_MSG_REQ_PAGES: Misaligned page request, start:"
                    RAM_ADDR_FMT " len: %zd", start, len);
-        return;
+        return false;
     }
 
-    ram_save_queue_pages(rbname, start, len, errp);
+    return ram_save_queue_pages(rbname, start, len, errp);
 }
 
 static bool migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name,
@@ -1990,8 +1992,7 @@ static void *source_return_path_thread(void *opaque)
         case MIG_RP_MSG_REQ_PAGES:
             start = ldq_be_p(buf);
             len = ldl_be_p(buf + 8);
-            migrate_handle_rp_req_pages(ms, NULL, start, len, &err);
-            if (err) {
+            if (!migrate_handle_rp_req_pages(ms, NULL, start, len, &err)) {
                 goto out;
             }
             break;
@@ -2012,9 +2013,8 @@ static void *source_return_path_thread(void *opaque)
                            header_len, expected_len);
                 goto out;
             }
-            migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len,
-                                        &err);
-            if (err) {
+            if (!migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len,
+                                             &err)) {
                 goto out;
             }
             break;
diff --git a/migration/ram.c b/migration/ram.c
index ca77444e18..aca9ae5846 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1948,15 +1948,15 @@ static void migration_page_queue_free(RAMState *rs)
  *
  * A request from postcopy destination for example.
  *
- * Returns zero on success or negative on error
+ * Returns true on success or false on error (detailed error put in @errp)
  *
  * @rbname: Name of the RAMBLock of the request. NULL means the
  *          same that last one.
  * @start: starting address from the start of the RAMBlock
  * @len: length (in bytes) to send
  */
-int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
-                         Error **errp)
+bool ram_save_queue_pages(const char *rbname, ram_addr_t start,
+                          ram_addr_t len, Error **errp)
 {
     RAMBlock *ramblock;
     RAMState *rs = ram_state;
@@ -1974,7 +1974,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
              * it's the 1st request.
              */
             error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no previous block");
-            return -1;
+            return false;
         }
     } else {
         ramblock = qemu_ram_block_by_name(rbname);
@@ -1982,7 +1982,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
         if (!ramblock) {
             /* We shouldn't be asked for a non-existent RAMBlock */
             error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no block '%s'", rbname);
-            return -1;
+            return false;
         }
         rs->last_req_rb = ramblock;
     }
@@ -1992,7 +1992,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
                    "start=" RAM_ADDR_FMT " len=" RAM_ADDR_FMT
                    " blocklen=" RAM_ADDR_FMT, start, len,
                    ramblock->used_length);
-        return -1;
+        return false;
     }
 
     /*
@@ -2003,7 +2003,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
         ram_addr_t page_start = start >> TARGET_PAGE_BITS;
         size_t page_size = qemu_ram_pagesize(ramblock);
         PageSearchStatus *pss = &ram_state->pss[RAM_CHANNEL_POSTCOPY];
-        int ret = 0;
+        bool ret = true;
 
         qemu_mutex_lock(&rs->bitmap_mutex);
 
@@ -2026,7 +2026,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
                 error_setg(errp, "ram_save_host_page_urgent() failed: "
                            "ramblock=%s, start_addr=0x"RAM_ADDR_FMT,
                            ramblock->idstr, start);
-                ret = -1;
+                ret = false;
                 break;
             }
             /*
@@ -2057,7 +2057,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
     migration_make_urgent_request();
     qemu_mutex_unlock(&rs->src_page_req_mutex);
 
-    return 0;
+    return true;
 }
 
 static bool save_page_use_compression(RAMState *rs)
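
[A usage note, illustrative rather than part of the patch: with the new signature a caller can propagate the detailed reason upward instead of translating error codes. request_urgent_page() below is a made-up wrapper, assuming the declarations from migration/ram.h and qapi/error.h.]

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "ram.h"   /* migration/ram.h: ram_save_queue_pages() */

/* Hypothetical wrapper: queue one urgent postcopy page request. */
static bool request_urgent_page(const char *rbname, ram_addr_t start,
                                ram_addr_t len, Error **errp)
{
    if (!ram_save_queue_pages(rbname, start, len, errp)) {
        /* *errp already carries the detail; just add context and fail. */
        error_prepend(errp, "failed to queue urgent page request: ");
        return false;
    }
    return true;
}
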