From patchwork Thu Feb 8 03:04:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549238 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 08F55C4829B for ; Thu, 8 Feb 2024 03:06:48 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXujT-0005xY-OV; Wed, 07 Feb 2024 22:05:50 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujN-0005xG-FN for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:05:41 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujL-0004IH-Jm for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:05:41 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361538; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=a/kk9ED0hp2bNq45sK/1IjSd9mSuqonRqlMm1eHzFT0=; b=YcWvus429cWOWOBlEdVg7n9h2hkIkyudcvBxgd4Ef15/8qJRD2pTna8333eW5AJcTpb+1A rj6HwYW5WsmPVTgAXEd35uRPyBahUfbEjAiXLFhWK/43d+G+C20qr4bh/SYy/1kaEu09mG v8XEGKyv5YzKEhd4x6nZNlR6rU/IAuw= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-677-cWyNToBZObyLGfxmDb0-Tw-1; Wed, 07 Feb 2024 22:05:37 -0500 X-MC-Unique: cWyNToBZObyLGfxmDb0-Tw-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7B81D3C0F189; Thu, 8 Feb 2024 03:05:36 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 20A35492BC7; Thu, 8 Feb 2024 03:05:33 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas , William Roche Subject: [PULL 01/34] migration: prevent migration when VM has poisoned memory Date: Thu, 8 Feb 2024 11:04:55 +0800 Message-ID: <20240208030528.368214-2-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: William Roche A memory page poisoned from the hypervisor level is no longer readable. The migration of a VM will crash Qemu when it tries to read the memory address space and stumbles on the poisoned page with a similar stack trace: Program terminated with signal SIGBUS, Bus error. #0 _mm256_loadu_si256 #1 buffer_zero_avx2 #2 select_accel_fn #3 buffer_is_zero #4 save_zero_page #5 ram_save_target_page_legacy #6 ram_save_host_page #7 ram_find_and_save_block #8 ram_save_iterate #9 qemu_savevm_state_iterate #10 migration_iteration_run #11 migration_thread #12 qemu_thread_start To avoid this VM crash during the migration, prevent the migration when a known hardware poison exists on the VM. Signed-off-by: William Roche Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com Signed-off-by: Peter Xu --- include/sysemu/kvm.h | 6 ++++++ accel/kvm/kvm-all.c | 10 ++++++++++ accel/stubs/kvm-stub.c | 5 +++++ migration/migration.c | 7 +++++++ 4 files changed, 28 insertions(+) diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index d614878164..fad9a7e8ff 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -538,4 +538,10 @@ bool kvm_arch_cpu_check_are_resettable(void); bool kvm_dirty_ring_enabled(void); uint32_t kvm_dirty_ring_size(void); + +/** + * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page + * reported for the VM. + */ +bool kvm_hwpoisoned_mem(void); #endif diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 49e755ec4a..a8cecd040e 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1119,6 +1119,11 @@ int kvm_vm_check_extension(KVMState *s, unsigned int extension) return ret; } +/* + * We track the poisoned pages to be able to: + * - replace them on VM reset + * - block a migration for a VM with a poisoned page + */ typedef struct HWPoisonPage { ram_addr_t ram_addr; QLIST_ENTRY(HWPoisonPage) list; @@ -1152,6 +1157,11 @@ void kvm_hwpoison_page_add(ram_addr_t ram_addr) QLIST_INSERT_HEAD(&hwpoison_page_list, page, list); } +bool kvm_hwpoisoned_mem(void) +{ + return !QLIST_EMPTY(&hwpoison_page_list); +} + static uint32_t adjust_ioeventfd_endianness(uint32_t val, uint32_t size) { #if HOST_BIG_ENDIAN != TARGET_BIG_ENDIAN diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c index 1b37d9a302..ca38172884 100644 --- a/accel/stubs/kvm-stub.c +++ b/accel/stubs/kvm-stub.c @@ -124,3 +124,8 @@ uint32_t kvm_dirty_ring_size(void) { return 0; } + +bool kvm_hwpoisoned_mem(void) +{ + return false; +} diff --git a/migration/migration.c b/migration/migration.c index d5f705ceef..b574e66f7b 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -67,6 +67,7 @@ #include "options.h" #include "sysemu/dirtylimit.h" #include "qemu/sockets.h" +#include "sysemu/kvm.h" static NotifierList migration_state_notifiers = NOTIFIER_LIST_INITIALIZER(migration_state_notifiers); @@ -1906,6 +1907,12 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc, return false; } + if (kvm_hwpoisoned_mem()) { + error_setg(errp, "Can't migrate this vm with hardware poisoned memory, " + "please reboot the vm and try again"); + return false; + } + if (migration_is_blocked(errp)) { return false; } From patchwork Thu Feb 8 03:04:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549256 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0DFA3C4828F for ; Thu, 8 Feb 2024 03:09:19 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXujh-0006BS-I5; Wed, 07 Feb 2024 22:06:01 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujR-0005xd-Q7 for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:05:47 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujP-0004Iz-W5 for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:05:45 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361543; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GXePTiwA5V83mwEX5LUgK1Dw9tfBV23tEnkraCOVtjI=; b=QCh6JrDIzXX6Imzsn5HGurvNLqYiYKu4FsCtPBQ9VN2wry3upKjMOA+Mj1lAp0BtH9D1j+ A1VCneMAydZ91ZoE5waP3CmLA2e4SFNMWoKX/hj+mPMFRrwGRtvUtJTtf+UJZAGWch9Ws4 gSo8qzHfw5u0TsEF1HITyqmxCsKi9uw= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-618-SIlGad_qOEmp608_WG5u-g-1; Wed, 07 Feb 2024 22:05:39 -0500 X-MC-Unique: SIlGad_qOEmp608_WG5u-g-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 81F3C185A784; Thu, 8 Feb 2024 03:05:39 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3423C492BC7; Thu, 8 Feb 2024 03:05:36 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 02/34] migration/multifd: Drop stale comment for multifd zero copy Date: Thu, 8 Feb 2024 11:04:56 +0800 Message-ID: <20240208030528.368214-3-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu We've already done that with multifd_flush_after_each_section, for multifd in general. Drop the stale "TODO-like" comment. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-2-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.c | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 25cbc6dc6b..eee2586770 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -598,17 +598,6 @@ int multifd_send_sync_main(void) } } - /* - * When using zero-copy, it's necessary to flush the pages before any of - * the pages can be sent again, so we'll make sure the new version of the - * pages will always arrive _later_ than the old pages. - * - * Currently we achieve this by flushing the zero-page requested writes - * per ram iteration, but in the future we could potentially optimize it - * to be less frequent, e.g. only after we finished one whole scanning of - * all the dirty bitmaps. - */ - flush_zero_copy = migrate_zero_copy_send(); for (i = 0; i < migrate_multifd_channels(); i++) { From patchwork Thu Feb 8 03:04:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549242 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 350B0C4828F for ; Thu, 8 Feb 2024 03:07:43 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXujf-00068W-02; Wed, 07 Feb 2024 22:05:59 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujT-0005xj-4X for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:05:47 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujR-0004J5-HS for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:05:46 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361544; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nDtOh6EyR1FRlMPetryL8BQxGR+uScxl+Gq8KTSlZg4=; b=bGD6v8DiXL7n0p9AElbLtfCpE9VfmKpCRo1/56s/4+QfTkSeBACxyTbfTrxOze6K6QXJRF ojBsn1jkZV+SBc9HuBAF7HhSaz+RL6WLN7SmCjYvH09NfsAhu+qXnCfL7Gg5kFjCEGjqZp C7tj5RIZo+we8EpI8/REyNT7UYAnsjo= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-638-ML1HRWBwNECzr6qH0ZDDdg-1; Wed, 07 Feb 2024 22:05:42 -0500 X-MC-Unique: ML1HRWBwNECzr6qH0ZDDdg-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6D5E2870821; Thu, 8 Feb 2024 03:05:42 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 23B08492BC7; Thu, 8 Feb 2024 03:05:39 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 03/34] migration/multifd: multifd_send_kick_main() Date: Thu, 8 Feb 2024 11:04:57 +0800 Message-ID: <20240208030528.368214-4-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu When a multifd sender thread hit errors, it always needs to kick the main thread by kicking all the semaphores that it can be waiting upon. Provide a helper for it and deduplicate the code. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-3-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index eee2586770..b8d2c96533 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -372,6 +372,18 @@ struct { MultiFDMethods *ops; } *multifd_send_state; +/* + * The migration thread can wait on either of the two semaphores. This + * function can be used to kick the main thread out of waiting on either of + * them. Should mostly only be called when something wrong happened with + * the current multifd send thread. + */ +static void multifd_send_kick_main(MultiFDSendParams *p) +{ + qemu_sem_post(&p->sem_sync); + qemu_sem_post(&multifd_send_state->channels_ready); +} + /* * How we use multifd_send_state->pages and channel->pages? * @@ -739,8 +751,7 @@ out: assert(local_err); trace_multifd_send_error(p->id); multifd_send_terminate_threads(local_err); - qemu_sem_post(&p->sem_sync); - qemu_sem_post(&multifd_send_state->channels_ready); + multifd_send_kick_main(p); error_free(local_err); } @@ -781,8 +792,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task, * is not created, and then tell who pay attention to me. */ p->quit = true; - qemu_sem_post(&multifd_send_state->channels_ready); - qemu_sem_post(&p->sem_sync); + multifd_send_kick_main(p); error_free(err); } @@ -852,8 +862,7 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p, { migrate_set_error(migrate_get_current(), err); /* Error happen, we need to tell who pay attention to me */ - qemu_sem_post(&multifd_send_state->channels_ready); - qemu_sem_post(&p->sem_sync); + multifd_send_kick_main(p); /* * Although multifd_send_thread is not created, but main migration * thread need to judge whether it is running, so we need to mark From patchwork Thu Feb 8 03:04:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549270 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0982BC4828F for ; Thu, 8 Feb 2024 03:11:42 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXujj-0006Ck-Ok; Wed, 07 Feb 2024 22:06:03 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujX-000616-WF for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:05:54 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujV-0004JQ-T9 for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:05:51 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361549; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+E2Rwct9Es94xaZWN9ER664YfyJACLQgAotPE03pOCU=; b=CYmLf0W2OfiJurL/g4LkB4pvqvSH8fibtjlSjR2SJrdL/N4Y7htmchyuS3J5KizBbtZUQT Di6rhGrQhbF4YER0M+iZJZu5NqdnLzIAj5e+eJLB5PKEubX4SorZk1MHqpb3kmvLCAbhbF KAxIqT60NXz2Tmer9oQdd2FXz0XMCNw= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-169-bSBXAprWPC61iRTvwbqtfA-1; Wed, 07 Feb 2024 22:05:45 -0500 X-MC-Unique: bSBXAprWPC61iRTvwbqtfA-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 623A33C0F18A; Thu, 8 Feb 2024 03:05:45 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0E177492BC7; Thu, 8 Feb 2024 03:05:42 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 04/34] migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths Date: Thu, 8 Feb 2024 11:04:58 +0800 Message-ID: <20240208030528.368214-5-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Multifd send side has two fields to indicate error quits: - MultiFDSendParams.quit - &multifd_send_state->exiting Merge them into the global one. The replacement is done by changing all p->quit checks into the global var check. The global check doesn't need any lock. A few more things done on top of this altogether: - multifd_send_terminate_threads() Moving the xchg() of &multifd_send_state->exiting upper, so as to cover the tracepoint, migrate_set_error() and migrate_set_state(). - multifd_send_sync_main() In the 2nd loop, add one more check over the global var to make sure we don't keep the looping if QEMU already decided to quit. - multifd_tls_outgoing_handshake() Use multifd_send_terminate_threads() to set the error state. That has a benefit of updating MigrationState.error to that error too, so we can persist that 1st error we hit in that specific channel. - multifd_new_send_channel_async() Take similar approach like above, drop the migrate_set_error() because multifd_send_terminate_threads() already covers that. Unwrap the helper multifd_new_send_channel_cleanup() along the way; not really needed. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-4-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.h | 2 -- migration/multifd.c | 85 ++++++++++++++++++--------------------------- 2 files changed, 33 insertions(+), 54 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 35d11f103c..7c040cb85a 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -95,8 +95,6 @@ typedef struct { QemuMutex mutex; /* is this channel thread running */ bool running; - /* should this thread finish */ - bool quit; /* multifd flags for each packet */ uint32_t flags; /* global number of generated multifd packets */ diff --git a/migration/multifd.c b/migration/multifd.c index b8d2c96533..2c98023d67 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -372,6 +372,11 @@ struct { MultiFDMethods *ops; } *multifd_send_state; +static bool multifd_send_should_exit(void) +{ + return qatomic_read(&multifd_send_state->exiting); +} + /* * The migration thread can wait on either of the two semaphores. This * function can be used to kick the main thread out of waiting on either of @@ -409,7 +414,7 @@ static int multifd_send_pages(void) MultiFDSendParams *p = NULL; /* make happy gcc */ MultiFDPages_t *pages = multifd_send_state->pages; - if (qatomic_read(&multifd_send_state->exiting)) { + if (multifd_send_should_exit()) { return -1; } @@ -421,14 +426,11 @@ static int multifd_send_pages(void) */ next_channel %= migrate_multifd_channels(); for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) { - p = &multifd_send_state->params[i]; - - qemu_mutex_lock(&p->mutex); - if (p->quit) { - error_report("%s: channel %d has already quit!", __func__, i); - qemu_mutex_unlock(&p->mutex); + if (multifd_send_should_exit()) { return -1; } + p = &multifd_send_state->params[i]; + qemu_mutex_lock(&p->mutex); if (!p->pending_job) { p->pending_job++; next_channel = (i + 1) % migrate_multifd_channels(); @@ -483,6 +485,16 @@ static void multifd_send_terminate_threads(Error *err) { int i; + /* + * We don't want to exit each threads twice. Depending on where + * we get the error, or if there are two independent errors in two + * threads at the same time, we can end calling this function + * twice. + */ + if (qatomic_xchg(&multifd_send_state->exiting, 1)) { + return; + } + trace_multifd_send_terminate_threads(err != NULL); if (err) { @@ -497,26 +509,13 @@ static void multifd_send_terminate_threads(Error *err) } } - /* - * We don't want to exit each threads twice. Depending on where - * we get the error, or if there are two independent errors in two - * threads at the same time, we can end calling this function - * twice. - */ - if (qatomic_xchg(&multifd_send_state->exiting, 1)) { - return; - } - for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; - qemu_mutex_lock(&p->mutex); - p->quit = true; qemu_sem_post(&p->sem); if (p->c) { qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); } - qemu_mutex_unlock(&p->mutex); } } @@ -615,16 +614,13 @@ int multifd_send_sync_main(void) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; - trace_multifd_send_sync_main_signal(p->id); - - qemu_mutex_lock(&p->mutex); - - if (p->quit) { - error_report("%s: channel %d has already quit", __func__, i); - qemu_mutex_unlock(&p->mutex); + if (multifd_send_should_exit()) { return -1; } + trace_multifd_send_sync_main_signal(p->id); + + qemu_mutex_lock(&p->mutex); p->packet_num = multifd_send_state->packet_num++; p->flags |= MULTIFD_FLAG_SYNC; p->pending_job++; @@ -634,6 +630,10 @@ int multifd_send_sync_main(void) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; + if (multifd_send_should_exit()) { + return -1; + } + qemu_sem_wait(&multifd_send_state->channels_ready); trace_multifd_send_sync_main_wait(p->id); qemu_sem_wait(&p->sem_sync); @@ -671,7 +671,7 @@ static void *multifd_send_thread(void *opaque) qemu_sem_post(&multifd_send_state->channels_ready); qemu_sem_wait(&p->sem); - if (qatomic_read(&multifd_send_state->exiting)) { + if (multifd_send_should_exit()) { break; } qemu_mutex_lock(&p->mutex); @@ -786,12 +786,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task, trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err)); - migrate_set_error(migrate_get_current(), err); - /* - * Error happen, mark multifd_send_thread status as 'quit' although it - * is not created, and then tell who pay attention to me. - */ - p->quit = true; + multifd_send_terminate_threads(err); multifd_send_kick_main(p); error_free(err); } @@ -857,22 +852,6 @@ static bool multifd_channel_connect(MultiFDSendParams *p, return true; } -static void multifd_new_send_channel_cleanup(MultiFDSendParams *p, - QIOChannel *ioc, Error *err) -{ - migrate_set_error(migrate_get_current(), err); - /* Error happen, we need to tell who pay attention to me */ - multifd_send_kick_main(p); - /* - * Although multifd_send_thread is not created, but main migration - * thread need to judge whether it is running, so we need to mark - * its status. - */ - p->quit = true; - object_unref(OBJECT(ioc)); - error_free(err); -} - static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque) { MultiFDSendParams *p = opaque; @@ -889,7 +868,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque) } trace_multifd_new_send_channel_async_error(p->id, local_err); - multifd_new_send_channel_cleanup(p, ioc, local_err); + multifd_send_terminate_threads(local_err); + multifd_send_kick_main(p); + object_unref(OBJECT(ioc)); + error_free(local_err); } static void multifd_new_send_channel_create(gpointer opaque) @@ -921,7 +903,6 @@ int multifd_save_setup(Error **errp) qemu_mutex_init(&p->mutex); qemu_sem_init(&p->sem, 0); qemu_sem_init(&p->sem_sync, 0); - p->quit = false; p->pending_job = 0; p->id = i; p->pages = multifd_pages_init(page_count); From patchwork Thu Feb 8 03:04:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549240 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D7B0DC4829B for ; Thu, 8 Feb 2024 03:07:06 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXujf-00068U-32; Wed, 07 Feb 2024 22:05:59 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujZ-00061F-0Q for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:05:55 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujX-0004Ku-Bj for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:05:52 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361550; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rxZHrPajLtIGHeZjEajUe4yfONFZ2JGFrfzt8+Ew5WI=; b=Cb6QaCfe2/9Jo+ypL882OZE9m1Uo/Pk2XQ76wuwwNbL2wcxRW5ooeVSLqAWIGqx6513IF4 mluw66VAUTcoCYW/l9d7oIxHmDPdyZzwHq6r9eFdrNOjHfgbi1PRIGatbCoPkznCfQLRS8 IdJ3YUd7xShah1S99oCRXhTzJYlGu08= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-477-3nwhbgwXN3uDdaVD9ZUOEg-1; Wed, 07 Feb 2024 22:05:48 -0500 X-MC-Unique: 3nwhbgwXN3uDdaVD9ZUOEg-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 90273185A780; Thu, 8 Feb 2024 03:05:48 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1C56F492BC7; Thu, 8 Feb 2024 03:05:45 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 05/34] migration/multifd: Postpone reset of MultiFDPages_t Date: Thu, 8 Feb 2024 11:04:59 +0800 Message-ID: <20240208030528.368214-6-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Now we reset MultiFDPages_t object in the multifd sender thread in the middle of the sending job. That's not necessary, because the "*pages" struct will not be reused anyway until pending_job is cleared. Move that to the end after the job is completed, provide a helper to reset a "*pages" object. Use that same helper when free the object too. This prepares us to keep using p->pages in the follow up patches, where we may drop p->normal[]. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-5-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 2c98023d67..5633ac245a 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -172,6 +172,17 @@ void multifd_register_ops(int method, MultiFDMethods *ops) multifd_ops[method] = ops; } +/* Reset a MultiFDPages_t* object for the next use */ +static void multifd_pages_reset(MultiFDPages_t *pages) +{ + /* + * We don't need to touch offset[] array, because it will be + * overwritten later when reused. + */ + pages->num = 0; + pages->block = NULL; +} + static int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp) { MultiFDInit_t msg = {}; @@ -248,9 +259,8 @@ static MultiFDPages_t *multifd_pages_init(uint32_t n) static void multifd_pages_clear(MultiFDPages_t *pages) { - pages->num = 0; + multifd_pages_reset(pages); pages->allocated = 0; - pages->block = NULL; g_free(pages->offset); pages->offset = NULL; g_free(pages); @@ -704,8 +714,6 @@ static void *multifd_send_thread(void *opaque) p->flags = 0; p->num_packets++; p->total_normal_pages += p->normal_num; - p->pages->num = 0; - p->pages->block = NULL; qemu_mutex_unlock(&p->mutex); trace_multifd_send(p->id, packet_num, p->normal_num, flags, @@ -732,6 +740,8 @@ static void *multifd_send_thread(void *opaque) stat64_add(&mig_stats.multifd_bytes, p->next_packet_size + p->packet_len); + + multifd_pages_reset(p->pages); p->next_packet_size = 0; qemu_mutex_lock(&p->mutex); p->pending_job--; From patchwork Thu Feb 8 03:05:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549273 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id F2F12C4828F for ; Thu, 8 Feb 2024 03:12:06 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXujj-0006CR-3s; Wed, 07 Feb 2024 22:06:03 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujf-0006AB-0g for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:05:59 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujc-0004LF-7o for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:05:58 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361555; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=F8rpuPTioywxfY/ZvOjWZTcWw2/q7y++Xn+a9fof+s0=; b=IRern8mtjEw4YmTY/lHvnosl2OIljGIjHTh1cIE3oJ2XVMGfC/NC5RDft60cRCodXTmIau SIQ4r964DiJQRIXZOUuezjxRJB55bz6zfgH33EZh84aGC6llD6Udw4K4h6QPwrYplvTKNi J7HiBOJBZGShXXREJTt2pPD2RwTemTM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-393-9dDuNEYlNP2lsiH5NGSpwA-1; Wed, 07 Feb 2024 22:05:52 -0500 X-MC-Unique: 9dDuNEYlNP2lsiH5NGSpwA-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D3493845E63; Thu, 8 Feb 2024 03:05:51 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 60158492BC7; Thu, 8 Feb 2024 03:05:48 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 06/34] migration/multifd: Drop MultiFDSendParams.normal[] array Date: Thu, 8 Feb 2024 11:05:00 +0800 Message-ID: <20240208030528.368214-7-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu This array is redundant when p->pages exists. Now we extended the life of p->pages to the whole period where pending_job is set, it should be safe to always use p->pages->offset[] rather than p->normal[]. Drop the array. Alongside, the normal_num is also redundant, which is the same to p->pages->num. This doesn't apply to recv side, because there's no extra buffering on recv side, so p->normal[] array is still needed. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-6-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.h | 4 ---- migration/multifd-zlib.c | 7 ++++--- migration/multifd-zstd.c | 7 ++++--- migration/multifd.c | 33 +++++++++++++-------------------- 4 files changed, 21 insertions(+), 30 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 7c040cb85a..3920bdbcf1 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -122,10 +122,6 @@ typedef struct { struct iovec *iov; /* number of iovs used */ uint32_t iovs_num; - /* Pages that are not zero */ - ram_addr_t *normal; - /* num of non zero pages */ - uint32_t normal_num; /* used for compression methods */ void *data; } MultiFDSendParams; diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c index 37ce48621e..100809abc1 100644 --- a/migration/multifd-zlib.c +++ b/migration/multifd-zlib.c @@ -116,17 +116,18 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp) */ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp) { + MultiFDPages_t *pages = p->pages; struct zlib_data *z = p->data; z_stream *zs = &z->zs; uint32_t out_size = 0; int ret; uint32_t i; - for (i = 0; i < p->normal_num; i++) { + for (i = 0; i < pages->num; i++) { uint32_t available = z->zbuff_len - out_size; int flush = Z_NO_FLUSH; - if (i == p->normal_num - 1) { + if (i == pages->num - 1) { flush = Z_SYNC_FLUSH; } @@ -135,7 +136,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp) * with compression. zlib does not guarantee that this is safe, * therefore copy the page before calling deflate(). */ - memcpy(z->buf, p->pages->block->host + p->normal[i], p->page_size); + memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size); zs->avail_in = p->page_size; zs->next_in = z->buf; diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c index b471daadcd..2023edd8cc 100644 --- a/migration/multifd-zstd.c +++ b/migration/multifd-zstd.c @@ -113,6 +113,7 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp) */ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp) { + MultiFDPages_t *pages = p->pages; struct zstd_data *z = p->data; int ret; uint32_t i; @@ -121,13 +122,13 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp) z->out.size = z->zbuff_len; z->out.pos = 0; - for (i = 0; i < p->normal_num; i++) { + for (i = 0; i < pages->num; i++) { ZSTD_EndDirective flush = ZSTD_e_continue; - if (i == p->normal_num - 1) { + if (i == pages->num - 1) { flush = ZSTD_e_flush; } - z->in.src = p->pages->block->host + p->normal[i]; + z->in.src = p->pages->block->host + pages->offset[i]; z->in.size = p->page_size; z->in.pos = 0; diff --git a/migration/multifd.c b/migration/multifd.c index 5633ac245a..8bb1fd95cf 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -90,13 +90,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp) { MultiFDPages_t *pages = p->pages; - for (int i = 0; i < p->normal_num; i++) { - p->iov[p->iovs_num].iov_base = pages->block->host + p->normal[i]; + for (int i = 0; i < pages->num; i++) { + p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i]; p->iov[p->iovs_num].iov_len = p->page_size; p->iovs_num++; } - p->next_packet_size = p->normal_num * p->page_size; + p->next_packet_size = pages->num * p->page_size; p->flags |= MULTIFD_FLAG_NOCOMP; return 0; } @@ -269,21 +269,22 @@ static void multifd_pages_clear(MultiFDPages_t *pages) static void multifd_send_fill_packet(MultiFDSendParams *p) { MultiFDPacket_t *packet = p->packet; + MultiFDPages_t *pages = p->pages; int i; packet->flags = cpu_to_be32(p->flags); packet->pages_alloc = cpu_to_be32(p->pages->allocated); - packet->normal_pages = cpu_to_be32(p->normal_num); + packet->normal_pages = cpu_to_be32(pages->num); packet->next_packet_size = cpu_to_be32(p->next_packet_size); packet->packet_num = cpu_to_be64(p->packet_num); - if (p->pages->block) { - strncpy(packet->ramblock, p->pages->block->idstr, 256); + if (pages->block) { + strncpy(packet->ramblock, pages->block->idstr, 256); } - for (i = 0; i < p->normal_num; i++) { + for (i = 0; i < pages->num; i++) { /* there are architectures where ram_addr_t is 32 bit */ - uint64_t temp = p->normal[i]; + uint64_t temp = pages->offset[i]; packet->offset[i] = cpu_to_be64(temp); } @@ -570,8 +571,6 @@ void multifd_save_cleanup(void) p->packet = NULL; g_free(p->iov); p->iov = NULL; - g_free(p->normal); - p->normal = NULL; multifd_send_state->ops->send_cleanup(p, &local_err); if (local_err) { migrate_set_error(migrate_get_current(), local_err); @@ -688,8 +687,8 @@ static void *multifd_send_thread(void *opaque) if (p->pending_job) { uint64_t packet_num = p->packet_num; + MultiFDPages_t *pages = p->pages; uint32_t flags; - p->normal_num = 0; if (use_zero_copy_send) { p->iovs_num = 0; @@ -697,12 +696,7 @@ static void *multifd_send_thread(void *opaque) p->iovs_num = 1; } - for (int i = 0; i < p->pages->num; i++) { - p->normal[p->normal_num] = p->pages->offset[i]; - p->normal_num++; - } - - if (p->normal_num) { + if (pages->num) { ret = multifd_send_state->ops->send_prepare(p, &local_err); if (ret != 0) { qemu_mutex_unlock(&p->mutex); @@ -713,10 +707,10 @@ static void *multifd_send_thread(void *opaque) flags = p->flags; p->flags = 0; p->num_packets++; - p->total_normal_pages += p->normal_num; + p->total_normal_pages += pages->num; qemu_mutex_unlock(&p->mutex); - trace_multifd_send(p->id, packet_num, p->normal_num, flags, + trace_multifd_send(p->id, packet_num, pages->num, flags, p->next_packet_size); if (use_zero_copy_send) { @@ -924,7 +918,6 @@ int multifd_save_setup(Error **errp) p->name = g_strdup_printf("multifdsend_%d", i); /* We need one extra place for the packet header */ p->iov = g_new0(struct iovec, page_count + 1); - p->normal = g_new0(ram_addr_t, page_count); p->page_size = qemu_target_page_size(); p->page_count = page_count; From patchwork Thu Feb 8 03:05:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549269 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 18984C4829B for ; Thu, 8 Feb 2024 03:11:41 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXujp-0006EC-Gy; Wed, 07 Feb 2024 22:06:09 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujh-0006Bp-O6 for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:01 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujf-0004LP-Kt for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:01 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361558; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZXF88j6L+HU+EILPLvbQTVGt5UFQGI3yIKbM1yEgjSo=; b=Z6Q4GRFQKgNyfBr0zwQ+nguFfrU6ledwdsoM8mXFwAssGgPhe3BObcULovH1/SU/RPk7ME igSfgzx1Z/Z9+35mwKQEq+htoTYxxA+5vkTQ/BbZK+8D6mRLBu0M7e3aI5GCFMp9s3AgRI P46nmdfQNrkJizmopMtRiZfRMIz51jU= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-436-8-uARfx-P22rhz_M9YV0Kg-1; Wed, 07 Feb 2024 22:05:55 -0500 X-MC-Unique: 8-uARfx-P22rhz_M9YV0Kg-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 098BB85A58F; Thu, 8 Feb 2024 03:05:55 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id A32C9492BC7; Thu, 8 Feb 2024 03:05:52 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 07/34] migration/multifd: Separate SYNC request with normal jobs Date: Thu, 8 Feb 2024 11:05:01 +0800 Message-ID: <20240208030528.368214-8-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Multifd provide a threaded model for processing jobs. On sender side, there can be two kinds of job: (1) a list of pages to send, or (2) a sync request. The sync request is a very special kind of job. It never contains a page array, but only a multifd packet telling the dest side to synchronize with sent pages. Before this patch, both requests use the pending_job field, no matter what the request is, it will boost pending_job, while multifd sender thread will decrement it after it finishes one job. However this should be racy, because SYNC is special in that it needs to set p->flags with MULTIFD_FLAG_SYNC, showing that this is a sync request. Consider a sequence of operations where: - migration thread enqueue a job to send some pages, pending_job++ (0->1) - [...before the selected multifd sender thread wakes up...] - migration thread enqueue another job to sync, pending_job++ (1->2), setup p->flags=MULTIFD_FLAG_SYNC - multifd sender thread wakes up, found pending_job==2 - send the 1st packet with MULTIFD_FLAG_SYNC and list of pages - send the 2nd packet with flags==0 and no pages This is not expected, because MULTIFD_FLAG_SYNC should hopefully be done after all the pages are received. Meanwhile, the 2nd packet will be completely useless, which contains zero information. I didn't verify above, but I think this issue is still benign in that at least on the recv side we always receive pages before handling MULTIFD_FLAG_SYNC. However that's not always guaranteed and just tricky. One other reason I want to separate it is using p->flags to communicate between the two threads is also not clearly defined, it's very hard to read and understand why accessing p->flags is always safe; see the current impl of multifd_send_thread() where we tried to cache only p->flags. It doesn't need to be that complicated. This patch introduces pending_sync, a separate flag just to show that the requester needs a sync. Alongside, we remove the tricky caching of p->flags now because after this patch p->flags should only be used by multifd sender thread now, which will be crystal clear. So it is always thread safe to access p->flags. With that, we can also safely convert the pending_job into a boolean, because we don't support >1 pending jobs anyway. Always use atomic ops to access both flags to make sure no cache effect. When at it, drop the initial setting of "pending_job = 0" because it's always allocated using g_new0(). Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-7-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.h | 13 +++++++++++-- migration/multifd.c | 39 +++++++++++++++++++++++++-------------- 2 files changed, 36 insertions(+), 16 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 3920bdbcf1..08f26ef3fe 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -99,8 +99,17 @@ typedef struct { uint32_t flags; /* global number of generated multifd packets */ uint64_t packet_num; - /* thread has work to do */ - int pending_job; + /* + * The sender thread has work to do if either of below boolean is set. + * + * @pending_job: a job is pending + * @pending_sync: a sync request is pending + * + * For both of these fields, they're only set by the requesters, and + * cleared by the multifd sender threads. + */ + bool pending_job; + bool pending_sync; /* array of pages to sent. * The owner of 'pages' depends of 'pending_job' value: * pending_job == 0 -> migration_thread can use it. diff --git a/migration/multifd.c b/migration/multifd.c index 8bb1fd95cf..ea25bbe6bd 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -442,8 +442,8 @@ static int multifd_send_pages(void) } p = &multifd_send_state->params[i]; qemu_mutex_lock(&p->mutex); - if (!p->pending_job) { - p->pending_job++; + if (qatomic_read(&p->pending_job) == false) { + qatomic_set(&p->pending_job, true); next_channel = (i + 1) % migrate_multifd_channels(); break; } @@ -631,8 +631,12 @@ int multifd_send_sync_main(void) qemu_mutex_lock(&p->mutex); p->packet_num = multifd_send_state->packet_num++; - p->flags |= MULTIFD_FLAG_SYNC; - p->pending_job++; + /* + * We should be the only user so far, so not possible to be set by + * others concurrently. + */ + assert(qatomic_read(&p->pending_sync) == false); + qatomic_set(&p->pending_sync, true); qemu_mutex_unlock(&p->mutex); qemu_sem_post(&p->sem); } @@ -685,10 +689,9 @@ static void *multifd_send_thread(void *opaque) } qemu_mutex_lock(&p->mutex); - if (p->pending_job) { + if (qatomic_read(&p->pending_job)) { uint64_t packet_num = p->packet_num; MultiFDPages_t *pages = p->pages; - uint32_t flags; if (use_zero_copy_send) { p->iovs_num = 0; @@ -704,13 +707,11 @@ static void *multifd_send_thread(void *opaque) } } multifd_send_fill_packet(p); - flags = p->flags; - p->flags = 0; p->num_packets++; p->total_normal_pages += pages->num; qemu_mutex_unlock(&p->mutex); - trace_multifd_send(p->id, packet_num, pages->num, flags, + trace_multifd_send(p->id, packet_num, pages->num, p->flags, p->next_packet_size); if (use_zero_copy_send) { @@ -738,12 +739,23 @@ static void *multifd_send_thread(void *opaque) multifd_pages_reset(p->pages); p->next_packet_size = 0; qemu_mutex_lock(&p->mutex); - p->pending_job--; + qatomic_set(&p->pending_job, false); qemu_mutex_unlock(&p->mutex); - - if (flags & MULTIFD_FLAG_SYNC) { - qemu_sem_post(&p->sem_sync); + } else if (qatomic_read(&p->pending_sync)) { + p->flags = MULTIFD_FLAG_SYNC; + multifd_send_fill_packet(p); + ret = qio_channel_write_all(p->c, (void *)p->packet, + p->packet_len, &local_err); + if (ret != 0) { + qemu_mutex_unlock(&p->mutex); + break; } + /* p->next_packet_size will always be zero for a SYNC packet */ + stat64_add(&mig_stats.multifd_bytes, p->packet_len); + p->flags = 0; + qatomic_set(&p->pending_sync, false); + qemu_mutex_unlock(&p->mutex); + qemu_sem_post(&p->sem_sync); } else { qemu_mutex_unlock(&p->mutex); /* sometimes there are spurious wakeups */ @@ -907,7 +919,6 @@ int multifd_save_setup(Error **errp) qemu_mutex_init(&p->mutex); qemu_sem_init(&p->sem, 0); qemu_sem_init(&p->sem_sync, 0); - p->pending_job = 0; p->id = i; p->pages = multifd_pages_init(page_count); p->packet_len = sizeof(MultiFDPacket_t) From patchwork Thu Feb 8 03:05:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549271 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7C8A5C4829B for ; Thu, 8 Feb 2024 03:11:57 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXujp-0006E7-Bv; Wed, 07 Feb 2024 22:06:09 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujk-0006Cs-1M for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:04 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXuji-0004Ls-Ez for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:03 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361561; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HOxM34KtLIN0cPcBEfxMbZlDBqVqASX9xZ8Un2UI8a4=; b=e6OxU3FrvygcTirN2NbBivQ08Q98fEEMiYk8uh+cQR7+QhM1vAxrpaHz8Xlpl/aMV5ryp6 7jnIP1ciok/dIeJjEosi0E0ZK04Vyqgh/HAB57zFQljdOsxR61iySDC3qookSqJakK8Z18 ij8lT8ahsR1ONqSSqqHIJOUtR9jceSQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-193-zGiJ5iBoMm-RCvuNs_LyVA-1; Wed, 07 Feb 2024 22:05:58 -0500 X-MC-Unique: zGiJ5iBoMm-RCvuNs_LyVA-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 34D52845E62; Thu, 8 Feb 2024 03:05:58 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id C0140492BC7; Thu, 8 Feb 2024 03:05:55 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 08/34] migration/multifd: Simplify locking in sender thread Date: Thu, 8 Feb 2024 11:05:02 +0800 Message-ID: <20240208030528.368214-9-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu The sender thread will yield the p->mutex before IO starts, trying to not block the requester thread. This may be unnecessary lock optimizations, because the requester can already read pending_job safely even without the lock, because the requester is currently the only one who can assign a task. Drop that lock complication on both sides: (1) in the sender thread, always take the mutex until job done (2) in the requester thread, check pending_job clear lockless Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-8-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index ea25bbe6bd..4d5a01ed93 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -429,7 +429,9 @@ static int multifd_send_pages(void) return -1; } + /* We wait here, until at least one channel is ready */ qemu_sem_wait(&multifd_send_state->channels_ready); + /* * next_channel can remain from a previous migration that was * using more channels, so ensure it doesn't overflow if the @@ -441,17 +443,26 @@ static int multifd_send_pages(void) return -1; } p = &multifd_send_state->params[i]; - qemu_mutex_lock(&p->mutex); + /* + * Lockless read to p->pending_job is safe, because only multifd + * sender thread can clear it. + */ if (qatomic_read(&p->pending_job) == false) { - qatomic_set(&p->pending_job, true); next_channel = (i + 1) % migrate_multifd_channels(); break; } - qemu_mutex_unlock(&p->mutex); } + + qemu_mutex_lock(&p->mutex); assert(!p->pages->num); assert(!p->pages->block); - + /* + * Double check on pending_job==false with the lock. In the future if + * we can have >1 requester thread, we can replace this with a "goto + * retry", but that is for later. + */ + assert(qatomic_read(&p->pending_job) == false); + qatomic_set(&p->pending_job, true); p->packet_num = multifd_send_state->packet_num++; multifd_send_state->pages = p->pages; p->pages = pages; @@ -709,8 +720,6 @@ static void *multifd_send_thread(void *opaque) multifd_send_fill_packet(p); p->num_packets++; p->total_normal_pages += pages->num; - qemu_mutex_unlock(&p->mutex); - trace_multifd_send(p->id, packet_num, pages->num, p->flags, p->next_packet_size); @@ -730,6 +739,7 @@ static void *multifd_send_thread(void *opaque) ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL, 0, p->write_flags, &local_err); if (ret != 0) { + qemu_mutex_unlock(&p->mutex); break; } @@ -738,7 +748,6 @@ static void *multifd_send_thread(void *opaque) multifd_pages_reset(p->pages); p->next_packet_size = 0; - qemu_mutex_lock(&p->mutex); qatomic_set(&p->pending_job, false); qemu_mutex_unlock(&p->mutex); } else if (qatomic_read(&p->pending_sync)) { From patchwork Thu Feb 8 03:05:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549262 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 28510C4829B for ; Thu, 8 Feb 2024 03:10:24 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXuk4-0006IP-8P; Wed, 07 Feb 2024 22:06:25 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujn-0006Dz-HH for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:07 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujm-0004M8-1u for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:07 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361565; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hdcnycTr3e9Qudscro0+2UzOqhMsE2+6FKQ7CO81pEc=; b=RXGXyKNqKEBrIMVLK4uUHEx+CnsmmyS2sVz6BMwSxAlfYDRtx+Yt1M3mzdPB6HYDChgjg1 bSx3ey0vUkS6Qsst9sIuvDwNyTUj1lzStZyChBW+9jM1WTsEt7acfu/Oa/2XOHfk/hq/44 cPlJlhY8il1Q8da3OHrw6CRoRzyBNRg= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-558-Iunm-T1MOSOO-Zq5zr_V8g-1; Wed, 07 Feb 2024 22:06:01 -0500 X-MC-Unique: Iunm-T1MOSOO-Zq5zr_V8g-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5F3CF85A596; Thu, 8 Feb 2024 03:06:01 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0435E492BC7; Thu, 8 Feb 2024 03:05:58 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 09/34] migration/multifd: Drop pages->num check in sender thread Date: Thu, 8 Feb 2024 11:05:03 +0800 Message-ID: <20240208030528.368214-10-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Now with a split SYNC handler, we always have pages->num set for pending_job==true. Assert it instead. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-9-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 4d5a01ed93..518f9de723 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -710,13 +710,14 @@ static void *multifd_send_thread(void *opaque) p->iovs_num = 1; } - if (pages->num) { - ret = multifd_send_state->ops->send_prepare(p, &local_err); - if (ret != 0) { - qemu_mutex_unlock(&p->mutex); - break; - } + assert(pages->num); + + ret = multifd_send_state->ops->send_prepare(p, &local_err); + if (ret != 0) { + qemu_mutex_unlock(&p->mutex); + break; } + multifd_send_fill_packet(p); p->num_packets++; p->total_normal_pages += pages->num; From patchwork Thu Feb 8 03:05:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549272 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 96C0DC4828F for ; Thu, 8 Feb 2024 03:12:00 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXukP-0006ed-51; Wed, 07 Feb 2024 22:06:47 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujr-0006HP-8s for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:13 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujo-0004MH-PB for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:10 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361568; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=P4rGGjiReFA5mbRKiKXV4StIfVcFiiaCslJt4eb0ZT8=; b=Z1j/x7gPKs8SPYYVN+reyENZdX1a0UvQVgtKKcMWyBOOG0Fn5nuCPRY+doBKt2Aa053GNG kOmcoU4I4XbvWP0ieIXiXxDPPave41ZT+Cdm4qYkTf7ryBXCkiwU7aAGmP7zDSHWQ0PyrW dHFVXtcEw57ofgvJtokCitxGPWFUOt4= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-210-XlWzly5BNKqsLxuWT7Fj9w-1; Wed, 07 Feb 2024 22:06:04 -0500 X-MC-Unique: XlWzly5BNKqsLxuWT7Fj9w-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 586D03806710; Thu, 8 Feb 2024 03:06:04 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 15FDF492BC7; Thu, 8 Feb 2024 03:06:01 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 10/34] migration/multifd: Rename p->num_packets and clean it up Date: Thu, 8 Feb 2024 11:05:04 +0800 Message-ID: <20240208030528.368214-11-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu This field, no matter whether on src or dest, is only used for debugging purpose. They can even be removed already, unless it still more or less provide some accounting on "how many packets are sent/recved for this thread". The other more important one is called packet_num, which is embeded in the multifd packet headers (MultiFDPacket_t). So let's keep them for now, but make them much easier to understand, by doing below: - Rename both of them to packets_sent / packets_recved, the old name (num_packets) are waaay too confusing when we already have MultiFDPacket_t.packets_num. - Avoid worrying on the "initial packet": we know we will send it, that's good enough. The accounting won't matter a great deal to start with 0 or with 1. - Move them to where we send/recv the packets. They're: - multifd_send_fill_packet() for senders. - multifd_recv_unfill_packet() for receivers. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-10-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.h | 6 +++--- migration/multifd.c | 13 +++++-------- 2 files changed, 8 insertions(+), 11 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 08f26ef3fe..2e4ad0dc56 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -124,7 +124,7 @@ typedef struct { /* size of the next packet that contains pages */ uint32_t next_packet_size; /* packets sent through this channel */ - uint64_t num_packets; + uint64_t packets_sent; /* non zero pages sent through this channel */ uint64_t total_normal_pages; /* buffers to send */ @@ -174,8 +174,8 @@ typedef struct { MultiFDPacket_t *packet; /* size of the next packet that contains pages */ uint32_t next_packet_size; - /* packets sent through this channel */ - uint64_t num_packets; + /* packets received through this channel */ + uint64_t packets_recved; /* ramblock */ RAMBlock *block; /* ramblock host address */ diff --git a/migration/multifd.c b/migration/multifd.c index 518f9de723..eca76e2c18 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -288,6 +288,8 @@ static void multifd_send_fill_packet(MultiFDSendParams *p) packet->offset[i] = cpu_to_be64(temp); } + + p->packets_sent++; } static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) @@ -335,6 +337,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) p->next_packet_size = be32_to_cpu(packet->next_packet_size); p->packet_num = be64_to_cpu(packet->packet_num); + p->packets_recved++; if (p->normal_num == 0) { return 0; @@ -688,8 +691,6 @@ static void *multifd_send_thread(void *opaque) ret = -1; goto out; } - /* initial packet */ - p->num_packets = 1; while (true) { qemu_sem_post(&multifd_send_state->channels_ready); @@ -719,7 +720,6 @@ static void *multifd_send_thread(void *opaque) } multifd_send_fill_packet(p); - p->num_packets++; p->total_normal_pages += pages->num; trace_multifd_send(p->id, packet_num, pages->num, p->flags, p->next_packet_size); @@ -787,7 +787,7 @@ out: rcu_unregister_thread(); migration_threads_remove(thread); - trace_multifd_send_thread_end(p->id, p->num_packets, p->total_normal_pages); + trace_multifd_send_thread_end(p->id, p->packets_sent, p->total_normal_pages); return NULL; } @@ -1124,7 +1124,6 @@ static void *multifd_recv_thread(void *opaque) p->flags &= ~MULTIFD_FLAG_SYNC; trace_multifd_recv(p->id, p->packet_num, p->normal_num, flags, p->next_packet_size); - p->num_packets++; p->total_normal_pages += p->normal_num; qemu_mutex_unlock(&p->mutex); @@ -1150,7 +1149,7 @@ static void *multifd_recv_thread(void *opaque) qemu_mutex_unlock(&p->mutex); rcu_unregister_thread(); - trace_multifd_recv_thread_end(p->id, p->num_packets, p->total_normal_pages); + trace_multifd_recv_thread_end(p->id, p->packets_recved, p->total_normal_pages); return NULL; } @@ -1252,8 +1251,6 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp) } p->c = ioc; object_ref(OBJECT(ioc)); - /* initial packet */ - p->num_packets = 1; p->running = true; qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p, From patchwork Thu Feb 8 03:05:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549249 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4E0C2C4828F for ; Thu, 8 Feb 2024 03:08:37 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXuke-0006pw-12; Wed, 07 Feb 2024 22:07:02 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujt-0006Hi-HJ for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:15 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujr-0004MV-Vd for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:13 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361571; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Wch729pW3cWXDkHx07WlSRgxQRZ8MhkzYhr59IzkERo=; b=fC/lTdKX8bQYP66T5POYMmFDFyEcPMMwkYyDHGQpp38QdmqD7C18WPV4Dpd54YF6yPHSdW vhwrELWuv18lj+WWkDA9NwLFdWjzNl4vOn0qR2ykAphQ5FOWg1D830aSu3cjGvHGnRvfFS n4nhGxtWUCd3Ya8EiHhWpIzZQK+FCCM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-348-3W1o2FTuMGaMz0r94OfvRQ-1; Wed, 07 Feb 2024 22:06:07 -0500 X-MC-Unique: 3W1o2FTuMGaMz0r94OfvRQ-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6978B101FA2D; Thu, 8 Feb 2024 03:06:07 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 27CA7492BC7; Thu, 8 Feb 2024 03:06:04 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 11/34] migration/multifd: Move total_normal_pages accounting Date: Thu, 8 Feb 2024 11:05:05 +0800 Message-ID: <20240208030528.368214-12-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Just like the previous patch, move the accounting for total_normal_pages on both src/dst sides into the packet fill/unfill procedures. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-11-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index eca76e2c18..94a0124934 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -290,6 +290,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p) } p->packets_sent++; + p->total_normal_pages += pages->num; } static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) @@ -338,6 +339,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) p->next_packet_size = be32_to_cpu(packet->next_packet_size); p->packet_num = be64_to_cpu(packet->packet_num); p->packets_recved++; + p->total_normal_pages += p->normal_num; if (p->normal_num == 0) { return 0; @@ -720,7 +722,6 @@ static void *multifd_send_thread(void *opaque) } multifd_send_fill_packet(p); - p->total_normal_pages += pages->num; trace_multifd_send(p->id, packet_num, pages->num, p->flags, p->next_packet_size); @@ -1124,7 +1125,6 @@ static void *multifd_recv_thread(void *opaque) p->flags &= ~MULTIFD_FLAG_SYNC; trace_multifd_recv(p->id, p->packet_num, p->normal_num, flags, p->next_packet_size); - p->total_normal_pages += p->normal_num; qemu_mutex_unlock(&p->mutex); if (p->normal_num) { From patchwork Thu Feb 8 03:05:06 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549264 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 08151C4829B for ; Thu, 8 Feb 2024 03:10:53 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXuku-0007GW-6E; Wed, 07 Feb 2024 22:07:16 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujy-0006Kj-Rc for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:21 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujx-0004Mt-5q for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:18 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361576; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mVjPcPPLJ3RpMmbR7Nkhhoz2QLEcIP6RZZBw/2/XOyQ=; b=HLKBc4f/8nBljIWeu2gmJhXepc4g+ynEM1HcPoFRcI/Xs9ZChQxYy4s6hYtBvGn1yuUzRz k/l6z71ZKb0DMPZhDj7W0bW2Jn3rmwI51baonUaY6ZN25F7kgrWaBU/7udPiooHUXmLfed MD/F3CzNjR+uvd8W+aLQDoUiEf7Vz8I= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-635-CjK9iwiDOZaE90mNVcC8sQ-1; Wed, 07 Feb 2024 22:06:10 -0500 X-MC-Unique: CjK9iwiDOZaE90mNVcC8sQ-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 95060101FA2C; Thu, 8 Feb 2024 03:06:10 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 395CE492BC7; Thu, 8 Feb 2024 03:06:07 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 12/34] migration/multifd: Move trace_multifd_send|recv() Date: Thu, 8 Feb 2024 11:05:06 +0800 Message-ID: <20240208030528.368214-13-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Move them into fill/unfill of packets. With that, we can further cleanup the send/recv thread procedure, and remove one more temp var. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-12-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 94a0124934..44163e4e28 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -291,6 +291,9 @@ static void multifd_send_fill_packet(MultiFDSendParams *p) p->packets_sent++; p->total_normal_pages += pages->num; + + trace_multifd_send(p->id, p->packet_num, pages->num, p->flags, + p->next_packet_size); } static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) @@ -341,6 +344,9 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) p->packets_recved++; p->total_normal_pages += p->normal_num; + trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->flags, + p->next_packet_size); + if (p->normal_num == 0) { return 0; } @@ -704,7 +710,6 @@ static void *multifd_send_thread(void *opaque) qemu_mutex_lock(&p->mutex); if (qatomic_read(&p->pending_job)) { - uint64_t packet_num = p->packet_num; MultiFDPages_t *pages = p->pages; if (use_zero_copy_send) { @@ -722,8 +727,6 @@ static void *multifd_send_thread(void *opaque) } multifd_send_fill_packet(p); - trace_multifd_send(p->id, packet_num, pages->num, p->flags, - p->next_packet_size); if (use_zero_copy_send) { /* Send header first, without zerocopy */ @@ -1123,8 +1126,6 @@ static void *multifd_recv_thread(void *opaque) flags = p->flags; /* recv methods don't know how to handle the SYNC flag */ p->flags &= ~MULTIFD_FLAG_SYNC; - trace_multifd_recv(p->id, p->packet_num, p->normal_num, flags, - p->next_packet_size); qemu_mutex_unlock(&p->mutex); if (p->normal_num) { From patchwork Thu Feb 8 03:05:07 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549239 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 50F69C4828F for ; Thu, 8 Feb 2024 03:06:59 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXukF-0006ZO-CI; Wed, 07 Feb 2024 22:06:36 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujy-0006IQ-AC for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:20 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXujw-0004Mo-Iy for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:18 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361576; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4nU6Ac8DCGPpdvPW3rUwbnDqO5IX17ELyXMrLfPktHk=; b=AClL6raaacWWCB/fcUt2tFDvc3i527/n1a/LgsKKx0rfAhbxjl5UYm1bmCqggNJYHQgVUK 2vmLJkqjSPYRTAygCLfrVgBkP0/B+stxWrA/SdmdostTgdtISgZAG/q/qMHugYo8PSU9IE BU9h1bxc8rpdRfVBHHJVBRHQJnJnj30= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-607-AajobNSKND6xAk2vnJ_G9Q-1; Wed, 07 Feb 2024 22:06:13 -0500 X-MC-Unique: AajobNSKND6xAk2vnJ_G9Q-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id A5FC529AC010; Thu, 8 Feb 2024 03:06:13 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 641D5492BC7; Thu, 8 Feb 2024 03:06:11 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 13/34] migration/multifd: multifd_send_prepare_header() Date: Thu, 8 Feb 2024 11:05:07 +0800 Message-ID: <20240208030528.368214-14-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Introduce a helper multifd_send_prepare_header() to setup the header packet for multifd sender. It's fine to setup the IOV[0] _before_ send_prepare() because the packet buffer is already ready, even if the content is to be filled in. With this helper, we can already slightly clean up the zero copy path. Note that I explicitly put it into multifd.h, because I want it inlined directly into multifd*.c where necessary later. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-13-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.h | 8 ++++++++ migration/multifd.c | 16 ++++++++-------- 2 files changed, 16 insertions(+), 8 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 2e4ad0dc56..4ec005f53f 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -209,5 +209,13 @@ typedef struct { void multifd_register_ops(int method, MultiFDMethods *ops); +static inline void multifd_send_prepare_header(MultiFDSendParams *p) +{ + p->iov[0].iov_len = p->packet_len; + p->iov[0].iov_base = p->packet; + p->iovs_num++; +} + + #endif diff --git a/migration/multifd.c b/migration/multifd.c index 44163e4e28..cd4467aff4 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -712,10 +712,14 @@ static void *multifd_send_thread(void *opaque) if (qatomic_read(&p->pending_job)) { MultiFDPages_t *pages = p->pages; - if (use_zero_copy_send) { - p->iovs_num = 0; - } else { - p->iovs_num = 1; + p->iovs_num = 0; + + if (!use_zero_copy_send) { + /* + * Only !zerocopy needs the header in IOV; zerocopy will + * send it separately. + */ + multifd_send_prepare_header(p); } assert(pages->num); @@ -735,10 +739,6 @@ static void *multifd_send_thread(void *opaque) if (ret != 0) { break; } - } else { - /* Send header using the same writev call */ - p->iov[0].iov_len = p->packet_len; - p->iov[0].iov_base = p->packet; } ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL, From patchwork Thu Feb 8 03:05:08 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549257 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9C902C4829B for ; Thu, 8 Feb 2024 03:09:19 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXul7-0007Zj-Aq; Wed, 07 Feb 2024 22:07:29 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXuk4-0006Pr-Ca for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:25 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXuk1-0004ND-9j for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:23 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361580; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=I3vRV0VDr3Uq1n51oj5HlrvT40qebwi3yFZMQzgX7Og=; b=CvMnH8mKWFmAWNk6lRnqwV/nm4uNRJ36WEfaefB3/rpcle7khUl5f6Zsu12uGkV96NGR7G 2hKe3WK+PQpj5hq5FbMaHiKjk4iSSaz56kCkzg5/hcCuNyh6SYLddZBLdjYQm/cr22CZes or2M8V39CJWXTaXzHcBgbPeCW4QzaRE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-324-CgpzQ-C8PI6ZcDBfB8EiPw-1; Wed, 07 Feb 2024 22:06:17 -0500 X-MC-Unique: CgpzQ-C8PI6ZcDBfB8EiPw-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9F45C185A780; Thu, 8 Feb 2024 03:06:16 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 759EE492BC7; Thu, 8 Feb 2024 03:06:14 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 14/34] migration/multifd: Move header prepare/fill into send_prepare() Date: Thu, 8 Feb 2024 11:05:08 +0800 Message-ID: <20240208030528.368214-15-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu This patch redefines the interfacing of ->send_prepare(). It further simplifies multifd_send_thread() especially on zero copy. Now with the new interface, we require the hook to do all the work for preparing the IOVs to send. After it's completed, the IOVs should be ready to be dumped into the specific multifd QIOChannel later. So now the API looks like: p->pages -----------> send_prepare() -------------> IOVs This also prepares for the case where the input can be extended to even not any p->pages. But that's for later. This patch will achieve similar goal of what Fabiano used to propose here: https://lore.kernel.org/r/20240126221943.26628-1-farosas@suse.de However the send() interface may not be necessary. I'm boldly attaching a "Co-developed-by" for Fabiano. Co-developed-by: Fabiano Rosas Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-14-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.h | 1 + migration/multifd-zlib.c | 4 +++ migration/multifd-zstd.c | 4 +++ migration/multifd.c | 61 ++++++++++++++++++---------------------- 4 files changed, 37 insertions(+), 33 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 4ec005f53f..34a2ecb9f4 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -208,6 +208,7 @@ typedef struct { } MultiFDMethods; void multifd_register_ops(int method, MultiFDMethods *ops); +void multifd_send_fill_packet(MultiFDSendParams *p); static inline void multifd_send_prepare_header(MultiFDSendParams *p) { diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c index 100809abc1..012e3bdea1 100644 --- a/migration/multifd-zlib.c +++ b/migration/multifd-zlib.c @@ -123,6 +123,8 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp) int ret; uint32_t i; + multifd_send_prepare_header(p); + for (i = 0; i < pages->num; i++) { uint32_t available = z->zbuff_len - out_size; int flush = Z_NO_FLUSH; @@ -172,6 +174,8 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp) p->next_packet_size = out_size; p->flags |= MULTIFD_FLAG_ZLIB; + multifd_send_fill_packet(p); + return 0; } diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c index 2023edd8cc..dc8fe43e94 100644 --- a/migration/multifd-zstd.c +++ b/migration/multifd-zstd.c @@ -118,6 +118,8 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp) int ret; uint32_t i; + multifd_send_prepare_header(p); + z->out.dst = z->zbuff; z->out.size = z->zbuff_len; z->out.pos = 0; @@ -161,6 +163,8 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp) p->next_packet_size = z->out.pos; p->flags |= MULTIFD_FLAG_ZSTD; + multifd_send_fill_packet(p); + return 0; } diff --git a/migration/multifd.c b/migration/multifd.c index cd4467aff4..6aa44340de 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -50,15 +50,15 @@ typedef struct { /** * nocomp_send_setup: setup send side * - * For no compression this function does nothing. - * - * Returns 0 for success or -1 for error - * * @p: Params for the channel that we are using * @errp: pointer to an error */ static int nocomp_send_setup(MultiFDSendParams *p, Error **errp) { + if (migrate_zero_copy_send()) { + p->write_flags |= QIO_CHANNEL_WRITE_FLAG_ZERO_COPY; + } + return 0; } @@ -88,7 +88,17 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp) */ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp) { + bool use_zero_copy_send = migrate_zero_copy_send(); MultiFDPages_t *pages = p->pages; + int ret; + + if (!use_zero_copy_send) { + /* + * Only !zerocopy needs the header in IOV; zerocopy will + * send it separately. + */ + multifd_send_prepare_header(p); + } for (int i = 0; i < pages->num; i++) { p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i]; @@ -98,6 +108,18 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp) p->next_packet_size = pages->num * p->page_size; p->flags |= MULTIFD_FLAG_NOCOMP; + + multifd_send_fill_packet(p); + + if (use_zero_copy_send) { + /* Send header first, without zerocopy */ + ret = qio_channel_write_all(p->c, (void *)p->packet, + p->packet_len, errp); + if (ret != 0) { + return -1; + } + } + return 0; } @@ -266,7 +288,7 @@ static void multifd_pages_clear(MultiFDPages_t *pages) g_free(pages); } -static void multifd_send_fill_packet(MultiFDSendParams *p) +void multifd_send_fill_packet(MultiFDSendParams *p) { MultiFDPacket_t *packet = p->packet; MultiFDPages_t *pages = p->pages; @@ -688,7 +710,6 @@ static void *multifd_send_thread(void *opaque) MigrationThread *thread = NULL; Error *local_err = NULL; int ret = 0; - bool use_zero_copy_send = migrate_zero_copy_send(); thread = migration_threads_add(p->name, qemu_get_thread_id()); @@ -713,15 +734,6 @@ static void *multifd_send_thread(void *opaque) MultiFDPages_t *pages = p->pages; p->iovs_num = 0; - - if (!use_zero_copy_send) { - /* - * Only !zerocopy needs the header in IOV; zerocopy will - * send it separately. - */ - multifd_send_prepare_header(p); - } - assert(pages->num); ret = multifd_send_state->ops->send_prepare(p, &local_err); @@ -730,17 +742,6 @@ static void *multifd_send_thread(void *opaque) break; } - multifd_send_fill_packet(p); - - if (use_zero_copy_send) { - /* Send header first, without zerocopy */ - ret = qio_channel_write_all(p->c, (void *)p->packet, - p->packet_len, &local_err); - if (ret != 0) { - break; - } - } - ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL, 0, p->write_flags, &local_err); if (ret != 0) { @@ -945,13 +946,7 @@ int multifd_save_setup(Error **errp) p->iov = g_new0(struct iovec, page_count + 1); p->page_size = qemu_target_page_size(); p->page_count = page_count; - - if (migrate_zero_copy_send()) { - p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY; - } else { - p->write_flags = 0; - } - + p->write_flags = 0; multifd_new_send_channel_create(p); } From patchwork Thu Feb 8 03:05:09 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549254 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5A375C4829B for ; Thu, 8 Feb 2024 03:09:05 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulA-0007ea-Fp; Wed, 07 Feb 2024 22:07:34 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukB-0006ZB-0K for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:32 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXuk4-0004NM-BB for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:29 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361583; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cp4T0OV/eWxNPPdn5wej7DTuu59xZM6LTLe/vHPYQB0=; b=JHybcuBgEmNvr9CQ1cEBE3C7R885W007G45WAd+2iGVM2qNA+psIwQxy2Z2UDuVWcEIik+ Nms3oSckHQG2nKagQetc5eJbaw1PpR8TsvOWr7gKjB99iJwjNVy2kdqpq1XBPHAso+k2XZ T2+AaxDBCv068pFUIn+ToopWIDNTM+A= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-358-VsZeWIjlOD2A3XRwGzkBXA-1; Wed, 07 Feb 2024 22:06:20 -0500 X-MC-Unique: VsZeWIjlOD2A3XRwGzkBXA-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 97F0D101FA2C; Thu, 8 Feb 2024 03:06:19 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 405E3492BC7; Thu, 8 Feb 2024 03:06:16 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 15/34] migration/multifd: Forbid spurious wakeups Date: Thu, 8 Feb 2024 11:05:09 +0800 Message-ID: <20240208030528.368214-16-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Now multifd's logic is designed to have no spurious wakeup. I still remember a talk to Juan and he seems to agree we should drop it now, and if my memory was right it was there because multifd used to hit that when still debugging. Let's drop it and see what can explode; as long as it's not reaching soft-freeze. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-15-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 6aa44340de..28b54100cd 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -756,7 +756,9 @@ static void *multifd_send_thread(void *opaque) p->next_packet_size = 0; qatomic_set(&p->pending_job, false); qemu_mutex_unlock(&p->mutex); - } else if (qatomic_read(&p->pending_sync)) { + } else { + /* If not a normal job, must be a sync request */ + assert(qatomic_read(&p->pending_sync)); p->flags = MULTIFD_FLAG_SYNC; multifd_send_fill_packet(p); ret = qio_channel_write_all(p->c, (void *)p->packet, @@ -771,9 +773,6 @@ static void *multifd_send_thread(void *opaque) qatomic_set(&p->pending_sync, false); qemu_mutex_unlock(&p->mutex); qemu_sem_post(&p->sem_sync); - } else { - qemu_mutex_unlock(&p->mutex); - /* sometimes there are spurious wakeups */ } } From patchwork Thu Feb 8 03:05:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549267 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 58256C4828F for ; Thu, 8 Feb 2024 03:11:19 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulM-0000g9-GG; Wed, 07 Feb 2024 22:07:44 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukC-0006ZW-J6 for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:33 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukA-0004NX-Oe for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361588; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wmVOlgbKMIZPUpnLNg5089YHM4K2bPynpzFMwVEBTrI=; b=J8hLITL2VWZ/V0GzGW8GdZOXA+iS/L3lXzMiPVysK5ziqNIKxY0rej7OzRN/FbvfwSrT8X FRGEyVVSehM8Ts9te9m2VUEcihSvNHjDeLTWVJQ6LpmFbF6zHhuUA1vIegr0FSgejbeAct D21WiVa+iWiuf9jPo1g03bRl2ZtI+Yg= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-224-qEEz9NggN86I5Niux8CcdQ-1; Wed, 07 Feb 2024 22:06:23 -0500 X-MC-Unique: qEEz9NggN86I5Niux8CcdQ-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id A91B73806710; Thu, 8 Feb 2024 03:06:22 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 673D9492BC7; Thu, 8 Feb 2024 03:06:20 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 16/34] migration/multifd: Split multifd_send_terminate_threads() Date: Thu, 8 Feb 2024 11:05:10 +0800 Message-ID: <20240208030528.368214-17-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Split multifd_send_terminate_threads() into two functions: - multifd_send_set_error(): used when an error happened on the sender side, set error and quit state only - multifd_send_terminate_threads(): used only by the main thread to kick all multifd send threads out of sleep, for the last recycling. Use multifd_send_set_error() in the three old call sites where only the error will be set. Use multifd_send_terminate_threads() in the last one where the main thread will kick the multifd threads at last in multifd_save_cleanup(). Both helpers will need to set quitting=1. Suggested-by: Fabiano Rosas Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-16-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.c | 27 ++++++++++++++++++--------- migration/trace-events | 2 +- 2 files changed, 19 insertions(+), 10 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 28b54100cd..ba86f9dda5 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -536,10 +536,9 @@ int multifd_queue_page(RAMBlock *block, ram_addr_t offset) return 1; } -static void multifd_send_terminate_threads(Error *err) +/* Multifd send side hit an error; remember it and prepare to quit */ +static void multifd_send_set_error(Error *err) { - int i; - /* * We don't want to exit each threads twice. Depending on where * we get the error, or if there are two independent errors in two @@ -550,8 +549,6 @@ static void multifd_send_terminate_threads(Error *err) return; } - trace_multifd_send_terminate_threads(err != NULL); - if (err) { MigrationState *s = migrate_get_current(); migrate_set_error(s, err); @@ -563,7 +560,19 @@ static void multifd_send_terminate_threads(Error *err) MIGRATION_STATUS_FAILED); } } +} + +static void multifd_send_terminate_threads(void) +{ + int i; + + trace_multifd_send_terminate_threads(); + /* + * Tell everyone we're quitting. No xchg() needed here; we simply + * always set it. + */ + qatomic_set(&multifd_send_state->exiting, 1); for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; @@ -586,7 +595,7 @@ void multifd_save_cleanup(void) if (!migrate_multifd()) { return; } - multifd_send_terminate_threads(NULL); + multifd_send_terminate_threads(); for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; @@ -780,7 +789,7 @@ out: if (ret) { assert(local_err); trace_multifd_send_error(p->id); - multifd_send_terminate_threads(local_err); + multifd_send_set_error(local_err); multifd_send_kick_main(p); error_free(local_err); } @@ -816,7 +825,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task, trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err)); - multifd_send_terminate_threads(err); + multifd_send_set_error(err); multifd_send_kick_main(p); error_free(err); } @@ -898,7 +907,7 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque) } trace_multifd_new_send_channel_async_error(p->id, local_err); - multifd_send_terminate_threads(local_err); + multifd_send_set_error(local_err); multifd_send_kick_main(p); object_unref(OBJECT(ioc)); error_free(local_err); diff --git a/migration/trace-events b/migration/trace-events index de4a743c8a..298ad2b0dd 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -141,7 +141,7 @@ multifd_send_error(uint8_t id) "channel %u" multifd_send_sync_main(long packet_num) "packet num %ld" multifd_send_sync_main_signal(uint8_t id) "channel %u" multifd_send_sync_main_wait(uint8_t id) "channel %u" -multifd_send_terminate_threads(bool error) "error %d" +multifd_send_terminate_threads(void) "" multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages) "channel %u packets %" PRIu64 " normal pages %" PRIu64 multifd_send_thread_start(uint8_t id) "%u" multifd_tls_outgoing_handshake_start(void *ioc, void *tioc, const char *hostname) "ioc=%p tioc=%p hostname=%s" From patchwork Thu Feb 8 03:05:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549255 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A36CAC4829B for ; Thu, 8 Feb 2024 03:09:13 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulF-0007zx-NL; Wed, 07 Feb 2024 22:07:37 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukC-0006ZX-MW for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:33 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukA-0004Nd-Pt for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361589; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8s0tvpWkzV3GQMQJaITUyymUQr64z0Z2GBMc/tUa79g=; b=OPQG0+VeeAyRMaFMqb0QCTe6WtlRGGLmLSFX8Ii71r3VzJIevv9X08TErW5VKIKm943UZZ cfL9iNaStIREKg0GoffjeAh82pWLaRPv7DYRG6S9QSUW0TQJNpP9jFgV8MXgGkrTbhz0MQ yXFwFO+EOnMvhKThJ++d9xcorbjKE3w= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-79-wSZn22o6M9CbEcCaG8y1mg-1; Wed, 07 Feb 2024 22:06:26 -0500 X-MC-Unique: wSZn22o6M9CbEcCaG8y1mg-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id EB9873806710; Thu, 8 Feb 2024 03:06:25 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 781C8492BC7; Thu, 8 Feb 2024 03:06:23 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 17/34] migration/multifd: Change retval of multifd_queue_page() Date: Thu, 8 Feb 2024 11:05:11 +0800 Message-ID: <20240208030528.368214-18-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Using int is an overkill when there're only two options. Change it to a boolean. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-17-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.h | 2 +- migration/multifd.c | 9 +++++---- migration/ram.c | 2 +- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 34a2ecb9f4..a320c53a6f 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -22,7 +22,7 @@ bool multifd_recv_all_channels_created(void); void multifd_recv_new_channel(QIOChannel *ioc, Error **errp); void multifd_recv_sync_main(void); int multifd_send_sync_main(void); -int multifd_queue_page(RAMBlock *block, ram_addr_t offset); +bool multifd_queue_page(RAMBlock *block, ram_addr_t offset); /* Multifd Compression flags */ #define MULTIFD_FLAG_SYNC (1 << 0) diff --git a/migration/multifd.c b/migration/multifd.c index ba86f9dda5..12e587fda8 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -505,7 +505,8 @@ static int multifd_send_pages(void) return 1; } -int multifd_queue_page(RAMBlock *block, ram_addr_t offset) +/* Returns true if enqueue successful, false otherwise */ +bool multifd_queue_page(RAMBlock *block, ram_addr_t offset) { MultiFDPages_t *pages = multifd_send_state->pages; bool changed = false; @@ -519,21 +520,21 @@ int multifd_queue_page(RAMBlock *block, ram_addr_t offset) pages->num++; if (pages->num < pages->allocated) { - return 1; + return true; } } else { changed = true; } if (multifd_send_pages() < 0) { - return -1; + return false; } if (changed) { return multifd_queue_page(block, offset); } - return 1; + return true; } /* Multifd send side hit an error; remember it and prepare to quit */ diff --git a/migration/ram.c b/migration/ram.c index d5b7cd5ac2..4649a81204 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1252,7 +1252,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss) static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset) { - if (multifd_queue_page(block, offset) < 0) { + if (!multifd_queue_page(block, offset)) { return -1; } stat64_add(&mig_stats.normal_pages, 1); From patchwork Thu Feb 8 03:05:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549245 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 11284C4829B for ; Thu, 8 Feb 2024 03:08:18 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulH-0008C3-0C; Wed, 07 Feb 2024 22:07:40 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukF-0006bV-41 for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:36 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukD-0004Ny-GF for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:34 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361592; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PDKyUbSmLP6Qcr0DTtmBT55sngUFPePIR6zJW4JRhd0=; b=K2//3/Zy6/uP+NGA1FMRGVsdzNXS98BCUzZ95rWoPqEETiakHPTrnIQw78AAE5VocVQ0C+ PEqzP3rT1FsM2Y193dragglsJPnndyxM/uPzmsZkiVy5/qbcrbTRD6Bjf2uvNaOYtuGnKw VDLNbH81ULv+BgNM03vN4sf4VDr0LoA= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-633-KTfRADHdNPWZY86ykG-oVw-1; Wed, 07 Feb 2024 22:06:29 -0500 X-MC-Unique: KTfRADHdNPWZY86ykG-oVw-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E4D6D811E81; Thu, 8 Feb 2024 03:06:28 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id BC1E6492BC7; Thu, 8 Feb 2024 03:06:26 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 18/34] migration/multifd: Change retval of multifd_send_pages() Date: Thu, 8 Feb 2024 11:05:12 +0800 Message-ID: <20240208030528.368214-19-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Using int is an overkill when there're only two options. Change it to a boolean. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-18-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 12e587fda8..35d4e8ad1f 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -449,9 +449,10 @@ static void multifd_send_kick_main(MultiFDSendParams *p) * thread is using the channel mutex when changing it, and the channel * have to had finish with its own, otherwise pending_job can't be * false. + * + * Returns true if succeed, false otherwise. */ - -static int multifd_send_pages(void) +static bool multifd_send_pages(void) { int i; static int next_channel; @@ -459,7 +460,7 @@ static int multifd_send_pages(void) MultiFDPages_t *pages = multifd_send_state->pages; if (multifd_send_should_exit()) { - return -1; + return false; } /* We wait here, until at least one channel is ready */ @@ -473,7 +474,7 @@ static int multifd_send_pages(void) next_channel %= migrate_multifd_channels(); for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) { if (multifd_send_should_exit()) { - return -1; + return false; } p = &multifd_send_state->params[i]; /* @@ -502,7 +503,7 @@ static int multifd_send_pages(void) qemu_mutex_unlock(&p->mutex); qemu_sem_post(&p->sem); - return 1; + return true; } /* Returns true if enqueue successful, false otherwise */ @@ -526,7 +527,7 @@ bool multifd_queue_page(RAMBlock *block, ram_addr_t offset) changed = true; } - if (multifd_send_pages() < 0) { + if (!multifd_send_pages()) { return false; } @@ -666,7 +667,7 @@ int multifd_send_sync_main(void) return 0; } if (multifd_send_state->pages->num) { - if (multifd_send_pages() < 0) { + if (!multifd_send_pages()) { error_report("%s: multifd_send_pages fail", __func__); return -1; } From patchwork Thu Feb 8 03:05:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549274 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 15C6CC4829B for ; Thu, 8 Feb 2024 03:12:16 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulP-0000zN-CP; Wed, 07 Feb 2024 22:07:47 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukG-0006cU-L6 for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:37 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukE-0004O3-Sw for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:36 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361594; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BjEECB6U8e4HSNn1mocHtFPk1eY0mOC4/hjT4mwFxKs=; b=bOBOPKRVzLMeSFvFg04WSuXMuikBib0wJTLz4Q4WTCWoyWzcBtoa8ruHXErO4AhhHlwdE2 KJ53FevZF4E+b0SHj8KKw12Z8XFHo7744lFS0JWDIsvoS1gLRqy6K95YXK46T0+G7xQW3M cDBPIBzPGpLsDYXM8oN+u/EHLLVR9MU= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-511-zMFkFRQmOouuQj2iL_oCoA-1; Wed, 07 Feb 2024 22:06:32 -0500 X-MC-Unique: zMFkFRQmOouuQj2iL_oCoA-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1BD4B101FA2C; Thu, 8 Feb 2024 03:06:32 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id B43BC492BC7; Thu, 8 Feb 2024 03:06:29 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 19/34] migration/multifd: Rewrite multifd_queue_page() Date: Thu, 8 Feb 2024 11:05:13 +0800 Message-ID: <20240208030528.368214-20-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu The current multifd_queue_page() is not easy to read and follow. It is not good with a few reasons: - No helper at all to show what exactly does a condition mean; in short, readability is low. - Rely on pages->ramblock being cleared to detect an empty queue. It's slightly an overload of the ramblock pointer, per Fabiano [1], which I also agree. - Contains a self recursion, even if not necessary.. Rewrite this function. We add some comments to make it even clearer on what it does. [1] https://lore.kernel.org/r/87wmrpjzew.fsf@suse.de Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-19-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.c | 56 ++++++++++++++++++++++++++++++--------------- 1 file changed, 37 insertions(+), 19 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 35d4e8ad1f..4ab8e6eff2 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -506,35 +506,53 @@ static bool multifd_send_pages(void) return true; } +static inline bool multifd_queue_empty(MultiFDPages_t *pages) +{ + return pages->num == 0; +} + +static inline bool multifd_queue_full(MultiFDPages_t *pages) +{ + return pages->num == pages->allocated; +} + +static inline void multifd_enqueue(MultiFDPages_t *pages, ram_addr_t offset) +{ + pages->offset[pages->num++] = offset; +} + /* Returns true if enqueue successful, false otherwise */ bool multifd_queue_page(RAMBlock *block, ram_addr_t offset) { - MultiFDPages_t *pages = multifd_send_state->pages; - bool changed = false; + MultiFDPages_t *pages; + +retry: + pages = multifd_send_state->pages; - if (!pages->block) { + /* If the queue is empty, we can already enqueue now */ + if (multifd_queue_empty(pages)) { pages->block = block; + multifd_enqueue(pages, offset); + return true; } - if (pages->block == block) { - pages->offset[pages->num] = offset; - pages->num++; - - if (pages->num < pages->allocated) { - return true; + /* + * Not empty, meanwhile we need a flush. It can because of either: + * + * (1) The page is not on the same ramblock of previous ones, or, + * (2) The queue is full. + * + * After flush, always retry. + */ + if (pages->block != block || multifd_queue_full(pages)) { + if (!multifd_send_pages()) { + return false; } - } else { - changed = true; - } - - if (!multifd_send_pages()) { - return false; - } - - if (changed) { - return multifd_queue_page(block, offset); + goto retry; } + /* Not empty, and we still have space, do it! */ + multifd_enqueue(pages, offset); return true; } From patchwork Thu Feb 8 03:05:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549259 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 52C5BC4829D for ; Thu, 8 Feb 2024 03:09:33 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXukm-00075d-PB; Wed, 07 Feb 2024 22:07:10 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukJ-0006eE-9N for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:42 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukH-0004OJ-FC for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:39 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361596; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Xy7kP91cNG/oDUYircRk5JGvj6nwWok6Sax1Yx9kn2g=; b=i6qa5CR/Hp+PK5UvwCgrOPUpyt0B7Zyf0LIjKpvR1OX5Z9Ppm2RJIlxmAKfZnNcXur6AbD xNIJEfbtoniFz+/Yu3ADiRQQjX6djrWW+OGFSoQcEI9H9m+p/vRbfY9ZQF1kC/KVKrOCB8 iRlFogFoDuehN5Fav2j2D+bImDupI0Y= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-292-t2z2_s14M8if-Roub_6gGg-1; Wed, 07 Feb 2024 22:06:35 -0500 X-MC-Unique: t2z2_s14M8if-Roub_6gGg-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 468663806622; Thu, 8 Feb 2024 03:06:35 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id DF082492BC7; Thu, 8 Feb 2024 03:06:32 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 20/34] migration/multifd: Cleanup multifd_save_cleanup() Date: Thu, 8 Feb 2024 11:05:14 +0800 Message-ID: <20240208030528.368214-21-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Shrink the function by moving relevant works into helpers: move the thread join()s into multifd_send_terminate_threads(), then create two more helpers to cover channel/state cleanups. Add a TODO entry for the thread terminate process because p->running is still buggy. We need to fix it at some point but not yet covered. Suggested-by: Fabiano Rosas Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-20-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.c | 91 +++++++++++++++++++++++++++++---------------- 1 file changed, 59 insertions(+), 32 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 4ab8e6eff2..4cb0d2cc17 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -593,6 +593,11 @@ static void multifd_send_terminate_threads(void) * always set it. */ qatomic_set(&multifd_send_state->exiting, 1); + + /* + * Firstly, kick all threads out; no matter whether they are just idle, + * or blocked in an IO system call. + */ for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; @@ -601,6 +606,21 @@ static void multifd_send_terminate_threads(void) qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); } } + + /* + * Finally recycle all the threads. + * + * TODO: p->running is still buggy, e.g. we can reach here without the + * corresponding multifd_new_send_channel_async() get invoked yet, + * then a new thread can even be created after this function returns. + */ + for (i = 0; i < migrate_multifd_channels(); i++) { + MultiFDSendParams *p = &multifd_send_state->params[i]; + + if (p->running) { + qemu_thread_join(&p->thread); + } + } } static int multifd_send_channel_destroy(QIOChannel *send) @@ -608,6 +628,41 @@ static int multifd_send_channel_destroy(QIOChannel *send) return socket_send_channel_destroy(send); } +static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp) +{ + if (p->registered_yank) { + migration_ioc_unregister_yank(p->c); + } + multifd_send_channel_destroy(p->c); + p->c = NULL; + qemu_mutex_destroy(&p->mutex); + qemu_sem_destroy(&p->sem); + qemu_sem_destroy(&p->sem_sync); + g_free(p->name); + p->name = NULL; + multifd_pages_clear(p->pages); + p->pages = NULL; + p->packet_len = 0; + g_free(p->packet); + p->packet = NULL; + g_free(p->iov); + p->iov = NULL; + multifd_send_state->ops->send_cleanup(p, errp); + + return *errp == NULL; +} + +static void multifd_send_cleanup_state(void) +{ + qemu_sem_destroy(&multifd_send_state->channels_ready); + g_free(multifd_send_state->params); + multifd_send_state->params = NULL; + multifd_pages_clear(multifd_send_state->pages); + multifd_send_state->pages = NULL; + g_free(multifd_send_state); + multifd_send_state = NULL; +} + void multifd_save_cleanup(void) { int i; @@ -615,48 +670,20 @@ void multifd_save_cleanup(void) if (!migrate_multifd()) { return; } + multifd_send_terminate_threads(); - for (i = 0; i < migrate_multifd_channels(); i++) { - MultiFDSendParams *p = &multifd_send_state->params[i]; - if (p->running) { - qemu_thread_join(&p->thread); - } - } for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; Error *local_err = NULL; - if (p->registered_yank) { - migration_ioc_unregister_yank(p->c); - } - multifd_send_channel_destroy(p->c); - p->c = NULL; - qemu_mutex_destroy(&p->mutex); - qemu_sem_destroy(&p->sem); - qemu_sem_destroy(&p->sem_sync); - g_free(p->name); - p->name = NULL; - multifd_pages_clear(p->pages); - p->pages = NULL; - p->packet_len = 0; - g_free(p->packet); - p->packet = NULL; - g_free(p->iov); - p->iov = NULL; - multifd_send_state->ops->send_cleanup(p, &local_err); - if (local_err) { + if (!multifd_send_cleanup_channel(p, &local_err)) { migrate_set_error(migrate_get_current(), local_err); error_free(local_err); } } - qemu_sem_destroy(&multifd_send_state->channels_ready); - g_free(multifd_send_state->params); - multifd_send_state->params = NULL; - multifd_pages_clear(multifd_send_state->pages); - multifd_send_state->pages = NULL; - g_free(multifd_send_state); - multifd_send_state = NULL; + + multifd_send_cleanup_state(); } static int multifd_zero_copy_flush(QIOChannel *c) From patchwork Thu Feb 8 03:05:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549243 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id F26CBC4829B for ; Thu, 8 Feb 2024 03:07:58 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulJ-0000H6-QO; Wed, 07 Feb 2024 22:07:41 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukR-0006is-Vj for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:52 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukO-0004Oj-RH for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:47 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361604; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=p+rWOiKRGfKYYD4s1CHnnYWpK5RIZrtTAQJmSTL7WeM=; b=WxpRdtBB86r/j0PjHAleuK5HjGT/H4BER6VybbVKco3mjx7ujZaJMdoV4rzjsotzrray5s MSTISSRXRisqtboSWpGfOn8hc9onMYKsPiusjv+XTfh99zrj3k8rFTEaNrZG3fg9/BrTOp sCYfdpcgHxHazBZvYO50F8cBl4BXkcM= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-386-ozsVZ_xCO7WTkUf-mRaTRA-1; Wed, 07 Feb 2024 22:06:38 -0500 X-MC-Unique: ozsVZ_xCO7WTkUf-mRaTRA-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6FC2D1C051A6; Thu, 8 Feb 2024 03:06:38 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 15A66492BC7; Thu, 8 Feb 2024 03:06:35 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 21/34] migration/multifd: Cleanup multifd_load_cleanup() Date: Thu, 8 Feb 2024 11:05:15 +0800 Message-ID: <20240208030528.368214-22-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Use similar logic to cleanup the recv side. Note that multifd_recv_terminate_threads() may need some similar rework like the sender side, but let's leave that for later. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-21-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.c | 52 ++++++++++++++++++++++++++------------------- 1 file changed, 30 insertions(+), 22 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 4cb0d2cc17..e2dd2f6e04 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -1070,6 +1070,34 @@ void multifd_load_shutdown(void) } } +static void multifd_recv_cleanup_channel(MultiFDRecvParams *p) +{ + migration_ioc_unregister_yank(p->c); + object_unref(OBJECT(p->c)); + p->c = NULL; + qemu_mutex_destroy(&p->mutex); + qemu_sem_destroy(&p->sem_sync); + g_free(p->name); + p->name = NULL; + p->packet_len = 0; + g_free(p->packet); + p->packet = NULL; + g_free(p->iov); + p->iov = NULL; + g_free(p->normal); + p->normal = NULL; + multifd_recv_state->ops->recv_cleanup(p); +} + +static void multifd_recv_cleanup_state(void) +{ + qemu_sem_destroy(&multifd_recv_state->sem_sync); + g_free(multifd_recv_state->params); + multifd_recv_state->params = NULL; + g_free(multifd_recv_state); + multifd_recv_state = NULL; +} + void multifd_load_cleanup(void) { int i; @@ -1092,29 +1120,9 @@ void multifd_load_cleanup(void) qemu_thread_join(&p->thread); } for (i = 0; i < migrate_multifd_channels(); i++) { - MultiFDRecvParams *p = &multifd_recv_state->params[i]; - - migration_ioc_unregister_yank(p->c); - object_unref(OBJECT(p->c)); - p->c = NULL; - qemu_mutex_destroy(&p->mutex); - qemu_sem_destroy(&p->sem_sync); - g_free(p->name); - p->name = NULL; - p->packet_len = 0; - g_free(p->packet); - p->packet = NULL; - g_free(p->iov); - p->iov = NULL; - g_free(p->normal); - p->normal = NULL; - multifd_recv_state->ops->recv_cleanup(p); + multifd_recv_cleanup_channel(&multifd_recv_state->params[i]); } - qemu_sem_destroy(&multifd_recv_state->sem_sync); - g_free(multifd_recv_state->params); - multifd_recv_state->params = NULL; - g_free(multifd_recv_state); - multifd_recv_state = NULL; + multifd_recv_cleanup_state(); } void multifd_recv_sync_main(void) From patchwork Thu Feb 8 03:05:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549266 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id BA56EC4829B for ; Thu, 8 Feb 2024 03:11:18 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulE-0007tR-Vo; Wed, 07 Feb 2024 22:07:37 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukS-0006it-79 for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:52 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukQ-0004On-0d for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:47 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361604; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=s9gCdiOaSnuVcgNHNR30uoRO4+oGuDwrWMxfdOknJjo=; b=Hfpr3meujD3fWcEwyPLs50djqI4apptq5RrjgKvMxJrJbg2V/ZpmScHRszKyQk74wZPd2n 8NY+QdZMzkCcj48742jXJhpfr3UkZdkWwFoW5jq930Zy/FFqiB4jRDpkHFY7M8ppk6mDB+ Umrqj0Sz7nESLJi0aQkTYi+tVdEIUWc= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-275-K5T8cWuMOhW9IHzV11thKQ-1; Wed, 07 Feb 2024 22:06:41 -0500 X-MC-Unique: K5T8cWuMOhW9IHzV11thKQ-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 690A53806710; Thu, 8 Feb 2024 03:06:41 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 28A51492BC7; Thu, 8 Feb 2024 03:06:38 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 22/34] migration/multifd: Stick with send/recv on function names Date: Thu, 8 Feb 2024 11:05:16 +0800 Message-ID: <20240208030528.368214-23-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Most of the multifd code uses send/recv to represent the two sides, but some rare cases use save/load. Since send/recv is the majority, replacing the save/load use cases to use send/recv globally. Now we reach a consensus on the naming. Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-22-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.h | 10 +++++----- migration/migration.c | 12 ++++++------ migration/multifd.c | 10 +++++----- 3 files changed, 16 insertions(+), 16 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index a320c53a6f..9b40a53cb6 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -13,11 +13,11 @@ #ifndef QEMU_MIGRATION_MULTIFD_H #define QEMU_MIGRATION_MULTIFD_H -int multifd_save_setup(Error **errp); -void multifd_save_cleanup(void); -int multifd_load_setup(Error **errp); -void multifd_load_cleanup(void); -void multifd_load_shutdown(void); +int multifd_send_setup(Error **errp); +void multifd_send_shutdown(void); +int multifd_recv_setup(Error **errp); +void multifd_recv_cleanup(void); +void multifd_recv_shutdown(void); bool multifd_recv_all_channels_created(void); void multifd_recv_new_channel(QIOChannel *ioc, Error **errp); void multifd_recv_sync_main(void); diff --git a/migration/migration.c b/migration/migration.c index b574e66f7b..9b695685b1 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -312,7 +312,7 @@ void migration_incoming_state_destroy(void) { struct MigrationIncomingState *mis = migration_incoming_get_current(); - multifd_load_cleanup(); + multifd_recv_cleanup(); compress_threads_load_cleanup(); if (mis->to_src_file) { @@ -663,7 +663,7 @@ static void process_incoming_migration_bh(void *opaque) trace_vmstate_downtime_checkpoint("dst-precopy-bh-announced"); - multifd_load_shutdown(); + multifd_recv_shutdown(); dirty_bitmap_mig_before_vm_start(); @@ -760,7 +760,7 @@ fail: MIGRATION_STATUS_FAILED); qemu_fclose(mis->from_src_file); - multifd_load_cleanup(); + multifd_recv_cleanup(); compress_threads_load_cleanup(); exit(EXIT_FAILURE); @@ -886,7 +886,7 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) default_channel = !mis->from_src_file; } - if (multifd_load_setup(errp) != 0) { + if (multifd_recv_setup(errp) != 0) { return; } @@ -1332,7 +1332,7 @@ static void migrate_fd_cleanup(MigrationState *s) } bql_lock(); - multifd_save_cleanup(); + multifd_send_shutdown(); qemu_mutex_lock(&s->qemu_file_lock); tmp = s->to_dst_file; s->to_dst_file = NULL; @@ -3630,7 +3630,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) return; } - if (multifd_save_setup(&local_err) != 0) { + if (multifd_send_setup(&local_err) != 0) { migrate_set_error(s, local_err); error_report_err(local_err); migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, diff --git a/migration/multifd.c b/migration/multifd.c index e2dd2f6e04..130f86a1fb 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -663,7 +663,7 @@ static void multifd_send_cleanup_state(void) multifd_send_state = NULL; } -void multifd_save_cleanup(void) +void multifd_send_shutdown(void) { int i; @@ -965,7 +965,7 @@ static void multifd_new_send_channel_create(gpointer opaque) socket_send_channel_create(multifd_new_send_channel_async, opaque); } -int multifd_save_setup(Error **errp) +int multifd_send_setup(Error **errp) { int thread_count; uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size(); @@ -1063,7 +1063,7 @@ static void multifd_recv_terminate_threads(Error *err) } } -void multifd_load_shutdown(void) +void multifd_recv_shutdown(void) { if (migrate_multifd()) { multifd_recv_terminate_threads(NULL); @@ -1098,7 +1098,7 @@ static void multifd_recv_cleanup_state(void) multifd_recv_state = NULL; } -void multifd_load_cleanup(void) +void multifd_recv_cleanup(void) { int i; @@ -1213,7 +1213,7 @@ static void *multifd_recv_thread(void *opaque) return NULL; } -int multifd_load_setup(Error **errp) +int multifd_recv_setup(Error **errp) { int thread_count; uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size(); From patchwork Thu Feb 8 03:05:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549261 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 940A1C4828F for ; Thu, 8 Feb 2024 03:09:48 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulH-0008MU-Tz; Wed, 07 Feb 2024 22:07:39 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukW-0006pz-VG for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:56 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukU-0004P9-0F for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:52 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361608; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=36Q5NpS1cXhMGO51+Jo7M+vKFKUYs03V1nHpn7mJj4c=; b=BlPSmfCqW+9c/7upbvnEgPlIqI3xsOu2ag2Z/xSjxoGstwDtm9A1S8lSOQeGGX4M/miRgS wdHwNSG4QlEcMKMJHGFg7Ji8PG/ZLVxfAjWovx1qAiQHV/D1sZGbo6UofRMQ9BOZfGfDYh wmu+fuLr+cm9X5sCMLgmywaHqJumIkc= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-668--5qo_JUsP3CmZxNKU1ecYw-1; Wed, 07 Feb 2024 22:06:45 -0500 X-MC-Unique: -5qo_JUsP3CmZxNKU1ecYw-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id AC42B845E61; Thu, 8 Feb 2024 03:06:44 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 38665492BC7; Thu, 8 Feb 2024 03:06:41 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas , Elena Ufimtseva Subject: [PULL 23/34] migration/multifd: Fix MultiFDSendParams.packet_num race Date: Thu, 8 Feb 2024 11:05:17 +0800 Message-ID: <20240208030528.368214-24-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu As reported correctly by Fabiano [1] (while per Fabiano, it sourced back to Elena's initial report in Oct 2023), MultiFDSendParams.packet_num is buggy to be assigned and stored. Consider two consequent operations of: (1) queue a job into multifd send thread X, then (2) queue another sync request to the same send thread X. Then the MultiFDSendParams.packet_num will be assigned twice, and the first assignment can get lost already. To avoid that, we move the packet_num assignment from p->packet_num into where the thread will fill in the packet. Use atomic operations to protect the field, making sure there's no race. Note that atomic fetch_add() may not be good for scaling purposes, however multifd should be fine as number of threads should normally not go beyond 16 threads. Let's leave that concern for later but fix the issue first. There's also a trick on how to make it always work even on 32 bit hosts for uint64_t packet number. Switching to uintptr_t as of now to simply the case. It will cause packet number to overflow easier on 32 bit, but that shouldn't be a major concern for now as 32 bit systems is not the major audience for any performance concerns like what multifd wants to address. We also need to move multifd_send_state definition upper, so that multifd_send_fill_packet() can reference it. [1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de Reported-by: Elena Ufimtseva Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-23-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.h | 2 -- migration/multifd.c | 56 +++++++++++++++++++++++++++------------------ 2 files changed, 34 insertions(+), 24 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 9b40a53cb6..98876ff94a 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -97,8 +97,6 @@ typedef struct { bool running; /* multifd flags for each packet */ uint32_t flags; - /* global number of generated multifd packets */ - uint64_t packet_num; /* * The sender thread has work to do if either of below boolean is set. * diff --git a/migration/multifd.c b/migration/multifd.c index 130f86a1fb..b317d57d61 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -45,6 +45,35 @@ typedef struct { uint64_t unused2[4]; /* Reserved for future use */ } __attribute__((packed)) MultiFDInit_t; +struct { + MultiFDSendParams *params; + /* array of pages to sent */ + MultiFDPages_t *pages; + /* + * Global number of generated multifd packets. + * + * Note that we used 'uintptr_t' because it'll naturally support atomic + * operations on both 32bit / 64 bits hosts. It means on 32bit systems + * multifd will overflow the packet_num easier, but that should be + * fine. + * + * Another option is to use QEMU's Stat64 then it'll be 64 bits on all + * hosts, however so far it does not support atomic fetch_add() yet. + * Make it easy for now. + */ + uintptr_t packet_num; + /* send channels ready */ + QemuSemaphore channels_ready; + /* + * Have we already run terminate threads. There is a race when it + * happens that we got one error while we are exiting. + * We will use atomic operations. Only valid values are 0 and 1. + */ + int exiting; + /* multifd ops */ + MultiFDMethods *ops; +} *multifd_send_state; + /* Multifd without compression */ /** @@ -292,13 +321,16 @@ void multifd_send_fill_packet(MultiFDSendParams *p) { MultiFDPacket_t *packet = p->packet; MultiFDPages_t *pages = p->pages; + uint64_t packet_num; int i; packet->flags = cpu_to_be32(p->flags); packet->pages_alloc = cpu_to_be32(p->pages->allocated); packet->normal_pages = cpu_to_be32(pages->num); packet->next_packet_size = cpu_to_be32(p->next_packet_size); - packet->packet_num = cpu_to_be64(p->packet_num); + + packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num); + packet->packet_num = cpu_to_be64(packet_num); if (pages->block) { strncpy(packet->ramblock, pages->block->idstr, 256); @@ -314,7 +346,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p) p->packets_sent++; p->total_normal_pages += pages->num; - trace_multifd_send(p->id, p->packet_num, pages->num, p->flags, + trace_multifd_send(p->id, packet_num, pages->num, p->flags, p->next_packet_size); } @@ -398,24 +430,6 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) return 0; } -struct { - MultiFDSendParams *params; - /* array of pages to sent */ - MultiFDPages_t *pages; - /* global number of generated multifd packets */ - uint64_t packet_num; - /* send channels ready */ - QemuSemaphore channels_ready; - /* - * Have we already run terminate threads. There is a race when it - * happens that we got one error while we are exiting. - * We will use atomic operations. Only valid values are 0 and 1. - */ - int exiting; - /* multifd ops */ - MultiFDMethods *ops; -} *multifd_send_state; - static bool multifd_send_should_exit(void) { return qatomic_read(&multifd_send_state->exiting); @@ -497,7 +511,6 @@ static bool multifd_send_pages(void) */ assert(qatomic_read(&p->pending_job) == false); qatomic_set(&p->pending_job, true); - p->packet_num = multifd_send_state->packet_num++; multifd_send_state->pages = p->pages; p->pages = pages; qemu_mutex_unlock(&p->mutex); @@ -730,7 +743,6 @@ int multifd_send_sync_main(void) trace_multifd_send_sync_main_signal(p->id); qemu_mutex_lock(&p->mutex); - p->packet_num = multifd_send_state->packet_num++; /* * We should be the only user so far, so not possible to be set by * others concurrently. From patchwork Thu Feb 8 03:05:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549263 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3A151C4828F for ; Thu, 8 Feb 2024 03:10:39 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulL-0000Yl-PB; Wed, 07 Feb 2024 22:07:43 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukX-0006q0-3F for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:56 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukU-0004Qc-W2 for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:52 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361610; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LOpRDl/zE/OAEgC9rgz4jRiLK5ook8APjNrgtvwvbQI=; b=jO14D5LtceNwzgmDm+EhOp4sMWzNbrv4wnOaIL9nC3bgALIMu24BBVqhMjepIPCtFoXRfw OpiNTPEAknqhnwlTtQ8yal1YRuRG1Hs0iSi9z/VY3VWHRXbzObXMslAjjgpPptK+Ircpu9 QOv/2ARNprgEt8R1RcIEETVi/e81IWw= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-620-D-xbqwfSP_iFbxs4FwchGQ-1; Wed, 07 Feb 2024 22:06:48 -0500 X-MC-Unique: D-xbqwfSP_iFbxs4FwchGQ-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id EFF07811E81; Thu, 8 Feb 2024 03:06:47 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7C708492BC7; Thu, 8 Feb 2024 03:06:45 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 24/34] migration/multifd: Optimize sender side to be lockless Date: Thu, 8 Feb 2024 11:05:18 +0800 Message-ID: <20240208030528.368214-25-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu When reviewing my attempt to refactor send_prepare(), Fabiano suggested we try out with dropping the mutex in multifd code [1]. I thought about that before but I never tried to change the code. Now maybe it's time to give it a stab. This only optimizes the sender side. The trick here is multifd has a clear provider/consumer model, that the migration main thread publishes requests (either pending_job/pending_sync), while the multifd sender threads are consumers. Here we don't have a lot of complicated data sharing, and the jobs can logically be submitted lockless. Arm the code with atomic weapons. Two things worth mentioning: - For multifd_send_pages(): we can use qatomic_load_acquire() when trying to find a free channel, but that's expensive if we attach one ACQUIRE per channel. Instead, keep the qatomic_read() on reading the pending_job flag as we do already, meanwhile use one smp_mb_acquire() after the loop to guarantee the memory ordering. - For pending_sync: it doesn't have any extra data required since now p->flags are never touched, it should be safe to not use memory barrier. That's different from pending_job. Provide rich comments for all the lockless operations to state how they are paired. With that, we can remove the mutex. [1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de Suggested-by: Fabiano Rosas Reviewed-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240202102857.110210-24-peterx@redhat.com Signed-off-by: Peter Xu --- migration/multifd.h | 2 -- migration/multifd.c | 51 +++++++++++++++++++++++---------------------- 2 files changed, 26 insertions(+), 27 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 98876ff94a..78a2317263 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -91,8 +91,6 @@ typedef struct { /* syncs main thread and channels */ QemuSemaphore sem_sync; - /* this mutex protects the following parameters */ - QemuMutex mutex; /* is this channel thread running */ bool running; /* multifd flags for each packet */ diff --git a/migration/multifd.c b/migration/multifd.c index b317d57d61..fbdb129088 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -501,19 +501,19 @@ static bool multifd_send_pages(void) } } - qemu_mutex_lock(&p->mutex); - assert(!p->pages->num); - assert(!p->pages->block); /* - * Double check on pending_job==false with the lock. In the future if - * we can have >1 requester thread, we can replace this with a "goto - * retry", but that is for later. + * Make sure we read p->pending_job before all the rest. Pairs with + * qatomic_store_release() in multifd_send_thread(). */ - assert(qatomic_read(&p->pending_job) == false); - qatomic_set(&p->pending_job, true); + smp_mb_acquire(); + assert(!p->pages->num); multifd_send_state->pages = p->pages; p->pages = pages; - qemu_mutex_unlock(&p->mutex); + /* + * Making sure p->pages is setup before marking pending_job=true. Pairs + * with the qatomic_load_acquire() in multifd_send_thread(). + */ + qatomic_store_release(&p->pending_job, true); qemu_sem_post(&p->sem); return true; @@ -648,7 +648,6 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp) } multifd_send_channel_destroy(p->c); p->c = NULL; - qemu_mutex_destroy(&p->mutex); qemu_sem_destroy(&p->sem); qemu_sem_destroy(&p->sem_sync); g_free(p->name); @@ -742,14 +741,12 @@ int multifd_send_sync_main(void) trace_multifd_send_sync_main_signal(p->id); - qemu_mutex_lock(&p->mutex); /* * We should be the only user so far, so not possible to be set by * others concurrently. */ assert(qatomic_read(&p->pending_sync) == false); qatomic_set(&p->pending_sync, true); - qemu_mutex_unlock(&p->mutex); qemu_sem_post(&p->sem); } for (i = 0; i < migrate_multifd_channels(); i++) { @@ -796,9 +793,12 @@ static void *multifd_send_thread(void *opaque) if (multifd_send_should_exit()) { break; } - qemu_mutex_lock(&p->mutex); - if (qatomic_read(&p->pending_job)) { + /* + * Read pending_job flag before p->pages. Pairs with the + * qatomic_store_release() in multifd_send_pages(). + */ + if (qatomic_load_acquire(&p->pending_job)) { MultiFDPages_t *pages = p->pages; p->iovs_num = 0; @@ -806,14 +806,12 @@ static void *multifd_send_thread(void *opaque) ret = multifd_send_state->ops->send_prepare(p, &local_err); if (ret != 0) { - qemu_mutex_unlock(&p->mutex); break; } ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL, 0, p->write_flags, &local_err); if (ret != 0) { - qemu_mutex_unlock(&p->mutex); break; } @@ -822,24 +820,31 @@ static void *multifd_send_thread(void *opaque) multifd_pages_reset(p->pages); p->next_packet_size = 0; - qatomic_set(&p->pending_job, false); - qemu_mutex_unlock(&p->mutex); + + /* + * Making sure p->pages is published before saying "we're + * free". Pairs with the smp_mb_acquire() in + * multifd_send_pages(). + */ + qatomic_store_release(&p->pending_job, false); } else { - /* If not a normal job, must be a sync request */ + /* + * If not a normal job, must be a sync request. Note that + * pending_sync is a standalone flag (unlike pending_job), so + * it doesn't require explicit memory barriers. + */ assert(qatomic_read(&p->pending_sync)); p->flags = MULTIFD_FLAG_SYNC; multifd_send_fill_packet(p); ret = qio_channel_write_all(p->c, (void *)p->packet, p->packet_len, &local_err); if (ret != 0) { - qemu_mutex_unlock(&p->mutex); break; } /* p->next_packet_size will always be zero for a SYNC packet */ stat64_add(&mig_stats.multifd_bytes, p->packet_len); p->flags = 0; qatomic_set(&p->pending_sync, false); - qemu_mutex_unlock(&p->mutex); qemu_sem_post(&p->sem_sync); } } @@ -853,10 +858,7 @@ out: error_free(local_err); } - qemu_mutex_lock(&p->mutex); p->running = false; - qemu_mutex_unlock(&p->mutex); - rcu_unregister_thread(); migration_threads_remove(thread); trace_multifd_send_thread_end(p->id, p->packets_sent, p->total_normal_pages); @@ -998,7 +1000,6 @@ int multifd_send_setup(Error **errp) for (i = 0; i < thread_count; i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; - qemu_mutex_init(&p->mutex); qemu_sem_init(&p->sem, 0); qemu_sem_init(&p->sem_sync, 0); p->id = i; From patchwork Thu Feb 8 03:05:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549258 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1F609C4828F for ; Thu, 8 Feb 2024 03:09:32 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulS-0001ej-Lo; Wed, 07 Feb 2024 22:07:50 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukb-0006wB-UG for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:00 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXuka-0004R5-5a for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:06:57 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361615; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bqdspIyEEHDphJ+ZT5r7uaPRH0LLxzv8NL2MWnHp1Hw=; b=YK4CliYHJI8AyWssaQkH1LL/Oo02oUMMoNcy+Anx2Arx025G4XjuX967Llb9dTyYryi2bV dYM7tJUStBHjw+xa2UhzrEaWMzoDX8uWTp0Yc6a20vlLj6KAoiXlESj950uBHP3UKa3AZ5 k/LxwoCmvUi/t3sz8nRFVPSEI9dLSWo= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-16-vQ-EpK2HPpuO54zLtxAcGQ-1; Wed, 07 Feb 2024 22:06:52 -0500 X-MC-Unique: vQ-EpK2HPpuO54zLtxAcGQ-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 8AFB5185A784; Thu, 8 Feb 2024 03:06:51 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 91B0F492BC7; Thu, 8 Feb 2024 03:06:48 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas , Avihai Horon , qemu-stable Subject: [PULL 25/34] migration: Fix logic of channels and transport compatibility check Date: Thu, 8 Feb 2024 11:05:19 +0800 Message-ID: <20240208030528.368214-26-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Avihai Horon The commit in the fixes line mistakenly modified the channels and transport compatibility check logic so it now checks multi-channel support only for socket transport type. Thus, running multifd migration using a transport other than socket that is incompatible with multi-channels (such as "exec") would lead to a segmentation fault instead of an error message. For example: (qemu) migrate_set_capability multifd on (qemu) migrate -d "exec:cat > /tmp/vm_state" Segmentation fault (core dumped) Fix it by checking multi-channel compatibility for all transport types. Cc: qemu-stable Fixes: d95533e1cdcc ("migration: modify migration_channels_and_uri_compatible() for new QAPI syntax") Signed-off-by: Avihai Horon Reviewed-by: Peter Xu Link: https://lore.kernel.org/r/20240125162528.7552-2-avihaih@nvidia.com Signed-off-by: Peter Xu --- migration/migration.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 9b695685b1..b427be8762 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -129,11 +129,17 @@ static bool migration_needs_multiple_sockets(void) return migrate_multifd() || migrate_postcopy_preempt(); } -static bool transport_supports_multi_channels(SocketAddress *saddr) +static bool transport_supports_multi_channels(MigrationAddress *addr) { - return saddr->type == SOCKET_ADDRESS_TYPE_INET || - saddr->type == SOCKET_ADDRESS_TYPE_UNIX || - saddr->type == SOCKET_ADDRESS_TYPE_VSOCK; + if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) { + SocketAddress *saddr = &addr->u.socket; + + return saddr->type == SOCKET_ADDRESS_TYPE_INET || + saddr->type == SOCKET_ADDRESS_TYPE_UNIX || + saddr->type == SOCKET_ADDRESS_TYPE_VSOCK; + } + + return false; } static bool @@ -141,8 +147,7 @@ migration_channels_and_transport_compatible(MigrationAddress *addr, Error **errp) { if (migration_needs_multiple_sockets() && - (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) && - !transport_supports_multi_channels(&addr->u.socket)) { + !transport_supports_multi_channels(addr)) { error_setg(errp, "Migration requires multi-channel URIs (e.g. tcp)"); return false; } From patchwork Thu Feb 8 03:05:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549246 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 719B1C4828F for ; Thu, 8 Feb 2024 03:08:22 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulU-0001qV-C0; Wed, 07 Feb 2024 22:07:52 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXuki-00075s-Kp for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:06 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukg-0004RZ-Oz for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:04 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361619; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vGIrHIQU+h6EjMRrQgJmpPfxPr2te/s8Zc7EDzP32+E=; b=KIfuCEd20yOpXY91Vs2UQ4Z3UUUiI1ZZ6ZT0Xwe7eTvqgi8J4chs2QPvVx2vVvC0OL/ELA PgW5MuETNnggQiayFpdz2Dvg+26y+AisIXAP3kdZSvkGqzGDFE9Uu6O4yzkDoVIgLGkgut zdtxz7dhImm8BD1NdUmRwSlmyndJLPk= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-605-vRVOLyu4PGGLe5a1qic1YA-1; Wed, 07 Feb 2024 22:06:55 -0500 X-MC-Unique: vRVOLyu4PGGLe5a1qic1YA-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 3E86A185A78E; Thu, 8 Feb 2024 03:06:55 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5A2AE492BC7; Thu, 8 Feb 2024 03:06:51 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas , qemu-stable Subject: [PULL 26/34] migration/multifd: Join the TLS thread Date: Thu, 8 Feb 2024 11:05:20 +0800 Message-ID: <20240208030528.368214-27-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Fabiano Rosas We're currently leaking the resources of the TLS thread by not joining it and also overwriting the p->thread pointer altogether. Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake") Cc: qemu-stable Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240206215118.6171-2-farosas@suse.de Signed-off-by: Peter Xu --- migration/multifd.h | 2 ++ migration/multifd.c | 8 +++++++- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/migration/multifd.h b/migration/multifd.h index 78a2317263..720c9d50db 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -73,6 +73,8 @@ typedef struct { char *name; /* channel thread id */ QemuThread thread; + QemuThread tls_thread; + bool tls_thread_created; /* communication channel */ QIOChannel *c; /* is the yank function registered */ diff --git a/migration/multifd.c b/migration/multifd.c index fbdb129088..5551711a2a 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -630,6 +630,10 @@ static void multifd_send_terminate_threads(void) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; + if (p->tls_thread_created) { + qemu_thread_join(&p->tls_thread); + } + if (p->running) { qemu_thread_join(&p->thread); } @@ -921,7 +925,9 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p, trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname); qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing"); p->c = QIO_CHANNEL(tioc); - qemu_thread_create(&p->thread, "multifd-tls-handshake-worker", + + p->tls_thread_created = true; + qemu_thread_create(&p->tls_thread, "multifd-tls-handshake-worker", multifd_tls_handshake_thread, p, QEMU_THREAD_JOINABLE); return true; From patchwork Thu Feb 8 03:05:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549265 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9C680C4828F for ; Thu, 8 Feb 2024 03:11:11 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulZ-0002L3-Fp; Wed, 07 Feb 2024 22:07:57 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXuki-00076S-Sp for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:06 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukh-0004Rn-4C for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:04 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361621; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JCTnp5IEaIuBcbRgM9kMkTTMtCWI6S6885VUUuJ6pl4=; b=TqSQgRZ6RyOsA+xCSaWoxh2I8pvGDpFMe0SIdvoqDZCod3Igv7AXHu0ZDZlibsBSqhQSc4 xus4D1CAFydMEB08a0S0NfaRrhBpKvQZkK0bIDBdNGdeHPooGOH6/1VGROBcDRMneZ5Pl1 nXL79SH9x20YuSgH7/dginOrSD0Au/Q= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-581-PCI2s-kKPDm-iJxn6_Vj1w-1; Wed, 07 Feb 2024 22:07:00 -0500 X-MC-Unique: PCI2s-kKPDm-iJxn6_Vj1w-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id BA661870821; Thu, 8 Feb 2024 03:06:59 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0EDBF492BC7; Thu, 8 Feb 2024 03:06:55 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas , qemu-stable , Avihai Horon , chenyuhui5@huawei.com Subject: [PULL 27/34] migration/multifd: Remove p->running Date: Thu, 8 Feb 2024 11:05:21 +0800 Message-ID: <20240208030528.368214-28-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Fabiano Rosas We currently only need p->running to avoid calling qemu_thread_join() on a non existent thread if the thread has never been created. However, there are at least two bugs in this logic: 1) On the sending side, p->running is set too early and qemu_thread_create() can be skipped due to an error during TLS handshake, leaving the flag set and leading to a crash when multifd_send_cleanup() calls qemu_thread_join(). 2) During exit, the multifd thread clears the flag while holding the channel lock. The counterpart at multifd_send_cleanup() reads the flag outside of the lock and might free the mutex while the multifd thread still has it locked. Fix the first issue by setting the flag right before creating the thread. Rename it from p->running to p->thread_created to clarify its usage. Fix the second issue by not clearing the flag at the multifd thread exit. We don't have any use for that. Note that these bugs are straight-forward logic issues and not race conditions. There is still a gap for races to affect this code due to multifd_send_cleanup() being allowed to run concurrently with the thread creation loop. This issue is solved in the next patches. Cc: qemu-stable Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake") Reported-by: Avihai Horon Reported-by: chenyuhui5@huawei.com Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240206215118.6171-3-farosas@suse.de Signed-off-by: Peter Xu --- migration/multifd.h | 7 ++----- migration/multifd.c | 27 ++++++++++++--------------- 2 files changed, 14 insertions(+), 20 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 720c9d50db..7881980ee6 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -73,6 +73,7 @@ typedef struct { char *name; /* channel thread id */ QemuThread thread; + bool thread_created; QemuThread tls_thread; bool tls_thread_created; /* communication channel */ @@ -93,8 +94,6 @@ typedef struct { /* syncs main thread and channels */ QemuSemaphore sem_sync; - /* is this channel thread running */ - bool running; /* multifd flags for each packet */ uint32_t flags; /* @@ -143,6 +142,7 @@ typedef struct { char *name; /* channel thread id */ QemuThread thread; + bool thread_created; /* communication channel */ QIOChannel *c; /* packet allocated len */ @@ -157,8 +157,6 @@ typedef struct { /* this mutex protects the following parameters */ QemuMutex mutex; - /* is this channel thread running */ - bool running; /* should this thread finish */ bool quit; /* multifd flags for each packet */ @@ -217,4 +215,3 @@ static inline void multifd_send_prepare_header(MultiFDSendParams *p) #endif - diff --git a/migration/multifd.c b/migration/multifd.c index 5551711a2a..e6ac1ad6dc 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -634,7 +634,7 @@ static void multifd_send_terminate_threads(void) qemu_thread_join(&p->tls_thread); } - if (p->running) { + if (p->thread_created) { qemu_thread_join(&p->thread); } } @@ -862,7 +862,6 @@ out: error_free(local_err); } - p->running = false; rcu_unregister_thread(); migration_threads_remove(thread); trace_multifd_send_thread_end(p->id, p->packets_sent, p->total_normal_pages); @@ -953,6 +952,8 @@ static bool multifd_channel_connect(MultiFDSendParams *p, migration_ioc_register_yank(ioc); p->registered_yank = true; p->c = ioc; + + p->thread_created = true; qemu_thread_create(&p->thread, p->name, multifd_send_thread, p, QEMU_THREAD_JOINABLE); return true; @@ -967,7 +968,6 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque) trace_multifd_new_send_channel_async(p->id); if (!qio_task_propagate_error(task, &local_err)) { qio_channel_set_delay(ioc, false); - p->running = true; if (multifd_channel_connect(p, ioc, &local_err)) { return; } @@ -1128,15 +1128,15 @@ void multifd_recv_cleanup(void) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDRecvParams *p = &multifd_recv_state->params[i]; - if (p->running) { - /* - * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code, - * however try to wakeup it without harm in cleanup phase. - */ - qemu_sem_post(&p->sem_sync); - } + /* + * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code, + * however try to wakeup it without harm in cleanup phase. + */ + qemu_sem_post(&p->sem_sync); - qemu_thread_join(&p->thread); + if (p->thread_created) { + qemu_thread_join(&p->thread); + } } for (i = 0; i < migrate_multifd_channels(); i++) { multifd_recv_cleanup_channel(&multifd_recv_state->params[i]); @@ -1222,9 +1222,6 @@ static void *multifd_recv_thread(void *opaque) multifd_recv_terminate_threads(local_err); error_free(local_err); } - qemu_mutex_lock(&p->mutex); - p->running = false; - qemu_mutex_unlock(&p->mutex); rcu_unregister_thread(); trace_multifd_recv_thread_end(p->id, p->packets_recved, p->total_normal_pages); @@ -1330,7 +1327,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp) p->c = ioc; object_ref(OBJECT(ioc)); - p->running = true; + p->thread_created = true; qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p, QEMU_THREAD_JOINABLE); qatomic_inc(&multifd_recv_state->count); From patchwork Thu Feb 8 03:05:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549260 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 8A972C4829D for ; Thu, 8 Feb 2024 03:09:36 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulK-0000Mw-AO; Wed, 07 Feb 2024 22:07:42 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukn-0007EK-MR for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:10 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukl-0004SO-VH for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:09 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361627; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hM7JTowvoXYGlHo4A15T65Zf4ZTl3CP8XX2SKnb7Lko=; b=eUNRtOhLsHeegyLrE4sD8G/oVVYaRYZZvMwYacn5z0DIlkUhb/bKHaeCj9MH8qAcsO13IY PyK92zshfEml7QPf2mCWHHuINKZI+LcogMS9r2AjOlyqlDc2vRhnCPLL0tUVkQiItvyWoN FxQdskxwNE+oiV5OAangJr8Z8Lv0B7o= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-235-8o_JbMc7PWSf1ec4xamrJQ-1; Wed, 07 Feb 2024 22:07:03 -0500 X-MC-Unique: 8o_JbMc7PWSf1ec4xamrJQ-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E443D85A58B; Thu, 8 Feb 2024 03:07:02 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8939E492BC7; Thu, 8 Feb 2024 03:07:00 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 28/34] migration/multifd: Move multifd_send_setup error handling in to the function Date: Thu, 8 Feb 2024 11:05:22 +0800 Message-ID: <20240208030528.368214-29-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Fabiano Rosas Hide the error handling inside multifd_send_setup to make it cleaner for the next patch to move the function around. Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240206215118.6171-4-farosas@suse.de Signed-off-by: Peter Xu --- migration/multifd.h | 2 +- migration/migration.c | 6 +----- migration/multifd.c | 24 +++++++++++++++++------- 3 files changed, 19 insertions(+), 13 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 7881980ee6..8a1cad0996 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -13,7 +13,7 @@ #ifndef QEMU_MIGRATION_MULTIFD_H #define QEMU_MIGRATION_MULTIFD_H -int multifd_send_setup(Error **errp); +bool multifd_send_setup(void); void multifd_send_shutdown(void); int multifd_recv_setup(Error **errp); void multifd_recv_cleanup(void); diff --git a/migration/migration.c b/migration/migration.c index b427be8762..6432a81e8b 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3635,11 +3635,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) return; } - if (multifd_send_setup(&local_err) != 0) { - migrate_set_error(s, local_err); - error_report_err(local_err); - migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, - MIGRATION_STATUS_FAILED); + if (!multifd_send_setup()) { migrate_fd_cleanup(s); return; } diff --git a/migration/multifd.c b/migration/multifd.c index e6ac1ad6dc..cf865edba0 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -985,14 +985,16 @@ static void multifd_new_send_channel_create(gpointer opaque) socket_send_channel_create(multifd_new_send_channel_async, opaque); } -int multifd_send_setup(Error **errp) +bool multifd_send_setup(void) { - int thread_count; + MigrationState *s = migrate_get_current(); + Error *local_err = NULL; + int thread_count, ret = 0; uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size(); uint8_t i; if (!migrate_multifd()) { - return 0; + return true; } thread_count = migrate_multifd_channels(); @@ -1026,14 +1028,22 @@ int multifd_send_setup(Error **errp) for (i = 0; i < thread_count; i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; - int ret; - ret = multifd_send_state->ops->send_setup(p, errp); + ret = multifd_send_state->ops->send_setup(p, &local_err); if (ret) { - return ret; + break; } } - return 0; + + if (ret) { + migrate_set_error(s, local_err); + error_report_err(local_err); + migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, + MIGRATION_STATUS_FAILED); + return false; + } + + return true; } struct { From patchwork Thu Feb 8 03:05:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549250 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 32E77C4828F for ; Thu, 8 Feb 2024 03:08:44 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulL-0000UZ-9f; Wed, 07 Feb 2024 22:07:43 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukp-0007Fd-2Q for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:12 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukm-0004SS-LL for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:10 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361627; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ghuKxeFZpBanNogPdnPaumsPzZ5xJQiFLT94YYQw2kw=; b=IrIeLtrFlfCnMrejpPDb+7lfvJxUnd4tWZDuIyz9P2ELIe1KvhJAXVrT6QkUYsWyqx3T7T ltaPkdZ9WYRCzqMy3OjYPs2rOeIeU+pJJIQuSP4jy6GabI4SqJLObvPSdXIjqq9lWoCG1o 290qINeUk5D3Tl7hQM40lU1fYtWFO2M= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-203-I4BFMWgDMiqFfPVzTr1ToA-1; Wed, 07 Feb 2024 22:07:06 -0500 X-MC-Unique: I4BFMWgDMiqFfPVzTr1ToA-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 33EF2859552; Thu, 8 Feb 2024 03:07:06 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id B3592492BC7; Thu, 8 Feb 2024 03:07:03 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 29/34] migration/multifd: Move multifd_send_setup into migration thread Date: Thu, 8 Feb 2024 11:05:23 +0800 Message-ID: <20240208030528.368214-30-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Fabiano Rosas We currently have an unfavorable situation around multifd channels creation and the migration thread execution. We create the multifd channels with qio_channel_socket_connect_async -> qio_task_run_in_thread, but only connect them at the multifd_new_send_channel_async callback, called from qio_task_complete, which is registered as a glib event. So at multifd_send_setup() we create the channels, but they will only be actually usable after the whole multifd_send_setup() calling stack returns back to the main loop. Which means that the migration thread is already up and running without any possibility for the multifd channels to be ready on time. We currently rely on the channels-ready semaphore blocking multifd_send_sync_main() until channels start to come up and release it. However there have been bugs recently found when a channel's creation fails and multifd_send_cleanup() is allowed to run while other channels are still being created. Let's start to organize this situation by moving the multifd_send_setup() call into the migration thread. That way we unblock the main-loop to dispatch the completion callbacks and actually have a chance of getting the multifd channels ready for when the migration thread needs them. The next patches will deal with the synchronization aspects. Note that this takes multifd_send_setup() out of the BQL. Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240206215118.6171-5-farosas@suse.de Signed-off-by: Peter Xu --- migration/migration.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 6432a81e8b..ab21de2cad 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3327,6 +3327,10 @@ static void *migration_thread(void *opaque) object_ref(OBJECT(s)); update_iteration_initial_status(s); + if (!multifd_send_setup()) { + goto out; + } + bql_lock(); qemu_savevm_state_header(s->to_dst_file); bql_unlock(); @@ -3398,6 +3402,7 @@ static void *migration_thread(void *opaque) urgent = migration_rate_limit(); } +out: trace_migration_thread_after_loop(); migration_iteration_finish(s); object_unref(OBJECT(s)); @@ -3635,11 +3640,6 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) return; } - if (!multifd_send_setup()) { - migrate_fd_cleanup(s); - return; - } - if (migrate_background_snapshot()) { qemu_thread_create(&s->thread, "bg_snapshot", bg_migration_thread, s, QEMU_THREAD_JOINABLE); From patchwork Thu Feb 8 03:05:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549248 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 43C94C4829B for ; Thu, 8 Feb 2024 03:08:28 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulN-0000my-MR; Wed, 07 Feb 2024 22:07:45 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukt-0007Lg-Jg for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:16 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukr-0004Sp-Pc for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:15 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361633; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Q4Vnj6oBPJEBsdh78JffL2RsD5I0M/JGFxJ+1Q6n0PI=; b=ZaawgheYBSJTPHxhEB+Ld5x3ckTD/cLmuj7MWpXGHwoC68UrN+a4Xo+51qwdaB3Nas264I VbGsm1MKXIK7uMZecm4M3Jn15S17NjPXvLgyTYOjWUION6OZBYm9y10FwgyzRVLLdadjLV rbdbI9YYPko7ATc3ucyucJ+9lqRhsm4= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-615-yYCQOGfLOu2LshR2va399A-1; Wed, 07 Feb 2024 22:07:09 -0500 X-MC-Unique: yYCQOGfLOu2LshR2va399A-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 44A6C185A780; Thu, 8 Feb 2024 03:07:09 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 03033492BC7; Thu, 8 Feb 2024 03:07:06 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas Subject: [PULL 30/34] migration/multifd: Unify multifd and TLS connection paths Date: Thu, 8 Feb 2024 11:05:24 +0800 Message-ID: <20240208030528.368214-31-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Fabiano Rosas During multifd channel creation (multifd_send_new_channel_async) when TLS is enabled, the multifd_channel_connect function is called twice, once to create the TLS handshake thread and another time after the asynchrounous TLS handshake has finished. This creates a slightly confusing call stack where multifd_channel_connect() is called more times than the number of channels. It also splits error handling between the two callers of multifd_channel_connect() causing some code duplication. Lastly, it gets in the way of having a single point to determine whether all channel creation tasks have been initiated. Refactor the code to move the reentrancy one level up at the multifd_new_send_channel_async() level, de-duplicating the error handling and allowing for the next patch to introduce a synchronization point common to all the multifd channel creation, regardless of TLS. Note that the previous code would never fail once p->c had been set. This patch changes this assumption, which affects refcounting, so add comments around object_unref to explain the situation. Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240206215118.6171-6-farosas@suse.de Signed-off-by: Peter Xu --- migration/multifd.c | 83 ++++++++++++++++++++++----------------------- 1 file changed, 40 insertions(+), 43 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index cf865edba0..3db18dc79e 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -869,30 +869,7 @@ out: return NULL; } -static bool multifd_channel_connect(MultiFDSendParams *p, - QIOChannel *ioc, - Error **errp); - -static void multifd_tls_outgoing_handshake(QIOTask *task, - gpointer opaque) -{ - MultiFDSendParams *p = opaque; - QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task)); - Error *err = NULL; - - if (!qio_task_propagate_error(task, &err)) { - trace_multifd_tls_outgoing_handshake_complete(ioc); - if (multifd_channel_connect(p, ioc, &err)) { - return; - } - } - - trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err)); - - multifd_send_set_error(err); - multifd_send_kick_main(p); - error_free(err); -} +static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque); static void *multifd_tls_handshake_thread(void *opaque) { @@ -900,7 +877,7 @@ static void *multifd_tls_handshake_thread(void *opaque) QIOChannelTLS *tioc = QIO_CHANNEL_TLS(p->c); qio_channel_tls_handshake(tioc, - multifd_tls_outgoing_handshake, + multifd_new_send_channel_async, p, NULL, NULL); @@ -920,6 +897,10 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p, return false; } + /* + * Ownership of the socket channel now transfers to the newly + * created TLS channel, which has already taken a reference. + */ object_unref(OBJECT(ioc)); trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname); qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing"); @@ -936,18 +917,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc, Error **errp) { - trace_multifd_set_outgoing_channel( - ioc, object_get_typename(OBJECT(ioc)), - migrate_get_current()->hostname); - - if (migrate_channel_requires_tls_upgrade(ioc)) { - /* - * tls_channel_connect will call back to this - * function after the TLS handshake, - * so we mustn't call multifd_send_thread until then - */ - return multifd_tls_channel_connect(p, ioc, errp); - } + qio_channel_set_delay(ioc, false); migration_ioc_register_yank(ioc); p->registered_yank = true; @@ -959,24 +929,51 @@ static bool multifd_channel_connect(MultiFDSendParams *p, return true; } +/* + * When TLS is enabled this function is called once to establish the + * TLS connection and a second time after the TLS handshake to create + * the multifd channel. Without TLS it goes straight into the channel + * creation. + */ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque) { MultiFDSendParams *p = opaque; QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task)); Error *local_err = NULL; + bool ret; trace_multifd_new_send_channel_async(p->id); - if (!qio_task_propagate_error(task, &local_err)) { - qio_channel_set_delay(ioc, false); - if (multifd_channel_connect(p, ioc, &local_err)) { - return; - } + + if (qio_task_propagate_error(task, &local_err)) { + ret = false; + goto out; + } + + trace_multifd_set_outgoing_channel(ioc, object_get_typename(OBJECT(ioc)), + migrate_get_current()->hostname); + + if (migrate_channel_requires_tls_upgrade(ioc)) { + ret = multifd_tls_channel_connect(p, ioc, &local_err); + } else { + ret = multifd_channel_connect(p, ioc, &local_err); } + if (ret) { + return; + } + +out: trace_multifd_new_send_channel_async_error(p->id, local_err); multifd_send_set_error(local_err); multifd_send_kick_main(p); - object_unref(OBJECT(ioc)); + if (!p->c) { + /* + * If no channel has been created, drop the initial + * reference. Otherwise cleanup happens at + * multifd_send_channel_destroy() + */ + object_unref(OBJECT(ioc)); + } error_free(local_err); } From patchwork Thu Feb 8 03:05:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549268 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0B2CEC4828F for ; Thu, 8 Feb 2024 03:11:35 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulM-0000hy-NY; Wed, 07 Feb 2024 22:07:44 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukx-0007Q5-3T for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:21 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukv-0004T5-CP for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:18 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361636; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0uBhWvVhBBBJH3HTQWIcFvxknxyvfEJFnfNd9n4dZVo=; b=RJnKQoVibGrJ1rvevfsIzVPnX5fVCFk+LVlNSEus02mcMAhciuQfQzUuSy+Yr5DAQkc5IJ M1QOC7i4JKXEAJ6GJEcVNSgc3ihojBX92wL2DDha+F1loA373StjRwiT4qzCBLPaHzPbr5 axkGgruLP/wjyW0vtkxbJaiqTj0s3BI= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-44-txyy_h62OTOaqwUCeSD1hw-1; Wed, 07 Feb 2024 22:07:13 -0500 X-MC-Unique: txyy_h62OTOaqwUCeSD1hw-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E0FDC3806710; Thu, 8 Feb 2024 03:07:12 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id EFD8D492BC7; Thu, 8 Feb 2024 03:07:09 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas , Avihai Horon Subject: [PULL 31/34] migration/multifd: Add a synchronization point for channel creation Date: Thu, 8 Feb 2024 11:05:25 +0800 Message-ID: <20240208030528.368214-32-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Fabiano Rosas It is possible that one of the multifd channels fails to be created at multifd_new_send_channel_async() while the rest of the channel creation tasks are still in flight. This could lead to multifd_save_cleanup() executing the qemu_thread_join() loop too early and not waiting for the threads which haven't been created yet, leading to the freeing of resources that the newly created threads will try to access and crash. Add a synchronization point after which there will be no attempts at thread creation and therefore calling multifd_save_cleanup() past that point will ensure it properly waits for the threads. A note about performance: Prior to this patch, if a channel took too long to be established, other channels could finish connecting first and already start taking load. Now we're bounded by the slowest-connecting channel. Reported-by: Avihai Horon Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Link: https://lore.kernel.org/r/20240206215118.6171-7-farosas@suse.de Signed-off-by: Peter Xu --- migration/multifd.c | 32 ++++++++++++++++++++++++++------ 1 file changed, 26 insertions(+), 6 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 3db18dc79e..adfe8c9a0a 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -62,6 +62,11 @@ struct { * Make it easy for now. */ uintptr_t packet_num; + /* + * Synchronization point past which no more channels will be + * created. + */ + QemuSemaphore channels_created; /* send channels ready */ QemuSemaphore channels_ready; /* @@ -622,10 +627,6 @@ static void multifd_send_terminate_threads(void) /* * Finally recycle all the threads. - * - * TODO: p->running is still buggy, e.g. we can reach here without the - * corresponding multifd_new_send_channel_async() get invoked yet, - * then a new thread can even be created after this function returns. */ for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; @@ -670,6 +671,7 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp) static void multifd_send_cleanup_state(void) { + qemu_sem_destroy(&multifd_send_state->channels_created); qemu_sem_destroy(&multifd_send_state->channels_ready); g_free(multifd_send_state->params); multifd_send_state->params = NULL; @@ -954,18 +956,26 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque) if (migrate_channel_requires_tls_upgrade(ioc)) { ret = multifd_tls_channel_connect(p, ioc, &local_err); + if (ret) { + return; + } } else { ret = multifd_channel_connect(p, ioc, &local_err); } +out: + /* + * Here we're not interested whether creation succeeded, only that + * it happened at all. + */ + qemu_sem_post(&multifd_send_state->channels_created); + if (ret) { return; } -out: trace_multifd_new_send_channel_async_error(p->id, local_err); multifd_send_set_error(local_err); - multifd_send_kick_main(p); if (!p->c) { /* * If no channel has been created, drop the initial @@ -998,6 +1008,7 @@ bool multifd_send_setup(void) multifd_send_state = g_malloc0(sizeof(*multifd_send_state)); multifd_send_state->params = g_new0(MultiFDSendParams, thread_count); multifd_send_state->pages = multifd_pages_init(page_count); + qemu_sem_init(&multifd_send_state->channels_created, 0); qemu_sem_init(&multifd_send_state->channels_ready, 0); qatomic_set(&multifd_send_state->exiting, 0); multifd_send_state->ops = multifd_ops[migrate_multifd_compression()]; @@ -1023,6 +1034,15 @@ bool multifd_send_setup(void) multifd_new_send_channel_create(p); } + /* + * Wait until channel creation has started for all channels. The + * creation can still fail, but no more channels will be created + * past this point. + */ + for (i = 0; i < thread_count; i++) { + qemu_sem_wait(&multifd_send_state->channels_created); + } + for (i = 0; i < thread_count; i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; From patchwork Thu Feb 8 03:05:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549253 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id BBDA2C4829B for ; Thu, 8 Feb 2024 03:08:58 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulF-0007x9-JR; Wed, 07 Feb 2024 22:07:37 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXul2-0007XE-Ri for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:29 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXukz-0004TI-GW for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:23 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361641; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EYgcoe+Yf6G873AECalTe8HYrleEX1mHKU34mfd81ew=; b=MhD+NxgzPD/xljDacyooEVIcIXUdC0qExJT/aFLnaD69UtqY/gjGb4/DCWxXwWMbcII6X6 cEM6JSckJjqmzPtNW2gDKUs6pbRyQMB+KXlCdezf8hP+B4xs3UL8k53dVHcA6pwv5WkoHm iExTNxCWC7HRyysGbkXDTbEK00l4vno= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-696-3AtC4yqCOUGWnRAnFHjD3g-1; Wed, 07 Feb 2024 22:07:16 -0500 X-MC-Unique: 3AtC4yqCOUGWnRAnFHjD3g-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 874A685A596; Thu, 8 Feb 2024 03:07:16 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id A3206492BC7; Thu, 8 Feb 2024 03:07:13 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas , =?utf-8?q?Daniel_P?= =?utf-8?q?=2E_Berrang=C3=A9?= Subject: [PULL 32/34] tests/migration-test: Stick with gicv3 in aarch64 test Date: Thu, 8 Feb 2024 11:05:26 +0800 Message-ID: <20240208030528.368214-33-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu Recently we introduced cross-binary migration test. It's always wanted that migration-test uses stable guest ABI for both QEMU binaries in this case, so that both QEMU binaries will be compatible on the migration stream with the cmdline specified. Switch to a static gic version "3" rather than using version "max", so that GIC should be stable now across any future QEMU binaries for migration-test. Here the version can actually be anything as long as the ABI is stable. We choose "3" because it's the majority of what we already use in QEMU while still new enough: "git grep gic-version=3" shows 6 hit, while version 4 has no direct user yet besides "max". Note that even with this change, aarch64 won't be able to work yet with migration cross binary test, but then the only missing piece will be the stable CPU model. Reviewed-by: "Daniel P. Berrangé" Link: https://lore.kernel.org/r/20240207005403.242235-2-peterx@redhat.com Signed-off-by: Peter Xu --- tests/qtest/migration-test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 7675519cfa..8a5bb1752e 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -819,7 +819,7 @@ static int test_migrate_start(QTestState **from, QTestState **to, } else if (strcmp(arch, "aarch64") == 0) { memory_size = "150M"; machine_alias = "virt"; - machine_opts = "gic-version=max"; + machine_opts = "gic-version=3"; arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath); start_address = ARM_TEST_MEM_START; end_address = ARM_TEST_MEM_END; From patchwork Thu Feb 8 03:05:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549247 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AD566C4828F for ; Thu, 8 Feb 2024 03:08:24 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulG-0008Be-SY; Wed, 07 Feb 2024 22:07:38 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXul2-0007XL-Rx for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:29 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXul0-0004TQ-Jm for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:24 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361641; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mWjun7fHHlNrofZutKQfly3Mfgqa+fkpDUBVFGm51eQ=; b=ZkpBW/cydEovMwNDGv2SCtY8ZXUpMrZh3IqwXFzrRSXVzLCwpvNFJX+L+j06NYJkwGO1Ja 7D1ZypqmJTTtp7eVDI31tYHaHfNpv4fV1vcz5/7vDWg3vmVGvw5njRmcG23LHu36wViRCx 391tsIC+AURuuQTop6gza+ckDu7HiPg= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-508-irj1cScEOiqdcrQhKn6TKw-1; Wed, 07 Feb 2024 22:07:20 -0500 X-MC-Unique: irj1cScEOiqdcrQhKn6TKw-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id CCDCD3C0F18A; Thu, 8 Feb 2024 03:07:19 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 56F40492BC7; Thu, 8 Feb 2024 03:07:16 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas , =?utf-8?q?Daniel_P?= =?utf-8?q?=2E_Berrang=C3=A9?= Subject: [PULL 33/34] ci: Remove tag dependency for build-previous-qemu Date: Thu, 8 Feb 2024 11:05:27 +0800 Message-ID: <20240208030528.368214-34-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu The new build-previous-qemu job relies on QEMU release tag being present, while that may not be always true for personal git repositories since by default tag is not pushed. The job can fail on those CI kicks, as reported by Peter Maydell. Fix it by fetching the tags remotely from the official repository, as suggested by Dan. [1] https://lore.kernel.org/r/ZcC9ScKJ7VvqektA@redhat.com Reported-by: Peter Maydell Suggested-by: "Daniel P. Berrangé" Reviewed-by: "Daniel P. Berrangé" Link: https://lore.kernel.org/r/20240207005403.242235-3-peterx@redhat.com Signed-off-by: Peter Xu --- .gitlab-ci.d/buildtest.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml index 79bbc8585b..cfe95c1b17 100644 --- a/.gitlab-ci.d/buildtest.yml +++ b/.gitlab-ci.d/buildtest.yml @@ -189,6 +189,8 @@ build-previous-qemu: TARGETS: x86_64-softmmu aarch64-softmmu before_script: - export QEMU_PREV_VERSION="$(sed 's/\([0-9.]*\)\.[0-9]*/v\1.0/' VERSION)" + - git remote add upstream https://gitlab.com/qemu-project/qemu + - git fetch upstream $QEMU_PREV_VERSION - git checkout $QEMU_PREV_VERSION after_script: - mv build build-previous From patchwork Thu Feb 8 03:05:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13549244 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1BE98C4828F for ; Thu, 8 Feb 2024 03:08:13 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rXulG-0008An-Rl; Wed, 07 Feb 2024 22:07:38 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXul8-0007is-7g for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:31 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rXul6-0004UI-FN for qemu-devel@nongnu.org; Wed, 07 Feb 2024 22:07:29 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1707361646; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9yyTNfUk5Njdt3gYz4rbQl9fF1WbnjVbxLHTtUtcj4A=; b=d0i+T2KYJGNVVa5zi4CNTptKTIH4Caj0mQEkqew+w5pDJXcmZu3/AIWBxiuxCQZj86Z8d6 TN46zliLvg44xMIcpLXtTSghjYK3pL3s8x5fXefbklUymBHh2Sdxup3S8lKCb8n6z6Dqhs 4C14W19j84KfP8rsvbSETpCIloFoGoQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-619-986EsJx0Ogedimj4tAckZw-1; Wed, 07 Feb 2024 22:07:23 -0500 X-MC-Unique: 986EsJx0Ogedimj4tAckZw-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 65C8E101FA2D; Thu, 8 Feb 2024 03:07:23 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.45]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9AAA9492BC7; Thu, 8 Feb 2024 03:07:20 +0000 (UTC) From: peterx@redhat.com To: Peter Maydell , qemu-devel@nongnu.org Cc: peterx@redhat.com, Fabiano Rosas , =?utf-8?q?Daniel_P?= =?utf-8?q?=2E_Berrang=C3=A9?= Subject: [PULL 34/34] ci: Update comment for migration-compat-aarch64 Date: Thu, 8 Feb 2024 11:05:28 +0800 Message-ID: <20240208030528.368214-35-peterx@redhat.com> In-Reply-To: <20240208030528.368214-1-peterx@redhat.com> References: <20240208030528.368214-1-peterx@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.106, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Peter Xu It turns out that we may not be able to enable this test even for the upcoming v9.0. Document what we're still missing. Reviewed-by: "Daniel P. Berrangé" Link: https://lore.kernel.org/r/20240207005403.242235-4-peterx@redhat.com Signed-off-by: Peter Xu --- .gitlab-ci.d/buildtest.yml | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml index cfe95c1b17..f56df59c94 100644 --- a/.gitlab-ci.d/buildtest.yml +++ b/.gitlab-ci.d/buildtest.yml @@ -219,9 +219,10 @@ build-previous-qemu: - QTEST_QEMU_BINARY_DST=./qemu-system-${TARGET} QTEST_QEMU_BINARY=../build/qemu-system-${TARGET} ./tests/qtest/migration-test -# This job is disabled until we release 9.0. The existing -# migration-test in 8.2 is broken on aarch64. The fix was already -# commited, but it will only take effect once 9.0 is out. +# This job needs to be disabled until we can have an aarch64 CPU model that +# will both (1) support both KVM and TCG, and (2) provide a stable ABI. +# Currently only "-cpu max" can provide (1), however it doesn't guarantee +# (2). Mark this test skipped until later. migration-compat-aarch64: extends: .migration-compat-common variables: