From patchwork Tue Nov 28 10:42:53 2023
X-Patchwork-Submitter: Eugenio Perez Martin
X-Patchwork-Id: 13470964
From: Eugenio Pérez
To: qemu-devel@nongnu.org
Cc: Gautam Dawar, Jason Wang, Zhu Lingshan, yin31149@gmail.com,
 Shannon Nelson, "Michael S. Tsirkin", Dragos Tatulea, Yajun Wu,
 Juan Quintela, Laurent Vivier, Stefano Garzarella, Parav Pandit,
 Lei Yang, si-wei.liu@oracle.com
Subject: [RFC PATCH v2 00/10] Map memory at destination .load_setup in
 vDPA-net migration
Date: Tue, 28 Nov 2023 11:42:53 +0100
Message-Id: <20231128104303.3314000-1-eperezma@redhat.com>

Current memory operations like pinning may
take a lot of time at the destination. Currently they are done after the
source of the migration is stopped, and before the workload is resumed at
the destination. This is a period where neither traffic can flow nor the
VM workload can continue (downtime).

We can do better, as we know the memory layout of the guest RAM at the
destination from the moment the migration starts. Moving that operation
earlier allows QEMU to communicate the maps to the kernel while the
workload is still running in the source, so Linux can start mapping them.

Also, the migration of the guest memory may finish before the destination
QEMU maps all the memory. In this case, the rest of the memory will be
mapped at the same time as before applying this series, when the device
is starting. So this series can only improve on the current behavior.

RFC TODO: We should be able to not finish the migration while the memory
is still not mapped, but I still need to find out how. Suggestions are
welcome.

Note that further device setup at the end of the migration may alter the
guest memory layout. But, same as the previous point, many operations are
still done incrementally, like memory pinning, so we're saving time
anyway.

Only tested with vdpa_sim. I'm sending this before a full benchmark, as
some work like [1] can be based on it, and Si-Wei agreed to benchmark
this series, given his experience.

This needs to be applied on top of [2], which performs some code
reorganization that allows mapping the memory without knowing the queue
layout the guest configures on the device.

Future directions on top of this series may include:
* Iterative migration of virtio-net devices, as it may reduce downtime
  per [1]. vhost-vdpa net can apply the configuration through CVQ in the
  destination while the source is still migrating.
* Move more things ahead of migration time, like DRIVER_OK.
* Check that the devices of the destination are valid, and cancel the
  migration in case they are not.

RFC v2:
* Delegate map to another thread so it does not block QMP.
* Fix not allocating iova_tree if x-svq=on at the destination.
* Rebased on latest master.
* More cleanups of current code, that might be split from this series
  too.

[1] https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566f61@nvidia.com/T/
[2] https://lists.nongnu.org/archive/html/qemu-devel/2023-11/msg05331.html

Eugenio Pérez (10):
  vdpa: do not set virtio status bits if unneeded
  vdpa: make batch_begin_once early return
  vdpa: merge _begin_batch into _batch_begin_once
  vdpa: extract out _dma_end_batch from _listener_commit
  vdpa: factor out stop path of vhost_vdpa_dev_start
  vdpa: check for iova tree initialized at net_client_start
  vdpa: set backend capabilities at vhost_vdpa_init
  vdpa: add vhost_vdpa_load_setup
  vdpa: add vhost_vdpa_net_load_setup NetClient callback
  virtio_net: register incremental migration handlers

 include/hw/virtio/vhost-vdpa.h |  25 ++++
 include/net/net.h              |   6 +
 hw/net/virtio-net.c            |  35 +++++
 hw/virtio/vhost-vdpa.c         | 257 +++++++++++++++++++++++++++------
 net/vhost-vdpa.c               |  37 ++++-
 5 files changed, 312 insertions(+), 48 deletions(-)
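
For readers new to the idea, the flow described in this cover letter
(start mapping when the incoming migration begins, do it in a separate
thread, and map whatever is left when the device starts) can be pictured
with the toy model below. It is only a sketch and not code from this
series: it uses no QEMU API, every name in it is hypothetical, and plain
pthreads stand in for whatever threading the patches actually use.

```c
/* Toy model only: hypothetical names, no QEMU or vhost-vdpa API involved. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define N_REGIONS 8                 /* pretend guest RAM has 8 regions */

struct early_map {
    pthread_t thread;
    pthread_mutex_t lock;           /* protects 'stop' */
    bool stop;                      /* set once the device has to start */
    bool mapped[N_REGIONS];         /* regions already pinned and mapped */
};

/* Stand-in for the expensive pin + map of one region. */
static void map_region(size_t i)
{
    (void)i;
}

/* Worker: maps regions while the source is still running the workload. */
static void *early_map_worker(void *opaque)
{
    struct early_map *m = opaque;

    for (size_t i = 0; i < N_REGIONS; i++) {
        pthread_mutex_lock(&m->lock);
        bool stop = m->stop;
        pthread_mutex_unlock(&m->lock);
        if (stop) {
            break;                  /* device start maps the rest */
        }
        map_region(i);
        m->mapped[i] = true;        /* read by others only after join */
    }
    return NULL;
}

/* Models the .load_setup moment: kick off mapping without blocking. */
static int early_map_load_setup(struct early_map *m)
{
    memset(m->mapped, 0, sizeof(m->mapped));
    m->stop = false;
    pthread_mutex_init(&m->lock, NULL);
    return pthread_create(&m->thread, NULL, early_map_worker, m);
}

/*
 * Models device start: stop the worker, then map the remainder
 * synchronously, as happens today. Returns the number of regions that
 * still had to be mapped during downtime.
 */
static size_t early_map_dev_start(struct early_map *m)
{
    size_t late = 0;

    pthread_mutex_lock(&m->lock);
    m->stop = true;
    pthread_mutex_unlock(&m->lock);

    int rc = pthread_join(m->thread, NULL);
    assert(rc == 0);
    (void)rc;

    for (size_t i = 0; i < N_REGIONS; i++) {
        if (!m->mapped[i]) {
            map_region(i);
            m->mapped[i] = true;
            late++;
        }
    }
    pthread_mutex_destroy(&m->lock);
    return late;
}
```

In this model, calling early_map_load_setup() and later
early_map_dev_start() always leaves every region mapped. The return value
of early_map_dev_start() is the part that would still be paid during
downtime, which is at most what is paid today.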