From patchwork Fri Dec 13 07:08:43 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13906617
From: Chenyi Qiang <chenyi.qiang@intel.com>
To: David Hildenbrand, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun
Subject: [PATCH 1/7] memory: Export a helper to get intersection of a MemoryRegionSection with a given range
Date: Fri, 13 Dec 2024 15:08:43 +0800
Message-ID: <20241213070852.106092-2-chenyi.qiang@intel.com>
In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com>

Rename virtio-mem's virtio_mem_intersect_memory_section() to
memory_region_section_intersect_range() and move it to system/memory.c,
so the helper becomes generic and available to other users.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: David Hildenbrand
---
 hw/virtio/virtio-mem.c | 32 +++++---------------------------
 include/exec/memory.h  | 13 +++++++++++++
 system/memory.c        | 17 +++++++++++++++++
 3 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 80ada89551..e3d1ccaeeb 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -242,28 +242,6 @@ static int virtio_mem_for_each_plugged_range(VirtIOMEM *vmem, void *arg,
     return ret;
 }
 
-/*
- * Adjust the memory section to cover the intersection with the given range.
- *
- * Returns false if the intersection is empty, otherwise returns true.
- */
-static bool virtio_mem_intersect_memory_section(MemoryRegionSection *s,
-                                                uint64_t offset, uint64_t size)
-{
-    uint64_t start = MAX(s->offset_within_region, offset);
-    uint64_t end = MIN(s->offset_within_region + int128_get64(s->size),
-                       offset + size);
-
-    if (end <= start) {
-        return false;
-    }
-
-    s->offset_within_address_space += start - s->offset_within_region;
-    s->offset_within_region = start;
-    s->size = int128_make64(end - start);
-    return true;
-}
-
 typedef int (*virtio_mem_section_cb)(MemoryRegionSection *s, void *arg);
 
 static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
@@ -285,7 +263,7 @@ static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
                                       first_bit + 1) - 1;
         size = (last_bit - first_bit + 1) * vmem->block_size;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             break;
         }
         ret = cb(&tmp, arg);
@@ -317,7 +295,7 @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
                                       first_bit + 1) - 1;
         size = (last_bit - first_bit + 1) * vmem->block_size;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             break;
         }
         ret = cb(&tmp, arg);
@@ -353,7 +331,7 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         MemoryRegionSection tmp = *rdl->section;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
         rdl->notify_discard(rdl, &tmp);
@@ -369,7 +347,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         MemoryRegionSection tmp = *rdl->section;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
         ret = rdl->notify_populate(rdl, &tmp);
@@ -386,7 +364,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
             if (rdl2 == rdl) {
                 break;
             }
-            if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+            if (!memory_region_section_intersect_range(&tmp, offset, size)) {
                 continue;
             }
             rdl2->notify_discard(rdl2, &tmp);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e5e865d1a9..ec7bc641e8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1196,6 +1196,19 @@ MemoryRegionSection *memory_region_section_new_copy(MemoryRegionSection *s);
  */
 void memory_region_section_free_copy(MemoryRegionSection *s);
 
+/**
+ * memory_region_section_intersect_range: Adjust the memory section to cover
+ * the intersection with the given range.
+ *
+ * @s: the #MemoryRegionSection to be adjusted
+ * @offset: the offset of the given range in the memory region
+ * @size: the size of the given range
+ *
+ * Returns false if the intersection is empty, otherwise returns true.
+ */
+bool memory_region_section_intersect_range(MemoryRegionSection *s,
+                                           uint64_t offset, uint64_t size);
+
 /**
  * memory_region_init: Initialize a memory region
  *
diff --git a/system/memory.c b/system/memory.c
index 85f6834cb3..ddcec90f5e 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2898,6 +2898,23 @@ void memory_region_section_free_copy(MemoryRegionSection *s)
     g_free(s);
 }
 
+bool memory_region_section_intersect_range(MemoryRegionSection *s,
+                                           uint64_t offset, uint64_t size)
+{
+    uint64_t start = MAX(s->offset_within_region, offset);
+    uint64_t end = MIN(s->offset_within_region + int128_get64(s->size),
+                       offset + size);
+
+    if (end <= start) {
+        return false;
+    }
+
+    s->offset_within_address_space += start - s->offset_within_region;
+    s->offset_within_region = start;
+    s->size = int128_make64(end - start);
+    return true;
+}
+
 bool memory_region_present(MemoryRegion *container, hwaddr addr)
 {
     MemoryRegion *mr;
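
For reference, a minimal caller sketch (not part of the patch; RamDiscardListener
and its notify_discard callback come from QEMU's existing RamDiscardManager
infrastructure): clip a copy of a listener's section to a range before
notifying it, exactly as the virtio-mem call sites above do.

    static void notify_range_discarded(RamDiscardListener *rdl,
                                       uint64_t offset, uint64_t size)
    {
        /* Work on a copy: the helper shrinks the section in place. */
        MemoryRegionSection tmp = *rdl->section;

        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
            return; /* no overlap with this listener's section */
        }
        rdl->notify_discard(rdl, &tmp);
    }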
From patchwork Fri Dec 13 07:08:44 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13906612
From: Chenyi Qiang <chenyi.qiang@intel.com>
To: David Hildenbrand, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun
Subject: [PATCH 2/7] guest_memfd: Introduce an object to manage the guest-memfd with RamDiscardManager
Date: Fri, 13 Dec 2024 15:08:44 +0800
Message-ID: <20241213070852.106092-3-chenyi.qiang@intel.com>
In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com>

As commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
discard") highlighted, some subsystems like VFIO may disable RAM block
discard. However, guest_memfd relies on the discard operation to perform
page conversion between private and shared memory. This can lead to a
stale IOMMU mapping issue when assigning a hardware device to a
confidential VM via shared memory (unprotected memory pages). Blocking
shared page discard can solve this problem, but it could cause guests to
consume twice the memory with VFIO, which is not acceptable in some
cases. An alternative solution is to notify other systems, like VFIO, so
that they can refresh their outdated IOMMU mappings.

RamDiscardManager is an existing concept (used by virtio-mem) to adjust
VFIO mappings in relation to VM page assignment. Effectively, a page
conversion is similar to hot-removing a page in one mode and adding it
back in the other, so the same work that happens in response to
virtio-mem changes must happen for page conversion events. Introduce
RamDiscardManager to guest_memfd to achieve this.

However, guest_memfd is not an object, so it cannot directly implement
the RamDiscardManager interface. One solution is to implement the
interface in HostMemoryBackend: any guest_memfd-backed host memory
backend could register itself in the target MemoryRegion. However, this
does not cover the scenario where a guest_memfd MemoryRegion doesn't
belong to a HostMemoryBackend, e.g. the virtual BIOS MemoryRegion. Thus,
choose the second option, i.e. define an object type named
guest_memfd_manager that implements the RamDiscardManager interface.
Upon creation of a guest_memfd, a new guest_memfd_manager object can be
instantiated and registered with the managed guest_memfd MemoryRegion to
handle the page conversion events.

In the context of guest_memfd, the discarded state signifies that the
page is private, while the populated state indicates that the page is
shared. The state of the memory is tracked at the granularity of the
host page size (i.e. block_size), as the minimum conversion size can be
one page per request. In addition, VFIO expects the DMA mapping for a
specific iova to be mapped and unmapped with the same granularity.
However, a confidential VM may request partial conversions, e.g. a
conversion of a small region within a larger region. To prevent such
invalid cases, and until a potential optimization arrives, all
operations are performed at 4K granularity.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 include/sysemu/guest-memfd-manager.h |  46 +++++
 system/guest-memfd-manager.c         | 250 +++++++++++++++++++++++++++
 system/meson.build                   |   1 +
 3 files changed, 297 insertions(+)
 create mode 100644 include/sysemu/guest-memfd-manager.h
 create mode 100644 system/guest-memfd-manager.c
diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h
new file mode 100644
index 0000000000..ba4a99b614
--- /dev/null
+++ b/include/sysemu/guest-memfd-manager.h
@@ -0,0 +1,46 @@
+/*
+ * QEMU guest memfd manager
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Chenyi Qiang <chenyi.qiang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#ifndef SYSEMU_GUEST_MEMFD_MANAGER_H
+#define SYSEMU_GUEST_MEMFD_MANAGER_H
+
+#include "sysemu/hostmem.h"
+
+#define TYPE_GUEST_MEMFD_MANAGER "guest-memfd-manager"
+
+OBJECT_DECLARE_TYPE(GuestMemfdManager, GuestMemfdManagerClass, GUEST_MEMFD_MANAGER)
+
+struct GuestMemfdManager {
+    Object parent;
+
+    /* Managed memory region. */
+    MemoryRegion *mr;
+
+    /*
+     * 1-setting of the bit represents the memory is populated (shared).
+     */
+    int32_t bitmap_size;
+    unsigned long *bitmap;
+
+    /* block size and alignment */
+    uint64_t block_size;
+
+    /* listeners to notify on populate/discard activity. */
+    QLIST_HEAD(, RamDiscardListener) rdl_list;
+};
+
+struct GuestMemfdManagerClass {
+    ObjectClass parent_class;
+};
+
+#endif
diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
new file mode 100644
index 0000000000..d7e105fead
--- /dev/null
+++ b/system/guest-memfd-manager.c
@@ -0,0 +1,250 @@
+/*
+ * QEMU guest memfd manager
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Chenyi Qiang <chenyi.qiang@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "sysemu/guest-memfd-manager.h"
+
+OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(GuestMemfdManager,
+                                          guest_memfd_manager,
+                                          GUEST_MEMFD_MANAGER,
+                                          OBJECT,
+                                          { TYPE_RAM_DISCARD_MANAGER },
+                                          { })
+
+static bool guest_memfd_rdm_is_populated(const RamDiscardManager *rdm,
+                                         const MemoryRegionSection *section)
+{
+    const GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
+    uint64_t first_bit = section->offset_within_region / gmm->block_size;
+    uint64_t last_bit = first_bit + int128_get64(section->size) / gmm->block_size - 1;
+    unsigned long first_discard_bit;
+
+    first_discard_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit);
+    return first_discard_bit > last_bit;
+}
+
+typedef int (*guest_memfd_section_cb)(MemoryRegionSection *s, void *arg);
+
+static int guest_memfd_notify_populate_cb(MemoryRegionSection *section, void *arg)
+{
+    RamDiscardListener *rdl = arg;
+
+    return rdl->notify_populate(rdl, section);
+}
+
+static int guest_memfd_notify_discard_cb(MemoryRegionSection *section, void *arg)
+{
+    RamDiscardListener *rdl = arg;
+
+    rdl->notify_discard(rdl, section);
+
+    return 0;
+}
+
+static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm,
+                                                  MemoryRegionSection *section,
+                                                  void *arg,
+                                                  guest_memfd_section_cb cb)
+{
+    unsigned long first_one_bit, last_one_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_one_bit = section->offset_within_region / gmm->block_size;
+    first_one_bit = find_next_bit(gmm->bitmap, gmm->bitmap_size, first_one_bit);
+
+    while (first_one_bit < gmm->bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_one_bit * gmm->block_size;
+        last_one_bit = find_next_zero_bit(gmm->bitmap, gmm->bitmap_size,
+                                          first_one_bit + 1) - 1;
+        size = (last_one_bit - first_one_bit + 1) * gmm->block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            break;
+        }
+
+        first_one_bit = find_next_bit(gmm->bitmap, gmm->bitmap_size,
+                                      last_one_bit + 2);
+    }
+
+    return ret;
+}
+
+static int guest_memfd_for_each_discarded_section(const GuestMemfdManager *gmm,
+                                                  MemoryRegionSection *section,
+                                                  void *arg,
+                                                  guest_memfd_section_cb cb)
+{
+    unsigned long first_zero_bit, last_zero_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_zero_bit = section->offset_within_region / gmm->block_size;
+    first_zero_bit = find_next_zero_bit(gmm->bitmap, gmm->bitmap_size,
+                                        first_zero_bit);
+
+    while (first_zero_bit < gmm->bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_zero_bit * gmm->block_size;
+        last_zero_bit = find_next_bit(gmm->bitmap, gmm->bitmap_size,
+                                      first_zero_bit + 1) - 1;
+        size = (last_zero_bit - first_zero_bit + 1) * gmm->block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            break;
+        }
+
+        first_zero_bit = find_next_zero_bit(gmm->bitmap, gmm->bitmap_size,
+                                            last_zero_bit + 2);
+    }
+
+    return ret;
+}
+
+static uint64_t guest_memfd_rdm_get_min_granularity(const RamDiscardManager *rdm,
+                                                    const MemoryRegion *mr)
+{
+    GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
+
+    g_assert(mr == gmm->mr);
+    return gmm->block_size;
+}
+
+static void guest_memfd_rdm_register_listener(RamDiscardManager *rdm,
+                                              RamDiscardListener *rdl,
+                                              MemoryRegionSection *section)
+{
+    GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
+    int ret;
+
+    g_assert(section->mr == gmm->mr);
+    rdl->section = memory_region_section_new_copy(section);
+
+    QLIST_INSERT_HEAD(&gmm->rdl_list, rdl, next);
+
+    ret = guest_memfd_for_each_populated_section(gmm, section, rdl,
+                                                 guest_memfd_notify_populate_cb);
+    if (ret) {
+        error_report("%s: Failed to register RAM discard listener: %s", __func__,
+                     strerror(-ret));
+    }
+}
+
+static void guest_memfd_rdm_unregister_listener(RamDiscardManager *rdm,
+                                                RamDiscardListener *rdl)
+{
+    GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
+    int ret;
+
+    g_assert(rdl->section);
+    g_assert(rdl->section->mr == gmm->mr);
+
+    ret = guest_memfd_for_each_populated_section(gmm, rdl->section, rdl,
+                                                 guest_memfd_notify_discard_cb);
+    if (ret) {
+        error_report("%s: Failed to unregister RAM discard listener: %s", __func__,
+                     strerror(-ret));
+    }
+
+    memory_region_section_free_copy(rdl->section);
+    rdl->section = NULL;
+    QLIST_REMOVE(rdl, next);
+}
+
+typedef struct GuestMemfdReplayData {
+    void *fn;
+    void *opaque;
+} GuestMemfdReplayData;
+
+static int guest_memfd_rdm_replay_populated_cb(MemoryRegionSection *section, void *arg)
+{
+    struct GuestMemfdReplayData *data = arg;
+    ReplayRamPopulate replay_fn = data->fn;
+
+    return replay_fn(section, data->opaque);
+}
+
+static int guest_memfd_rdm_replay_populated(const RamDiscardManager *rdm,
+                                            MemoryRegionSection *section,
+                                            ReplayRamPopulate replay_fn,
+                                            void *opaque)
+{
+    GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
+    struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == gmm->mr);
+    return guest_memfd_for_each_populated_section(gmm, section, &data,
+                                                  guest_memfd_rdm_replay_populated_cb);
+}
+
+static int guest_memfd_rdm_replay_discarded_cb(MemoryRegionSection *section, void *arg)
+{
+    struct GuestMemfdReplayData *data = arg;
+    ReplayRamDiscard replay_fn = data->fn;
+
+    replay_fn(section, data->opaque);
+
+    return 0;
+}
+
+static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm,
+                                             MemoryRegionSection *section,
+                                             ReplayRamDiscard replay_fn,
+                                             void *opaque)
+{
+    GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
+    struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == gmm->mr);
+    guest_memfd_for_each_discarded_section(gmm, section, &data,
+                                           guest_memfd_rdm_replay_discarded_cb);
+}
+
+static void guest_memfd_manager_init(Object *obj)
+{
+    GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(obj);
+
+    QLIST_INIT(&gmm->rdl_list);
+}
+
+static void guest_memfd_manager_finalize(Object *obj)
+{
+    g_free(GUEST_MEMFD_MANAGER(obj)->bitmap);
+}
+
+static void guest_memfd_manager_class_init(ObjectClass *oc, void *data)
+{
+    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
+
+    rdmc->get_min_granularity = guest_memfd_rdm_get_min_granularity;
+    rdmc->register_listener = guest_memfd_rdm_register_listener;
+    rdmc->unregister_listener = guest_memfd_rdm_unregister_listener;
+    rdmc->is_populated = guest_memfd_rdm_is_populated;
+    rdmc->replay_populated = guest_memfd_rdm_replay_populated;
+    rdmc->replay_discarded = guest_memfd_rdm_replay_discarded;
+}
diff --git a/system/meson.build b/system/meson.build
index 4952f4b2c7..ed4e1137bd 100644
--- a/system/meson.build
+++ b/system/meson.build
@@ -15,6 +15,7 @@ system_ss.add(files(
   'dirtylimit.c',
   'dma-helpers.c',
   'globals.c',
+  'guest-memfd-manager.c',
   'memory_mapping.c',
   'qdev-monitor.c',
   'qtest.c',
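
As an aside, the bit arithmetic used throughout these helpers is easy to get
wrong; here is a small standalone sketch of the mapping (illustration only,
not part of the series; block_size mirrors qemu_real_host_page_size(),
assumed 4 KiB here):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t block_size = 4096;
        uint64_t offset = 0x3000, size = 0x2000;

        uint64_t first_bit = offset / block_size;              /* bit 3 */
        uint64_t last_bit = first_bit + size / block_size - 1; /* bit 4 */

        /* A set bit means populated (shared); a clear bit means private. */
        printf("range covers bits [%" PRIu64 ", %" PRIu64 "]\n",
               first_bit, last_bit);
        return 0;
    }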
From patchwork Fri Dec 13 07:08:45 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13906614
From: Chenyi Qiang <chenyi.qiang@intel.com>
To: David Hildenbrand, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun
Subject: [PATCH 3/7] guest_memfd: Introduce a callback to notify the shared/private state change
Date: Fri, 13 Dec 2024 15:08:45 +0800
Message-ID: <20241213070852.106092-4-chenyi.qiang@intel.com>
In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com>
Introduce a new state_change() callback in GuestMemfdManagerClass to
efficiently notify all registered RamDiscardListeners, including the
VFIO listener, about memory conversion events in guest_memfd. The
existing VFIO listener can dynamically DMA map/unmap the shared pages
based on the conversion type:
- For conversions from shared to private, the VFIO system ensures the
  shared mapping is discarded from the IOMMU.
- For conversions from private to shared, it triggers the population of
  the shared mapping into the IOMMU.

Additionally, there can be some special conversion requests:
- When a conversion request is made for a page already in the desired
  state, the helper simply returns success.
- For requests involving a range only partially in the desired state,
  only the necessary segments are converted, so the entire range ends up
  complying with the request efficiently.
- In scenarios where a conversion request is declined by another system,
  such as a failure from VFIO during notify_populate(), the helper rolls
  back the request, maintaining consistency.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 include/sysemu/guest-memfd-manager.h |   3 +
 system/guest-memfd-manager.c         | 144 +++++++++++++++++++++++++++
 2 files changed, 147 insertions(+)

diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h
index ba4a99b614..f4b175529b 100644
--- a/include/sysemu/guest-memfd-manager.h
+++ b/include/sysemu/guest-memfd-manager.h
@@ -41,6 +41,9 @@ struct GuestMemfdManager {
 
 struct GuestMemfdManagerClass {
     ObjectClass parent_class;
+
+    int (*state_change)(GuestMemfdManager *gmm, uint64_t offset, uint64_t size,
+                        bool shared_to_private);
 };
 
 #endif
diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
index d7e105fead..6601df5f3f 100644
--- a/system/guest-memfd-manager.c
+++ b/system/guest-memfd-manager.c
@@ -225,6 +225,147 @@ static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm,
                                            guest_memfd_rdm_replay_discarded_cb);
 }
 
+static bool guest_memfd_is_valid_range(GuestMemfdManager *gmm,
+                                       uint64_t offset, uint64_t size)
+{
+    MemoryRegion *mr = gmm->mr;
+
+    g_assert(mr);
+
+    uint64_t region_size = memory_region_size(mr);
+    if (!QEMU_IS_ALIGNED(offset, gmm->block_size)) {
+        return false;
+    }
+    if (offset + size < offset || !size) {
+        return false;
+    }
+    if (offset >= region_size || offset + size > region_size) {
+        return false;
+    }
+    return true;
+}
+
+static void guest_memfd_notify_discard(GuestMemfdManager *gmm,
+                                       uint64_t offset, uint64_t size)
+{
+    RamDiscardListener *rdl;
+
+    QLIST_FOREACH(rdl, &gmm->rdl_list, next) {
+        MemoryRegionSection tmp = *rdl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+
+        guest_memfd_for_each_populated_section(gmm, &tmp, rdl,
+                                               guest_memfd_notify_discard_cb);
+    }
+}
+
+static int guest_memfd_notify_populate(GuestMemfdManager *gmm,
+                                       uint64_t offset, uint64_t size)
+{
+    RamDiscardListener *rdl, *rdl2;
+    int ret = 0;
+
+    QLIST_FOREACH(rdl, &gmm->rdl_list, next) {
+        MemoryRegionSection tmp = *rdl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+
+        ret = guest_memfd_for_each_discarded_section(gmm, &tmp, rdl,
+                                                     guest_memfd_notify_populate_cb);
+        if (ret) {
+            break;
+        }
+    }
+
+    if (ret) {
+        /* Notify all already-notified listeners. */
+        QLIST_FOREACH(rdl2, &gmm->rdl_list, next) {
+            MemoryRegionSection tmp = *rdl2->section;
+
+            if (rdl2 == rdl) {
+                break;
+            }
+            if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+                continue;
+            }
+
+            guest_memfd_for_each_discarded_section(gmm, &tmp, rdl2,
+                                                   guest_memfd_notify_discard_cb);
+        }
+    }
+    return ret;
+}
+
+static bool guest_memfd_is_range_populated(GuestMemfdManager *gmm,
+                                           uint64_t offset, uint64_t size)
+{
+    const unsigned long first_bit = offset / gmm->block_size;
+    const unsigned long last_bit = first_bit + (size / gmm->block_size) - 1;
+    unsigned long found_bit;
+
+    /* We fake a shorter bitmap to avoid searching too far. */
+    found_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit);
+    return found_bit > last_bit;
+}
+
+static bool guest_memfd_is_range_discarded(GuestMemfdManager *gmm,
+                                           uint64_t offset, uint64_t size)
+{
+    const unsigned long first_bit = offset / gmm->block_size;
+    const unsigned long last_bit = first_bit + (size / gmm->block_size) - 1;
+    unsigned long found_bit;
+
+    /* We fake a shorter bitmap to avoid searching too far. */
+    found_bit = find_next_bit(gmm->bitmap, last_bit + 1, first_bit);
+    return found_bit > last_bit;
+}
+
+static int guest_memfd_state_change(GuestMemfdManager *gmm, uint64_t offset,
+                                    uint64_t size, bool shared_to_private)
+{
+    int ret = 0;
+
+    if (!guest_memfd_is_valid_range(gmm, offset, size)) {
+        error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
+                     __func__, offset, size);
+        return -1;
+    }
+
+    if ((shared_to_private && guest_memfd_is_range_discarded(gmm, offset, size)) ||
+        (!shared_to_private && guest_memfd_is_range_populated(gmm, offset, size))) {
+        return 0;
+    }
+
+    if (shared_to_private) {
+        guest_memfd_notify_discard(gmm, offset, size);
+    } else {
+        ret = guest_memfd_notify_populate(gmm, offset, size);
+    }
+
+    if (!ret) {
+        unsigned long first_bit = offset / gmm->block_size;
+        unsigned long nbits = size / gmm->block_size;
+
+        g_assert((first_bit + nbits) <= gmm->bitmap_size);
+
+        if (shared_to_private) {
+            bitmap_clear(gmm->bitmap, first_bit, nbits);
+        } else {
+            bitmap_set(gmm->bitmap, first_bit, nbits);
+        }
+
+        return 0;
+    }
+
+    return ret;
+}
+
 static void guest_memfd_manager_init(Object *obj)
 {
     GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(obj);
@@ -239,8 +380,11 @@ static void guest_memfd_manager_finalize(Object *obj)
 
 static void guest_memfd_manager_class_init(ObjectClass *oc, void *data)
 {
+    GuestMemfdManagerClass *gmmc = GUEST_MEMFD_MANAGER_CLASS(oc);
     RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
 
+    gmmc->state_change = guest_memfd_state_change;
+
     rdmc->get_min_granularity = guest_memfd_rdm_get_min_granularity;
     rdmc->register_listener = guest_memfd_rdm_register_listener;
     rdmc->unregister_listener = guest_memfd_rdm_unregister_listener;
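
A sketch of how a caller is expected to drive this callback (hypothetical
caller; the dispatch helper itself only arrives in the next patch):

    GuestMemfdManagerClass *klass = GUEST_MEMFD_MANAGER_GET_CLASS(gmm);
    int ret = klass->state_change(gmm, offset, size,
                                  false /* private -> shared */);
    if (ret) {
        /*
         * A listener (e.g. VFIO notify_populate()) refused. Listeners
         * that were already notified have been rolled back and the
         * bitmap is left untouched, so the range stays consistently
         * private.
         */
    }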
From patchwork Fri Dec 13 07:08:46 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13906613
From: Chenyi Qiang <chenyi.qiang@intel.com>
To: David Hildenbrand, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun
Subject: [PATCH 4/7] KVM: Notify the state change event during shared/private conversion
Date: Fri, 13 Dec 2024 15:08:46 +0800
Message-ID: <20241213070852.106092-5-chenyi.qiang@intel.com>
In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com>

Introduce a helper to trigger the state_change() callback of the class.
Once KVM exits to userspace to convert a page from private to shared or
vice versa at runtime, notify the event via the helper so that other
registered subsystems like VFIO are informed.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 accel/kvm/kvm-all.c                  |  4 ++++
 include/sysemu/guest-memfd-manager.h | 15 +++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 52425af534..38f41a98a5 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -48,6 +48,7 @@
 #include "kvm-cpus.h"
 #include "sysemu/dirtylimit.h"
 #include "qemu/range.h"
+#include "sysemu/guest-memfd-manager.h"
 #include "hw/boards.h"
 #include "sysemu/stats.h"
 
@@ -3080,6 +3081,9 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
     addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
     rb = qemu_ram_block_from_host(addr, false, &offset);
 
+    guest_memfd_manager_state_change(GUEST_MEMFD_MANAGER(mr->rdm), offset,
+                                     size, to_private);
+
     if (to_private) {
         if (rb->page_size != qemu_real_host_page_size()) {
             /*
diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h
index f4b175529b..9dc4e0346d 100644
--- a/include/sysemu/guest-memfd-manager.h
+++ b/include/sysemu/guest-memfd-manager.h
@@ -46,4 +46,19 @@ struct GuestMemfdManagerClass {
                         bool shared_to_private);
 };
 
+static inline int guest_memfd_manager_state_change(GuestMemfdManager *gmm, uint64_t offset,
+                                                   uint64_t size, bool shared_to_private)
+{
+    GuestMemfdManagerClass *klass;
+
+    g_assert(gmm);
+    klass = GUEST_MEMFD_MANAGER_GET_CLASS(gmm);
+
+    if (klass->state_change) {
+        return klass->state_change(gmm, offset, size, shared_to_private);
+    }
+
+    return 0;
+}
+
 #endif
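
Note that the kvm_convert_memory() hunk above does not check the helper's
return value. A caller that must fail the conversion when a listener refuses
could instead do something like this (sketch only, not part of the patch):

    ret = guest_memfd_manager_state_change(GUEST_MEMFD_MANAGER(mr->rdm),
                                           offset, size, to_private);
    if (ret) {
        /*
         * e.g. VFIO could not DMA-map the newly shared range; bail out
         * rather than run with a stale IOMMU mapping.
         */
        return ret;
    }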
From patchwork Fri Dec 13 07:08:47 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13906618
From: Chenyi Qiang <chenyi.qiang@intel.com>
To: David Hildenbrand, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun
Subject: [PATCH 5/7] memory: Register the RamDiscardManager instance upon guest_memfd creation
Date: Fri, 13 Dec 2024 15:08:47 +0800
Message-ID: <20241213070852.106092-6-chenyi.qiang@intel.com>
In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com>

Introduce the realize()/unrealize() callbacks to initialize/uninitialize
the new guest_memfd_manager object and register/unregister it in the
target MemoryRegion.

Guest_memfd was initially set to shared until commit bd3bcf6962
("kvm/memory: Make memory type private by default if it has guest memfd
backend"). To align with that change, the default state in
guest_memfd_manager is set to private (the bitmap is cleared to 0).
Defaulting to private also reduces the overhead of VFIO mapping shared
pages into the IOMMU during the bootup stage.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 include/sysemu/guest-memfd-manager.h | 27 +++++++++++++++++++++++++++
 system/guest-memfd-manager.c         | 28 +++++++++++++++++++++++++++-
 system/physmem.c                     |  7 +++++++
 3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h
index 9dc4e0346d..d1e7f698e8 100644
--- a/include/sysemu/guest-memfd-manager.h
+++ b/include/sysemu/guest-memfd-manager.h
@@ -42,6 +42,8 @@ struct GuestMemfdManager {
 struct GuestMemfdManagerClass {
     ObjectClass parent_class;
 
+    void (*realize)(GuestMemfdManager *gmm, MemoryRegion *mr, uint64_t region_size);
+    void (*unrealize)(GuestMemfdManager *gmm);
     int (*state_change)(GuestMemfdManager *gmm, uint64_t offset, uint64_t size,
                         bool shared_to_private);
 };
@@ -61,4 +63,29 @@ static inline int guest_memfd_manager_state_change(GuestMemfdManager *gmm, uint6
     return 0;
 }
 
+static inline void guest_memfd_manager_realize(GuestMemfdManager *gmm,
+                                               MemoryRegion *mr, uint64_t region_size)
+{
+    GuestMemfdManagerClass *klass;
+
+    g_assert(gmm);
+    klass = GUEST_MEMFD_MANAGER_GET_CLASS(gmm);
+
+    if (klass->realize) {
+        klass->realize(gmm, mr, region_size);
+    }
+}
+
+static inline void guest_memfd_manager_unrealize(GuestMemfdManager *gmm)
+{
+    GuestMemfdManagerClass *klass;
+
+    g_assert(gmm);
+    klass = GUEST_MEMFD_MANAGER_GET_CLASS(gmm);
+
+    if (klass->unrealize) {
+        klass->unrealize(gmm);
+    }
+}
+
 #endif
diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
index 6601df5f3f..b6a32f0bfb 100644
--- a/system/guest-memfd-manager.c
+++ b/system/guest-memfd-manager.c
@@ -366,6 +366,31 @@ static int guest_memfd_state_change(GuestMemfdManager *gmm, uint64_t offset,
     return ret;
 }
 
+static void guest_memfd_manager_realizefn(GuestMemfdManager *gmm, MemoryRegion *mr,
+                                          uint64_t region_size)
+{
+    uint64_t bitmap_size;
+
+    gmm->block_size = qemu_real_host_page_size();
+    bitmap_size = ROUND_UP(region_size, gmm->block_size) / gmm->block_size;
+
+    gmm->mr = mr;
+    gmm->bitmap_size = bitmap_size;
+    gmm->bitmap = bitmap_new(bitmap_size);
+
+    memory_region_set_ram_discard_manager(gmm->mr, RAM_DISCARD_MANAGER(gmm));
+}
+
+static void guest_memfd_manager_unrealizefn(GuestMemfdManager *gmm)
+{
+    memory_region_set_ram_discard_manager(gmm->mr, NULL);
+
+    g_free(gmm->bitmap);
+    gmm->bitmap = NULL;
+    gmm->bitmap_size = 0;
+    gmm->mr = NULL;
+}
+
 static void guest_memfd_manager_init(Object *obj)
 {
     GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(obj);
@@ -375,7 +400,6 @@ static void guest_memfd_manager_init(Object *obj)
 
 static void guest_memfd_manager_finalize(Object *obj)
 {
-    g_free(GUEST_MEMFD_MANAGER(obj)->bitmap);
 }
 
 static void guest_memfd_manager_class_init(ObjectClass *oc, void *data)
@@ -384,6 +408,8 @@ static void guest_memfd_manager_class_init(ObjectClass *oc, void *data)
     RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
 
     gmmc->state_change = guest_memfd_state_change;
+    gmmc->realize = guest_memfd_manager_realizefn;
+    gmmc->unrealize = guest_memfd_manager_unrealizefn;
 
     rdmc->get_min_granularity = guest_memfd_rdm_get_min_granularity;
     rdmc->register_listener = guest_memfd_rdm_register_listener;
diff --git a/system/physmem.c b/system/physmem.c
index dc1db3a384..532182a6dd 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -53,6 +53,7 @@
 #include "sysemu/hostmem.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/xen-mapcache.h"
+#include "sysemu/guest-memfd-manager.h"
 #include "trace.h"
 
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
@@ -1885,6 +1886,9 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
             qemu_mutex_unlock_ramlist();
             goto out_free;
         }
+
+        GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(object_new(TYPE_GUEST_MEMFD_MANAGER));
+        guest_memfd_manager_realize(gmm, new_block->mr, new_block->mr->size);
     }
 
     ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;
@@ -2139,6 +2143,9 @@ static void reclaim_ramblock(RAMBlock *block)
 
     if (block->guest_memfd >= 0) {
         close(block->guest_memfd);
+        GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(block->mr->rdm);
+        guest_memfd_manager_unrealize(gmm);
+        object_unref(OBJECT(gmm));
         ram_block_discard_require(false);
     }
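
To make the default-private behaviour concrete, a short sketch of the state
right after realize() (illustration only; relies on bitmap_new()
zero-initializing its allocation, so every block starts out discarded,
i.e. private):

    GuestMemfdManager *gmm =
        GUEST_MEMFD_MANAGER(object_new(TYPE_GUEST_MEMFD_MANAGER));
    guest_memfd_manager_realize(gmm, mr, memory_region_size(mr));

    /* No bit is set yet: the whole region reads back as private. */
    assert(find_next_bit(gmm->bitmap, gmm->bitmap_size, 0) == gmm->bitmap_size);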
From patchwork Fri Dec 13 07:08:48 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13906611
From: Chenyi Qiang <chenyi.qiang@intel.com>
To: David Hildenbrand, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun
Subject: [PATCH 6/7] RAMBlock: make guest_memfd require coordinated discard
Date: Fri, 13 Dec 2024 15:08:48 +0800
Message-ID: <20241213070852.106092-7-chenyi.qiang@intel.com>
In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com>

As guest_memfd is now managed by guest_memfd_manager through
RamDiscardManager, only uncoordinated discard needs to be blocked.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 system/physmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/system/physmem.c b/system/physmem.c
index 532182a6dd..585090b063 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1872,7 +1872,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
         assert(kvm_enabled());
         assert(new_block->guest_memfd < 0);
 
-        ret = ram_block_discard_require(true);
+        ret = ram_block_coordinated_discard_require(true);
         if (ret < 0) {
             error_setg_errno(errp, -ret,
                              "cannot set up private guest memory: discard currently blocked");
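
For readers unfamiliar with the require/disable pairs, a condensed sketch of
the semantics this one-line change relies on (based on QEMU's existing
ram_block_discard_* API in include/exec/memory.h; simplified, return values
as expected once this patch is applied):

    /* Conflicts with *any* discard requirement, so it would now fail: */
    ret = ram_block_discard_disable(true);               /* -EBUSY */

    /*
     * Only conflicts with uncoordinated requirements, so an RDM-aware
     * VFIO can use it while guest_memfd holds a coordinated one:
     */
    ret = ram_block_uncoordinated_discard_disable(true); /* 0 */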
DDTV4QPH3+6N4wTH8p6iXBgr7oNZdFM/IJlIirWYOFZPSRIUvqqbXvrvQ A==; X-CSE-ConnectionGUID: 8r0oZZ2tTqab8btBvwAm4w== X-CSE-MsgGUID: +LtZdSWAQhGSKs42yqXWAA== X-IronPort-AV: E=McAfee;i="6700,10204,11284"; a="51937108" X-IronPort-AV: E=Sophos;i="6.12,230,1728975600"; d="scan'208";a="51937108" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Dec 2024 23:09:38 -0800 X-CSE-ConnectionGUID: oYKPgmgOTZGMb6AP0BKP5w== X-CSE-MsgGUID: y5rdZvvoRtibrtxIQBkOlg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,230,1728975600"; d="scan'208";a="96365590" Received: from emr-bkc.sh.intel.com ([10.112.230.82]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Dec 2024 23:09:35 -0800 From: Chenyi Qiang To: David Hildenbrand , Paolo Bonzini , Peter Xu , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Michael Roth Cc: Chenyi Qiang , qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J , Peng Chao P , Gao Chao , Xu Yilun Subject: [RFC PATCH 7/7] memory: Add a new argument to indicate the request attribute in RamDismcardManager helpers Date: Fri, 13 Dec 2024 15:08:49 +0800 Message-ID: <20241213070852.106092-8-chenyi.qiang@intel.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com> References: <20241213070852.106092-1-chenyi.qiang@intel.com> MIME-Version: 1.0 Received-SPF: pass client-ip=198.175.65.10; envelope-from=chenyi.qiang@intel.com; helo=mgamail.intel.com X-Spam_score_int: -48 X-Spam_score: -4.9 X-Spam_bar: ---- X-Spam_report: (-4.9 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.496, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_NONE=0.001, T_SPF_TEMPERROR=0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org For each ram_discard_manager helper, add a new argument 'is_private' to indicate the request attribute. If is_private is true, the operation targets the private range in the section. For example, replay_populate(true) will replay the populate operation on private part in the MemoryRegionSection, while replay_popuate(false) will replay population on shared part. This helps to distinguish between the states of private/shared and discarded/populated. It is essential for guest_memfd_manager which uses RamDiscardManager interface but can't treat private memory as discarded memory. This is because it does not align with the expectation of current RamDiscardManager users (e.g. live migration), who expect that discarded memory is hot-removed and can be skipped when processing guest memory. Treating private memory as discarded won't work in the future if live migration needs to handle private memory. For example, live migration needs to migrate private memory. The user of the helper needs to figure out which attribute to manipulate. For legacy VM case, use is_private=true by default. Private attribute is only valid in a guest_memfd based VM. 
Opportunistically rename the guest_memfd_for_each_{discarded, populated}_section() to guest_memfd_for_each_{private, shared}_section() to distinguish between private/shared and discarded/populated at the same time. Signed-off-by: Chenyi Qiang --- hw/vfio/common.c | 22 ++++++-- hw/virtio/virtio-mem.c | 23 ++++---- include/exec/memory.h | 23 ++++-- migration/ram.c | 14 ++--- system/guest-memfd-manager.c | 106 +++++++++++++++++++++++------------ system/memory.c | 13 +++-- system/memory_mapping.c | 4 +- 7 files changed, 135 insertions(+), 70 deletions(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index dcef44fe55..a6f49e6450 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -345,7 +345,8 @@ out: } static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl, - MemoryRegionSection *section) + MemoryRegionSection *section, + bool is_private) { VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, listener); @@ -354,6 +355,11 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl, const hwaddr iova = section->offset_within_address_space; int ret; + if (is_private) { + /* Discarding private memory is not supported yet. */ + return; + } + /* Unmap with a single call. */ ret = vfio_container_dma_unmap(bcontainer, iova, size, NULL); if (ret) { @@ -363,7 +369,8 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl, } static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl, - MemoryRegionSection *section) + MemoryRegionSection *section, + bool is_private) { VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, listener); @@ -374,6 +381,11 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl, void *vaddr; int ret; + if (is_private) { + /* Populating private memory is not supported yet. */ + return 0; + } + /* * Map in (aligned within memory region) minimum granularity, so we can * unmap in minimum granularity later. @@ -390,7 +402,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl, vaddr, section->readonly); if (ret) { /* Rollback */ - vfio_ram_discard_notify_discard(rdl, section); + vfio_ram_discard_notify_discard(rdl, section, false); return ret; } } @@ -1248,7 +1260,7 @@ out: } static int vfio_ram_discard_get_dirty_bitmap(MemoryRegionSection *section, - void *opaque) + bool is_private, void *opaque) { const hwaddr size = int128_get64(section->size); const hwaddr iova = section->offset_within_address_space; @@ -1293,7 +1305,7 @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer, * We only want/can synchronize the bitmap for actually mapped parts - * which correspond to populated parts. Replay all populated parts.
*/ - return ram_discard_manager_replay_populated(rdm, section, + return ram_discard_manager_replay_populated(rdm, section, false, vfio_ram_discard_get_dirty_bitmap, &vrdl); } diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index e3d1ccaeeb..e7304c7e47 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -312,14 +312,14 @@ static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg) { RamDiscardListener *rdl = arg; - return rdl->notify_populate(rdl, s); + return rdl->notify_populate(rdl, s, false); } static int virtio_mem_notify_discard_cb(MemoryRegionSection *s, void *arg) { RamDiscardListener *rdl = arg; - rdl->notify_discard(rdl, s); + rdl->notify_discard(rdl, s, false); return 0; } @@ -334,7 +334,7 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset, if (!memory_region_section_intersect_range(&tmp, offset, size)) { continue; } - rdl->notify_discard(rdl, &tmp); + rdl->notify_discard(rdl, &tmp, false); } } @@ -350,7 +350,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset, if (!memory_region_section_intersect_range(&tmp, offset, size)) { continue; } - ret = rdl->notify_populate(rdl, &tmp); + ret = rdl->notify_populate(rdl, &tmp, false); if (ret) { break; } @@ -367,7 +367,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset, if (!memory_region_section_intersect_range(&tmp, offset, size)) { continue; } - rdl2->notify_discard(rdl2, &tmp); + rdl2->notify_discard(rdl2, &tmp, false); } } return ret; @@ -383,7 +383,7 @@ static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem) QLIST_FOREACH(rdl, &vmem->rdl_list, next) { if (rdl->double_discard_supported) { - rdl->notify_discard(rdl, rdl->section); + rdl->notify_discard(rdl, rdl->section, false); } else { virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, virtio_mem_notify_discard_cb); @@ -1685,7 +1685,8 @@ static uint64_t virtio_mem_rdm_get_min_granularity(const RamDiscardManager *rdm, } static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm, - const MemoryRegionSection *s) + const MemoryRegionSection *s, + bool is_private) { const VirtIOMEM *vmem = VIRTIO_MEM(rdm); uint64_t start_gpa = vmem->addr + s->offset_within_region; @@ -1712,11 +1713,12 @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg) { struct VirtIOMEMReplayData *data = arg; - return ((ReplayRamPopulate)data->fn)(s, data->opaque); + return ((ReplayRamPopulate)data->fn)(s, false, data->opaque); } static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm, MemoryRegionSection *s, + bool is_private, ReplayRamPopulate replay_fn, void *opaque) { @@ -1736,12 +1738,13 @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s, { struct VirtIOMEMReplayData *data = arg; - ((ReplayRamDiscard)data->fn)(s, data->opaque); + ((ReplayRamDiscard)data->fn)(s, false, data->opaque); return 0; } static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm, MemoryRegionSection *s, + bool is_private, ReplayRamDiscard replay_fn, void *opaque) { @@ -1783,7 +1786,7 @@ static void virtio_mem_rdm_unregister_listener(RamDiscardManager *rdm, g_assert(rdl->section->mr == &vmem->memdev->mr); if (vmem->size) { if (rdl->double_discard_supported) { - rdl->notify_discard(rdl, rdl->section); + rdl->notify_discard(rdl, rdl->section, false); } else { virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, virtio_mem_notify_discard_cb); diff --git a/include/exec/memory.h b/include/exec/memory.h index ec7bc641e8..8aac61af08 
100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -508,9 +508,11 @@ struct IOMMUMemoryRegionClass { typedef struct RamDiscardListener RamDiscardListener; typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl, - MemoryRegionSection *section); + MemoryRegionSection *section, + bool is_private); typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl, - MemoryRegionSection *section); + MemoryRegionSection *section, + bool is_private); struct RamDiscardListener { /* @@ -566,8 +568,8 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl, rdl->double_discard_supported = double_discard_supported; } -typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque); -typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void *opaque); +typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, bool is_private, void *opaque); +typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, bool is_private, void *opaque); /* * RamDiscardManagerClass: @@ -632,11 +634,13 @@ struct RamDiscardManagerClass { * * @rdm: the #RamDiscardManager * @section: the #MemoryRegionSection + * @is_private: the attribute of the request section * * Returns whether the given range is completely populated. */ bool (*is_populated)(const RamDiscardManager *rdm, - const MemoryRegionSection *section); + const MemoryRegionSection *section, + bool is_private); /** * @replay_populated: @@ -648,6 +652,7 @@ struct RamDiscardManagerClass { * * @rdm: the #RamDiscardManager * @section: the #MemoryRegionSection + * @is_private: the attribute of the populated parts * @replay_fn: the #ReplayRamPopulate callback * @opaque: pointer to forward to the callback * @@ -655,6 +660,7 @@ struct RamDiscardManagerClass { */ int (*replay_populated)(const RamDiscardManager *rdm, MemoryRegionSection *section, + bool is_private, ReplayRamPopulate replay_fn, void *opaque); /** @@ -665,11 +671,13 @@ struct RamDiscardManagerClass { * * @rdm: the #RamDiscardManager * @section: the #MemoryRegionSection + * @is_private: the attribute of the discarded parts * @replay_fn: the #ReplayRamDiscard callback * @opaque: pointer to forward to the callback */ void (*replay_discarded)(const RamDiscardManager *rdm, MemoryRegionSection *section, + bool is_private, ReplayRamDiscard replay_fn, void *opaque); /** @@ -709,15 +717,18 @@ uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm, const MemoryRegion *mr); bool ram_discard_manager_is_populated(const RamDiscardManager *rdm, - const MemoryRegionSection *section); + const MemoryRegionSection *section, + bool is_private); int ram_discard_manager_replay_populated(const RamDiscardManager *rdm, MemoryRegionSection *section, + bool is_private, ReplayRamPopulate replay_fn, void *opaque); void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm, MemoryRegionSection *section, + bool is_private, ReplayRamDiscard replay_fn, void *opaque); diff --git a/migration/ram.c b/migration/ram.c index 05ff9eb328..b9efba1d14 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -838,7 +838,7 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs, } static void dirty_bitmap_clear_section(MemoryRegionSection *section, - void *opaque) + bool is_private, void *opaque) { const hwaddr offset = section->offset_within_region; const hwaddr size = int128_get64(section->size); @@ -884,7 +884,7 @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb) .size = int128_make64(qemu_ram_get_used_length(rb)), }; - 
ram_discard_manager_replay_discarded(rdm, &section, + ram_discard_manager_replay_discarded(rdm, &section, false, dirty_bitmap_clear_section, &cleared_bits); } @@ -907,7 +907,7 @@ bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start) .size = int128_make64(qemu_ram_pagesize(rb)), }; - return !ram_discard_manager_is_populated(rdm, &section); + return !ram_discard_manager_is_populated(rdm, &section, false); } return false; } @@ -1539,7 +1539,7 @@ static inline void populate_read_range(RAMBlock *block, ram_addr_t offset, } static inline int populate_read_section(MemoryRegionSection *section, - void *opaque) + bool is_private, void *opaque) { const hwaddr size = int128_get64(section->size); hwaddr offset = section->offset_within_region; @@ -1579,7 +1579,7 @@ static void ram_block_populate_read(RAMBlock *rb) .size = rb->mr->size, }; - ram_discard_manager_replay_populated(rdm, &section, + ram_discard_manager_replay_populated(rdm, &section, false, populate_read_section, NULL); } else { populate_read_range(rb, 0, rb->used_length); @@ -1614,7 +1614,7 @@ void ram_write_tracking_prepare(void) } static inline int uffd_protect_section(MemoryRegionSection *section, - void *opaque) + bool is_private, void *opaque) { const hwaddr size = int128_get64(section->size); const hwaddr offset = section->offset_within_region; @@ -1638,7 +1638,7 @@ static int ram_block_uffd_protect(RAMBlock *rb, int uffd_fd) .size = rb->mr->size, }; - return ram_discard_manager_replay_populated(rdm, &section, + return ram_discard_manager_replay_populated(rdm, &section, false, uffd_protect_section, (void *)(uintptr_t)uffd_fd); } diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c index b6a32f0bfb..50802b34d7 100644 --- a/system/guest-memfd-manager.c +++ b/system/guest-memfd-manager.c @@ -23,39 +23,51 @@ OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(GuestMemfdManager, { }) static bool guest_memfd_rdm_is_populated(const RamDiscardManager *rdm, - const MemoryRegionSection *section) + const MemoryRegionSection *section, + bool is_private) { const GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm); uint64_t first_bit = section->offset_within_region / gmm->block_size; uint64_t last_bit = first_bit + int128_get64(section->size) / gmm->block_size - 1; unsigned long first_discard_bit; - first_discard_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit); + if (is_private) { + /* Check if the private section is populated */ + first_discard_bit = find_next_bit(gmm->bitmap, last_bit + 1, first_bit); + } else { + /* Check if the shared section is populated */ + first_discard_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit); + } + return first_discard_bit > last_bit; } -typedef int (*guest_memfd_section_cb)(MemoryRegionSection *s, void *arg); +typedef int (*guest_memfd_section_cb)(MemoryRegionSection *s, bool is_private, + void *arg); -static int guest_memfd_notify_populate_cb(MemoryRegionSection *section, void *arg) +static int guest_memfd_notify_populate_cb(MemoryRegionSection *section, bool is_private, + void *arg) { RamDiscardListener *rdl = arg; - return rdl->notify_populate(rdl, section); + return rdl->notify_populate(rdl, section, is_private); } -static int guest_memfd_notify_discard_cb(MemoryRegionSection *section, void *arg) +static int guest_memfd_notify_discard_cb(MemoryRegionSection *section, bool is_private, + void *arg) { RamDiscardListener *rdl = arg; - rdl->notify_discard(rdl, section); + rdl->notify_discard(rdl, section, is_private); return 0; } -static int guest_memfd_for_each_populated_section(const
GuestMemfdManager *gmm, - MemoryRegionSection *section, - void *arg, - guest_memfd_section_cb cb) +static int guest_memfd_for_each_shared_section(const GuestMemfdManager *gmm, + MemoryRegionSection *section, + bool is_private, + void *arg, + guest_memfd_section_cb cb) { unsigned long first_one_bit, last_one_bit; uint64_t offset, size; @@ -76,7 +88,7 @@ static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm, break; } - ret = cb(&tmp, arg); + ret = cb(&tmp, is_private, arg); if (ret) { break; } @@ -88,10 +100,11 @@ static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm, return ret; } -static int guest_memfd_for_each_discarded_section(const GuestMemfdManager *gmm, - MemoryRegionSection *section, - void *arg, - guest_memfd_section_cb cb) +static int guest_memfd_for_each_private_section(const GuestMemfdManager *gmm, + MemoryRegionSection *section, + bool is_private, + void *arg, + guest_memfd_section_cb cb) { unsigned long first_zero_bit, last_zero_bit; uint64_t offset, size; @@ -113,7 +126,7 @@ static int guest_memfd_for_each_discarded_section(const GuestMemfdManager *gmm, break; } - ret = cb(&tmp, arg); + ret = cb(&tmp, is_private, arg); if (ret) { break; } @@ -146,8 +159,9 @@ static void guest_memfd_rdm_register_listener(RamDiscardManager *rdm, QLIST_INSERT_HEAD(&gmm->rdl_list, rdl, next); - ret = guest_memfd_for_each_populated_section(gmm, section, rdl, - guest_memfd_notify_populate_cb); + /* Populate shared part */ + ret = guest_memfd_for_each_shared_section(gmm, section, false, rdl, + guest_memfd_notify_populate_cb); if (ret) { error_report("%s: Failed to register RAM discard listener: %s", __func__, strerror(-ret)); @@ -163,8 +177,9 @@ static void guest_memfd_rdm_unregister_listener(RamDiscardManager *rdm, g_assert(rdl->section); g_assert(rdl->section->mr == gmm->mr); - ret = guest_memfd_for_each_populated_section(gmm, rdl->section, rdl, - guest_memfd_notify_discard_cb); + /* Discard shared part */ + ret = guest_memfd_for_each_shared_section(gmm, rdl->section, false, rdl, + guest_memfd_notify_discard_cb); if (ret) { error_report("%s: Failed to unregister RAM discard listener: %s", __func__, strerror(-ret)); @@ -181,16 +196,18 @@ typedef struct GuestMemfdReplayData { void *opaque; } GuestMemfdReplayData; -static int guest_memfd_rdm_replay_populated_cb(MemoryRegionSection *section, void *arg) +static int guest_memfd_rdm_replay_populated_cb(MemoryRegionSection *section, + bool is_private, void *arg) { struct GuestMemfdReplayData *data = arg; ReplayRamPopulate replay_fn = data->fn; - return replay_fn(section, data->opaque); + return replay_fn(section, is_private, data->opaque); } static int guest_memfd_rdm_replay_populated(const RamDiscardManager *rdm, MemoryRegionSection *section, + bool is_private, ReplayRamPopulate replay_fn, void *opaque) { @@ -198,22 +215,31 @@ static int guest_memfd_rdm_replay_populated(const RamDiscardManager *rdm, struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque }; g_assert(section->mr == gmm->mr); - return guest_memfd_for_each_populated_section(gmm, section, &data, - guest_memfd_rdm_replay_populated_cb); + if (is_private) { + /* Replay populate on private section */ + return guest_memfd_for_each_private_section(gmm, section, is_private, &data, + guest_memfd_rdm_replay_populated_cb); + } else { + /* Replay populate on shared section */ + return guest_memfd_for_each_shared_section(gmm, section, is_private, &data, + guest_memfd_rdm_replay_populated_cb); + } } -static int 
guest_memfd_rdm_replay_discarded_cb(MemoryRegionSection *section, void *arg) +static int guest_memfd_rdm_replay_discarded_cb(MemoryRegionSection *section, + bool is_private, void *arg) { struct GuestMemfdReplayData *data = arg; ReplayRamDiscard replay_fn = data->fn; - replay_fn(section, data->opaque); + replay_fn(section, is_private, data->opaque); return 0; } static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm, MemoryRegionSection *section, + bool is_private, ReplayRamDiscard replay_fn, void *opaque) { @@ -221,8 +247,16 @@ static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm, struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque }; g_assert(section->mr == gmm->mr); - guest_memfd_for_each_discarded_section(gmm, section, &data, - guest_memfd_rdm_replay_discarded_cb); + + if (is_private) { + /* Replay discard on private section */ + guest_memfd_for_each_private_section(gmm, section, is_private, &data, + guest_memfd_rdm_replay_discarded_cb); + } else { + /* Replay discard on shared section */ + guest_memfd_for_each_shared_section(gmm, section, is_private, &data, + guest_memfd_rdm_replay_discarded_cb); + } } static bool guest_memfd_is_valid_range(GuestMemfdManager *gmm, @@ -257,8 +291,9 @@ static void guest_memfd_notify_discard(GuestMemfdManager *gmm, continue; } - guest_memfd_for_each_populated_section(gmm, &tmp, rdl, - guest_memfd_notify_discard_cb); + /* For current shared section, notify to discard shared parts */ + guest_memfd_for_each_shared_section(gmm, &tmp, false, rdl, + guest_memfd_notify_discard_cb); } } @@ -276,8 +311,9 @@ static int guest_memfd_notify_populate(GuestMemfdManager *gmm, continue; } - ret = guest_memfd_for_each_discarded_section(gmm, &tmp, rdl, - guest_memfd_notify_populate_cb); + /* For current private section, notify to populate the shared parts */ + ret = guest_memfd_for_each_private_section(gmm, &tmp, false, rdl, + guest_memfd_notify_populate_cb); if (ret) { break; } @@ -295,8 +331,8 @@ static int guest_memfd_notify_populate(GuestMemfdManager *gmm, continue; } - guest_memfd_for_each_discarded_section(gmm, &tmp, rdl2, - guest_memfd_notify_discard_cb); + guest_memfd_for_each_private_section(gmm, &tmp, false, rdl2, + guest_memfd_notify_discard_cb); } } return ret; diff --git a/system/memory.c b/system/memory.c index ddcec90f5e..d3d5a04f98 100644 --- a/system/memory.c +++ b/system/memory.c @@ -2133,34 +2133,37 @@ uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm, } bool ram_discard_manager_is_populated(const RamDiscardManager *rdm, - const MemoryRegionSection *section) + const MemoryRegionSection *section, + bool is_private) { RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); g_assert(rdmc->is_populated); - return rdmc->is_populated(rdm, section); + return rdmc->is_populated(rdm, section, is_private); } int ram_discard_manager_replay_populated(const RamDiscardManager *rdm, MemoryRegionSection *section, + bool is_private, ReplayRamPopulate replay_fn, void *opaque) { RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); g_assert(rdmc->replay_populated); - return rdmc->replay_populated(rdm, section, replay_fn, opaque); + return rdmc->replay_populated(rdm, section, is_private, replay_fn, opaque); } void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm, MemoryRegionSection *section, + bool is_private, ReplayRamDiscard replay_fn, void *opaque) { RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); 
g_assert(rdmc->replay_discarded); - rdmc->replay_discarded(rdm, section, replay_fn, opaque); + rdmc->replay_discarded(rdm, section, is_private, replay_fn, opaque); } void ram_discard_manager_register_listener(RamDiscardManager *rdm, @@ -2221,7 +2224,7 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, * Disallow that. vmstate priorities make sure any RamDiscardManager * were already restored before IOMMUs are restored. */ - if (!ram_discard_manager_is_populated(rdm, &tmp)) { + if (!ram_discard_manager_is_populated(rdm, &tmp, false)) { error_setg(errp, "iommu map to discarded memory (e.g., unplugged" " via virtio-mem): %" HWADDR_PRIx "", iotlb->translated_addr); diff --git a/system/memory_mapping.c b/system/memory_mapping.c index ca2390eb80..c55c0c0c93 100644 --- a/system/memory_mapping.c +++ b/system/memory_mapping.c @@ -249,7 +249,7 @@ static void guest_phys_block_add_section(GuestPhysListener *g, } static int guest_phys_ram_populate_cb(MemoryRegionSection *section, - void *opaque) + bool is_private, void *opaque) { GuestPhysListener *g = opaque; @@ -274,7 +274,7 @@ static void guest_phys_blocks_region_add(MemoryListener *listener, RamDiscardManager *rdm; rdm = memory_region_get_ram_discard_manager(section->mr); - ram_discard_manager_replay_populated(rdm, section, + ram_discard_manager_replay_populated(rdm, section, false, guest_phys_ram_populate_cb, g); return; }
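For completeness, a listener-side sketch of the updated callback signatures (a hypothetical consumer, not part of this series; the early returns for private ranges mirror the vfio handling above):

    /* Hypothetical RamDiscardListener callbacks under the new signatures. */
    static int my_notify_populate(RamDiscardListener *rdl,
                                  MemoryRegionSection *section, bool is_private)
    {
        if (is_private) {
            /* This consumer cannot map private guest_memfd ranges yet. */
            return 0;
        }
        /* ... map/prepare the newly shared (populated) range ... */
        return 0;
    }

    static void my_notify_discard(RamDiscardListener *rdl,
                                  MemoryRegionSection *section, bool is_private)
    {
        if (is_private) {
            return;
        }
        /* ... unmap the range that is no longer shared ... */
    }

    /* Registration is unchanged; only the callback signatures grew a flag. */
    static void my_register(RamDiscardManager *rdm, RamDiscardListener *rdl,
                            MemoryRegionSection *section)
    {
        ram_discard_listener_init(rdl, my_notify_populate, my_notify_discard,
                                  false);
        ram_discard_manager_register_listener(rdm, rdl, section);
    }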