From patchwork Fri Dec 13 07:08:43 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13906620
From: Chenyi Qiang
To: David Hildenbrand, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun
Subject: [PATCH 1/7] memory: Export a helper to get intersection of a MemoryRegionSection with a given range
Date: Fri, 13 Dec 2024 15:08:43 +0800
Message-ID: <20241213070852.106092-2-chenyi.qiang@intel.com>
In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com>
References: <20241213070852.106092-1-chenyi.qiang@intel.com>

Rename the helper to memory_region_section_intersect_range() to make
it more generic.

Signed-off-by: Chenyi Qiang
---
 hw/virtio/virtio-mem.c | 32 +++++---------------------------
 include/exec/memory.h  | 13 +++++++++++++
 system/memory.c        | 17 +++++++++++++++++
 3 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 80ada89551..e3d1ccaeeb 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -242,28 +242,6 @@ static int virtio_mem_for_each_plugged_range(VirtIOMEM *vmem, void *arg,
     return ret;
 }
 
-/*
- * Adjust the memory section to cover the intersection with the given range.
- *
- * Returns false if the intersection is empty, otherwise returns true.
- */
-static bool virtio_mem_intersect_memory_section(MemoryRegionSection *s,
-                                                uint64_t offset, uint64_t size)
-{
-    uint64_t start = MAX(s->offset_within_region, offset);
-    uint64_t end = MIN(s->offset_within_region + int128_get64(s->size),
-                       offset + size);
-
-    if (end <= start) {
-        return false;
-    }
-
-    s->offset_within_address_space += start - s->offset_within_region;
-    s->offset_within_region = start;
-    s->size = int128_make64(end - start);
-    return true;
-}
-
 typedef int (*virtio_mem_section_cb)(MemoryRegionSection *s, void *arg);
 
 static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
@@ -285,7 +263,7 @@ static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
                                   first_bit + 1) - 1;
         size = (last_bit - first_bit + 1) * vmem->block_size;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             break;
         }
         ret = cb(&tmp, arg);
@@ -317,7 +295,7 @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
                                   first_bit + 1) - 1;
         size = (last_bit - first_bit + 1) * vmem->block_size;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             break;
         }
         ret = cb(&tmp, arg);
@@ -353,7 +331,7 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         MemoryRegionSection tmp = *rdl->section;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
         rdl->notify_discard(rdl, &tmp);
@@ -369,7 +347,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         MemoryRegionSection tmp = *rdl->section;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
         ret = rdl->notify_populate(rdl, &tmp);
@@ -386,7 +364,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
         if (rdl2 == rdl) {
             break;
         }
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
         rdl2->notify_discard(rdl2, &tmp);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e5e865d1a9..ec7bc641e8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1196,6 +1196,19 @@ MemoryRegionSection *memory_region_section_new_copy(MemoryRegionSection *s);
  */
 void memory_region_section_free_copy(MemoryRegionSection *s);
 
+/**
+ * memory_region_section_intersect_range: Adjust the memory section to cover
+ * the intersection with the given range.
+ *
+ * @s: the #MemoryRegionSection to be adjusted
+ * @offset: the offset of the given range in the memory region
+ * @size: the size of the given range
+ *
+ * Returns false if the intersection is empty, otherwise returns true.
+ */
+bool memory_region_section_intersect_range(MemoryRegionSection *s,
+                                           uint64_t offset, uint64_t size);
+
 /**
  * memory_region_init: Initialize a memory region
  *
diff --git a/system/memory.c b/system/memory.c
index 85f6834cb3..ddcec90f5e 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2898,6 +2898,23 @@ void memory_region_section_free_copy(MemoryRegionSection *s)
     g_free(s);
 }
 
+bool memory_region_section_intersect_range(MemoryRegionSection *s,
+                                           uint64_t offset, uint64_t size)
+{
+    uint64_t start = MAX(s->offset_within_region, offset);
+    uint64_t end = MIN(s->offset_within_region + int128_get64(s->size),
+                       offset + size);
+
+    if (end <= start) {
+        return false;
+    }
+
+    s->offset_within_address_space += start - s->offset_within_region;
+    s->offset_within_region = start;
+    s->size = int128_make64(end - start);
+    return true;
+}
+
 bool memory_region_present(MemoryRegion *container, hwaddr addr)
 {
     MemoryRegion *mr;

From patchwork Fri Dec 13 07:08:44 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13906621
From: Chenyi Qiang
To: David Hildenbrand, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun
Subject: [PATCH 2/7] guest_memfd: Introduce an object to manage the guest-memfd with RamDiscardManager
Date: Fri, 13 Dec 2024 15:08:44 +0800
Message-ID: <20241213070852.106092-3-chenyi.qiang@intel.com>
In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com>

As commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
discard") highlighted, some subsystems like VFIO may disable RAM block
discard. However, guest_memfd relies on the discard operation to perform
page conversion between private and shared memory. This can lead to
stale IOMMU mappings when assigning a hardware device to a confidential
VM via shared memory (unprotected memory pages). Blocking shared page
discard can solve this problem, but it could cause guests to consume
twice the memory with VFIO, which is not acceptable in some cases. An
alternative solution is to notify other systems like VFIO so that they
refresh their outdated IOMMU mappings.

RamDiscardManager is an existing concept (used by virtio-mem) to adjust
VFIO mappings in relation to VM page assignment. A page conversion is
effectively a hot-remove of the page in one mode followed by adding it
back in the other, so the same work that happens in response to
virtio-mem changes needs to happen for page conversion events.
Introduce RamDiscardManager to guest_memfd to achieve this.
However, guest_memfd is not an object, so it cannot directly implement
the RamDiscardManager interface. One option is to implement the
interface in HostMemoryBackend: any guest_memfd-backed host memory
backend could register itself in the target MemoryRegion. However, that
doesn't cover the scenario where a guest_memfd MemoryRegion doesn't
belong to a HostMemoryBackend, e.g. the virtual BIOS MemoryRegion.
Thus, choose the second option: define an object type named
guest_memfd_manager that implements the RamDiscardManager interface.
Upon creation of a guest_memfd, a new guest_memfd_manager object is
instantiated and registered with the managed guest_memfd MemoryRegion
to handle the page conversion events.

In the context of guest_memfd, the discarded state signifies that the
page is private, while the populated state indicates that the page is
shared. The state of the memory is tracked at the granularity of the
host page size (i.e. block_size), as the minimum conversion size can be
one page per request. In addition, VFIO expects the DMA mapping for a
specific iova to be mapped and unmapped with the same granularity.
However, confidential VMs may do partial conversions, e.g. a conversion
that happens on a small region within a larger region. To prevent such
invalid cases, and until a potential optimization emerges, all
operations are performed with 4K granularity.
Signed-off-by: Chenyi Qiang
---
 include/sysemu/guest-memfd-manager.h |  46 +++++
 system/guest-memfd-manager.c         | 250 +++++++++++++++++++++++++++
 system/meson.build                   |   1 +
 3 files changed, 297 insertions(+)
 create mode 100644 include/sysemu/guest-memfd-manager.h
 create mode 100644 system/guest-memfd-manager.c

diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h
new file mode 100644
index 0000000000..ba4a99b614
--- /dev/null
+++ b/include/sysemu/guest-memfd-manager.h
@@ -0,0 +1,46 @@
+/*
+ * QEMU guest memfd manager
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Chenyi Qiang
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#ifndef SYSEMU_GUEST_MEMFD_MANAGER_H
+#define SYSEMU_GUEST_MEMFD_MANAGER_H
+
+#include "sysemu/hostmem.h"
+
+#define TYPE_GUEST_MEMFD_MANAGER "guest-memfd-manager"
+
+OBJECT_DECLARE_TYPE(GuestMemfdManager, GuestMemfdManagerClass, GUEST_MEMFD_MANAGER)
+
+struct GuestMemfdManager {
+    Object parent;
+
+    /* Managed memory region. */
+    MemoryRegion *mr;
+
+    /*
+     * 1-setting of the bit represents the memory is populated (shared).
+     */
+    int32_t bitmap_size;
+    unsigned long *bitmap;
+
+    /* block size and alignment */
+    uint64_t block_size;
+
+    /* listeners to notify on populate/discard activity. */
+    QLIST_HEAD(, RamDiscardListener) rdl_list;
+};
+
+struct GuestMemfdManagerClass {
+    ObjectClass parent_class;
+};
+
+#endif
diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
new file mode 100644
index 0000000000..d7e105fead
--- /dev/null
+++ b/system/guest-memfd-manager.c
@@ -0,0 +1,250 @@
+/*
+ * QEMU guest memfd manager
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Chenyi Qiang
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "sysemu/guest-memfd-manager.h"
+
+OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(GuestMemfdManager,
+                                          guest_memfd_manager,
+                                          GUEST_MEMFD_MANAGER,
+                                          OBJECT,
+                                          { TYPE_RAM_DISCARD_MANAGER },
+                                          { })
+
+static bool guest_memfd_rdm_is_populated(const RamDiscardManager *rdm,
+                                         const MemoryRegionSection *section)
+{
+    const GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
+    uint64_t first_bit = section->offset_within_region / gmm->block_size;
+    uint64_t last_bit = first_bit + int128_get64(section->size) / gmm->block_size - 1;
+    unsigned long first_discard_bit;
+
+    first_discard_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit);
+    return first_discard_bit > last_bit;
+}
+
+typedef int (*guest_memfd_section_cb)(MemoryRegionSection *s, void *arg);
+
+static int guest_memfd_notify_populate_cb(MemoryRegionSection *section, void *arg)
+{
+    RamDiscardListener *rdl = arg;
+
+    return rdl->notify_populate(rdl, section);
+}
+
+static int guest_memfd_notify_discard_cb(MemoryRegionSection *section, void *arg)
+{
+    RamDiscardListener *rdl = arg;
+
+    rdl->notify_discard(rdl, section);
+
+    return 0;
+}
+
+static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm,
+                                                  MemoryRegionSection *section,
+                                                  void *arg,
+                                                  guest_memfd_section_cb cb)
+{
+    unsigned long first_one_bit, last_one_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_one_bit = section->offset_within_region / gmm->block_size;
+    first_one_bit = find_next_bit(gmm->bitmap, gmm->bitmap_size, first_one_bit);
+
+    while (first_one_bit < gmm->bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_one_bit * gmm->block_size;
+        last_one_bit = find_next_zero_bit(gmm->bitmap, gmm->bitmap_size,
+                                          first_one_bit + 1) - 1;
+        size = (last_one_bit - first_one_bit + 1) * gmm->block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            break;
+        }
+
+        first_one_bit = find_next_bit(gmm->bitmap, gmm->bitmap_size,
+                                      last_one_bit + 2);
+    }
+
+    return ret;
+}
+
+static int guest_memfd_for_each_discarded_section(const GuestMemfdManager *gmm,
+                                                  MemoryRegionSection *section,
+                                                  void *arg,
+                                                  guest_memfd_section_cb cb)
+{
+    unsigned long first_zero_bit, last_zero_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_zero_bit = section->offset_within_region / gmm->block_size;
+    first_zero_bit = find_next_zero_bit(gmm->bitmap, gmm->bitmap_size,
+                                        first_zero_bit);
+
+    while (first_zero_bit < gmm->bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_zero_bit * gmm->block_size;
+        last_zero_bit = find_next_bit(gmm->bitmap, gmm->bitmap_size,
+                                      first_zero_bit + 1) - 1;
+        size = (last_zero_bit - first_zero_bit + 1) * gmm->block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            break;
+        }
+
+        first_zero_bit = find_next_zero_bit(gmm->bitmap, gmm->bitmap_size,
+                                            last_zero_bit + 2);
+    }
+
+    return ret;
+}
+
+static uint64_t guest_memfd_rdm_get_min_granularity(const RamDiscardManager *rdm,
+                                                    const MemoryRegion *mr)
+{
+    GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
+
+    g_assert(mr == gmm->mr);
+    return gmm->block_size;
+}
+
+static void guest_memfd_rdm_register_listener(RamDiscardManager *rdm,
+                                              RamDiscardListener *rdl,
+                                              MemoryRegionSection *section)
+{
+    GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
+    int ret;
+
+    g_assert(section->mr == gmm->mr);
+    rdl->section = memory_region_section_new_copy(section);
+
+    QLIST_INSERT_HEAD(&gmm->rdl_list, rdl, next);
+
+    ret = guest_memfd_for_each_populated_section(gmm, section, rdl,
+                                                 guest_memfd_notify_populate_cb);
+    if (ret) {
+        error_report("%s: Failed to register RAM discard listener: %s", __func__,
+                     strerror(-ret));
+    }
+}
+
+static void guest_memfd_rdm_unregister_listener(RamDiscardManager *rdm,
+                                                RamDiscardListener *rdl)
+{
+    GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
+    int ret;
+
+    g_assert(rdl->section);
+    g_assert(rdl->section->mr == gmm->mr);
+
+    ret = guest_memfd_for_each_populated_section(gmm, rdl->section, rdl,
+                                                 guest_memfd_notify_discard_cb);
+    if (ret) {
+        error_report("%s: Failed to unregister RAM discard listener: %s", __func__,
+                     strerror(-ret));
+    }
+
+    memory_region_section_free_copy(rdl->section);
+    rdl->section = NULL;
+    QLIST_REMOVE(rdl, next);
+}
+
+typedef struct GuestMemfdReplayData {
+    void *fn;
+    void *opaque;
+} GuestMemfdReplayData;
+
+static int guest_memfd_rdm_replay_populated_cb(MemoryRegionSection *section, void *arg)
+{
+    struct GuestMemfdReplayData *data = arg;
+    ReplayRamPopulate replay_fn = data->fn;
+
+    return replay_fn(section, data->opaque);
+}
+
+static int guest_memfd_rdm_replay_populated(const RamDiscardManager *rdm,
+                                            MemoryRegionSection *section,
+                                            ReplayRamPopulate replay_fn,
+                                            void *opaque)
+{
+    GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
+    struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == gmm->mr);
+    return guest_memfd_for_each_populated_section(gmm, section, &data,
+                                                  guest_memfd_rdm_replay_populated_cb);
+}
+
+static int guest_memfd_rdm_replay_discarded_cb(MemoryRegionSection *section, void *arg)
+{
+    struct GuestMemfdReplayData *data = arg;
+    ReplayRamDiscard replay_fn = data->fn;
+
+    replay_fn(section, data->opaque);
+
+    return 0;
+}
+
+static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm,
+                                             MemoryRegionSection *section,
+                                             ReplayRamDiscard replay_fn,
+                                             void *opaque)
+{
+    GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
+    struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == gmm->mr);
+    guest_memfd_for_each_discarded_section(gmm, section, &data,
+                                           guest_memfd_rdm_replay_discarded_cb);
+}
+
+static void guest_memfd_manager_init(Object *obj)
+{
+    GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(obj);
+
+    QLIST_INIT(&gmm->rdl_list);
+}
+
+static void guest_memfd_manager_finalize(Object *obj)
+{
+    g_free(GUEST_MEMFD_MANAGER(obj)->bitmap);
+}
+
+static void guest_memfd_manager_class_init(ObjectClass *oc, void *data)
+{
+    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
+
+    rdmc->get_min_granularity = guest_memfd_rdm_get_min_granularity;
+    rdmc->register_listener = guest_memfd_rdm_register_listener;
+    rdmc->unregister_listener = guest_memfd_rdm_unregister_listener;
+    rdmc->is_populated = guest_memfd_rdm_is_populated;
+    rdmc->replay_populated = guest_memfd_rdm_replay_populated;
+    rdmc->replay_discarded = guest_memfd_rdm_replay_discarded;
+}
diff --git a/system/meson.build b/system/meson.build
index 4952f4b2c7..ed4e1137bd 100644
--- a/system/meson.build
+++ b/system/meson.build
@@ -15,6 +15,7 @@ system_ss.add(files(
   'dirtylimit.c',
   'dma-helpers.c',
   'globals.c',
+  'guest-memfd-manager.c',
   'memory_mapping.c',
   'qdev-monitor.c',
   'qtest.c',

From patchwork Fri Dec 13 07:08:45 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13906622
From: Chenyi Qiang
To: David Hildenbrand, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun
Subject: [PATCH 3/7] guest_memfd: Introduce a callback to notify the shared/private state change
Date: Fri, 13 Dec 2024 15:08:45 +0800
Message-ID: <20241213070852.106092-4-chenyi.qiang@intel.com>
In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com>

Introduce a new state_change() callback in GuestMemfdManagerClass to
efficiently notify all registered RamDiscardListeners, including VFIO
listeners, about memory conversion events in guest_memfd. The existing
VFIO listener can dynamically DMA map/unmap the shared pages based on
the conversion type:
- For conversions from shared to private, the VFIO system ensures the
  shared mapping is discarded from the IOMMU.
- For conversions from private to shared, it triggers the population of
  the shared mapping into the IOMMU.

Additionally, there are some special conversion requests:
- When a conversion request is made for a page already in the desired
  state, the helper simply returns success.
- For requests involving a range only partially in the desired state,
  only the necessary segments are converted, so that the entire range
  ends up compliant with the request.
- When a conversion request is declined by another system, such as a
  failure from VFIO during notify_populate(), the helper rolls back the
  request, maintaining consistency.
Signed-off-by: Chenyi Qiang
---
 include/sysemu/guest-memfd-manager.h |   3 +
 system/guest-memfd-manager.c         | 144 +++++++++++++++++++++++++++
 2 files changed, 147 insertions(+)

diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h
index ba4a99b614..f4b175529b 100644
--- a/include/sysemu/guest-memfd-manager.h
+++ b/include/sysemu/guest-memfd-manager.h
@@ -41,6 +41,9 @@ struct GuestMemfdManager {
 
 struct GuestMemfdManagerClass {
     ObjectClass parent_class;
+
+    int (*state_change)(GuestMemfdManager *gmm, uint64_t offset, uint64_t size,
+                        bool shared_to_private);
 };
 
 #endif
diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
index d7e105fead..6601df5f3f 100644
--- a/system/guest-memfd-manager.c
+++ b/system/guest-memfd-manager.c
@@ -225,6 +225,147 @@ static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm,
                                            guest_memfd_rdm_replay_discarded_cb);
 }
 
+static bool guest_memfd_is_valid_range(GuestMemfdManager *gmm,
+                                       uint64_t offset, uint64_t size)
+{
+    MemoryRegion *mr = gmm->mr;
+
+    g_assert(mr);
+
+    uint64_t region_size = memory_region_size(mr);
+    if (!QEMU_IS_ALIGNED(offset, gmm->block_size)) {
+        return false;
+    }
+    if (offset + size < offset || !size) {
+        return false;
+    }
+    if (offset >= region_size || offset + size > region_size) {
+        return false;
+    }
+    return true;
+}
+
+static void guest_memfd_notify_discard(GuestMemfdManager *gmm,
+                                       uint64_t offset, uint64_t size)
+{
+    RamDiscardListener *rdl;
+
+    QLIST_FOREACH(rdl, &gmm->rdl_list, next) {
+        MemoryRegionSection tmp = *rdl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+
+        guest_memfd_for_each_populated_section(gmm, &tmp, rdl,
+                                               guest_memfd_notify_discard_cb);
+    }
+}
+
+static int guest_memfd_notify_populate(GuestMemfdManager *gmm,
+                                       uint64_t offset, uint64_t size)
+{
+    RamDiscardListener *rdl, *rdl2;
+    int ret = 0;
+
+    QLIST_FOREACH(rdl, &gmm->rdl_list, next) {
+        MemoryRegionSection tmp =
+            *rdl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+
+        ret = guest_memfd_for_each_discarded_section(gmm, &tmp, rdl,
+                                                     guest_memfd_notify_populate_cb);
+        if (ret) {
+            break;
+        }
+    }
+
+    if (ret) {
+        /* Notify all already-notified listeners. */
+        QLIST_FOREACH(rdl2, &gmm->rdl_list, next) {
+            MemoryRegionSection tmp = *rdl2->section;
+
+            if (rdl2 == rdl) {
+                break;
+            }
+            if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+                continue;
+            }
+
+            guest_memfd_for_each_discarded_section(gmm, &tmp, rdl2,
+                                                   guest_memfd_notify_discard_cb);
+        }
+    }
+    return ret;
+}
+
+static bool guest_memfd_is_range_populated(GuestMemfdManager *gmm,
+                                           uint64_t offset, uint64_t size)
+{
+    const unsigned long first_bit = offset / gmm->block_size;
+    const unsigned long last_bit = first_bit + (size / gmm->block_size) - 1;
+    unsigned long found_bit;
+
+    /* We fake a shorter bitmap to avoid searching too far. */
+    found_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit);
+    return found_bit > last_bit;
+}
+
+static bool guest_memfd_is_range_discarded(GuestMemfdManager *gmm,
+                                           uint64_t offset, uint64_t size)
+{
+    const unsigned long first_bit = offset / gmm->block_size;
+    const unsigned long last_bit = first_bit + (size / gmm->block_size) - 1;
+    unsigned long found_bit;
+
+    /* We fake a shorter bitmap to avoid searching too far.
+     */
+    found_bit = find_next_bit(gmm->bitmap, last_bit + 1, first_bit);
+    return found_bit > last_bit;
+}
+
+static int guest_memfd_state_change(GuestMemfdManager *gmm, uint64_t offset,
+                                    uint64_t size, bool shared_to_private)
+{
+    int ret = 0;
+
+    if (!guest_memfd_is_valid_range(gmm, offset, size)) {
+        error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
+                     __func__, offset, size);
+        return -1;
+    }
+
+    if ((shared_to_private && guest_memfd_is_range_discarded(gmm, offset, size)) ||
+        (!shared_to_private && guest_memfd_is_range_populated(gmm, offset, size))) {
+        return 0;
+    }
+
+    if (shared_to_private) {
+        guest_memfd_notify_discard(gmm, offset, size);
+    } else {
+        ret = guest_memfd_notify_populate(gmm, offset, size);
+    }
+
+    if (!ret) {
+        unsigned long first_bit = offset / gmm->block_size;
+        unsigned long nbits = size / gmm->block_size;
+
+        g_assert((first_bit + nbits) <= gmm->bitmap_size);
+
+        if (shared_to_private) {
+            bitmap_clear(gmm->bitmap, first_bit, nbits);
+        } else {
+            bitmap_set(gmm->bitmap, first_bit, nbits);
+        }
+
+        return 0;
+    }
+
+    return ret;
+}
+
 static void guest_memfd_manager_init(Object *obj)
 {
     GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(obj);
@@ -239,8 +380,11 @@ static void guest_memfd_manager_finalize(Object *obj)
 
 static void guest_memfd_manager_class_init(ObjectClass *oc, void *data)
 {
+    GuestMemfdManagerClass *gmmc = GUEST_MEMFD_MANAGER_CLASS(oc);
     RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
 
+    gmmc->state_change = guest_memfd_state_change;
+
     rdmc->get_min_granularity = guest_memfd_rdm_get_min_granularity;
     rdmc->register_listener = guest_memfd_rdm_register_listener;
     rdmc->unregister_listener = guest_memfd_rdm_unregister_listener;

From patchwork Fri Dec 13 07:08:46 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13906623
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F060218FDB2 for ; Fri, 13 Dec 2024 07:09:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734073770; cv=none; b=gFoWIZ8Nlbcay1wlyUqB4CscXAr2UuM3i2gDO3T+V29XFyqnT8956yQEgJVtsb1Y3sOJB0bS8yj8c67gAwIgziB9ul8rVKtSP3qkFerW13MKdg29415Mg9reYZLFxybj+ruQ8do2PVYVy07d3O17qcUemix61mjba0cY3b0OPSk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734073770; c=relaxed/simple; bh=vCYpc1Vq5+niHLJ2a/1T5SIiwGkK2cfUHuX+/Tv813o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mQ+6akhQbfhiWvcTE8HftlZTPQNq7Y20drim/PAB9Hr4qdMvY/KHNkXuUv9pK5HLsW6wvadFW2JIdGyXml1o6ciDsF7waaNNnAuT9Pphyrr82KCVqxqycD5Xi4Mg50JxiW+BGHYVOezxFmQ2v9mnQUDTRhjJycx0YaKSa0KFfJA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=g0EvWFj7; arc=none smtp.client-ip=198.175.65.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="g0EvWFj7" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1734073769; x=1765609769; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=vCYpc1Vq5+niHLJ2a/1T5SIiwGkK2cfUHuX+/Tv813o=; b=g0EvWFj7BUgGvlgeiPOxFr/XPShO8MGMiqSz061LtdAfee2+6On27uDS hMOtM9MdQsfEA3ifXaiYVR0XJyz4hXMHwihyBErVEt5SFMnVTtnW+EJOL Xa430H4zBbFOUdv/sulpqcAcfzRRsGDqzASl5tMQCcel+JW6Wrw2rfWT+ 
3tL7NimRJiIGiEP8hePpx0ljyOBcoU94tsrrotEl1ZnQscb6VN3ARf/Fv kyzPVD1/2PLH/rmSTnttYTgzGC7j8NcY4ee5qlUG26PAFoF3o/7NKIsu8 mH8o8mIbdKR2v6Pz8/oTpBdIv6DlJkMvcLY65Eb5fQkRpFab60IpPMOnk A==; X-CSE-ConnectionGUID: 2aDDxuxURnum7lT9tXO4/Q== X-CSE-MsgGUID: 4gWqIqgbStysNeIPO0PBHg== X-IronPort-AV: E=McAfee;i="6700,10204,11284"; a="51937088" X-IronPort-AV: E=Sophos;i="6.12,230,1728975600"; d="scan'208";a="51937088" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Dec 2024 23:09:29 -0800 X-CSE-ConnectionGUID: PkxAlT9xTnSNCm03DXVOsw== X-CSE-MsgGUID: Z1jMnQXCQzuVd9eD3SkzhA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,230,1728975600"; d="scan'208";a="96365565" Received: from emr-bkc.sh.intel.com ([10.112.230.82]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Dec 2024 23:09:26 -0800 From: Chenyi Qiang To: David Hildenbrand , Paolo Bonzini , Peter Xu , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Michael Roth Cc: Chenyi Qiang , qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J , Peng Chao P , Gao Chao , Xu Yilun Subject: [PATCH 4/7] KVM: Notify the state change event during shared/private conversion Date: Fri, 13 Dec 2024 15:08:46 +0800 Message-ID: <20241213070852.106092-5-chenyi.qiang@intel.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com> References: <20241213070852.106092-1-chenyi.qiang@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Introduce a helper to trigger the state_change() callback of the class. Once exit to userspace to convert the page from private to shared or vice versa at runtime, notify the event via the helper so that other registered subsystems like VFIO can be notified. 
Signed-off-by: Chenyi Qiang
---
 accel/kvm/kvm-all.c                  |  4 ++++
 include/sysemu/guest-memfd-manager.h | 15 +++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 52425af534..38f41a98a5 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -48,6 +48,7 @@
 #include "kvm-cpus.h"
 #include "sysemu/dirtylimit.h"
 #include "qemu/range.h"
+#include "sysemu/guest-memfd-manager.h"
 #include "hw/boards.h"
 #include "sysemu/stats.h"
 
@@ -3080,6 +3081,9 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
     addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
     rb = qemu_ram_block_from_host(addr, false, &offset);
 
+    guest_memfd_manager_state_change(GUEST_MEMFD_MANAGER(mr->rdm), offset,
+                                     size, to_private);
+
     if (to_private) {
         if (rb->page_size != qemu_real_host_page_size()) {
             /*
diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h
index f4b175529b..9dc4e0346d 100644
--- a/include/sysemu/guest-memfd-manager.h
+++ b/include/sysemu/guest-memfd-manager.h
@@ -46,4 +46,19 @@ struct GuestMemfdManagerClass {
                         bool shared_to_private);
 };
 
+static inline int guest_memfd_manager_state_change(GuestMemfdManager *gmm, uint64_t offset,
+                                                   uint64_t size, bool shared_to_private)
+{
+    GuestMemfdManagerClass *klass;
+
+    g_assert(gmm);
+    klass = GUEST_MEMFD_MANAGER_GET_CLASS(gmm);
+
+    if (klass->state_change) {
+        return klass->state_change(gmm, offset, size, shared_to_private);
+    }
+
+    return 0;
+}
+
 #endif

From patchwork Fri Dec 13 07:08:47 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13906624
From: Chenyi Qiang
To: David Hildenbrand, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun
Subject: [PATCH 5/7] memory: Register the RamDiscardManager instance upon guest_memfd creation
Date: Fri, 13 Dec 2024 15:08:47 +0800
Message-ID: <20241213070852.106092-6-chenyi.qiang@intel.com>
In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com>

Introduce realize()/unrealize() callbacks to initialize/uninitialize the
new guest_memfd_manager object and register/unregister it in the target
MemoryRegion.

guest_memfd was initially set to shared until commit bd3bcf6962
("kvm/memory: Make memory type private by default if it has guest memfd
backend"). To align with that change, the default state in
guest_memfd_manager is set to private (the bitmap is cleared to 0).
Defaulting to private also reduces the overhead of VFIO mapping shared
pages into the IOMMU during the bootup stage.
Signed-off-by: Chenyi Qiang
---
 include/sysemu/guest-memfd-manager.h | 27 +++++++++++++++++++++++++++
 system/guest-memfd-manager.c         | 28 +++++++++++++++++++++++++++-
 system/physmem.c                     |  7 +++++++
 3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h
index 9dc4e0346d..d1e7f698e8 100644
--- a/include/sysemu/guest-memfd-manager.h
+++ b/include/sysemu/guest-memfd-manager.h
@@ -42,6 +42,8 @@ struct GuestMemfdManager {
 struct GuestMemfdManagerClass {
     ObjectClass parent_class;
 
+    void (*realize)(GuestMemfdManager *gmm, MemoryRegion *mr, uint64_t region_size);
+    void (*unrealize)(GuestMemfdManager *gmm);
     int (*state_change)(GuestMemfdManager *gmm, uint64_t offset, uint64_t size,
                         bool shared_to_private);
 };
@@ -61,4 +63,29 @@ static inline int guest_memfd_manager_state_change(GuestMemfdManager *gmm, uint6
     return 0;
 }
 
+static inline void guest_memfd_manager_realize(GuestMemfdManager *gmm,
+                                               MemoryRegion *mr, uint64_t region_size)
+{
+    GuestMemfdManagerClass *klass;
+
+    g_assert(gmm);
+    klass = GUEST_MEMFD_MANAGER_GET_CLASS(gmm);
+
+    if (klass->realize) {
+        klass->realize(gmm, mr, region_size);
+    }
+}
+
+static inline void guest_memfd_manager_unrealize(GuestMemfdManager *gmm)
+{
+    GuestMemfdManagerClass *klass;
+
+    g_assert(gmm);
+    klass = GUEST_MEMFD_MANAGER_GET_CLASS(gmm);
+
+    if (klass->unrealize) {
+        klass->unrealize(gmm);
+    }
+}
+
 #endif
diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
index 6601df5f3f..b6a32f0bfb 100644
--- a/system/guest-memfd-manager.c
+++ b/system/guest-memfd-manager.c
@@ -366,6 +366,31 @@ static int guest_memfd_state_change(GuestMemfdManager *gmm, uint64_t offset,
     return ret;
 }
 
+static void guest_memfd_manager_realizefn(GuestMemfdManager *gmm, MemoryRegion *mr,
+                                          uint64_t region_size)
+{
+    uint64_t bitmap_size;
+
+    gmm->block_size = qemu_real_host_page_size();
+    bitmap_size = ROUND_UP(region_size, gmm->block_size) / gmm->block_size;
+
+    gmm->mr = mr;
+    gmm->bitmap_size = bitmap_size;
+    gmm->bitmap = bitmap_new(bitmap_size);
+
+    memory_region_set_ram_discard_manager(gmm->mr, RAM_DISCARD_MANAGER(gmm));
+}
+
+static void guest_memfd_manager_unrealizefn(GuestMemfdManager *gmm)
+{
+    memory_region_set_ram_discard_manager(gmm->mr, NULL);
+
+    g_free(gmm->bitmap);
+    gmm->bitmap = NULL;
+    gmm->bitmap_size = 0;
+    gmm->mr = NULL;
+}
+
 static void guest_memfd_manager_init(Object *obj)
 {
     GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(obj);
@@ -375,7 +400,6 @@ static void guest_memfd_manager_init(Object *obj)
 
 static void guest_memfd_manager_finalize(Object *obj)
 {
-    g_free(GUEST_MEMFD_MANAGER(obj)->bitmap);
 }
 
 static void guest_memfd_manager_class_init(ObjectClass *oc, void *data)
@@ -384,6 +408,8 @@ static void guest_memfd_manager_class_init(ObjectClass *oc, void *data)
     RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
 
     gmmc->state_change = guest_memfd_state_change;
+    gmmc->realize = guest_memfd_manager_realizefn;
+    gmmc->unrealize = guest_memfd_manager_unrealizefn;
 
     rdmc->get_min_granularity = guest_memfd_rdm_get_min_granularity;
     rdmc->register_listener = guest_memfd_rdm_register_listener;
diff --git a/system/physmem.c b/system/physmem.c
index dc1db3a384..532182a6dd 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -53,6 +53,7 @@
 #include "sysemu/hostmem.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/xen-mapcache.h"
+#include "sysemu/guest-memfd-manager.h"
 #include "trace.h"
 
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
@@ -1885,6 +1886,9 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
             qemu_mutex_unlock_ramlist();
             goto out_free;
         }
+
+        GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(object_new(TYPE_GUEST_MEMFD_MANAGER));
+        guest_memfd_manager_realize(gmm, new_block->mr, new_block->mr->size);
     }
 
     ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;
@@ -2139,6 +2143,9 @@ static void reclaim_ramblock(RAMBlock *block)
 
     if (block->guest_memfd >= 0) {
         close(block->guest_memfd);
+        GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(block->mr->rdm);
+        guest_memfd_manager_unrealize(gmm);
+        object_unref(OBJECT(gmm));
         ram_block_discard_require(false);
     }

From patchwork Fri Dec 13 07:08:48 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13906625
From: Chenyi Qiang
To: David Hildenbrand, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun
Subject: [PATCH 6/7] RAMBlock: make guest_memfd require coordinated discard
Date: Fri, 13 Dec 2024 15:08:48 +0800
Message-ID: <20241213070852.106092-7-chenyi.qiang@intel.com>
In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com>
Now that guest_memfd is managed by guest_memfd_manager with
RamDiscardManager, only block uncoordinated discard.

Signed-off-by: Chenyi Qiang
---
 system/physmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/system/physmem.c b/system/physmem.c
index 532182a6dd..585090b063 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1872,7 +1872,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
         assert(kvm_enabled());
         assert(new_block->guest_memfd < 0);
 
-        ret = ram_block_discard_require(true);
+        ret = ram_block_coordinated_discard_require(true);
         if (ret < 0) {
             error_setg_errno(errp, -ret,
                              "cannot set up private guest memory: discard currently blocked");

From patchwork Fri Dec 13 07:08:49 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13906626
From: Chenyi Qiang
To: David Hildenbrand, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun
Subject: [RFC PATCH 7/7] memory: Add a new argument to indicate the request attribute in RamDiscardManager helpers
Date: Fri, 13 Dec 2024 15:08:49 +0800
Message-ID: <20241213070852.106092-8-chenyi.qiang@intel.com>
In-Reply-To: <20241213070852.106092-1-chenyi.qiang@intel.com>

For each RamDiscardManager helper, add a new argument 'is_private' to
indicate the requested attribute. If is_private is true, the operation
targets the private range of the section. For example,
replay_populated(true) replays the populate operation on the private part
of the MemoryRegionSection, while replay_populated(false) replays it on
the shared part.

This helps to distinguish the private/shared state from the
discarded/populated state. The distinction is essential for
guest_memfd_manager, which uses the RamDiscardManager interface but
cannot treat private memory as discarded memory, because that does not
align with the expectations of current RamDiscardManager users (e.g.
live migration), which assume that discarded memory is hot-removed and
can be skipped when processing guest memory. Treating private memory as
discarded also won't work in the future if live migration needs to
handle (i.e. migrate) private memory.

The user of each helper needs to figure out which attribute to
manipulate. For the legacy VM case, use is_private=true by default; the
private attribute is only valid in a guest_memfd-based VM.
Opportunistically rename guest_memfd_for_each_{discarded,populated}_section()
to guest_memfd_for_each_{private,shared}_section() to distinguish between
private/shared and discarded/populated at the same time.

Signed-off-by: Chenyi Qiang
---
 hw/vfio/common.c             |  22 ++++++--
 hw/virtio/virtio-mem.c       |  23 ++++----
 include/exec/memory.h        |  23 ++++++--
 migration/ram.c              |  14 ++---
 system/guest-memfd-manager.c | 106 +++++++++++++++++++++++------------
 system/memory.c              |  13 +++--
 system/memory_mapping.c      |   4 +-
 7 files changed, 135 insertions(+), 70 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index dcef44fe55..a6f49e6450 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -345,7 +345,8 @@ out:
 }
 
 static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
-                                            MemoryRegionSection *section)
+                                            MemoryRegionSection *section,
+                                            bool is_private)
 {
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
@@ -354,6 +355,11 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
     const hwaddr iova = section->offset_within_address_space;
     int ret;
 
+    if (is_private) {
+        /* Not support discard private memory yet. */
+        return;
+    }
+
     /* Unmap with a single call. */
     ret = vfio_container_dma_unmap(bcontainer, iova, size, NULL);
     if (ret) {
@@ -363,7 +369,8 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
 }
 
 static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
-                                            MemoryRegionSection *section)
+                                            MemoryRegionSection *section,
+                                            bool is_private)
 {
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
@@ -374,6 +381,11 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
     void *vaddr;
     int ret;
 
+    if (is_private) {
+        /* Not support discard private memory yet. */
+        return 0;
+    }
+
     /*
      * Map in (aligned within memory region) minimum granularity, so we can
      * unmap in minimum granularity later.
@@ -390,7 +402,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
                                vaddr, section->readonly);
         if (ret) {
             /* Rollback */
-            vfio_ram_discard_notify_discard(rdl, section);
+            vfio_ram_discard_notify_discard(rdl, section, false);
             return ret;
         }
     }
@@ -1248,7 +1260,7 @@ out:
 }
 
 static int vfio_ram_discard_get_dirty_bitmap(MemoryRegionSection *section,
-                                             void *opaque)
+                                             bool is_private, void *opaque)
 {
     const hwaddr size = int128_get64(section->size);
     const hwaddr iova = section->offset_within_address_space;
@@ -1293,7 +1305,7 @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
      * We only want/can synchronize the bitmap for actually mapped parts -
      * which correspond to populated parts. Replay all populated parts.
      */
-    return ram_discard_manager_replay_populated(rdm, section,
+    return ram_discard_manager_replay_populated(rdm, section, false,
                                                 vfio_ram_discard_get_dirty_bitmap,
                                                 &vrdl);
 }
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index e3d1ccaeeb..e7304c7e47 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -312,14 +312,14 @@ static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg)
 {
     RamDiscardListener *rdl = arg;
 
-    return rdl->notify_populate(rdl, s);
+    return rdl->notify_populate(rdl, s, false);
 }
 
 static int virtio_mem_notify_discard_cb(MemoryRegionSection *s, void *arg)
 {
     RamDiscardListener *rdl = arg;
 
-    rdl->notify_discard(rdl, s);
+    rdl->notify_discard(rdl, s, false);
     return 0;
 }
 
@@ -334,7 +334,7 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
         if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
-        rdl->notify_discard(rdl, &tmp);
+        rdl->notify_discard(rdl, &tmp, false);
     }
 }
 
@@ -350,7 +350,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
         if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
-        ret = rdl->notify_populate(rdl, &tmp);
+        ret = rdl->notify_populate(rdl, &tmp, false);
         if (ret) {
             break;
         }
@@ -367,7 +367,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
             if (!memory_region_section_intersect_range(&tmp, offset, size)) {
                 continue;
             }
-            rdl2->notify_discard(rdl2, &tmp);
+            rdl2->notify_discard(rdl2, &tmp, false);
         }
     }
     return ret;
@@ -383,7 +383,7 @@ static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem)
 
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         if (rdl->double_discard_supported) {
-            rdl->notify_discard(rdl, rdl->section);
+            rdl->notify_discard(rdl, rdl->section, false);
         } else {
             virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
                                                 virtio_mem_notify_discard_cb);
@@ -1685,7 +1685,8 @@ static uint64_t virtio_mem_rdm_get_min_granularity(const RamDiscardManager *rdm,
 }
 
 static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
-                                        const MemoryRegionSection *s)
+                                        const MemoryRegionSection *s,
+                                        bool is_private)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
     uint64_t start_gpa = vmem->addr + s->offset_within_region;
@@ -1712,11 +1713,12 @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    return ((ReplayRamPopulate)data->fn)(s, data->opaque);
+    return ((ReplayRamPopulate)data->fn)(s, false, data->opaque);
 }
 
 static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
                                            MemoryRegionSection *s,
+                                           bool is_private,
                                            ReplayRamPopulate replay_fn,
                                            void *opaque)
 {
@@ -1736,12 +1738,13 @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    ((ReplayRamDiscard)data->fn)(s, data->opaque);
+    ((ReplayRamDiscard)data->fn)(s, false, data->opaque);
     return 0;
 }
 
 static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
                                             MemoryRegionSection *s,
+                                            bool is_private,
                                             ReplayRamDiscard replay_fn,
                                             void *opaque)
 {
@@ -1783,7 +1786,7 @@ static void virtio_mem_rdm_unregister_listener(RamDiscardManager *rdm,
     g_assert(rdl->section->mr == &vmem->memdev->mr);
     if (vmem->size) {
         if (rdl->double_discard_supported) {
-            rdl->notify_discard(rdl, rdl->section);
+            rdl->notify_discard(rdl, rdl->section, false);
         } else {
             virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl,
                                                 virtio_mem_notify_discard_cb);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index ec7bc641e8..8aac61af08 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -508,9 +508,11 @@ struct IOMMUMemoryRegionClass {
 typedef struct RamDiscardListener RamDiscardListener;
 
 typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl,
-                                 MemoryRegionSection *section);
+                                 MemoryRegionSection *section,
+                                 bool is_private);
 typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl,
-                                 MemoryRegionSection *section);
+                                 MemoryRegionSection *section,
+                                 bool is_private);
 
 struct RamDiscardListener {
     /*
@@ -566,8 +568,8 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
     rdl->double_discard_supported = double_discard_supported;
 }
 
-typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque);
-typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void *opaque);
+typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, bool is_private, void *opaque);
+typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, bool is_private, void *opaque);
 
 /*
  * RamDiscardManagerClass:
@@ -632,11 +634,13 @@ struct RamDiscardManagerClass {
      *
      * @rdm: the #RamDiscardManager
      * @section: the #MemoryRegionSection
+     * @is_private: the attribute of the request section
      *
      * Returns whether the given range is completely populated.
      */
     bool (*is_populated)(const RamDiscardManager *rdm,
-                         const MemoryRegionSection *section);
+                         const MemoryRegionSection *section,
+                         bool is_private);
 
     /**
      * @replay_populated:
@@ -648,6 +652,7 @@ struct RamDiscardManagerClass {
      *
      * @rdm: the #RamDiscardManager
      * @section: the #MemoryRegionSection
+     * @is_private: the attribute of the populated parts
      * @replay_fn: the #ReplayRamPopulate callback
      * @opaque: pointer to forward to the callback
      *
@@ -655,6 +660,7 @@ struct RamDiscardManagerClass {
      */
     int (*replay_populated)(const RamDiscardManager *rdm,
                             MemoryRegionSection *section,
+                            bool is_private,
                             ReplayRamPopulate replay_fn,
                             void *opaque);
 
     /**
@@ -665,11 +671,13 @@ struct RamDiscardManagerClass {
      *
      * @rdm: the #RamDiscardManager
      * @section: the #MemoryRegionSection
+     * @is_private: the attribute of the discarded parts
      * @replay_fn: the #ReplayRamDiscard callback
      * @opaque: pointer to forward to the callback
      */
     void (*replay_discarded)(const RamDiscardManager *rdm,
                              MemoryRegionSection *section,
+                             bool is_private,
                              ReplayRamDiscard replay_fn,
                              void *opaque);
 
     /**
@@ -709,15 +717,18 @@ uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
                                                  const MemoryRegion *mr);
 
 bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
-                                      const MemoryRegionSection *section);
+                                      const MemoryRegionSection *section,
+                                      bool is_private);
 
 int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
                                          MemoryRegionSection *section,
+                                         bool is_private,
                                          ReplayRamPopulate replay_fn,
                                          void *opaque);
 
 void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
                                           MemoryRegionSection *section,
+                                          bool is_private,
                                           ReplayRamDiscard replay_fn,
                                           void *opaque);
 
diff --git a/migration/ram.c b/migration/ram.c
index 05ff9eb328..b9efba1d14 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -838,7 +838,7 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
 }
 
 static void dirty_bitmap_clear_section(MemoryRegionSection *section,
-                                       void *opaque)
+                                       bool is_private, void *opaque)
 {
     const hwaddr offset = section->offset_within_region;
     const hwaddr size = int128_get64(section->size);
@@ -884,7 +884,7 @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb)
             .size = int128_make64(qemu_ram_get_used_length(rb)),
         };
 
-        ram_discard_manager_replay_discarded(rdm, &section,
+        ram_discard_manager_replay_discarded(rdm, &section, false,
                                              dirty_bitmap_clear_section,
                                              &cleared_bits);
     }
@@ -907,7 +907,7 @@ bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start)
             .size = int128_make64(qemu_ram_pagesize(rb)),
         };
 
-        return !ram_discard_manager_is_populated(rdm, &section);
+        return !ram_discard_manager_is_populated(rdm, &section, false);
     }
     return false;
 }
@@ -1539,7 +1539,7 @@ static inline void populate_read_range(RAMBlock *block, ram_addr_t offset,
 }
 
 static inline int populate_read_section(MemoryRegionSection *section,
-                                        void *opaque)
+                                        bool is_private, void *opaque)
 {
     const hwaddr size = int128_get64(section->size);
     hwaddr offset = section->offset_within_region;
@@ -1579,7 +1579,7 @@ static void ram_block_populate_read(RAMBlock *rb)
             .size = rb->mr->size,
         };
 
-        ram_discard_manager_replay_populated(rdm, &section,
+        ram_discard_manager_replay_populated(rdm, &section, false,
                                              populate_read_section, NULL);
     } else {
         populate_read_range(rb, 0, rb->used_length);
@@ -1614,7 +1614,7 @@ void ram_write_tracking_prepare(void)
 }
 
 static inline int uffd_protect_section(MemoryRegionSection *section,
-                                       void *opaque)
+                                       bool is_private, void *opaque)
 {
     const hwaddr size = int128_get64(section->size);
     const hwaddr offset = section->offset_within_region;
@@ -1638,7 +1638,7 @@ static int ram_block_uffd_protect(RAMBlock *rb, int uffd_fd)
             .size = rb->mr->size,
         };
 
-        return ram_discard_manager_replay_populated(rdm, &section,
+        return ram_discard_manager_replay_populated(rdm, &section, false,
                                                     uffd_protect_section,
                                                     (void *)(uintptr_t)uffd_fd);
     }
diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
index b6a32f0bfb..50802b34d7 100644
--- a/system/guest-memfd-manager.c
+++ b/system/guest-memfd-manager.c
@@ -23,39 +23,51 @@ OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(GuestMemfdManager,
                                           { })
 
 static bool guest_memfd_rdm_is_populated(const RamDiscardManager *rdm,
-                                         const MemoryRegionSection *section)
+                                         const MemoryRegionSection *section,
+                                         bool is_private)
 {
     const GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
     uint64_t first_bit = section->offset_within_region / gmm->block_size;
     uint64_t last_bit = first_bit + int128_get64(section->size) / gmm->block_size - 1;
     unsigned long first_discard_bit;
 
-    first_discard_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit);
+    if (is_private) {
+        /* Check if the private section is populated */
+        first_discard_bit = find_next_bit(gmm->bitmap, last_bit + 1, first_bit);
+    } else {
+        /* Check if the shared section is populated */
+        first_discard_bit = find_next_zero_bit(gmm->bitmap, last_bit + 1, first_bit);
+    }
+
     return first_discard_bit > last_bit;
 }
 
-typedef int (*guest_memfd_section_cb)(MemoryRegionSection *s, void *arg);
+typedef int (*guest_memfd_section_cb)(MemoryRegionSection *s, bool is_private,
+                                      void *arg);
 
-static int guest_memfd_notify_populate_cb(MemoryRegionSection *section, void *arg)
+static int guest_memfd_notify_populate_cb(MemoryRegionSection *section, bool is_private,
+                                          void *arg)
 {
     RamDiscardListener *rdl = arg;
 
-    return rdl->notify_populate(rdl, section);
+    return rdl->notify_populate(rdl, section, is_private);
 }
 
-static int guest_memfd_notify_discard_cb(MemoryRegionSection *section, void *arg)
+static int guest_memfd_notify_discard_cb(MemoryRegionSection *section, bool is_private,
+                                         void *arg)
 {
     RamDiscardListener *rdl = arg;
 
-    rdl->notify_discard(rdl, section);
+    rdl->notify_discard(rdl, section, is_private);
 
     return 0;
 }
 
-static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm,
-                                                  MemoryRegionSection *section,
-                                                  void *arg,
-                                                  guest_memfd_section_cb cb)
+static int guest_memfd_for_each_shared_section(const
GuestMemfdManager *gmm, + MemoryRegionSection *section, + bool is_private, + void *arg, + guest_memfd_section_cb cb) { unsigned long first_one_bit, last_one_bit; uint64_t offset, size; @@ -76,7 +88,7 @@ static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm, break; } - ret = cb(&tmp, arg); + ret = cb(&tmp, is_private, arg); if (ret) { break; } @@ -88,10 +100,11 @@ static int guest_memfd_for_each_populated_section(const GuestMemfdManager *gmm, return ret; } -static int guest_memfd_for_each_discarded_section(const GuestMemfdManager *gmm, - MemoryRegionSection *section, - void *arg, - guest_memfd_section_cb cb) +static int guest_memfd_for_each_private_section(const GuestMemfdManager *gmm, + MemoryRegionSection *section, + bool is_private, + void *arg, + guest_memfd_section_cb cb) { unsigned long first_zero_bit, last_zero_bit; uint64_t offset, size; @@ -113,7 +126,7 @@ static int guest_memfd_for_each_discarded_section(const GuestMemfdManager *gmm, break; } - ret = cb(&tmp, arg); + ret = cb(&tmp, is_private, arg); if (ret) { break; } @@ -146,8 +159,9 @@ static void guest_memfd_rdm_register_listener(RamDiscardManager *rdm, QLIST_INSERT_HEAD(&gmm->rdl_list, rdl, next); - ret = guest_memfd_for_each_populated_section(gmm, section, rdl, - guest_memfd_notify_populate_cb); + /* Populate shared part */ + ret = guest_memfd_for_each_shared_section(gmm, section, false, rdl, + guest_memfd_notify_populate_cb); if (ret) { error_report("%s: Failed to register RAM discard listener: %s", __func__, strerror(-ret)); @@ -163,8 +177,9 @@ static void guest_memfd_rdm_unregister_listener(RamDiscardManager *rdm, g_assert(rdl->section); g_assert(rdl->section->mr == gmm->mr); - ret = guest_memfd_for_each_populated_section(gmm, rdl->section, rdl, - guest_memfd_notify_discard_cb); + /* Discard shared part */ + ret = guest_memfd_for_each_shared_section(gmm, rdl->section, false, rdl, + guest_memfd_notify_discard_cb); if (ret) { error_report("%s: Failed to unregister RAM 
discard listener: %s", __func__, strerror(-ret)); @@ -181,16 +196,18 @@ typedef struct GuestMemfdReplayData { void *opaque; } GuestMemfdReplayData; -static int guest_memfd_rdm_replay_populated_cb(MemoryRegionSection *section, void *arg) +static int guest_memfd_rdm_replay_populated_cb(MemoryRegionSection *section, + bool is_private, void *arg) { struct GuestMemfdReplayData *data = arg; ReplayRamPopulate replay_fn = data->fn; - return replay_fn(section, data->opaque); + return replay_fn(section, is_private, data->opaque); } static int guest_memfd_rdm_replay_populated(const RamDiscardManager *rdm, MemoryRegionSection *section, + bool is_private, ReplayRamPopulate replay_fn, void *opaque) { @@ -198,22 +215,31 @@ static int guest_memfd_rdm_replay_populated(const RamDiscardManager *rdm, struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque }; g_assert(section->mr == gmm->mr); - return guest_memfd_for_each_populated_section(gmm, section, &data, - guest_memfd_rdm_replay_populated_cb); + if (is_private) { + /* Replay populate on private section */ + return guest_memfd_for_each_private_section(gmm, section, is_private, &data, + guest_memfd_rdm_replay_populated_cb); + } else { + /* Replay populate on shared section */ + return guest_memfd_for_each_shared_section(gmm, section, is_private, &data, + guest_memfd_rdm_replay_populated_cb); + } } -static int guest_memfd_rdm_replay_discarded_cb(MemoryRegionSection *section, void *arg) +static int guest_memfd_rdm_replay_discarded_cb(MemoryRegionSection *section, + bool is_private, void *arg) { struct GuestMemfdReplayData *data = arg; ReplayRamDiscard replay_fn = data->fn; - replay_fn(section, data->opaque); + replay_fn(section, is_private, data->opaque); return 0; } static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm, MemoryRegionSection *section, + bool is_private, ReplayRamDiscard replay_fn, void *opaque) { @@ -221,8 +247,16 @@ static void guest_memfd_rdm_replay_discarded(const 
RamDiscardManager *rdm, struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque }; g_assert(section->mr == gmm->mr); - guest_memfd_for_each_discarded_section(gmm, section, &data, - guest_memfd_rdm_replay_discarded_cb); + + if (is_private) { + /* Replay discard on private section */ + guest_memfd_for_each_private_section(gmm, section, is_private, &data, + guest_memfd_rdm_replay_discarded_cb); + } else { + /* Replay discard on shared section */ + guest_memfd_for_each_shared_section(gmm, section, is_private, &data, + guest_memfd_rdm_replay_discarded_cb); + } } static bool guest_memfd_is_valid_range(GuestMemfdManager *gmm, @@ -257,8 +291,9 @@ static void guest_memfd_notify_discard(GuestMemfdManager *gmm, continue; } - guest_memfd_for_each_populated_section(gmm, &tmp, rdl, - guest_memfd_notify_discard_cb); + /* For current shared section, notify to discard shared parts */ + guest_memfd_for_each_shared_section(gmm, &tmp, false, rdl, + guest_memfd_notify_discard_cb); } } @@ -276,8 +311,9 @@ static int guest_memfd_notify_populate(GuestMemfdManager *gmm, continue; } - ret = guest_memfd_for_each_discarded_section(gmm, &tmp, rdl, - guest_memfd_notify_populate_cb); + /* For current private section, notify to populate the shared parts */ + ret = guest_memfd_for_each_private_section(gmm, &tmp, false, rdl, + guest_memfd_notify_populate_cb); if (ret) { break; } @@ -295,8 +331,8 @@ static int guest_memfd_notify_populate(GuestMemfdManager *gmm, continue; } - guest_memfd_for_each_discarded_section(gmm, &tmp, rdl2, - guest_memfd_notify_discard_cb); + guest_memfd_for_each_private_section(gmm, &tmp, false, rdl2, + guest_memfd_notify_discard_cb); } } return ret; diff --git a/system/memory.c b/system/memory.c index ddcec90f5e..d3d5a04f98 100644 --- a/system/memory.c +++ b/system/memory.c @@ -2133,34 +2133,37 @@ uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm, } bool ram_discard_manager_is_populated(const RamDiscardManager *rdm, - const 
MemoryRegionSection *section) + const MemoryRegionSection *section, + bool is_private) { RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); g_assert(rdmc->is_populated); - return rdmc->is_populated(rdm, section); + return rdmc->is_populated(rdm, section, is_private); } int ram_discard_manager_replay_populated(const RamDiscardManager *rdm, MemoryRegionSection *section, + bool is_private, ReplayRamPopulate replay_fn, void *opaque) { RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); g_assert(rdmc->replay_populated); - return rdmc->replay_populated(rdm, section, replay_fn, opaque); + return rdmc->replay_populated(rdm, section, is_private, replay_fn, opaque); } void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm, MemoryRegionSection *section, + bool is_private, ReplayRamDiscard replay_fn, void *opaque) { RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); g_assert(rdmc->replay_discarded); - rdmc->replay_discarded(rdm, section, replay_fn, opaque); + rdmc->replay_discarded(rdm, section, is_private, replay_fn, opaque); } void ram_discard_manager_register_listener(RamDiscardManager *rdm, @@ -2221,7 +2224,7 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, * Disallow that. vmstate priorities make sure any RamDiscardManager * were already restored before IOMMUs are restored. 
*/ - if (!ram_discard_manager_is_populated(rdm, &tmp)) { + if (!ram_discard_manager_is_populated(rdm, &tmp, false)) { error_setg(errp, "iommu map to discarded memory (e.g., unplugged" " via virtio-mem): %" HWADDR_PRIx "", iotlb->translated_addr); diff --git a/system/memory_mapping.c b/system/memory_mapping.c index ca2390eb80..c55c0c0c93 100644 --- a/system/memory_mapping.c +++ b/system/memory_mapping.c @@ -249,7 +249,7 @@ static void guest_phys_block_add_section(GuestPhysListener *g, } static int guest_phys_ram_populate_cb(MemoryRegionSection *section, - void *opaque) + bool is_private, void *opaque) { GuestPhysListener *g = opaque; @@ -274,7 +274,7 @@ static void guest_phys_blocks_region_add(MemoryListener *listener, RamDiscardManager *rdm; rdm = memory_region_get_ram_discard_manager(section->mr); - ram_discard_manager_replay_populated(rdm, section, + ram_discard_manager_replay_populated(rdm, section, false, guest_phys_ram_populate_cb, g); return; }