From patchwork Mon Apr 7 07:49:21 2025
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 14039892
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
    Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
    Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v4 01/13] memory: Export a helper to get intersection of a
 MemoryRegionSection with a given range
Date: Mon, 7 Apr 2025 15:49:21 +0800
Message-ID: <20250407074939.18657-2-chenyi.qiang@intel.com>
In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com>
References: <20250407074939.18657-1-chenyi.qiang@intel.com>

Rename the helper to memory_region_section_intersect_range() to make it
more generic. Meanwhile, define @end as Int128 and replace the related
operations with their int128_*() equivalents, since the helper is
exported as a wider API.

Suggested-by: Alexey Kardashevskiy
Reviewed-by: David Hildenbrand
Signed-off-by: Chenyi Qiang
---
Changes in v4:
- No change.

Changes in v3:
- No change.

Changes in v2:
- Make memory_region_section_intersect_range() an inline function.
- Add Reviewed-by from David.
- Define @end as Int128 and use the related int128_* ops as a wider
  API. (Alexey)
---
 hw/virtio/virtio-mem.c | 32 +++++---------------------------
 include/exec/memory.h  | 27 +++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index b1a003736b..21f16e4912 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -244,28 +244,6 @@ static int virtio_mem_for_each_plugged_range(VirtIOMEM *vmem, void *arg,
     return ret;
 }
 
-/*
- * Adjust the memory section to cover the intersection with the given range.
- *
- * Returns false if the intersection is empty, otherwise returns true.
- */
-static bool virtio_mem_intersect_memory_section(MemoryRegionSection *s,
-                                                uint64_t offset, uint64_t size)
-{
-    uint64_t start = MAX(s->offset_within_region, offset);
-    uint64_t end = MIN(s->offset_within_region + int128_get64(s->size),
-                       offset + size);
-
-    if (end <= start) {
-        return false;
-    }
-
-    s->offset_within_address_space += start - s->offset_within_region;
-    s->offset_within_region = start;
-    s->size = int128_make64(end - start);
-    return true;
-}
-
 typedef int (*virtio_mem_section_cb)(MemoryRegionSection *s, void *arg);
 
 static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
@@ -287,7 +265,7 @@ static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
                                 first_bit + 1) - 1;
         size = (last_bit - first_bit + 1) * vmem->block_size;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             break;
         }
         ret = cb(&tmp, arg);
@@ -319,7 +297,7 @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
                                  first_bit + 1) - 1;
         size = (last_bit - first_bit + 1) * vmem->block_size;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             break;
         }
         ret = cb(&tmp, arg);
@@ -355,7 +333,7 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         MemoryRegionSection tmp = *rdl->section;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
         rdl->notify_discard(rdl, &tmp);
@@ -371,7 +349,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         MemoryRegionSection tmp = *rdl->section;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
         ret = rdl->notify_populate(rdl, &tmp);
@@ -388,7 +366,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
             if (rdl2 == rdl) {
                 break;
             }
-            if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+            if (!memory_region_section_intersect_range(&tmp, offset, size)) {
                 continue;
             }
             rdl2->notify_discard(rdl2, &tmp);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3ee1901b52..3bebc43d59 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1202,6 +1202,33 @@ MemoryRegionSection *memory_region_section_new_copy(MemoryRegionSection *s);
  */
 void memory_region_section_free_copy(MemoryRegionSection *s);
 
+/**
+ * memory_region_section_intersect_range: Adjust the memory section to cover
+ * the intersection with the given range.
+ *
+ * @s: the #MemoryRegionSection to be adjusted
+ * @offset: the offset of the given range in the memory region
+ * @size: the size of the given range
+ *
+ * Returns false if the intersection is empty, otherwise returns true.
+ */
+static inline bool memory_region_section_intersect_range(MemoryRegionSection *s,
+                                                         uint64_t offset,
+                                                         uint64_t size)
+{
+    uint64_t start = MAX(s->offset_within_region, offset);
+    Int128 end = int128_min(int128_add(int128_make64(s->offset_within_region), s->size),
+                            int128_add(int128_make64(offset), int128_make64(size)));
+
+    if (int128_le(end, int128_make64(start))) {
+        return false;
+    }
+
+    s->offset_within_address_space += start - s->offset_within_region;
+    s->offset_within_region = start;
+    s->size = int128_sub(end, int128_make64(start));
+    return true;
+}
+
 /**
  * memory_region_init: Initialize a memory region
  *
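The exported helper is meant to be reused by any component that must clip a
listener's registered section to a changed sub-range before notifying, as the
virtio-mem call sites above do. A minimal sketch of that call pattern (the
wrapper name and its notify_fn parameter are illustrative, not part of the
patch):

    /* Clip a copy of a registered section to a changed sub-range and
     * notify only if the intersection is non-empty. */
    static int notify_range_changed(const MemoryRegionSection *registered,
                                    uint64_t offset, uint64_t size,
                                    int (*notify_fn)(MemoryRegionSection *s))
    {
        MemoryRegionSection tmp = *registered;  /* helper adjusts in place */

        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
            return 0;   /* empty intersection: nothing to notify */
        }
        return notify_fn(&tmp);
    }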
From patchwork Mon Apr 7 07:49:22 2025
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 14039893
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
    Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
    Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v4 02/13] memory: Change memory_region_set_ram_discard_manager()
 to return the result
Date: Mon, 7 Apr 2025 15:49:22 +0800
Message-ID: <20250407074939.18657-3-chenyi.qiang@intel.com>
In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com>
References: <20250407074939.18657-1-chenyi.qiang@intel.com>

Modify memory_region_set_ram_discard_manager() to return an error
(-EBUSY) if a RamDiscardManager is already set in the MemoryRegion. The
caller must handle this failure, such as having virtio-mem undo its
actions and fail the realize() process. Opportunistically move the call
earlier to avoid complex error handling.

This change is beneficial when introducing a new RamDiscardManager
instance besides virtio-mem. After ram_block_coordinated_discard_require(true)
unlocks all RamDiscardManager instances, only one instance is allowed to
be set for a MemoryRegion at present.

Suggested-by: David Hildenbrand
Signed-off-by: Chenyi Qiang
---
Changes in v4:
- No change.

Changes in v3:
- Move set_ram_discard_manager() up to avoid a g_free().
- Clean up the set_ram_discard_manager() definition.

Changes in v2:
- Newly added.
---
 hw/virtio/virtio-mem.c | 29 ++++++++++++++++-------------
 include/exec/memory.h  |  6 +++---
 system/memory.c        | 10 +++++++---
 3 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 21f16e4912..d0d3a0240f 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -1049,6 +1049,17 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    /*
+     * Set ourselves as RamDiscardManager before the plug handler maps the
+     * memory region and exposes it via an address space.
+     */
+    if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
+                                              RAM_DISCARD_MANAGER(vmem))) {
+        error_setg(errp, "Failed to set RamDiscardManager");
+        ram_block_coordinated_discard_require(false);
+        return;
+    }
+
     /*
      * We don't know at this point whether shared RAM is migrated using
      * QEMU or migrated using the file content. "x-ignore-shared" will be
@@ -1124,13 +1135,6 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     vmem->system_reset = VIRTIO_MEM_SYSTEM_RESET(obj);
     vmem->system_reset->vmem = vmem;
     qemu_register_resettable(obj);
-
-    /*
-     * Set ourselves as RamDiscardManager before the plug handler maps the
-     * memory region and exposes it via an address space.
-     */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr,
-                                          RAM_DISCARD_MANAGER(vmem));
 }
 
 static void virtio_mem_device_unrealize(DeviceState *dev)
@@ -1138,12 +1142,6 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOMEM *vmem = VIRTIO_MEM(dev);
 
-    /*
-     * The unplug handler unmapped the memory region, it cannot be
-     * found via an address space anymore. Unset ourselves.
-     */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
-
     qemu_unregister_resettable(OBJECT(vmem->system_reset));
     object_unref(OBJECT(vmem->system_reset));
 
@@ -1156,6 +1154,11 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     virtio_del_queue(vdev, 0);
     virtio_cleanup(vdev);
     g_free(vmem->bitmap);
+    /*
+     * The unplug handler unmapped the memory region, it cannot be
+     * found via an address space anymore. Unset ourselves.
+     */
+    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
     ram_block_coordinated_discard_require(false);
 }
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3bebc43d59..390477b588 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2487,13 +2487,13 @@ static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
  *
  * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
  * that does not cover RAM, or a #MemoryRegion that already has a
- * #RamDiscardManager assigned.
+ * #RamDiscardManager assigned. Returns 0 if the rdm is set successfully.
  *
  * @mr: the #MemoryRegion
  * @rdm: #RamDiscardManager to set
  */
-void memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                           RamDiscardManager *rdm);
+int memory_region_set_ram_discard_manager(MemoryRegion *mr,
+                                          RamDiscardManager *rdm);
 
 /**
  * memory_region_find: translate an address/size relative to a
diff --git a/system/memory.c b/system/memory.c
index b17b5538ff..62d6b410f0 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2115,12 +2115,16 @@ RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
     return mr->rdm;
 }
 
-void memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                           RamDiscardManager *rdm)
+int memory_region_set_ram_discard_manager(MemoryRegion *mr,
+                                          RamDiscardManager *rdm)
 {
     g_assert(memory_region_is_ram(mr));
-    g_assert(!rdm || !mr->rdm);
+    if (mr->rdm && rdm) {
+        return -EBUSY;
+    }
+
     mr->rdm = rdm;
+    return 0;
 }
 
 uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
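Any future RamDiscardManager provider is expected to follow the same calling
convention as the virtio-mem hunk above: take the coordinated-discard
reference first, then check the return value of the setter and unwind on
failure. A condensed sketch (my_manager_realize and its parameters are
hypothetical):

    /* Hypothetical realize path of another RamDiscardManager provider,
     * following the same order and unwind rules as virtio-mem above. */
    static void my_manager_realize(DeviceState *dev, MemoryRegion *mr,
                                   Error **errp)
    {
        if (ram_block_coordinated_discard_require(true)) {
            error_setg(errp, "Discarding RAM is disabled");
            return;
        }

        /* Now fails with -EBUSY if a RamDiscardManager is already set. */
        if (memory_region_set_ram_discard_manager(mr,
                                                  RAM_DISCARD_MANAGER(dev))) {
            error_setg(errp, "Failed to set RamDiscardManager");
            ram_block_coordinated_discard_require(false);   /* undo */
            return;
        }
        /* ... map the region / remaining realize steps ... */
    }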
From patchwork Mon Apr 7 07:49:23 2025
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 14039894
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
    Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
    Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v4 03/13] memory: Unify the definition of ReplayRamPopulate()
 and ReplayRamDiscard()
Date: Mon, 7 Apr 2025 15:49:23 +0800
Message-ID: <20250407074939.18657-4-chenyi.qiang@intel.com>
In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com>
References: <20250407074939.18657-1-chenyi.qiang@intel.com>

Update the ReplayRamDiscard() function to return the result, and unify
ReplayRamPopulate() and ReplayRamDiscard() into a single
ReplayStateChange() type at the same time, since their definitions have
become identical. This unification simplifies related structures such
as VirtIOMEMReplayData, making the code cleaner and more maintainable.

Signed-off-by: Chenyi Qiang
---
Changes in v4:
- Modify the commit message. We won't use the Replay() operations when
  doing the attribute change as in v3.

Changes in v3:
- Newly added.
---
 hw/virtio/virtio-mem.c | 20 ++++++++++----------
 include/exec/memory.h  | 31 ++++++++++++++++---------------
 migration/ram.c        |  5 +++--
 system/memory.c        | 12 ++++++------
 4 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index d0d3a0240f..1a88d649cb 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -1733,7 +1733,7 @@ static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
 }
 
 struct VirtIOMEMReplayData {
-    void *fn;
+    ReplayStateChange fn;
     void *opaque;
 };
 
@@ -1741,12 +1741,12 @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    return ((ReplayRamPopulate)data->fn)(s, data->opaque);
+    return data->fn(s, data->opaque);
 }
 
 static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
                                            MemoryRegionSection *s,
-                                           ReplayRamPopulate replay_fn,
+                                           ReplayStateChange replay_fn,
                                            void *opaque)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
@@ -1765,14 +1765,14 @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    ((ReplayRamDiscard)data->fn)(s, data->opaque);
+    data->fn(s, data->opaque);
     return 0;
 }
 
-static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
-                                            MemoryRegionSection *s,
-                                            ReplayRamDiscard replay_fn,
-                                            void *opaque)
+static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
+                                           MemoryRegionSection *s,
+                                           ReplayStateChange replay_fn,
+                                           void *opaque)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
     struct VirtIOMEMReplayData data = {
@@ -1781,8 +1781,8 @@ static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
     };
 
     g_assert(s->mr == &vmem->memdev->mr);
-    virtio_mem_for_each_unplugged_section(vmem, s, &data,
-                                          virtio_mem_rdm_replay_discarded_cb);
+    return virtio_mem_for_each_unplugged_section(vmem, s, &data,
+                                                 virtio_mem_rdm_replay_discarded_cb);
 }
 
 static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 390477b588..3b1d25a403 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -566,8 +566,7 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
     rdl->double_discard_supported = double_discard_supported;
 }
 
-typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque);
-typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void *opaque);
+typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque);
 
 /*
  * RamDiscardManagerClass:
@@ -641,36 +640,38 @@ struct RamDiscardManagerClass {
     /**
      * @replay_populated:
      *
-     * Call the #ReplayRamPopulate callback for all populated parts within the
+     * Call the #ReplayStateChange callback for all populated parts within the
      * #MemoryRegionSection via the #RamDiscardManager.
      *
      * In case any call fails, no further calls are made.
     *
      * @rdm: the #RamDiscardManager
      * @section: the #MemoryRegionSection
-     * @replay_fn: the #ReplayRamPopulate callback
+     * @replay_fn: the #ReplayStateChange callback
      * @opaque: pointer to forward to the callback
      *
      * Returns 0 on success, or a negative error if any notification failed.
      */
     int (*replay_populated)(const RamDiscardManager *rdm,
                             MemoryRegionSection *section,
-                            ReplayRamPopulate replay_fn, void *opaque);
+                            ReplayStateChange replay_fn, void *opaque);
 
     /**
      * @replay_discarded:
      *
-     * Call the #ReplayRamDiscard callback for all discarded parts within the
+     * Call the #ReplayStateChange callback for all discarded parts within the
      * #MemoryRegionSection via the #RamDiscardManager.
      *
      * @rdm: the #RamDiscardManager
      * @section: the #MemoryRegionSection
-     * @replay_fn: the #ReplayRamDiscard callback
+     * @replay_fn: the #ReplayStateChange callback
      * @opaque: pointer to forward to the callback
+     *
+     * Returns 0 on success, or a negative error if any notification failed.
      */
-    void (*replay_discarded)(const RamDiscardManager *rdm,
-                             MemoryRegionSection *section,
-                             ReplayRamDiscard replay_fn, void *opaque);
+    int (*replay_discarded)(const RamDiscardManager *rdm,
+                            MemoryRegionSection *section,
+                            ReplayStateChange replay_fn, void *opaque);
 
     /**
      * @register_listener:
@@ -713,13 +714,13 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
 
 int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
                                          MemoryRegionSection *section,
-                                         ReplayRamPopulate replay_fn,
+                                         ReplayStateChange replay_fn,
                                          void *opaque);
 
-void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
-                                          MemoryRegionSection *section,
-                                          ReplayRamDiscard replay_fn,
-                                          void *opaque);
+int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
+                                         MemoryRegionSection *section,
+                                         ReplayStateChange replay_fn,
+                                         void *opaque);
 
 void ram_discard_manager_register_listener(RamDiscardManager *rdm,
                                            RamDiscardListener *rdl,
diff --git a/migration/ram.c b/migration/ram.c
index ce28328141..053730367b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -816,8 +816,8 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
     return ret;
 }
 
-static void dirty_bitmap_clear_section(MemoryRegionSection *section,
-                                       void *opaque)
+static int dirty_bitmap_clear_section(MemoryRegionSection *section,
+                                      void *opaque)
 {
     const hwaddr offset = section->offset_within_region;
     const hwaddr size = int128_get64(section->size);
@@ -836,6 +836,7 @@ static void dirty_bitmap_clear_section(MemoryRegionSection *section,
     }
     *cleared_bits += bitmap_count_one_with_offset(rb->bmap, start, npages);
     bitmap_clear(rb->bmap, start, npages);
+    return 0;
 }
 
 /*
diff --git a/system/memory.c b/system/memory.c
index 62d6b410f0..b5ab729e13 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2147,7 +2147,7 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
 
 int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
                                          MemoryRegionSection *section,
-                                         ReplayRamPopulate replay_fn,
+                                         ReplayStateChange replay_fn,
                                          void *opaque)
 {
     RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
@@ -2156,15 +2156,15 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
     return rdmc->replay_populated(rdm, section, replay_fn, opaque);
 }
 
-void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
-                                          MemoryRegionSection *section,
-                                          ReplayRamDiscard replay_fn,
-                                          void *opaque)
+int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
+                                         MemoryRegionSection *section,
+                                         ReplayStateChange replay_fn,
+                                         void *opaque)
 {
     RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
 
     g_assert(rdmc->replay_discarded);
-    rdmc->replay_discarded(rdm, section, replay_fn, opaque);
+    return rdmc->replay_discarded(rdm, section, replay_fn, opaque);
 }
 
 void ram_discard_manager_register_listener(RamDiscardManager *rdm,
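Since both replay helpers now share the ReplayStateChange signature, a single
callback can be passed to either one. A small illustrative sketch
(count_bytes_cb and count_populated_bytes are not part of the series):

    /* Illustrative ReplayStateChange callback: sum up the bytes covered
     * by every section the manager replays into *opaque. */
    static int count_bytes_cb(MemoryRegionSection *section, void *opaque)
    {
        uint64_t *bytes = opaque;

        *bytes += int128_get64(section->size);
        return 0;   /* a non-zero return would stop the replay */
    }

    /* The same callback works for populated and for discarded parts. */
    static uint64_t count_populated_bytes(const RamDiscardManager *rdm,
                                          MemoryRegionSection *section)
    {
        uint64_t bytes = 0;

        /* Return value ignored here: count_bytes_cb() never fails. */
        ram_discard_manager_replay_populated(rdm, section, count_bytes_cb,
                                             &bytes);
        return bytes;
    }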
ram_discard_manager_register_listener(RamDiscardManager *rdm, From patchwork Mon Apr 7 07:49:24 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chenyi Qiang X-Patchwork-Id: 14039895 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 592B11A315F for ; Mon, 7 Apr 2025 07:50:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744012206; cv=none; b=YneflveHYkGnOhagb1e0dU04zHb2EAHGmV4MqEH2K8fI/VQJY66bDgQTjWJvIz7RDWoxVpUWU5RGnbjtwawQN1dxgcxDKfYhtigKNuCMYjjSILyC6p9VwLeJaPBtSdpnAZpt4pMlIK3Rp4ZN4Pj6vj9rLVr+MCcXIOQc1OQTTMM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744012206; c=relaxed/simple; bh=myUEliAmem5ZX9Ho5nThrumB00xjkd8o41CqMTLUqlg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=H8cV7lyFbucI1A7A01TSCgIGrdAUoLnRlXiJRHidHXAr4ZU9VTaEY9ZCREsfnsgrWhPGge4DNPgFo26SjsNxC1Kd/jKbLwY8EaqTnxgFOcZnWrqQHP47YPh02BpMHmDxdIDPq04iaZntf9rsw9AEYpSlDyWKNLDSr1PrMw2wvmA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=XknWAoLf; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="XknWAoLf" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1744012204; x=1775548204; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=myUEliAmem5ZX9Ho5nThrumB00xjkd8o41CqMTLUqlg=; b=XknWAoLf377qXdtVzBfhPkW0vNEiZUN+cehJ8gqJ7I1Kd9VFixFvVJ8H Mqnv5XoteOhSYZeDtGmiQKfCKcqbjyxQhSU02kqneAWTd3RAlRrjPVhOO sprLEfuGuQrdQiCpJWPhVWtGVrfNC8Z51W3VSy//YY85C352ngdHxsXq9 Nsz+vX+2n61eTfQpCIo+/kbt2wzwLhJ5zOZ4Fwb1xkEoobbpInEkBcYZs OtWQ7EvC1xtppMLp83Rxz/W6pbS+J4FRkOOrqxZ5mcVdlsXGbOH0bKzYm pqp06mn+zbkO/2UrsoJrjK5Ew3XP709QbhpPKAhnAcEstHq/kEbdYHTFg Q==; X-CSE-ConnectionGUID: rvLePpiGQfei4uaqCu4i5g== X-CSE-MsgGUID: LC7iCajwQ1misaAxHRxolg== X-IronPort-AV: E=McAfee;i="6700,10204,11396"; a="67857533" X-IronPort-AV: E=Sophos;i="6.15,193,1739865600"; d="scan'208";a="67857533" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2025 00:50:03 -0700 X-CSE-ConnectionGUID: 0kVwWi0pQVCZT4RDvEUfJw== X-CSE-MsgGUID: v7jnTIrwRkSktBAZCJ4zEg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.15,193,1739865600"; d="scan'208";a="128405523" Received: from emr-bkc.sh.intel.com ([10.112.230.82]) by orviesa007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2025 00:49:59 -0700 From: Chenyi Qiang To: David Hildenbrand , Alexey Kardashevskiy , Peter Xu , Gupta Pankaj , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Michael Roth Cc: Chenyi Qiang , qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J , Peng Chao P , Gao Chao , Xu Yilun , Li Xiaoyao Subject: [PATCH v4 04/13] memory: 
Introduce generic state change parent class for RamDiscardManager Date: Mon, 7 Apr 2025 15:49:24 +0800 Message-ID: <20250407074939.18657-5-chenyi.qiang@intel.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com> References: <20250407074939.18657-1-chenyi.qiang@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 RamDiscardManager is an interface used by virtio-mem to adjust VFIO mappings in relation to VM page assignment. It manages the state of populated and discard for the RAM. To accommodate future scnarios for managing RAM states, such as private and shared states in confidential VMs, the existing RamDiscardManager interface needs to be generalized. Introduce a parent class, GenericStateManager, to manage a pair of opposite states with RamDiscardManager as its child. The changes include - Define a new abstract class GenericStateChange. - Extract six callbacks into GenericStateChangeClass and allow the child classes to inherit them. - Modify RamDiscardManager-related helpers to use GenericStateManager ones. - Define a generic StatChangeListener to extract fields from RamDiscardManager listener which allows future listeners to embed it and avoid duplication. - Change the users of RamDiscardManager (virtio-mem, migration, etc.) to switch to use GenericStateChange helpers. It can provide a more flexible and resuable framework for RAM state management, facilitating future enhancements and use cases. Signed-off-by: Chenyi Qiang --- Changes in v4: - Newly added. --- hw/vfio/common.c | 30 ++-- hw/virtio/virtio-mem.c | 95 ++++++------ include/exec/memory.h | 313 ++++++++++++++++++++++------------------ migration/ram.c | 16 +- system/memory.c | 106 ++++++++------ system/memory_mapping.c | 6 +- 6 files changed, 310 insertions(+), 256 deletions(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index f7499a9b74..3172d877cc 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -335,9 +335,10 @@ out: rcu_read_unlock(); } -static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl, +static void vfio_ram_discard_notify_discard(StateChangeListener *scl, MemoryRegionSection *section) { + RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, listener); VFIOContainerBase *bcontainer = vrdl->bcontainer; @@ -353,9 +354,10 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl, } } -static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl, +static int vfio_ram_discard_notify_populate(StateChangeListener *scl, MemoryRegionSection *section) { + RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, listener); VFIOContainerBase *bcontainer = vrdl->bcontainer; @@ -381,7 +383,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl, vaddr, section->readonly); if (ret) { /* Rollback */ - vfio_ram_discard_notify_discard(rdl, section); + vfio_ram_discard_notify_discard(scl, section); return ret; } } @@ -391,8 +393,9 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl, static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer, MemoryRegionSection *section) { - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr); + GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr); 
VFIORamDiscardListener *vrdl; + RamDiscardListener *rdl; /* Ignore some corner cases not relevant in practice. */ g_assert(QEMU_IS_ALIGNED(section->offset_within_region, TARGET_PAGE_SIZE)); @@ -405,17 +408,18 @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer, vrdl->mr = section->mr; vrdl->offset_within_address_space = section->offset_within_address_space; vrdl->size = int128_get64(section->size); - vrdl->granularity = ram_discard_manager_get_min_granularity(rdm, - section->mr); + vrdl->granularity = generic_state_manager_get_min_granularity(gsm, + section->mr); g_assert(vrdl->granularity && is_power_of_2(vrdl->granularity)); g_assert(bcontainer->pgsizes && vrdl->granularity >= 1ULL << ctz64(bcontainer->pgsizes)); - ram_discard_listener_init(&vrdl->listener, + rdl = &vrdl->listener; + ram_discard_listener_init(rdl, vfio_ram_discard_notify_populate, vfio_ram_discard_notify_discard, true); - ram_discard_manager_register_listener(rdm, &vrdl->listener, section); + generic_state_manager_register_listener(gsm, &rdl->scl, section); QLIST_INSERT_HEAD(&bcontainer->vrdl_list, vrdl, next); /* @@ -465,8 +469,9 @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer, static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer, MemoryRegionSection *section) { - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr); + GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr); VFIORamDiscardListener *vrdl = NULL; + RamDiscardListener *rdl; QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) { if (vrdl->mr == section->mr && @@ -480,7 +485,8 @@ static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer, hw_error("vfio: Trying to unregister missing RAM discard listener"); } - ram_discard_manager_unregister_listener(rdm, &vrdl->listener); + rdl = &vrdl->listener; + generic_state_manager_unregister_listener(gsm, &rdl->scl); QLIST_REMOVE(vrdl, next); g_free(vrdl); } @@ -1265,7 +1271,7 @@ static int vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer, MemoryRegionSection *section) { - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr); + GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr); VFIORamDiscardListener *vrdl = NULL; QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) { @@ -1284,7 +1290,7 @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer, * We only want/can synchronize the bitmap for actually mapped parts - * which correspond to populated parts. Replay all populated parts. 
*/ - return ram_discard_manager_replay_populated(rdm, section, + return generic_state_manager_replay_on_state_set(gsm, section, vfio_ram_discard_get_dirty_bitmap, &vrdl); } diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index 1a88d649cb..40e8267254 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -312,16 +312,16 @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem, static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg) { - RamDiscardListener *rdl = arg; + StateChangeListener *scl = arg; - return rdl->notify_populate(rdl, s); + return scl->notify_to_state_set(scl, s); } static int virtio_mem_notify_discard_cb(MemoryRegionSection *s, void *arg) { - RamDiscardListener *rdl = arg; + StateChangeListener *scl = arg; - rdl->notify_discard(rdl, s); + scl->notify_to_state_clear(scl, s); return 0; } @@ -331,12 +331,13 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset, RamDiscardListener *rdl; QLIST_FOREACH(rdl, &vmem->rdl_list, next) { - MemoryRegionSection tmp = *rdl->section; + StateChangeListener *scl = &rdl->scl; + MemoryRegionSection tmp = *scl->section; if (!memory_region_section_intersect_range(&tmp, offset, size)) { continue; } - rdl->notify_discard(rdl, &tmp); + scl->notify_to_state_clear(scl, &tmp); } } @@ -347,12 +348,13 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset, int ret = 0; QLIST_FOREACH(rdl, &vmem->rdl_list, next) { - MemoryRegionSection tmp = *rdl->section; + StateChangeListener *scl = &rdl->scl; + MemoryRegionSection tmp = *scl->section; if (!memory_region_section_intersect_range(&tmp, offset, size)) { continue; } - ret = rdl->notify_populate(rdl, &tmp); + ret = scl->notify_to_state_set(scl, &tmp); if (ret) { break; } @@ -361,7 +363,8 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset, if (ret) { /* Notify all already-notified listeners. */ QLIST_FOREACH(rdl2, &vmem->rdl_list, next) { - MemoryRegionSection tmp = *rdl2->section; + StateChangeListener *scl2 = &rdl2->scl; + MemoryRegionSection tmp = *scl2->section; if (rdl2 == rdl) { break; @@ -369,7 +372,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset, if (!memory_region_section_intersect_range(&tmp, offset, size)) { continue; } - rdl2->notify_discard(rdl2, &tmp); + scl2->notify_to_state_clear(scl2, &tmp); } } return ret; @@ -384,10 +387,11 @@ static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem) } QLIST_FOREACH(rdl, &vmem->rdl_list, next) { + StateChangeListener *scl = &rdl->scl; if (rdl->double_discard_supported) { - rdl->notify_discard(rdl, rdl->section); + scl->notify_to_state_clear(scl, scl->section); } else { - virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, + virtio_mem_for_each_plugged_section(vmem, scl->section, scl, virtio_mem_notify_discard_cb); } } @@ -1053,8 +1057,8 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp) * Set ourselves as RamDiscardManager before the plug handler maps the * memory region and exposes it via an address space. */ - if (memory_region_set_ram_discard_manager(&vmem->memdev->mr, - RAM_DISCARD_MANAGER(vmem))) { + if (memory_region_set_generic_state_manager(&vmem->memdev->mr, + GENERIC_STATE_MANAGER(vmem))) { error_setg(errp, "Failed to set RamDiscardManager"); ram_block_coordinated_discard_require(false); return; @@ -1158,7 +1162,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev) * The unplug handler unmapped the memory region, it cannot be * found via an address space anymore. 
Unset ourselves. */ - memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL); + memory_region_set_generic_state_manager(&vmem->memdev->mr, NULL); ram_block_coordinated_discard_require(false); } @@ -1207,7 +1211,8 @@ static int virtio_mem_post_load_bitmap(VirtIOMEM *vmem) * into an address space. Replay, now that we updated the bitmap. */ QLIST_FOREACH(rdl, &vmem->rdl_list, next) { - ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, + StateChangeListener *scl = &rdl->scl; + ret = virtio_mem_for_each_plugged_section(vmem, scl->section, scl, virtio_mem_notify_populate_cb); if (ret) { return ret; @@ -1704,19 +1709,19 @@ static const Property virtio_mem_properties[] = { dynamic_memslots, false), }; -static uint64_t virtio_mem_rdm_get_min_granularity(const RamDiscardManager *rdm, +static uint64_t virtio_mem_rdm_get_min_granularity(const GenericStateManager *gsm, const MemoryRegion *mr) { - const VirtIOMEM *vmem = VIRTIO_MEM(rdm); + const VirtIOMEM *vmem = VIRTIO_MEM(gsm); g_assert(mr == &vmem->memdev->mr); return vmem->block_size; } -static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm, +static bool virtio_mem_rdm_is_populated(const GenericStateManager *gsm, const MemoryRegionSection *s) { - const VirtIOMEM *vmem = VIRTIO_MEM(rdm); + const VirtIOMEM *vmem = VIRTIO_MEM(gsm); uint64_t start_gpa = vmem->addr + s->offset_within_region; uint64_t end_gpa = start_gpa + int128_get64(s->size); @@ -1744,12 +1749,12 @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg) return data->fn(s, data->opaque); } -static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm, +static int virtio_mem_rdm_replay_populated(const GenericStateManager *gsm, MemoryRegionSection *s, ReplayStateChange replay_fn, void *opaque) { - const VirtIOMEM *vmem = VIRTIO_MEM(rdm); + const VirtIOMEM *vmem = VIRTIO_MEM(gsm); struct VirtIOMEMReplayData data = { .fn = replay_fn, .opaque = opaque, @@ -1769,12 +1774,12 @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s, return 0; } -static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm, +static int virtio_mem_rdm_replay_discarded(const GenericStateManager *gsm, MemoryRegionSection *s, ReplayStateChange replay_fn, void *opaque) { - const VirtIOMEM *vmem = VIRTIO_MEM(rdm); + const VirtIOMEM *vmem = VIRTIO_MEM(gsm); struct VirtIOMEMReplayData data = { .fn = replay_fn, .opaque = opaque, @@ -1785,18 +1790,19 @@ static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm, virtio_mem_rdm_replay_discarded_cb); } -static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm, - RamDiscardListener *rdl, +static void virtio_mem_rdm_register_listener(GenericStateManager *gsm, + StateChangeListener *scl, MemoryRegionSection *s) { - VirtIOMEM *vmem = VIRTIO_MEM(rdm); + VirtIOMEM *vmem = VIRTIO_MEM(gsm); + RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); int ret; g_assert(s->mr == &vmem->memdev->mr); - rdl->section = memory_region_section_new_copy(s); + scl->section = memory_region_section_new_copy(s); QLIST_INSERT_HEAD(&vmem->rdl_list, rdl, next); - ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, + ret = virtio_mem_for_each_plugged_section(vmem, scl->section, scl, virtio_mem_notify_populate_cb); if (ret) { error_report("%s: Replaying plugged ranges failed: %s", __func__, @@ -1804,23 +1810,24 @@ static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm, } } -static void virtio_mem_rdm_unregister_listener(RamDiscardManager 
*rdm, - RamDiscardListener *rdl) +static void virtio_mem_rdm_unregister_listener(GenericStateManager *gsm, + StateChangeListener *scl) { - VirtIOMEM *vmem = VIRTIO_MEM(rdm); + VirtIOMEM *vmem = VIRTIO_MEM(gsm); + RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); - g_assert(rdl->section->mr == &vmem->memdev->mr); + g_assert(scl->section->mr == &vmem->memdev->mr); if (vmem->size) { if (rdl->double_discard_supported) { - rdl->notify_discard(rdl, rdl->section); + scl->notify_to_state_clear(scl, scl->section); } else { - virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, + virtio_mem_for_each_plugged_section(vmem, scl->section, scl, virtio_mem_notify_discard_cb); } } - memory_region_section_free_copy(rdl->section); - rdl->section = NULL; + memory_region_section_free_copy(scl->section); + scl->section = NULL; QLIST_REMOVE(rdl, next); } @@ -1853,7 +1860,7 @@ static void virtio_mem_class_init(ObjectClass *klass, void *data) DeviceClass *dc = DEVICE_CLASS(klass); VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass); VirtIOMEMClass *vmc = VIRTIO_MEM_CLASS(klass); - RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(klass); + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(klass); device_class_set_props(dc, virtio_mem_properties); dc->vmsd = &vmstate_virtio_mem; @@ -1874,12 +1881,12 @@ static void virtio_mem_class_init(ObjectClass *klass, void *data) vmc->remove_size_change_notifier = virtio_mem_remove_size_change_notifier; vmc->unplug_request_check = virtio_mem_unplug_request_check; - rdmc->get_min_granularity = virtio_mem_rdm_get_min_granularity; - rdmc->is_populated = virtio_mem_rdm_is_populated; - rdmc->replay_populated = virtio_mem_rdm_replay_populated; - rdmc->replay_discarded = virtio_mem_rdm_replay_discarded; - rdmc->register_listener = virtio_mem_rdm_register_listener; - rdmc->unregister_listener = virtio_mem_rdm_unregister_listener; + gsmc->get_min_granularity = virtio_mem_rdm_get_min_granularity; + gsmc->is_state_set = virtio_mem_rdm_is_populated; + gsmc->replay_on_state_set = virtio_mem_rdm_replay_populated; + gsmc->replay_on_state_clear = virtio_mem_rdm_replay_discarded; + gsmc->register_listener = virtio_mem_rdm_register_listener; + gsmc->unregister_listener = virtio_mem_rdm_unregister_listener; } static const TypeInfo virtio_mem_info = { diff --git a/include/exec/memory.h b/include/exec/memory.h index 3b1d25a403..30e5838d02 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -43,6 +43,12 @@ typedef struct IOMMUMemoryRegionClass IOMMUMemoryRegionClass; DECLARE_OBJ_CHECKERS(IOMMUMemoryRegion, IOMMUMemoryRegionClass, IOMMU_MEMORY_REGION, TYPE_IOMMU_MEMORY_REGION) +#define TYPE_GENERIC_STATE_MANAGER "generic-state-manager" +typedef struct GenericStateManagerClass GenericStateManagerClass; +typedef struct GenericStateManager GenericStateManager; +DECLARE_OBJ_CHECKERS(GenericStateManager, GenericStateManagerClass, + GENERIC_STATE_MANAGER, TYPE_GENERIC_STATE_MANAGER) + #define TYPE_RAM_DISCARD_MANAGER "ram-discard-manager" typedef struct RamDiscardManagerClass RamDiscardManagerClass; typedef struct RamDiscardManager RamDiscardManager; @@ -506,103 +512,59 @@ struct IOMMUMemoryRegionClass { int (*num_indexes)(IOMMUMemoryRegion *iommu); }; -typedef struct RamDiscardListener RamDiscardListener; -typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl, - MemoryRegionSection *section); -typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl, +typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque); + +typedef 
struct StateChangeListener StateChangeListener; +typedef int (*NotifyStateSet)(StateChangeListener *scl, + MemoryRegionSection *section); +typedef void (*NotifyStateClear)(StateChangeListener *scl, MemoryRegionSection *section); -struct RamDiscardListener { +struct StateChangeListener { /* - * @notify_populate: + * @notify_to_state_set: * - * Notification that previously discarded memory is about to get populated. - * Listeners are able to object. If any listener objects, already - * successfully notified listeners are notified about a discard again. + * Notification that previously state clear part is about to be set. * - * @rdl: the #RamDiscardListener getting notified - * @section: the #MemoryRegionSection to get populated. The section + * @scl: the #StateChangeListener getting notified + * @section: the #MemoryRegionSection to be state-set. The section * is aligned within the memory region to the minimum granularity * unless it would exceed the registered section. * * Returns 0 on success. If the notification is rejected by the listener, * an error is returned. */ - NotifyRamPopulate notify_populate; + NotifyStateSet notify_to_state_set; /* - * @notify_discard: + * @notify_to_state_clear: * - * Notification that previously populated memory was discarded successfully - * and listeners should drop all references to such memory and prevent - * new population (e.g., unmap). + * Notification that previously state set part is about to be cleared * - * @rdl: the #RamDiscardListener getting notified - * @section: the #MemoryRegionSection to get populated. The section + * @scl: the #StateChangeListener getting notified + * @section: the #MemoryRegionSection to be state-cleared. The section * is aligned within the memory region to the minimum granularity * unless it would exceed the registered section. - */ - NotifyRamDiscard notify_discard; - - /* - * @double_discard_supported: * - * The listener suppors getting @notify_discard notifications that span - * already discarded parts. + * Returns 0 on success. If the notification is rejected by the listener, + * an error is returned. */ - bool double_discard_supported; + NotifyStateClear notify_to_state_clear; MemoryRegionSection *section; - QLIST_ENTRY(RamDiscardListener) next; }; -static inline void ram_discard_listener_init(RamDiscardListener *rdl, - NotifyRamPopulate populate_fn, - NotifyRamDiscard discard_fn, - bool double_discard_supported) -{ - rdl->notify_populate = populate_fn; - rdl->notify_discard = discard_fn; - rdl->double_discard_supported = double_discard_supported; -} - -typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque); - /* - * RamDiscardManagerClass: - * - * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion - * regions are currently populated to be used/accessed by the VM, notifying - * after parts were discarded (freeing up memory) and before parts will be - * populated (consuming memory), to be used/accessed by the VM. - * - * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the - * #MemoryRegion isn't mapped into an address space yet (either directly - * or via an alias); it cannot change while the #MemoryRegion is - * mapped into an address space. 
+ * GenericStateManagerClass: * - * The #RamDiscardManager is intended to be used by technologies that are - * incompatible with discarding of RAM (e.g., VFIO, which may pin all - * memory inside a #MemoryRegion), and require proper coordination to only - * map the currently populated parts, to hinder parts that are expected to - * remain discarded from silently getting populated and consuming memory. - * Technologies that support discarding of RAM don't have to bother and can - * simply map the whole #MemoryRegion. - * - * An example #RamDiscardManager is virtio-mem, which logically (un)plugs - * memory within an assigned RAM #MemoryRegion, coordinated with the VM. - * Logically unplugging memory consists of discarding RAM. The VM agreed to not - * access unplugged (discarded) memory - especially via DMA. virtio-mem will - * properly coordinate with listeners before memory is plugged (populated), - * and after memory is unplugged (discarded). + * A #GenericStateManager is a common interface used to manage the state of + * a #MemoryRegion. The managed states is a pair of opposite states, such as + * populated and discarded, or private and shared. It is abstract as set and + * clear in below callbacks, and the actual state is managed by the + * implementation. * - * Listeners are called in multiples of the minimum granularity (unless it - * would exceed the registered range) and changes are aligned to the minimum - * granularity within the #MemoryRegion. Listeners have to prepare for memory - * becoming discarded in a different granularity than it was populated and the - * other way around. */ -struct RamDiscardManagerClass { +struct GenericStateManagerClass { /* private */ InterfaceClass parent_class; @@ -612,122 +574,188 @@ struct RamDiscardManagerClass { * @get_min_granularity: * * Get the minimum granularity in which listeners will get notified - * about changes within the #MemoryRegion via the #RamDiscardManager. + * about changes within the #MemoryRegion via the #GenericStateManager. * - * @rdm: the #RamDiscardManager + * @gsm: the #GenericStateManager * @mr: the #MemoryRegion * * Returns the minimum granularity. */ - uint64_t (*get_min_granularity)(const RamDiscardManager *rdm, + uint64_t (*get_min_granularity)(const GenericStateManager *gsm, const MemoryRegion *mr); /** - * @is_populated: + * @is_state_set: * - * Check whether the given #MemoryRegionSection is completely populated - * (i.e., no parts are currently discarded) via the #RamDiscardManager. - * There are no alignment requirements. + * Check whether the given #MemoryRegionSection state is set. + * via the #GenericStateManager. * - * @rdm: the #RamDiscardManager + * @gsm: the #GenericStateManager * @section: the #MemoryRegionSection * - * Returns whether the given range is completely populated. + * Returns whether the given range is completely set. */ - bool (*is_populated)(const RamDiscardManager *rdm, + bool (*is_state_set)(const GenericStateManager *gsm, const MemoryRegionSection *section); /** - * @replay_populated: + * @replay_on_state_set: * - * Call the #ReplayStateChange callback for all populated parts within the - * #MemoryRegionSection via the #RamDiscardManager. + * Call the #ReplayStateChange callback for all state set parts within the + * #MemoryRegionSection via the #GenericStateManager. * * In case any call fails, no further calls are made. 
* - * @rdm: the #RamDiscardManager + * @gsm: the #GenericStateManager * @section: the #MemoryRegionSection * @replay_fn: the #ReplayStateChange callback * @opaque: pointer to forward to the callback * * Returns 0 on success, or a negative error if any notification failed. */ - int (*replay_populated)(const RamDiscardManager *rdm, - MemoryRegionSection *section, - ReplayStateChange replay_fn, void *opaque); + int (*replay_on_state_set)(const GenericStateManager *gsm, + MemoryRegionSection *section, + ReplayStateChange replay_fn, void *opaque); /** - * @replay_discarded: + * @replay_on_state_clear: * - * Call the #ReplayStateChange callback for all discarded parts within the - * #MemoryRegionSection via the #RamDiscardManager. + * Call the #ReplayStateChange callback for all state clear parts within the + * #MemoryRegionSection via the #GenericStateManager. + * + * In case any call fails, no further calls are made. * - * @rdm: the #RamDiscardManager + * @gsm: the #GenericStateManager * @section: the #MemoryRegionSection * @replay_fn: the #ReplayStateChange callback * @opaque: pointer to forward to the callback * * Returns 0 on success, or a negative error if any notification failed. */ - int (*replay_discarded)(const RamDiscardManager *rdm, - MemoryRegionSection *section, - ReplayStateChange replay_fn, void *opaque); + int (*replay_on_state_clear)(const GenericStateManager *gsm, + MemoryRegionSection *section, + ReplayStateChange replay_fn, void *opaque); /** * @register_listener: * - * Register a #RamDiscardListener for the given #MemoryRegionSection and - * immediately notify the #RamDiscardListener about all populated parts - * within the #MemoryRegionSection via the #RamDiscardManager. + * Register a #StateChangeListener for the given #MemoryRegionSection and + * immediately notify the #StateChangeListener about all state-set parts + * within the #MemoryRegionSection via the #GenericStateManager. * * In case any notification fails, no further notifications are triggered * and an error is logged. * - * @rdm: the #RamDiscardManager - * @rdl: the #RamDiscardListener + * @rdm: the #GenericStateManager + * @rdl: the #StateChangeListener * @section: the #MemoryRegionSection */ - void (*register_listener)(RamDiscardManager *rdm, - RamDiscardListener *rdl, + void (*register_listener)(GenericStateManager *gsm, + StateChangeListener *scl, MemoryRegionSection *section); /** * @unregister_listener: * - * Unregister a previously registered #RamDiscardListener via the - * #RamDiscardManager after notifying the #RamDiscardListener about all - * populated parts becoming unpopulated within the registered + * Unregister a previously registered #StateChangeListener via the + * #GenericStateManager after notifying the #StateChangeListener about all + * state-set parts becoming state-cleared within the registered * #MemoryRegionSection. 
     *
-     * @rdm: the #RamDiscardManager
-     * @rdl: the #RamDiscardListener
+     * @gsm: the #GenericStateManager
+     * @scl: the #StateChangeListener
     */
-    void (*unregister_listener)(RamDiscardManager *rdm,
-                                RamDiscardListener *rdl);
+    void (*unregister_listener)(GenericStateManager *gsm,
+                                StateChangeListener *scl);
};

-uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
-                                                 const MemoryRegion *mr);
+uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm,
+                                                   const MemoryRegion *mr);

-bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
-                                      const MemoryRegionSection *section);
+bool generic_state_manager_is_state_set(const GenericStateManager *gsm,
+                                        const MemoryRegionSection *section);

-int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
-                                         MemoryRegionSection *section,
-                                         ReplayStateChange replay_fn,
-                                         void *opaque);
+int generic_state_manager_replay_on_state_set(const GenericStateManager *gsm,
+                                              MemoryRegionSection *section,
+                                              ReplayStateChange replay_fn,
+                                              void *opaque);

-int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
-                                         MemoryRegionSection *section,
-                                         ReplayStateChange replay_fn,
-                                         void *opaque);
+int generic_state_manager_replay_on_state_clear(const GenericStateManager *gsm,
+                                                MemoryRegionSection *section,
+                                                ReplayStateChange replay_fn,
+                                                void *opaque);

-void ram_discard_manager_register_listener(RamDiscardManager *rdm,
-                                           RamDiscardListener *rdl,
-                                           MemoryRegionSection *section);
+void generic_state_manager_register_listener(GenericStateManager *gsm,
+                                             StateChangeListener *scl,
+                                             MemoryRegionSection *section);

-void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
-                                             RamDiscardListener *rdl);
+void generic_state_manager_unregister_listener(GenericStateManager *gsm,
+                                               StateChangeListener *scl);
+
+typedef struct RamDiscardListener RamDiscardListener;
+
+struct RamDiscardListener {
+    struct StateChangeListener scl;
+
+    /*
+     * @double_discard_supported:
+     *
+     * The listener supports getting @notify_discard notifications that span
+     * already discarded parts.
+     */
+    bool double_discard_supported;
+
+    QLIST_ENTRY(RamDiscardListener) next;
+};
+
+static inline void ram_discard_listener_init(RamDiscardListener *rdl,
+                                             NotifyStateSet populate_fn,
+                                             NotifyStateClear discard_fn,
+                                             bool double_discard_supported)
+{
+    rdl->scl.notify_to_state_set = populate_fn;
+    rdl->scl.notify_to_state_clear = discard_fn;
+    rdl->double_discard_supported = double_discard_supported;
+}
+
+/*
+ * RamDiscardManagerClass:
+ *
+ * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion
+ * regions are currently populated to be used/accessed by the VM, notifying
+ * after parts were discarded (freeing up memory) and before parts will be
+ * populated (consuming memory), to be used/accessed by the VM.
+ *
+ * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the
+ * #MemoryRegion isn't mapped into an address space yet (either directly
+ * or via an alias); it cannot change while the #MemoryRegion is
+ * mapped into an address space.
+ *
+ * The #RamDiscardManager is intended to be used by technologies that are
+ * incompatible with discarding of RAM (e.g., VFIO, which may pin all
+ * memory inside a #MemoryRegion), and require proper coordination to only
+ * map the currently populated parts, to hinder parts that are expected to
+ * remain discarded from silently getting populated and consuming memory.
+ * Technologies that support discarding of RAM don't have to bother and can + * simply map the whole #MemoryRegion. + * + * An example #RamDiscardManager is virtio-mem, which logically (un)plugs + * memory within an assigned RAM #MemoryRegion, coordinated with the VM. + * Logically unplugging memory consists of discarding RAM. The VM agreed to not + * access unplugged (discarded) memory - especially via DMA. virtio-mem will + * properly coordinate with listeners before memory is plugged (populated), + * and after memory is unplugged (discarded). + * + * Listeners are called in multiples of the minimum granularity (unless it + * would exceed the registered range) and changes are aligned to the minimum + * granularity within the #MemoryRegion. Listeners have to prepare for memory + * becoming discarded in a different granularity than it was populated and the + * other way around. + */ +struct RamDiscardManagerClass { + /* private */ + GenericStateManagerClass parent_class; +}; /** * memory_get_xlat_addr: Extract addresses from a TLB entry @@ -795,7 +823,7 @@ struct MemoryRegion { const char *name; unsigned ioeventfd_nb; MemoryRegionIoeventfd *ioeventfds; - RamDiscardManager *rdm; /* Only for RAM */ + GenericStateManager *gsm; /* Only for RAM */ /* For devices designed to perform re-entrant IO into their own IO MRs */ bool disable_reentrancy_guard; @@ -2462,39 +2490,36 @@ bool memory_region_present(MemoryRegion *container, hwaddr addr); bool memory_region_is_mapped(MemoryRegion *mr); /** - * memory_region_get_ram_discard_manager: get the #RamDiscardManager for a + * memory_region_get_generic_state_manager: get the #GenericStateManager for a * #MemoryRegion * - * The #RamDiscardManager cannot change while a memory region is mapped. + * The #GenericStateManager cannot change while a memory region is mapped. * * @mr: the #MemoryRegion */ -RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr); +GenericStateManager *memory_region_get_generic_state_manager(MemoryRegion *mr); /** - * memory_region_has_ram_discard_manager: check whether a #MemoryRegion has a - * #RamDiscardManager assigned + * memory_region_set_generic_state_manager: set the #GenericStateManager for a + * #MemoryRegion + * + * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion + * that does not cover RAM, or a #MemoryRegion that already has a + * #GenericStateManager assigned. Return 0 if the gsm is set successfully. * * @mr: the #MemoryRegion + * @gsm: #GenericStateManager to set */ -static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr) -{ - return !!memory_region_get_ram_discard_manager(mr); -} +int memory_region_set_generic_state_manager(MemoryRegion *mr, + GenericStateManager *gsm); /** - * memory_region_set_ram_discard_manager: set the #RamDiscardManager for a - * #MemoryRegion - * - * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion - * that does not cover RAM, or a #MemoryRegion that already has a - * #RamDiscardManager assigned. Return 0 if the rdm is set successfully. 
+ * memory_region_has_ram_discard_manager: check whether a #MemoryRegion has a + * #RamDiscardManager assigned * * @mr: the #MemoryRegion - * @rdm: #RamDiscardManager to set */ -int memory_region_set_ram_discard_manager(MemoryRegion *mr, - RamDiscardManager *rdm); +bool memory_region_has_ram_discard_manager(MemoryRegion *mr); /** * memory_region_find: translate an address/size relative to a diff --git a/migration/ram.c b/migration/ram.c index 053730367b..c881523e64 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -857,14 +857,14 @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb) uint64_t cleared_bits = 0; if (rb->mr && rb->bmap && memory_region_has_ram_discard_manager(rb->mr)) { - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr); + GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr); MemoryRegionSection section = { .mr = rb->mr, .offset_within_region = 0, .size = int128_make64(qemu_ram_get_used_length(rb)), }; - ram_discard_manager_replay_discarded(rdm, §ion, + generic_state_manager_replay_on_state_clear(gsm, §ion, dirty_bitmap_clear_section, &cleared_bits); } @@ -880,14 +880,14 @@ static uint64_t ramblock_dirty_bitmap_clear_discarded_pages(RAMBlock *rb) bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start) { if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) { - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr); + GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr); MemoryRegionSection section = { .mr = rb->mr, .offset_within_region = start, .size = int128_make64(qemu_ram_pagesize(rb)), }; - return !ram_discard_manager_is_populated(rdm, §ion); + return !generic_state_manager_is_state_set(gsm, §ion); } return false; } @@ -1545,14 +1545,14 @@ static void ram_block_populate_read(RAMBlock *rb) * Note: The result is only stable while migrating (precopy/postcopy). 
*/ if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) { - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr); + GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr); MemoryRegionSection section = { .mr = rb->mr, .offset_within_region = 0, .size = rb->mr->size, }; - ram_discard_manager_replay_populated(rdm, §ion, + generic_state_manager_replay_on_state_set(gsm, §ion, populate_read_section, NULL); } else { populate_read_range(rb, 0, rb->used_length); @@ -1604,14 +1604,14 @@ static int ram_block_uffd_protect(RAMBlock *rb, int uffd_fd) /* See ram_block_populate_read() */ if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) { - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr); + GenericStateManager *gsm = memory_region_get_generic_state_manager(rb->mr); MemoryRegionSection section = { .mr = rb->mr, .offset_within_region = 0, .size = rb->mr->size, }; - return ram_discard_manager_replay_populated(rdm, §ion, + return generic_state_manager_replay_on_state_set(gsm, §ion, uffd_protect_section, (void *)(uintptr_t)uffd_fd); } diff --git a/system/memory.c b/system/memory.c index b5ab729e13..7b921c66a6 100644 --- a/system/memory.c +++ b/system/memory.c @@ -2107,83 +2107,93 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr) return imrc->num_indexes(iommu_mr); } -RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr) +GenericStateManager *memory_region_get_generic_state_manager(MemoryRegion *mr) { if (!memory_region_is_ram(mr)) { return NULL; } - return mr->rdm; + return mr->gsm; } -int memory_region_set_ram_discard_manager(MemoryRegion *mr, - RamDiscardManager *rdm) +int memory_region_set_generic_state_manager(MemoryRegion *mr, + GenericStateManager *gsm) { g_assert(memory_region_is_ram(mr)); - if (mr->rdm && rdm) { + if (mr->gsm && gsm) { return -EBUSY; } - mr->rdm = rdm; + mr->gsm = gsm; return 0; } -uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm, - const MemoryRegion *mr) +bool memory_region_has_ram_discard_manager(MemoryRegion *mr) { - RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); + if (!memory_region_is_ram(mr) || + !object_dynamic_cast(OBJECT(mr->gsm), TYPE_RAM_DISCARD_MANAGER)) { + return false; + } + + return true; +} + +uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm, + const MemoryRegion *mr) +{ + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm); - g_assert(rdmc->get_min_granularity); - return rdmc->get_min_granularity(rdm, mr); + g_assert(gsmc->get_min_granularity); + return gsmc->get_min_granularity(gsm, mr); } -bool ram_discard_manager_is_populated(const RamDiscardManager *rdm, - const MemoryRegionSection *section) +bool generic_state_manager_is_state_set(const GenericStateManager *gsm, + const MemoryRegionSection *section) { - RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm); - g_assert(rdmc->is_populated); - return rdmc->is_populated(rdm, section); + g_assert(gsmc->is_state_set); + return gsmc->is_state_set(gsm, section); } -int ram_discard_manager_replay_populated(const RamDiscardManager *rdm, - MemoryRegionSection *section, - ReplayStateChange replay_fn, - void *opaque) +int generic_state_manager_replay_on_state_set(const GenericStateManager *gsm, + MemoryRegionSection *section, + ReplayStateChange replay_fn, + void *opaque) { - RamDiscardManagerClass *rdmc = 
RAM_DISCARD_MANAGER_GET_CLASS(rdm); + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm); - g_assert(rdmc->replay_populated); - return rdmc->replay_populated(rdm, section, replay_fn, opaque); + g_assert(gsmc->replay_on_state_set); + return gsmc->replay_on_state_set(gsm, section, replay_fn, opaque); } -int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm, - MemoryRegionSection *section, - ReplayStateChange replay_fn, - void *opaque) +int generic_state_manager_replay_on_state_clear(const GenericStateManager *gsm, + MemoryRegionSection *section, + ReplayStateChange replay_fn, + void *opaque) { - RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm); - g_assert(rdmc->replay_discarded); - return rdmc->replay_discarded(rdm, section, replay_fn, opaque); + g_assert(gsmc->replay_on_state_clear); + return gsmc->replay_on_state_clear(gsm, section, replay_fn, opaque); } -void ram_discard_manager_register_listener(RamDiscardManager *rdm, - RamDiscardListener *rdl, - MemoryRegionSection *section) +void generic_state_manager_register_listener(GenericStateManager *gsm, + StateChangeListener *scl, + MemoryRegionSection *section) { - RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm); - g_assert(rdmc->register_listener); - rdmc->register_listener(rdm, rdl, section); + g_assert(gsmc->register_listener); + gsmc->register_listener(gsm, scl, section); } -void ram_discard_manager_unregister_listener(RamDiscardManager *rdm, - RamDiscardListener *rdl) +void generic_state_manager_unregister_listener(GenericStateManager *gsm, + StateChangeListener *scl) { - RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_GET_CLASS(gsm); - g_assert(rdmc->unregister_listener); - rdmc->unregister_listener(rdm, rdl); + g_assert(gsmc->unregister_listener); + gsmc->unregister_listener(gsm, scl); } /* Called with rcu_read_lock held. */ @@ -2210,7 +2220,7 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, error_setg(errp, "iommu map to non memory area %" HWADDR_PRIx "", xlat); return false; } else if (memory_region_has_ram_discard_manager(mr)) { - RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr); + GenericStateManager *gsm = memory_region_get_generic_state_manager(mr); MemoryRegionSection tmp = { .mr = mr, .offset_within_region = xlat, @@ -2225,7 +2235,7 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, * Disallow that. vmstate priorities make sure any RamDiscardManager * were already restored before IOMMUs are restored. 
 */
-    if (!ram_discard_manager_is_populated(rdm, &tmp)) {
+    if (!generic_state_manager_is_state_set(gsm, &tmp)) {
         error_setg(errp, "iommu map to discarded memory (e.g., unplugged"
                    " via virtio-mem): %" HWADDR_PRIx "",
                    iotlb->translated_addr);
@@ -3814,8 +3824,15 @@ static const TypeInfo iommu_memory_region_info = {
     .abstract = true,
 };

-static const TypeInfo ram_discard_manager_info = {
+static const TypeInfo generic_state_manager_info = {
     .parent = TYPE_INTERFACE,
+    .name = TYPE_GENERIC_STATE_MANAGER,
+    .class_size = sizeof(GenericStateManagerClass),
+    .abstract = true,
+};
+
+static const TypeInfo ram_discard_manager_info = {
+    .parent = TYPE_GENERIC_STATE_MANAGER,
     .name = TYPE_RAM_DISCARD_MANAGER,
     .class_size = sizeof(RamDiscardManagerClass),
 };
@@ -3824,6 +3841,7 @@ static void memory_register_types(void)
 {
     type_register_static(&memory_region_info);
     type_register_static(&iommu_memory_region_info);
+    type_register_static(&generic_state_manager_info);
     type_register_static(&ram_discard_manager_info);
 }

diff --git a/system/memory_mapping.c b/system/memory_mapping.c
index 37d3325f77..e9d15c737d 100644
--- a/system/memory_mapping.c
+++ b/system/memory_mapping.c
@@ -271,10 +271,8 @@ static void guest_phys_blocks_region_add(MemoryListener *listener,

     /* for special sparse regions, only add populated parts */
     if (memory_region_has_ram_discard_manager(section->mr)) {
-        RamDiscardManager *rdm;
-
-        rdm = memory_region_get_ram_discard_manager(section->mr);
-        ram_discard_manager_replay_populated(rdm, section,
+        GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
+        generic_state_manager_replay_on_state_set(gsm, section,
                                              guest_phys_ram_populate_cb, g);
         return;
     }

From patchwork Mon Apr 7 07:49:25 2025
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 14039896
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
 Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
 Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v4 05/13] memory: Introduce PrivateSharedManager Interface as child of GenericStateManager
Date: Mon, 7 Apr 2025 15:49:25 +0800
Message-ID: <20250407074939.18657-6-chenyi.qiang@intel.com>
In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com>
References: <20250407074939.18657-1-chenyi.qiang@intel.com>

To manage the private and shared RAM states in confidential VMs,
introduce a new class, PrivateSharedManager, as a child of
GenericStateManager, which inherits the six interface callbacks. With a
different interface type, it can be distinguished from the
RamDiscardManager object and provides the flexibility to address
specific requirements of confidential VMs in the future.

Signed-off-by: Chenyi Qiang

---
Changes in v4:
    - Newly added.
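To illustrate the resulting type relationship, the following sketch (not
part of the patch; the dispatch function is hypothetical) shows how a
consumer could tell the two interfaces apart once this patch and the
helpers it adds are in place:

static void example_dispatch(MemoryRegion *mr)
{
    GenericStateManager *gsm = memory_region_get_generic_state_manager(mr);

    if (!gsm) {
        return; /* no state manager assigned to this region */
    }
    if (object_dynamic_cast(OBJECT(gsm), TYPE_RAM_DISCARD_MANAGER)) {
        /* populate/discard coordination, e.g. virtio-mem */
    } else if (object_dynamic_cast(OBJECT(gsm), TYPE_PRIVATE_SHARED_MANAGER)) {
        /* private/shared conversion coordination, e.g. guest_memfd */
    }
}

Both interface types derive from TYPE_GENERIC_STATE_MANAGER, so a single
MemoryRegion field can hold either manager while callers dispatch on the
concrete QOM type.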
--- include/exec/memory.h | 44 +++++++++++++++++++++++++++++++++++++++++-- system/memory.c | 17 +++++++++++++++++ 2 files changed, 59 insertions(+), 2 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index 30e5838d02..08f25e5e84 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -55,6 +55,12 @@ typedef struct RamDiscardManager RamDiscardManager; DECLARE_OBJ_CHECKERS(RamDiscardManager, RamDiscardManagerClass, RAM_DISCARD_MANAGER, TYPE_RAM_DISCARD_MANAGER); +#define TYPE_PRIVATE_SHARED_MANAGER "private-shared-manager" +typedef struct PrivateSharedManagerClass PrivateSharedManagerClass; +typedef struct PrivateSharedManager PrivateSharedManager; +DECLARE_OBJ_CHECKERS(PrivateSharedManager, PrivateSharedManagerClass, + PRIVATE_SHARED_MANAGER, TYPE_PRIVATE_SHARED_MANAGER) + #ifdef CONFIG_FUZZ void fuzz_dma_read_cb(size_t addr, size_t len, @@ -692,6 +698,14 @@ void generic_state_manager_register_listener(GenericStateManager *gsm, void generic_state_manager_unregister_listener(GenericStateManager *gsm, StateChangeListener *scl); +static inline void state_change_listener_init(StateChangeListener *scl, + NotifyStateSet state_set_fn, + NotifyStateClear state_clear_fn) +{ + scl->notify_to_state_set = state_set_fn; + scl->notify_to_state_clear = state_clear_fn; +} + typedef struct RamDiscardListener RamDiscardListener; struct RamDiscardListener { @@ -713,8 +727,7 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl, NotifyStateClear discard_fn, bool double_discard_supported) { - rdl->scl.notify_to_state_set = populate_fn; - rdl->scl.notify_to_state_clear = discard_fn; + state_change_listener_init(&rdl->scl, populate_fn, discard_fn); rdl->double_discard_supported = double_discard_supported; } @@ -757,6 +770,25 @@ struct RamDiscardManagerClass { GenericStateManagerClass parent_class; }; +typedef struct PrivateSharedListener PrivateSharedListener; +struct PrivateSharedListener { + struct StateChangeListener scl; + + QLIST_ENTRY(PrivateSharedListener) next; +}; + +struct PrivateSharedManagerClass { + /* private */ + GenericStateManagerClass parent_class; +}; + +static inline void private_shared_listener_init(PrivateSharedListener *psl, + NotifyStateSet populate_fn, + NotifyStateClear discard_fn) +{ + state_change_listener_init(&psl->scl, populate_fn, discard_fn); +} + /** * memory_get_xlat_addr: Extract addresses from a TLB entry * @@ -2521,6 +2553,14 @@ int memory_region_set_generic_state_manager(MemoryRegion *mr, */ bool memory_region_has_ram_discard_manager(MemoryRegion *mr); +/** + * memory_region_has_private_shared_manager: check whether a #MemoryRegion has a + * #PrivateSharedManager assigned + * + * @mr: the #MemoryRegion + */ +bool memory_region_has_private_shared_manager(MemoryRegion *mr); + /** * memory_region_find: translate an address/size relative to a * MemoryRegion into a #MemoryRegionSection. 
diff --git a/system/memory.c b/system/memory.c
index 7b921c66a6..e6e944d9c0 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2137,6 +2137,16 @@ bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
     return true;
 }

+bool memory_region_has_private_shared_manager(MemoryRegion *mr)
+{
+    if (!memory_region_is_ram(mr) ||
+        !object_dynamic_cast(OBJECT(mr->gsm), TYPE_PRIVATE_SHARED_MANAGER)) {
+        return false;
+    }
+
+    return true;
+}
+
 uint64_t generic_state_manager_get_min_granularity(const GenericStateManager *gsm,
                                                    const MemoryRegion *mr)
 {
@@ -3837,12 +3847,19 @@ static const TypeInfo ram_discard_manager_info = {
     .class_size = sizeof(RamDiscardManagerClass),
 };

+static const TypeInfo private_shared_manager_info = {
+    .parent = TYPE_GENERIC_STATE_MANAGER,
+    .name = TYPE_PRIVATE_SHARED_MANAGER,
+    .class_size = sizeof(PrivateSharedManagerClass),
+};
+
 static void memory_register_types(void)
 {
     type_register_static(&memory_region_info);
     type_register_static(&iommu_memory_region_info);
     type_register_static(&generic_state_manager_info);
     type_register_static(&ram_discard_manager_info);
+    type_register_static(&private_shared_manager_info);
 }

 type_init(memory_register_types)

From patchwork Mon Apr 7 07:49:26 2025
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 14039897
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
 Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
 Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v4 06/13] vfio: Add the support for PrivateSharedManager Interface
Date: Mon, 7 Apr 2025 15:49:26 +0800
Message-ID: <20250407074939.18657-7-chenyi.qiang@intel.com>
In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com>
References: <20250407074939.18657-1-chenyi.qiang@intel.com>

Subsystems like VFIO previously disabled ram block discard and only
allowed coordinated discarding via RamDiscardManager. However,
guest_memfd in confidential VMs relies on discard operations for page
conversion between private and shared memory. This can lead to a stale
IOMMU mapping issue when assigning a hardware device to a confidential
VM via shared memory. With the introduction of the PrivateSharedManager
interface to manage private and shared states, distinct from
RamDiscardManager, include PrivateSharedManager in coordinated RAM
discard and add related support in VFIO.

Currently, migration support for confidential VMs is not available, so
vfio_sync_dirty_bitmap() handling for PrivateSharedListener can be
ignored. The register/unregister of PrivateSharedListener is necessary
during vfio_listener_region_add/del(). The listener callbacks are
similar between RamDiscardListener and PrivateSharedListener, allowing
the common parts to be extracted opportunistically.

Signed-off-by: Chenyi Qiang

---
Changes in v4
    - Newly added.
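In outline, the registration path added below reduces to the following
sketch; error handling and the vpsl_list bookkeeping from the real patch
are omitted, and the enclosing function is hypothetical:

static void example_register_psl(VFIOContainerBase *bcontainer,
                                 MemoryRegionSection *section)
{
    GenericStateManager *gsm =
        memory_region_get_generic_state_manager(section->mr);
    VFIOPrivateSharedListener *vpsl = g_new0(VFIOPrivateSharedListener, 1);

    vpsl->bcontainer = bcontainer;
    vpsl->granularity = generic_state_manager_get_min_granularity(gsm,
                                                                  section->mr);
    /* to_shared DMA-maps the range into the IOMMU, to_private unmaps it */
    private_shared_listener_init(&vpsl->listener,
                                 vfio_private_shared_notify_to_shared,
                                 vfio_private_shared_notify_to_private);
    generic_state_manager_register_listener(gsm, &vpsl->listener.scl, section);
}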
--- hw/vfio/common.c | 104 +++++++++++++++++++++++--- hw/vfio/container-base.c | 1 + include/hw/vfio/vfio-container-base.h | 10 +++ 3 files changed, 105 insertions(+), 10 deletions(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 3172d877cc..48468a12c3 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -335,13 +335,9 @@ out: rcu_read_unlock(); } -static void vfio_ram_discard_notify_discard(StateChangeListener *scl, - MemoryRegionSection *section) +static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer, + MemoryRegionSection *section) { - RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); - VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, - listener); - VFIOContainerBase *bcontainer = vrdl->bcontainer; const hwaddr size = int128_get64(section->size); const hwaddr iova = section->offset_within_address_space; int ret; @@ -354,13 +350,28 @@ static void vfio_ram_discard_notify_discard(StateChangeListener *scl, } } -static int vfio_ram_discard_notify_populate(StateChangeListener *scl, +static void vfio_ram_discard_notify_discard(StateChangeListener *scl, MemoryRegionSection *section) { RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, listener); - VFIOContainerBase *bcontainer = vrdl->bcontainer; + vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section); +} + +static void vfio_private_shared_notify_to_private(StateChangeListener *scl, + MemoryRegionSection *section) +{ + PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl); + VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener, + listener); + vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section); +} + +static int vfio_state_change_notify_to_state_set(VFIOContainerBase *bcontainer, + MemoryRegionSection *section, + uint64_t granularity) +{ const hwaddr end = section->offset_within_region + int128_get64(section->size); hwaddr start, next, iova; @@ -372,7 +383,7 @@ static int vfio_ram_discard_notify_populate(StateChangeListener *scl, * unmap in minimum granularity later. 
 */
    for (start = section->offset_within_region; start < end; start = next) {
-        next = ROUND_UP(start + 1, vrdl->granularity);
+        next = ROUND_UP(start + 1, granularity);
        next = MIN(next, end);

        iova = start - section->offset_within_region +
@@ -383,13 +394,33 @@
                                    vaddr, section->readonly);
        if (ret) {
            /* Rollback */
-            vfio_ram_discard_notify_discard(scl, section);
+            vfio_state_change_notify_to_state_clear(bcontainer, section);
            return ret;
        }
    }

    return 0;
}

+static int vfio_ram_discard_notify_populate(StateChangeListener *scl,
+                                            MemoryRegionSection *section)
+{
+    RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl);
+    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
+                                                listener);
+    return vfio_state_change_notify_to_state_set(vrdl->bcontainer, section,
+                                                 vrdl->granularity);
+}
+
+static int vfio_private_shared_notify_to_shared(StateChangeListener *scl,
+                                                MemoryRegionSection *section)
+{
+    PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl);
+    VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener,
+                                                   listener);
+    return vfio_state_change_notify_to_state_set(vpsl->bcontainer, section,
+                                                 vpsl->granularity);
+}
+
 static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
                                                MemoryRegionSection *section)
 {
@@ -466,6 +497,27 @@ static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
     }
 }

+static void vfio_register_private_shared_listener(VFIOContainerBase *bcontainer,
+                                                  MemoryRegionSection *section)
+{
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
+    VFIOPrivateSharedListener *vpsl;
+    PrivateSharedListener *psl;
+
+    vpsl = g_new0(VFIOPrivateSharedListener, 1);
+    vpsl->bcontainer = bcontainer;
+    vpsl->mr = section->mr;
+    vpsl->offset_within_address_space = section->offset_within_address_space;
+    vpsl->granularity = generic_state_manager_get_min_granularity(gsm,
+                                                                  section->mr);
+
+    psl = &vpsl->listener;
+    private_shared_listener_init(psl, vfio_private_shared_notify_to_shared,
+                                 vfio_private_shared_notify_to_private);
+    generic_state_manager_register_listener(gsm, &psl->scl, section);
+    QLIST_INSERT_HEAD(&bcontainer->vpsl_list, vpsl, next);
+}
+
 static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
                                                  MemoryRegionSection *section)
 {
@@ -491,6 +543,31 @@ static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
     g_free(vrdl);
 }

+static void vfio_unregister_private_shared_listener(VFIOContainerBase *bcontainer,
+                                                    MemoryRegionSection *section)
+{
+    GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr);
+    VFIOPrivateSharedListener *vpsl = NULL;
+    PrivateSharedListener *psl;
+
+    QLIST_FOREACH(vpsl, &bcontainer->vpsl_list, next) {
+        if (vpsl->mr == section->mr &&
+            vpsl->offset_within_address_space ==
+            section->offset_within_address_space) {
+            break;
+        }
+    }
+
+    if (!vpsl) {
+        hw_error("vfio: Trying to unregister missing private shared listener");
+    }
+
+    psl = &vpsl->listener;
+    generic_state_manager_unregister_listener(gsm, &psl->scl);
+    QLIST_REMOVE(vpsl, next);
+    g_free(vpsl);
+}
+
 static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
 {
     MemoryRegion *mr = section->mr;
@@ -644,6 +721,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
     if (memory_region_has_ram_discard_manager(section->mr)) {
         vfio_register_ram_discard_listener(bcontainer, section);
         return;
+    } else if (memory_region_has_private_shared_manager(section->mr)) {
+        vfio_register_private_shared_listener(bcontainer, section);
+        return;
     }

     vaddr = memory_region_get_ram_ptr(section->mr) +
@@ -764,6 +844,10 @@ static void vfio_listener_region_del(MemoryListener *listener,
         vfio_unregister_ram_discard_listener(bcontainer, section);
         /* Unregistering will trigger an unmap. */
         try_unmap = false;
+    } else if (memory_region_has_private_shared_manager(section->mr)) {
+        vfio_unregister_private_shared_listener(bcontainer, section);
+        /* Unregistering will trigger an unmap. */
+        try_unmap = false;
     }

     if (try_unmap) {
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 749a3fd29d..ff5df925c2 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -135,6 +135,7 @@ static void vfio_container_instance_init(Object *obj)
     bcontainer->iova_ranges = NULL;
     QLIST_INIT(&bcontainer->giommu_list);
     QLIST_INIT(&bcontainer->vrdl_list);
+    QLIST_INIT(&bcontainer->vpsl_list);
 }

 static const TypeInfo types[] = {
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 4cff9943ab..8d7c0b1179 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -47,6 +47,7 @@ typedef struct VFIOContainerBase {
     bool dirty_pages_started; /* Protected by BQL */
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
+    QLIST_HEAD(, VFIOPrivateSharedListener) vpsl_list;
     QLIST_ENTRY(VFIOContainerBase) next;
     QLIST_HEAD(, VFIODevice) device_list;
     GList *iova_ranges;
@@ -71,6 +72,15 @@ typedef struct VFIORamDiscardListener {
     QLIST_ENTRY(VFIORamDiscardListener) next;
 } VFIORamDiscardListener;

+typedef struct VFIOPrivateSharedListener {
+    VFIOContainerBase *bcontainer;
+    MemoryRegion *mr;
+    hwaddr offset_within_address_space;
+    uint64_t granularity;
+    PrivateSharedListener listener;
+    QLIST_ENTRY(VFIOPrivateSharedListener) next;
+} VFIOPrivateSharedListener;
+
 int vfio_container_dma_map(VFIOContainerBase *bcontainer,
                            hwaddr iova, ram_addr_t size,
                            void *vaddr, bool readonly);

From patchwork Mon Apr 7 07:49:27 2025
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 14039898
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
 Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
 Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v4 07/13] ram-block-attribute: Introduce RamBlockAttribute to manage RAMBlock with guest_memfd
Date: Mon, 7 Apr 2025 15:49:27 +0800
Message-ID: <20250407074939.18657-8-chenyi.qiang@intel.com>
In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com>
References: <20250407074939.18657-1-chenyi.qiang@intel.com>

Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
discard") highlighted that subsystems like VFIO may disable RAM block
discard. However, guest_memfd relies on discard operations for page
conversion between private and shared memory, potentially leading to a
stale IOMMU mapping issue when assigning hardware devices to
confidential VMs via shared memory. To address this, it is crucial to
ensure that systems like VFIO refresh their IOMMU mappings.

PrivateSharedManager is introduced to manage private and shared states
in confidential VMs, similar to RamDiscardManager, which supports
coordinated RAM discard in VFIO. Integrating PrivateSharedManager with
guest_memfd can facilitate the adjustment of VFIO mappings in response
to page conversion events.

Since guest_memfd is not an object, it cannot directly implement the
PrivateSharedManager interface. Implementing it in HostMemoryBackend is
not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
have a memory backend while others do not.
Notably, virtual BIOS RAMBlocks using memory_region_init_ram_guest_memfd()
do not have a backend.

To manage RAMBlocks with guest_memfd, define a new object named
RamBlockAttribute to implement the PrivateSharedManager interface. This
object stores guest_memfd information such as the shared_bitmap and
handles page conversion notification. The memory state is tracked at
host page size granularity, as the minimum memory conversion size can be
one page per request. Additionally, VFIO expects the DMA mapping for a
specific iova to be mapped and unmapped with the same granularity.
Confidential VMs may perform partial conversions, such as conversions on
small regions within larger regions. To prevent invalid cases, and until
cut_mapping operation support is available, all operations are performed
with 4K granularity.

Signed-off-by: Chenyi Qiang

---
Changes in v4:
    - Change the name from memory-attribute-manager to ram-block-attribute.
    - Implement the newly-introduced PrivateSharedManager instead of
      RamDiscardManager and change the related commit message.
    - Define the new object in ramblock.h instead of adding a new file.

Changes in v3:
    - Some renames (bitmap_size->shared_bitmap_size,
      first_one/zero_bit->first_bit, etc.)
    - Change shared_bitmap_size from uint32_t to unsigned
    - Return mgr->mr->ram_block->page_size in get_block_size()
    - Move set_ram_discard_manager() up to avoid a g_free() in the failure case.
    - Add const for memory_attribute_manager_get_block_size()
    - Unify ReplayRamPopulate and ReplayRamDiscard and the related callback.

Changes in v2:
    - Rename the object to MemoryAttributeManager
    - Rename the bitmap to shared_bitmap to make it clearer.
    - Remove the block_size field and get it from a helper. In the future,
      we can get the page_size from the RAMBlock if necessary.
    - Remove the unnecessary "struct" before GuestMemfdReplayData
    - Remove the unnecessary g_free() for the bitmap
    - Add error reports when the callback fails for a populated/discarded
      section.
    - Move the realize()/unrealize() definitions to this patch.
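As a rough usage sketch (where this gets wired into guest_memfd RAMBlock
setup is outside this patch, and the enclosing function is hypothetical),
the object would be attached to a region as follows:

static void example_attach_attribute(MemoryRegion *mr)
{
    RamBlockAttribute *attr =
        RAM_BLOCK_ATTRIBUTE(object_new(TYPE_RAM_BLOCK_ATTRIBUTE));

    /*
     * Publishes attr as the region's state manager and allocates the
     * shared bitmap at host-page granularity.
     */
    if (ram_block_attribute_realize(attr, mr)) {
        /* e.g. the region already had a state manager assigned */
        object_unref(OBJECT(attr));
    }
}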
---
 include/exec/ramblock.h      |  24 +++
 system/meson.build           |   1 +
 system/ram-block-attribute.c | 282 +++++++++++++++++++++++++++++++++++
 3 files changed, 307 insertions(+)
 create mode 100644 system/ram-block-attribute.c

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 0babd105c0..b8b5469db9 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -23,6 +23,10 @@
 #include "cpu-common.h"
 #include "qemu/rcu.h"
 #include "exec/ramlist.h"
+#include "system/hostmem.h"
+
+#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute"
+OBJECT_DECLARE_TYPE(RamBlockAttribute, RamBlockAttributeClass, RAM_BLOCK_ATTRIBUTE)

 struct RAMBlock {
     struct rcu_head rcu;
@@ -90,5 +94,25 @@ struct RAMBlock {
      */
     ram_addr_t postcopy_length;
 };
+
+struct RamBlockAttribute {
+    Object parent;
+
+    MemoryRegion *mr;
+
+    /* A set bit represents memory that is populated (shared) */
+    unsigned shared_bitmap_size;
+    unsigned long *shared_bitmap;
+
+    QLIST_HEAD(, PrivateSharedListener) psl_list;
+};
+
+struct RamBlockAttributeClass {
+    ObjectClass parent_class;
+};
+
+int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr);
+void ram_block_attribute_unrealize(RamBlockAttribute *attr);
+
 #endif
 #endif
diff --git a/system/meson.build b/system/meson.build
index 4952f4b2c7..50a5a64f1c 100644
--- a/system/meson.build
+++ b/system/meson.build
@@ -15,6 +15,7 @@ system_ss.add(files(
   'dirtylimit.c',
   'dma-helpers.c',
   'globals.c',
+  'ram-block-attribute.c',
   'memory_mapping.c',
   'qdev-monitor.c',
   'qtest.c',
diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
new file mode 100644
index 0000000000..283c03b354
--- /dev/null
+++ b/system/ram-block-attribute.c
@@ -0,0 +1,282 @@
+/*
+ * QEMU ram block attribute
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Chenyi Qiang
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "exec/ramblock.h"
+
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(RamBlockAttribute,
+                                   ram_block_attribute,
+                                   RAM_BLOCK_ATTRIBUTE,
+                                   OBJECT,
+                                   { TYPE_PRIVATE_SHARED_MANAGER },
+                                   { })
+
+static size_t ram_block_attribute_get_block_size(const RamBlockAttribute *attr)
+{
+    /*
+     * Page conversions are performed at a granularity of at least 4K
+     * (and 4K-aligned), so use the host page size as the granularity to
+     * track the memory attribute.
+ */ + g_assert(attr && attr->mr && attr->mr->ram_block); + g_assert(attr->mr->ram_block->page_size == qemu_real_host_page_size()); + return attr->mr->ram_block->page_size; +} + + +static bool ram_block_attribute_psm_is_shared(const GenericStateManager *gsm, + const MemoryRegionSection *section) +{ + const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm); + const int block_size = ram_block_attribute_get_block_size(attr); + uint64_t first_bit = section->offset_within_region / block_size; + uint64_t last_bit = first_bit + int128_get64(section->size) / block_size - 1; + unsigned long first_discard_bit; + + first_discard_bit = find_next_zero_bit(attr->shared_bitmap, last_bit + 1, first_bit); + return first_discard_bit > last_bit; +} + +typedef int (*ram_block_attribute_section_cb)(MemoryRegionSection *s, void *arg); + +static int ram_block_attribute_notify_shared_cb(MemoryRegionSection *section, void *arg) +{ + StateChangeListener *scl = arg; + + return scl->notify_to_state_set(scl, section); +} + +static int ram_block_attribute_notify_private_cb(MemoryRegionSection *section, void *arg) +{ + StateChangeListener *scl = arg; + + scl->notify_to_state_clear(scl, section); + return 0; +} + +static int ram_block_attribute_for_each_shared_section(const RamBlockAttribute *attr, + MemoryRegionSection *section, + void *arg, + ram_block_attribute_section_cb cb) +{ + unsigned long first_bit, last_bit; + uint64_t offset, size; + const int block_size = ram_block_attribute_get_block_size(attr); + int ret = 0; + + first_bit = section->offset_within_region / block_size; + first_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size, first_bit); + + while (first_bit < attr->shared_bitmap_size) { + MemoryRegionSection tmp = *section; + + offset = first_bit * block_size; + last_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size, + first_bit + 1) - 1; + size = (last_bit - first_bit + 1) * block_size; + + if (!memory_region_section_intersect_range(&tmp, offset, size)) { + break; + } + + ret = cb(&tmp, arg); + if (ret) { + error_report("%s: Failed to notify RAM discard listener: %s", __func__, + strerror(-ret)); + break; + } + + first_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size, + last_bit + 2); + } + + return ret; +} + +static int ram_block_attribute_for_each_private_section(const RamBlockAttribute *attr, + MemoryRegionSection *section, + void *arg, + ram_block_attribute_section_cb cb) +{ + unsigned long first_bit, last_bit; + uint64_t offset, size; + const int block_size = ram_block_attribute_get_block_size(attr); + int ret = 0; + + first_bit = section->offset_within_region / block_size; + first_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size, + first_bit); + + while (first_bit < attr->shared_bitmap_size) { + MemoryRegionSection tmp = *section; + + offset = first_bit * block_size; + last_bit = find_next_bit(attr->shared_bitmap, attr->shared_bitmap_size, + first_bit + 1) - 1; + size = (last_bit - first_bit + 1) * block_size; + + if (!memory_region_section_intersect_range(&tmp, offset, size)) { + break; + } + + ret = cb(&tmp, arg); + if (ret) { + error_report("%s: Failed to notify RAM discard listener: %s", __func__, + strerror(-ret)); + break; + } + + first_bit = find_next_zero_bit(attr->shared_bitmap, attr->shared_bitmap_size, + last_bit + 2); + } + + return ret; +} + +static uint64_t ram_block_attribute_psm_get_min_granularity(const GenericStateManager *gsm, + const MemoryRegion *mr) +{ + const RamBlockAttribute *attr = 
RAM_BLOCK_ATTRIBUTE(gsm); + + g_assert(mr == attr->mr); + return ram_block_attribute_get_block_size(attr); +} + +static void ram_block_attribute_psm_register_listener(GenericStateManager *gsm, + StateChangeListener *scl, + MemoryRegionSection *section) +{ + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm); + PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl); + int ret; + + g_assert(section->mr == attr->mr); + scl->section = memory_region_section_new_copy(section); + + QLIST_INSERT_HEAD(&attr->psl_list, psl, next); + + ret = ram_block_attribute_for_each_shared_section(attr, section, scl, + ram_block_attribute_notify_shared_cb); + if (ret) { + error_report("%s: Failed to register RAM discard listener: %s", __func__, + strerror(-ret)); + } +} + +static void ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm, + StateChangeListener *scl) +{ + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm); + PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl); + int ret; + + g_assert(scl->section); + g_assert(scl->section->mr == attr->mr); + + ret = ram_block_attribute_for_each_shared_section(attr, scl->section, scl, + ram_block_attribute_notify_private_cb); + if (ret) { + error_report("%s: Failed to unregister RAM discard listener: %s", __func__, + strerror(-ret)); + } + + memory_region_section_free_copy(scl->section); + scl->section = NULL; + QLIST_REMOVE(psl, next); +} + +typedef struct RamBlockAttributeReplayData { + ReplayStateChange fn; + void *opaque; +} RamBlockAttributeReplayData; + +static int ram_block_attribute_psm_replay_cb(MemoryRegionSection *section, void *arg) +{ + RamBlockAttributeReplayData *data = arg; + + return data->fn(section, data->opaque); +} + +static int ram_block_attribute_psm_replay_on_shared(const GenericStateManager *gsm, + MemoryRegionSection *section, + ReplayStateChange replay_fn, + void *opaque) +{ + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm); + RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque = opaque }; + + g_assert(section->mr == attr->mr); + return ram_block_attribute_for_each_shared_section(attr, section, &data, + ram_block_attribute_psm_replay_cb); +} + +static int ram_block_attribute_psm_replay_on_private(const GenericStateManager *gsm, + MemoryRegionSection *section, + ReplayStateChange replay_fn, + void *opaque) +{ + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm); + RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque = opaque }; + + g_assert(section->mr == attr->mr); + return ram_block_attribute_for_each_private_section(attr, section, &data, + ram_block_attribute_psm_replay_cb); +} + +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr) +{ + uint64_t shared_bitmap_size; + const int block_size = qemu_real_host_page_size(); + int ret; + + shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size; + + attr->mr = mr; + ret = memory_region_set_generic_state_manager(mr, GENERIC_STATE_MANAGER(attr)); + if (ret) { + return ret; + } + attr->shared_bitmap_size = shared_bitmap_size; + attr->shared_bitmap = bitmap_new(shared_bitmap_size); + + return ret; +} + +void ram_block_attribute_unrealize(RamBlockAttribute *attr) +{ + g_free(attr->shared_bitmap); + memory_region_set_generic_state_manager(attr->mr, NULL); +} + +static void ram_block_attribute_init(Object *obj) +{ + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(obj); + + QLIST_INIT(&attr->psl_list); +} + +static void ram_block_attribute_finalize(Object *obj) +{ +} + +static void 
ram_block_attribute_class_init(ObjectClass *oc, void *data) +{ + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(oc); + + gsmc->get_min_granularity = ram_block_attribute_psm_get_min_granularity; + gsmc->register_listener = ram_block_attribute_psm_register_listener; + gsmc->unregister_listener = ram_block_attribute_psm_unregister_listener; + gsmc->is_state_set = ram_block_attribute_psm_is_shared; + gsmc->replay_on_state_set = ram_block_attribute_psm_replay_on_shared; + gsmc->replay_on_state_clear = ram_block_attribute_psm_replay_on_private; +} From patchwork Mon Apr 7 07:49:28 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chenyi Qiang X-Patchwork-Id: 14039899 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D286F227B8C for ; Mon, 7 Apr 2025 07:50:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744012219; cv=none; b=jeH67HW+z/E7G42r+yKpaa5QpjbZcqVwlOC9YufH7MS3sFQoEv/N+zS/6KeVizVtlY+RIazcT74bVAArdm+m+daYG8xXt/QgHo+c4AGPHwfP11U6MV96jwwI3xyDTrrF+5G7DR+loXcpVxSzYsi3Ej9Qxrh6uRAnoS4s+LWPshE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744012219; c=relaxed/simple; bh=o+a/Hw3O4JaruEHuRPJRaztD3Sx+cvobAeN0HQKIJvQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=N1ifOIwPF5+aj8J76DYeltcmB/68pmz4zkoqfZVgNwW3AAym3G/qDKi2ehryslfdlsR6FNFj2mHC6JDkwqkIp5ZJO4bUnMhw4e0gniCaXAfp6WHPOHPergeGd9HDYtcPp4V8vOXS7r6Tb+MU91/gaUGWZFpHs6s4wO02eisXCm8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=U/tZwQV2; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="U/tZwQV2" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1744012218; x=1775548218; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=o+a/Hw3O4JaruEHuRPJRaztD3Sx+cvobAeN0HQKIJvQ=; b=U/tZwQV2FI8GQONgAaAf1ZPIEaq5+jxVUBPgaffZIFuMiFrUxe95OScb ll9zHQ/+dLXHdxm5xjXGchYSvvbzPXRPg3SUazi0b9O9C/8q4NS+I2zdF xDLU0eM7T7oWLRY7YZLsacS+UAJ4S3iLYOdX5OzIKP0yWwWYHpNWsXcgd qvj6XiyS4omMwFTPkCjdXMh9hyOh6l3+8jG3QTc0bcgikLpt1gLNKN8gB Rjy+QiPX/RmWfPzKLw1j71w5gYyiLQSsKjHKHlAgxtN7rTriI5t+nezc6 7epsBhpehxg+vwn9UBdtEFs6V+xPtFqHBAvIIOjqO5GkBVkoRbycKdtr6 g==; X-CSE-ConnectionGUID: Roh34rF2RwaujkdoSy+dKA== X-CSE-MsgGUID: wYAhUGAmR7O+BWIJ0AGxlQ== X-IronPort-AV: E=McAfee;i="6700,10204,11396"; a="67857563" X-IronPort-AV: E=Sophos;i="6.15,193,1739865600"; d="scan'208";a="67857563" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2025 00:50:18 -0700 X-CSE-ConnectionGUID: ZqAUSgJjSNmFp9U52hhdEg== X-CSE-MsgGUID: qT3igI5+Ro6ZU3k+HwPskw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.15,193,1739865600"; 
d="scan'208";a="128405652" Received: from emr-bkc.sh.intel.com ([10.112.230.82]) by orviesa007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Apr 2025 00:50:14 -0700 From: Chenyi Qiang To: David Hildenbrand , Alexey Kardashevskiy , Peter Xu , Gupta Pankaj , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Michael Roth Cc: Chenyi Qiang , qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J , Peng Chao P , Gao Chao , Xu Yilun , Li Xiaoyao Subject: [PATCH v4 08/13] ram-block-attribute: Introduce a callback to notify shared/private state changes Date: Mon, 7 Apr 2025 15:49:28 +0800 Message-ID: <20250407074939.18657-9-chenyi.qiang@intel.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com> References: <20250407074939.18657-1-chenyi.qiang@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 A new state_change() callback is introduced in PrivateSharedManageClass to efficiently notify all registered PrivateSharedListeners, including VFIO listeners, about memory conversion events in guest_memfd. The VFIO listener can dynamically DMA map/unmap shared pages based on conversion types: - For conversions from shared to private, the VFIO system ensures the discarding of shared mapping from the IOMMU. - For conversions from private to shared, it triggers the population of the shared mapping into the IOMMU. Additionally, special conversion requests are handled as followed: - If a conversion request is made for a page already in the desired state, the helper simply returns success. - For requests involving a range partially in the desired state, only the necessary segments are converted, ensuring efficient compliance with the request. In this case, fallback to "1 block at a time" handling so that the range passed to the notify_to_private/shared() is always in the desired state. - If a conversion request is declined by other systems, such as a failure from VFIO during notify_to_shared(), the helper rolls back the request to maintain consistency. As for notify_to_private() handling, failure in VFIO is unexpected, so no error check is performed. Note that the bitmap status is updated before callbacks, allowing listeners to handle memory based on the latest status. Opportunistically, introduce a helper to trigger the state_change() callback of the class. Signed-off-by: Chenyi Qiang --- Changes in v4: - Add the state_change() callback in PrivateSharedManagerClass instead of the RamBlockAttribute. Changes in v3: - Move the bitmap update before notifier callbacks. - Call the notifier callbacks directly in notify_discard/populate() with the expectation that the request memory range is in the desired attribute. - For the case that only partial range in the desire status, handle the range with block_size granularity for ease of rollback (https://lore.kernel.org/qemu-devel/812768d7-a02d-4b29-95f3-fb7a125cf54e@redhat.com/) Changes in v2: - Do the alignment changes due to the rename to MemoryAttributeManager - Move the state_change() helper definition in this patch. 
--- include/exec/memory.h | 7 ++ system/memory.c | 10 ++ system/ram-block-attribute.c | 191 +++++++++++++++++++++++++++++++++++ 3 files changed, 208 insertions(+) diff --git a/include/exec/memory.h b/include/exec/memory.h index 08f25e5e84..a61896251c 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -780,6 +780,9 @@ struct PrivateSharedListener { struct PrivateSharedManagerClass { /* private */ GenericStateManagerClass parent_class; + + int (*state_change)(PrivateSharedManager *mgr, uint64_t offset, uint64_t size, + bool to_private); }; static inline void private_shared_listener_init(PrivateSharedListener *psl, @@ -789,6 +792,10 @@ static inline void private_shared_listener_init(PrivateSharedListener *psl, state_change_listener_init(&psl->scl, populate_fn, discard_fn); } +int private_shared_manager_state_change(PrivateSharedManager *mgr, + uint64_t offset, uint64_t size, + bool to_private); + /** * memory_get_xlat_addr: Extract addresses from a TLB entry * diff --git a/system/memory.c b/system/memory.c index e6e944d9c0..2f6eaf6314 100644 --- a/system/memory.c +++ b/system/memory.c @@ -2206,6 +2206,16 @@ void generic_state_manager_unregister_listener(GenericStateManager *gsm, gsmc->unregister_listener(gsm, scl); } +int private_shared_manager_state_change(PrivateSharedManager *mgr, + uint64_t offset, uint64_t size, + bool to_private) +{ + PrivateSharedManagerClass *psmc = PRIVATE_SHARED_MANAGER_GET_CLASS(mgr); + + g_assert(psmc->state_change); + return psmc->state_change(mgr, offset, size, to_private); +} + /* Called with rcu_read_lock held. */ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, ram_addr_t *ram_addr, bool *read_only, diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c index 283c03b354..06ed134cda 100644 --- a/system/ram-block-attribute.c +++ b/system/ram-block-attribute.c @@ -233,6 +233,195 @@ static int ram_block_attribute_psm_replay_on_private(const GenericStateManager * ram_block_attribute_psm_replay_cb); } +static bool ram_block_attribute_is_valid_range(RamBlockAttribute *attr, + uint64_t offset, uint64_t size) +{ + MemoryRegion *mr = attr->mr; + + g_assert(mr); + + uint64_t region_size = memory_region_size(mr); + int block_size = ram_block_attribute_get_block_size(attr); + + if (!QEMU_IS_ALIGNED(offset, block_size)) { + return false; + } + if (offset + size < offset || !size) { + return false; + } + if (offset >= region_size || offset + size > region_size) { + return false; + } + return true; +} + +static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr, + uint64_t offset, uint64_t size) +{ + PrivateSharedListener *psl; + + QLIST_FOREACH(psl, &attr->psl_list, next) { + StateChangeListener *scl = &psl->scl; + MemoryRegionSection tmp = *scl->section; + + if (!memory_region_section_intersect_range(&tmp, offset, size)) { + continue; + } + scl->notify_to_state_clear(scl, &tmp); + } +} + +static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr, + uint64_t offset, uint64_t size) +{ + PrivateSharedListener *psl, *psl2; + int ret = 0; + + QLIST_FOREACH(psl, &attr->psl_list, next) { + StateChangeListener *scl = &psl->scl; + MemoryRegionSection tmp = *scl->section; + + if (!memory_region_section_intersect_range(&tmp, offset, size)) { + continue; + } + ret = scl->notify_to_state_set(scl, &tmp); + if (ret) { + break; + } + } + + if (ret) { + /* Notify all already-notified listeners. 
*/ + QLIST_FOREACH(psl2, &attr->psl_list, next) { + StateChangeListener *scl2 = &psl2->scl; + MemoryRegionSection tmp = *scl2->section; + + if (psl == psl2) { + break; + } + if (!memory_region_section_intersect_range(&tmp, offset, size)) { + continue; + } + scl2->notify_to_state_clear(scl2, &tmp); + } + } + return ret; +} + +static bool ram_block_attribute_is_range_shared(RamBlockAttribute *attr, + uint64_t offset, uint64_t size) +{ + const int block_size = ram_block_attribute_get_block_size(attr); + const unsigned long first_bit = offset / block_size; + const unsigned long last_bit = first_bit + (size / block_size) - 1; + unsigned long found_bit; + + /* We fake a shorter bitmap to avoid searching too far. */ + found_bit = find_next_zero_bit(attr->shared_bitmap, last_bit + 1, first_bit); + return found_bit > last_bit; +} + +static bool ram_block_attribute_is_range_private(RamBlockAttribute *attr, + uint64_t offset, uint64_t size) +{ + const int block_size = ram_block_attribute_get_block_size(attr); + const unsigned long first_bit = offset / block_size; + const unsigned long last_bit = first_bit + (size / block_size) - 1; + unsigned long found_bit; + + /* We fake a shorter bitmap to avoid searching too far. */ + found_bit = find_next_bit(attr->shared_bitmap, last_bit + 1, first_bit); + return found_bit > last_bit; +} + +static int ram_block_attribute_psm_state_change(PrivateSharedManager *mgr, uint64_t offset, + uint64_t size, bool to_private) +{ + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(mgr); + const int block_size = ram_block_attribute_get_block_size(attr); + const unsigned long first_bit = offset / block_size; + const unsigned long nbits = size / block_size; + const uint64_t end = offset + size; + unsigned long bit; + uint64_t cur; + int ret = 0; + + if (!ram_block_attribute_is_valid_range(attr, offset, size)) { + error_report("%s: invalid range: offset 0x%" PRIx64 ", size 0x%" PRIx64, + __func__, offset, size); + return -1; + } + + if (to_private) { + if (ram_block_attribute_is_range_private(attr, offset, size)) { + /* Already private */ + } else if (!ram_block_attribute_is_range_shared(attr, offset, size)) { + /* Unexpected mixture: process individual blocks */ + for (cur = offset; cur < end; cur += block_size) { + bit = cur / block_size; + if (!test_bit(bit, attr->shared_bitmap)) { + continue; + } + clear_bit(bit, attr->shared_bitmap); + ram_block_attribute_notify_to_private(attr, cur, block_size); + } + } else { + /* Completely shared */ + bitmap_clear(attr->shared_bitmap, first_bit, nbits); + ram_block_attribute_notify_to_private(attr, offset, size); + } + } else { + if (ram_block_attribute_is_range_shared(attr, offset, size)) { + /* Already shared */ + } else if (!ram_block_attribute_is_range_private(attr, offset, size)) { + /* Unexpected mixture: process individual blocks */ + unsigned long *modified_bitmap = bitmap_new(nbits); + + for (cur = offset; cur < end; cur += block_size) { + bit = cur / block_size; + if (test_bit(bit, attr->shared_bitmap)) { + continue; + } + set_bit(bit, attr->shared_bitmap); + ret = ram_block_attribute_notify_to_shared(attr, cur, block_size); + if (!ret) { + set_bit(bit - first_bit, modified_bitmap); + continue; + } + clear_bit(bit, attr->shared_bitmap); + break; + } + + if (ret) { + /* + * Very unexpected: something went wrong. Revert to the old + * state, marking only the blocks as private that we converted + * to shared.
+ */ + for (cur = offset; cur < end; cur += block_size) { + bit = cur / block_size; + if (!test_bit(bit - first_bit, modified_bitmap)) { + continue; + } + assert(test_bit(bit, attr->shared_bitmap)); + clear_bit(bit, attr->shared_bitmap); + ram_block_attribute_notify_to_private(attr, cur, block_size); + } + } + g_free(modified_bitmap); + } else { + /* Completely private */ + bitmap_set(attr->shared_bitmap, first_bit, nbits); + ret = ram_block_attribute_notify_to_shared(attr, offset, size); + if (ret) { + bitmap_clear(attr->shared_bitmap, first_bit, nbits); + } + } + } + + return ret; +} + int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr) { uint64_t shared_bitmap_size; @@ -272,6 +461,7 @@ static void ram_block_attribute_finalize(Object *obj) static void ram_block_attribute_class_init(ObjectClass *oc, void *data) { GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(oc); + PrivateSharedManagerClass *psmc = PRIVATE_SHARED_MANAGER_CLASS(oc); gsmc->get_min_granularity = ram_block_attribute_psm_get_min_granularity; gsmc->register_listener = ram_block_attribute_psm_register_listener; @@ -279,4 +469,5 @@ static void ram_block_attribute_class_init(ObjectClass *oc, void *data) gsmc->is_state_set = ram_block_attribute_psm_is_shared; gsmc->replay_on_state_set = ram_block_attribute_psm_replay_on_shared; gsmc->replay_on_state_clear = ram_block_attribute_psm_replay_on_private; + psmc->state_change = ram_block_attribute_psm_state_change; }
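[Editorial aside: the range checks in ram_block_attribute_is_range_shared()/_private() above rely on capping the bitmap search length at last_bit + 1, so a hit past last_bit can only mean the whole queried range is uniform. A standalone restatement of the idiom, using find_next_zero_bit() from QEMU's qemu/bitops.h; range_all_set() is illustrative, not from the patch:]

#include "qemu/bitops.h"

/* Returns true if bits [first, first + nbits) are all set in map. */
static bool range_all_set(const unsigned long *map,
                          unsigned long first, unsigned long nbits)
{
    unsigned long last = first + nbits - 1;

    /* "Fake a shorter bitmap": with the search length capped at last + 1,
     * a result greater than last means no zero bit inside the range. */
    return find_next_zero_bit(map, last + 1, first) > last;
}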
From patchwork Mon Apr 7 07:49:29 2025
X-Patchwork-Id: 14039900
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v4 09/13] memory: Attach RamBlockAttribute to guest_memfd-backed RAMBlocks
Date: Mon, 7 Apr 2025 15:49:29 +0800
Message-ID: <20250407074939.18657-10-chenyi.qiang@intel.com>
In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com>
References: <20250407074939.18657-1-chenyi.qiang@intel.com>

A new field, ram_block_attribute, is introduced in RAMBlock to link to a RamBlockAttribute object. This change centralizes all guest_memfd state information (such as fd and shared_bitmap) within a RAMBlock, simplifying management.

The realize()/unrealize() helpers are used to initialize/uninitialize the RamBlockAttribute object. The object is registered with (and unregistered from) the target RAMBlock's MemoryRegion when guest_memfd is created (and discarded). Additionally, use the private_shared_manager_state_change() helper to notify the registered PrivateSharedListeners of these changes.

Signed-off-by: Chenyi Qiang
---
Changes in v4:
- Remove the replay operations for attribute changes, which will be handled by a listener in the following patches.
- Add a comment in the error path of realize() to remind future development of the unified error path.

Changes in v3:
- Use ram_discard_manager_replay_populated/discarded() to set the memory attribute and add the undo support if state_change() failed.
- Didn't add Reviewed-by from Alexey due to the new changes in this commit.

Changes in v2:
- Introduce a new field memory_attribute_manager in RAMBlock.
- Move the state_change() handling during page conversion into this patch.
- Undo what we did if it fails to set.
- Change the order of close(guest_memfd) and memory_attribute_manager cleanup.
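[Editorial sketch, not part of the patch: the setup side of the lifecycle implied by the ram_block_add() hunk below, condensed into one hypothetical helper; guest_memfd_attr_setup() is illustrative only.]

static int guest_memfd_attr_setup(RAMBlock *rb, Error **errp)
{
    rb->ram_block_attribute =
        RAM_BLOCK_ATTRIBUTE(object_new(TYPE_RAM_BLOCK_ATTRIBUTE));
    /* Attaches the manager to rb->mr and allocates the shared bitmap. */
    if (ram_block_attribute_realize(rb->ram_block_attribute, rb->mr)) {
        error_setg(errp, "Failed to realize ram block attribute");
        object_unref(OBJECT(rb->ram_block_attribute));
        return -1;
    }
    return 0;
}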
--- accel/kvm/kvm-all.c | 9 +++++++++ include/exec/ramblock.h | 1 + system/physmem.c | 16 ++++++++++++++++ 3 files changed, 26 insertions(+) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index c1fea69d58..546b58b737 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -3088,6 +3088,15 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private) addr = memory_region_get_ram_ptr(mr) + section.offset_within_region; rb = qemu_ram_block_from_host(addr, false, &offset); + ret = private_shared_manager_state_change(PRIVATE_SHARED_MANAGER(mr->gsm), + offset, size, to_private); + if (ret) { + error_report("Failed to notify the listener of the state change of " + "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s", + start, size, to_private ? "private" : "shared"); + goto out_unref; + } + if (to_private) { if (rb->page_size != qemu_real_host_page_size()) { /* diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h index b8b5469db9..78eb031819 100644 --- a/include/exec/ramblock.h +++ b/include/exec/ramblock.h @@ -46,6 +46,7 @@ struct RAMBlock { int fd; uint64_t fd_offset; int guest_memfd; + RamBlockAttribute *ram_block_attribute; size_t page_size; /* dirty bitmap used during migration */ unsigned long *bmap; diff --git a/system/physmem.c b/system/physmem.c index c76503aea8..fb74321e10 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -1885,6 +1885,20 @@ static void ram_block_add(RAMBlock *new_block, Error **errp) qemu_mutex_unlock_ramlist(); goto out_free; } + + new_block->ram_block_attribute = RAM_BLOCK_ATTRIBUTE(object_new(TYPE_RAM_BLOCK_ATTRIBUTE)); + if (ram_block_attribute_realize(new_block->ram_block_attribute, new_block->mr)) { + error_setg(errp, "Failed to realize ram block attribute"); + /* + * The error path could be unified if the rest of ram_block_add() ever + * develops a need to check for errors.
+ */ + object_unref(OBJECT(new_block->ram_block_attribute)); + close(new_block->guest_memfd); + ram_block_discard_require(false); + qemu_mutex_unlock_ramlist(); + goto out_free; + } } ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS; @@ -2138,6 +2152,8 @@ static void reclaim_ramblock(RAMBlock *block) } if (block->guest_memfd >= 0) { + ram_block_attribute_unrealize(block->ram_block_attribute); + object_unref(OBJECT(block->ram_block_attribute)); close(block->guest_memfd); ram_block_discard_require(false); }
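[Editorial note: one ordering subtlety in the reclaim_ramblock() hunk above is that the attribute object must be unrealized while block->mr is still valid, before the fd is closed and the discard claim is dropped. A condensed sketch of the constraint; guest_memfd_teardown() is a hypothetical wrapper, not a function in the series:]

static void guest_memfd_teardown(RAMBlock *block)
{
    /* Detach the manager from block->mr and free the shared bitmap. */
    ram_block_attribute_unrealize(block->ram_block_attribute);
    object_unref(OBJECT(block->ram_block_attribute));
    /* Only then drop the backing fd and release the discard claim. */
    close(block->guest_memfd);
    ram_block_discard_require(false);
}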
From patchwork Mon Apr 7 07:49:30 2025
X-Patchwork-Id: 14039924
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v4 10/13] memory: Change NotifyStateClear() definition to return the result
Date: Mon, 7 Apr 2025 15:49:30 +0800
Message-ID: <20250407074939.18657-11-chenyi.qiang@intel.com>
In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com>
References: <20250407074939.18657-1-chenyi.qiang@intel.com>

Return an int from the NotifyStateClear() handler so that the caller can check the result when the operation fails.

Signed-off-by: Chenyi Qiang
---
Changes in v4:
- Newly added.
---
hw/vfio/common.c | 18 ++++++++++-------- include/exec/memory.h | 4 ++-- 2 files changed, 12 insertions(+), 10 deletions(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 48468a12c3..6e49ae597d 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -335,8 +335,8 @@ out: rcu_read_unlock(); } -static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer, - MemoryRegionSection *section) +static int vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontainer, + MemoryRegionSection *section) { const hwaddr size = int128_get64(section->size); const hwaddr iova = section->offset_within_address_space; @@ -348,24 +348,26 @@ static void vfio_state_change_notify_to_state_clear(VFIOContainerBase *bcontaine error_report("%s: vfio_container_dma_unmap() failed: %s", __func__, strerror(-ret)); } + + return ret; } -static void vfio_ram_discard_notify_discard(StateChangeListener *scl, - MemoryRegionSection *section) +static int vfio_ram_discard_notify_discard(StateChangeListener *scl, + MemoryRegionSection *section) { RamDiscardListener *rdl = container_of(scl, RamDiscardListener, scl); VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, listener); - vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section); + return vfio_state_change_notify_to_state_clear(vrdl->bcontainer, section); } -static void vfio_private_shared_notify_to_private(StateChangeListener *scl, - MemoryRegionSection *section) +static int vfio_private_shared_notify_to_private(StateChangeListener *scl, + MemoryRegionSection *section) { PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl); VFIOPrivateSharedListener *vpsl = container_of(psl, VFIOPrivateSharedListener, listener); - vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section); + return vfio_state_change_notify_to_state_clear(vpsl->bcontainer, section); } static int vfio_state_change_notify_to_state_set(VFIOContainerBase *bcontainer, diff --git a/include/exec/memory.h b/include/exec/memory.h index a61896251c..9472d9e9b4 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -523,8 +523,8 @@ typedef int (*ReplayStateChange)(MemoryRegionSection *section, void *opaque); typedef struct StateChangeListener StateChangeListener; typedef int (*NotifyStateSet)(StateChangeListener *scl, MemoryRegionSection *section); -typedef void (*NotifyStateClear)(StateChangeListener *scl, - MemoryRegionSection *section); +typedef int (*NotifyStateClear)(StateChangeListener *scl, + MemoryRegionSection *section); struct StateChangeListener { /*
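[Editorial sketch, not from the series: with this change, the clear-side callback can propagate errors just like the set side. A hypothetical listener under the new typedefs; my_map()/my_unmap() are placeholders for the listener's real work:]

static int my_map(MemoryRegionSection *section);
static int my_unmap(MemoryRegionSection *section);

static int my_notify_to_state_set(StateChangeListener *scl,
                                  MemoryRegionSection *section)
{
    return my_map(section);    /* a failure triggers the manager's rollback */
}

static int my_notify_to_state_clear(StateChangeListener *scl,
                                    MemoryRegionSection *section)
{
    return my_unmap(section);  /* the result is now visible to the caller */
}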
From patchwork Mon Apr 7 07:49:31 2025
X-Patchwork-Id: 14039925
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v4 11/13] KVM: Introduce CVMPrivateSharedListener for attribute changes during page conversions
Date: Mon, 7 Apr 2025 15:49:31 +0800
Message-ID: <20250407074939.18657-12-chenyi.qiang@intel.com>
In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com>
References: <20250407074939.18657-1-chenyi.qiang@intel.com>

With the introduction of the RamBlockAttribute object to manage RAMBlocks with guest_memfd and the implementation of the PrivateSharedManager interface to convey page conversion events, it is more elegant to move the attribute changes into a PrivateSharedListener.

The PrivateSharedListener is registered/unregistered for each memory region section during kvm_region_add/del(), and the listeners are stored in a CVMPrivateSharedListener list for easy management. The listener handler performs the attribute changes upon receiving notifications from private_shared_manager_state_change() calls. With this change, the attribute-change operations in kvm_convert_memory() can be removed.

Note that after moving the attribute changes into a listener, errors can be returned in ram_block_attribute_notify_to_private() if the attribute changes fail in corner cases (e.g. -ENOMEM). Since there is currently no rollback operation for the to_private case, an assert is used to prevent the guest from continuing with a partially changed attribute state.

Signed-off-by: Chenyi Qiang
---
Changes in v4:
- Newly added.
---
accel/kvm/kvm-all.c | 73 ++++++++++++++++++--- include/system/confidential-guest-support.h | 10 +++ system/ram-block-attribute.c | 17 ++++- target/i386/kvm/tdx.c | 1 + target/i386/sev.c | 1 + 5 files changed, 90 insertions(+), 12 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 546b58b737..aec64d559b 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -48,6 +48,7 @@ #include "kvm-cpus.h" #include "system/dirtylimit.h" #include "qemu/range.h" +#include "system/confidential-guest-support.h" #include "hw/boards.h" #include "system/stats.h" @@ -1691,28 +1692,91 @@ static int kvm_dirty_ring_init(KVMState *s) return 0; } +static int kvm_private_shared_notify(StateChangeListener *scl, + MemoryRegionSection *section, + bool to_private) +{ + hwaddr start = section->offset_within_address_space; + hwaddr size = int128_get64(section->size); + + if (to_private) { + return kvm_set_memory_attributes_private(start, size); + } else { + return kvm_set_memory_attributes_shared(start, size); + } +} + +static int kvm_private_shared_notify_to_shared(StateChangeListener *scl, + MemoryRegionSection *section) +{ + return kvm_private_shared_notify(scl, section, false); +} + +static int kvm_private_shared_notify_to_private(StateChangeListener *scl, + MemoryRegionSection *section) +{ + return kvm_private_shared_notify(scl, section, true); +} + static void kvm_region_add(MemoryListener *listener, MemoryRegionSection *section) { KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener); + ConfidentialGuestSupport *cgs = MACHINE(qdev_get_machine())->cgs; + GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr); KVMMemoryUpdate *update; + CVMPrivateSharedListener *cpsl; + PrivateSharedListener *psl; + update = g_new0(KVMMemoryUpdate, 1); update->section = *section; QSIMPLEQ_INSERT_TAIL(&kml->transaction_add, update, next); + + if (!memory_region_has_guest_memfd(section->mr) || !gsm) { + return; + } + + cpsl =
g_new0(CVMPrivateSharedListener, 1); + cpsl->mr = section->mr; + cpsl->offset_within_address_space = section->offset_within_address_space; + cpsl->granularity = generic_state_manager_get_min_granularity(gsm, section->mr); + psl = &cpsl->listener; + QLIST_INSERT_HEAD(&cgs->cvm_private_shared_list, cpsl, next); + private_shared_listener_init(psl, kvm_private_shared_notify_to_shared, + kvm_private_shared_notify_to_private); + generic_state_manager_register_listener(gsm, &psl->scl, section); } static void kvm_region_del(MemoryListener *listener, MemoryRegionSection *section) { KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener); + ConfidentialGuestSupport *cgs = MACHINE(qdev_get_machine())->cgs; + GenericStateManager *gsm = memory_region_get_generic_state_manager(section->mr); KVMMemoryUpdate *update; + CVMPrivateSharedListener *cpsl; + PrivateSharedListener *psl; update = g_new0(KVMMemoryUpdate, 1); update->section = *section; QSIMPLEQ_INSERT_TAIL(&kml->transaction_del, update, next); + if (!memory_region_has_guest_memfd(section->mr) || !gsm) { + return; + } + + QLIST_FOREACH(cpsl, &cgs->cvm_private_shared_list, next) { + if (cpsl->mr == section->mr && + cpsl->offset_within_address_space == section->offset_within_address_space) { + psl = &cpsl->listener; + generic_state_manager_unregister_listener(gsm, &psl->scl); + QLIST_REMOVE(cpsl, next); + g_free(cpsl); + break; + } + } } static void kvm_region_commit(MemoryListener *listener) @@ -3076,15 +3140,6 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private) goto out_unref; } - if (to_private) { - ret = kvm_set_memory_attributes_private(start, size); - } else { - ret = kvm_set_memory_attributes_shared(start, size); - } - if (ret) { - goto out_unref; - } - addr = memory_region_get_ram_ptr(mr) + section.offset_within_region; rb = qemu_ram_block_from_host(addr, false, &offset); diff --git a/include/system/confidential-guest-support.h b/include/system/confidential-guest-support.h index b68c4bebbc..64f67db19a 100644 --- a/include/system/confidential-guest-support.h +++ b/include/system/confidential-guest-support.h @@ -23,12 +23,20 @@ #endif #include "qom/object.h" +#include "exec/memory.h" #define TYPE_CONFIDENTIAL_GUEST_SUPPORT "confidential-guest-support" OBJECT_DECLARE_TYPE(ConfidentialGuestSupport, ConfidentialGuestSupportClass, CONFIDENTIAL_GUEST_SUPPORT) +typedef struct CVMPrivateSharedListener { + MemoryRegion *mr; + hwaddr offset_within_address_space; + uint64_t granularity; + PrivateSharedListener listener; + QLIST_ENTRY(CVMPrivateSharedListener) next; +} CVMPrivateSharedListener; struct ConfidentialGuestSupport { Object parent; @@ -38,6 +46,8 @@ struct ConfidentialGuestSupport { */ bool require_guest_memfd; + QLIST_HEAD(, CVMPrivateSharedListener) cvm_private_shared_list; + /* * ready: flag set by CGS initialization code once it's ready to * start executing instructions in a potentially-secure diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c index 06ed134cda..15c9aebd09 100644 --- a/system/ram-block-attribute.c +++ b/system/ram-block-attribute.c @@ -259,6 +259,7 @@ static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr, uint64_t offset, uint64_t size) { PrivateSharedListener *psl; + int ret; QLIST_FOREACH(psl, &attr->psl_list, next) { StateChangeListener *scl = &psl->scl; @@ -267,7 +268,12 @@ static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr, if (!memory_region_section_intersect_range(&tmp, offset, size)) { continue; } - 
scl->notify_to_state_clear(scl, &tmp); + /* + * No undo operation for a state_clear() callback failure at present. + * The state_clear() callback is expected to always succeed. + */ + ret = scl->notify_to_state_clear(scl, &tmp); + g_assert(!ret); } } @@ -275,7 +281,7 @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr, uint64_t offset, uint64_t size) { PrivateSharedListener *psl, *psl2; - int ret = 0; + int ret = 0, ret2 = 0; @@ -302,7 +308,12 @@ ram_block_attribute_notify_to_shared(RamBlockAttribute *attr, if (!memory_region_section_intersect_range(&tmp, offset, size)) { continue; } - scl2->notify_to_state_clear(scl2, &tmp); + /* + * No undo operation for a state_clear() callback failure at present. + * The state_clear() callback is expected to always succeed. + */ + ret2 = scl2->notify_to_state_clear(scl2, &tmp); + g_assert(!ret2); } } return ret; diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c index c906a76c4c..718385c8de 100644 --- a/target/i386/kvm/tdx.c +++ b/target/i386/kvm/tdx.c @@ -1179,6 +1179,7 @@ static void tdx_guest_init(Object *obj) qemu_mutex_init(&tdx->lock); cgs->require_guest_memfd = true; + QLIST_INIT(&cgs->cvm_private_shared_list); tdx->attributes = TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE; object_property_add_uint64_ptr(obj, "attributes", &tdx->attributes, diff --git a/target/i386/sev.c b/target/i386/sev.c index 217b19ad7b..6647727a44 100644 --- a/target/i386/sev.c +++ b/target/i386/sev.c @@ -2432,6 +2432,7 @@ sev_snp_guest_instance_init(Object *obj) SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(obj); cgs->require_guest_memfd = true; + QLIST_INIT(&cgs->cvm_private_shared_list); /* default init/start/finish params for kvm */ sev_snp_guest->kvm_start_conf.policy = DEFAULT_SEV_SNP_POLICY;
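[Editorial note: as the tdx/sev hunks above show, every ConfidentialGuestSupport backend that wants these listeners must initialize the list before kvm_region_add() runs. A hypothetical backend (illustrative only) would follow the same pattern:]

static void my_cgs_instance_init(Object *obj)
{
    ConfidentialGuestSupport *cgs = CONFIDENTIAL_GUEST_SUPPORT(obj);

    cgs->require_guest_memfd = true;
    /* kvm_region_add() inserts one CVMPrivateSharedListener per section. */
    QLIST_INIT(&cgs->cvm_private_shared_list);
}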
From patchwork Mon Apr 7 07:49:32 2025
X-Patchwork-Id: 14039926
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v4 12/13] ram-block-attribute: Add priority listener support for PrivateSharedListener
Date: Mon, 7 Apr 2025 15:49:32 +0800
Message-ID: <20250407074939.18657-13-chenyi.qiang@intel.com>
In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com>
References: <20250407074939.18657-1-chenyi.qiang@intel.com>

In-place page conversion requires operations to follow a specific sequence: unmap-before-conversion-to-private and map-after-conversion-to-shared. Since both the attribute changes and the VFIO DMA map/unmap operations are now handled by PrivateSharedListeners, the listeners need to be invoked in a specific order.

For private-to-shared conversion:
- Change the attribute to shared.
- VFIO populates the shared mappings into the IOMMU.
- Restore the attribute if the operation fails.

For shared-to-private conversion:
- VFIO discards the shared mapping from the IOMMU.
- Change the attribute to private.

To facilitate this sequence, priority support is added to PrivateSharedListener so that listeners are stored in a deterministic order based on priority. A tail queue is used to store the listeners, allowing traversal in either direction.

Signed-off-by: Chenyi Qiang
---
Changes in v4:
- Newly added.
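[Editorial sketch: the ordered-insert policy used in the register_listener() hunk below, reduced to its core. Insertion is stable for equal priorities because a new element goes after existing ones of the same priority; the PslList alias and psl_list_insert() name are illustrative, not from the patch:]

typedef QTAILQ_HEAD(, PrivateSharedListener) PslList;

/* Insert psl keeping the list sorted by ascending priority. */
static void psl_list_insert(PslList *list, PrivateSharedListener *psl)
{
    PrivateSharedListener *other;

    if (QTAILQ_EMPTY(list) || psl->priority >= QTAILQ_LAST(list)->priority) {
        QTAILQ_INSERT_TAIL(list, psl, next);
        return;
    }
    QTAILQ_FOREACH(other, list, next) {
        if (psl->priority < other->priority) {
            QTAILQ_INSERT_BEFORE(other, psl, next);
            return;
        }
    }
}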
--- accel/kvm/kvm-all.c | 3 ++- hw/vfio/common.c | 3 ++- include/exec/memory.h | 19 +++++++++++++++++-- include/exec/ramblock.h | 2 +- system/ram-block-attribute.c | 23 +++++++++++++++++------ 5 files changed, 39 insertions(+), 11 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index aec64d559b..879c61b391 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1745,7 +1745,8 @@ static void kvm_region_add(MemoryListener *listener, psl = &cpsl->listener; QLIST_INSERT_HEAD(&cgs->cvm_private_shared_list, cpsl, next); private_shared_listener_init(psl, kvm_private_shared_notify_to_shared, - kvm_private_shared_notify_to_private); + kvm_private_shared_notify_to_private, + PRIVATE_SHARED_LISTENER_PRIORITY_MIN); generic_state_manager_register_listener(gsm, &psl->scl, section); } diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 6e49ae597d..a8aacae26c 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -515,7 +515,8 @@ static void vfio_register_private_shared_listener(VFIOContainerBase *bcontainer, psl = &vpsl->listener; private_shared_listener_init(psl, vfio_private_shared_notify_to_shared, - vfio_private_shared_notify_to_private); + vfio_private_shared_notify_to_private, + PRIVATE_SHARED_LISTENER_PRIORITY_COMMON); generic_state_manager_register_listener(gsm, &psl->scl, section); QLIST_INSERT_HEAD(&bcontainer->vpsl_list, vpsl, next); } diff --git a/include/exec/memory.h b/include/exec/memory.h index 9472d9e9b4..3d06cc04a0 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -770,11 +770,24 @@ struct RamDiscardManagerClass { GenericStateManagerClass parent_class; }; +#define PRIVATE_SHARED_LISTENER_PRIORITY_MIN 0 +#define PRIVATE_SHARED_LISTENER_PRIORITY_COMMON 10 + typedef struct PrivateSharedListener PrivateSharedListener; struct PrivateSharedListener { struct StateChangeListener scl; - QLIST_ENTRY(PrivateSharedListener) next; + /* + * @priority: + * + * Governs the order in which private/shared listeners are invoked. Lower + * priorities are invoked earlier. + * Traversing the list in reverse priority order also allows undoing the + * effects of already-invoked listeners when a callback fails.
+ */ + int priority; + + QTAILQ_ENTRY(PrivateSharedListener) next; }; struct PrivateSharedManagerClass { @@ -787,9 +800,11 @@ struct PrivateSharedManagerClass { static inline void private_shared_listener_init(PrivateSharedListener *psl, NotifyStateSet populate_fn, - NotifyStateClear discard_fn) + NotifyStateClear discard_fn, + int priority) { state_change_listener_init(&psl->scl, populate_fn, discard_fn); + psl->priority = priority; } int private_shared_manager_state_change(PrivateSharedManager *mgr, diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h index 78eb031819..7a3dd709bb 100644 --- a/include/exec/ramblock.h +++ b/include/exec/ramblock.h @@ -105,7 +105,7 @@ struct RamBlockAttribute { unsigned shared_bitmap_size; unsigned long *shared_bitmap; - QLIST_HEAD(, PrivateSharedListener) psl_list; + QTAILQ_HEAD(, PrivateSharedListener) psl_list; }; struct RamBlockAttributeClass { diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c index 15c9aebd09..fd041148c7 100644 --- a/system/ram-block-attribute.c +++ b/system/ram-block-attribute.c @@ -158,12 +158,23 @@ static void ram_block_attribute_psm_register_listener(GenericStateManager *gsm, { RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm); PrivateSharedListener *psl = container_of(scl, PrivateSharedListener, scl); + PrivateSharedListener *other = NULL; int ret; g_assert(section->mr == attr->mr); scl->section = memory_region_section_new_copy(section); - QLIST_INSERT_HEAD(&attr->psl_list, psl, next); + if (QTAILQ_EMPTY(&attr->psl_list) || + psl->priority >= QTAILQ_LAST(&attr->psl_list)->priority) { + QTAILQ_INSERT_TAIL(&attr->psl_list, psl, next); + } else { + QTAILQ_FOREACH(other, &attr->psl_list, next) { + if (psl->priority < other->priority) { + break; + } + } + QTAILQ_INSERT_BEFORE(other, psl, next); + } ret = ram_block_attribute_for_each_shared_section(attr, section, scl, ram_block_attribute_notify_shared_cb); @@ -192,7 +203,7 @@ static void ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm memory_region_section_free_copy(scl->section); scl->section = NULL; - QLIST_REMOVE(psl, next); + QTAILQ_REMOVE(&attr->psl_list, psl, next); } typedef struct RamBlockAttributeReplayData { @@ -261,7 +272,7 @@ static void ram_block_attribute_notify_to_private(RamBlockAttribute *attr, PrivateSharedListener *psl; int ret; - QLIST_FOREACH(psl, &attr->psl_list, next) { + QTAILQ_FOREACH_REVERSE(psl, &attr->psl_list, next) { StateChangeListener *scl = &psl->scl; MemoryRegionSection tmp = *scl->section; @@ -283,7 +294,7 @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr, PrivateSharedListener *psl, *psl2; int ret = 0, ret2 = 0; - QLIST_FOREACH(psl, &attr->psl_list, next) { + QTAILQ_FOREACH(psl, &attr->psl_list, next) { StateChangeListener *scl = &psl->scl; MemoryRegionSection tmp = *scl->section; @@ -298,7 +309,7 @@ static int ram_block_attribute_notify_to_shared(RamBlockAttribute *attr, if (ret) { /* Notify all already-notified listeners. 
*/ - QLIST_FOREACH(psl2, &attr->psl_list, next) { + QTAILQ_FOREACH(psl2, &attr->psl_list, next) { StateChangeListener *scl2 = &psl2->scl; MemoryRegionSection tmp = *scl2->section; @@ -462,7 +473,7 @@ static void ram_block_attribute_init(Object *obj) { RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(obj); - QLIST_INIT(&attr->psl_list); + QTAILQ_INIT(&attr->psl_list); } static void ram_block_attribute_finalize(Object *obj)
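[Editorial sketch: why ascending-priority storage plus reverse traversal yields unmap-before-private and map-after-shared. Per the hunks above, the KVM attribute listener registers at PRIVATE_SHARED_LISTENER_PRIORITY_MIN (0) and the VFIO listener at PRIVATE_SHARED_LISTENER_PRIORITY_COMMON (10); the demo function below is illustrative only, with the actual notification calls elided to comments:]

static void notify_order_demo(RamBlockAttribute *attr, bool to_private)
{
    PrivateSharedListener *psl;

    if (to_private) {
        /* Reverse order: VFIO unmaps from the IOMMU before KVM flips
         * the attribute to private. */
        QTAILQ_FOREACH_REVERSE(psl, &attr->psl_list, next) {
            /* psl->scl.notify_to_state_clear(&psl->scl, section); */
        }
    } else {
        /* Forward order: KVM flips the attribute to shared before VFIO
         * maps the now-shared pages into the IOMMU. */
        QTAILQ_FOREACH(psl, &attr->psl_list, next) {
            /* psl->scl.notify_to_state_set(&psl->scl, section); */
        }
    }
}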
From patchwork Mon Apr 7 07:49:33 2025
X-Patchwork-Id: 14039927
From: Chenyi Qiang
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj, Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Peng Chao P, Gao Chao, Xu Yilun, Li Xiaoyao
Subject: [PATCH v4 13/13] RAMBlock: Make guest_memfd require coordinated discard
Date: Mon, 7 Apr 2025 15:49:33 +0800
Message-ID: <20250407074939.18657-14-chenyi.qiang@intel.com>
In-Reply-To: <20250407074939.18657-1-chenyi.qiang@intel.com>
References: <20250407074939.18657-1-chenyi.qiang@intel.com>

Now that guest_memfd is managed by ram_block_attribute with a PrivateSharedManager, its discards are coordinated, so only block uncoordinated discard.

Signed-off-by: Chenyi Qiang
---
Changes in v4:
- Modify the commit message (RamDiscardManager -> PrivateSharedManager).

Changes in v3:
- No change.

Changes in v2:
- Change the ram_block_discard_require(false) to ram_block_coordinated_discard_require(false).
---
system/physmem.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/system/physmem.c b/system/physmem.c index fb74321e10..5e72d2a544 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -1871,7 +1871,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp) assert(kvm_enabled()); assert(new_block->guest_memfd < 0); - ret = ram_block_discard_require(true); + ret = ram_block_coordinated_discard_require(true); if (ret < 0) { error_setg_errno(errp, -ret, "cannot set up private guest memory: discard currently blocked"); @@ -1895,7 +1895,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp) */ object_unref(OBJECT(new_block->ram_block_attribute)); close(new_block->guest_memfd); - ram_block_discard_require(false); + ram_block_coordinated_discard_require(false); qemu_mutex_unlock_ramlist(); goto out_free; } @@ -2155,7 +2155,7 @@ static void reclaim_ramblock(RAMBlock *block) ram_block_attribute_unrealize(block->ram_block_attribute); object_unref(OBJECT(block->ram_block_attribute)); close(block->guest_memfd); - ram_block_discard_require(false); + ram_block_coordinated_discard_require(false); } g_free(block);
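[Editorial background, not from the patch: QEMU distinguishes uncoordinated discards (e.g. virtio-balloon) from coordinated ones driven through a discard/state manager. After this change, a device that is incompatible only with uncoordinated discards can still coexist with guest_memfd; a sketch assuming the existing ram_block_uncoordinated_discard_disable() API, with my_device_setup() as a hypothetical caller:]

static int my_device_setup(Error **errp)
{
    /*
     * Block only uncoordinated discards. This still lets guest_memfd's
     * ram_block_coordinated_discard_require(true) in ram_block_add()
     * succeed, because those discards go through the manager.
     */
    if (ram_block_uncoordinated_discard_disable(true)) {
        error_setg(errp, "uncoordinated discard currently required");
        return -EBUSY;
    }
    return 0;
}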