From patchwork Thu Jul 25 07:21:10 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13741589
From: Chenyi Qiang
To: Paolo Bonzini, David Hildenbrand, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Edgecombe Rick P, Wang Wei W, Peng Chao P, Gao Chao, Wu Hao, Xu Yilun
Subject: [RFC PATCH 1/6] guest_memfd: Introduce an object to manage the guest-memfd with RamDiscardManager
Date: Thu, 25 Jul 2024 03:21:10 -0400
Message-ID: <20240725072118.358923-2-chenyi.qiang@intel.com>
In-Reply-To: <20240725072118.358923-1-chenyi.qiang@intel.com>
References: <20240725072118.358923-1-chenyi.qiang@intel.com>

As commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated discard") highlighted, some subsystems like VFIO may disable RAM block discard. However, guest_memfd relies on the discard operation to perform page conversion between private and shared memory. This can lead to a stale IOMMU mapping issue when assigning a hardware device to a confidential guest via shared memory (unprotected memory pages). Blocking shared page discard can solve this problem, but it could cause guests to consume twice the memory with VFIO, which is not acceptable in some cases. An alternative solution is to notify other systems like VFIO so that they refresh their outdated IOMMU mappings.

RamDiscardManager is an existing concept (used by virtio-mem) to adjust VFIO mappings in relation to VM page assignment. Effectively, a page conversion is similar to hot-removing a page in one mode and adding it back in the other, so the same work that happens in response to virtio-mem changes needs to happen for page conversion events. Introduce the RamDiscardManager to guest_memfd to achieve this.

However, implementing the RamDiscardManager interface poses a challenge, as guest_memfd is not an object; instead, it is contained within a RAMBlock and is indicated by a RAM_GUEST_MEMFD flag upon creation.

One option is to implement the interface in HostMemoryBackend: any guest_memfd-backed host memory backend could register itself in the target MemoryRegion. However, this solution doesn't cover the scenario where a guest_memfd MemoryRegion doesn't belong to a HostMemoryBackend, e.g. the virtual BIOS MemoryRegion. Thus, implement the second option, which defines an object type named guest_memfd_manager that implements the RamDiscardManager interface. Upon creation of a guest_memfd, a new guest_memfd_manager object is instantiated and registered with the managed guest_memfd MemoryRegion to handle the page conversion events.

In the context of guest_memfd, the discarded state signifies that the page is private, while the populated state indicates that the page is shared. The state of the memory is tracked at the granularity of the host page size (i.e. block_size), as the minimum conversion size can be one page per request. In addition, VFIO expects the DMA mapping for a specific iova to be mapped and unmapped with the same granularity. However, there is no guarantee that the confidential guest won't partially convert pages. For instance, the confidential guest may flip a 2M page from private to shared and later flip the first 4K sub-range from shared to private. To prevent such invalid cases, all operations are performed with 4K granularity.
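For illustration, the populated/discarded bookkeeping described above reduces to a bitmap query of the following shape (a minimal sketch of the idea, not code from this series; find_next_bit() is the existing QEMU/Linux bitmap helper):

/*
 * One bit per block_size bytes; bit set = discarded (private).
 * A range is "populated" (shared) iff no discard bit is set in it.
 * Passing last_bit + 1 as the bitmap size fakes a shorter bitmap,
 * so find_next_bit() returns last_bit + 1 when no bit is set.
 */
static bool range_is_populated(const unsigned long *discard_bitmap,
                               uint64_t block_size,
                               uint64_t offset, uint64_t size)
{
    uint64_t first_bit = offset / block_size;
    uint64_t last_bit = first_bit + size / block_size - 1;

    return find_next_bit(discard_bitmap, last_bit + 1, first_bit) > last_bit;
}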
Signed-off-by: Chenyi Qiang
---
 include/sysemu/guest-memfd-manager.h |  46 +++++
 system/guest-memfd-manager.c         | 283 +++++++++++++++++++++++++++
 system/meson.build                   |   1 +
 3 files changed, 330 insertions(+)
 create mode 100644 include/sysemu/guest-memfd-manager.h
 create mode 100644 system/guest-memfd-manager.c

diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h
new file mode 100644
index 0000000000..ab8c2ba362
--- /dev/null
+++ b/include/sysemu/guest-memfd-manager.h
@@ -0,0 +1,46 @@
+/*
+ * QEMU guest memfd manager
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Chenyi Qiang
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#ifndef SYSEMU_GUEST_MEMFD_MANAGER_H
+#define SYSEMU_GUEST_MEMFD_MANAGER_H
+
+#include "sysemu/hostmem.h"
+
+#define TYPE_GUEST_MEMFD_MANAGER "guest-memfd-manager"
+
+OBJECT_DECLARE_TYPE(GuestMemfdManager, GuestMemfdManagerClass, GUEST_MEMFD_MANAGER)
+
+struct GuestMemfdManager {
+    Object parent;
+
+    /* Managed memory region. */
+    MemoryRegion *mr;
+
+    /* bitmap used to track discard (private) memory */
+    int32_t discard_bitmap_size;
+    unsigned long *discard_bitmap;
+
+    /* block size and alignment */
+    uint64_t block_size;
+
+    /* listeners to notify on populate/discard activity. */
+    QLIST_HEAD(, RamDiscardListener) rdl_list;
+};
+
+struct GuestMemfdManagerClass {
+    ObjectClass parent_class;
+
+    void (*realize)(Object *gmm, MemoryRegion *mr, uint64_t region_size);
+};
+
+#endif
diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
new file mode 100644
index 0000000000..7b90f26859
--- /dev/null
+++ b/system/guest-memfd-manager.c
@@ -0,0 +1,283 @@
+/*
+ * QEMU guest memfd manager
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Chenyi Qiang
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "sysemu/guest-memfd-manager.h"
+
+OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(GuestMemfdManager,
+                                          guest_memfd_manager,
+                                          GUEST_MEMFD_MANAGER,
+                                          OBJECT,
+                                          { TYPE_RAM_DISCARD_MANAGER },
+                                          { })
+
+static bool guest_memfd_rdm_is_populated(const RamDiscardManager *rdm,
+                                         const MemoryRegionSection *section)
+{
+    const GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
+    uint64_t first_bit = section->offset_within_region / gmm->block_size;
+    uint64_t last_bit = first_bit + int128_get64(section->size) / gmm->block_size - 1;
+    unsigned long first_discard_bit;
+
+    first_discard_bit = find_next_bit(gmm->discard_bitmap, last_bit + 1, first_bit);
+    return first_discard_bit > last_bit;
+}
+
+static bool guest_memfd_rdm_intersect_memory_section(MemoryRegionSection *section,
+                                                     uint64_t offset, uint64_t size)
+{
+    uint64_t start = MAX(section->offset_within_region, offset);
+    uint64_t end = MIN(section->offset_within_region + int128_get64(section->size),
+                       offset + size);
+
+    if (end <= start) {
+        return false;
+    }
+
+    section->offset_within_address_space += start - section->offset_within_region;
+    section->offset_within_region = start;
+    section->size = int128_make64(end - start);
+
+    return true;
+}
+
+typedef int (*guest_memfd_section_cb)(MemoryRegionSection *s, void *arg);
+
+static int guest_memfd_notify_populate_cb(MemoryRegionSection *section, void *arg)
+{
+    RamDiscardListener *rdl = arg;
+
+    return rdl->notify_populate(rdl, section);
+}
+
+static int guest_memfd_notify_discard_cb(MemoryRegionSection *section, void *arg)
+{
+    RamDiscardListener *rdl = arg;
+
+    rdl->notify_discard(rdl, section);
+
+    return 0;
+}
+
+static int guest_memfd_for_each_populated_range(const GuestMemfdManager *gmm,
+                                                MemoryRegionSection *section,
+                                                void *arg,
+                                                guest_memfd_section_cb cb)
+{
+    unsigned long first_zero_bit, last_zero_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_zero_bit = section->offset_within_region / gmm->block_size;
+    first_zero_bit = find_next_zero_bit(gmm->discard_bitmap, gmm->discard_bitmap_size,
+                                        first_zero_bit);
+
+    while (first_zero_bit < gmm->discard_bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_zero_bit * gmm->block_size;
+        last_zero_bit = find_next_bit(gmm->discard_bitmap, gmm->discard_bitmap_size,
+                                      first_zero_bit + 1) - 1;
+        size = (last_zero_bit - first_zero_bit + 1) * gmm->block_size;
+
+        if (!guest_memfd_rdm_intersect_memory_section(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            break;
+        }
+
+        first_zero_bit = find_next_zero_bit(gmm->discard_bitmap, gmm->discard_bitmap_size,
+                                            last_zero_bit + 2);
+    }
+
+    return ret;
+}
+
+static int guest_memfd_for_each_discarded_range(const GuestMemfdManager *gmm,
+                                                MemoryRegionSection *section,
+                                                void *arg,
+                                                guest_memfd_section_cb cb)
+{
+    unsigned long first_one_bit, last_one_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_one_bit = section->offset_within_region / gmm->block_size;
+    first_one_bit = find_next_bit(gmm->discard_bitmap, gmm->discard_bitmap_size,
+                                  first_one_bit);
+
+    while (first_one_bit < gmm->discard_bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_one_bit * gmm->block_size;
+        last_one_bit = find_next_zero_bit(gmm->discard_bitmap, gmm->discard_bitmap_size,
+                                          first_one_bit + 1) - 1;
+        size = (last_one_bit - first_one_bit + 1) * gmm->block_size;
+
+        if (!guest_memfd_rdm_intersect_memory_section(&tmp, offset, size)) {
break; + } + + ret = cb(&tmp, arg); + if (ret) { + break; + } + + first_one_bit = find_next_bit(gmm->discard_bitmap, gmm->discard_bitmap_size, + last_one_bit + 2); + } + + return ret; +} + +static uint64_t guest_memfd_rdm_get_min_granularity(const RamDiscardManager *rdm, + const MemoryRegion *mr) +{ + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm); + + g_assert(mr == gmm->mr); + return gmm->block_size; +} + +static void guest_memfd_rdm_register_listener(RamDiscardManager *rdm, + RamDiscardListener *rdl, + MemoryRegionSection *section) +{ + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm); + int ret; + + g_assert(section->mr == gmm->mr); + rdl->section = memory_region_section_new_copy(section); + + QLIST_INSERT_HEAD(&gmm->rdl_list, rdl, next); + + ret = guest_memfd_for_each_populated_range(gmm, section, rdl, + guest_memfd_notify_populate_cb); + if (ret) { + error_report("%s: Failed to register RAM discard listener: %s", __func__, + strerror(-ret)); + } +} + +static void guest_memfd_rdm_unregister_listener(RamDiscardManager *rdm, + RamDiscardListener *rdl) +{ + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm); + int ret; + + g_assert(rdl->section); + g_assert(rdl->section->mr == gmm->mr); + + ret = guest_memfd_for_each_populated_range(gmm, rdl->section, rdl, + guest_memfd_notify_discard_cb); + if (ret) { + error_report("%s: Failed to unregister RAM discard listener: %s", __func__, + strerror(-ret)); + } + + memory_region_section_free_copy(rdl->section); + rdl->section = NULL; + QLIST_REMOVE(rdl, next); + +} + +typedef struct GuestMemfdReplayData { + void *fn; + void *opaque; +} GuestMemfdReplayData; + +static int guest_memfd_rdm_replay_populated_cb(MemoryRegionSection *section, void *arg) +{ + struct GuestMemfdReplayData *data = arg; + ReplayRamPopulate replay_fn = data->fn; + + return replay_fn(section, data->opaque); +} + +static int guest_memfd_rdm_replay_populated(const RamDiscardManager *rdm, + MemoryRegionSection *section, + ReplayRamPopulate replay_fn, + void *opaque) +{ + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm); + struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque }; + + g_assert(section->mr == gmm->mr); + return guest_memfd_for_each_populated_range(gmm, section, &data, + guest_memfd_rdm_replay_populated_cb); +} + +static int guest_memfd_rdm_replay_discarded_cb(MemoryRegionSection *section, void *arg) +{ + struct GuestMemfdReplayData *data = arg; + ReplayRamDiscard replay_fn = data->fn; + + replay_fn(section, data->opaque); + + return 0; +} + +static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm, + MemoryRegionSection *section, + ReplayRamDiscard replay_fn, + void *opaque) +{ + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm); + struct GuestMemfdReplayData data = { .fn = replay_fn, .opaque = opaque }; + + g_assert(section->mr == gmm->mr); + guest_memfd_for_each_discarded_range(gmm, section, &data, + guest_memfd_rdm_replay_discarded_cb); +} + +static void guest_memfd_manager_realize(Object *obj, MemoryRegion *mr, + uint64_t region_size) +{ + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(obj); + uint64_t bitmap_size = ROUND_UP(region_size, gmm->block_size) / gmm->block_size; + + gmm->mr = mr; + gmm->discard_bitmap_size = bitmap_size; + gmm->discard_bitmap = bitmap_new(bitmap_size); +} + +static void guest_memfd_manager_init(Object *obj) +{ + GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(obj); + + gmm->block_size = qemu_real_host_page_size(); + QLIST_INIT(&gmm->rdl_list); +} + +static void guest_memfd_manager_finalize(Object *obj) 
+{
+    g_free(GUEST_MEMFD_MANAGER(obj)->discard_bitmap);
+}
+
+static void guest_memfd_manager_class_init(ObjectClass *oc, void *data)
+{
+    GuestMemfdManagerClass *gmmc = GUEST_MEMFD_MANAGER_CLASS(oc);
+    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
+
+    gmmc->realize = guest_memfd_manager_realize;
+
+    rdmc->get_min_granularity = guest_memfd_rdm_get_min_granularity;
+    rdmc->register_listener = guest_memfd_rdm_register_listener;
+    rdmc->unregister_listener = guest_memfd_rdm_unregister_listener;
+    rdmc->is_populated = guest_memfd_rdm_is_populated;
+    rdmc->replay_populated = guest_memfd_rdm_replay_populated;
+    rdmc->replay_discarded = guest_memfd_rdm_replay_discarded;
+}
diff --git a/system/meson.build b/system/meson.build
index a296270cb0..9b96d645ab 100644
--- a/system/meson.build
+++ b/system/meson.build
@@ -16,6 +16,7 @@ system_ss.add(files(
   'dirtylimit.c',
   'dma-helpers.c',
   'globals.c',
+  'guest-memfd-manager.c',
   'memory_mapping.c',
   'qdev-monitor.c',
   'qtest.c',
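With the RamDiscardManager callbacks wired up above, a consumer such as VFIO attaches through QEMU's generic listener API. Below is a hedged sketch of that wiring (the my_* callbacks and attach_listener() are placeholders, not functions from this series):

static int my_notify_populate(RamDiscardListener *rdl,
                              MemoryRegionSection *section)
{
    /* e.g. DMA-map the range that just became shared */
    return 0;
}

static void my_notify_discard(RamDiscardListener *rdl,
                              MemoryRegionSection *section)
{
    /* e.g. DMA-unmap the range that just became private */
}

static void attach_listener(MemoryRegionSection *section)
{
    RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
    /* The listener must outlive the registration; a real consumer embeds
     * it in its own state (as VFIO does). */
    static RamDiscardListener rdl;

    /* true: discarding an already-discarded range is harmless here */
    ram_discard_listener_init(&rdl, my_notify_populate, my_notify_discard, true);
    ram_discard_manager_register_listener(rdm, &rdl, section);
}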
From patchwork Thu Jul 25 07:21:11 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13741590
From: Chenyi Qiang
To: Paolo Bonzini, David Hildenbrand, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Edgecombe Rick P, Wang Wei W, Peng Chao P, Gao Chao, Wu Hao, Xu Yilun
Subject: [RFC PATCH 2/6] guest_memfd: Introduce a helper to notify the shared/private state change
Date: Thu, 25 Jul 2024 03:21:11 -0400
Message-ID: <20240725072118.358923-3-chenyi.qiang@intel.com>
In-Reply-To: <20240725072118.358923-1-chenyi.qiang@intel.com>
References: <20240725072118.358923-1-chenyi.qiang@intel.com>

Introduce a helper function within RamDiscardManager to efficiently notify all registered RamDiscardListeners, including VFIO listeners, about the memory conversion events between shared and private in guest_memfd. The existing VFIO listener can dynamically DMA map/unmap the shared pages based on the conversion type:
- For conversions from shared to private, the VFIO system removes the shared mapping from the IOMMU.
- For conversions from private to shared, it triggers the population of the shared mapping into the IOMMU.

Additionally, there can be some special conversion requests (see the sketch after this list):
- When a conversion request is made for a page already in the desired state (either private or shared), the helper simply returns success.
- For requests involving a range only partially in the desired state, only the necessary segments are converted, ensuring the entire range complies with the request efficiently.
- In scenarios where a conversion request is declined by other systems, such as a failure from VFIO during notify_populate(), the helper rolls back the request, maintaining consistency.
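As a usage sketch of those semantics (a hypothetical caller, not code from this series; the error handling is illustrative), the helper behaves as idempotent and all-or-nothing:

/*
 * Convert [offset, offset + size) to shared. Returns 0 if the range
 * is already fully shared; on a listener failure, every listener that
 * was already notified has been rolled back to the discarded state.
 */
static int convert_to_shared(GuestMemfdManager *gmm,
                             uint64_t offset, uint64_t size)
{
    int ret = guest_memfd_state_change(gmm, offset, size,
                                       false /* !shared_to_private */);
    if (ret) {
        error_report("private-to-shared conversion failed: %s",
                     strerror(-ret));
    }
    return ret;
}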
Signed-off-by: Chenyi Qiang
---
 include/sysemu/guest-memfd-manager.h |   3 +
 system/guest-memfd-manager.c         | 141 +++++++++++++++++++++++++++
 2 files changed, 144 insertions(+)

diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h
index ab8c2ba362..1cce4cde43 100644
--- a/include/sysemu/guest-memfd-manager.h
+++ b/include/sysemu/guest-memfd-manager.h
@@ -43,4 +43,7 @@ struct GuestMemfdManagerClass {
     void (*realize)(Object *gmm, MemoryRegion *mr, uint64_t region_size);
 };
 
+int guest_memfd_state_change(GuestMemfdManager *gmm, uint64_t offset, uint64_t size,
+                             bool shared_to_private);
+
 #endif
diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
index 7b90f26859..deb43db90b 100644
--- a/system/guest-memfd-manager.c
+++ b/system/guest-memfd-manager.c
@@ -243,6 +243,147 @@ static void guest_memfd_rdm_replay_discarded(const RamDiscardManager *rdm,
                                          guest_memfd_rdm_replay_discarded_cb);
 }
 
+static bool guest_memfd_is_valid_range(GuestMemfdManager *gmm,
+                                       uint64_t offset, uint64_t size)
+{
+    MemoryRegion *mr = gmm->mr;
+
+    g_assert(mr);
+
+    uint64_t region_size = memory_region_size(mr);
+    if (!QEMU_IS_ALIGNED(offset, gmm->block_size)) {
+        return false;
+    }
+    if (offset + size < offset || !size) {
+        return false;
+    }
+    if (offset >= region_size || offset + size > region_size) {
+        return false;
+    }
+    return true;
+}
+
+static void guest_memfd_notify_discard(GuestMemfdManager *gmm,
+                                       uint64_t offset, uint64_t size)
+{
+    RamDiscardListener *rdl;
+
+    QLIST_FOREACH(rdl, &gmm->rdl_list, next) {
+        MemoryRegionSection tmp = *rdl->section;
+
+        if (!guest_memfd_rdm_intersect_memory_section(&tmp, offset, size)) {
+            continue;
+        }
+
+        guest_memfd_for_each_populated_range(gmm, &tmp, rdl,
+                                             guest_memfd_notify_discard_cb);
+    }
+}
+
+static int guest_memfd_notify_populate(GuestMemfdManager *gmm,
+                                       uint64_t offset, uint64_t size)
+{
+    RamDiscardListener *rdl, *rdl2;
+    int ret = 0;
+
+    QLIST_FOREACH(rdl, &gmm->rdl_list, next) {
+        MemoryRegionSection tmp = *rdl->section;
+
+        if (!guest_memfd_rdm_intersect_memory_section(&tmp, offset, size)) {
+            continue;
+        }
+
+        ret = guest_memfd_for_each_discarded_range(gmm, &tmp, rdl,
+                                                   guest_memfd_notify_populate_cb);
+        if (ret) {
+            break;
+        }
+    }
+
+    if (ret) {
+        /* Notify all already-notified listeners. */
+        QLIST_FOREACH(rdl2, &gmm->rdl_list, next) {
+            MemoryRegionSection tmp = *rdl2->section;
+
+            if (rdl2 == rdl) {
+                break;
+            }
+            if (!guest_memfd_rdm_intersect_memory_section(&tmp, offset, size)) {
+                continue;
+            }
+
+            guest_memfd_for_each_discarded_range(gmm, &tmp, rdl2,
+                                                 guest_memfd_notify_discard_cb);
+        }
+    }
+    return ret;
+}
+
+static bool guest_memfd_is_range_populated(GuestMemfdManager *gmm,
+                                           uint64_t offset, uint64_t size)
+{
+    const unsigned long first_bit = offset / gmm->block_size;
+    const unsigned long last_bit = first_bit + (size / gmm->block_size) - 1;
+    unsigned long found_bit;
+
+    /* We fake a shorter bitmap to avoid searching too far. */
+    found_bit = find_next_bit(gmm->discard_bitmap, last_bit + 1, first_bit);
+    return found_bit > last_bit;
+}
+
+static bool guest_memfd_is_range_discarded(GuestMemfdManager *gmm,
+                                           uint64_t offset, uint64_t size)
+{
+    const unsigned long first_bit = offset / gmm->block_size;
+    const unsigned long last_bit = first_bit + (size / gmm->block_size) - 1;
+    unsigned long found_bit;
+
+    /* We fake a shorter bitmap to avoid searching too far. */
+    found_bit = find_next_zero_bit(gmm->discard_bitmap, last_bit + 1, first_bit);
+    return found_bit > last_bit;
+}
+
+int guest_memfd_state_change(GuestMemfdManager *gmm, uint64_t offset, uint64_t size,
+                             bool shared_to_private)
+{
+    int ret = 0;
+
+    if (!guest_memfd_is_valid_range(gmm, offset, size)) {
+        error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
+                     __func__, offset, size);
+        return -1;
+    }
+
+    if ((shared_to_private && guest_memfd_is_range_discarded(gmm, offset, size)) ||
+        (!shared_to_private && guest_memfd_is_range_populated(gmm, offset, size))) {
+        return 0;
+    }
+
+    if (shared_to_private) {
+        guest_memfd_notify_discard(gmm, offset, size);
+    } else {
+        ret = guest_memfd_notify_populate(gmm, offset, size);
+    }
+
+    if (!ret) {
+        unsigned long first_bit = offset / gmm->block_size;
+        unsigned long nbits = size / gmm->block_size;
+
+        g_assert((first_bit + nbits) <= gmm->discard_bitmap_size);
+
+        if (shared_to_private) {
+            bitmap_set(gmm->discard_bitmap, first_bit, nbits);
+        } else {
+            bitmap_clear(gmm->discard_bitmap, first_bit, nbits);
+        }
+
+        return 0;
+    }
+
+    return ret;
+}
+
 static void guest_memfd_manager_realize(Object *obj, MemoryRegion *mr,
                                         uint64_t region_size)
 {
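As a concrete worked example of the bitmap update at the end of guest_memfd_state_change() (illustrative numbers): with a 4 KiB block_size, converting a 2 MiB range starting at offset 4 MiB gives first_bit = 0x400000 / 0x1000 = 1024 and nbits = 0x200000 / 0x1000 = 512, so a shared-to-private request sets bits 1024 through 1535, and the reverse request clears the same bits.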
From patchwork Thu Jul 25 07:21:12 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13741591
From: Chenyi Qiang
To: Paolo Bonzini, David Hildenbrand, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Edgecombe Rick P, Wang Wei W, Peng Chao P, Gao Chao, Wu Hao, Xu Yilun
Subject: [RFC PATCH 3/6] KVM: Notify the state change via RamDiscardManager helper during shared/private conversion
Date: Thu, 25 Jul 2024 03:21:12 -0400
Message-ID: <20240725072118.358923-4-chenyi.qiang@intel.com>
In-Reply-To: <20240725072118.358923-1-chenyi.qiang@intel.com>
References: <20240725072118.358923-1-chenyi.qiang@intel.com>

Once KVM exits to userspace to convert a page from private to shared or vice versa at runtime, notify the state change via the guest_memfd_state_change() helper so that other registered subsystems like VFIO are notified.

Signed-off-by: Chenyi Qiang
---
 accel/kvm/kvm-all.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 854cb86b22..94bbbbd2de 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -48,6 +48,7 @@
 #include "kvm-cpus.h"
 #include "sysemu/dirtylimit.h"
 #include "qemu/range.h"
+#include "sysemu/guest-memfd-manager.h"
 
 #include "hw/boards.h"
 #include "sysemu/stats.h"
@@ -2852,6 +2853,7 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
     RAMBlock *rb;
     void *addr;
     int ret = -1;
+    GuestMemfdManager *gmm;
 
"shared_to_private" : "private_to_shared"); @@ -2914,6 +2916,11 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private) addr = memory_region_get_ram_ptr(mr) + section.offset_within_region; rb = qemu_ram_block_from_host(addr, false, &offset); + gmm = GUEST_MEMFD_MANAGER(mr->rdm); + if (gmm) { + guest_memfd_state_change(gmm, offset, size, to_private); + } + if (to_private) { if (rb->page_size != qemu_real_host_page_size()) { /* From patchwork Thu Jul 25 07:21:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chenyi Qiang X-Patchwork-Id: 13741592 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AF4FF17109D for ; Thu, 25 Jul 2024 07:22:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721892175; cv=none; b=WksHwPAgcx/AgUshjPG1TZnsMQ7Z/mpBFcbXNEqPti0ieQNRU/i7Jp6Jccz+Kzadov/agG22UapgYh0ZwV01N+qVbA0/7hOCYlofWVgMsEgxcvnpi+s64KvL/KJw2qw+IPh9fGgVOo7t3B3H0bxCIz77VMPW8JNA2oid6CN/CrA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721892175; c=relaxed/simple; bh=PifPO2VLSml++/yyEHfyB9LD6kSBnEg6UWh8hpYvOM8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=N6Io8MoBRr3mA3G/9fpYkggo8W++Q/y2pkglJrLDag/gYGqLRpLFiFXcAJG78+X+Fg4upZ/jQ0/R/LJpuhrciyrV6z5Sxpq/b4ho95mRDZ2E0dNzHKjb+o5ncDQ7yOqqG5dZK/i/FZTeA/XOfVmD392eqSUc39iW9ies90p1yeE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=l66Z4CLG; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="l66Z4CLG" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1721892174; x=1753428174; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=PifPO2VLSml++/yyEHfyB9LD6kSBnEg6UWh8hpYvOM8=; b=l66Z4CLGcdWH4D3mRiyFzWe2jiZyxM2t1mGWyqEGQAOcxcgkHRoJQ+oW pbNFnzwvAAW3/+McAitnVn58/idzuZOshz9BymQ7uI/ePW87+5c09dJI9 EPTAag3uUx+o8REal3RjDGjBU3A9VLoZrOi7gPQEw3SES+xxSJRNUHF9f 3Gp+oDMtheL290ZfRQEXTjRBND3RRVZaZuJIts6gRUO8Hv0tCsgjbY08P KNv/EyVNBnaGbPizmPEZHEBXabf7pDccS1DzC6XZ+swmJ6OUgXVLd4987 wgoV7CVLtaLBfAFI5uRBi983kpPj9nuO2ed1B9bvuLjmBSvx4hi7wZYVm w==; X-CSE-ConnectionGUID: buk/9EF8TEGdB653ijsFEQ== X-CSE-MsgGUID: geWvxKo8RtCWgG6UOmguUA== X-IronPort-AV: E=McAfee;i="6700,10204,11143"; a="30753989" X-IronPort-AV: E=Sophos;i="6.09,235,1716274800"; d="scan'208";a="30753989" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Jul 2024 00:22:53 -0700 X-CSE-ConnectionGUID: gFgeME1KR7eAyTzj7cKk1A== X-CSE-MsgGUID: xo48P9PiTWaGNySerO1iQA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,235,1716274800"; d="scan'208";a="52858168" Received: from emr-bkc.sh.intel.com ([10.112.230.82]) by orviesa009-auth.jf.intel.com with 
From patchwork Thu Jul 25 07:21:13 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13741592
From: Chenyi Qiang
To: Paolo Bonzini, David Hildenbrand, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Edgecombe Rick P, Wang Wei W, Peng Chao P, Gao Chao, Wu Hao, Xu Yilun
Subject: [RFC PATCH 4/6] memory: Register the RamDiscardManager instance upon guest_memfd creation
Date: Thu, 25 Jul 2024 03:21:13 -0400
Message-ID: <20240725072118.358923-5-chenyi.qiang@intel.com>
In-Reply-To: <20240725072118.358923-1-chenyi.qiang@intel.com>
References: <20240725072118.358923-1-chenyi.qiang@intel.com>

Instantiate a new guest_memfd_manager object and register it with the target MemoryRegion. From this point, other subsystems such as VFIO can register their listeners in guest_memfd_manager and receive conversion events through RamDiscardManager.

Signed-off-by: Chenyi Qiang
---
 system/physmem.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/system/physmem.c b/system/physmem.c
index 33d09f7571..98072ae246 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -53,6 +53,7 @@
 #include "sysemu/hostmem.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/xen-mapcache.h"
+#include "sysemu/guest-memfd-manager.h"
 #include "trace/trace-root.h"
 
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
@@ -1861,6 +1862,12 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
             qemu_mutex_unlock_ramlist();
             goto out_free;
         }
+
+        GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(object_new(TYPE_GUEST_MEMFD_MANAGER));
+        GuestMemfdManagerClass *gmmc = GUEST_MEMFD_MANAGER_GET_CLASS(gmm);
+        g_assert(new_block->mr);
+        gmmc->realize(OBJECT(gmm), new_block->mr, new_block->mr->size);
+        memory_region_set_ram_discard_manager(gmm->mr, RAM_DISCARD_MANAGER(gmm));
     }
 
     new_ram_size = MAX(old_ram_size,
@@ -2118,6 +2125,8 @@ static void reclaim_ramblock(RAMBlock *block)
 
     if (block->guest_memfd >= 0) {
         close(block->guest_memfd);
+        g_assert(block->mr);
+        object_unref(OBJECT(block->mr->rdm));
         ram_block_discard_require(false);
     }
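Once the manager is registered this way, any code holding the MemoryRegion can look it up and query the shared/private state through the generic interface. A brief hedged sketch using existing QEMU helpers (section_is_fully_shared() is a placeholder name):

static bool section_is_fully_shared(MemoryRegionSection *section)
{
    RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);

    /* No manager means no coordinated discard: treat as populated. */
    if (!rdm) {
        return true;
    }
    return ram_discard_manager_is_populated(rdm, section);
}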
From patchwork Thu Jul 25 07:21:14 2024
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 13741593
From: Chenyi Qiang
To: Paolo Bonzini, David Hildenbrand, Peter Xu, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J, Edgecombe Rick P, Wang Wei W, Peng Chao P, Gao Chao, Wu Hao, Xu Yilun
Subject: [RFC PATCH 5/6] guest-memfd: Default to discarded (private) in guest_memfd_manager
Date: Thu, 25 Jul 2024 03:21:14 -0400
Message-ID: <20240725072118.358923-6-chenyi.qiang@intel.com>
In-Reply-To: <20240725072118.358923-1-chenyi.qiang@intel.com>
References: <20240725072118.358923-1-chenyi.qiang@intel.com>

guest_memfd was initially set to shared until commit bd3bcf6962 ("kvm/memory: Make memory type private by default if it has guest memfd backend"). To align with this change, the default state in guest_memfd_manager is set to discarded.

One concern raised by this commit is the handling of the virtual BIOS. The virtual BIOS loads its image into the shared memory of guest_memfd. However, during the region_commit() stage, the memory attribute is set to private while its shared memory remains valid. This mismatch persists until the shared content is copied to the private region. Fortunately, this interval only exists during the setup stage, and currently only the guest_memfd_manager is concerned with the state of the guest_memfd at that stage. For simplicity, the default bitmap in guest_memfd_manager is set to discarded (private).
This is feasible because the shared content of the virtual BIOS will eventually be discarded, and there are no requests to DMA-access this shared part during that period. Additionally, setting the default to private can also reduce the overhead of mapping shared pages into the IOMMU by VFIO during the bootup stage.

Signed-off-by: Chenyi Qiang
---
 system/guest-memfd-manager.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
index deb43db90b..ad1a46bac4 100644
--- a/system/guest-memfd-manager.c
+++ b/system/guest-memfd-manager.c
@@ -393,6 +393,7 @@ static void guest_memfd_manager_realize(Object *obj, MemoryRegion *mr,
     gmm->mr = mr;
     gmm->discard_bitmap_size = bitmap_size;
     gmm->discard_bitmap = bitmap_new(bitmap_size);
+    bitmap_fill(gmm->discard_bitmap, bitmap_size);
 }
 
 static void guest_memfd_manager_init(Object *obj)
d="scan'208";a="30754017" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Jul 2024 00:23:01 -0700 X-CSE-ConnectionGUID: /vvKROvzSbqAL9fMxU/uyw== X-CSE-MsgGUID: hWGUxWa3R/yK5BGhLtSg/g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,235,1716274800"; d="scan'208";a="52858206" Received: from emr-bkc.sh.intel.com ([10.112.230.82]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Jul 2024 00:22:58 -0700 From: Chenyi Qiang To: Paolo Bonzini , David Hildenbrand , Peter Xu , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Michael Roth Cc: Chenyi Qiang , qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J , Edgecombe Rick P , Wang Wei W , Peng Chao P , Gao Chao , Wu Hao , Xu Yilun Subject: [RFC PATCH 6/6] RAMBlock: make guest_memfd require coordinate discard Date: Thu, 25 Jul 2024 03:21:15 -0400 Message-ID: <20240725072118.358923-7-chenyi.qiang@intel.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20240725072118.358923-1-chenyi.qiang@intel.com> References: <20240725072118.358923-1-chenyi.qiang@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 As guest_memfd is now managed by guest_memfd_manager with RamDiscardManager, only block uncoordinated discard. Signed-off-by: Chenyi Qiang --- system/physmem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/system/physmem.c b/system/physmem.c index 98072ae246..ffd68debf0 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -1849,7 +1849,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp) assert(kvm_enabled()); assert(new_block->guest_memfd < 0); - if (ram_block_discard_require(true) < 0) { + if (ram_block_coordinated_discard_require(true) < 0) { error_setg_errno(errp, errno, "cannot set up private guest memory: discard currently blocked"); error_append_hint(errp, "Are you using assigned devices?\n");