From patchwork Tue Jan 7 14:27:08 2025
X-Patchwork-Submitter: Xu Yilun
X-Patchwork-Id: 13930046
From: Xu Yilun
To: kvm@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, sumit.semwal@linaro.org, christian.koenig@amd.com, pbonzini@redhat.com, seanjc@google.com, alex.williamson@redhat.com, jgg@nvidia.com, vivek.kasireddy@intel.com, dan.j.williams@intel.com, aik@amd.com
Cc: yilun.xu@intel.com, yilun.xu@linux.intel.com,
    linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, lukas@wunner.de, yan.y.zhao@intel.com, daniel.vetter@ffwll.ch, leon@kernel.org, baolu.lu@linux.intel.com, zhenzhong.duan@intel.com, tao1.su@intel.com
Subject: [RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked() kAPI
Date: Tue, 7 Jan 2025 22:27:08 +0800
Message-Id: <20250107142719.179636-2-yilun.xu@linux.intel.com>
In-Reply-To: <20250107142719.179636-1-yilun.xu@linux.intel.com>
References: <20250107142719.179636-1-yilun.xu@linux.intel.com>

Introduce a new API for dma-buf importers, and a matching dma_buf_ops
callback for dma-buf exporters. This API is for subsystem importers that
map the dma-buf to some user-defined address space, e.g. for IOMMUFD to
map the dma-buf to userspace IOVA via the IOMMU page table, or for KVM to
map the dma-buf to GPA via the KVM MMU (e.g. EPT).

Currently dma-buf is only used to get the DMA address for a device's
default domain via the kernel DMA APIs. But for these new use cases,
importers only need the pfn of the dma-buf resource to build their own
mapping tables. So the map_dma_buf() callback is no longer mandatory for
exporters. Importers may also choose not to provide a struct device *dev
on dma_buf_attach() if they don't call dma_buf_map_attachment().

Like dma_buf_map_attachment(), the importer should first call
dma_buf_attach()/dma_buf_dynamic_attach() and then call
dma_buf_get_pfn_unlocked(). If the importer chooses dynamic attach, it
should also handle the dma-buf move notification.

Only the unlocked version of dma_buf_get_pfn() is implemented for now,
simply because no locked version is needed yet.

Signed-off-by: Xu Yilun

---
IIUC, only get_pfn() is needed, with no put_pfn(). The whole dma-buf is
referenced/dereferenced at dma-buf attach/detach time. Specifically, for
static attachment, the exporter should always make the memory resource
available/pinned on first dma_buf_attach(), and release/unpin the memory
resource on last dma_buf_detach(). For dynamic attachment, the exporter
may populate and invalidate the memory resource at any time; that is fine
as long as importers follow dma-buf move notifications. So no pinning is
needed for get_pfn(), and no put_pfn() is needed.
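A minimal importer-side sketch of the intended call flow (not part of
this patch; all demo_* names are hypothetical). It attaches without a
struct device, fetches a pfn, and relies on move_notify for
invalidation:

#include <linux/dma-buf.h>

static void demo_move_notify(struct dma_buf_attachment *attach)
{
        /* Tear down any mapping built from previously returned pfns. */
}

static const struct dma_buf_attach_ops demo_importer_ops = {
        .allow_peer2peer = true,
        .move_notify = demo_move_notify,
};

static int demo_import_pfn(struct dma_buf *dmabuf, void *importer_priv)
{
        struct dma_buf_attachment *attach;
        int max_order, ret;
        u64 pfn;

        /* No struct device needed when dma_buf_map_attachment() is unused. */
        attach = dma_buf_dynamic_attach(dmabuf, NULL, &demo_importer_ops,
                                        importer_priv);
        if (IS_ERR(attach))
                return PTR_ERR(attach);

        ret = dma_buf_get_pfn_unlocked(attach, 0, &pfn, &max_order);
        if (!ret) {
                /* Install pfn into the importer's own mapping table here. */
        }

        dma_buf_detach(dmabuf, attach);
        return ret;
}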
---
 drivers/dma-buf/dma-buf.c | 92 +++++++++++++++++++++++++++++++--------
 include/linux/dma-buf.h   | 13 ++++++
 2 files changed, 88 insertions(+), 17 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 7eeee3a38202..83d1448b6dcc 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -630,10 +630,10 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	size_t alloc_size = sizeof(struct dma_buf);
 	int ret;
 
-	if (WARN_ON(!exp_info->priv || !exp_info->ops
-		    || !exp_info->ops->map_dma_buf
-		    || !exp_info->ops->unmap_dma_buf
-		    || !exp_info->ops->release))
+	if (WARN_ON(!exp_info->priv || !exp_info->ops ||
+		    (!!exp_info->ops->map_dma_buf != !!exp_info->ops->unmap_dma_buf) ||
+		    (!exp_info->ops->map_dma_buf && !exp_info->ops->get_pfn) ||
+		    !exp_info->ops->release))
 		return ERR_PTR(-EINVAL);
 
 	if (WARN_ON(exp_info->ops->cache_sgt_mapping &&
@@ -909,7 +909,10 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
 	struct dma_buf_attachment *attach;
 	int ret;
 
-	if (WARN_ON(!dmabuf || !dev))
+	if (WARN_ON(!dmabuf))
+		return ERR_PTR(-EINVAL);
+
+	if (WARN_ON(dmabuf->ops->map_dma_buf && !dev))
 		return ERR_PTR(-EINVAL);
 
 	if (WARN_ON(importer_ops && !importer_ops->move_notify))
@@ -941,7 +944,7 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
 	 */
 	if (dma_buf_attachment_is_dynamic(attach) !=
 	    dma_buf_is_dynamic(dmabuf)) {
-		struct sg_table *sgt;
+		struct sg_table *sgt = NULL;
 
 		dma_resv_lock(attach->dmabuf->resv, NULL);
 		if (dma_buf_is_dynamic(attach->dmabuf)) {
@@ -950,13 +953,16 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
 				goto err_unlock;
 		}
 
-		sgt = __map_dma_buf(attach, DMA_BIDIRECTIONAL);
-		if (!sgt)
-			sgt = ERR_PTR(-ENOMEM);
-		if (IS_ERR(sgt)) {
-			ret = PTR_ERR(sgt);
-			goto err_unpin;
+		if (dmabuf->ops->map_dma_buf) {
+			sgt = __map_dma_buf(attach, DMA_BIDIRECTIONAL);
+			if (!sgt)
+				sgt = ERR_PTR(-ENOMEM);
+			if (IS_ERR(sgt)) {
+				ret = PTR_ERR(sgt);
+				goto err_unpin;
+			}
 		}
+
 		dma_resv_unlock(attach->dmabuf->resv);
 		attach->sgt = sgt;
 		attach->dir = DMA_BIDIRECTIONAL;
@@ -1119,7 +1125,8 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
 
 	might_sleep();
 
-	if (WARN_ON(!attach || !attach->dmabuf))
+	if (WARN_ON(!attach || !attach->dmabuf ||
+		    !attach->dmabuf->ops->map_dma_buf))
 		return ERR_PTR(-EINVAL);
 
 	dma_resv_assert_held(attach->dmabuf->resv);
@@ -1195,7 +1202,8 @@ dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach,
 
 	might_sleep();
 
-	if (WARN_ON(!attach || !attach->dmabuf))
+	if (WARN_ON(!attach || !attach->dmabuf ||
+		    !attach->dmabuf->ops->map_dma_buf))
 		return ERR_PTR(-EINVAL);
 
 	dma_resv_lock(attach->dmabuf->resv, NULL);
@@ -1222,7 +1230,8 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
 {
 	might_sleep();
 
-	if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
+	if (WARN_ON(!attach || !attach->dmabuf ||
+		    !attach->dmabuf->ops->unmap_dma_buf || !sg_table))
 		return;
 
 	dma_resv_assert_held(attach->dmabuf->resv);
@@ -1254,7 +1263,8 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
 {
 	might_sleep();
 
-	if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
+	if (WARN_ON(!attach || !attach->dmabuf ||
+		    !attach->dmabuf->ops->unmap_dma_buf || !sg_table))
 		return;
 
 	dma_resv_lock(attach->dmabuf->resv, NULL);
@@ -1263,6 +1273,54 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF");
 
+/**
+ * dma_buf_get_pfn_unlocked - get the pfn of a page in the attached buffer
+ * @attach:	[in]	attachment to get pfn from
+ * @pgoff:	[in]	page offset of the buffer against the start of dma_buf
+ * @pfn:	[out]	returns the pfn of the buffer
+ * @max_order:	[out]	returns the max mapping order of the buffer
+ */
+int dma_buf_get_pfn_unlocked(struct dma_buf_attachment *attach,
+			     pgoff_t pgoff, u64 *pfn, int *max_order)
+{
+	struct dma_buf *dmabuf;
+	int ret;
+
+	if (WARN_ON(!attach || !attach->dmabuf ||
+		    !attach->dmabuf->ops->get_pfn))
+		return -EINVAL;
+
+	dmabuf = attach->dmabuf;
+
+	/*
+	 * Open:
+	 *
+	 * When dma_buf is dynamic but dma_buf move is disabled, the buffer
+	 * should be pinned before use, see dma_buf_map_attachment() for
+	 * reference.
+	 *
+	 * But for now no pin is intended inside dma_buf_get_pfn(), otherwise
+	 * another API would be needed to unpin the dma_buf. So just fail
+	 * this case.
+	 */
+	if (dma_buf_is_dynamic(attach->dmabuf) &&
+	    !IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY))
+		return -ENOENT;
+
+	dma_resv_lock(attach->dmabuf->resv, NULL);
+	ret = dmabuf->ops->get_pfn(attach, pgoff, pfn, max_order);
+	/*
+	 * Open:
+	 *
+	 * Is dma_resv_wait_timeout() needed? I assume no. The DMA buffer
+	 * content synchronization could be done when the buffer is to be
+	 * mapped by the importer.
+	 */
+	dma_resv_unlock(attach->dmabuf->resv);
+
+	return ret;
+}
+EXPORT_SYMBOL_NS_GPL(dma_buf_get_pfn_unlocked, "DMA_BUF");
+
 /**
  * dma_buf_move_notify - notify attachments that DMA-buf is moving
  *
@@ -1662,7 +1720,7 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
 		attach_count = 0;
 
 		list_for_each_entry(attach_obj, &buf_obj->attachments, node) {
-			seq_printf(s, "\t%s\n", dev_name(attach_obj->dev));
+			seq_printf(s, "\t%s\n", attach_obj->dev ? dev_name(attach_obj->dev) : NULL);
 			attach_count++;
 		}
 
 		dma_resv_unlock(buf_obj->resv);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 36216d28d8bd..b16183edfb3a 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -194,6 +194,17 @@ struct dma_buf_ops {
 	 * if the call would block.
 	 */
 
+	/**
+	 * @get_pfn:
+	 *
+	 * This is called by dma_buf_get_pfn(). It is used to get the pfn
+	 * of the buffer positioned by the page offset against the start of
+	 * the dma_buf. It can only be called if @attach has been called
+	 * successfully.
+	 */
+	int (*get_pfn)(struct dma_buf_attachment *attach, pgoff_t pgoff,
+		       u64 *pfn, int *max_order);
+
 	/**
 	 * @release:
 	 *
@@ -629,6 +640,8 @@ dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach,
 void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
 				       struct sg_table *sg_table,
 				       enum dma_data_direction direction);
+int dma_buf_get_pfn_unlocked(struct dma_buf_attachment *attach,
+			     pgoff_t pgoff, u64 *pfn, int *max_order);
 
 int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
 		 unsigned long);
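As a sketch of what the relaxed dma_buf_export() check above permits
(an assumption-laden example, not taken from the series; all demo_*
names are hypothetical), an exporter of physically contiguous
non-struct-page memory could provide only get_pfn() and release(), with
no map_dma_buf()/unmap_dma_buf() pair:

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>
#include <linux/fcntl.h>
#include <linux/slab.h>

struct demo_buffer {
        u64 base_pfn;   /* first pfn of a physically contiguous resource */
};

static int demo_get_pfn(struct dma_buf_attachment *attach, pgoff_t pgoff,
                        u64 *pfn, int *max_order)
{
        struct demo_buffer *buf = attach->dmabuf->priv;

        dma_resv_assert_held(attach->dmabuf->resv);
        *pfn = buf->base_pfn + pgoff;
        if (max_order)
                *max_order = 0;         /* only PAGE_SIZE mappings here */
        return 0;
}

static void demo_release(struct dma_buf *dmabuf)
{
        kfree(dmabuf->priv);
}

static const struct dma_buf_ops demo_pfn_only_ops = {
        .get_pfn = demo_get_pfn,
        .release = demo_release,
};

static struct dma_buf *demo_export(struct demo_buffer *buf, size_t size)
{
        DEFINE_DMA_BUF_EXPORT_INFO(exp_info);

        exp_info.ops = &demo_pfn_only_ops;
        exp_info.size = size;
        exp_info.flags = O_RDWR | O_CLOEXEC;
        exp_info.priv = buf;

        /* Passes the relaxed check: no map_dma_buf, but get_pfn is set. */
        return dma_buf_export(&exp_info);
}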
From patchwork Tue Jan 7 14:27:09 2025
X-Patchwork-Submitter: Xu Yilun
X-Patchwork-Id: 13930047
From: Xu Yilun
Subject: [RFC PATCH 02/12] vfio: Export vfio device get and put registration helpers
Date: Tue, 7 Jan 2025 22:27:09 +0800
Message-Id: <20250107142719.179636-3-yilun.xu@linux.intel.com>
In-Reply-To: <20250107142719.179636-1-yilun.xu@linux.intel.com>
References: <20250107142719.179636-1-yilun.xu@linux.intel.com>

From: Vivek Kasireddy

These helpers are useful for managing additional references taken on the
device from other associated VFIO modules.

Original-patch-by: Jason Gunthorpe
Signed-off-by: Vivek Kasireddy

---
 drivers/vfio/vfio_main.c | 2 ++
 include/linux/vfio.h     | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 1fd261efc582..620a3ee5d04d 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -171,11 +171,13 @@ void vfio_device_put_registration(struct vfio_device *device)
 	if (refcount_dec_and_test(&device->refcount))
 		complete(&device->comp);
 }
+EXPORT_SYMBOL_GPL(vfio_device_put_registration);
 
 bool vfio_device_try_get_registration(struct vfio_device *device)
 {
 	return refcount_inc_not_zero(&device->refcount);
 }
+EXPORT_SYMBOL_GPL(vfio_device_try_get_registration);
 
 /*
  * VFIO driver API
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 000a6cab2d31..2258b0585330 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -279,6 +279,8 @@ static inline void vfio_put_device(struct vfio_device *device)
 int vfio_register_group_dev(struct vfio_device *device);
 int vfio_register_emulated_iommu_dev(struct vfio_device *device);
 void vfio_unregister_group_dev(struct vfio_device *device);
+bool vfio_device_try_get_registration(struct vfio_device *device);
+void vfio_device_put_registration(struct vfio_device *device);
 int vfio_assign_device_set(struct vfio_device *device, void *set_id);
 unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);
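A hypothetical sketch of how an associated module might use the newly
exported helpers: hold a registration reference for as long as an
external object (the made-up demo_obj) points at the vfio_device:

#include <linux/errno.h>
#include <linux/vfio.h>

struct demo_obj {
        struct vfio_device *vdev;
};

static int demo_obj_bind(struct demo_obj *obj, struct vfio_device *vdev)
{
        if (!vfio_device_try_get_registration(vdev))
                return -ENODEV;         /* device is unregistering */
        obj->vdev = vdev;
        return 0;
}

static void demo_obj_unbind(struct demo_obj *obj)
{
        vfio_device_put_registration(obj->vdev);
        obj->vdev = NULL;
}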
From patchwork Tue Jan 7 14:27:10 2025
X-Patchwork-Submitter: Xu Yilun
X-Patchwork-Id: 13930048
From: Xu Yilun
Subject: [RFC PATCH 03/12] vfio/pci: Share the core device pointer while invoking feature functions
Date: Tue, 7 Jan 2025 22:27:10 +0800
Message-Id: <20250107142719.179636-4-yilun.xu@linux.intel.com>
In-Reply-To: <20250107142719.179636-1-yilun.xu@linux.intel.com>
References: <20250107142719.179636-1-yilun.xu@linux.intel.com>
From: Vivek Kasireddy

There is no need to share the main device pointer (struct vfio_device *)
with all the feature functions, as they only need the core device
pointer. Therefore, extract the core device pointer once in the caller
(vfio_pci_core_ioctl_feature) and share it instead.

Signed-off-by: Vivek Kasireddy

---
 drivers/vfio/pci/vfio_pci_core.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1ab58da9f38a..c3269d708411 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -300,11 +300,9 @@ static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev,
 	return 0;
 }
 
-static int vfio_pci_core_pm_entry(struct vfio_device *device, u32 flags,
+static int vfio_pci_core_pm_entry(struct vfio_pci_core_device *vdev, u32 flags,
 				  void __user *arg, size_t argsz)
 {
-	struct vfio_pci_core_device *vdev =
-		container_of(device, struct vfio_pci_core_device, vdev);
 	int ret;
 
 	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
@@ -321,12 +319,10 @@ static int vfio_pci_core_pm_entry(struct vfio_device *device, u32 flags,
 }
 
 static int vfio_pci_core_pm_entry_with_wakeup(
-	struct vfio_device *device, u32 flags,
+	struct vfio_pci_core_device *vdev, u32 flags,
 	struct vfio_device_low_power_entry_with_wakeup __user *arg,
 	size_t argsz)
 {
-	struct vfio_pci_core_device *vdev =
-		container_of(device, struct vfio_pci_core_device, vdev);
 	struct vfio_device_low_power_entry_with_wakeup entry;
 	struct eventfd_ctx *efdctx;
 	int ret;
@@ -377,11 +373,9 @@ static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
 	up_write(&vdev->memory_lock);
 }
 
-static int vfio_pci_core_pm_exit(struct vfio_device *device, u32 flags,
+static int vfio_pci_core_pm_exit(struct vfio_pci_core_device *vdev, u32 flags,
 				 void __user *arg, size_t argsz)
 {
-	struct vfio_pci_core_device *vdev =
-		container_of(device, struct vfio_pci_core_device, vdev);
 	int ret;
 
 	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
@@ -1486,11 +1480,10 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl);
 
-static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
-				       uuid_t __user *arg, size_t argsz)
+static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
+				       u32 flags, uuid_t __user *arg,
+				       size_t argsz)
 {
-	struct vfio_pci_core_device *vdev =
-		container_of(device, struct vfio_pci_core_device, vdev);
 	uuid_t uuid;
 	int ret;
 
@@ -1517,16 +1510,19 @@ static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 				void __user *arg, size_t argsz)
 {
+	struct vfio_pci_core_device *vdev =
+		container_of(device, struct vfio_pci_core_device, vdev);
+
 	switch (flags & VFIO_DEVICE_FEATURE_MASK) {
 	case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY:
-		return vfio_pci_core_pm_entry(device, flags, arg, argsz);
+		return vfio_pci_core_pm_entry(vdev, flags, arg, argsz);
 	case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP:
-		return vfio_pci_core_pm_entry_with_wakeup(device, flags,
+		return vfio_pci_core_pm_entry_with_wakeup(vdev, flags,
 							  arg, argsz);
 	case VFIO_DEVICE_FEATURE_LOW_POWER_EXIT:
-		return vfio_pci_core_pm_exit(device, flags, arg, argsz);
+		return vfio_pci_core_pm_exit(vdev, flags, arg, argsz);
 	case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
-		return vfio_pci_core_feature_token(device, flags, arg, argsz);
+		return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
 	default:
 		return -ENOTTY;
 	}
 }

From patchwork Tue Jan 7 14:27:11 2025
X-Patchwork-Submitter: Xu Yilun
X-Patchwork-Id: 13930049
From: Xu Yilun
Subject: [RFC PATCH 04/12] vfio/pci: Allow MMIO regions to be exported through dma-buf
Date: Tue, 7 Jan 2025 22:27:11 +0800
Message-Id: <20250107142719.179636-5-yilun.xu@linux.intel.com>
In-Reply-To: <20250107142719.179636-1-yilun.xu@linux.intel.com>
References: <20250107142719.179636-1-yilun.xu@linux.intel.com>

From: Vivek Kasireddy

This is a reduced version of Vivek's series [1]. It removes the
dma_buf_ops.attach/map/unmap_dma_buf/mmap() callbacks, as they are not
necessary in this series, and also because of the WIP p2p dma mapping
opens [2]. At this early stage, the focus is just on the private MMIO
get-PFN path.

From Jason Gunthorpe:

"dma-buf has become a way to safely acquire a handle to non-struct page
memory that can still have lifetime controlled by the exporter. Notably
RDMA can now import dma-buf FDs and build them into MRs which allows for
PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
from PCI device BARs.

The patch design loosely follows the pattern in commit db1a8dd916aa
("habanalabs: add support for dma-buf exporter") except this does not
support pinning.

Instead, this implements what, in the past, we've called a revocable
attachment using move. In normal situations the attachment is pinned, as
a BAR does not change physical address. However when the VFIO device is
closed, or a PCI reset is issued, access to the MMIO memory is revoked.

Revoked means that move occurs, but an attempt to immediately re-map the
memory will fail. In the reset case a future move will be triggered when
MMIO access returns. As both close and reset are under userspace control
it is expected that userspace will suspend use of the dma-buf before
doing these operations, the revoke is purely for kernel self-defense
against a hostile userspace."
[1] https://lore.kernel.org/kvm/20240624065552.1572580-4-vivek.kasireddy@intel.com/
[2] https://lore.kernel.org/all/IA0PR11MB7185FDD56CFDD0A2B8D21468F83B2@IA0PR11MB7185.namprd11.prod.outlook.com/

Original-patch-by: Jason Gunthorpe
Signed-off-by: Vivek Kasireddy
Signed-off-by: Xu Yilun

---
 drivers/vfio/pci/Makefile          |   1 +
 drivers/vfio/pci/dma_buf.c         | 224 +++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_config.c |  22 ++-
 drivers/vfio/pci/vfio_pci_core.c   |  20 ++-
 drivers/vfio/pci/vfio_pci_priv.h   |  25 ++++
 include/linux/vfio_pci_core.h      |   1 +
 include/uapi/linux/vfio.h          |  29 ++++
 7 files changed, 317 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vfio/pci/dma_buf.c

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index cf00c0a7e55c..0cfdc9ede82f 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -2,6 +2,7 @@
 vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
 vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o
+vfio-pci-core-$(CONFIG_DMA_SHARED_BUFFER) += dma_buf.o
 obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
 
 vfio-pci-y := vfio_pci.o
diff --git a/drivers/vfio/pci/dma_buf.c b/drivers/vfio/pci/dma_buf.c
new file mode 100644
index 000000000000..1d5f46744922
--- /dev/null
+++ b/drivers/vfio/pci/dma_buf.c
@@ -0,0 +1,224 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
+ */
+#include <linux/dma-buf.h>
+#include <linux/vfio.h>
+
+#include "vfio_pci_priv.h"
+
+MODULE_IMPORT_NS("DMA_BUF");
+
+struct vfio_pci_dma_buf {
+	struct dma_buf *dmabuf;
+	struct vfio_pci_core_device *vdev;
+	struct list_head dmabufs_elm;
+	unsigned int nr_ranges;
+	struct vfio_region_dma_range *dma_ranges;
+	bool revoked;
+};
+
+static void vfio_pci_dma_buf_unpin(struct dma_buf_attachment *attachment)
+{
+}
+
+static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment)
+{
+	/*
+	 * Uses the dynamic interface but must always allow for
+	 * dma_buf_move_notify() to do revoke
+	 */
+	return -EINVAL;
+}
+
+static int vfio_pci_dma_buf_get_pfn(struct dma_buf_attachment *attachment,
+				    pgoff_t pgoff, u64 *pfn, int *max_order)
+{
+	/* TODO */
+	return -EOPNOTSUPP;
+}
+
+static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
+{
+	struct vfio_pci_dma_buf *priv = dmabuf->priv;
+
+	/*
+	 * Either this or vfio_pci_dma_buf_cleanup() will remove from the list.
+	 * The refcount prevents both.
+	 */
+	if (priv->vdev) {
+		down_write(&priv->vdev->memory_lock);
+		list_del_init(&priv->dmabufs_elm);
+		up_write(&priv->vdev->memory_lock);
+		vfio_device_put_registration(&priv->vdev->vdev);
+	}
+	kfree(priv->dma_ranges);
+	kfree(priv);
+}
+
+static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
+	.pin = vfio_pci_dma_buf_pin,
+	.unpin = vfio_pci_dma_buf_unpin,
+	.get_pfn = vfio_pci_dma_buf_get_pfn,
+	.release = vfio_pci_dma_buf_release,
+};
+
+static int check_dma_ranges(struct vfio_pci_dma_buf *priv, u64 *dmabuf_size)
+{
+	struct vfio_region_dma_range *dma_ranges = priv->dma_ranges;
+	struct pci_dev *pdev = priv->vdev->pdev;
+	resource_size_t bar_size;
+	int i;
+
+	for (i = 0; i < priv->nr_ranges; i++) {
+		/*
+		 * For PCI the region_index is the BAR number like
+		 * everything else.
+		 */
+		if (dma_ranges[i].region_index >= VFIO_PCI_ROM_REGION_INDEX)
+			return -EINVAL;
+
+		bar_size = pci_resource_len(pdev, dma_ranges[i].region_index);
+		if (!bar_size)
+			return -EINVAL;
+
+		if (!dma_ranges[i].offset && !dma_ranges[i].length)
+			dma_ranges[i].length = bar_size;
+
+		if (!IS_ALIGNED(dma_ranges[i].offset, PAGE_SIZE) ||
+		    !IS_ALIGNED(dma_ranges[i].length, PAGE_SIZE) ||
+		    dma_ranges[i].length > bar_size ||
+		    dma_ranges[i].offset >= bar_size ||
+		    dma_ranges[i].offset + dma_ranges[i].length > bar_size)
+			return -EINVAL;
+
+		*dmabuf_size += dma_ranges[i].length;
+	}
+
+	return 0;
+}
+
+int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+				  struct vfio_device_feature_dma_buf __user *arg,
+				  size_t argsz)
+{
+	struct vfio_device_feature_dma_buf get_dma_buf;
+	struct vfio_region_dma_range *dma_ranges;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	struct vfio_pci_dma_buf *priv;
+	u64 dmabuf_size = 0;
+	int ret;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
+				 sizeof(get_dma_buf));
+	if (ret != 1)
+		return ret;
+
+	if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
+		return -EFAULT;
+
+	dma_ranges = memdup_array_user(&arg->dma_ranges,
+				       get_dma_buf.nr_ranges,
+				       sizeof(*dma_ranges));
+	if (IS_ERR(dma_ranges))
+		return PTR_ERR(dma_ranges);
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv) {
+		kfree(dma_ranges);
+		return -ENOMEM;
+	}
+
+	priv->vdev = vdev;
+	priv->nr_ranges = get_dma_buf.nr_ranges;
+	priv->dma_ranges = dma_ranges;
+
+	ret = check_dma_ranges(priv, &dmabuf_size);
+	if (ret)
+		goto err_free_priv;
+
+	if (!vfio_device_try_get_registration(&vdev->vdev)) {
+		ret = -ENODEV;
+		goto err_free_priv;
+	}
+
+	exp_info.ops = &vfio_pci_dmabuf_ops;
+	exp_info.size = dmabuf_size;
+	exp_info.flags = get_dma_buf.open_flags;
+	exp_info.priv = priv;
+
+	priv->dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(priv->dmabuf)) {
+		ret = PTR_ERR(priv->dmabuf);
+		goto err_dev_put;
+	}
+
+	/* dma_buf_put() now frees priv */
+	INIT_LIST_HEAD(&priv->dmabufs_elm);
+	down_write(&vdev->memory_lock);
+	dma_resv_lock(priv->dmabuf->resv, NULL);
+	priv->revoked = !__vfio_pci_memory_enabled(vdev);
+	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
+	dma_resv_unlock(priv->dmabuf->resv);
+	up_write(&vdev->memory_lock);
+
+	/*
+	 * dma_buf_fd() consumes the reference, when the file closes the dmabuf
+	 * will be released.
+	 */
+	return dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags);
+
+err_dev_put:
+	vfio_device_put_registration(&vdev->vdev);
+err_free_priv:
+	kfree(dma_ranges);
+	kfree(priv);
+	return ret;
+}
+
+void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
+{
+	struct vfio_pci_dma_buf *priv;
+	struct vfio_pci_dma_buf *tmp;
+
+	lockdep_assert_held_write(&vdev->memory_lock);
+
+	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		/*
+		 * Returns true if a reference was successfully obtained.
+		 * The caller must interlock with the dmabuf's release
+		 * function in some way, such as RCU, to ensure that this
+		 * is not called on freed memory.
+		 */
+		if (!get_file_rcu(&priv->dmabuf->file))
+			continue;
+
+		if (priv->revoked != revoked) {
+			dma_resv_lock(priv->dmabuf->resv, NULL);
+			priv->revoked = revoked;
+			dma_buf_move_notify(priv->dmabuf);
+			dma_resv_unlock(priv->dmabuf->resv);
+		}
+		dma_buf_put(priv->dmabuf);
+	}
+}
+
+void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
+{
+	struct vfio_pci_dma_buf *priv;
+	struct vfio_pci_dma_buf *tmp;
+
+	down_write(&vdev->memory_lock);
+	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		if (!get_file_rcu(&priv->dmabuf->file))
+			continue;
+
+		dma_resv_lock(priv->dmabuf->resv, NULL);
+		list_del_init(&priv->dmabufs_elm);
+		priv->vdev = NULL;
+		priv->revoked = true;
+		dma_buf_move_notify(priv->dmabuf);
+		dma_resv_unlock(priv->dmabuf->resv);
+		vfio_device_put_registration(&vdev->vdev);
+		dma_buf_put(priv->dmabuf);
+	}
+	up_write(&vdev->memory_lock);
+}
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index ea2745c1ac5e..5cc200e15edc 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -589,10 +589,12 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
 	virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY);
 	new_mem = !!(new_cmd & PCI_COMMAND_MEMORY);
 
-	if (!new_mem)
+	if (!new_mem) {
 		vfio_pci_zap_and_down_write_memory_lock(vdev);
-	else
+		vfio_pci_dma_buf_move(vdev, true);
+	} else {
 		down_write(&vdev->memory_lock);
+	}
 
 	/*
 	 * If the user is writing mem/io enable (new_mem/io) and we
@@ -627,6 +629,8 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
 	*virt_cmd &= cpu_to_le16(~mask);
 	*virt_cmd |= cpu_to_le16(new_cmd & mask);
 
+	if (__vfio_pci_memory_enabled(vdev))
+		vfio_pci_dma_buf_move(vdev, false);
 	up_write(&vdev->memory_lock);
 }
 
@@ -707,12 +711,16 @@ static int __init init_pci_cap_basic_perm(struct perm_bits *perm)
 static void vfio_lock_and_set_power_state(struct vfio_pci_core_device *vdev,
 					  pci_power_t state)
 {
-	if (state >= PCI_D3hot)
+	if (state >= PCI_D3hot) {
 		vfio_pci_zap_and_down_write_memory_lock(vdev);
-	else
+		vfio_pci_dma_buf_move(vdev, true);
+	} else {
 		down_write(&vdev->memory_lock);
+	}
 
 	vfio_pci_set_power_state(vdev, state);
+	if (__vfio_pci_memory_enabled(vdev))
+		vfio_pci_dma_buf_move(vdev, false);
 	up_write(&vdev->memory_lock);
 }
 
@@ -900,7 +908,10 @@ static int vfio_exp_config_write(struct vfio_pci_core_device *vdev, int pos,
 
 		if (!ret && (cap & PCI_EXP_DEVCAP_FLR)) {
 			vfio_pci_zap_and_down_write_memory_lock(vdev);
+			vfio_pci_dma_buf_move(vdev, true);
 			pci_try_reset_function(vdev->pdev);
+			if (__vfio_pci_memory_enabled(vdev))
+				vfio_pci_dma_buf_move(vdev, false);
 			up_write(&vdev->memory_lock);
 		}
 	}
@@ -982,7 +993,10 @@ static int vfio_af_config_write(struct vfio_pci_core_device *vdev, int pos,
 
 		if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP)) {
 			vfio_pci_zap_and_down_write_memory_lock(vdev);
+			vfio_pci_dma_buf_move(vdev, true);
 			pci_try_reset_function(vdev->pdev);
+			if (__vfio_pci_memory_enabled(vdev))
+				vfio_pci_dma_buf_move(vdev, false);
 			up_write(&vdev->memory_lock);
 		}
 	}
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index c3269d708411..f69eda5956ad 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -287,6 +287,8 @@ static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev,
 	 * semaphore.
 	 */
 	vfio_pci_zap_and_down_write_memory_lock(vdev);
+	vfio_pci_dma_buf_move(vdev, true);
+
 	if (vdev->pm_runtime_engaged) {
 		up_write(&vdev->memory_lock);
 		return -EINVAL;
@@ -370,6 +372,8 @@ static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
 	 */
 	down_write(&vdev->memory_lock);
 	__vfio_pci_runtime_pm_exit(vdev);
+	if (__vfio_pci_memory_enabled(vdev))
+		vfio_pci_dma_buf_move(vdev, false);
 	up_write(&vdev->memory_lock);
 }
 
@@ -690,6 +694,8 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
 #endif
 
 	vfio_pci_core_disable(vdev);
+
+	vfio_pci_dma_buf_cleanup(vdev);
 
 	mutex_lock(&vdev->igate);
 	if (vdev->err_trigger) {
 		eventfd_ctx_put(vdev->err_trigger);
@@ -1234,7 +1240,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
 	 */
 	vfio_pci_set_power_state(vdev, PCI_D0);
 
+	vfio_pci_dma_buf_move(vdev, true);
 	ret = pci_try_reset_function(vdev->pdev);
+	if (__vfio_pci_memory_enabled(vdev))
+		vfio_pci_dma_buf_move(vdev, false);
 	up_write(&vdev->memory_lock);
 
 	return ret;
@@ -1523,6 +1532,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 		return vfio_pci_core_pm_exit(vdev, flags, arg, argsz);
 	case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
 		return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
+	case VFIO_DEVICE_FEATURE_DMA_BUF:
+		return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
 	default:
 		return -ENOTTY;
 	}
@@ -2098,6 +2109,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	INIT_LIST_HEAD(&vdev->dummy_resources_list);
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
+	INIT_LIST_HEAD(&vdev->dmabufs);
 	init_rwsem(&vdev->memory_lock);
 	xa_init(&vdev->ctx);
 
@@ -2480,11 +2492,17 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 	 * cause the PCI config space reset without restoring the original
 	 * state (saved locally in 'vdev->pm_save').
 	 */
-	list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list)
+	list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list) {
+		vfio_pci_dma_buf_move(vdev, true);
 		vfio_pci_set_power_state(vdev, PCI_D0);
+	}
 
 	ret = pci_reset_bus(pdev);
 
+	list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list)
+		if (__vfio_pci_memory_enabled(vdev))
+			vfio_pci_dma_buf_move(vdev, false);
+
 	vdev = list_last_entry(&dev_set->device_list,
 			       struct vfio_pci_core_device, vdev.dev_set_list);
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 5e4fa69aee16..d27f383f3931 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -101,4 +101,29 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
 	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
 }
 
+#ifdef CONFIG_DMA_SHARED_BUFFER
+int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+				  struct vfio_device_feature_dma_buf __user *arg,
+				  size_t argsz);
+void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
+void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
+#else
+static inline int
+vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+			      struct vfio_device_feature_dma_buf __user *arg,
+			      size_t argsz)
+{
+	return -ENOTTY;
+}
+
+static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
+{
+}
+
+static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
+					 bool revoked)
+{
+}
+#endif
+
 #endif
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index fbb472dd99b3..da5d8955ae56 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -94,6 +94,7 @@ struct vfio_pci_core_device {
 	struct vfio_pci_core_device	*sriov_pf_core_dev;
 	struct notifier_block	nb;
 	struct rw_semaphore	memory_lock;
+	struct list_head	dmabufs;
 };
 
 /* Will be exported for vfio pci drivers usage */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index c8dbf8219c4f..f43dfbde7352 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1458,6 +1458,35 @@ struct vfio_device_feature_bus_master {
 };
 #define VFIO_DEVICE_FEATURE_BUS_MASTER 10
 
+/**
+ * Upon VFIO_DEVICE_FEATURE_GET, create a dma_buf fd for the
+ * regions selected.
+ *
+ * For struct vfio_device_feature_dma_buf, open_flags are the typical
+ * flags passed to open(2), e.g. O_RDWR, O_CLOEXEC, etc. nr_ranges is the
+ * total number of dma_ranges that comprise the dmabuf.
+ *
+ * For struct vfio_region_dma_range, region_index/offset/length specify a
+ * slice of the region to create the dmabuf from; if both offset & length
+ * are 0 then the whole region is used.
+ *
+ * Return: The fd number on success, -1 and errno is set on failure.
+ */
+struct vfio_region_dma_range {
+	__u32	region_index;
+	__u32	__pad;
+	__u64	offset;
+	__u64	length;
+};
+
+struct vfio_device_feature_dma_buf {
+	__u32	open_flags;
+	__u32	nr_ranges;
+	struct vfio_region_dma_range dma_ranges[];
+};
+
+#define VFIO_DEVICE_FEATURE_DMA_BUF 11
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
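A hedged userspace sketch of driving this feature (demo_export_bar0 and
the chosen values are illustrative, not from this series); it follows
the generic VFIO_DEVICE_FEATURE header-plus-payload convention:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int demo_export_bar0(int device_fd)
{
        char buf[sizeof(struct vfio_device_feature) +
                 sizeof(struct vfio_device_feature_dma_buf) +
                 sizeof(struct vfio_region_dma_range)]
                __attribute__((aligned(8))) = {};
        struct vfio_device_feature *feature = (void *)buf;
        struct vfio_device_feature_dma_buf *get_dma_buf =
                (void *)feature->data;
        struct vfio_region_dma_range *range = get_dma_buf->dma_ranges;

        feature->argsz = sizeof(buf);
        feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF;
        get_dma_buf->open_flags = O_RDWR | O_CLOEXEC;
        get_dma_buf->nr_ranges = 1;
        range->region_index = 0;        /* BAR0 */
        range->offset = 0;              /* offset/length both 0: whole BAR */
        range->length = 0;

        /* Returns a new dma-buf fd on success, -1 with errno on failure. */
        return ioctl(device_fd, VFIO_DEVICE_FEATURE, feature);
}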
From patchwork Tue Jan 7 14:27:12 2025
X-Patchwork-Submitter: Xu Yilun
X-Patchwork-Id: 13930050
From: Xu Yilun
Subject: [RFC PATCH 05/12] vfio/pci: Support get_pfn() callback for dma-buf
Date: Tue, 7 Jan 2025 22:27:12 +0800
Message-Id: <20250107142719.179636-6-yilun.xu@linux.intel.com>
In-Reply-To: <20250107142719.179636-1-yilun.xu@linux.intel.com>
References: <20250107142719.179636-1-yilun.xu@linux.intel.com>

Implement the get_pfn() callback for exported MMIO resources.

Signed-off-by: Xu Yilun

---
 drivers/vfio/pci/dma_buf.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/dma_buf.c b/drivers/vfio/pci/dma_buf.c
index 1d5f46744922..ad12cfb85099 100644
--- a/drivers/vfio/pci/dma_buf.c
+++ b/drivers/vfio/pci/dma_buf.c
@@ -33,8 +33,34 @@ static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment)
 static int vfio_pci_dma_buf_get_pfn(struct dma_buf_attachment *attachment,
 				    pgoff_t pgoff, u64 *pfn, int *max_order)
 {
-	/* TODO */
-	return -EOPNOTSUPP;
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+	struct vfio_region_dma_range *dma_ranges = priv->dma_ranges;
+	u64 offset = pgoff << PAGE_SHIFT;
+	int i;
+
+	dma_resv_assert_held(priv->dmabuf->resv);
+
+	if (priv->revoked)
+		return -ENODEV;
+
+	if (offset >= priv->dmabuf->size)
+		return -EINVAL;
+
+	for (i = 0; i < priv->nr_ranges; i++) {
+		if (offset < dma_ranges[i].length)
+			break;
+
+		offset -= dma_ranges[i].length;
+	}
+
+	*pfn = PHYS_PFN(pci_resource_start(priv->vdev->pdev, dma_ranges[i].region_index) +
+			dma_ranges[i].offset + offset);
+
+	/* TODO: large page mapping is yet to be supported */
+	if (max_order)
+		*max_order = 0;
+
+	return 0;
 }
 
 static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
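A small standalone sketch (not part of the patch) of the range-walk
arithmetic above, with two hypothetical page-aligned slices: 0x2000
bytes at BAR offset 0x1000, then 0x1000 bytes at offset 0x0. A dma-buf
offset of 0x2800 falls past the first slice (0x2800 - 0x2000 = 0x800),
so it resolves to the second slice at intra-range offset 0x800:

#include <stdint.h>
#include <stdio.h>

struct demo_range { uint64_t offset, length; }; /* mirrors vfio_region_dma_range */

static void demo_resolve(const struct demo_range *r, int n, uint64_t off)
{
        int i;

        for (i = 0; i < n; i++) {
                if (off < r[i].length)
                        break;
                off -= r[i].length;     /* skip earlier ranges */
        }
        printf("range %d, intra-range offset 0x%llx\n", i,
               (unsigned long long)(r[i].offset + off));
}

int main(void)
{
        struct demo_range ranges[] = { { 0x1000, 0x2000 }, { 0x0, 0x1000 } };

        demo_resolve(ranges, 2, 0x2800);        /* -> range 1, offset 0x800 */
        return 0;
}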
From patchwork Tue Jan 7 14:27:13 2025
X-Patchwork-Submitter: Xu Yilun
X-Patchwork-Id: 13930051
From: Xu Yilun
Subject: [RFC PATCH 06/12] KVM: Support vfio_dmabuf backed MMIO region
Date: Tue, 7 Jan 2025 22:27:13 +0800
Message-Id: <20250107142719.179636-7-yilun.xu@linux.intel.com>
In-Reply-To: <20250107142719.179636-1-yilun.xu@linux.intel.com>
References: <20250107142719.179636-1-yilun.xu@linux.intel.com>

Extend KVM_SET_USER_MEMORY_REGION2 to support mapping a vfio_dmabuf
backed MMIO region into a guest. The main purpose of this change is for
KVM to map MMIO resources without first mapping them into the host,
similar to what is done in guest_memfd. The immediate use case is for
CoCo VMs to support private MMIO.

Similar to private guest memory, private MMIO is also not intended to be
accessed by the host.
Host access to private MMIO would be rejected by private devices (known
as TDIs in the TDISP spec) and would cause the TDI to exit the secure
state. The further impact on the system may vary according to device
implementation. The TDISP spec doesn't mandate any error reporting or
logging; the TLP may be handled as an Unsupported Request, or just be
dropped. In my test environment, an AER NonFatalErr is reported with no
further impact. So from a HW perspective, disallowing host access to
private MMIO is not that critical, but nice to have.

But sticking to finding the pfn via a userspace mapping, while allowing
the pfn to be privately mapped, conflicts with the private mapping
concept. It effectively allows userspace to map any address as private.
Before fault-in, KVM cannot distinguish whether a userspace address is
for private MMIO and safe for host access. Relying on userspace mapping
also means the private MMIO mapping should follow userspace mapping
changes via mmu_notifier. This conflicts with the current design that
mmu_notifier never impacts private mappings. It also makes no sense to
support mmu_notifier just for private MMIO: the private MMIO mapping
should be fixed once the CoCo-VM accepts the private MMIO, and any
subsequent mapping change without guest permission should be invalid.

So the choice here is to eliminate the userspace mapping and switch to
FD based MMIO resources.

There is still a need to switch the memory attribute (shared <-> private)
for private MMIO when the guest switches the device attribute between
shared & private. Unlike memory, an MMIO region has only one physical
backend, so it is a bit like in-place conversion, which for private
memory requires much effort to invalidate the user mapping when
converting to private. But for MMIO, it is expected that the VMM never
needs to access assigned MMIO for feature emulation, so always disallow
userspace MMIO mapping and use FD based MMIO resources for a 'private
capable' MMIO region.

dma-buf is chosen as the FD based backend. It meets the need for KVM to
acquire non-struct page memory that can still have its lifetime
controlled by VFIO. It provides the option to disallow userspace mmap as
long as the exporter doesn't provide the dma_buf_ops.mmap() callback.
The concern is that it currently only supports mapping into a device's
default_domain via the DMA APIs. There are some clues to extending the
dma-buf APIs for subsystems like IOMMUFD [1] or KVM; the addition of
dma_buf_get_pfn_unlocked() in this series is for this purpose. An
alternative is for VFIO to provide a dedicated FD for KVM. But
considering IOMMUFD may use dma-buf for MMIO mapping [2], it is better
to have a unified export mechanism for the same purpose in VFIO.

Open: Currently the dmabuf fd parameter is stored in
kvm_userspace_memory_region2::guest_memfd. It may be confusing, but
avoids introducing another API format for
IOCTL(KVM_SET_USER_MEMORY_REGION3).

[1] https://lore.kernel.org/all/YwywgciH6BiWz4H1@nvidia.com/
[2] https://lore.kernel.org/kvm/14-v4-0de2f6c78ed0+9d1-iommufd_jgg@nvidia.com/
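A hypothetical userspace sketch of the proposed uAPI: bind a VFIO
dma-buf fd (e.g. obtained via VFIO_DEVICE_FEATURE_DMA_BUF) to a guest
MMIO range. Note the dmabuf fd reuses the guest_memfd field, per the
Open above; demo_* names and the slot number are illustrative only:

#include <sys/ioctl.h>
#include <linux/kvm.h>

int demo_map_private_mmio(int vm_fd, int dmabuf_fd, __u64 gpa, __u64 size)
{
        struct kvm_userspace_memory_region2 region = {
                .slot = 8,                      /* arbitrary free slot */
                .flags = KVM_MEM_VFIO_DMABUF,
                .guest_phys_addr = gpa,
                .memory_size = size,
                .userspace_addr = 0,            /* no host mapping */
                .guest_memfd = dmabuf_fd,       /* dmabuf fd, see Open above */
                .guest_memfd_offset = 0,        /* must be 0 */
        };

        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}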
[1] https://lore.kernel.org/all/YwywgciH6BiWz4H1@nvidia.com/
[2] https://lore.kernel.org/kvm/14-v4-0de2f6c78ed0+9d1-iommufd_jgg@nvidia.com/

Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 Documentation/virt/kvm/api.rst |   7 ++
 include/linux/kvm_host.h       |  18 +++++
 include/uapi/linux/kvm.h       |   1 +
 virt/kvm/Kconfig               |   6 ++
 virt/kvm/Makefile.kvm          |   1 +
 virt/kvm/kvm_main.c            |  32 +++++++--
 virt/kvm/kvm_mm.h              |  19 +++++
 virt/kvm/vfio_dmabuf.c         | 125 +++++++++++++++++++++++++++++++++
 8 files changed, 205 insertions(+), 4 deletions(-)
 create mode 100644 virt/kvm/vfio_dmabuf.c

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7911da34b9fd..f6199764a768 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6304,6 +6304,13 @@ state. At VM creation time, all memory is shared, i.e. the PRIVATE attribute
 is '0' for all gfns. Userspace can control whether memory is shared/private by
 toggling KVM_MEMORY_ATTRIBUTE_PRIVATE via KVM_SET_MEMORY_ATTRIBUTES as needed.
 
+Userspace can set KVM_MEM_VFIO_DMABUF in flags to indicate the memory region is
+backed by a userspace-unmappable dma_buf exported by VFIO. The backend resource
+is one piece of the device's MMIO region. The slot is unmappable, so it is
+allowed to be converted to private. KVM binds the memory region to a given
+dma_buf fd range of [0, memory_size]. For now, the dma_buf fd is filled in the
+'guest_memfd' field, and guest_memfd_offset must be 0.
+
 S390:
 ^^^^^

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0a141685872d..871d927485a5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -606,6 +606,10 @@ struct kvm_memory_slot {
 		pgoff_t pgoff;
 	} gmem;
 #endif
+
+#ifdef CONFIG_KVM_VFIO_DMABUF
+	struct dma_buf_attachment *dmabuf_attach;
+#endif
 };
 
 static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
@@ -2568,4 +2572,18 @@ static inline int kvm_enable_virtualization(void) { return 0; }
 static inline void kvm_disable_virtualization(void) { }
 #endif
 
+#ifdef CONFIG_KVM_VFIO_DMABUF
+int kvm_vfio_dmabuf_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			    gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
+#else
+static inline int kvm_vfio_dmabuf_get_pfn(struct kvm *kvm,
+					  struct kvm_memory_slot *slot,
+					  gfn_t gfn, kvm_pfn_t *pfn,
+					  int *max_order)
+{
+	KVM_BUG_ON(1, kvm);
+	return -EIO;
+}
+#endif
+
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1dae36cbfd52..4f5b5def182a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -51,6 +51,7 @@ struct kvm_userspace_memory_region2 {
 #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 #define KVM_MEM_READONLY	(1UL << 1)
 #define KVM_MEM_GUEST_MEMFD	(1UL << 2)
+#define KVM_MEM_VFIO_DMABUF	(1UL << 3)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 54e959e7d68f..68fff3fb1841 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -115,6 +115,7 @@ config KVM_PRIVATE_MEM
 config KVM_GENERIC_PRIVATE_MEM
 	select KVM_GENERIC_MEMORY_ATTRIBUTES
 	select KVM_PRIVATE_MEM
+	select KVM_VFIO_DMABUF
 	bool
 
 config HAVE_KVM_ARCH_GMEM_PREPARE
@@ -124,3 +125,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
 config HAVE_KVM_ARCH_GMEM_INVALIDATE
 	bool
 	depends on KVM_PRIVATE_MEM
+
+config KVM_VFIO_DMABUF
+	bool
+	select DMA_SHARED_BUFFER
+	select DMABUF_MOVE_NOTIFY
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 724c89af78af..c08e98f13f65 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -13,3 +13,4 @@
 kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
 kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
 kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
 kvm-$(CONFIG_KVM_PRIVATE_MEM) += $(KVM)/guest_memfd.o
+kvm-$(CONFIG_KVM_VFIO_DMABUF) += $(KVM)/vfio_dmabuf.o
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4a13de82479d..c9342d88f06c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -938,6 +938,8 @@ static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
 	if (slot->flags & KVM_MEM_GUEST_MEMFD)
 		kvm_gmem_unbind(slot);
+	else if (slot->flags & KVM_MEM_VFIO_DMABUF)
+		kvm_vfio_dmabuf_unbind(slot);
 
 	kvm_destroy_dirty_bitmap(slot);
 
@@ -1526,13 +1528,19 @@ static void kvm_replace_memslot(struct kvm *kvm,
 static int check_memory_region_flags(struct kvm *kvm,
 				     const struct kvm_userspace_memory_region2 *mem)
 {
+	u32 private_mask = KVM_MEM_GUEST_MEMFD | KVM_MEM_VFIO_DMABUF;
+	u32 private_flag = mem->flags & private_mask;
 	u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
 
+	/* private flags are mutually exclusive. */
+	if (private_flag & (private_flag - 1))
+		return -EINVAL;
+
 	if (kvm_arch_has_private_mem(kvm))
-		valid_flags |= KVM_MEM_GUEST_MEMFD;
+		valid_flags |= private_flag;
 
 	/* Dirty logging private memory is not currently supported. */
-	if (mem->flags & KVM_MEM_GUEST_MEMFD)
+	if (private_flag)
 		valid_flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
 
 	/*
@@ -1540,8 +1548,7 @@ static int check_memory_region_flags(struct kvm *kvm,
 	 * read-only memslots have emulated MMIO, not page fault, semantics,
 	 * and KVM doesn't allow emulated MMIO for private memory.
 	 */
-	if (kvm_arch_has_readonly_mem(kvm) &&
-	    !(mem->flags & KVM_MEM_GUEST_MEMFD))
+	if (kvm_arch_has_readonly_mem(kvm) && !private_flag)
 		valid_flags |= KVM_MEM_READONLY;
 
 	if (mem->flags & ~valid_flags)
@@ -2044,6 +2051,21 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		r = kvm_gmem_bind(kvm, new, mem->guest_memfd, mem->guest_memfd_offset);
 		if (r)
 			goto out;
+	} else if (mem->flags & KVM_MEM_VFIO_DMABUF) {
+		if (mem->guest_memfd_offset) {
+			r = -EINVAL;
+			goto out;
+		}
+
+		/*
+		 * Open: It may be confusing to store the dmabuf fd parameter
+		 * in kvm_userspace_memory_region2::guest_memfd, but this
+		 * avoids introducing another format for
+		 * IOCTL(KVM_SET_USER_MEMORY_REGIONX).
+		 */
+		r = kvm_vfio_dmabuf_bind(kvm, new, mem->guest_memfd);
+		if (r)
+			goto out;
 	}
 
 	r = kvm_set_memslot(kvm, old, new, change);
@@ -2055,6 +2077,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
 out_unbind:
 	if (mem->flags & KVM_MEM_GUEST_MEMFD)
 		kvm_gmem_unbind(new);
+	else if (mem->flags & KVM_MEM_VFIO_DMABUF)
+		kvm_vfio_dmabuf_unbind(new);
 out:
 	kfree(new);
 	return r;
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index acef3f5c582a..faefc252c337 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -93,4 +93,23 @@ static inline void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 }
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
+#ifdef CONFIG_KVM_VFIO_DMABUF
+int kvm_vfio_dmabuf_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
+			 unsigned int fd);
+void kvm_vfio_dmabuf_unbind(struct kvm_memory_slot *slot);
+#else
+static inline int kvm_vfio_dmabuf_bind(struct kvm *kvm,
+				       struct kvm_memory_slot *slot,
+				       unsigned int fd)
+{
+	WARN_ON_ONCE(1);
+	return -EIO;
+}
+
+static inline void kvm_vfio_dmabuf_unbind(struct kvm_memory_slot *slot)
+{
+	WARN_ON_ONCE(1);
+}
+#endif /* CONFIG_KVM_VFIO_DMABUF */
+
 #endif /* __KVM_MM_H__ */
diff --git a/virt/kvm/vfio_dmabuf.c b/virt/kvm/vfio_dmabuf.c
new file mode 100644
index 000000000000..c427ab39c68a
--- /dev/null
+++ b/virt/kvm/vfio_dmabuf.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/dma-buf.h>
+#include <linux/kvm_host.h>
+#include <linux/module.h>
+
+#include "kvm_mm.h"
+
+MODULE_IMPORT_NS("DMA_BUF");
+
+struct kvm_vfio_dmabuf {
+	struct kvm *kvm;
+	struct kvm_memory_slot *slot;
+};
+
+static void kv_dmabuf_move_notify(struct dma_buf_attachment *attach)
+{
+	struct kvm_vfio_dmabuf *kv_dmabuf = attach->importer_priv;
+	struct kvm_memory_slot *slot = kv_dmabuf->slot;
+	struct kvm *kvm = kv_dmabuf->kvm;
+	bool flush = false;
+
+	struct kvm_gfn_range gfn_range = {
+		.start = slot->base_gfn,
+		.end = slot->base_gfn + slot->npages,
+		.slot = slot,
+		.may_block = true,
+		.attr_filter = KVM_FILTER_PRIVATE | KVM_FILTER_SHARED,
+	};
+
+	KVM_MMU_LOCK(kvm);
+	kvm_mmu_invalidate_begin(kvm);
+	flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
+	if (flush)
+		kvm_flush_remote_tlbs(kvm);
+
+	kvm_mmu_invalidate_end(kvm);
+	KVM_MMU_UNLOCK(kvm);
+}
+
+static const struct dma_buf_attach_ops kv_dmabuf_attach_ops = {
+	.allow_peer2peer = true,
+	.move_notify = kv_dmabuf_move_notify,
+};
+
+int kvm_vfio_dmabuf_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
+			 unsigned int fd)
+{
+	size_t size = slot->npages << PAGE_SHIFT;
+	struct dma_buf_attachment *attach;
+	struct kvm_vfio_dmabuf *kv_dmabuf;
+	struct dma_buf *dmabuf;
+	int ret;
+
+	dmabuf = dma_buf_get(fd);
+	if (IS_ERR(dmabuf))
+		return PTR_ERR(dmabuf);
+
+	if (size != dmabuf->size) {
+		ret = -EINVAL;
+		goto err_dmabuf;
+	}
+
+	kv_dmabuf = kzalloc(sizeof(*kv_dmabuf), GFP_KERNEL);
+	if (!kv_dmabuf) {
+		ret = -ENOMEM;
+		goto err_dmabuf;
+	}
+
+	kv_dmabuf->kvm = kvm;
+	kv_dmabuf->slot = slot;
+	attach = dma_buf_dynamic_attach(dmabuf, NULL, &kv_dmabuf_attach_ops,
+					kv_dmabuf);
+	if (IS_ERR(attach)) {
+		ret = PTR_ERR(attach);
+		goto err_kv_dmabuf;
+	}
+
+	slot->dmabuf_attach = attach;
+
+	return 0;
+
+err_kv_dmabuf:
+	kfree(kv_dmabuf);
+err_dmabuf:
+	dma_buf_put(dmabuf);
+	return ret;
+}
+
+void kvm_vfio_dmabuf_unbind(struct kvm_memory_slot *slot)
+{
+	struct dma_buf_attachment *attach = slot->dmabuf_attach;
+	struct kvm_vfio_dmabuf *kv_dmabuf;
+	struct dma_buf *dmabuf;
+
+	if (WARN_ON_ONCE(!attach))
+		return;
+
+	kv_dmabuf = attach->importer_priv;
+	dmabuf = attach->dmabuf;
+	dma_buf_detach(dmabuf, attach);
+	kfree(kv_dmabuf);
+	dma_buf_put(dmabuf);
+}
+
+/*
+ * The return value matters. If -EFAULT is returned, userspace will try
+ * a page attribute (shared <-> private) conversion.
+ */
+int kvm_vfio_dmabuf_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			    gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
+{
+	struct dma_buf_attachment *attach = slot->dmabuf_attach;
+	pgoff_t pgoff = gfn - slot->base_gfn;
+	int ret;
+
+	if (WARN_ON_ONCE(!attach))
+		return -EFAULT;
+
+	ret = dma_buf_get_pfn_unlocked(attach, pgoff, pfn, max_order);
+	if (ret)
+		return -EIO;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_vfio_dmabuf_get_pfn);

From patchwork Tue Jan 7 14:27:14 2025
X-Patchwork-Id: 13930052
From: Xu Yilun
Subject: [RFC PATCH 07/12] KVM: x86/mmu: Handle page fault for vfio_dmabuf backed MMIO
Date: Tue, 7 Jan 2025 22:27:14 +0800
Message-Id: <20250107142719.179636-8-yilun.xu@linux.intel.com>
In-Reply-To: <20250107142719.179636-1-yilun.xu@linux.intel.com>

Add support for resolving page faults on vfio_dmabuf backed MMIO. This is to
support private MMIO for private assigned devices (known as TDIs in the TDISP
spec).

Private MMIO is set to KVM as a vfio_dmabuf typed memory slot, another type
of can-be-private memory slot just like the gmem slot. As with a gmem slot,
KVM needs to map its GFNs as shared or private based on the current state of
the GFN's memory attribute. When a page fault happens for private MMIO but a
private <-> shared conversion is needed, KVM still exits to userspace with
exit reason KVM_EXIT_MEMORY_FAULT and toggles KVM_MEMORY_EXIT_FLAG_PRIVATE.
Unlike a gmem slot, a vfio_dmabuf slot has only one backend MMIO resource, so
switching the GFN's attribute doesn't change the way of getting the PFN: the
vfio_dmabuf specific way, kvm_vfio_dmabuf_get_pfn().
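For context, here is a minimal sketch of the userspace side of that
conversion flow (hypothetical VMM code, not part of this patch; it assumes
`run` is the vcpu's mmapped kvm_run structure):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Hypothetical sketch: on KVM_EXIT_MEMORY_FAULT, flip the faulted range
 * to the attribute KVM reported, then re-enter the guest. For a
 * vfio_dmabuf slot the PFN source stays the same; only the mapping
 * attribute changes.
 */
static int handle_memory_fault_exit(int vm_fd, struct kvm_run *run)
{
	struct kvm_memory_attributes attrs = {
		.address = run->memory_fault.gpa,
		.size = run->memory_fault.size,
		.attributes = (run->memory_fault.flags &
			       KVM_MEMORY_EXIT_FLAG_PRIVATE) ?
			      KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
	};

	return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
}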
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 arch/x86/kvm/mmu/mmu.c   | 25 +++++++++++++++++++++++--
 include/linux/kvm_host.h |  7 ++++++-
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 713ca857f2c2..90ca54fee22f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4341,8 +4341,13 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
 		return -EFAULT;
 	}
 
-	r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
-			     &fault->refcounted_page, &max_order);
+	if (kvm_slot_is_vfio_dmabuf(fault->slot))
+		r = kvm_vfio_dmabuf_get_pfn(vcpu->kvm, fault->slot, fault->gfn,
+					    &fault->pfn, &max_order);
+	else
+		r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn,
+				     &fault->pfn, &fault->refcounted_page,
+				     &max_order);
 	if (r) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
 		return r;
@@ -4363,6 +4368,22 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
 	if (fault->is_private)
 		return kvm_mmu_faultin_pfn_private(vcpu, fault);
 
+	/* vfio_dmabuf slot is also applicable for shared mapping */
+	if (kvm_slot_is_vfio_dmabuf(fault->slot)) {
+		int max_order, r;
+
+		r = kvm_vfio_dmabuf_get_pfn(vcpu->kvm, fault->slot, fault->gfn,
+					    &fault->pfn, &max_order);
+		if (r)
+			return r;
+
+		fault->max_level = min(kvm_max_level_for_order(max_order),
+				       fault->max_level);
+		fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
+
+		return RET_PF_CONTINUE;
+	}
+
 	foll |= FOLL_NOWAIT;
 	fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn, foll,
 				       &fault->map_writable, &fault->refcounted_page);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 871d927485a5..966a5a247c6b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -614,7 +614,12 @@ struct kvm_memory_slot {
 
 static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
 {
-	return slot && (slot->flags & KVM_MEM_GUEST_MEMFD);
+	return slot && (slot->flags & (KVM_MEM_GUEST_MEMFD | KVM_MEM_VFIO_DMABUF));
+}
+
+static inline bool kvm_slot_is_vfio_dmabuf(const struct kvm_memory_slot *slot)
+{
+	return slot && (slot->flags & KVM_MEM_VFIO_DMABUF);
 }
 
 static inline bool kvm_slot_dirty_track_enabled(const struct kvm_memory_slot *slot)

From patchwork Tue Jan 7 14:27:15 2025
X-Patchwork-Id: 13930053
From: Xu Yilun
Subject: [RFC PATCH 08/12] vfio/pci: Create host unaccessible dma-buf for private device
Date: Tue, 7 Jan 2025 22:27:15 +0800
Message-Id: <20250107142719.179636-9-yilun.xu@linux.intel.com>
In-Reply-To: <20250107142719.179636-1-yilun.xu@linux.intel.com>

Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as intended
for private assignment. For these private assigned devices, disallow host
access to their MMIO resources.

Since the MMIO regions for private assignment are not accessible from the
host, remove VFIO_REGION_INFO_FLAG_MMAP/READ/WRITE for these regions and
instead add a new VFIO_REGION_INFO_FLAG_PRIVATE flag to indicate that users
should create a dma-buf for MMIO mapping in the KVM MMU.
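To show how the two uAPI changes fit together, a short userspace sketch
(the helper names are illustrative only; the ioctls and flags are the ones
added or reused by this patch):

#include <linux/vfio.h>
#include <stdbool.h>
#include <sys/ioctl.h>

/* Bind a cdev fd for private assignment. */
static int bind_private(int cdev_fd, int iommufd)
{
	struct vfio_device_bind_iommufd bind = {
		.argsz = sizeof(bind),
		.flags = VFIO_DEVICE_BIND_IOMMUFD_PRIVATE,
		.iommufd = iommufd,
	};

	return ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
}

/*
 * A private BAR reports only VFIO_REGION_INFO_FLAG_PRIVATE: no mmap,
 * read or write; users should create a dma-buf for KVM MMU mapping.
 */
static bool bar_is_private(int cdev_fd, __u32 index)
{
	struct vfio_region_info info = {
		.argsz = sizeof(info),
		.index = index,
	};

	if (ioctl(cdev_fd, VFIO_DEVICE_GET_REGION_INFO, &info))
		return false;

	return info.flags & VFIO_REGION_INFO_FLAG_PRIVATE;
}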
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 drivers/vfio/device_cdev.c       |  9 ++++++++-
 drivers/vfio/pci/vfio_pci_core.c | 14 ++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h |  2 ++
 drivers/vfio/pci/vfio_pci_rdwr.c |  3 +++
 include/linux/vfio.h             |  1 +
 include/uapi/linux/vfio.h        |  5 ++++-
 6 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index bb1817bd4ff3..919285c1cd7a 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -75,7 +75,10 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
 	if (copy_from_user(&bind, arg, minsz))
 		return -EFAULT;
 
-	if (bind.argsz < minsz || bind.flags || bind.iommufd < 0)
+	if (bind.argsz < minsz || bind.iommufd < 0)
+		return -EINVAL;
+
+	if (bind.flags & ~(VFIO_DEVICE_BIND_IOMMUFD_PRIVATE))
 		return -EINVAL;
 
 	/* BIND_IOMMUFD only allowed for cdev fds */
@@ -118,6 +121,9 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
 		goto out_close_device;
 
 	device->cdev_opened = true;
+	if (bind.flags & VFIO_DEVICE_BIND_IOMMUFD_PRIVATE)
+		device->is_private = true;
+
 	/*
 	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
 	 * read/write/mmap
@@ -151,6 +157,7 @@ void vfio_df_unbind_iommufd(struct vfio_device_file *df)
 		return;
 
 	mutex_lock(&device->dev_set->lock);
+	device->is_private = false;
 	vfio_df_close(df);
 	vfio_device_put_kvm(device);
 	iommufd_ctx_put(df->iommufd);
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index f69eda5956ad..11c735dfe1f7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1005,6 +1005,12 @@ static int vfio_pci_ioctl_get_info(struct vfio_pci_core_device *vdev,
 	return copy_to_user(arg, &info, minsz) ? -EFAULT : 0;
 }
 
+bool is_vfio_pci_bar_private(struct vfio_pci_core_device *vdev, int bar)
+{
+	/* Any mmap supported bar can be used as vfio dmabuf */
+	return vdev->bar_mmap_supported[bar] && vdev->vdev.is_private;
+}
+
 static int vfio_pci_ioctl_get_region_info(struct vfio_pci_core_device *vdev,
 					  struct vfio_region_info __user *arg)
 {
@@ -1035,6 +1041,11 @@ static int vfio_pci_ioctl_get_region_info(struct vfio_pci_core_device *vdev,
 			break;
 		}
 
+		if (is_vfio_pci_bar_private(vdev, info.index)) {
+			info.flags = VFIO_REGION_INFO_FLAG_PRIVATE;
+			break;
+		}
+
 		info.flags = VFIO_REGION_INFO_FLAG_READ |
 			     VFIO_REGION_INFO_FLAG_WRITE;
 		if (vdev->bar_mmap_supported[info.index]) {
@@ -1735,6 +1746,9 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
 	u64 phys_len, req_len, pgoff, req_start;
 	int ret;
 
+	if (vdev->vdev.is_private)
+		return -EINVAL;
+
 	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
 
 	if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions)
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index d27f383f3931..2b61e35145fd 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -126,4 +126,6 @@ static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
 }
 #endif
 
+bool is_vfio_pci_bar_private(struct vfio_pci_core_device *vdev, int bar);
+
 #endif
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 66b72c289284..e385f7f63414 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -242,6 +242,9 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 	struct resource *res = &vdev->pdev->resource[bar];
 	ssize_t done;
 
+	if (is_vfio_pci_bar_private(vdev, bar))
+		return -EINVAL;
+
 	if (pci_resource_start(pdev, bar))
 		end = pci_resource_len(pdev, bar);
 	else if (bar == PCI_ROM_RESOURCE &&
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 2258b0585330..e99d856c6cd8 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -69,6 +69,7 @@ struct vfio_device {
 	struct iommufd_device *iommufd_device;
 	u8 iommufd_attached:1;
 #endif
+	u8 is_private:1;
 	u8 cdev_opened:1;
 #ifdef CONFIG_DEBUG_FS
 	/*
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index f43dfbde7352..6a1c703e3185 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -275,6 +275,7 @@ struct vfio_region_info {
 #define VFIO_REGION_INFO_FLAG_WRITE	(1 << 1) /* Region supports write */
 #define VFIO_REGION_INFO_FLAG_MMAP	(1 << 2) /* Region supports mmap */
 #define VFIO_REGION_INFO_FLAG_CAPS	(1 << 3) /* Info supports caps */
+#define VFIO_REGION_INFO_FLAG_PRIVATE	(1 << 4) /* Region supports private MMIO */
 	__u32	index;		/* Region index */
 	__u32	cap_offset;	/* Offset within info struct of first cap */
 	__aligned_u64	size;	/* Region size (bytes) */
@@ -904,7 +905,8 @@ struct vfio_device_feature {
 * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 18,
 *				   struct vfio_device_bind_iommufd)
 * @argsz:	User filled size of this data.
- * @flags:	Must be 0.
+ * @flags:	Optional device initialization flags:
+ *		VFIO_DEVICE_BIND_IOMMUFD_PRIVATE: for private assignment
 * @iommufd:	iommufd to bind.
 * @out_devid:	The device id generated by this bind. devid is a handle for
 *		this device/iommufd bond and can be used in IOMMUFD commands.
@@ -921,6 +923,7 @@ struct vfio_device_feature {
 struct vfio_device_bind_iommufd {
 	__u32		argsz;
 	__u32		flags;
+#define VFIO_DEVICE_BIND_IOMMUFD_PRIVATE	(1 << 0)
 	__s32		iommufd;
 	__u32		out_devid;
 };

From patchwork Tue Jan 7 14:27:16 2025
X-Patchwork-Id: 13930054
From: Xu Yilun
Subject: [RFC PATCH 09/12] vfio/pci: Export vfio dma-buf specific info for importers
Date: Tue, 7 Jan 2025 22:27:16 +0800
Message-Id: <20250107142719.179636-10-yilun.xu@linux.intel.com>
In-Reply-To: <20250107142719.179636-1-yilun.xu@linux.intel.com>

VFIO dma-buf supports exporting host-unaccessible MMIO regions for private
assignment. Export this info by attaching VFIO specific dma-buf data to
struct dma_buf::priv, and provide a helper, vfio_dma_buf_get_data(), for
importers to fetch it.

The exported host-unaccessible info lets importers decide whether the dma-buf
is suitable for them. KVM only allows host-unaccessible MMIO regions for a
private MMIO slot, but it is expected that other importers (e.g. an RDMA
driver, or IOMMUFD) may also use the dma-buf mechanism for P2P in native or
non-CoCo VM setups, in which case host unaccessibility is not required.

Also export the struct kvm * handle attached to the vfio device. This allows
KVM to do another sanity check: MMIO should only be assigned to a CoCo VM if
its owner device is already assigned to the same VM.
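As a sketch of the intended usage, a hypothetical non-KVM importer that must
access the MMIO from the host could reject private buffers like this
(illustrative only; only vfio_dma_buf_get_data() and struct vfio_dma_buf_data
below are real parts of this patch):

#include <linux/dma-buf.h>
#include <linux/err.h>
#include <linux/vfio.h>

/* Hypothetical importer-side check for a host-accessing P2P user. */
static int example_importer_check(struct dma_buf *dmabuf)
{
	struct vfio_dma_buf_data *data = vfio_dma_buf_get_data(dmabuf);

	if (IS_ERR_OR_NULL(data))
		return -EINVAL;	/* not a VFIO-exported dma-buf */

	if (data->is_private)
		return -EPERM;	/* host access is disallowed */

	return 0;
}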
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 drivers/vfio/pci/dma_buf.c | 24 ++++++++++++++++++++++++
 include/linux/vfio.h       | 19 +++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/drivers/vfio/pci/dma_buf.c b/drivers/vfio/pci/dma_buf.c
index ad12cfb85099..ad984f2c22fc 100644
--- a/drivers/vfio/pci/dma_buf.c
+++ b/drivers/vfio/pci/dma_buf.c
@@ -9,6 +9,8 @@
 MODULE_IMPORT_NS("DMA_BUF");
 
 struct vfio_pci_dma_buf {
+	struct vfio_dma_buf_data export_data;
+
 	struct dma_buf *dmabuf;
 	struct vfio_pci_core_device *vdev;
 	struct list_head dmabufs_elm;
@@ -156,6 +158,14 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 	priv->vdev = vdev;
 	priv->nr_ranges = get_dma_buf.nr_ranges;
 	priv->dma_ranges = dma_ranges;
+	/*
+	 * KVM expects a private dma_buf. A private dma_buf must not support
+	 * dma_buf_ops.map_dma_buf/mmap/vmap(). The exporter must also ensure
+	 * no side channel access for the backend resource, e.g.
+	 * vfio_device_ops.mmap() should not be supported.
+	 */
+	priv->export_data.is_private = vdev->vdev.is_private;
+	priv->export_data.kvm = vdev->vdev.kvm;
 
 	ret = check_dma_ranges(priv, &dmabuf_size);
 	if (ret)
@@ -247,3 +257,17 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
 	}
 	up_write(&vdev->memory_lock);
 }
+
+/*
+ * Only vfio/pci implements this, so put the helper here for now.
+ */
+struct vfio_dma_buf_data *vfio_dma_buf_get_data(struct dma_buf *dmabuf)
+{
+	struct vfio_pci_dma_buf *priv = dmabuf->priv;
+
+	if (dmabuf->ops != &vfio_pci_dmabuf_ops)
+		return ERR_PTR(-EINVAL);
+
+	return &priv->export_data;
+}
+EXPORT_SYMBOL_GPL(vfio_dma_buf_get_data);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e99d856c6cd8..fd7669e5b276 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -9,6 +9,7 @@
 #define VFIO_H
 
+#include <linux/dma-buf.h>
 #include <linux/iommu.h>
 #include <linux/mm.h>
 #include <linux/workqueue.h>
@@ -370,4 +371,22 @@ int vfio_virqfd_enable(void *opaque, int (*handler)(void *, void *),
 void vfio_virqfd_disable(struct virqfd **pvirqfd);
 void vfio_virqfd_flush_thread(struct virqfd **pvirqfd);
 
+/*
+ * DMA-buf - generic
+ */
+struct vfio_dma_buf_data {
+	bool is_private;
+	struct kvm *kvm;
+};
+
+#if IS_ENABLED(CONFIG_DMA_SHARED_BUFFER) && IS_ENABLED(CONFIG_VFIO_PCI_CORE)
+struct vfio_dma_buf_data *vfio_dma_buf_get_data(struct dma_buf *dmabuf);
+#else
+static inline
+struct vfio_dma_buf_data *vfio_dma_buf_get_data(struct dma_buf *dmabuf)
+{
+	return NULL;
+}
+#endif
+
 #endif /* VFIO_H */

From patchwork Tue Jan 7 14:27:17 2025
X-Patchwork-Id: 13930055
From: Xu Yilun
Subject: [RFC PATCH 10/12] KVM: vfio_dmabuf: Fetch VFIO specific dma-buf data for sanity check
Date: Tue, 7 Jan 2025 22:27:17 +0800
Message-Id: <20250107142719.179636-11-yilun.xu@linux.intel.com>
In-Reply-To: <20250107142719.179636-1-yilun.xu@linux.intel.com>

Fetch the VFIO specific dma-buf data to see if the dma-buf is eligible to be
assigned to a CoCo VM as private MMIO. KVM expects a host-unaccessible
dma-buf for private MMIO mapping, so the exporter needs to provide this
information. VFIO also provides the struct kvm *kvm handle, which KVM uses
to check that the owner device of the MMIO region is already assigned to the
same CoCo VM.
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 virt/kvm/vfio_dmabuf.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/virt/kvm/vfio_dmabuf.c b/virt/kvm/vfio_dmabuf.c
index c427ab39c68a..26e01b815ebf 100644
--- a/virt/kvm/vfio_dmabuf.c
+++ b/virt/kvm/vfio_dmabuf.c
@@ -12,6 +12,22 @@ struct kvm_vfio_dmabuf {
 	struct kvm_memory_slot *slot;
 };
 
+static struct vfio_dma_buf_data *kvm_vfio_dma_buf_get_data(struct dma_buf *dmabuf)
+{
+	struct vfio_dma_buf_data *(*fn)(struct dma_buf *dmabuf);
+	struct vfio_dma_buf_data *ret;
+
+	fn = symbol_get(vfio_dma_buf_get_data);
+	if (!fn)
+		return ERR_PTR(-ENOENT);
+
+	ret = fn(dmabuf);
+
+	symbol_put(vfio_dma_buf_get_data);
+
+	return ret;
+}
+
 static void kv_dmabuf_move_notify(struct dma_buf_attachment *attach)
 {
 	struct kvm_vfio_dmabuf *kv_dmabuf = attach->importer_priv;
@@ -48,6 +64,7 @@ int kvm_vfio_dmabuf_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 	size_t size = slot->npages << PAGE_SHIFT;
 	struct dma_buf_attachment *attach;
 	struct kvm_vfio_dmabuf *kv_dmabuf;
+	struct vfio_dma_buf_data *data;
 	struct dma_buf *dmabuf;
 	int ret;
 
@@ -60,6 +77,17 @@ int kvm_vfio_dmabuf_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 		goto err_dmabuf;
 	}
 
+	data = kvm_vfio_dma_buf_get_data(dmabuf);
+	if (IS_ERR(data)) {
+		ret = PTR_ERR(data);
+		goto err_dmabuf;
+	}
+
+	if (!data->is_private || data->kvm != kvm) {
+		ret = -EINVAL;
+		goto err_dmabuf;
+	}
+
 	kv_dmabuf = kzalloc(sizeof(*kv_dmabuf), GFP_KERNEL);
 	if (!kv_dmabuf) {
 		ret = -ENOMEM;

From patchwork Tue Jan 7 14:27:18 2025
X-Patchwork-Id: 13930056
From: Xu Yilun
Subject: [RFC PATCH 11/12] KVM: x86/mmu: Export kvm_is_mmio_pfn()
Date: Tue, 7 Jan 2025 22:27:18 +0800
Message-Id: <20250107142719.179636-12-yilun.xu@linux.intel.com>
In-Reply-To: <20250107142719.179636-1-yilun.xu@linux.intel.com>

Export kvm_is_mmio_pfn() so that KVM TDX can decide which seamcall should be
used to set up an SEPT leaf entry. The TDX Module requires tdh_mem_page_aug()
for memory page setup, and tdh_mmio_map() for MMIO setup.
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 arch/x86/kvm/mmu.h      | 1 +
 arch/x86/kvm/mmu/spte.c | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e40097c7e8d4..23ff0e6c9ef6 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -102,6 +102,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			 int bytes);
+bool kvm_is_mmio_pfn(kvm_pfn_t pfn);
 
 static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index e819d16655b6..0a9a81afba93 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -105,7 +105,7 @@ u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access)
 	return spte;
 }
 
-static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
+bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
 {
 	if (pfn_valid(pfn))
 		return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn)) &&
@@ -125,6 +125,7 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
 			pfn_to_hpa(pfn + 1) - 1,
 			E820_TYPE_RAM);
 }
+EXPORT_SYMBOL_GPL(kvm_is_mmio_pfn);
 
 /*
  * Returns true if the SPTE has bits that may be set without holding mmu_lock.

From patchwork Tue Jan 7 14:27:19 2025
X-Patchwork-Id: 13930057
From: Xu Yilun
Subject: [RFC PATCH 12/12] KVM: TDX: Implement TDX specific private MMIO map/unmap for SEPT
Date: Tue, 7 Jan 2025 22:27:19 +0800
Message-Id: <20250107142719.179636-13-yilun.xu@linux.intel.com>
In-Reply-To: <20250107142719.179636-1-yilun.xu@linux.intel.com>

Implement TDX specific private MMIO map/unmap in the existing TDP MMU hooks.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
TODO: This patch is still based on the earlier kvm-coco-queue version
(v6.13-rc2). Will follow up on the latest SEAMCALL wrapper change [1].
[1] https://lore.kernel.org/all/20250101074959.412696-1-pbonzini@redhat.com/
---
 arch/x86/include/asm/tdx.h  |  3 ++
 arch/x86/kvm/vmx/tdx.c      | 57 +++++++++++++++++++++++++++++++++++--
 arch/x86/virt/vmx/tdx/tdx.c | 52 +++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  3 ++
 4 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 01409a59224d..7d158bbf79f4 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -151,6 +151,9 @@ u64 tdh_mem_page_remove(u64 tdr, u64 gpa, u64 level, u64 *rcx, u64 *rdx);
 u64 tdh_phymem_cache_wb(bool resume);
 u64 tdh_phymem_page_wbinvd_tdr(u64 tdr);
 u64 tdh_phymem_page_wbinvd_hkid(u64 hpa, u64 hkid);
+u64 tdh_mmio_map(u64 tdr, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx);
+u64 tdh_mmio_block(u64 tdr, u64 gpa, u64 level, u64 *rcx, u64 *rdx);
+u64 tdh_mmio_unmap(u64 tdr, u64 gpa, u64 level, u64 *rcx, u64 *rdx);
 #else
 static inline void tdx_init(void) { }
 static inline int tdx_cpu_enable(void) { return -ENODEV; }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 69ef9c967fbf..9b43a2ee2203 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1576,6 +1576,29 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
+static int tdx_mmio_map(struct kvm *kvm, gfn_t gfn,
+			enum pg_level level, kvm_pfn_t pfn)
+{
+	int tdx_level = pg_level_to_tdx_sept_level(level);
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	hpa_t hpa = pfn_to_hpa(pfn);
+	gpa_t gpa = gfn_to_gpa(gfn);
+	u64 entry, level_state;
+	u64 err;
+
+	err = tdh_mmio_map(kvm_tdx->tdr_pa, gpa, tdx_level, hpa,
+			   &entry, &level_state);
+	if (unlikely(err & TDX_OPERAND_BUSY))
+		return -EBUSY;
+
+	if (KVM_BUG_ON(err, kvm)) {
+		pr_tdx_error_2(TDH_MMIO_MAP, err, entry, level_state);
+		return -EIO;
+	}
+
+	return 0;
+}
+
 /*
  * KVM_TDX_INIT_MEM_REGION calls kvm_gmem_populate() to get guest pages and
  * tdx_gmem_post_populate() to premap page table pages into private EPT.
@@ -1610,6 +1633,9 @@ int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
 		return -EINVAL;
 
+	if (kvm_is_mmio_pfn(pfn))
+		return tdx_mmio_map(kvm, gfn, level, pfn);
+
 	/*
 	 * Because guest_memfd doesn't support page migration with
 	 * a_ops->migrate_folio (yet), no callback is triggered for KVM on page
@@ -1647,6 +1673,20 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (KVM_BUG_ON(!is_hkid_assigned(kvm_tdx), kvm))
 		return -EINVAL;
 
+	if (kvm_is_mmio_pfn(pfn)) {
+		do {
+			err = tdh_mmio_unmap(kvm_tdx->tdr_pa, gpa, tdx_level,
+					     &entry, &level_state);
+		} while (unlikely(err == TDX_ERROR_SEPT_BUSY));
+
+		if (KVM_BUG_ON(err, kvm)) {
+			pr_tdx_error_2(TDH_MMIO_UNMAP, err, entry, level_state);
+			return -EIO;
+		}
+
+		return 0;
+	}
+
 	do {
 		/*
 		 * When zapping private page, write lock is held. So no race
@@ -1715,7 +1755,7 @@ int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
 }
 
 static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level)
+				     enum pg_level level, kvm_pfn_t pfn)
 {
 	int tdx_level = pg_level_to_tdx_sept_level(level);
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
@@ -1725,6 +1765,19 @@ static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn,
 	/* For now large page isn't supported yet. */
 	WARN_ON_ONCE(level != PG_LEVEL_4K);
 
+	if (kvm_is_mmio_pfn(pfn)) {
+		err = tdh_mmio_block(kvm_tdx->tdr_pa, gpa, tdx_level,
+				     &entry, &level_state);
+		if (unlikely(err == TDX_ERROR_SEPT_BUSY))
+			return -EAGAIN;
+		if (KVM_BUG_ON(err, kvm)) {
+			pr_tdx_error_2(TDH_MMIO_BLOCK, err, entry, level_state);
+			return -EIO;
+		}
+
+		return 0;
+	}
+
 	err = tdh_mem_range_block(kvm_tdx->tdr_pa, gpa, tdx_level, &entry, &level_state);
 	if (unlikely(err == TDX_ERROR_SEPT_BUSY))
 		return -EAGAIN;
@@ -1816,7 +1869,7 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm))
 		return -EINVAL;
 
-	ret = tdx_sept_zap_private_spte(kvm, gfn, level);
+	ret = tdx_sept_zap_private_spte(kvm, gfn, level, pfn);
 	if (ret)
 		return ret;
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 57195cf0d832..3b2109877a39 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1951,3 +1951,55 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hpa, u64 hkid)
 	return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args);
 }
 EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid);
+
+u64 tdh_mmio_map(u64 tdr, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx)
+{
+	struct tdx_module_args args = {
+		.rcx = gpa | level,
+		.rdx = tdr,
+		.r8 = hpa,
+	};
+	u64 ret;
+
+	ret = tdx_seamcall_sept(TDH_MMIO_MAP, &args);
+
+	*rcx = args.rcx;
+	*rdx = args.rdx;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mmio_map);
+
+u64 tdh_mmio_block(u64 tdr, u64 gpa, u64 level, u64 *rcx, u64 *rdx)
+{
+	struct tdx_module_args args = {
+		.rcx = gpa | level,
+		.rdx = tdr,
+	};
+	u64 ret;
+
+	ret = tdx_seamcall_sept(TDH_MMIO_BLOCK, &args);
+
+	*rcx = args.rcx;
+	*rdx = args.rdx;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mmio_block);
+
+u64 tdh_mmio_unmap(u64 tdr, u64 gpa, u64 level, u64 *rcx, u64 *rdx)
+{
+	struct tdx_module_args args = {
+		.rcx = gpa | level,
+		.rdx = tdr,
+	};
+	u64 ret;
+
+	ret = tdx_seamcall_sept(TDH_MMIO_UNMAP, &args);
+
+	*rcx = args.rcx;
+	*rdx = args.rdx;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mmio_unmap);
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 58d5754dcb4d..a83a90a043a5 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -49,6 +49,9 @@
 #define TDH_VP_WR			43
 #define TDH_PHYMEM_PAGE_WBINVD		41
 #define TDH_SYS_CONFIG			45
+#define TDH_MMIO_MAP			158
+#define TDH_MMIO_BLOCK			159
+#define TDH_MMIO_UNMAP			160
 
 /*
  * SEAMCALL leaf: