From patchwork Tue Jan 17 13:49:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 13104674 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4961CC677F1 for ; Tue, 17 Jan 2023 13:50:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231208AbjAQNuY (ORCPT ); Tue, 17 Jan 2023 08:50:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47068 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230510AbjAQNt5 (ORCPT ); Tue, 17 Jan 2023 08:49:57 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A942E3B0D8 for ; Tue, 17 Jan 2023 05:49:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1673963386; x=1705499386; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OoZ9xwrsZqLf2k6VO9I8BlEj5dTt6kQpJ6grPGlPQ+k=; b=Ppo4588TJwlrv01EoTkgCwbhL+9qrq4QYYeGdr0FCzbndqqw7nrLtpm8 aVslMgiT8ILXvE0nAEY5tuX50awrYCTUA+tWEnoOETQNqeclRhRhfOlLk cY2ca2lq6poC6+6zALW6xSveaim/TuXu8doFEKWcP0ca+BPn4cyIzL4Yy BIYINq+/SuP+dT8146Z0JHznjXOn09Agf2pXibZITKmiCBE84vYmnnhLq /uAyZLYj+gTgn8mrchbcQLF2teUtOey6sk7ID44u++A4NImXjofZG4yso yXbyJBKQbKsShFQ0FStCRhqeXP/f4FQ5VW96ELmpvL6NHM5gJrO+8siRr w==; X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="326766374" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="326766374" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Jan 2023 05:49:46 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="652551009" X-IronPort-AV: 
E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="652551009"
Received: from 984fee00a4c6.jf.intel.com ([10.165.58.231]) by orsmga007.jf.intel.com with ESMTP; 17 Jan 2023 05:49:46 -0800
From: Yi Liu
To: alex.williamson@redhat.com, jgg@nvidia.com
Cc: kevin.tian@intel.com, cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, suravee.suthikulpanit@amd.com
Subject: [PATCH 01/13] vfio: Allocate per device file structure
Date: Tue, 17 Jan 2023 05:49:30 -0800
Message-Id: <20230117134942.101112-2-yi.l.liu@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230117134942.101112-1-yi.l.liu@intel.com>
References: <20230117134942.101112-1-yi.l.liu@intel.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

This is preparation for adding vfio device cdev support. vfio device cdev requires:

1) per-device-file memory to store the kvm pointer set by KVM. The pointer will be propagated to vfio_device:kvm after the device cdev file is bound to an iommufd.
2) a mechanism to block device access through the device cdev fd before it is bound to an iommufd.

To address the above requirements, this adds a per-device-file structure named vfio_device_file. For now, it is only a wrapper around a struct vfio_device pointer. Other fields will be added to this per-file structure in future commits.

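The wrapper pattern described above can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct file` and `struct vfio_device` are simplified stand-ins, the helper names `device_file_alloc()`, `device_file_open()` and `device_file_release()` are hypothetical, and calloc()/free() replace kzalloc()/kfree(). It is not the kernel code in this patch.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct vfio_device { int open_count; };
struct file { void *private_data; };

/* Per-open-file state: for now only a pointer to the device. */
struct vfio_device_file {
	struct vfio_device *device;
};

/* Allocate the wrapper; returns NULL on allocation failure. */
static struct vfio_device_file *device_file_alloc(struct vfio_device *device)
{
	struct vfio_device_file *df = calloc(1, sizeof(*df));

	if (!df)
		return NULL;
	df->device = device;
	return df;
}

/* Open: the file's private data points at the wrapper, not the device. */
static int device_file_open(struct file *filep, struct vfio_device *device)
{
	struct vfio_device_file *df = device_file_alloc(device);

	if (!df)
		return -1;
	device->open_count++;
	filep->private_data = df;
	return 0;
}

/* Release: reach the device through the wrapper, then free the wrapper. */
static void device_file_release(struct file *filep)
{
	struct vfio_device_file *df = filep->private_data;

	df->device->open_count--;
	free(df);
	filep->private_data = NULL;
}
```

Every file operation pays one extra pointer dereference, but later fields (such as a kvm pointer) can be added to the wrapper without touching the device structure.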
Signed-off-by: Yi Liu Reviewed-by: Kevin Tian Reviewed-by: Eric Auger --- drivers/vfio/group.c | 13 +++++++++++-- drivers/vfio/vfio.h | 6 ++++++ drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++----- 3 files changed, 43 insertions(+), 7 deletions(-) diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index bb24b2f0271e..8fdb7e35b0a6 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -186,19 +186,26 @@ void vfio_device_group_close(struct vfio_device *device) static struct file *vfio_device_open_file(struct vfio_device *device) { + struct vfio_device_file *df; struct file *filep; int ret; + df = vfio_allocate_device_file(device); + if (IS_ERR(df)) { + ret = PTR_ERR(df); + goto err_out; + } + ret = vfio_device_group_open(device); if (ret) - goto err_out; + goto err_free; /* * We can't use anon_inode_getfd() because we need to modify * the f_mode flags directly to allow more than just ioctls */ filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops, - device, O_RDWR); + df, O_RDWR); if (IS_ERR(filep)) { ret = PTR_ERR(filep); goto err_close_device; @@ -222,6 +229,8 @@ static struct file *vfio_device_open_file(struct vfio_device *device) err_close_device: vfio_device_group_close(device); +err_free: + kfree(df); err_out: return ERR_PTR(ret); } diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index f8219a438bfb..1091806bc89d 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -16,12 +16,18 @@ struct iommu_group; struct vfio_device; struct vfio_container; +struct vfio_device_file { + struct vfio_device *device; +}; + void vfio_device_put_registration(struct vfio_device *device); bool vfio_device_try_get_registration(struct vfio_device *device); int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd, struct kvm *kvm); void vfio_device_close(struct vfio_device *device, struct iommufd_ctx *iommufd); +struct vfio_device_file * +vfio_allocate_device_file(struct vfio_device *device); extern const struct 
file_operations vfio_device_fops; diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 5177bb061b17..ee54c9ae0af4 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -344,6 +344,20 @@ static bool vfio_assert_device_open(struct vfio_device *device) return !WARN_ON_ONCE(!READ_ONCE(device->open_count)); } +struct vfio_device_file * +vfio_allocate_device_file(struct vfio_device *device) +{ + struct vfio_device_file *df; + + df = kzalloc(sizeof(*df), GFP_KERNEL_ACCOUNT); + if (!df) + return ERR_PTR(-ENOMEM); + + df->device = device; + + return df; +} + static int vfio_device_first_open(struct vfio_device *device, struct iommufd_ctx *iommufd, struct kvm *kvm) { @@ -461,12 +475,15 @@ static inline void vfio_device_pm_runtime_put(struct vfio_device *device) */ static int vfio_device_fops_release(struct inode *inode, struct file *filep) { - struct vfio_device *device = filep->private_data; + struct vfio_device_file *df = filep->private_data; + struct vfio_device *device = df->device; vfio_device_group_close(device); vfio_device_put_registration(device); + kfree(df); + return 0; } @@ -1031,7 +1048,8 @@ static int vfio_ioctl_device_feature(struct vfio_device *device, static long vfio_device_fops_unl_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) { - struct vfio_device *device = filep->private_data; + struct vfio_device_file *df = filep->private_data; + struct vfio_device *device = df->device; int ret; ret = vfio_device_pm_runtime_get(device); @@ -1058,7 +1076,8 @@ static long vfio_device_fops_unl_ioctl(struct file *filep, static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf, size_t count, loff_t *ppos) { - struct vfio_device *device = filep->private_data; + struct vfio_device_file *df = filep->private_data; + struct vfio_device *device = df->device; if (unlikely(!device->ops->read)) return -EINVAL; @@ -1070,7 +1089,8 @@ static ssize_t vfio_device_fops_write(struct file *filep, const char __user *buf, 
size_t count, loff_t *ppos) { - struct vfio_device *device = filep->private_data; + struct vfio_device_file *df = filep->private_data; + struct vfio_device *device = df->device; if (unlikely(!device->ops->write)) return -EINVAL; @@ -1080,7 +1100,8 @@ static ssize_t vfio_device_fops_write(struct file *filep, static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma) { - struct vfio_device *device = filep->private_data; + struct vfio_device_file *df = filep->private_data; + struct vfio_device *device = df->device; if (unlikely(!device->ops->mmap)) return -EINVAL; From patchwork Tue Jan 17 13:49:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 13104676 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C0B3C3DA78 for ; Tue, 17 Jan 2023 13:50:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230453AbjAQNu3 (ORCPT ); Tue, 17 Jan 2023 08:50:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46592 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230392AbjAQNt7 (ORCPT ); Tue, 17 Jan 2023 08:49:59 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 734653B3F9 for ; Tue, 17 Jan 2023 05:49:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1673963389; x=1705499389; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xZCDLpcAU+pgu0tfTQfQ6zxZJmCJs2fpuoPmOAEP5GE=; b=mZIwp0XBrGvPHKMgJaHkZOiAKBTwOApAl0SJdJ+Qqw7mlUq1vxuXeisT 1wqGTeHKHhsE/rq5WE8abavCiCnbNax8Oj4VkekKD6ltlYDei/VDtilbo 
ymZzGOtzW4ZijMb8bkdJGNNIRuhmsRNsBcel72JhsitG/4C+WT+/FeB7K yB9DarVbN06Sjr1LfqsTnwo8Vzmwz3UNqHUsvIg854PSepU3S9qc697ft ZvuXSm7CDiIl9ikTWrSuGYjxOo+kc3WJshhP01s5Spm+GzDKeDZQZTkOQ acbfSLEW7lZz9wRIpb9D3bj8hTu579vdJua5Mx8QIyg9idCcGwYbjN0PK Q==;
X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="326766386"
X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="326766386"
Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Jan 2023 05:49:48 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="652551017"
X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="652551017"
Received: from 984fee00a4c6.jf.intel.com ([10.165.58.231]) by orsmga007.jf.intel.com with ESMTP; 17 Jan 2023 05:49:48 -0800
From: Yi Liu
To: alex.williamson@redhat.com, jgg@nvidia.com
Cc: kevin.tian@intel.com, cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, suravee.suthikulpanit@amd.com
Subject: [PATCH 02/13] vfio: Refine vfio file kAPIs
Date: Tue, 17 Jan 2023 05:49:31 -0800
Message-Id: <20230117134942.101112-3-yi.l.liu@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230117134942.101112-1-yi.l.liu@intel.com>
References: <20230117134942.101112-1-yi.l.liu@intel.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

This prepares the kAPIs below to accept both group files and device files, instead of only vfio group files:

bool vfio_file_enforced_coherent(struct file *file);
void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
bool vfio_file_has_dev(struct file *file, struct vfio_device *device);

In addition, vfio_file_is_group() is renamed to vfio_file_is_valid().

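The refactoring described above splits each kAPI into a file-level wrapper that identifies the file type and a group-level helper that does the work. A minimal user-space sketch of that dispatch, assuming simplified stand-in types and hypothetical names (`group_fops`, `other_fops`, `group_from_file()`, `file_has_dev()`, `file_is_valid()`); it does not reproduce the kernel functions in this patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct file_operations { int unused; };
struct vfio_group { int id; };
struct file {
	const struct file_operations *f_op;
	void *private_data;
};

/* Two distinct operation tables so a file's type can be told apart. */
static const struct file_operations group_fops = { 0 };
static const struct file_operations other_fops = { 0 };

/* Return the group behind the file, or NULL if it is not a group file. */
static struct vfio_group *group_from_file(struct file *file)
{
	if (file->f_op != &group_fops)
		return NULL;
	return file->private_data;
}

/* Group-level helper: knows nothing about struct file. */
static bool group_has_dev(struct vfio_group *group, struct vfio_group *dev_group)
{
	return group == dev_group;
}

/* File-level wrapper: resolve the file type first, then delegate. */
static bool file_has_dev(struct file *file, struct vfio_group *dev_group)
{
	struct vfio_group *group = group_from_file(file);

	if (group)
		return group_has_dev(group, dev_group);
	return false;
}

/* A file is valid if it resolves to one of the known file types. */
static bool file_is_valid(struct file *file)
{
	return group_from_file(file) != NULL;
}
```

With this shape, supporting a second file type only means adding another branch to the file-level wrappers; the group-level helpers stay unchanged.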
Signed-off-by: Yi Liu Reviewed-by: Kevin Tian Reviewed-by: Eric Auger --- drivers/vfio/group.c | 74 ++++++++------------------------ drivers/vfio/pci/vfio_pci_core.c | 4 +- drivers/vfio/vfio.h | 4 ++ drivers/vfio/vfio_main.c | 62 ++++++++++++++++++++++++++ include/linux/vfio.h | 2 +- virt/kvm/vfio.c | 10 ++--- 6 files changed, 92 insertions(+), 64 deletions(-) diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index 8fdb7e35b0a6..d83cf069d290 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -721,6 +721,15 @@ bool vfio_device_has_container(struct vfio_device *device) return device->group->container; } +struct vfio_group *vfio_group_from_file(struct file *file) +{ + struct vfio_group *group = file->private_data; + + if (file->f_op != &vfio_group_fops) + return NULL; + return group; +} + /** * vfio_file_iommu_group - Return the struct iommu_group for the vfio group file * @file: VFIO group file @@ -731,13 +740,13 @@ bool vfio_device_has_container(struct vfio_device *device) */ struct iommu_group *vfio_file_iommu_group(struct file *file) { - struct vfio_group *group = file->private_data; + struct vfio_group *group = vfio_group_from_file(file); struct iommu_group *iommu_group = NULL; if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU)) return NULL; - if (!vfio_file_is_group(file)) + if (!group) return NULL; mutex_lock(&group->group_lock); @@ -750,34 +759,11 @@ struct iommu_group *vfio_file_iommu_group(struct file *file) } EXPORT_SYMBOL_GPL(vfio_file_iommu_group); -/** - * vfio_file_is_group - True if the file is usable with VFIO aPIS - * @file: VFIO group file - */ -bool vfio_file_is_group(struct file *file) -{ - return file->f_op == &vfio_group_fops; -} -EXPORT_SYMBOL_GPL(vfio_file_is_group); - -/** - * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file - * is always CPU cache coherent - * @file: VFIO group file - * - * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop - * bit in DMA transactions. 
A return of false indicates that the user has - * rights to access additional instructions such as wbinvd on x86. - */ -bool vfio_file_enforced_coherent(struct file *file) +bool vfio_group_enforced_coherent(struct vfio_group *group) { - struct vfio_group *group = file->private_data; struct vfio_device *device; bool ret = true; - if (!vfio_file_is_group(file)) - return true; - /* * If the device does not have IOMMU_CAP_ENFORCE_CACHE_COHERENCY then * any domain later attached to it will also not support it. If the cap @@ -795,46 +781,22 @@ bool vfio_file_enforced_coherent(struct file *file) mutex_unlock(&group->device_lock); return ret; } -EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent); -/** - * vfio_file_set_kvm - Link a kvm with VFIO drivers - * @file: VFIO group file - * @kvm: KVM to link - * - * When a VFIO device is first opened the KVM will be available in - * device->kvm if one was associated with the group. - */ -void vfio_file_set_kvm(struct file *file, struct kvm *kvm) +void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm) { - struct vfio_group *group = file->private_data; - - if (!vfio_file_is_group(file)) - return; - + /* + * When a VFIO device is first opened the KVM will be available in + * device->kvm if one was associated with the group. + */ mutex_lock(&group->group_lock); group->kvm = kvm; mutex_unlock(&group->group_lock); } -EXPORT_SYMBOL_GPL(vfio_file_set_kvm); -/** - * vfio_file_has_dev - True if the VFIO file is a handle for device - * @file: VFIO file to check - * @device: Device that must be part of the file - * - * Returns true if given file has permission to manipulate the given device. 
- */ -bool vfio_file_has_dev(struct file *file, struct vfio_device *device) +bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device) { - struct vfio_group *group = file->private_data; - - if (!vfio_file_is_group(file)) - return false; - return group == device->group; } -EXPORT_SYMBOL_GPL(vfio_file_has_dev); static char *vfio_devnode(const struct device *dev, umode_t *mode) { diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index 26a541cc64d1..985c6184a587 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -1319,8 +1319,8 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev, break; } - /* Ensure the FD is a vfio group FD.*/ - if (!vfio_file_is_group(file)) { + /* Ensure the FD is a vfio FD.*/ + if (!vfio_file_is_valid(file)) { fput(file); ret = -EINVAL; break; diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index 1091806bc89d..ef5de2872983 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -90,6 +90,10 @@ void vfio_device_group_unregister(struct vfio_device *device); int vfio_device_group_use_iommu(struct vfio_device *device); void vfio_device_group_unuse_iommu(struct vfio_device *device); void vfio_device_group_close(struct vfio_device *device); +struct vfio_group *vfio_group_from_file(struct file *file); +bool vfio_group_enforced_coherent(struct vfio_group *group); +void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm); +bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device); bool vfio_device_has_container(struct vfio_device *device); int __init vfio_group_init(void); void vfio_group_cleanup(void); diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index ee54c9ae0af4..1aedfbd15ca0 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -1119,6 +1119,68 @@ const struct file_operations vfio_device_fops = { .mmap = vfio_device_fops_mmap, }; +/** + * vfio_file_is_valid - 
True if the file is usable with VFIO aPIS + * @file: VFIO group file or VFIO device file + */ +bool vfio_file_is_valid(struct file *file) +{ + return vfio_group_from_file(file); +} +EXPORT_SYMBOL_GPL(vfio_file_is_valid); + +/** + * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file + * is always CPU cache coherent + * @file: VFIO group or device file + * + * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop + * bit in DMA transactions. A return of false indicates that the user has + * rights to access additional instructions such as wbinvd on x86. + */ +bool vfio_file_enforced_coherent(struct file *file) +{ + struct vfio_group *group = vfio_group_from_file(file); + + if (group) + return vfio_group_enforced_coherent(group); + + return true; +} +EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent); + +/** + * vfio_file_set_kvm - Link a kvm with VFIO drivers + * @file: VFIO group file or device file + * @kvm: KVM to link + * + */ +void vfio_file_set_kvm(struct file *file, struct kvm *kvm) +{ + struct vfio_group *group = vfio_group_from_file(file); + + if (group) + vfio_group_set_kvm(group, kvm); +} +EXPORT_SYMBOL_GPL(vfio_file_set_kvm); + +/** + * vfio_file_has_dev - True if the VFIO file is a handle for device + * @file: VFIO file to check + * @device: Device that must be part of the file + * + * Returns true if given file has permission to manipulate the given device. 
+ */ +bool vfio_file_has_dev(struct file *file, struct vfio_device *device) +{ + struct vfio_group *group = vfio_group_from_file(file); + + if (group) + return vfio_group_has_dev(group, device); + return false; +} +EXPORT_SYMBOL_GPL(vfio_file_has_dev); + /* * Sub-module support */ diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 35be78e9ae57..46edd6e6c0ba 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -241,7 +241,7 @@ int vfio_mig_get_next_state(struct vfio_device *device, * External user API */ struct iommu_group *vfio_file_iommu_group(struct file *file); -bool vfio_file_is_group(struct file *file); +bool vfio_file_is_valid(struct file *file); bool vfio_file_enforced_coherent(struct file *file); void vfio_file_set_kvm(struct file *file, struct kvm *kvm); bool vfio_file_has_dev(struct file *file, struct vfio_device *device); diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c index 495ceabffe88..868930c7a59b 100644 --- a/virt/kvm/vfio.c +++ b/virt/kvm/vfio.c @@ -64,18 +64,18 @@ static bool kvm_vfio_file_enforced_coherent(struct file *file) return ret; } -static bool kvm_vfio_file_is_group(struct file *file) +static bool kvm_vfio_file_is_valid(struct file *file) { bool (*fn)(struct file *file); bool ret; - fn = symbol_get(vfio_file_is_group); + fn = symbol_get(vfio_file_is_valid); if (!fn) return false; ret = fn(file); - symbol_put(vfio_file_is_group); + symbol_put(vfio_file_is_valid); return ret; } @@ -154,8 +154,8 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd) if (!filp) return -EBADF; - /* Ensure the FD is a vfio group FD.*/ - if (!kvm_vfio_file_is_group(filp)) { + /* Ensure the FD is a vfio FD.*/ + if (!kvm_vfio_file_is_valid(filp)) { ret = -EINVAL; goto err_fput; } From patchwork Tue Jan 17 13:49:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 13104675 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 
(2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 964DCC63797 for ; Tue, 17 Jan 2023 13:50:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231180AbjAQNu0 (ORCPT ); Tue, 17 Jan 2023 08:50:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46824 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230355AbjAQNt7 (ORCPT ); Tue, 17 Jan 2023 08:49:59 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A442621A33 for ; Tue, 17 Jan 2023 05:49:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1673963391; x=1705499391; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3tu+bI5XmKXqC2Ajzd3Ehhi/Qb+BLxnMybciEIZr7wI=; b=jMvhRXoxrtRLzQHH0d5XidodgDW2sH87oteAVE9Tle/LxD53br1FSNw6 pApcrk97Tt+53KzVn+El1EbePc45CmXXU2AHTYm21Vcg1UGP/W/aptYTh eFMXIDaZSUCiM+RrOXTKAN5Yo1ol0UnElFYc44kdeuuMqwvOV+Ee5W7Ye Z8qJsGFyzgbuegVzLC1UOzcOBQvqSK4SRITkQT2QomvWBGnQ313w50DED cg5LHc5mkKWdXqmjtDp0w/ju75RmZc6+gcgeH/TQDpgu3nqfUpHlPvzgv wl7HuT8WM/C/pF/1S5lL8QmjGLs/cR3W8Z6TqYiW9UJO/p0+6wFIXzH90 A==; X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="326766394" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="326766394" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Jan 2023 05:49:49 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="652551022" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="652551022" Received: from 984fee00a4c6.jf.intel.com ([10.165.58.231]) by orsmga007.jf.intel.com with ESMTP; 17 Jan 2023 05:49:49 -0800 From: Yi Liu To: alex.williamson@redhat.com, jgg@nvidia.com 
Cc: kevin.tian@intel.com, cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, suravee.suthikulpanit@amd.com
Subject: [PATCH 03/13] vfio: Accept vfio device file in the driver facing kAPI
Date: Tue, 17 Jan 2023 05:49:32 -0800
Message-Id: <20230117134942.101112-4-yi.l.liu@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230117134942.101112-1-yi.l.liu@intel.com>
References: <20230117134942.101112-1-yi.l.liu@intel.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

This makes the vfio file kAPIs accept vfio device files. It is also preparation for vfio device cdev support.

Signed-off-by: Yi Liu Reviewed-by: Kevin Tian --- drivers/vfio/vfio.h | 1 + drivers/vfio/vfio_main.c | 51 ++++++++++++++++++++++++++++++++++++---- 2 files changed, 48 insertions(+), 4 deletions(-) diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index ef5de2872983..53af6e3ea214 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -18,6 +18,7 @@ struct vfio_container; struct vfio_device_file { struct vfio_device *device; + struct kvm *kvm; }; void vfio_device_put_registration(struct vfio_device *device); diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 1aedfbd15ca0..dc08d5dd62cc 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -1119,13 +1119,23 @@ const struct file_operations vfio_device_fops = { .mmap = vfio_device_fops_mmap, }; +static struct vfio_device *vfio_device_from_file(struct file *file) +{ + struct vfio_device_file *df = file->private_data; + + if (file->f_op != &vfio_device_fops) + return NULL; + return df->device; +} + /** * vfio_file_is_valid - True if the file is usable with VFIO aPIS * @file: VFIO group file or VFIO device file */ bool vfio_file_is_valid(struct file *file) { - return vfio_group_from_file(file); + 
return vfio_group_from_file(file) || + vfio_device_from_file(file); } EXPORT_SYMBOL_GPL(vfio_file_is_valid); @@ -1140,15 +1150,37 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid); */ bool vfio_file_enforced_coherent(struct file *file) { - struct vfio_group *group = vfio_group_from_file(file); + struct vfio_group *group; + struct vfio_device *device; + group = vfio_group_from_file(file); if (group) return vfio_group_enforced_coherent(group); + device = vfio_device_from_file(file); + if (device) + return device_iommu_capable(device->dev, + IOMMU_CAP_ENFORCE_CACHE_COHERENCY); + return true; } EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent); +static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm) +{ + struct vfio_device_file *df = file->private_data; + struct vfio_device *device = df->device; + + /* + * The kvm is first recorded in the df, and will be propagated + * to vfio_device::kvm when the file binds iommufd successfully in + * the vfio device cdev path. + */ + mutex_lock(&device->dev_set->lock); + df->kvm = kvm; + mutex_unlock(&device->dev_set->lock); +} + /** * vfio_file_set_kvm - Link a kvm with VFIO drivers * @file: VFIO group file or device file @@ -1157,10 +1189,14 @@ EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent); */ void vfio_file_set_kvm(struct file *file, struct kvm *kvm) { - struct vfio_group *group = vfio_group_from_file(file); + struct vfio_group *group; + group = vfio_group_from_file(file); if (group) vfio_group_set_kvm(group, kvm); + + if (vfio_device_from_file(file)) + vfio_device_file_set_kvm(file, kvm); } EXPORT_SYMBOL_GPL(vfio_file_set_kvm); @@ -1173,10 +1209,17 @@ EXPORT_SYMBOL_GPL(vfio_file_set_kvm); */ bool vfio_file_has_dev(struct file *file, struct vfio_device *device) { - struct vfio_group *group = vfio_group_from_file(file); + struct vfio_group *group; + struct vfio_device *vdev; + group = vfio_group_from_file(file); if (group) return vfio_group_has_dev(group, device); + + vdev = vfio_device_from_file(file); + if (device) + 
return vdev == device; + return false; } EXPORT_SYMBOL_GPL(vfio_file_has_dev); From patchwork Tue Jan 17 13:49:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 13104677 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CAC73C63797 for ; Tue, 17 Jan 2023 13:50:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231263AbjAQNuc (ORCPT ); Tue, 17 Jan 2023 08:50:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46828 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230359AbjAQNt7 (ORCPT ); Tue, 17 Jan 2023 08:49:59 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3FF9A3B3FC for ; Tue, 17 Jan 2023 05:49:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1673963392; x=1705499392; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=u6ravFXwu0qcSyxTT4L4QXDzCd7tAVXR36AsFI5W24s=; b=Ab9l0VC7tE8LYo9FXPv3cYjfufX3N/ELyGNkveDRFLd5ah5otXXNVhgx p9Uz1FQgY+sGvHwdAJoSqTAL9lNc1tLShSveOfygs5s1s10gAjXL7LNqS AYKTva4excygjpEKsDYq0d1yWBYtP27henRXsadNT7irZWqSe248pGVHr oDmj6JlX33mSRihNp6UcgNBMBe5QDDjuzJnqYns6UG0yZi1ZMd1vsp9gt Oe6Tcw5WZNZeM3Agivhkz4EtOo8DnSAgSyo7XqfDNhYg6MUql3MOgRzNm VEQM25W/9DX6xC8Q47be/uCEkN1v5U1tSGxaDQLeZk4Cuec4XHRUJXGYA g==; X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="326766400" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="326766400" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Jan 2023 05:49:50 -0800 
X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="652551027" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="652551027" Received: from 984fee00a4c6.jf.intel.com ([10.165.58.231]) by orsmga007.jf.intel.com with ESMTP; 17 Jan 2023 05:49:50 -0800 From: Yi Liu To: alex.williamson@redhat.com, jgg@nvidia.com Cc: kevin.tian@intel.com, cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, suravee.suthikulpanit@amd.com Subject: [PATCH 04/13] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Date: Tue, 17 Jan 2023 05:49:33 -0800 Message-Id: <20230117134942.101112-5-yi.l.liu@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230117134942.101112-1-yi.l.liu@intel.com> References: <20230117134942.101112-1-yi.l.liu@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Meanwhile, rename related helpers. No functional change is intended. 
Signed-off-by: Yi Liu Reviewed-by: Kevin Tian Reviewed-by: Eric Auger --- virt/kvm/vfio.c | 115 ++++++++++++++++++++++++------------------------ 1 file changed, 58 insertions(+), 57 deletions(-) diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c index 868930c7a59b..0f54b9d308d7 100644 --- a/virt/kvm/vfio.c +++ b/virt/kvm/vfio.c @@ -21,7 +21,7 @@ #include #endif -struct kvm_vfio_group { +struct kvm_vfio_file { struct list_head node; struct file *file; #ifdef CONFIG_SPAPR_TCE_IOMMU @@ -30,7 +30,7 @@ struct kvm_vfio_group { }; struct kvm_vfio { - struct list_head group_list; + struct list_head file_list; struct mutex lock; bool noncoherent; }; @@ -98,34 +98,35 @@ static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file) } static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm, - struct kvm_vfio_group *kvg) + struct kvm_vfio_file *kvf) { - if (WARN_ON_ONCE(!kvg->iommu_group)) + if (WARN_ON_ONCE(!kvf->iommu_group)) return; - kvm_spapr_tce_release_iommu_group(kvm, kvg->iommu_group); - iommu_group_put(kvg->iommu_group); - kvg->iommu_group = NULL; + kvm_spapr_tce_release_iommu_group(kvm, kvf->iommu_group); + iommu_group_put(kvf->iommu_group); + kvf->iommu_group = NULL; } #endif /* - * Groups can use the same or different IOMMU domains. If the same then - * adding a new group may change the coherency of groups we've previously - * been told about. We don't want to care about any of that so we retest - * each group and bail as soon as we find one that's noncoherent. This - * means we only ever [un]register_noncoherent_dma once for the whole device. + * Groups/devices can use the same or different IOMMU domains. If the same + * then adding a new group/device may change the coherency of groups/devices + * we've previously been told about. We don't want to care about any of + * that so we retest each group/device and bail as soon as we find one that's + * noncoherent. This means we only ever [un]register_noncoherent_dma once + * for the whole device. 
*/ static void kvm_vfio_update_coherency(struct kvm_device *dev) { struct kvm_vfio *kv = dev->private; bool noncoherent = false; - struct kvm_vfio_group *kvg; + struct kvm_vfio_file *kvf; mutex_lock(&kv->lock); - list_for_each_entry(kvg, &kv->group_list, node) { - if (!kvm_vfio_file_enforced_coherent(kvg->file)) { + list_for_each_entry(kvf, &kv->file_list, node) { + if (!kvm_vfio_file_enforced_coherent(kvf->file)) { noncoherent = true; break; } @@ -143,10 +144,10 @@ static void kvm_vfio_update_coherency(struct kvm_device *dev) mutex_unlock(&kv->lock); } -static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd) +static int kvm_vfio_file_add(struct kvm_device *dev, unsigned int fd) { struct kvm_vfio *kv = dev->private; - struct kvm_vfio_group *kvg; + struct kvm_vfio_file *kvf; struct file *filp; int ret; @@ -162,27 +163,27 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd) mutex_lock(&kv->lock); - list_for_each_entry(kvg, &kv->group_list, node) { - if (kvg->file == filp) { + list_for_each_entry(kvf, &kv->file_list, node) { + if (kvf->file == filp) { ret = -EEXIST; goto err_unlock; } } - kvg = kzalloc(sizeof(*kvg), GFP_KERNEL_ACCOUNT); - if (!kvg) { + kvf = kzalloc(sizeof(*kvf), GFP_KERNEL_ACCOUNT); + if (!kvf) { ret = -ENOMEM; goto err_unlock; } - kvg->file = filp; - list_add_tail(&kvg->node, &kv->group_list); + kvf->file = filp; + list_add_tail(&kvf->node, &kv->file_list); kvm_arch_start_assignment(dev->kvm); mutex_unlock(&kv->lock); - kvm_vfio_file_set_kvm(kvg->file, dev->kvm); + kvm_vfio_file_set_kvm(kvf->file, dev->kvm); kvm_vfio_update_coherency(dev); return 0; @@ -193,10 +194,10 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd) return ret; } -static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd) +static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd) { struct kvm_vfio *kv = dev->private; - struct kvm_vfio_group *kvg; + struct kvm_vfio_file *kvf; struct fd f; int ret; @@ 
-208,18 +209,18 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd) mutex_lock(&kv->lock); - list_for_each_entry(kvg, &kv->group_list, node) { - if (kvg->file != f.file) + list_for_each_entry(kvf, &kv->file_list, node) { + if (kvf->file != f.file) continue; - list_del(&kvg->node); + list_del(&kvf->node); kvm_arch_end_assignment(dev->kvm); #ifdef CONFIG_SPAPR_TCE_IOMMU - kvm_spapr_tce_release_vfio_group(dev->kvm, kvg); + kvm_spapr_tce_release_vfio_group(dev->kvm, kvf); #endif - kvm_vfio_file_set_kvm(kvg->file, NULL); - fput(kvg->file); - kfree(kvg); + kvm_vfio_file_set_kvm(kvf->file, NULL); + fput(kvf->file); + kfree(kvf); ret = 0; break; } @@ -234,12 +235,12 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd) } #ifdef CONFIG_SPAPR_TCE_IOMMU -static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev, - void __user *arg) +static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev, + void __user *arg) { struct kvm_vfio_spapr_tce param; struct kvm_vfio *kv = dev->private; - struct kvm_vfio_group *kvg; + struct kvm_vfio_file *kvf; struct fd f; int ret; @@ -254,20 +255,20 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev, mutex_lock(&kv->lock); - list_for_each_entry(kvg, &kv->group_list, node) { - if (kvg->file != f.file) + list_for_each_entry(kvf, &kv->file_list, node) { + if (kvf->file != f.file) continue; - if (!kvg->iommu_group) { - kvg->iommu_group = kvm_vfio_file_iommu_group(kvg->file); - if (WARN_ON_ONCE(!kvg->iommu_group)) { + if (!kvf->iommu_group) { + kvf->iommu_group = kvm_vfio_file_iommu_group(kvf->file); + if (WARN_ON_ONCE(!kvf->iommu_group)) { ret = -EIO; goto err_fdput; } } ret = kvm_spapr_tce_attach_iommu_group(dev->kvm, param.tablefd, - kvg->iommu_group); + kvf->iommu_group); break; } @@ -278,8 +279,8 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev, } #endif -static int kvm_vfio_set_group(struct kvm_device *dev, long attr, - void __user *arg) +static int 
kvm_vfio_set_file(struct kvm_device *dev, long attr, + void __user *arg) { int32_t __user *argp = arg; int32_t fd; @@ -288,16 +289,16 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, case KVM_DEV_VFIO_GROUP_ADD: if (get_user(fd, argp)) return -EFAULT; - return kvm_vfio_group_add(dev, fd); + return kvm_vfio_file_add(dev, fd); case KVM_DEV_VFIO_GROUP_DEL: if (get_user(fd, argp)) return -EFAULT; - return kvm_vfio_group_del(dev, fd); + return kvm_vfio_file_del(dev, fd); #ifdef CONFIG_SPAPR_TCE_IOMMU case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: - return kvm_vfio_group_set_spapr_tce(dev, arg); + return kvm_vfio_file_set_spapr_tce(dev, arg); #endif } @@ -309,8 +310,8 @@ static int kvm_vfio_set_attr(struct kvm_device *dev, { switch (attr->group) { case KVM_DEV_VFIO_GROUP: - return kvm_vfio_set_group(dev, attr->attr, - u64_to_user_ptr(attr->addr)); + return kvm_vfio_set_file(dev, attr->attr, + u64_to_user_ptr(attr->addr)); } return -ENXIO; @@ -339,16 +340,16 @@ static int kvm_vfio_has_attr(struct kvm_device *dev, static void kvm_vfio_destroy(struct kvm_device *dev) { struct kvm_vfio *kv = dev->private; - struct kvm_vfio_group *kvg, *tmp; + struct kvm_vfio_file *kvf, *tmp; - list_for_each_entry_safe(kvg, tmp, &kv->group_list, node) { + list_for_each_entry_safe(kvf, tmp, &kv->file_list, node) { #ifdef CONFIG_SPAPR_TCE_IOMMU - kvm_spapr_tce_release_vfio_group(dev->kvm, kvg); + kvm_spapr_tce_release_vfio_group(dev->kvm, kvf); #endif - kvm_vfio_file_set_kvm(kvg->file, NULL); - fput(kvg->file); - list_del(&kvg->node); - kfree(kvg); + kvm_vfio_file_set_kvm(kvf->file, NULL); + fput(kvf->file); + list_del(&kvf->node); + kfree(kvf); kvm_arch_end_assignment(dev->kvm); } @@ -382,7 +383,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type) if (!kv) return -ENOMEM; - INIT_LIST_HEAD(&kv->group_list); + INIT_LIST_HEAD(&kv->file_list); mutex_init(&kv->lock); dev->private = kv; From patchwork Tue Jan 17 13:49:34 2023 Content-Type: text/plain; charset="utf-8" 
MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 13104679 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9C5C0C3DA78 for ; Tue, 17 Jan 2023 13:50:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230395AbjAQNug (ORCPT ); Tue, 17 Jan 2023 08:50:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46842 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230102AbjAQNuB (ORCPT ); Tue, 17 Jan 2023 08:50:01 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BF8CF3B65F for ; Tue, 17 Jan 2023 05:49:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1673963393; x=1705499393; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=CpwZVJplIyYSK1nrLUTtDPHHHdMW/9c8QLMcMf23i4g=; b=Z7bLEu7OO0oK6/Udhw51u95AwIweDKbv4d1M6zqjeG07GGblTTW/ZMft WcyQhoLhtCWB/AomPY9gUu5DPLnTeMPM746jd5ZWIRgQAe2hw6GfTWqQi m1l41jb8X3+G2dhJ8eUAkIePzaKNOYLbyQsFxn0z97bw2rcpfYS3tTnJ4 i2ENIHq2kK1y/aGIMdI1RBEc8yP1SxiXrQnPGfIV738XMMoELYdrhcTC4 5Tl+m+Iu/ijRnL5wZB26JIYbGNoJgkNZUv20rR3lWhvn2XD4fEJnWatSq Jfb2aQxl9gcZMi1QAeuckBnHOXQVqmYtsBFeNQc6A2B3gd2Nrd3DS/JJe A==; X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="326766408" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="326766408" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Jan 2023 05:49:51 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="652551036" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="652551036" Received: from 
984fee00a4c6.jf.intel.com ([10.165.58.231]) by orsmga007.jf.intel.com with ESMTP; 17 Jan 2023 05:49:51 -0800 From: Yi Liu To: alex.williamson@redhat.com, jgg@nvidia.com Cc: kevin.tian@intel.com, cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, suravee.suthikulpanit@amd.com Subject: [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy() Date: Tue, 17 Jan 2023 05:49:34 -0800 Message-Id: <20230117134942.101112-6-yi.l.liu@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230117134942.101112-1-yi.l.liu@intel.com> References: <20230117134942.101112-1-yi.l.liu@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This avoids a circular refcount problem between the kvm struct and the device file. The KVM module takes a reference on the device/group file when the device/group is added, and releases it either on removal or when the last kvm reference is released. This reference model is fine for the group, since there is no kvm reference in the group paths. It is a problem for the device file, because a vfio device may get a kvm reference in the device open path and put it in the device file release, e.g. Intel kvmgt. The result is a cycle: the kvm side won't put the device file reference while the kvm refcount is non-zero, while the vfio device side only puts its kvm reference in the device file release callback. To solve this for the device file, provide release(), which is called once the kvm file is closed and does not depend on the last kvm reference. This avoids the circular refcount problem.
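The cycle described in this commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name in it (demo_refs, demo_teardown, demo_close_all) is hypothetical and not a kernel API, and the two counters merely stand in for the kvm refcount and the device file refcount.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only: these names are not kernel APIs. */
enum demo_teardown {
	DEMO_ON_LAST_KVM_REF,   /* models ::destroy() */
	DEMO_ON_KVM_FILE_CLOSE, /* models ::release() */
};

struct demo_refs {
	int kvm;      /* references on the kvm struct */
	int dev_file; /* references on the vfio device file */
};

/* Close every userspace fd and report which references are left over. */
struct demo_refs demo_close_all(enum demo_teardown mode)
{
	/*
	 * kvm:      one ref from the VM fd, one taken in the device open path.
	 * dev_file: one ref from the userspace fd, one taken by KVM on add.
	 */
	struct demo_refs r = { .kvm = 2, .dev_file = 2 };
	bool kvm_holds_file = true;
	bool device_holds_kvm = true;
	bool progress;

	r.kvm--;      /* userspace closes the VM fd */
	r.dev_file--; /* userspace closes the vfio device fd */

	/* release() runs when the kvm-side file is closed, whatever the refcount */
	if (mode == DEMO_ON_KVM_FILE_CLOSE) {
		r.dev_file--;
		kvm_holds_file = false;
	}

	do {
		progress = false;
		/* the device file release callback puts the kvm reference */
		if (r.dev_file == 0 && device_holds_kvm) {
			r.kvm--;
			device_holds_kvm = false;
			progress = true;
		}
		/* destroy() only runs once the last kvm reference is gone */
		if (r.kvm == 0 && kvm_holds_file) {
			r.dev_file--;
			kvm_holds_file = false;
			progress = true;
		}
	} while (progress);

	assert(r.kvm >= 0 && r.dev_file >= 0);
	return r;
}
```

In this model, DEMO_ON_LAST_KVM_REF leaves both counters at 1 because each side waits for the other, while DEMO_ON_KVM_FILE_CLOSE lets both drop to 0.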
Suggested-by: Kevin Tian Signed-off-by: Yi Liu Reviewed-by: Jason Gunthorpe --- virt/kvm/vfio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c index 0f54b9d308d7..525efe37ab6d 100644 --- a/virt/kvm/vfio.c +++ b/virt/kvm/vfio.c @@ -364,7 +364,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type); static struct kvm_device_ops kvm_vfio_ops = { .name = "kvm-vfio", .create = kvm_vfio_create, - .destroy = kvm_vfio_destroy, + .release = kvm_vfio_destroy, .set_attr = kvm_vfio_set_attr, .has_attr = kvm_vfio_has_attr, }; From patchwork Tue Jan 17 13:49:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 13104678 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 92507C677F1 for ; Tue, 17 Jan 2023 13:50:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231240AbjAQNue (ORCPT ); Tue, 17 Jan 2023 08:50:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46598 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230372AbjAQNuB (ORCPT ); Tue, 17 Jan 2023 08:50:01 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C32413B66B for ; Tue, 17 Jan 2023 05:49:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1673963393; x=1705499393; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/Zh5NxPU7y9q/SzyiYHn3tbHNGtzNTn2O5tmfYARnRk=; b=I2fa6pd3mzoDWAB9tERo9z+4woWEXb53L9WMCSWn+D3xeGWfq5lhGQiA 2M76K3lDA6twWZ85n4QyRxAKbhGS3Seffy41hOSIS56wVXuA/iiCTflkA 
yE0E+8vUyylC5qCx2R68ubMmytzogGjLrzH8mjtURYaHrkIjftvleszJK JfAG6S8kDwSy5USPN/+T6Ip/VMEMG6X3fACL9CS/WJjT5/WAHeaZ3vmbN zc1dBZOFoskhxYPvD7j8kNSencwq1ugt9YhYo26dm2lf5z0JCfpP4Nngt aJLwmzd3U9Kk9X6Q4OmDHt/oGuNY5NlUntO0HBRNGReOmn6DDRkeynO9S A==; X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="326766414" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="326766414" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Jan 2023 05:49:52 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="652551043" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="652551043" Received: from 984fee00a4c6.jf.intel.com ([10.165.58.231]) by orsmga007.jf.intel.com with ESMTP; 17 Jan 2023 05:49:52 -0800 From: Yi Liu To: alex.williamson@redhat.com, jgg@nvidia.com Cc: kevin.tian@intel.com, cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, suravee.suthikulpanit@amd.com Subject: [PATCH 06/13] kvm/vfio: Accept vfio device file from userspace Date: Tue, 17 Jan 2023 05:49:35 -0800 Message-Id: <20230117134942.101112-7-yi.l.liu@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230117134942.101112-1-yi.l.liu@intel.com> References: <20230117134942.101112-1-yi.l.liu@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This defines KVM_DEV_VFIO_FILE* and makes KVM_DEV_VFIO_GROUP* aliases of them. Old userspace that uses KVM_DEV_VFIO_GROUP* keeps working.
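To see why existing userspace should be unaffected, here is a minimal sketch that builds the same "add" request with the old and the new spelling. The struct and macros are local copies of what this patch's include/uapi/linux/kvm.h hunk defines, kept under a demo_ struct name so the example stays self-contained; the demo_* helpers and the fd value 5 are assumptions made for illustration, and a real program would include <linux/kvm.h> and pass the attribute to KVM_SET_DEVICE_ATTR.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the patched layout, for illustration only. */
struct demo_kvm_device_attr {
	uint32_t flags;
	union {
		uint32_t group;
		uint32_t file;
	};
	uint64_t attr;
	uint64_t addr;
};

#define KVM_DEV_VFIO_FILE	1
#define KVM_DEV_VFIO_FILE_ADD	1
/* Group names are compile-time aliases of the file names. */
#define KVM_DEV_VFIO_GROUP	KVM_DEV_VFIO_FILE
#define KVM_DEV_VFIO_GROUP_ADD	KVM_DEV_VFIO_FILE_ADD

/* Build an "add" request the way existing group-based userspace does. */
struct demo_kvm_device_attr demo_add_old(const int32_t *fd)
{
	struct demo_kvm_device_attr a = {0};

	a.group = KVM_DEV_VFIO_GROUP;
	a.attr = KVM_DEV_VFIO_GROUP_ADD;
	a.addr = (uint64_t)(uintptr_t)fd;
	return a;
}

/* Build the same request with the new file-based names. */
struct demo_kvm_device_attr demo_add_new(const int32_t *fd)
{
	struct demo_kvm_device_attr a = {0};

	a.file = KVM_DEV_VFIO_FILE;
	a.attr = KVM_DEV_VFIO_FILE_ADD;
	a.addr = (uint64_t)(uintptr_t)fd;
	return a;
}

/* Returns 1 when both spellings produce field-for-field equal requests. */
int demo_requests_match(void)
{
	int32_t fd = 5; /* placeholder value, not a real descriptor */
	struct demo_kvm_device_attr a = demo_add_old(&fd);
	struct demo_kvm_device_attr b = demo_add_new(&fd);

	return a.flags == b.flags && a.group == b.file &&
	       a.attr == b.attr && a.addr == b.addr;
}
```

Because group and file share storage in the anonymous union, the kernel side can switch on either member and see the same value.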
Signed-off-by: Yi Liu --- Documentation/virt/kvm/devices/vfio.rst | 32 ++++++++++++------------- include/uapi/linux/kvm.h | 23 +++++++++++++----- virt/kvm/vfio.c | 18 +++++++------- 3 files changed, 42 insertions(+), 31 deletions(-) diff --git a/Documentation/virt/kvm/devices/vfio.rst b/Documentation/virt/kvm/devices/vfio.rst index 2d20dc561069..ac4300ded398 100644 --- a/Documentation/virt/kvm/devices/vfio.rst +++ b/Documentation/virt/kvm/devices/vfio.rst @@ -9,23 +9,23 @@ Device types supported: - KVM_DEV_TYPE_VFIO Only one VFIO instance may be created per VM. The created device -tracks VFIO groups in use by the VM and features of those groups -important to the correctness and acceleration of the VM. As groups -are enabled and disabled for use by the VM, KVM should be updated -about their presence. When registered with KVM, a reference to the -VFIO-group is held by KVM. - -Groups: - KVM_DEV_VFIO_GROUP - -KVM_DEV_VFIO_GROUP attributes: - KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking - kvm_device_attr.addr points to an int32_t file descriptor +tracks VFIO files (group or device) in use by the VM and features +of those groups/devices important to the correctness and acceleration +of the VM. As groups/device are enabled and disabled for use by the +VM, KVM should be updated about their presence. When registered with +KVM, a reference to the VFIO file is held by KVM. + +VFIO Files: + KVM_DEV_VFIO_FILE + +KVM_DEV_VFIO_FILE attributes: + KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM device + tracking kvm_device_attr.addr points to an int32_t file descriptor + for the VFIO file. + KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-KVM device + tracking kvm_device_attr.addr points to an int32_t file descriptor for the VFIO group. - KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking - kvm_device_attr.addr points to an int32_t file descriptor - for the VFIO group. 
- KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table + KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: attaches a guest visible TCE table allocated by sPAPR KVM. kvm_device_attr.addr points to a struct:: diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 55155e262646..ad36e144a41d 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1396,15 +1396,26 @@ struct kvm_create_device { struct kvm_device_attr { __u32 flags; /* no flags currently defined */ - __u32 group; /* device-defined */ - __u64 attr; /* group-defined */ + union { + __u32 group; + __u32 file; + }; /* device-defined */ + __u64 attr; /* VFIO-file-defined or group-defined */ __u64 addr; /* userspace address of attr data */ }; -#define KVM_DEV_VFIO_GROUP 1 -#define KVM_DEV_VFIO_GROUP_ADD 1 -#define KVM_DEV_VFIO_GROUP_DEL 2 -#define KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE 3 +#define KVM_DEV_VFIO_FILE 1 + +#define KVM_DEV_VFIO_FILE_ADD 1 +#define KVM_DEV_VFIO_FILE_DEL 2 +#define KVM_DEV_VFIO_FILE_SET_SPAPR_TCE 3 + +/* Group aliases are for compile time uapi compatibility */ +#define KVM_DEV_VFIO_GROUP KVM_DEV_VFIO_FILE + +#define KVM_DEV_VFIO_GROUP_ADD KVM_DEV_VFIO_FILE_ADD +#define KVM_DEV_VFIO_GROUP_DEL KVM_DEV_VFIO_FILE_DEL +#define KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE KVM_DEV_VFIO_FILE_SET_SPAPR_TCE enum kvm_device_type { KVM_DEV_TYPE_FSL_MPIC_20 = 1, diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c index 525efe37ab6d..e73ca60af3ae 100644 --- a/virt/kvm/vfio.c +++ b/virt/kvm/vfio.c @@ -286,18 +286,18 @@ static int kvm_vfio_set_file(struct kvm_device *dev, long attr, int32_t fd; switch (attr) { - case KVM_DEV_VFIO_GROUP_ADD: + case KVM_DEV_VFIO_FILE_ADD: if (get_user(fd, argp)) return -EFAULT; return kvm_vfio_file_add(dev, fd); - case KVM_DEV_VFIO_GROUP_DEL: + case KVM_DEV_VFIO_FILE_DEL: if (get_user(fd, argp)) return -EFAULT; return kvm_vfio_file_del(dev, fd); #ifdef CONFIG_SPAPR_TCE_IOMMU - case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: + case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: 
return kvm_vfio_file_set_spapr_tce(dev, arg); #endif } @@ -309,7 +309,7 @@ static int kvm_vfio_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr) { switch (attr->group) { - case KVM_DEV_VFIO_GROUP: + case KVM_DEV_VFIO_FILE: return kvm_vfio_set_file(dev, attr->attr, u64_to_user_ptr(attr->addr)); } @@ -320,13 +320,13 @@ static int kvm_vfio_set_attr(struct kvm_device *dev, static int kvm_vfio_has_attr(struct kvm_device *dev, struct kvm_device_attr *attr) { - switch (attr->group) { - case KVM_DEV_VFIO_GROUP: + switch (attr->file) { + case KVM_DEV_VFIO_FILE: switch (attr->attr) { - case KVM_DEV_VFIO_GROUP_ADD: - case KVM_DEV_VFIO_GROUP_DEL: + case KVM_DEV_VFIO_FILE_ADD: + case KVM_DEV_VFIO_FILE_DEL: #ifdef CONFIG_SPAPR_TCE_IOMMU - case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: + case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: #endif return 0; } From patchwork Tue Jan 17 13:49:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 13104681 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B806C63797 for ; Tue, 17 Jan 2023 13:50:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230195AbjAQNuj (ORCPT ); Tue, 17 Jan 2023 08:50:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46724 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230460AbjAQNuD (ORCPT ); Tue, 17 Jan 2023 08:50:03 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72FBB3B674 for ; Tue, 17 Jan 2023 05:49:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1673963394; x=1705499394; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=X2wMousrXWtPiKPa2X5dsrGJnfQx7rARTKO+czpRmCg=; b=DX60nCxEjGiMS9ykzj88ux+jUOQ97/GWWeokT5EgwdLl4D64DUHkCqW8 SabsVI8sdfHIkcDpDFXSNcglF6FAcI9YyprGgwxUPxZgU5H/v/P3e6Y3T 1xTOzmP+AjLz7iwZeT0sAFwlZZ7tUfyxxS8YzEf/FW0yXLoq/oukrp75N 28OmYFjEZW+7xuYXTC4+3na0mSwKtHxDxLqXTZLK5WS6+aqAgBFHElgyF GAoxBWPlgW39r+hC03PN4Oi9PushhDt+EXIFr0TRrmGiwqItrDqLPy+lP 5buiDj2U18R6ETo+pLfKBMikxJ1TJdAZvgw/DllngvdJES3koQbLVjeyt A==; X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="326766424" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="326766424" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Jan 2023 05:49:53 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="652551050" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="652551050" Received: from 984fee00a4c6.jf.intel.com ([10.165.58.231]) by orsmga007.jf.intel.com with ESMTP; 17 Jan 2023 05:49:52 -0800 From: Yi Liu To: alex.williamson@redhat.com, jgg@nvidia.com Cc: kevin.tian@intel.com, cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, suravee.suthikulpanit@amd.com Subject: [PATCH 07/13] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Date: Tue, 17 Jan 2023 05:49:36 -0800 Message-Id: <20230117134942.101112-8-yi.l.liu@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230117134942.101112-1-yi.l.liu@intel.com> References: <20230117134942.101112-1-yi.l.liu@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This avoids passing struct kvm * and struct iommufd_ctx * through multiple functions. vfio_device_open() becomes a locked helper: the caller must hold dev_set->lock.
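The calling convention this patch moves to can be sketched in plain userspace C. This is a simplified model, not the kernel code: the demo_* types and functions are hypothetical, a boolean stands in for dev_set->lock, assert() stands in for lockdep_assert_held(), and fail_first_open is a knob added only to exercise the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel structures. */
struct demo_device {
	bool lock_held;       /* models dev_set->lock */
	bool fail_first_open; /* knob to exercise the error path */
	int open_count;
};

struct demo_device_file {
	struct demo_device *device;
	void *kvm;
	void *iommufd;
};

/* Locked helper: the caller must already hold the lock. */
static int demo_device_open(struct demo_device_file *df)
{
	struct demo_device *device = df->device;

	assert(device->lock_held); /* stands in for lockdep_assert_held() */

	device->open_count++;
	if (device->open_count == 1 && device->fail_first_open) {
		device->open_count--;
		return -EINVAL;
	}
	return 0;
}

/* Caller: takes the lock, fills in the per-file context, then opens. */
static int demo_group_open(struct demo_device_file *df, void *kvm,
			   void *iommufd)
{
	struct demo_device *device = df->device;
	int ret;

	device->lock_held = true;
	df->kvm = kvm;
	df->iommufd = iommufd;

	ret = demo_device_open(df);
	if (ret) {
		/* clear the context again when the open fails */
		df->kvm = NULL;
		df->iommufd = NULL;
	}
	device->lock_held = false;
	return ret;
}

/* Returns 0 or -EINVAL when the final state is as expected, 1 otherwise. */
int demo_scenario(bool fail_first_open)
{
	int kvm_token = 0, iommufd_token = 0;
	struct demo_device dev = { .fail_first_open = fail_first_open };
	struct demo_device_file df = { .device = &dev };
	int ret = demo_group_open(&df, &kvm_token, &iommufd_token);

	if (ret)
		return (!df.kvm && !df.iommufd && dev.open_count == 0) ? ret : 1;
	return (df.kvm == &kvm_token && df.iommufd == &iommufd_token &&
		dev.open_count == 1) ? 0 : 1;
}
```

Carrying kvm and iommufd in the per-file structure keeps the helper signature to a single argument, at the cost of the caller having to reset those fields when the open fails.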
Signed-off-by: Yi Liu Reviewed-by: Kevin Tian --- drivers/vfio/group.c | 34 +++++++++++++++++++++++++--------- drivers/vfio/vfio.h | 10 +++++----- drivers/vfio/vfio_main.c | 40 ++++++++++++++++++++++++---------------- 3 files changed, 54 insertions(+), 30 deletions(-) diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index d83cf069d290..7200304663e5 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -154,33 +154,49 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group, return ret; } -static int vfio_device_group_open(struct vfio_device *device) +static int vfio_device_group_open(struct vfio_device_file *df) { + struct vfio_device *device = df->device; int ret; mutex_lock(&device->group->group_lock); if (!vfio_group_has_iommu(device->group)) { ret = -EINVAL; - goto out_unlock; + goto err_unlock_group; } + mutex_lock(&device->dev_set->lock); /* * Here we pass the KVM pointer with the group under the lock. If the * device driver will use it, it must obtain a reference and release it * during close_device. 
*/ - ret = vfio_device_open(device, device->group->iommufd, - device->group->kvm); + df->kvm = device->group->kvm; + df->iommufd = device->group->iommufd; + + ret = vfio_device_open(df); + if (ret) + goto err_unlock_device; + mutex_unlock(&device->dev_set->lock); -out_unlock: + mutex_unlock(&device->group->group_lock); + return 0; + +err_unlock_device: + df->kvm = NULL; + df->iommufd = NULL; + mutex_unlock(&device->dev_set->lock); +err_unlock_group: mutex_unlock(&device->group->group_lock); return ret; } -void vfio_device_group_close(struct vfio_device *device) +void vfio_device_group_close(struct vfio_device_file *df) { + struct vfio_device *device = df->device; + mutex_lock(&device->group->group_lock); - vfio_device_close(device, device->group->iommufd); + vfio_device_close(df); mutex_unlock(&device->group->group_lock); } @@ -196,7 +212,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device) goto err_out; } - ret = vfio_device_group_open(device); + ret = vfio_device_group_open(df); if (ret) goto err_free; @@ -228,7 +244,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device) return filep; err_close_device: - vfio_device_group_close(device); + vfio_device_group_close(df); err_free: kfree(df); err_out: diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index 53af6e3ea214..3d8ba165146c 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -19,14 +19,14 @@ struct vfio_container; struct vfio_device_file { struct vfio_device *device; struct kvm *kvm; + struct iommufd_ctx *iommufd; }; void vfio_device_put_registration(struct vfio_device *device); bool vfio_device_try_get_registration(struct vfio_device *device); -int vfio_device_open(struct vfio_device *device, - struct iommufd_ctx *iommufd, struct kvm *kvm); -void vfio_device_close(struct vfio_device *device, - struct iommufd_ctx *iommufd); +int vfio_device_open(struct vfio_device_file *df); +void vfio_device_close(struct vfio_device_file *device); + struct 
vfio_device_file * vfio_allocate_device_file(struct vfio_device *device); @@ -90,7 +90,7 @@ void vfio_device_group_register(struct vfio_device *device); void vfio_device_group_unregister(struct vfio_device *device); int vfio_device_group_use_iommu(struct vfio_device *device); void vfio_device_group_unuse_iommu(struct vfio_device *device); -void vfio_device_group_close(struct vfio_device *device); +void vfio_device_group_close(struct vfio_device_file *df); struct vfio_group *vfio_group_from_file(struct file *file); bool vfio_group_enforced_coherent(struct vfio_group *group); void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm); diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index dc08d5dd62cc..3df71bd9cd1e 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -358,9 +358,11 @@ vfio_allocate_device_file(struct vfio_device *device) return df; } -static int vfio_device_first_open(struct vfio_device *device, - struct iommufd_ctx *iommufd, struct kvm *kvm) +static int vfio_device_first_open(struct vfio_device_file *df) { + struct vfio_device *device = df->device; + struct iommufd_ctx *iommufd = df->iommufd; + struct kvm *kvm = df->kvm; int ret; lockdep_assert_held(&device->dev_set->lock); @@ -394,9 +396,11 @@ static int vfio_device_first_open(struct vfio_device *device, return ret; } -static void vfio_device_last_close(struct vfio_device *device, - struct iommufd_ctx *iommufd) +static void vfio_device_last_close(struct vfio_device_file *df) { + struct vfio_device *device = df->device; + struct iommufd_ctx *iommufd = df->iommufd; + lockdep_assert_held(&device->dev_set->lock); if (device->ops->close_device) @@ -409,30 +413,34 @@ static void vfio_device_last_close(struct vfio_device *device, module_put(device->dev->driver->owner); } -int vfio_device_open(struct vfio_device *device, - struct iommufd_ctx *iommufd, struct kvm *kvm) +int vfio_device_open(struct vfio_device_file *df) { - int ret = 0; + struct vfio_device *device 
= df->device; + + lockdep_assert_held(&device->dev_set->lock); - mutex_lock(&device->dev_set->lock); device->open_count++; if (device->open_count == 1) { - ret = vfio_device_first_open(device, iommufd, kvm); - if (ret) + int ret; + + ret = vfio_device_first_open(df); + if (ret) { device->open_count--; + return ret; + } } - mutex_unlock(&device->dev_set->lock); - return ret; + return 0; } -void vfio_device_close(struct vfio_device *device, - struct iommufd_ctx *iommufd) +void vfio_device_close(struct vfio_device_file *df) { + struct vfio_device *device = df->device; + mutex_lock(&device->dev_set->lock); vfio_assert_device_open(device); if (device->open_count == 1) - vfio_device_last_close(device, iommufd); + vfio_device_last_close(df); device->open_count--; mutex_unlock(&device->dev_set->lock); } @@ -478,7 +486,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep) struct vfio_device_file *df = filep->private_data; struct vfio_device *device = df->device; - vfio_device_group_close(device); + vfio_device_group_close(df); vfio_device_put_registration(device); From patchwork Tue Jan 17 13:49:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 13104680 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C5682C63797 for ; Tue, 17 Jan 2023 13:50:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229517AbjAQNuh (ORCPT ); Tue, 17 Jan 2023 08:50:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46886 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230465AbjAQNuD (ORCPT ); Tue, 17 Jan 2023 08:50:03 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2BDCA1A48B for ; Tue, 17 Jan 2023 05:49:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1673963396; x=1705499396; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SNohEo4uoAYjf0ja6idxheK+5oGVjgxUYQVtjBwI2ec=; b=BK79cAyJ22hLLVaAhzrCmdZisMhxX7Asts/XXpgQrVjEW2G2iO5OZN+v fGfBzu7/oNZIa80iU4vP4FkFIIS6hqtExa05RmTomsQKOcum1PQrD9a4N BAHEBAUpO28eqzTNi0F+NTNvwwPJBTLa87XtwzPqEW5lGWHaczjJ6Sjdc TTpUwfB5TNTmz9pYhRycMN3heagqOPvXP3KFtysCNSNGpNf/yld3i7IdY +ihajvGo0YbdZ8evkMx3+dBEq1d18/KTIx2dUPfi9PttCaSENbAlJ6UVw lDV+gXvTv8uiZnVr55+1xD+wtf9IazKVQasQxTsbYWeJ8vhcsHuS1MNai w==; X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="326766441" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="326766441" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Jan 2023 05:49:55 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="652551064" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="652551064" Received: from 984fee00a4c6.jf.intel.com ([10.165.58.231]) by orsmga007.jf.intel.com with ESMTP; 17 Jan 2023 05:49:55 -0800 From: Yi Liu To: alex.williamson@redhat.com, jgg@nvidia.com Cc: kevin.tian@intel.com, cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, suravee.suthikulpanit@amd.com Subject: [PATCH 08/13] vfio: Block device access via device fd until device is opened Date: Tue, 17 Jan 2023 05:49:37 -0800 Message-Id: <20230117134942.101112-9-yi.l.liu@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230117134942.101112-1-yi.l.liu@intel.com> References: <20230117134942.101112-1-yi.l.liu@intel.com> MIME-Version: 1.0 Precedence: bulk 
List-ID: X-Mailing-List: kvm@vger.kernel.org Allow the vfio_device file to be in a state where the device FD is opened but the device cannot be used by userspace (i.e. its .open_device() hasn't been called). This in-between state is not used when the device FD is spawned from the group FD; however, when the device FD is created directly by opening a cdev, it starts in the blocked state. In the blocked state, only the bind operation is currently allowed; all other device accesses are rejected. Completing bind allows the user to access the device further. This is implemented by adding a flag in struct vfio_device_file to mark the blocked state, and using a simple smp_load_acquire() to obtain the flag value and serialize all the device setup with the thread accessing this device. Because of this scheme it is not possible to unbind the FD: once it is bound, it remains bound until the FD is closed. Suggested-by: Jason Gunthorpe Signed-off-by: Yi Liu --- drivers/vfio/vfio.h | 1 + drivers/vfio/vfio_main.c | 29 +++++++++++++++++++++++++++++ 2 files changed, 30 insertions(+) diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index 3d8ba165146c..c69a9902ea84 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -20,6 +20,7 @@ struct vfio_device_file { struct vfio_device *device; struct kvm *kvm; struct iommufd_ctx *iommufd; + bool access_granted; }; void vfio_device_put_registration(struct vfio_device *device); diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 3df71bd9cd1e..d442ebaa4b21 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -430,6 +430,11 @@ int vfio_device_open(struct vfio_device_file *df) } } + /* + * Paired with smp_load_acquire() in vfio_device_fops::ioctl/ + * read/write/mmap + */ + smp_store_release(&df->access_granted, true); return 0; } @@ -1058,8 +1063,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep, { struct vfio_device_file *df = filep->private_data; struct vfio_device *device = 
df->device; + bool access; int ret; + /* Paired with smp_store_release() in vfio_device_open() */ + access = smp_load_acquire(&df->access_granted); + if (!access) + return -EINVAL; + ret = vfio_device_pm_runtime_get(device); if (ret) return ret; @@ -1086,6 +1097,12 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf, { struct vfio_device_file *df = filep->private_data; struct vfio_device *device = df->device; + bool access; + + /* Paired with smp_store_release() in vfio_device_open() */ + access = smp_load_acquire(&df->access_granted); + if (!access) + return -EINVAL; if (unlikely(!device->ops->read)) return -EINVAL; @@ -1099,6 +1116,12 @@ static ssize_t vfio_device_fops_write(struct file *filep, { struct vfio_device_file *df = filep->private_data; struct vfio_device *device = df->device; + bool access; + + /* Paired with smp_store_release() in vfio_device_open() */ + access = smp_load_acquire(&df->access_granted); + if (!access) + return -EINVAL; if (unlikely(!device->ops->write)) return -EINVAL; @@ -1110,6 +1133,12 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma) { struct vfio_device_file *df = filep->private_data; struct vfio_device *device = df->device; + bool access; + + /* Paired with smp_store_release() in vfio_device_open() */ + access = smp_load_acquire(&df->access_granted); + if (!access) + return -EINVAL; if (unlikely(!device->ops->mmap)) return -EINVAL; From patchwork Tue Jan 17 13:49:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 13104683 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 06D3FC63797 for ; Tue, 17 Jan 2023 13:50:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id 
S230429AbjAQNuq (ORCPT ); Tue, 17 Jan 2023 08:50:46 -0500 From: Yi Liu To: alex.williamson@redhat.com, jgg@nvidia.com Cc: kevin.tian@intel.com, cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, suravee.suthikulpanit@amd.com Subject: [PATCH 09/13]
vfio: Add infrastructure for bind_iommufd and attach Date: Tue, 17 Jan 2023 05:49:38 -0800 Message-Id: <20230117134942.101112-10-yi.l.liu@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230117134942.101112-1-yi.l.liu@intel.com> References: <20230117134942.101112-1-yi.l.liu@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

This prepares for adding ioctls on the device cdev fd. The infrastructure includes:

- add vfio_iommufd_attach() to support attaching an iommufd page table after bind_iommufd. A NULL pt_id indicates detach.
- let vfio_iommufd_bind() accept a pt_id, e.g. the compat_ioas_id in the legacy group path, and also return the dev_id if the caller requires it.

Signed-off-by: Yi Liu
---
 drivers/vfio/group.c | 12 +++++-
 drivers/vfio/iommufd.c | 79 ++++++++++++++++++++++++++++++----------
 drivers/vfio/vfio.h | 15 ++++++--
 drivers/vfio/vfio_main.c | 10 +++--
 4 files changed, 88 insertions(+), 28 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index 7200304663e5..9484bb1c54a9 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -157,6 +157,8 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group, static int vfio_device_group_open(struct vfio_device_file *df) { struct vfio_device *device = df->device; + u32 ioas_id; + u32 *pt_id = NULL; int ret; mutex_lock(&device->group->group_lock); @@ -165,6 +167,14 @@ static int vfio_device_group_open(struct vfio_device_file *df) goto err_unlock_group; } + if (device->group->iommufd) { + ret = iommufd_vfio_compat_ioas_id(device->group->iommufd, + &ioas_id); + if (ret) + goto err_unlock_group; + pt_id = &ioas_id; + } + mutex_lock(&device->dev_set->lock); /* * Here we pass the KVM pointer with the group under the lock.
If the @@ -174,7 +184,7 @@ static int vfio_device_group_open(struct vfio_device_file *df) df->kvm = device->group->kvm; df->iommufd = device->group->iommufd; - ret = vfio_device_open(df); + ret = vfio_device_open(df, NULL, pt_id); if (ret) goto err_unlock_device; mutex_unlock(&device->dev_set->lock); diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c index 4f82a6fa7c6c..412644fdbf16 100644 --- a/drivers/vfio/iommufd.c +++ b/drivers/vfio/iommufd.c @@ -10,9 +10,17 @@ MODULE_IMPORT_NS(IOMMUFD); MODULE_IMPORT_NS(IOMMUFD_VFIO); -int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx) +/* @pt_id == NULL implies detach */ +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id) +{ + lockdep_assert_held(&vdev->dev_set->lock); + + return vdev->ops->attach_ioas(vdev, pt_id); +} + +int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx, + u32 *dev_id, u32 *pt_id) { - u32 ioas_id; u32 device_id; int ret; @@ -29,17 +37,14 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx) if (ret) return ret; - ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id); - if (ret) - goto err_unbind; - ret = vdev->ops->attach_ioas(vdev, &ioas_id); - if (ret) - goto err_unbind; + if (pt_id) { + ret = vfio_iommufd_attach(vdev, pt_id); + if (ret) + goto err_unbind; + } - /* - * The legacy path has no way to return the device id or the selected - * pt_id - */ + if (dev_id) + *dev_id = device_id; return 0; err_unbind: @@ -74,14 +79,18 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev, } EXPORT_SYMBOL_GPL(vfio_iommufd_physical_bind); +static void __vfio_iommufd_detach(struct vfio_device *vdev) +{ + iommufd_device_detach(vdev->iommufd_device); + vdev->iommufd_attached = false; +} + void vfio_iommufd_physical_unbind(struct vfio_device *vdev) { lockdep_assert_held(&vdev->dev_set->lock); - if (vdev->iommufd_attached) { - iommufd_device_detach(vdev->iommufd_device); - vdev->iommufd_attached = false; - } + if 
(vdev->iommufd_attached) + __vfio_iommufd_detach(vdev); iommufd_device_unbind(vdev->iommufd_device); vdev->iommufd_device = NULL; } @@ -91,6 +100,20 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id) { int rc; + lockdep_assert_held(&vdev->dev_set->lock); + + if (!vdev->iommufd_device) + return -EINVAL; + + if (!pt_id) { + if (vdev->iommufd_attached) + __vfio_iommufd_detach(vdev); + return 0; + } + + if (vdev->iommufd_attached) + return -EBUSY; + rc = iommufd_device_attach(vdev->iommufd_device, pt_id); if (rc) return rc; @@ -129,14 +152,18 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev, } EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_bind); +static void __vfio_iommufd_access_destroy(struct vfio_device *vdev) +{ + iommufd_access_destroy(vdev->iommufd_access); + vdev->iommufd_access = NULL; +} + void vfio_iommufd_emulated_unbind(struct vfio_device *vdev) { lockdep_assert_held(&vdev->dev_set->lock); - if (vdev->iommufd_access) { - iommufd_access_destroy(vdev->iommufd_access); - vdev->iommufd_access = NULL; - } + if (vdev->iommufd_access) + __vfio_iommufd_access_destroy(vdev); iommufd_ctx_put(vdev->iommufd_ictx); vdev->iommufd_ictx = NULL; } @@ -148,6 +175,18 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id) lockdep_assert_held(&vdev->dev_set->lock); + if (!vdev->iommufd_ictx) + return -EINVAL; + + if (!pt_id) { + if (vdev->iommufd_access) + __vfio_iommufd_access_destroy(vdev); + return 0; + } + + if (vdev->iommufd_access) + return -EBUSY; + user = iommufd_access_create(vdev->iommufd_ictx, *pt_id, &vfio_user_ops, vdev); if (IS_ERR(user)) diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index c69a9902ea84..fe0fcfa78710 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -25,7 +25,8 @@ struct vfio_device_file { void vfio_device_put_registration(struct vfio_device *device); bool vfio_device_try_get_registration(struct vfio_device *device); -int vfio_device_open(struct vfio_device_file *df); +int 
vfio_device_open(struct vfio_device_file *df, + u32 *dev_id, u32 *pt_id); void vfio_device_close(struct vfio_device_file *device); struct vfio_device_file * @@ -230,11 +231,14 @@ static inline void vfio_container_cleanup(void) #endif #if IS_ENABLED(CONFIG_IOMMUFD) -int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx); +int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx, + u32 *dev_id, u32 *pt_id); void vfio_iommufd_unbind(struct vfio_device *device); +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id); #else static inline int vfio_iommufd_bind(struct vfio_device *device, - struct iommufd_ctx *ictx) + struct iommufd_ctx *ictx, + u32 *dev_id, u32 *pt_id) { return -EOPNOTSUPP; } @@ -242,6 +246,11 @@ static inline int vfio_iommufd_bind(struct vfio_device *device, static inline void vfio_iommufd_unbind(struct vfio_device *device) { } + +static inline int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id) +{ + return -EOPNOTSUPP; +} #endif #if IS_ENABLED(CONFIG_VFIO_VIRQFD) diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index d442ebaa4b21..90174a9015c4 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -358,7 +358,8 @@ vfio_allocate_device_file(struct vfio_device *device) return df; } -static int vfio_device_first_open(struct vfio_device_file *df) +static int vfio_device_first_open(struct vfio_device_file *df, + u32 *dev_id, u32 *pt_id) { struct vfio_device *device = df->device; struct iommufd_ctx *iommufd = df->iommufd; @@ -371,7 +372,7 @@ static int vfio_device_first_open(struct vfio_device_file *df) return -ENODEV; if (iommufd) - ret = vfio_iommufd_bind(device, iommufd); + ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id); else ret = vfio_device_group_use_iommu(device); if (ret) @@ -413,7 +414,8 @@ static void vfio_device_last_close(struct vfio_device_file *df) module_put(device->dev->driver->owner); } -int vfio_device_open(struct vfio_device_file *df) +int 
vfio_device_open(struct vfio_device_file *df, + u32 *dev_id, u32 *pt_id) { struct vfio_device *device = df->device; @@ -423,7 +425,7 @@ int vfio_device_open(struct vfio_device_file *df) if (device->open_count == 1) { int ret; - ret = vfio_device_first_open(df); + ret = vfio_device_first_open(df, dev_id, pt_id); if (ret) { device->open_count--; return ret; From patchwork Tue Jan 17 13:49:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 13104682 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F0532C63797 for ; Tue, 17 Jan 2023 13:50:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229539AbjAQNul (ORCPT ); Tue, 17 Jan 2023 08:50:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46728 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230472AbjAQNuE (ORCPT ); Tue, 17 Jan 2023 08:50:04 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D35DA3B642 for ; Tue, 17 Jan 2023 05:49:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1673963397; x=1705499397; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Q2k1LfO8Uc9ntNmEOhY9d6Fb175r8XpfB3BBbjS4Xus=; b=ILXpmlsEM1BViiCNKHhF8PhWz1JfnkdYGvjW+FlKrstdEZhjaZqexC2E Df6dQOtzRmv7v4Si1uNYAlpH1urY4uP2sAiL5Z68GwUmXgQ8H5X8eaQOm OukJSWhtF6dpUIWwWyXEEmS+lyC2fY8hCx6WBb927NLrJNNTPyIswEVJh 66c4OJQVCHQZilyoKfC/STDBErbHOeX+bRTEy3fg4rK0bgmM1NfTm8lnP 18GLa4Q4rWkK3IJcu043DaOOzntlNg3vCftbzAmb5uaoddGCsw5jJHbRt 6iCjjuN5McwOnPxelT/1/AiohzlqZ99jFSfzxg/eryuvbgYsShEt8noSl w==; X-IronPort-AV: 
E=McAfee;i="6500,9779,10592"; a="326766458" From: Yi Liu To: alex.williamson@redhat.com, jgg@nvidia.com Cc: kevin.tian@intel.com, cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, suravee.suthikulpanit@amd.com Subject: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path Date: Tue, 17 Jan 2023 05:49:39 -0800 Message-Id: <20230117134942.101112-11-yi.l.liu@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230117134942.101112-1-yi.l.liu@intel.com> References: <20230117134942.101112-1-yi.l.liu@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

VFIO group has historically allowed multi-open of the device FD. This was made secure because the "open" was executed via an ioctl to the group FD, which is itself only single open.

No use of multiple device FDs is known. It is kind of a strange thing to do because new device FDs can naturally be created via dup().

When we implement the new device uAPI there is no natural way to allow the device itself to be multi-opened in a secure manner. Without the group FD we cannot prove the security context of the opener.

Thus, when moving to the new uAPI we block the ability to multi-open the device. This also makes the cdev path exclusive with the group path.
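The exclusivity rule can be illustrated with a small userspace model. This is only a sketch: struct model_device and the model_device_open()/model_device_close() helpers are hypothetical names, not kernel code, and they only mimic the open_count/single_open check that this patch adds to vfio_device_open() and vfio_device_close().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are not the kernel structures. */
struct model_device {
	unsigned int open_count;
	bool single_open;	/* true while the device is held via the cdev path */
};

/* Model of one open attempt; single_open is true for the cdev path. */
static int model_device_open(struct model_device *dev, bool single_open)
{
	/* A second open fails if either the caller or the holder is cdev. */
	if (dev->open_count != 0 && (single_open || dev->single_open))
		return -EINVAL;

	dev->open_count++;
	if (dev->open_count == 1)
		dev->single_open = single_open;	/* recorded on first open only */
	return 0;
}

/* Model of one close; the last close clears the exclusivity flag. */
static void model_device_close(struct model_device *dev)
{
	if (dev->open_count == 1)
		dev->single_open = false;
	dev->open_count--;
}
```

In this model two group-path opens both succeed, while any second open fails as soon as either opener came from the cdev path. The real code performs the same check under dev_set->lock, which the sketch leaves out.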
The main logic is in the vfio_device_open(). It needs to sustain both the legacy behavior i.e. multi-open in the group path and the new behavior i.e. single-open in the cdev path. This mixture leads to the introduction of a new single_open flag stored both in struct vfio_device and vfio_device_file. vfio_device_file::single_open is set per the vfio_device_file allocation. Its value is propagated to struct vfio_device after device is opened successfully. Signed-off-by: Yi Liu --- drivers/vfio/group.c | 2 +- drivers/vfio/vfio.h | 6 +++++- drivers/vfio/vfio_main.c | 25 ++++++++++++++++++++++--- include/linux/vfio.h | 1 + 4 files changed, 29 insertions(+), 5 deletions(-) diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index 9484bb1c54a9..57ebe5e1a7e6 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -216,7 +216,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device) struct file *filep; int ret; - df = vfio_allocate_device_file(device); + df = vfio_allocate_device_file(device, false); if (IS_ERR(df)) { ret = PTR_ERR(df); goto err_out; diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index fe0fcfa78710..bdcf9762521d 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -17,7 +17,11 @@ struct vfio_device; struct vfio_container; struct vfio_device_file { + /* static fields, init per allocation */ struct vfio_device *device; + bool single_open; + + /* fields set after allocation */ struct kvm *kvm; struct iommufd_ctx *iommufd; bool access_granted; @@ -30,7 +34,7 @@ int vfio_device_open(struct vfio_device_file *df, void vfio_device_close(struct vfio_device_file *device); struct vfio_device_file * -vfio_allocate_device_file(struct vfio_device *device); +vfio_allocate_device_file(struct vfio_device *device, bool single_open); extern const struct file_operations vfio_device_fops; diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 90174a9015c4..78725c28b933 100644 --- a/drivers/vfio/vfio_main.c +++ 
b/drivers/vfio/vfio_main.c @@ -345,7 +345,7 @@ static bool vfio_assert_device_open(struct vfio_device *device) } struct vfio_device_file * -vfio_allocate_device_file(struct vfio_device *device) +vfio_allocate_device_file(struct vfio_device *device, bool single_open) { struct vfio_device_file *df; @@ -354,6 +354,7 @@ vfio_allocate_device_file(struct vfio_device *device) return ERR_PTR(-ENOMEM); df->device = device; + df->single_open = single_open; return df; } @@ -421,6 +422,16 @@ int vfio_device_open(struct vfio_device_file *df, lockdep_assert_held(&device->dev_set->lock); + /* + * Device cdev path cannot support multiple device open since + * it doesn't have a secure way for it. So a second device + * open attempt should be failed if the caller is from a cdev + * path or the device has already been opened by a cdev path. + */ + if (device->open_count != 0 && + (df->single_open || device->single_open)) + return -EINVAL; + device->open_count++; if (device->open_count == 1) { int ret; @@ -430,6 +441,7 @@ int vfio_device_open(struct vfio_device_file *df, device->open_count--; return ret; } + device->single_open = df->single_open; } /* @@ -446,8 +458,10 @@ void vfio_device_close(struct vfio_device_file *df) mutex_lock(&device->dev_set->lock); vfio_assert_device_open(device); - if (device->open_count == 1) + if (device->open_count == 1) { vfio_device_last_close(df); + device->single_open = false; + } device->open_count--; mutex_unlock(&device->dev_set->lock); } @@ -493,7 +507,12 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep) struct vfio_device_file *df = filep->private_data; struct vfio_device *device = df->device; - vfio_device_group_close(df); + /* + * group path supports multiple device open, while cdev doesn't. + * So use vfio_device_group_close() for !single_open case.
+ */ + if (!df->single_open) + vfio_device_group_close(df); vfio_device_put_registration(device); diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 46edd6e6c0ba..300318f0d448 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -63,6 +63,7 @@ struct vfio_device { struct iommufd_ctx *iommufd_ictx; bool iommufd_attached; #endif + bool single_open; }; /** From patchwork Tue Jan 17 13:49:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 13104685 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 63F4AC677F1 for ; Tue, 17 Jan 2023 13:50:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230354AbjAQNuu (ORCPT ); Tue, 17 Jan 2023 08:50:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46898 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230401AbjAQNuE (ORCPT ); Tue, 17 Jan 2023 08:50:04 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC7D8367C2 for ; Tue, 17 Jan 2023 05:49:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1673963398; x=1705499398; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VIGYMeqehWy9X7tamECnp9oQp3atpYT9g1Gc4Cx0hq8=; b=V+NrCunqp4ezl1LRThWn0juxNiHfYyhmS8glOtaQE8/eertsCSjIgMAe vX7bfn/sI5AqxGSfNyXLVQYwQ2n2VZWIHvf425dFG6Xadx2BI7YAN79Wa zDSnOCil6RS5fwv3zzz5xLTD9TBH/eE+lj9S0aLakxH2Eg7nEqhOZPqN5 D34bVc0xC3pbZvmUEBzYy4tnLfw0rG1x9uRN58HBBL413fCWMC4FQ+Zug 0f3m7xSQ1TXvD6MAU9uVO4p0sJrEO2NX40r9VKcQesx6jB+1unGv/U8C/ EBCY2OZPDGRtsEkfkMNwgpoao2jnz2cZjVQ+Ozils6ysInVf7dD/gHmip 
Q==; From: Yi Liu To: alex.williamson@redhat.com, jgg@nvidia.com Cc: kevin.tian@intel.com, cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, suravee.suthikulpanit@amd.com, Joao Martins Subject: [PATCH 11/13] vfio: Add cdev for vfio_device Date: Tue, 17 Jan 2023 05:49:40 -0800 Message-Id: <20230117134942.101112-12-yi.l.liu@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230117134942.101112-1-yi.l.liu@intel.com> References: <20230117134942.101112-1-yi.l.liu@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

This allows the user to open a vfio device directly, without using the legacy container/group interface, as a prerequisite for supporting new iommu features like nested translation.

A device fd opened in this manner cannot access the device yet: the fops open() does not open the device until a successful BIND_IOMMUFD, which will be added in the next patch.

With this patch, devices registered to the vfio core have both a group and a device interface created.

- group interface : /dev/vfio/$groupID
- device interface: /dev/vfio/devices/vfioX (X is the minor number and is unique across devices)

Given a vfio device, the user can identify the matching vfioX by checking the sysfs path of the device.
Take PCI device (0000:6a:01.0) for example, /sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfio0/dev contains the major:minor of the matching vfioX. Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat that the major:minor matches. The vfio_device cdev logic in this patch: *) __vfio_register_dev() path ends up doing cdev_device_add() for each vfio_device; *) vfio_unregister_group_dev() path does cdev_device_del(); Signed-off-by: Yi Liu Signed-off-by: Joao Martins --- drivers/vfio/vfio_main.c | 103 ++++++++++++++++++++++++++++++++++++--- include/linux/vfio.h | 7 ++- 2 files changed, 102 insertions(+), 8 deletions(-) diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 78725c28b933..6068ffb7c6b7 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -43,6 +43,9 @@ static struct vfio { struct class *device_class; struct ida device_ida; +#if IS_ENABLED(CONFIG_IOMMUFD) + dev_t device_devt; +#endif } vfio; static DEFINE_XARRAY(vfio_device_set_xa); @@ -156,7 +159,11 @@ static void vfio_device_release(struct device *dev) container_of(dev, struct vfio_device, device); vfio_release_device_set(device); +#if IS_ENABLED(CONFIG_IOMMUFD) + ida_free(&vfio.device_ida, MINOR(device->device.devt)); +#else ida_free(&vfio.device_ida, device->index); +#endif if (device->ops->release) device->ops->release(device); @@ -209,15 +216,16 @@ EXPORT_SYMBOL_GPL(_vfio_alloc_device); static int vfio_init_device(struct vfio_device *device, struct device *dev, const struct vfio_device_ops *ops) { + unsigned int minor; int ret; ret = ida_alloc_max(&vfio.device_ida, MINORMASK, GFP_KERNEL); if (ret < 0) { - dev_dbg(dev, "Error to alloc index\n"); + dev_dbg(dev, "Error to alloc minor\n"); return ret; } - device->index = ret; + minor = ret; init_completion(&device->comp); device->dev = dev; device->ops = ops; @@ -232,17 +240,25 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev, device->device.release = vfio_device_release; 
device->device.class = vfio.device_class; device->device.parent = device->dev; +#if IS_ENABLED(CONFIG_IOMMUFD) + device->device.devt = MKDEV(MAJOR(vfio.device_devt), minor); + cdev_init(&device->cdev, &vfio_device_fops); + device->cdev.owner = THIS_MODULE; +#else + device->index = minor; +#endif return 0; out_uninit: vfio_release_device_set(device); - ida_free(&vfio.device_ida, device->index); + ida_free(&vfio.device_ida, minor); return ret; } static int __vfio_register_dev(struct vfio_device *device, enum vfio_group_type type) { + unsigned int minor; int ret; if (WARN_ON(device->ops->bind_iommufd && @@ -257,7 +273,12 @@ static int __vfio_register_dev(struct vfio_device *device, if (!device->dev_set) vfio_assign_device_set(device, device); - ret = dev_set_name(&device->device, "vfio%d", device->index); +#if IS_ENABLED(CONFIG_IOMMUFD) + minor = MINOR(device->device.devt); +#else + minor = device->index; +#endif + ret = dev_set_name(&device->device, "vfio%d", minor); if (ret) return ret; @@ -265,7 +286,11 @@ static int __vfio_register_dev(struct vfio_device *device, if (ret) return ret; +#if IS_ENABLED(CONFIG_IOMMUFD) + ret = cdev_device_add(&device->cdev, &device->device); +#else ret = device_add(&device->device); +#endif if (ret) goto err_out; @@ -305,6 +330,17 @@ void vfio_unregister_group_dev(struct vfio_device *device) bool interrupted = false; long rc; +#if IS_ENABLED(CONFIG_IOMMUFD) + /* + * Balances device_add in register path. Putting it as the first + * operation in unregister to prevent registration refcount from + * incrementing per cdev open. 
+ */ + cdev_device_del(&device->cdev, &device->device); +#else + device_del(&device->device); +#endif + vfio_device_put_registration(device); rc = try_wait_for_completion(&device->comp); while (rc <= 0) { @@ -330,9 +366,6 @@ void vfio_unregister_group_dev(struct vfio_device *device) vfio_device_group_unregister(device); - /* Balances device_add in register path */ - device_del(&device->device); - /* Balances vfio_device_set_group in register path */ vfio_device_remove_group(device); } @@ -502,6 +535,37 @@ static inline void vfio_device_pm_runtime_put(struct vfio_device *device) /* * VFIO Device fd */ +#if IS_ENABLED(CONFIG_IOMMUFD) +static int vfio_device_fops_open(struct inode *inode, struct file *filep) +{ + struct vfio_device *device = container_of(inode->i_cdev, + struct vfio_device, cdev); + struct vfio_device_file *df; + int ret; + + if (!vfio_device_try_get_registration(device)) + return -ENODEV; + + /* + * device access is blocked until .open_device() is called + * in BIND_IOMMUFD. 
+ */ + df = vfio_allocate_device_file(device, true); + if (IS_ERR(df)) { + ret = PTR_ERR(df); + goto err_put_registration; + } + + filep->private_data = df; + + return 0; + +err_put_registration: + vfio_device_put_registration(device); + return ret; +} +#endif + static int vfio_device_fops_release(struct inode *inode, struct file *filep) { struct vfio_device_file *df = filep->private_data; @@ -1169,6 +1233,9 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma) const struct file_operations vfio_device_fops = { .owner = THIS_MODULE, +#if IS_ENABLED(CONFIG_IOMMUFD) + .open = vfio_device_fops_open, +#endif .release = vfio_device_fops_release, .read = vfio_device_fops_read, .write = vfio_device_fops_write, @@ -1522,6 +1589,13 @@ EXPORT_SYMBOL(vfio_dma_rw); /* * Module/class support */ +#if IS_ENABLED(CONFIG_IOMMUFD) +static char *vfio_device_devnode(const struct device *dev, umode_t *mode) +{ + return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev)); +} +#endif + static int __init vfio_init(void) { int ret; @@ -1543,9 +1617,21 @@ static int __init vfio_init(void) goto err_dev_class; } +#if IS_ENABLED(CONFIG_IOMMUFD) + vfio.device_class->devnode = vfio_device_devnode; + ret = alloc_chrdev_region(&vfio.device_devt, 0, + MINORMASK + 1, "vfio-dev"); + if (ret) + goto err_alloc_dev_chrdev; +#endif pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n"); return 0; +#if IS_ENABLED(CONFIG_IOMMUFD) +err_alloc_dev_chrdev: + class_destroy(vfio.device_class); + vfio.device_class = NULL; +#endif err_dev_class: vfio_virqfd_exit(); err_virqfd: @@ -1556,6 +1642,9 @@ static int __init vfio_init(void) static void __exit vfio_cleanup(void) { ida_destroy(&vfio.device_ida); +#if IS_ENABLED(CONFIG_IOMMUFD) + unregister_chrdev_region(vfio.device_devt, MINORMASK + 1); +#endif class_destroy(vfio.device_class); vfio.device_class = NULL; vfio_virqfd_exit(); diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 300318f0d448..4a31842ebe0b 100644 --- 
a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -13,6 +13,7 @@ #include #include #include +#include #include #include @@ -50,8 +51,12 @@ struct vfio_device { struct kvm *kvm; /* Members below here are private, not for driver use */ - unsigned int index; struct device device; /* device.kref covers object life circle */ +#if IS_ENABLED(CONFIG_IOMMUFD) + struct cdev cdev; +#else + unsigned int index; +#endif refcount_t refcount; /* user count on registered device*/ unsigned int open_count; struct completion comp; From patchwork Tue Jan 17 13:49:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 13104684 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B6DDDC3DA78 for ; Tue, 17 Jan 2023 13:50:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230448AbjAQNus (ORCPT ); Tue, 17 Jan 2023 08:50:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229878AbjAQNuE (ORCPT ); Tue, 17 Jan 2023 08:50:04 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6573E367E0 for ; Tue, 17 Jan 2023 05:49:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1673963399; x=1705499399; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eor2XodTuo+1n2yc1CRomFNfDbYdFRiv33jME0SEMMU=; b=Ji5pHE0t1kc1aYv/h8Ub3kPjSL7prw4qvxWGjeOTNVv+F5t3hcRipabd AcRL9EoepkP9T+O9djske60NfwkKrAsq+warVjrN+7cypj0cKkEyMhBPe YICf/jyNJ9C9RQFg3bRird/+PI1ImCAqIQkB7MiFhKQyTfDU0pNWJHL1W 
Hm2K0fdBRDNvnMKoQURTIwu2EPoSVSZjyVkcdnmneK0+fUdQjucju2SuP LdlHkjNePcQ6RP4HHUKfVOB8tOt+DaoH9PkhY4sH768MLPadQ0M46fpJa X1CYlg2CCQ5ksHb7fRGuy6gLeykqAyTVY888ijmkzmPHramPsiI2K2D+R A==; From: Yi Liu To: alex.williamson@redhat.com, jgg@nvidia.com Cc: kevin.tian@intel.com, cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com, suravee.suthikulpanit@amd.com Subject: [PATCH 12/13] vfio: Add ioctls for device cdev iommufd Date: Tue, 17 Jan 2023 05:49:41 -0800 Message-Id: <20230117134942.101112-13-yi.l.liu@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230117134942.101112-1-yi.l.liu@intel.com> References: <20230117134942.101112-1-yi.l.liu@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

This adds two vfio device ioctls for userspace using iommufd on vfio devices.

VFIO_DEVICE_BIND_IOMMUFD: bind the device to an iommufd and hence gain the DMA control provided by the iommufd. VFIO noiommu mode is indicated by passing a negative fd value.

VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach the device to an ioas, i.e. page tables managed by iommufd. The attach can be undone by passing IOMMUFD_INVALID_ID to the kernel.

The ioctls introduced here are just on par with existing VFIO.
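As a rough illustration of the bind semantics, the following userspace model mirrors the argument checks that vfio_device_ioctl_bind_iommufd() performs in this patch. It is a sketch under assumptions: struct model_bind_iommufd, struct model_file and model_bind_check() are hypothetical names, the field types are guessed from how the fields are used, and has_rawio stands in for the capable(CAP_SYS_RAWIO) test. The real ioctl additionally resolves the fd to an iommufd context and opens the device.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the bind argument; field types are assumed. */
struct model_bind_iommufd {
	uint32_t argsz;
	uint32_t flags;
	int32_t  iommufd;	/* a negative value requests noiommu mode */
	uint32_t out_devid;
};

/* Hypothetical per-file state, loosely following struct vfio_device_file. */
struct model_file {
	bool bound;	/* an iommufd has already been bound */
	bool noiommu;	/* noiommu mode has already been selected */
};

/* Model of the checks made before a bind proceeds; 0 or negative errno. */
static int model_bind_check(struct model_file *f,
			    const struct model_bind_iommufd *arg,
			    bool has_rawio)
{
	/* Same idea as offsetofend(struct ..., iommufd) in the kernel. */
	size_t minsz = offsetof(struct model_bind_iommufd, iommufd) +
		       sizeof(arg->iommufd);

	if (arg->argsz < minsz || arg->flags)
		return -EINVAL;

	/* A device file can be bound (or set to noiommu) only once. */
	if (f->bound || f->noiommu)
		return -EINVAL;

	if (arg->iommufd < 0) {
		if (!has_rawio)
			return -EPERM;
		f->noiommu = true;
	} else {
		f->bound = true;
	}
	return 0;
}
```

The model makes the two outcomes explicit: a negative fd selects noiommu only for a privileged caller, and any later bind attempt on the same file is rejected, which matches the rule that a bound FD stays bound until it is closed.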
Signed-off-by: Yi Liu
---
 drivers/vfio/vfio.h          |   1 +
 drivers/vfio/vfio_main.c     | 175 ++++++++++++++++++++++++++++++++++-
 include/uapi/linux/iommufd.h |   2 +
 include/uapi/linux/vfio.h    |  64 +++++++++++++
 4 files changed, 237 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index bdcf9762521d..444be924c915 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -25,6 +25,7 @@ struct vfio_device_file {
 	struct kvm *kvm;
 	struct iommufd_ctx *iommufd;
 	bool access_granted;
+	bool noiommu;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6068ffb7c6b7..99ebb5bd1eda 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 #include "vfio.h"
 
 #define DRIVER_VERSION	"0.3"
@@ -402,12 +403,37 @@ static int vfio_device_first_open(struct vfio_device_file *df,
 
 	lockdep_assert_held(&device->dev_set->lock);
 
+	/* df->iommufd and df->noiommu should be mutually exclusive */
+	if (WARN_ON(iommufd && df->noiommu))
+		return -EINVAL;
+
 	if (!try_module_get(device->dev->driver->owner))
 		return -ENODEV;
 
+	/*
+	 * For the group path, the iommufd pointer is NULL when entering
+	 * this helper. Its noiommu support is in container.c.
+	 *
+	 * For iommufd compat mode, the iommufd pointer here is valid.
+	 * Its noiommu support is supposed to be in vfio_iommufd_bind().
+	 *
+	 * For the device cdev path, the iommufd pointer here is valid in
+	 * normal cases, but it is NULL for noiommu. The reason is that
+	 * userspace passes a negative iommufd to indicate noiommu mode in
+	 * this path, so the caller of this helper passes in a NULL iommufd
+	 * pointer. To differentiate it from the group path, which also
+	 * passes in a NULL iommufd pointer, df->noiommu is used. For cdev
+	 * noiommu, df->noiommu is set to mark the noiommu case of the
+	 * cdev path.
+	 *
+	 * So if df->noiommu is set, this helper just goes ahead and
+	 * opens the device. If not, whether the iommufd pointer is NULL
+	 * distinguishes the group path from iommufd compat mode and the
+	 * normal cdev path.
+	 */
 	if (iommufd)
 		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
-	else
+	else if (!df->noiommu)
 		ret = vfio_device_group_use_iommu(device);
 	if (ret)
 		goto err_module_put;
@@ -424,7 +450,7 @@ static int vfio_device_first_open(struct vfio_device_file *df,
 	device->kvm = NULL;
 	if (iommufd)
 		vfio_iommufd_unbind(device);
-	else
+	else if (!df->noiommu)
 		vfio_device_group_unuse_iommu(device);
 err_module_put:
 	module_put(device->dev->driver->owner);
@@ -443,7 +469,7 @@ static void vfio_device_last_close(struct vfio_device_file *df)
 	device->kvm = NULL;
 	if (iommufd)
 		vfio_iommufd_unbind(device);
-	else
+	else if (!df->noiommu)
 		vfio_device_group_unuse_iommu(device);
 	module_put(device->dev->driver->owner);
 }
@@ -485,17 +511,24 @@ int vfio_device_open(struct vfio_device_file *df,
 	return 0;
 }
 
-void vfio_device_close(struct vfio_device_file *df)
+static void __vfio_device_close(struct vfio_device_file *df)
 {
 	struct vfio_device *device = df->device;
 
-	mutex_lock(&device->dev_set->lock);
 	vfio_assert_device_open(device);
 	if (device->open_count == 1) {
 		vfio_device_last_close(df);
 		device->single_open = false;
 	}
 	device->open_count--;
+}
+
+void vfio_device_close(struct vfio_device_file *df)
+{
+	struct vfio_device *device = df->device;
+
+	mutex_lock(&device->dev_set->lock);
+	__vfio_device_close(df);
 	mutex_unlock(&device->dev_set->lock);
 }
 
@@ -577,6 +610,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	 */
 	if (!df->single_open)
 		vfio_device_group_close(df);
+	else
+		vfio_device_close(df);
 
 	vfio_device_put_registration(device);
 
@@ -1143,6 +1178,129 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
 	}
 }
 
+static long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+					   unsigned long arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_bind_iommufd bind;
+	struct iommufd_ctx *iommufd = NULL;
+	unsigned long minsz;
+	struct fd f;
+	int ret;
+
+	minsz = offsetofend(struct vfio_device_bind_iommufd, iommufd);
+
+	if (copy_from_user(&bind, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (bind.argsz < minsz || bind.flags)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	mutex_lock(&device->dev_set->lock);
+	/*
+	 * Fail if the device file is already bound to an iommufd or
+	 * already set to noiommu.
+	 */
+	if (df->iommufd || df->noiommu) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/* iommufd < 0 means noiommu mode */
+	if (bind.iommufd < 0) {
+		if (!capable(CAP_SYS_RAWIO)) {
+			ret = -EPERM;
+			goto out_unlock;
+		}
+		df->noiommu = true;
+	} else {
+		f = fdget(bind.iommufd);
+		if (!f.file) {
+			ret = -EBADF;
+			goto out_unlock;
+		}
+		iommufd = iommufd_ctx_from_file(f.file);
+		if (IS_ERR(iommufd)) {
+			ret = PTR_ERR(iommufd);
+			goto out_put_file;
+		}
+	}
+
+	/* df->kvm is supposed to be set in vfio_device_file_set_kvm() */
+	df->iommufd = iommufd;
+	ret = vfio_device_open(df, &bind.out_devid, NULL);
+	if (ret)
+		goto out_put_file;
+
+	ret = copy_to_user((void __user *)arg + minsz,
+			   &bind.out_devid,
+			   sizeof(bind.out_devid)) ? -EFAULT : 0;
+	if (ret)
+		goto out_close_device;
+
+	mutex_unlock(&device->dev_set->lock);
+	if (iommufd)
+		fdput(f);
+	else if (df->noiommu)
+		dev_warn(device->dev, "vfio-noiommu device used by user "
+			 "(%s:%d)\n", current->comm, task_pid_nr(current));
+	return 0;
+
+out_close_device:
+	__vfio_device_close(df);
+out_put_file:
+	if (iommufd)
+		fdput(f);
+out_unlock:
+	df->iommufd = NULL;
+	df->noiommu = false;
+	mutex_unlock(&device->dev_set->lock);
+	return ret;
+}
+
+static int vfio_ioctl_device_attach(struct vfio_device *device,
+				    struct vfio_device_feature __user *arg)
+{
+	struct vfio_device_attach_iommufd_pt attach;
+	int ret;
+	bool is_attach;
+
+	if (copy_from_user(&attach, (void __user *)arg, sizeof(attach)))
+		return -EFAULT;
+
+	if (attach.flags)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	mutex_lock(&device->dev_set->lock);
+	is_attach = attach.pt_id != IOMMUFD_INVALID_ID;
+	ret = vfio_iommufd_attach(device, is_attach ? &attach.pt_id : NULL);
+	if (ret)
+		goto out_unlock;
+
+	if (is_attach) {
+		ret = copy_to_user((void __user *)arg + offsetofend(
+				   struct vfio_device_attach_iommufd_pt, flags),
+				   &attach.pt_id,
+				   sizeof(attach.pt_id)) ? -EFAULT : 0;
+		if (ret)
+			goto out_detach;
+	}
+	mutex_unlock(&device->dev_set->lock);
+	return 0;
+
+out_detach:
+	vfio_iommufd_attach(device, NULL);
+out_unlock:
+	mutex_unlock(&device->dev_set->lock);
+	return ret;
+}
+
 static long vfio_device_fops_unl_ioctl(struct file *filep,
 				       unsigned int cmd, unsigned long arg)
 {
@@ -1151,6 +1309,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 	bool access;
 	int ret;
 
+	if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
+		return vfio_device_ioctl_bind_iommufd(df, arg);
+
 	/* Paired with smp_store_release() in vfio_device_open() */
 	access = smp_load_acquire(&df->access_granted);
 	if (!access)
@@ -1165,6 +1326,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
 		break;
 
+	case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
+		ret = vfio_ioctl_device_attach(device, (void __user *)arg);
+		break;
+
 	default:
 		if (unlikely(!device->ops->ioctl))
 			ret = -EINVAL;
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 98ebba80cfa1..87680274c01b 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -9,6 +9,8 @@
 
 #define IOMMUFD_TYPE (';')
 
+#define IOMMUFD_INVALID_ID 0 /* valid ID starts from 1 */
+
 /**
  * DOC: General ioctl format
  *
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 23105eb036fa..235d3485a883 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -190,6 +190,70 @@ struct vfio_group_status {
 
 /* --------------- IOCTLs for DEVICE file descriptors --------------- */
 
+/*
+ * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19,
+ *				   struct vfio_device_bind_iommufd)
+ *
+ * Bind a vfio_device to the specified iommufd.
+ *
+ * The user should provide a device cookie when calling this ioctl. The
+ * cookie is carried only in events (e.g. I/O faults) reported to userspace
+ * via iommufd. The user should use the devid returned by this ioctl to mark
+ * the target device in other ioctls (e.g. capability query via iommufd).
+ *
+ * The user is not allowed to access the device before the binding
+ * operation is completed.
+ *
+ * Unbind is conducted automatically when the device fd is closed.
+ *
+ * @argsz:	 user filled size of this data.
+ * @flags:	 reserved for future extension.
+ * @dev_cookie:	 a per device cookie provided by userspace.
+ * @iommufd:	 iommufd to bind. iommufd < 0 means noiommu.
+ * @out_devid:	 the device id generated by this bind.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_bind_iommufd {
+	__u32		argsz;
+	__u32		flags;
+	__aligned_u64	dev_cookie;
+	__s32		iommufd;
+	__u32		out_devid;
+};
+
+#define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
+
+/*
+ * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
+ *					struct vfio_device_attach_iommufd_pt)
+ *
+ * Attach a vfio device to an iommufd address space specified by IOAS
+ * id or hw_pagetable (hwpt) id.
+ *
+ * Available only after a device has been bound to iommufd via
+ * VFIO_DEVICE_BIND_IOMMUFD.
+ *
+ * Undo by passing pt_id == IOMMUFD_INVALID_ID.
+ *
+ * @argsz:	user filled size of this data.
+ * @flags:	must be 0.
+ * @pt_id:	Input the target id, which can represent an ioas or a hwpt
+ *		allocated via the iommufd subsystem.
+ *		Output the attached hwpt id, which could be the specified
+ *		hwpt itself or a hwpt automatically created for the
+ *		specified ioas by the kernel during the attachment.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_attach_iommufd_pt {
+	__u32	argsz;
+	__u32	flags;
+	__u32	pt_id;
+};
+
+#define VFIO_DEVICE_ATTACH_IOMMUFD_PT	_IO(VFIO_TYPE, VFIO_BASE + 20)
+
 /**
  * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
  *						struct vfio_device_info)

From patchwork Tue Jan 17 13:49:42 2023
X-Patchwork-Submitter: Yi Liu
X-Patchwork-Id: 13104686
From: Yi Liu
To: alex.williamson@redhat.com, jgg@nvidia.com
Cc: kevin.tian@intel.com, cohuck@redhat.com, eric.auger@redhat.com,
 nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
 chao.p.peng@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@linux.intel.com,
 peterx@redhat.com, jasowang@redhat.com, suravee.suthikulpanit@amd.com
Subject: [PATCH 13/13] vfio: Compile group optionally
Date: Tue, 17 Jan 2023 05:49:42 -0800
Message-Id: <20230117134942.101112-14-yi.l.liu@intel.com>
In-Reply-To: <20230117134942.101112-1-yi.l.liu@intel.com>
References: <20230117134942.101112-1-yi.l.liu@intel.com>
X-Mailing-List: kvm@vger.kernel.org

Group code is not needed for vfio device cdev, so with vfio device cdev
introduced, the group infrastructure can be compiled out.

Signed-off-by: Yi Liu
---
 drivers/vfio/Kconfig  | 17 +++++++++++
 drivers/vfio/Makefile |  3 +-
 drivers/vfio/vfio.h   | 69 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h  | 11 +++++++
 4 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index a8f544629467..7e3f6249fa15 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -12,9 +12,26 @@ menuconfig VFIO
 	  If you don't know what to do here, say N.
 
 if VFIO
+config VFIO_ENABLE_GROUP
+	bool
+	default !IOMMUFD
+
+config VFIO_GROUP
+	bool "Support for the VFIO group /dev/vfio/$group_id"
+	select VFIO_ENABLE_GROUP
+	default y
+	help
+	  The VFIO group is the legacy interface for userspace. For userspace
+	  adapted to iommufd and vfio device cdev, this can be N. For
+	  now, before iommufd is ready and userspace applications are fully
+	  converted to iommufd and vfio device cdev, this should be Y.
+
+	  If you don't know what to do here, say Y.
+
 config VFIO_CONTAINER
 	bool "Support for the VFIO container /dev/vfio/vfio"
 	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
+	depends on VFIO_ENABLE_GROUP
 	default y
 	help
 	  The VFIO container is the classic interface to VFIO for establishing
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 70e7dcb302ef..bb3fec9ea6bf 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -2,8 +2,9 @@
 obj-$(CONFIG_VFIO) += vfio.o
 
 vfio-y += vfio_main.o \
-	  group.o \
 	  iova_bitmap.o
+
+vfio-$(CONFIG_VFIO_ENABLE_GROUP) += group.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
 vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 444be924c915..cd282e5c07bb 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -63,6 +63,7 @@ enum vfio_group_type {
 	VFIO_NO_IOMMU,
 };
 
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
 struct vfio_group {
 	struct device dev;
 	struct cdev cdev;
@@ -105,6 +106,74 @@ bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
 bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
+#else
+struct vfio_group;
+
+static inline int vfio_device_set_group(struct vfio_device *device,
+					enum vfio_group_type type)
+{
+	return 0;
+}
+
+static inline void vfio_device_remove_group(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_register(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_unregister(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_group_use_iommu(struct vfio_device *device)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void vfio_device_group_unuse_iommu(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_close(struct vfio_device_file *df)
+{
+}
+
+static inline struct vfio_group *vfio_group_from_file(struct file *file)
+{
+	return NULL;
+}
+
+static inline bool vfio_group_enforced_coherent(struct vfio_group *group)
+{
+	return true;
+}
+
+static inline void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
+{
+}
+
+static inline bool vfio_group_has_dev(struct vfio_group *group,
+				      struct vfio_device *device)
+{
+	return false;
+}
+
+static inline bool vfio_device_has_container(struct vfio_device *device)
+{
+	return false;
+}
+
+static inline int __init vfio_group_init(void)
+{
+	return 0;
+}
+
+static inline void vfio_group_cleanup(void)
+{
+}
+#endif /* CONFIG_VFIO_ENABLE_GROUP */
 
 #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
 /* events for the backend driver notify callback */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 4a31842ebe0b..eb4dc3dfab03 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -43,7 +43,9 @@ struct vfio_device {
 	 */
 	const struct vfio_migration_ops *mig_ops;
 	const struct vfio_log_ops *log_ops;
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
 	struct vfio_group *group;
+#endif
 	struct vfio_device_set *dev_set;
 	struct list_head dev_set_list;
 	unsigned int migration_flags;
@@ -60,8 +62,10 @@ struct vfio_device {
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
 	struct completion comp;
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
 	struct list_head group_next;
 	struct list_head iommu_entry;
+#endif
 	struct iommufd_access *iommufd_access;
 #if IS_ENABLED(CONFIG_IOMMUFD)
 	struct iommufd_device *iommufd_device;
@@ -246,7 +250,14 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 /*
  * External user API
  */
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
 struct iommu_group *vfio_file_iommu_group(struct file *file);
+#else
+static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
+{
+	return NULL;
+}
+#endif
 bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
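
As a rough illustration of how the compiled-out group stubs above interact
with the first-open dispatch from patch 12/13, the following userspace model
may help. It is only a sketch under stated assumptions: the names
SKETCH_ENABLE_GROUP, sketch_group_use_iommu() and sketch_first_open() are
hypothetical stand-ins for CONFIG_VFIO_ENABLE_GROUP,
vfio_device_group_use_iommu() and vfio_device_first_open(), and the iommufd
bind step is modelled as always succeeding.

```c
#include <errno.h>
#include <stdbool.h>

/* Set to 1 to model a build with the group code compiled in (assumption). */
#ifndef SKETCH_ENABLE_GROUP
#define SKETCH_ENABLE_GROUP 0
#endif

#if SKETCH_ENABLE_GROUP
/* Stand-in for the real group helper; modelled as always succeeding. */
static inline int sketch_group_use_iommu(void)
{
	return 0;
}
#else
/* Same shape as the stub in the patch: the group path is unavailable. */
static inline int sketch_group_use_iommu(void)
{
	return -EOPNOTSUPP;
}
#endif

/*
 * Model of the three-way dispatch on first open:
 *   valid iommufd      -> bind through iommufd (modelled as success)
 *   cdev noiommu       -> skip IOMMU setup and just open the device
 *   neither            -> legacy group path
 */
static int sketch_first_open(bool have_iommufd, bool noiommu)
{
	/* iommufd and noiommu are mutually exclusive */
	if (have_iommufd && noiommu)
		return -EINVAL;

	if (have_iommufd)
		return 0;	/* stands in for vfio_iommufd_bind() */
	if (noiommu)
		return 0;	/* nothing to set up */

	return sketch_group_use_iommu();
}
```

With the group code compiled out in this model, an open that supplies
neither an iommufd nor noiommu fails with -EOPNOTSUPP, which matches the
return value chosen for the vfio_device_group_use_iommu() stub.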