Message ID | 20240719120501.81279-6-joao.m.martins@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | hw/iommufd: IOMMUFD Dirty Tracking | expand |
>-----Original Message----- >From: Joao Martins <joao.m.martins@oracle.com> >Subject: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation > >There's generally two modes of operation for IOMMUFD: > >1) The simple user API which intends to perform relatively simple things >with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches >to VFIO and mainly performs IOAS_MAP and UNMAP. > >2) The native IOMMUFD API where you have fine grained control of the >IOMMU domain and model it accordingly. This is where most new feature >are being steered to. > >For dirty tracking 2) is required, as it needs to ensure that >the stage-2/parent IOMMU domain will only attach devices >that support dirty tracking (so far it is all homogeneous in x86, likely >not the case for smmuv3). Such invariant on dirty tracking provides a >useful guarantee to VMMs that will refuse incompatible device >attachments for IOMMU domains. > >Dirty tracking insurance is enforced via HWPT_ALLOC, which is >responsible for creating an IOMMU domain. This is contrast to the >'simple API' where the IOMMU domain is created by IOMMUFD >automatically >when it attaches to VFIO (usually referred as autodomains) but it has >the needed handling for mdevs. > >To support dirty tracking with the advanced IOMMUFD API, it needs >similar logic, where IOMMU domains are created and devices attached to >compatible domains. Essentially mimicking kernel >iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU >domain >it falls back to IOAS attach. > >The auto domain logic allows different IOMMU domains to be created when >DMA dirty tracking is not desired (and VF can provide it), and others where >it is. Here it is not used in this way given how VFIODevice migration >state is initialized after the device attachment. But such mixed mode of >IOMMU dirty tracking + device dirty tracking is an improvement that can >be added on. Keep the 'all of nothing' of type1 approach that we have >been using so far between container vs device dirty tracking. > >Signed-off-by: Joao Martins <joao.m.martins@oracle.com> >--- > include/hw/vfio/vfio-common.h | 9 ++++ > include/sysemu/iommufd.h | 5 +++ > backends/iommufd.c | 30 +++++++++++++ > hw/vfio/iommufd.c | 84 >+++++++++++++++++++++++++++++++++++ > backends/trace-events | 1 + > 5 files changed, 129 insertions(+) > >diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio- >common.h >index 98acae8c1c97..1a96678f8c38 100644 >--- a/include/hw/vfio/vfio-common.h >+++ b/include/hw/vfio/vfio-common.h >@@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow { > > typedef struct IOMMUFDBackend IOMMUFDBackend; > >+typedef struct VFIOIOASHwpt { >+ uint32_t hwpt_id; >+ QLIST_HEAD(, VFIODevice) device_list; >+ QLIST_ENTRY(VFIOIOASHwpt) next; >+} VFIOIOASHwpt; >+ > typedef struct VFIOIOMMUFDContainer { > VFIOContainerBase bcontainer; > IOMMUFDBackend *be; > uint32_t ioas_id; >+ QLIST_HEAD(, VFIOIOASHwpt) hwpt_list; > } VFIOIOMMUFDContainer; > > OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, >VFIO_IOMMU_IOMMUFD); >@@ -135,6 +142,8 @@ typedef struct VFIODevice { > HostIOMMUDevice *hiod; > int devid; > IOMMUFDBackend *iommufd; >+ VFIOIOASHwpt *hwpt; >+ QLIST_ENTRY(VFIODevice) hwpt_next; > } VFIODevice; > > struct VFIODeviceOps { >diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h >index 57d502a1c79a..e917e7591d05 100644 >--- a/include/sysemu/iommufd.h >+++ b/include/sysemu/iommufd.h >@@ -50,6 +50,11 @@ int >iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, > bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t >devid, > uint32_t *type, void *data, uint32_t len, > uint64_t *caps, Error **errp); >+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t >dev_id, >+ uint32_t pt_id, uint32_t flags, >+ uint32_t data_type, uint32_t data_len, >+ void *data_ptr, uint32_t *out_hwpt, >+ Error **errp); > > #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD >TYPE_HOST_IOMMU_DEVICE "-iommufd" > #endif >diff --git a/backends/iommufd.c b/backends/iommufd.c >index 2b3d51af26d2..a94d3b90c05c 100644 >--- a/backends/iommufd.c >+++ b/backends/iommufd.c >@@ -208,6 +208,36 @@ int >iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, > return ret; > } > >+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t >dev_id, >+ uint32_t pt_id, uint32_t flags, >+ uint32_t data_type, uint32_t data_len, >+ void *data_ptr, uint32_t *out_hwpt, >+ Error **errp) >+{ >+ int ret, fd = be->fd; >+ struct iommu_hwpt_alloc alloc_hwpt = { >+ .size = sizeof(struct iommu_hwpt_alloc), >+ .flags = flags, >+ .dev_id = dev_id, >+ .pt_id = pt_id, >+ .data_type = data_type, >+ .data_len = data_len, >+ .data_uptr = (uintptr_t)data_ptr, >+ }; >+ >+ ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt); >+ trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type, >+ data_len, (uintptr_t)data_ptr, >+ alloc_hwpt.out_hwpt_id, ret); >+ if (ret) { >+ error_setg_errno(errp, errno, "Failed to allocate hwpt"); >+ return false; >+ } >+ >+ *out_hwpt = alloc_hwpt.out_hwpt_id; >+ return true; >+} >+ > bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t >devid, > uint32_t *type, void *data, uint32_t len, > uint64_t *caps, Error **errp) >diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c >index 077dea8f1b64..545f4a404125 100644 >--- a/hw/vfio/iommufd.c >+++ b/hw/vfio/iommufd.c >@@ -212,10 +212,88 @@ static bool >iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp) > return true; > } > >+static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev, >+ VFIOIOMMUFDContainer *container, >+ Error **errp) >+{ >+ IOMMUFDBackend *iommufd = vbasedev->iommufd; >+ uint32_t flags = 0; >+ VFIOIOASHwpt *hwpt; >+ uint32_t hwpt_id; >+ int ret; >+ >+ /* Try to find a domain */ >+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) { >+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, >errp); >+ if (ret) { >+ /* -EINVAL means the domain is incompatible with the device. */ >+ if (ret == -EINVAL) { >+ /* >+ * It is an expected failure and it just means we will try >+ * another domain, or create one if no existing compatible >+ * domain is found. Hence why the error is discarded below. >+ */ >+ error_free(*errp); >+ *errp = NULL; >+ continue; >+ } >+ >+ return false; >+ } else { >+ vbasedev->hwpt = hwpt; >+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next); >+ return true; >+ } >+ } >+ >+ if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid, >+ container->ioas_id, flags, >+ IOMMU_HWPT_DATA_NONE, 0, NULL, >+ &hwpt_id, errp)) { >+ return false; >+ } >+ >+ hwpt = g_malloc0(sizeof(*hwpt)); >+ hwpt->hwpt_id = hwpt_id; >+ QLIST_INIT(&hwpt->device_list); >+ >+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp); >+ if (ret) { >+ iommufd_backend_free_id(container->be, hwpt->hwpt_id); >+ g_free(hwpt); >+ return false; >+ } >+ >+ vbasedev->hwpt = hwpt; >+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next); >+ QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next); >+ return true; >+} >+ >+static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev, >+ VFIOIOMMUFDContainer *container) >+{ >+ VFIOIOASHwpt *hwpt = vbasedev->hwpt; >+ >+ QLIST_REMOVE(vbasedev, hwpt_next); >+ vbasedev->hwpt = NULL; >+ >+ if (QLIST_EMPTY(&hwpt->device_list)) { >+ QLIST_REMOVE(hwpt, next); >+ iommufd_backend_free_id(container->be, hwpt->hwpt_id); >+ g_free(hwpt); >+ } >+} Looks the detach flow is still missed? >+ > static bool iommufd_cdev_attach_container(VFIODevice *vbasedev, > VFIOIOMMUFDContainer *container, > Error **errp) > { >+ /* mdevs aren't physical devices and will fail with auto domains */ >+ if (!vbasedev->mdev) { >+ return iommufd_cdev_autodomains_get(vbasedev, container, errp); >+ } >+ > return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, >errp); > } > >@@ -227,6 +305,11 @@ static void >iommufd_cdev_detach_container(VFIODevice *vbasedev, > if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) { Shouldn't we check mdev before calling this? > error_report_err(err); > } >+ >+ if (vbasedev->hwpt) { >+ iommufd_cdev_autodomains_put(vbasedev, container); >+ } >+ > } > > static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer >*container) >@@ -354,6 +437,7 @@ static bool iommufd_cdev_attach(const char *name, >VFIODevice *vbasedev, > container = >VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD)); > container->be = vbasedev->iommufd; > container->ioas_id = ioas_id; >+ QLIST_INIT(&container->hwpt_list); This can be in ::instance_init(). Thanks Zhenzhong > > bcontainer = &container->bcontainer; > vfio_address_space_insert(space, bcontainer); >diff --git a/backends/trace-events b/backends/trace-events >index 211e6f374adc..4d8ac02fe7d6 100644 >--- a/backends/trace-events >+++ b/backends/trace-events >@@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t >ioas, uint64_t iova, uint64_t size > iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, >uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: >iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)" > iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, >uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" >size=0x%"PRIx64" (%d)" > iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d >ioas=%d" >+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t >pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, >uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u >flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u >(%d)" > iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d >id=%d (%d)" >-- >2.17.2
On 22/07/2024 06:16, Duan, Zhenzhong wrote: >> -----Original Message----- >> From: Joao Martins <joao.m.martins@oracle.com> >> Subject: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation >> >> There's generally two modes of operation for IOMMUFD: >> >> 1) The simple user API which intends to perform relatively simple things >> with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches >> to VFIO and mainly performs IOAS_MAP and UNMAP. >> >> 2) The native IOMMUFD API where you have fine grained control of the >> IOMMU domain and model it accordingly. This is where most new feature >> are being steered to. >> >> For dirty tracking 2) is required, as it needs to ensure that >> the stage-2/parent IOMMU domain will only attach devices >> that support dirty tracking (so far it is all homogeneous in x86, likely >> not the case for smmuv3). Such invariant on dirty tracking provides a >> useful guarantee to VMMs that will refuse incompatible device >> attachments for IOMMU domains. >> >> Dirty tracking insurance is enforced via HWPT_ALLOC, which is >> responsible for creating an IOMMU domain. This is contrast to the >> 'simple API' where the IOMMU domain is created by IOMMUFD >> automatically >> when it attaches to VFIO (usually referred as autodomains) but it has >> the needed handling for mdevs. >> >> To support dirty tracking with the advanced IOMMUFD API, it needs >> similar logic, where IOMMU domains are created and devices attached to >> compatible domains. Essentially mimicking kernel >> iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU >> domain >> it falls back to IOAS attach. >> >> The auto domain logic allows different IOMMU domains to be created when >> DMA dirty tracking is not desired (and VF can provide it), and others where >> it is. Here it is not used in this way given how VFIODevice migration >> state is initialized after the device attachment. But such mixed mode of >> IOMMU dirty tracking + device dirty tracking is an improvement that can >> be added on. Keep the 'all of nothing' of type1 approach that we have >> been using so far between container vs device dirty tracking. >> >> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> >> --- >> include/hw/vfio/vfio-common.h | 9 ++++ >> include/sysemu/iommufd.h | 5 +++ >> backends/iommufd.c | 30 +++++++++++++ >> hw/vfio/iommufd.c | 84 >> +++++++++++++++++++++++++++++++++++ >> backends/trace-events | 1 + >> 5 files changed, 129 insertions(+) >> >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio- >> common.h >> index 98acae8c1c97..1a96678f8c38 100644 >> --- a/include/hw/vfio/vfio-common.h >> +++ b/include/hw/vfio/vfio-common.h >> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow { >> >> typedef struct IOMMUFDBackend IOMMUFDBackend; >> >> +typedef struct VFIOIOASHwpt { >> + uint32_t hwpt_id; >> + QLIST_HEAD(, VFIODevice) device_list; >> + QLIST_ENTRY(VFIOIOASHwpt) next; >> +} VFIOIOASHwpt; >> + >> typedef struct VFIOIOMMUFDContainer { >> VFIOContainerBase bcontainer; >> IOMMUFDBackend *be; >> uint32_t ioas_id; >> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list; >> } VFIOIOMMUFDContainer; >> >> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, >> VFIO_IOMMU_IOMMUFD); >> @@ -135,6 +142,8 @@ typedef struct VFIODevice { >> HostIOMMUDevice *hiod; >> int devid; >> IOMMUFDBackend *iommufd; >> + VFIOIOASHwpt *hwpt; >> + QLIST_ENTRY(VFIODevice) hwpt_next; >> } VFIODevice; >> >> struct VFIODeviceOps { >> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h >> index 57d502a1c79a..e917e7591d05 100644 >> --- a/include/sysemu/iommufd.h >> +++ b/include/sysemu/iommufd.h >> @@ -50,6 +50,11 @@ int >> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, >> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t >> devid, >> uint32_t *type, void *data, uint32_t len, >> uint64_t *caps, Error **errp); >> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t >> dev_id, >> + uint32_t pt_id, uint32_t flags, >> + uint32_t data_type, uint32_t data_len, >> + void *data_ptr, uint32_t *out_hwpt, >> + Error **errp); >> >> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD >> TYPE_HOST_IOMMU_DEVICE "-iommufd" >> #endif >> diff --git a/backends/iommufd.c b/backends/iommufd.c >> index 2b3d51af26d2..a94d3b90c05c 100644 >> --- a/backends/iommufd.c >> +++ b/backends/iommufd.c >> @@ -208,6 +208,36 @@ int >> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, >> return ret; >> } >> >> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t >> dev_id, >> + uint32_t pt_id, uint32_t flags, >> + uint32_t data_type, uint32_t data_len, >> + void *data_ptr, uint32_t *out_hwpt, >> + Error **errp) >> +{ >> + int ret, fd = be->fd; >> + struct iommu_hwpt_alloc alloc_hwpt = { >> + .size = sizeof(struct iommu_hwpt_alloc), >> + .flags = flags, >> + .dev_id = dev_id, >> + .pt_id = pt_id, >> + .data_type = data_type, >> + .data_len = data_len, >> + .data_uptr = (uintptr_t)data_ptr, >> + }; >> + >> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt); >> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type, >> + data_len, (uintptr_t)data_ptr, >> + alloc_hwpt.out_hwpt_id, ret); >> + if (ret) { >> + error_setg_errno(errp, errno, "Failed to allocate hwpt"); >> + return false; >> + } >> + >> + *out_hwpt = alloc_hwpt.out_hwpt_id; >> + return true; >> +} >> + >> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t >> devid, >> uint32_t *type, void *data, uint32_t len, >> uint64_t *caps, Error **errp) >> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c >> index 077dea8f1b64..545f4a404125 100644 >> --- a/hw/vfio/iommufd.c >> +++ b/hw/vfio/iommufd.c >> @@ -212,10 +212,88 @@ static bool >> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp) >> return true; >> } >> >> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev, >> + VFIOIOMMUFDContainer *container, >> + Error **errp) >> +{ >> + IOMMUFDBackend *iommufd = vbasedev->iommufd; >> + uint32_t flags = 0; >> + VFIOIOASHwpt *hwpt; >> + uint32_t hwpt_id; >> + int ret; >> + >> + /* Try to find a domain */ >> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) { >> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, >> errp); >> + if (ret) { >> + /* -EINVAL means the domain is incompatible with the device. */ >> + if (ret == -EINVAL) { >> + /* >> + * It is an expected failure and it just means we will try >> + * another domain, or create one if no existing compatible >> + * domain is found. Hence why the error is discarded below. >> + */ >> + error_free(*errp); >> + *errp = NULL; >> + continue; >> + } >> + >> + return false; >> + } else { >> + vbasedev->hwpt = hwpt; >> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next); >> + return true; >> + } >> + } >> + >> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid, >> + container->ioas_id, flags, >> + IOMMU_HWPT_DATA_NONE, 0, NULL, >> + &hwpt_id, errp)) { >> + return false; >> + } >> + >> + hwpt = g_malloc0(sizeof(*hwpt)); >> + hwpt->hwpt_id = hwpt_id; >> + QLIST_INIT(&hwpt->device_list); >> + >> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp); >> + if (ret) { >> + iommufd_backend_free_id(container->be, hwpt->hwpt_id); >> + g_free(hwpt); >> + return false; >> + } >> + >> + vbasedev->hwpt = hwpt; >> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next); >> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next); >> + return true; >> +} >> + >> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev, >> + VFIOIOMMUFDContainer *container) >> +{ >> + VFIOIOASHwpt *hwpt = vbasedev->hwpt; >> + >> + QLIST_REMOVE(vbasedev, hwpt_next); >> + vbasedev->hwpt = NULL; >> + >> + if (QLIST_EMPTY(&hwpt->device_list)) { >> + QLIST_REMOVE(hwpt, next); >> + iommufd_backend_free_id(container->be, hwpt->hwpt_id); >> + g_free(hwpt); >> + } >> +} > > Looks the detach flow is still missed? > I don't think so. The iommufd_backend_free_id() pairs with alloc_hwpt call and is there for when there's no device attached to the hwpt to actually free the hwpt. Besides setting to NULL the device hwpt, the detach flow was fixed below (...) >> + >> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev, >> VFIOIOMMUFDContainer *container, >> Error **errp) >> { >> + /* mdevs aren't physical devices and will fail with auto domains */ >> + if (!vbasedev->mdev) { >> + return iommufd_cdev_autodomains_get(vbasedev, container, errp); >> + } >> + >> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, >> errp); >> } >> >> @@ -227,6 +305,11 @@ static void >> iommufd_cdev_detach_container(VFIODevice *vbasedev, >> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) { > > Shouldn't we check mdev before calling this? > (...) here. Detach needs to be called for both, and keep in mind that this doesn't a pt_id, as the ioctl detaches from whatever domain or emulated idea of it (for mdev) that it has previously been called IOMMUFD_ATTACH with. We also call this with mdev we just don't call it with a hwpt_id but rather use autodomains (and it doesn't actually allocate a hw domain) >> error_report_err(err); >> } >> + >> + if (vbasedev->hwpt) { >> + iommufd_cdev_autodomains_put(vbasedev, container); >> + } >> + >> } >> >> static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer >> *container) >> @@ -354,6 +437,7 @@ static bool iommufd_cdev_attach(const char *name, >> VFIODevice *vbasedev, >> container = >> VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD)); >> container->be = vbasedev->iommufd; >> container->ioas_id = ioas_id; >> + QLIST_INIT(&container->hwpt_list); > > This can be in ::instance_init(). > But there's no instance_init() for TYPE_VFIO_IOMMU_IOMMUFD. This is where all IOMMUFD container stuff is taking place aiui. > Thanks > Zhenzhong > >> >> bcontainer = &container->bcontainer; >> vfio_address_space_insert(space, bcontainer); >> diff --git a/backends/trace-events b/backends/trace-events >> index 211e6f374adc..4d8ac02fe7d6 100644 >> --- a/backends/trace-events >> +++ b/backends/trace-events >> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t >> ioas, uint64_t iova, uint64_t size >> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, >> uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: >> iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)" >> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, >> uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" >> size=0x%"PRIx64" (%d)" >> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d >> ioas=%d" >> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t >> pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, >> uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u >> flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u >> (%d)" >> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d >> id=%d (%d)" >> -- >> 2.17.2 >
On 7/22/24 10:50, Joao Martins wrote: > On 22/07/2024 06:16, Duan, Zhenzhong wrote: >>> -----Original Message----- >>> From: Joao Martins <joao.m.martins@oracle.com> >>> Subject: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain creation >>> >>> There's generally two modes of operation for IOMMUFD: >>> >>> 1) The simple user API which intends to perform relatively simple things >>> with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches >>> to VFIO and mainly performs IOAS_MAP and UNMAP. >>> >>> 2) The native IOMMUFD API where you have fine grained control of the >>> IOMMU domain and model it accordingly. This is where most new feature >>> are being steered to. >>> >>> For dirty tracking 2) is required, as it needs to ensure that >>> the stage-2/parent IOMMU domain will only attach devices >>> that support dirty tracking (so far it is all homogeneous in x86, likely >>> not the case for smmuv3). Such invariant on dirty tracking provides a >>> useful guarantee to VMMs that will refuse incompatible device >>> attachments for IOMMU domains. >>> >>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is >>> responsible for creating an IOMMU domain. This is contrast to the >>> 'simple API' where the IOMMU domain is created by IOMMUFD >>> automatically >>> when it attaches to VFIO (usually referred as autodomains) but it has >>> the needed handling for mdevs. >>> >>> To support dirty tracking with the advanced IOMMUFD API, it needs >>> similar logic, where IOMMU domains are created and devices attached to >>> compatible domains. Essentially mimicking kernel >>> iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU >>> domain >>> it falls back to IOAS attach. >>> >>> The auto domain logic allows different IOMMU domains to be created when >>> DMA dirty tracking is not desired (and VF can provide it), and others where >>> it is. Here it is not used in this way given how VFIODevice migration >>> state is initialized after the device attachment. But such mixed mode of >>> IOMMU dirty tracking + device dirty tracking is an improvement that can >>> be added on. Keep the 'all of nothing' of type1 approach that we have >>> been using so far between container vs device dirty tracking. >>> >>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> >>> --- >>> include/hw/vfio/vfio-common.h | 9 ++++ >>> include/sysemu/iommufd.h | 5 +++ >>> backends/iommufd.c | 30 +++++++++++++ >>> hw/vfio/iommufd.c | 84 >>> +++++++++++++++++++++++++++++++++++ >>> backends/trace-events | 1 + >>> 5 files changed, 129 insertions(+) >>> >>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio- >>> common.h >>> index 98acae8c1c97..1a96678f8c38 100644 >>> --- a/include/hw/vfio/vfio-common.h >>> +++ b/include/hw/vfio/vfio-common.h >>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow { >>> >>> typedef struct IOMMUFDBackend IOMMUFDBackend; >>> >>> +typedef struct VFIOIOASHwpt { >>> + uint32_t hwpt_id; >>> + QLIST_HEAD(, VFIODevice) device_list; >>> + QLIST_ENTRY(VFIOIOASHwpt) next; >>> +} VFIOIOASHwpt; >>> + >>> typedef struct VFIOIOMMUFDContainer { >>> VFIOContainerBase bcontainer; >>> IOMMUFDBackend *be; >>> uint32_t ioas_id; >>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list; >>> } VFIOIOMMUFDContainer; >>> >>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, >>> VFIO_IOMMU_IOMMUFD); >>> @@ -135,6 +142,8 @@ typedef struct VFIODevice { >>> HostIOMMUDevice *hiod; >>> int devid; >>> IOMMUFDBackend *iommufd; >>> + VFIOIOASHwpt *hwpt; >>> + QLIST_ENTRY(VFIODevice) hwpt_next; >>> } VFIODevice; >>> >>> struct VFIODeviceOps { >>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h >>> index 57d502a1c79a..e917e7591d05 100644 >>> --- a/include/sysemu/iommufd.h >>> +++ b/include/sysemu/iommufd.h >>> @@ -50,6 +50,11 @@ int >>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, >>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t >>> devid, >>> uint32_t *type, void *data, uint32_t len, >>> uint64_t *caps, Error **errp); >>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t >>> dev_id, >>> + uint32_t pt_id, uint32_t flags, >>> + uint32_t data_type, uint32_t data_len, >>> + void *data_ptr, uint32_t *out_hwpt, >>> + Error **errp); >>> >>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD >>> TYPE_HOST_IOMMU_DEVICE "-iommufd" >>> #endif >>> diff --git a/backends/iommufd.c b/backends/iommufd.c >>> index 2b3d51af26d2..a94d3b90c05c 100644 >>> --- a/backends/iommufd.c >>> +++ b/backends/iommufd.c >>> @@ -208,6 +208,36 @@ int >>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, >>> return ret; >>> } >>> >>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t >>> dev_id, >>> + uint32_t pt_id, uint32_t flags, >>> + uint32_t data_type, uint32_t data_len, >>> + void *data_ptr, uint32_t *out_hwpt, >>> + Error **errp) >>> +{ >>> + int ret, fd = be->fd; >>> + struct iommu_hwpt_alloc alloc_hwpt = { >>> + .size = sizeof(struct iommu_hwpt_alloc), >>> + .flags = flags, >>> + .dev_id = dev_id, >>> + .pt_id = pt_id, >>> + .data_type = data_type, >>> + .data_len = data_len, >>> + .data_uptr = (uintptr_t)data_ptr, >>> + }; >>> + >>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt); >>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type, >>> + data_len, (uintptr_t)data_ptr, >>> + alloc_hwpt.out_hwpt_id, ret); >>> + if (ret) { >>> + error_setg_errno(errp, errno, "Failed to allocate hwpt"); >>> + return false; >>> + } >>> + >>> + *out_hwpt = alloc_hwpt.out_hwpt_id; >>> + return true; >>> +} >>> + >>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t >>> devid, >>> uint32_t *type, void *data, uint32_t len, >>> uint64_t *caps, Error **errp) >>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c >>> index 077dea8f1b64..545f4a404125 100644 >>> --- a/hw/vfio/iommufd.c >>> +++ b/hw/vfio/iommufd.c >>> @@ -212,10 +212,88 @@ static bool >>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp) >>> return true; >>> } >>> >>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev, >>> + VFIOIOMMUFDContainer *container, >>> + Error **errp) >>> +{ >>> + IOMMUFDBackend *iommufd = vbasedev->iommufd; >>> + uint32_t flags = 0; >>> + VFIOIOASHwpt *hwpt; >>> + uint32_t hwpt_id; >>> + int ret; >>> + >>> + /* Try to find a domain */ >>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) { >>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, >>> errp); >>> + if (ret) { >>> + /* -EINVAL means the domain is incompatible with the device. */ >>> + if (ret == -EINVAL) { >>> + /* >>> + * It is an expected failure and it just means we will try >>> + * another domain, or create one if no existing compatible >>> + * domain is found. Hence why the error is discarded below. >>> + */ >>> + error_free(*errp); >>> + *errp = NULL; >>> + continue; >>> + } >>> + >>> + return false; >>> + } else { >>> + vbasedev->hwpt = hwpt; >>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next); >>> + return true; >>> + } >>> + } >>> + >>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid, >>> + container->ioas_id, flags, >>> + IOMMU_HWPT_DATA_NONE, 0, NULL, >>> + &hwpt_id, errp)) { >>> + return false; >>> + } >>> + >>> + hwpt = g_malloc0(sizeof(*hwpt)); >>> + hwpt->hwpt_id = hwpt_id; >>> + QLIST_INIT(&hwpt->device_list); >>> + >>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp); >>> + if (ret) { >>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id); >>> + g_free(hwpt); >>> + return false; >>> + } >>> + >>> + vbasedev->hwpt = hwpt; >>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next); >>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next); >>> + return true; >>> +} >>> + >>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev, >>> + VFIOIOMMUFDContainer *container) >>> +{ >>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt; >>> + >>> + QLIST_REMOVE(vbasedev, hwpt_next); >>> + vbasedev->hwpt = NULL; >>> + >>> + if (QLIST_EMPTY(&hwpt->device_list)) { >>> + QLIST_REMOVE(hwpt, next); >>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id); >>> + g_free(hwpt); >>> + } >>> +} >> >> Looks the detach flow is still missed? >> > > > I don't think so. The iommufd_backend_free_id() pairs with alloc_hwpt call and > is there for when there's no device attached to the hwpt to actually free the > hwpt. Besides setting to NULL the device hwpt, the detach flow was fixed below (...) > >>> + >>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev, >>> VFIOIOMMUFDContainer *container, >>> Error **errp) >>> { >>> + /* mdevs aren't physical devices and will fail with auto domains */ >>> + if (!vbasedev->mdev) { >>> + return iommufd_cdev_autodomains_get(vbasedev, container, errp); >>> + } >>> + >>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, >>> errp); >>> } >>> >>> @@ -227,6 +305,11 @@ static void >>> iommufd_cdev_detach_container(VFIODevice *vbasedev, >>> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) { >> >> Shouldn't we check mdev before calling this? >> > (...) here. Detach needs to be called for both, and keep in mind that this > doesn't a pt_id, as the ioctl detaches from whatever domain or emulated idea of > it (for mdev) that it has previously been called IOMMUFD_ATTACH with. > > We also call this with mdev we just don't call it with a hwpt_id but rather use > autodomains (and it doesn't actually allocate a hw domain) > >>> error_report_err(err); >>> } >>> + >>> + if (vbasedev->hwpt) { >>> + iommufd_cdev_autodomains_put(vbasedev, container); >>> + } >>> + >>> } >>> >>> static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer >>> *container) >>> @@ -354,6 +437,7 @@ static bool iommufd_cdev_attach(const char *name, >>> VFIODevice *vbasedev, >>> container = >>> VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD)); >>> container->be = vbasedev->iommufd; >>> container->ioas_id = ioas_id; >>> + QLIST_INIT(&container->hwpt_list); >> >> This can be in ::instance_init(). >> > But there's no instance_init() for TYPE_VFIO_IOMMU_IOMMUFD. This is where all > IOMMUFD container stuff is taking place aiui. We can add an .instance_init() handler later on. It would be cleaner I agree but it shouldn't be a reason to block the series. Zhenzhong, Did Joao address your concerns ? Thanks, C. >> Thanks >> Zhenzhong >> >>> >>> bcontainer = &container->bcontainer; >>> vfio_address_space_insert(space, bcontainer); >>> diff --git a/backends/trace-events b/backends/trace-events >>> index 211e6f374adc..4d8ac02fe7d6 100644 >>> --- a/backends/trace-events >>> +++ b/backends/trace-events >>> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t >>> ioas, uint64_t iova, uint64_t size >>> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, >>> uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: >>> iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)" >>> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, >>> uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" >>> size=0x%"PRIx64" (%d)" >>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d >>> ioas=%d" >>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t >>> pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, >>> uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u >>> flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u >>> (%d)" >>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d >>> id=%d (%d)" >>> -- >>> 2.17.2 >> >
>-----Original Message----- >From: Cédric Le Goater <clg@redhat.com> >Subject: Re: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain >creation > >On 7/22/24 10:50, Joao Martins wrote: >> On 22/07/2024 06:16, Duan, Zhenzhong wrote: >>>> -----Original Message----- >>>> From: Joao Martins <joao.m.martins@oracle.com> >>>> Subject: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain >creation >>>> >>>> There's generally two modes of operation for IOMMUFD: >>>> >>>> 1) The simple user API which intends to perform relatively simple things >>>> with IOMMUs e.g. DPDK. The process generally creates an IOAS and >attaches >>>> to VFIO and mainly performs IOAS_MAP and UNMAP. >>>> >>>> 2) The native IOMMUFD API where you have fine grained control of the >>>> IOMMU domain and model it accordingly. This is where most new >feature >>>> are being steered to. >>>> >>>> For dirty tracking 2) is required, as it needs to ensure that >>>> the stage-2/parent IOMMU domain will only attach devices >>>> that support dirty tracking (so far it is all homogeneous in x86, likely >>>> not the case for smmuv3). Such invariant on dirty tracking provides a >>>> useful guarantee to VMMs that will refuse incompatible device >>>> attachments for IOMMU domains. >>>> >>>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is >>>> responsible for creating an IOMMU domain. This is contrast to the >>>> 'simple API' where the IOMMU domain is created by IOMMUFD >>>> automatically >>>> when it attaches to VFIO (usually referred as autodomains) but it has >>>> the needed handling for mdevs. >>>> >>>> To support dirty tracking with the advanced IOMMUFD API, it needs >>>> similar logic, where IOMMU domains are created and devices attached >to >>>> compatible domains. Essentially mimicking kernel >>>> iommufd_device_auto_get_domain(). With mdevs given there's no >IOMMU >>>> domain >>>> it falls back to IOAS attach. >>>> >>>> The auto domain logic allows different IOMMU domains to be created >when >>>> DMA dirty tracking is not desired (and VF can provide it), and others >where >>>> it is. Here it is not used in this way given how VFIODevice migration >>>> state is initialized after the device attachment. But such mixed mode of >>>> IOMMU dirty tracking + device dirty tracking is an improvement that >can >>>> be added on. Keep the 'all of nothing' of type1 approach that we have >>>> been using so far between container vs device dirty tracking. >>>> >>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> >>>> --- >>>> include/hw/vfio/vfio-common.h | 9 ++++ >>>> include/sysemu/iommufd.h | 5 +++ >>>> backends/iommufd.c | 30 +++++++++++++ >>>> hw/vfio/iommufd.c | 84 >>>> +++++++++++++++++++++++++++++++++++ >>>> backends/trace-events | 1 + >>>> 5 files changed, 129 insertions(+) >>>> >>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio- >>>> common.h >>>> index 98acae8c1c97..1a96678f8c38 100644 >>>> --- a/include/hw/vfio/vfio-common.h >>>> +++ b/include/hw/vfio/vfio-common.h >>>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow { >>>> >>>> typedef struct IOMMUFDBackend IOMMUFDBackend; >>>> >>>> +typedef struct VFIOIOASHwpt { >>>> + uint32_t hwpt_id; >>>> + QLIST_HEAD(, VFIODevice) device_list; >>>> + QLIST_ENTRY(VFIOIOASHwpt) next; >>>> +} VFIOIOASHwpt; >>>> + >>>> typedef struct VFIOIOMMUFDContainer { >>>> VFIOContainerBase bcontainer; >>>> IOMMUFDBackend *be; >>>> uint32_t ioas_id; >>>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list; >>>> } VFIOIOMMUFDContainer; >>>> >>>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, >>>> VFIO_IOMMU_IOMMUFD); >>>> @@ -135,6 +142,8 @@ typedef struct VFIODevice { >>>> HostIOMMUDevice *hiod; >>>> int devid; >>>> IOMMUFDBackend *iommufd; >>>> + VFIOIOASHwpt *hwpt; >>>> + QLIST_ENTRY(VFIODevice) hwpt_next; >>>> } VFIODevice; >>>> >>>> struct VFIODeviceOps { >>>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h >>>> index 57d502a1c79a..e917e7591d05 100644 >>>> --- a/include/sysemu/iommufd.h >>>> +++ b/include/sysemu/iommufd.h >>>> @@ -50,6 +50,11 @@ int >>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t >ioas_id, >>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, >uint32_t >>>> devid, >>>> uint32_t *type, void *data, uint32_t len, >>>> uint64_t *caps, Error **errp); >>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t >>>> dev_id, >>>> + uint32_t pt_id, uint32_t flags, >>>> + uint32_t data_type, uint32_t data_len, >>>> + void *data_ptr, uint32_t *out_hwpt, >>>> + Error **errp); >>>> >>>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD >>>> TYPE_HOST_IOMMU_DEVICE "-iommufd" >>>> #endif >>>> diff --git a/backends/iommufd.c b/backends/iommufd.c >>>> index 2b3d51af26d2..a94d3b90c05c 100644 >>>> --- a/backends/iommufd.c >>>> +++ b/backends/iommufd.c >>>> @@ -208,6 +208,36 @@ int >>>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t >ioas_id, >>>> return ret; >>>> } >>>> >>>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t >>>> dev_id, >>>> + uint32_t pt_id, uint32_t flags, >>>> + uint32_t data_type, uint32_t data_len, >>>> + void *data_ptr, uint32_t *out_hwpt, >>>> + Error **errp) >>>> +{ >>>> + int ret, fd = be->fd; >>>> + struct iommu_hwpt_alloc alloc_hwpt = { >>>> + .size = sizeof(struct iommu_hwpt_alloc), >>>> + .flags = flags, >>>> + .dev_id = dev_id, >>>> + .pt_id = pt_id, >>>> + .data_type = data_type, >>>> + .data_len = data_len, >>>> + .data_uptr = (uintptr_t)data_ptr, >>>> + }; >>>> + >>>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt); >>>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, >data_type, >>>> + data_len, (uintptr_t)data_ptr, >>>> + alloc_hwpt.out_hwpt_id, ret); >>>> + if (ret) { >>>> + error_setg_errno(errp, errno, "Failed to allocate hwpt"); >>>> + return false; >>>> + } >>>> + >>>> + *out_hwpt = alloc_hwpt.out_hwpt_id; >>>> + return true; >>>> +} >>>> + >>>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, >uint32_t >>>> devid, >>>> uint32_t *type, void *data, uint32_t len, >>>> uint64_t *caps, Error **errp) >>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c >>>> index 077dea8f1b64..545f4a404125 100644 >>>> --- a/hw/vfio/iommufd.c >>>> +++ b/hw/vfio/iommufd.c >>>> @@ -212,10 +212,88 @@ static bool >>>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp) >>>> return true; >>>> } >>>> >>>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev, >>>> + VFIOIOMMUFDContainer *container, >>>> + Error **errp) >>>> +{ >>>> + IOMMUFDBackend *iommufd = vbasedev->iommufd; >>>> + uint32_t flags = 0; >>>> + VFIOIOASHwpt *hwpt; >>>> + uint32_t hwpt_id; >>>> + int ret; >>>> + >>>> + /* Try to find a domain */ >>>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) { >>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, >>>> errp); >>>> + if (ret) { >>>> + /* -EINVAL means the domain is incompatible with the device. >*/ >>>> + if (ret == -EINVAL) { >>>> + /* >>>> + * It is an expected failure and it just means we will try >>>> + * another domain, or create one if no existing compatible >>>> + * domain is found. Hence why the error is discarded below. >>>> + */ >>>> + error_free(*errp); >>>> + *errp = NULL; >>>> + continue; >>>> + } >>>> + >>>> + return false; >>>> + } else { >>>> + vbasedev->hwpt = hwpt; >>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next); >>>> + return true; >>>> + } >>>> + } >>>> + >>>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid, >>>> + container->ioas_id, flags, >>>> + IOMMU_HWPT_DATA_NONE, 0, NULL, >>>> + &hwpt_id, errp)) { >>>> + return false; >>>> + } >>>> + >>>> + hwpt = g_malloc0(sizeof(*hwpt)); >>>> + hwpt->hwpt_id = hwpt_id; >>>> + QLIST_INIT(&hwpt->device_list); >>>> + >>>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, >errp); >>>> + if (ret) { >>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id); >>>> + g_free(hwpt); >>>> + return false; >>>> + } >>>> + >>>> + vbasedev->hwpt = hwpt; >>>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next); >>>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next); >>>> + return true; >>>> +} >>>> + >>>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev, >>>> + VFIOIOMMUFDContainer *container) >>>> +{ >>>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt; >>>> + >>>> + QLIST_REMOVE(vbasedev, hwpt_next); >>>> + vbasedev->hwpt = NULL; >>>> + >>>> + if (QLIST_EMPTY(&hwpt->device_list)) { >>>> + QLIST_REMOVE(hwpt, next); >>>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id); >>>> + g_free(hwpt); >>>> + } >>>> +} >>> >>> Looks the detach flow is still missed? >>> >> >> >> I don't think so. The iommufd_backend_free_id() pairs with alloc_hwpt call >and >> is there for when there's no device attached to the hwpt to actually free >the >> hwpt. Besides setting to NULL the device hwpt, the detach flow was fixed >below (...) >> >>>> + >>>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev, >>>> VFIOIOMMUFDContainer *container, >>>> Error **errp) >>>> { >>>> + /* mdevs aren't physical devices and will fail with auto domains */ >>>> + if (!vbasedev->mdev) { >>>> + return iommufd_cdev_autodomains_get(vbasedev, container, >errp); >>>> + } >>>> + >>>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container- >>ioas_id, >>>> errp); >>>> } >>>> >>>> @@ -227,6 +305,11 @@ static void >>>> iommufd_cdev_detach_container(VFIODevice *vbasedev, >>>> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) { >>> >>> Shouldn't we check mdev before calling this? >>> >> (...) here. Detach needs to be called for both, and keep in mind that this >> doesn't a pt_id, as the ioctl detaches from whatever domain or emulated >idea of >> it (for mdev) that it has previously been called IOMMUFD_ATTACH with. >> >> We also call this with mdev we just don't call it with a hwpt_id but rather >use >> autodomains (and it doesn't actually allocate a hw domain) >> >>>> error_report_err(err); >>>> } >>>> + >>>> + if (vbasedev->hwpt) { >>>> + iommufd_cdev_autodomains_put(vbasedev, container); >>>> + } >>>> + >>>> } >>>> >>>> static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer >>>> *container) >>>> @@ -354,6 +437,7 @@ static bool iommufd_cdev_attach(const char >*name, >>>> VFIODevice *vbasedev, >>>> container = >>>> VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD)); >>>> container->be = vbasedev->iommufd; >>>> container->ioas_id = ioas_id; >>>> + QLIST_INIT(&container->hwpt_list); >>> >>> This can be in ::instance_init(). >>> >> But there's no instance_init() for TYPE_VFIO_IOMMU_IOMMUFD. This is >where all >> IOMMUFD container stuff is taking place aiui. > >We can add an .instance_init() handler later on. It would be cleaner I agree >but it shouldn't be a reason to block the series. Yes, it's minor. > >Zhenzhong, > >Did Joao address your concerns ? Sure. Thanks Zhenzhong > >Thanks, > >C. > > > > >>> Thanks >>> Zhenzhong >>> >>>> >>>> bcontainer = &container->bcontainer; >>>> vfio_address_space_insert(space, bcontainer); >>>> diff --git a/backends/trace-events b/backends/trace-events >>>> index 211e6f374adc..4d8ac02fe7d6 100644 >>>> --- a/backends/trace-events >>>> +++ b/backends/trace-events >>>> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, >uint32_t >>>> ioas, uint64_t iova, uint64_t size >>>> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, >>>> uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: >>>> iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)" >>>> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t >iova, >>>> uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" >>>> size=0x%"PRIx64" (%d)" >>>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " >iommufd=%d >>>> ioas=%d" >>>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, >uint32_t >>>> pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t >data_ptr, >>>> uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u >>>> flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" >out_hwpt=%u >>>> (%d)" >>>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " >iommufd=%d >>>> id=%d (%d)" >>>> -- >>>> 2.17.2 >>> >>
>-----Original Message----- >From: Joao Martins <joao.m.martins@oracle.com> >Subject: Re: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain >creation > >On 22/07/2024 06:16, Duan, Zhenzhong wrote: >>> -----Original Message----- >>> From: Joao Martins <joao.m.martins@oracle.com> >>> Subject: [PATCH v5 05/13] vfio/iommufd: Introduce auto domain >creation >>> >>> There's generally two modes of operation for IOMMUFD: >>> >>> 1) The simple user API which intends to perform relatively simple things >>> with IOMMUs e.g. DPDK. The process generally creates an IOAS and >attaches >>> to VFIO and mainly performs IOAS_MAP and UNMAP. >>> >>> 2) The native IOMMUFD API where you have fine grained control of the >>> IOMMU domain and model it accordingly. This is where most new feature >>> are being steered to. >>> >>> For dirty tracking 2) is required, as it needs to ensure that >>> the stage-2/parent IOMMU domain will only attach devices >>> that support dirty tracking (so far it is all homogeneous in x86, likely >>> not the case for smmuv3). Such invariant on dirty tracking provides a >>> useful guarantee to VMMs that will refuse incompatible device >>> attachments for IOMMU domains. >>> >>> Dirty tracking insurance is enforced via HWPT_ALLOC, which is >>> responsible for creating an IOMMU domain. This is contrast to the >>> 'simple API' where the IOMMU domain is created by IOMMUFD >>> automatically >>> when it attaches to VFIO (usually referred as autodomains) but it has >>> the needed handling for mdevs. >>> >>> To support dirty tracking with the advanced IOMMUFD API, it needs >>> similar logic, where IOMMU domains are created and devices attached to >>> compatible domains. Essentially mimicking kernel >>> iommufd_device_auto_get_domain(). With mdevs given there's no >IOMMU >>> domain >>> it falls back to IOAS attach. >>> >>> The auto domain logic allows different IOMMU domains to be created >when >>> DMA dirty tracking is not desired (and VF can provide it), and others >where >>> it is. Here it is not used in this way given how VFIODevice migration >>> state is initialized after the device attachment. But such mixed mode of >>> IOMMU dirty tracking + device dirty tracking is an improvement that can >>> be added on. Keep the 'all of nothing' of type1 approach that we have >>> been using so far between container vs device dirty tracking. >>> >>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> >>> --- >>> include/hw/vfio/vfio-common.h | 9 ++++ >>> include/sysemu/iommufd.h | 5 +++ >>> backends/iommufd.c | 30 +++++++++++++ >>> hw/vfio/iommufd.c | 84 >>> +++++++++++++++++++++++++++++++++++ >>> backends/trace-events | 1 + >>> 5 files changed, 129 insertions(+) >>> >>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio- >>> common.h >>> index 98acae8c1c97..1a96678f8c38 100644 >>> --- a/include/hw/vfio/vfio-common.h >>> +++ b/include/hw/vfio/vfio-common.h >>> @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow { >>> >>> typedef struct IOMMUFDBackend IOMMUFDBackend; >>> >>> +typedef struct VFIOIOASHwpt { >>> + uint32_t hwpt_id; >>> + QLIST_HEAD(, VFIODevice) device_list; >>> + QLIST_ENTRY(VFIOIOASHwpt) next; >>> +} VFIOIOASHwpt; >>> + >>> typedef struct VFIOIOMMUFDContainer { >>> VFIOContainerBase bcontainer; >>> IOMMUFDBackend *be; >>> uint32_t ioas_id; >>> + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list; >>> } VFIOIOMMUFDContainer; >>> >>> OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, >>> VFIO_IOMMU_IOMMUFD); >>> @@ -135,6 +142,8 @@ typedef struct VFIODevice { >>> HostIOMMUDevice *hiod; >>> int devid; >>> IOMMUFDBackend *iommufd; >>> + VFIOIOASHwpt *hwpt; >>> + QLIST_ENTRY(VFIODevice) hwpt_next; >>> } VFIODevice; >>> >>> struct VFIODeviceOps { >>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h >>> index 57d502a1c79a..e917e7591d05 100644 >>> --- a/include/sysemu/iommufd.h >>> +++ b/include/sysemu/iommufd.h >>> @@ -50,6 +50,11 @@ int >>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t >ioas_id, >>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, >uint32_t >>> devid, >>> uint32_t *type, void *data, uint32_t len, >>> uint64_t *caps, Error **errp); >>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t >>> dev_id, >>> + uint32_t pt_id, uint32_t flags, >>> + uint32_t data_type, uint32_t data_len, >>> + void *data_ptr, uint32_t *out_hwpt, >>> + Error **errp); >>> >>> #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD >>> TYPE_HOST_IOMMU_DEVICE "-iommufd" >>> #endif >>> diff --git a/backends/iommufd.c b/backends/iommufd.c >>> index 2b3d51af26d2..a94d3b90c05c 100644 >>> --- a/backends/iommufd.c >>> +++ b/backends/iommufd.c >>> @@ -208,6 +208,36 @@ int >>> iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t >ioas_id, >>> return ret; >>> } >>> >>> +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t >>> dev_id, >>> + uint32_t pt_id, uint32_t flags, >>> + uint32_t data_type, uint32_t data_len, >>> + void *data_ptr, uint32_t *out_hwpt, >>> + Error **errp) >>> +{ >>> + int ret, fd = be->fd; >>> + struct iommu_hwpt_alloc alloc_hwpt = { >>> + .size = sizeof(struct iommu_hwpt_alloc), >>> + .flags = flags, >>> + .dev_id = dev_id, >>> + .pt_id = pt_id, >>> + .data_type = data_type, >>> + .data_len = data_len, >>> + .data_uptr = (uintptr_t)data_ptr, >>> + }; >>> + >>> + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt); >>> + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, >data_type, >>> + data_len, (uintptr_t)data_ptr, >>> + alloc_hwpt.out_hwpt_id, ret); >>> + if (ret) { >>> + error_setg_errno(errp, errno, "Failed to allocate hwpt"); >>> + return false; >>> + } >>> + >>> + *out_hwpt = alloc_hwpt.out_hwpt_id; >>> + return true; >>> +} >>> + >>> bool iommufd_backend_get_device_info(IOMMUFDBackend *be, >uint32_t >>> devid, >>> uint32_t *type, void *data, uint32_t len, >>> uint64_t *caps, Error **errp) >>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c >>> index 077dea8f1b64..545f4a404125 100644 >>> --- a/hw/vfio/iommufd.c >>> +++ b/hw/vfio/iommufd.c >>> @@ -212,10 +212,88 @@ static bool >>> iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp) >>> return true; >>> } >>> >>> +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev, >>> + VFIOIOMMUFDContainer *container, >>> + Error **errp) >>> +{ >>> + IOMMUFDBackend *iommufd = vbasedev->iommufd; >>> + uint32_t flags = 0; >>> + VFIOIOASHwpt *hwpt; >>> + uint32_t hwpt_id; >>> + int ret; >>> + >>> + /* Try to find a domain */ >>> + QLIST_FOREACH(hwpt, &container->hwpt_list, next) { >>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, >>> errp); >>> + if (ret) { >>> + /* -EINVAL means the domain is incompatible with the device. */ >>> + if (ret == -EINVAL) { >>> + /* >>> + * It is an expected failure and it just means we will try >>> + * another domain, or create one if no existing compatible >>> + * domain is found. Hence why the error is discarded below. >>> + */ >>> + error_free(*errp); >>> + *errp = NULL; >>> + continue; >>> + } >>> + >>> + return false; >>> + } else { >>> + vbasedev->hwpt = hwpt; >>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next); >>> + return true; >>> + } >>> + } >>> + >>> + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid, >>> + container->ioas_id, flags, >>> + IOMMU_HWPT_DATA_NONE, 0, NULL, >>> + &hwpt_id, errp)) { >>> + return false; >>> + } >>> + >>> + hwpt = g_malloc0(sizeof(*hwpt)); >>> + hwpt->hwpt_id = hwpt_id; >>> + QLIST_INIT(&hwpt->device_list); >>> + >>> + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, >errp); >>> + if (ret) { >>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id); >>> + g_free(hwpt); >>> + return false; >>> + } >>> + >>> + vbasedev->hwpt = hwpt; >>> + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next); >>> + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next); >>> + return true; >>> +} >>> + >>> +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev, >>> + VFIOIOMMUFDContainer *container) >>> +{ >>> + VFIOIOASHwpt *hwpt = vbasedev->hwpt; >>> + >>> + QLIST_REMOVE(vbasedev, hwpt_next); >>> + vbasedev->hwpt = NULL; >>> + >>> + if (QLIST_EMPTY(&hwpt->device_list)) { >>> + QLIST_REMOVE(hwpt, next); >>> + iommufd_backend_free_id(container->be, hwpt->hwpt_id); >>> + g_free(hwpt); >>> + } >>> +} >> >> Looks the detach flow is still missed? >> > > >I don't think so. The iommufd_backend_free_id() pairs with alloc_hwpt call >and >is there for when there's no device attached to the hwpt to actually free the >hwpt. Besides setting to NULL the device hwpt, the detach flow was fixed >below (...) > >>> + >>> static bool iommufd_cdev_attach_container(VFIODevice *vbasedev, >>> VFIOIOMMUFDContainer *container, >>> Error **errp) >>> { >>> + /* mdevs aren't physical devices and will fail with auto domains */ >>> + if (!vbasedev->mdev) { >>> + return iommufd_cdev_autodomains_get(vbasedev, container, errp); >>> + } >>> + >>> return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container- >>ioas_id, >>> errp); >>> } >>> >>> @@ -227,6 +305,11 @@ static void >>> iommufd_cdev_detach_container(VFIODevice *vbasedev, >>> if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) { >> >> Shouldn't we check mdev before calling this? >> >(...) here. Detach needs to be called for both, and keep in mind that this >doesn't a pt_id, as the ioctl detaches from whatever domain or emulated >idea of >it (for mdev) that it has previously been called IOMMUFD_ATTACH with. > >We also call this with mdev we just don't call it with a hwpt_id but rather >use >autodomains (and it doesn't actually allocate a hw domain) Yeah, you are right, no problem here. > >>> error_report_err(err); >>> } >>> + >>> + if (vbasedev->hwpt) { >>> + iommufd_cdev_autodomains_put(vbasedev, container); >>> + } >>> + >>> } >>> >>> static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer >>> *container) >>> @@ -354,6 +437,7 @@ static bool iommufd_cdev_attach(const char >*name, >>> VFIODevice *vbasedev, >>> container = >>> VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD)); >>> container->be = vbasedev->iommufd; >>> container->ioas_id = ioas_id; >>> + QLIST_INIT(&container->hwpt_list); >> >> This can be in ::instance_init(). >> >But there's no instance_init() for TYPE_VFIO_IOMMU_IOMMUFD. This is >where all >IOMMUFD container stuff is taking place aiui. OK. Thanks Zhenzhong > >> Thanks >> Zhenzhong >> >>> >>> bcontainer = &container->bcontainer; >>> vfio_address_space_insert(space, bcontainer); >>> diff --git a/backends/trace-events b/backends/trace-events >>> index 211e6f374adc..4d8ac02fe7d6 100644 >>> --- a/backends/trace-events >>> +++ b/backends/trace-events >>> @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, >uint32_t >>> ioas, uint64_t iova, uint64_t size >>> iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, >>> uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: >>> iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)" >>> iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t >iova, >>> uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" >>> size=0x%"PRIx64" (%d)" >>> iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " >iommufd=%d >>> ioas=%d" >>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t >>> pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t >data_ptr, >>> uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u >>> flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u >>> (%d)" >>> iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " >iommufd=%d >>> id=%d (%d)" >>> -- >>> 2.17.2 >>
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 98acae8c1c97..1a96678f8c38 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow { typedef struct IOMMUFDBackend IOMMUFDBackend; +typedef struct VFIOIOASHwpt { + uint32_t hwpt_id; + QLIST_HEAD(, VFIODevice) device_list; + QLIST_ENTRY(VFIOIOASHwpt) next; +} VFIOIOASHwpt; + typedef struct VFIOIOMMUFDContainer { VFIOContainerBase bcontainer; IOMMUFDBackend *be; uint32_t ioas_id; + QLIST_HEAD(, VFIOIOASHwpt) hwpt_list; } VFIOIOMMUFDContainer; OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD); @@ -135,6 +142,8 @@ typedef struct VFIODevice { HostIOMMUDevice *hiod; int devid; IOMMUFDBackend *iommufd; + VFIOIOASHwpt *hwpt; + QLIST_ENTRY(VFIODevice) hwpt_next; } VFIODevice; struct VFIODeviceOps { diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h index 57d502a1c79a..e917e7591d05 100644 --- a/include/sysemu/iommufd.h +++ b/include/sysemu/iommufd.h @@ -50,6 +50,11 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid, uint32_t *type, void *data, uint32_t len, uint64_t *caps, Error **errp); +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id, + uint32_t pt_id, uint32_t flags, + uint32_t data_type, uint32_t data_len, + void *data_ptr, uint32_t *out_hwpt, + Error **errp); #define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd" #endif diff --git a/backends/iommufd.c b/backends/iommufd.c index 2b3d51af26d2..a94d3b90c05c 100644 --- a/backends/iommufd.c +++ b/backends/iommufd.c @@ -208,6 +208,36 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, return ret; } +bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id, + uint32_t pt_id, uint32_t flags, + uint32_t data_type, uint32_t data_len, + void *data_ptr, uint32_t *out_hwpt, + Error **errp) +{ + int ret, fd = be->fd; + struct iommu_hwpt_alloc alloc_hwpt = { + .size = sizeof(struct iommu_hwpt_alloc), + .flags = flags, + .dev_id = dev_id, + .pt_id = pt_id, + .data_type = data_type, + .data_len = data_len, + .data_uptr = (uintptr_t)data_ptr, + }; + + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt); + trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type, + data_len, (uintptr_t)data_ptr, + alloc_hwpt.out_hwpt_id, ret); + if (ret) { + error_setg_errno(errp, errno, "Failed to allocate hwpt"); + return false; + } + + *out_hwpt = alloc_hwpt.out_hwpt_id; + return true; +} + bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid, uint32_t *type, void *data, uint32_t len, uint64_t *caps, Error **errp) diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c index 077dea8f1b64..545f4a404125 100644 --- a/hw/vfio/iommufd.c +++ b/hw/vfio/iommufd.c @@ -212,10 +212,88 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp) return true; } +static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev, + VFIOIOMMUFDContainer *container, + Error **errp) +{ + IOMMUFDBackend *iommufd = vbasedev->iommufd; + uint32_t flags = 0; + VFIOIOASHwpt *hwpt; + uint32_t hwpt_id; + int ret; + + /* Try to find a domain */ + QLIST_FOREACH(hwpt, &container->hwpt_list, next) { + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp); + if (ret) { + /* -EINVAL means the domain is incompatible with the device. */ + if (ret == -EINVAL) { + /* + * It is an expected failure and it just means we will try + * another domain, or create one if no existing compatible + * domain is found. Hence why the error is discarded below. + */ + error_free(*errp); + *errp = NULL; + continue; + } + + return false; + } else { + vbasedev->hwpt = hwpt; + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next); + return true; + } + } + + if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid, + container->ioas_id, flags, + IOMMU_HWPT_DATA_NONE, 0, NULL, + &hwpt_id, errp)) { + return false; + } + + hwpt = g_malloc0(sizeof(*hwpt)); + hwpt->hwpt_id = hwpt_id; + QLIST_INIT(&hwpt->device_list); + + ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp); + if (ret) { + iommufd_backend_free_id(container->be, hwpt->hwpt_id); + g_free(hwpt); + return false; + } + + vbasedev->hwpt = hwpt; + QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next); + QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next); + return true; +} + +static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev, + VFIOIOMMUFDContainer *container) +{ + VFIOIOASHwpt *hwpt = vbasedev->hwpt; + + QLIST_REMOVE(vbasedev, hwpt_next); + vbasedev->hwpt = NULL; + + if (QLIST_EMPTY(&hwpt->device_list)) { + QLIST_REMOVE(hwpt, next); + iommufd_backend_free_id(container->be, hwpt->hwpt_id); + g_free(hwpt); + } +} + static bool iommufd_cdev_attach_container(VFIODevice *vbasedev, VFIOIOMMUFDContainer *container, Error **errp) { + /* mdevs aren't physical devices and will fail with auto domains */ + if (!vbasedev->mdev) { + return iommufd_cdev_autodomains_get(vbasedev, container, errp); + } + return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp); } @@ -227,6 +305,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev, if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) { error_report_err(err); } + + if (vbasedev->hwpt) { + iommufd_cdev_autodomains_put(vbasedev, container); + } + } static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer *container) @@ -354,6 +437,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev, container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD)); container->be = vbasedev->iommufd; container->ioas_id = ioas_id; + QLIST_INIT(&container->hwpt_list); bcontainer = &container->bcontainer; vfio_address_space_insert(space, bcontainer); diff --git a/backends/trace-events b/backends/trace-events index 211e6f374adc..4d8ac02fe7d6 100644 --- a/backends/trace-events +++ b/backends/trace-events @@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)" iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)" iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d" +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)" iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
There's generally two modes of operation for IOMMUFD: 1) The simple user API which intends to perform relatively simple things with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches to VFIO and mainly performs IOAS_MAP and UNMAP. 2) The native IOMMUFD API where you have fine grained control of the IOMMU domain and model it accordingly. This is where most new feature are being steered to. For dirty tracking 2) is required, as it needs to ensure that the stage-2/parent IOMMU domain will only attach devices that support dirty tracking (so far it is all homogeneous in x86, likely not the case for smmuv3). Such invariant on dirty tracking provides a useful guarantee to VMMs that will refuse incompatible device attachments for IOMMU domains. Dirty tracking insurance is enforced via HWPT_ALLOC, which is responsible for creating an IOMMU domain. This is contrast to the 'simple API' where the IOMMU domain is created by IOMMUFD automatically when it attaches to VFIO (usually referred as autodomains) but it has the needed handling for mdevs. To support dirty tracking with the advanced IOMMUFD API, it needs similar logic, where IOMMU domains are created and devices attached to compatible domains. Essentially mimicking kernel iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain it falls back to IOAS attach. The auto domain logic allows different IOMMU domains to be created when DMA dirty tracking is not desired (and VF can provide it), and others where it is. Here it is not used in this way given how VFIODevice migration state is initialized after the device attachment. But such mixed mode of IOMMU dirty tracking + device dirty tracking is an improvement that can be added on. Keep the 'all of nothing' of type1 approach that we have been using so far between container vs device dirty tracking. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> --- include/hw/vfio/vfio-common.h | 9 ++++ include/sysemu/iommufd.h | 5 +++ backends/iommufd.c | 30 +++++++++++++ hw/vfio/iommufd.c | 84 +++++++++++++++++++++++++++++++++++ backends/trace-events | 1 + 5 files changed, 129 insertions(+)