Message ID: 20231024135109.73787-1-joao.m.martins@oracle.com
Series:     IOMMUFD Dirty Tracking
On Tue, Oct 24, 2023 at 02:50:51PM +0100, Joao Martins wrote:
> v6 is a replacement of what's in iommufd next:
> https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next
>
> Joao Martins (18):
>   vfio/iova_bitmap: Export more API symbols
>   vfio: Move iova_bitmap into iommufd
>   iommufd/iova_bitmap: Move symbols to IOMMUFD namespace
>   iommu: Add iommu_domain ops for dirty tracking
>   iommufd: Add a flag to enforce dirty tracking on attach
>   iommufd: Add IOMMU_HWPT_SET_DIRTY_TRACKING
>   iommufd: Add IOMMU_HWPT_GET_DIRTY_BITMAP
>   iommufd: Add capabilities to IOMMU_GET_HW_INFO
>   iommufd: Add a flag to skip clearing of IOPTE dirty
>   iommu/amd: Add domain_alloc_user based domain allocation
>   iommu/amd: Access/Dirty bit support in IOPTEs
>   iommu/vt-d: Access/Dirty bit support for SS domains
>   iommufd/selftest: Expand mock_domain with dev_flags
>   iommufd/selftest: Test IOMMU_HWPT_ALLOC_DIRTY_TRACKING
>   iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY_TRACKING
>   iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP
>   iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO
>   iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag

Ok, I refreshed the series, thanks!

Jason
On Tue, Oct 24, 2023 at 12:55:12PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 24, 2023 at 02:50:51PM +0100, Joao Martins wrote:
> > v6 is a replacement of what's in iommufd next:
> > https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next
> >
[...]
>
> Ok, I refreshed the series, thanks!

Selftest is passing with this version.

Cheers
Nicolin
Hi, Joao

On Tue, 24 Oct 2023 at 21:51, Joao Martins <joao.m.martins@oracle.com> wrote:
>
> v6 is a replacement of what's in iommufd next:
> https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next
>
> base-commit: b5f9e63278d6f32789478acf1ed41d21d92b36cf
> (from the iommufd tree)
>
> =========>8=========
>
> Presented herewith is a series that extends IOMMUFD with IOMMU hardware
> support for the dirty bit in the IOPTEs.
>
> Today, AMD Milan (or more recent) supports it, while ARM SMMUv3.2 and
> VT-d rev3.x also support it. One intended (but not exclusive) use case
> is to support live migration with SR-IOV, especially useful for
> live-migratable PCI devices that cannot supply their own dirty-tracking
> hardware blocks, among others.
>
> At a quick glance, IOMMUFD lets userspace create an IOAS with a set of
> IOVA ranges mapped to some physical memory, composing an IO pagetable.
> This is then created via HWPT_ALLOC, or attached to a particular
> device/hwpt, consequently creating the IOMMU domain and sharing a
> common IO page table representing the endpoint's DMA-addressable guest
> address space. IOMMUFD dirty tracking (since v2 of the series) is
> supported via the HWPT_ALLOC model only, as opposed to the simpler
> autodomains model.
>
> The result is an hw_pagetable which represents the iommu_domain that
> will be directly manipulated. The IOMMUFD UAPI and the iommu/iommufd
> kAPI are then extended to provide:
>
> 1) Enforcement that only devices with dirty tracking support are
> attached to an IOMMU domain, to cover the case where support isn't
> homogeneous across the platform. Initially this is aimed at the
> possibly heterogeneous nature of ARM, while x86 gets future-proofed,
> should any such occasion occur.
>
> The device dirty tracking enforcement on attach_dev is made whether or
> not the dirty_ops are set. Given that attach always checks for dirty
> ops and IOMMU_CAP_DIRTY, while writing this I was tempted to move the
> check to the upper layer, but semantically the iommu driver should do
> the checking.
>
> 2) Toggling of dirty tracking on the iommu_domain. We model the most
> common case of changing hardware translation control structures
> dynamically (x86), while making it easy to have an always-enabled mode.
> In RFCv1, the ARM-specific case was suggested to be always enabled
> instead of having to enable the per-PTE DBM control bit (what I
> previously called "range tracking"). Here, setting/clearing tracking
> just clears the dirty bits at start. The 'real' tracking of whether
> dirty tracking is enabled is stored in the IOMMU driver, hence no new
> fields are added to iommufd pagetable structures, except for a
> dirty_ops field added to iommu_domain. IOMMUFD also uses that to know
> whether dirty tracking is supported and toggleable, without having
> iommu drivers replicate said checks.
>
> 3) Capability probing for dirty tracking, leveraging the per-device
> iommu_capable() and adding IOMMU_CAP_DIRTY. It extends the GET_HW_INFO
> ioctl, which takes a device ID, to additionally return some generic
> capabilities. Possible values are enumerated by `enum
> iommufd_hw_capabilities`.
>
> 4) Reading the I/O PTEs and marshalling their dirtiness into a bitmap.
> The bitmap indexes, on a page_size basis, the IOVAs that got written by
> the device. While performing the marshalling, drivers also need to
> clear the dirty bits from the IOPTEs and allow the kAPI caller to batch
> the much-needed IOTLB flush.
> There's no copy of bitmaps to userspace-backed memory; everything is
> zero-copy based, to not add more cost to the iommu driver IOPT walker.
> This shares functionality with VFIO device dirty tracking via the IOVA
> bitmap APIs. So far this is a test-and-clear kind of interface, given
> that the IOPT walk is going to be expensive. In addition, this also
> adds the ability to read dirty bit info without clearing the PTE info.
> This is meant to cover the unmap-and-read-dirty use case and avoid the
> second IOTLB flush.
>
> The only dependency is:
> * Have the domain_alloc_user() API with flags [2] already queued
>   (iommufd/for-next).
>
> The series is organized as follows:
>
> * Patches 1-4: Take care of the iommu domain operations to be added.
>   The idea is to abstract iommu drivers from any idea of how bitmaps
>   are stored or propagated back to the caller, as well as allowing
>   control/batching over IOTLB flushes. So there's a data structure and
>   a helper that only tell the upper layer that an IOVA range got dirty.
>   This logic is shared with VFIO, and it's meant to walk the bitmap
>   user memory, kmap-ing and setting bits as needed. The IOMMU driver
>   just has an idea of a 'dirty bitmap state' and of recording an IOVA
>   as dirty.
>
> * Patches 5-9, 13-18: Add the UAPIs for IOMMUFD, and selftests. The
>   selftests cover some corner cases on boundary handling of the bitmap
>   and exercise various bitmap sizes. I haven't included huge IOVA
>   ranges to avoid risking the selftests failing to execute due to OOM
>   issues when mmaping big buffers.
>
> * Patches 10-11: AMD IOMMU implementation, particularly for those with
>   HDSup support. Tested with a QEMU amd-iommu with HDSup emulated [0],
>   and tested with live migration with VFs (but with IOMMU dirty
>   tracking).
>
> * Patch 12: Intel IOMMU rev3.x+ implementation. Tested with a
>   QEMU-based intel-iommu vIOMMU with SSADS emulation support [0].
>
> On AMD/Intel I have tested this with emulation and then live migration
> on AMD hardware.
>
> The QEMU iommu emulation bits are there to increase coverage of this
> code and hopefully make this more broadly available to fellow
> contributors/devs (old version [1]); it uses Yi's 2 commits to have
> hw_info() supported (still needs a bit of cleanup) on top of a recent
> Zhenzhong series of IOMMUFD QEMU bringup work: see here [0]. It
> includes IOMMUFD dirty tracking for live migration, and live migration
> was tested. I won't be following up with a v2 of the QEMU patches until
> IOMMUFD tracking lands.
>
> Feedback or any comments are very much appreciated.
>
> Thanks!
>         Joao

Is this patchset enough for iommufd live migration?

Just tried live migration on a local machine; it reports "VFIO migration
is not supported in kernel".

Thanks
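[For reference, a rough userspace sketch of the flow the cover letter
above describes, assuming the structs and flags as they appear in
include/uapi/linux/iommufd.h once this series is applied (GET_HW_INFO
out_capabilities, the HWPT_ALLOC dirty-tracking flag,
SET_DIRTY_TRACKING and GET_DIRTY_BITMAP). The helper name, its
parameters, and the assumption that dev_id/ioas_id were already obtained
via VFIO_DEVICE_BIND_IOMMUFD / IOMMU_IOAS_ALLOC are illustrative only;
error handling is trimmed.]

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int hwpt_dirty_flow(int iommufd, uint32_t dev_id, uint32_t ioas_id,
			   uint64_t iova, uint64_t length, uint64_t page_size)
{
	/* 1) Probe the generic capability via IOMMU_GET_HW_INFO. */
	struct iommu_hw_info info = {
		.size = sizeof(info),
		.dev_id = dev_id,
	};
	if (ioctl(iommufd, IOMMU_GET_HW_INFO, &info))
		return -errno;
	if (!(info.out_capabilities & IOMMU_HW_CAP_DIRTY_TRACKING))
		return -EOPNOTSUPP;

	/* 2) Allocate the HWPT with dirty tracking enforced at alloc. */
	struct iommu_hwpt_alloc alloc = {
		.size = sizeof(alloc),
		.flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING,
		.dev_id = dev_id,
		.pt_id = ioas_id,
	};
	if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &alloc))
		return -errno;

	/* 3) Start tracking (clears any stale dirty bits at start). */
	struct iommu_hwpt_set_dirty_tracking set = {
		.size = sizeof(set),
		.flags = IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
		.hwpt_id = alloc.out_hwpt_id,
	};
	if (ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, &set))
		return -errno;

	/* 4) Read-and-clear the IOPTE dirty bits into a user bitmap:
	 * one bit per page_size unit of IOVA. */
	uint64_t bits = length / page_size;
	uint64_t *bitmap = calloc((bits + 63) / 64, sizeof(*bitmap));
	struct iommu_hwpt_get_dirty_bitmap get = {
		.size = sizeof(get),
		.hwpt_id = alloc.out_hwpt_id,
		.iova = iova,
		.length = length,
		.page_size = page_size,
		.data = (uintptr_t)bitmap,
		/* .flags = IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR for the
		 * unmap-and-read-dirty case mentioned above. */
	};
	if (ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get)) {
		free(bitmap);
		return -errno;
	}
	free(bitmap);
	return 0;
}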
On 2024/10/29 10:35, Zhangfei Gao wrote:
> VFIO migration is not supported in kernel
do you have a vfio-pci-xxx driver that suits your device? Looks
like your case failed when checking the VFIO_DEVICE_FEATURE_GET |
VFIO_DEVICE_FEATURE_MIGRATION via VFIO_DEVICE_FEATURE.
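[The check Yi refers to can be reproduced from userspace directly
against the VFIO device fd. A minimal sketch using the standard VFIO
uAPI from <linux/vfio.h>; the helper name is illustrative:]

#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int probe_vfio_migration(int device_fd, uint64_t *mig_flags)
{
	/* vfio_device_feature ends in a flexible array, so build the GET
	 * request in a flat buffer and overlay the two structs. */
	uint64_t buf[(sizeof(struct vfio_device_feature) +
		      sizeof(struct vfio_device_feature_migration) + 7) / 8];
	struct vfio_device_feature *feature = (void *)buf;
	struct vfio_device_feature_migration *mig = (void *)feature->data;

	memset(buf, 0, sizeof(buf));
	feature->argsz = sizeof(buf);
	feature->flags = VFIO_DEVICE_FEATURE_GET |
			 VFIO_DEVICE_FEATURE_MIGRATION;

	if (ioctl(device_fd, VFIO_DEVICE_FEATURE, feature))
		return -errno;	/* ENOTTY when the bound driver has no mig_ops */

	*mig_flags = mig->flags;	/* e.g. VFIO_MIGRATION_STOP_COPY */
	return 0;
}

[Plain vfio-pci has no mig_ops, so the ioctl fails with ENOTTY, which
QEMU then reports as "VFIO migration is not supported in kernel".]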
On Tue, 29 Oct 2024 at 10:48, Yi Liu <yi.l.liu@intel.com> wrote:
>
> On 2024/10/29 10:35, Zhangfei Gao wrote:
> > VFIO migration is not supported in kernel
>
> do you have a vfio-pci-xxx driver that suits your device? Looks
> like your case failed when checking the VFIO_DEVICE_FEATURE_GET |
> VFIO_DEVICE_FEATURE_MIGRATION via VFIO_DEVICE_FEATURE.

Thanks Yi for the guidance.

Yes, the ioctl VFIO_DEVICE_FEATURE with VFIO_DEVICE_FEATURE_MIGRATION
fails, since:

        if (!device->mig_ops)
                return -ENOTTY;

drivers/vfio/pci/vfio_pci.c is currently used, and it has no mig_ops.
Looks like I have to use another driver such as
drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c; still checking.

Thanks Yi.
On 2024/10/29 16:05, Zhangfei Gao wrote:
> On Tue, 29 Oct 2024 at 10:48, Yi Liu <yi.l.liu@intel.com> wrote:
>>
>> On 2024/10/29 10:35, Zhangfei Gao wrote:
>>> VFIO migration is not supported in kernel
>>
>> do you have a vfio-pci-xxx driver that suits your device? Looks
>> like your case failed when checking the VFIO_DEVICE_FEATURE_GET |
>> VFIO_DEVICE_FEATURE_MIGRATION via VFIO_DEVICE_FEATURE.
>
> Thanks Yi for the guidance.
>
> Yes, the ioctl VFIO_DEVICE_FEATURE with VFIO_DEVICE_FEATURE_MIGRATION
> fails, since:
>
>         if (!device->mig_ops)
>                 return -ENOTTY;
>
> drivers/vfio/pci/vfio_pci.c is currently used, and it has no mig_ops.
> Looks like I have to use another driver such as
> drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c; still checking.

Indeed. You need to bind your device to a variant driver named
vfio-pci-xxx, which will provide the mig_ops and even the dirty logging
feature. Good luck.
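[For completeness, a schematic fragment only (not a complete driver) of
what such a variant driver provides so the VFIO_DEVICE_FEATURE_MIGRATION
probe above succeeds, patterned on existing variant drivers such as
hisi_acc_vfio_pci and mlx5; the my_vf_* names are placeholders and the
actual device state handling is omitted.]

#include <linux/err.h>
#include <linux/sizes.h>
#include <linux/vfio.h>
#include <linux/vfio_pci_core.h>

static struct file *
my_vf_set_state(struct vfio_device *vdev, enum vfio_device_mig_state new_state)
{
	/* Walk the migration FSM; return a data fd for STOP_COPY states. */
	return ERR_PTR(-EOPNOTSUPP);	/* placeholder */
}

static int my_vf_get_state(struct vfio_device *vdev,
			   enum vfio_device_mig_state *curr_state)
{
	*curr_state = VFIO_DEVICE_STATE_RUNNING;	/* placeholder */
	return 0;
}

static int my_vf_get_data_size(struct vfio_device *vdev,
			       unsigned long *stop_copy_length)
{
	*stop_copy_length = SZ_1M;	/* device-state size, device specific */
	return 0;
}

static const struct vfio_migration_ops my_vf_mig_ops = {
	.migration_set_state = my_vf_set_state,
	.migration_get_state = my_vf_get_state,
	.migration_get_data_size = my_vf_get_data_size,
};

/* Wired up in the variant driver's vfio_device_ops .init hook: */
static int my_vf_init_dev(struct vfio_device *vdev)
{
	vdev->migration_flags = VFIO_MIGRATION_STOP_COPY;
	vdev->mig_ops = &my_vf_mig_ops;
	return vfio_pci_core_init_dev(vdev);
}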
On Tue, 24 Oct 2023 at 21:51, Joao Martins <joao.m.martins@oracle.com> wrote:
>
> v6 is a replacement of what's in iommufd next:
> https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next
>
[...]
>
> Feedback or any comments are very much appreciated.
>
> Thanks!
>         Joao

Hi, Joao and Yi

I just tried this on aarch64: live migration with "iommu=nested-smmuv3"
does not work; vbasedev->dirty_pages_supported=0.

qemu-system-aarch64: -device vfio-pci-nohotplug,host=0000:75:00.1,iommufd=iommufd0,enable-migration=on,x-pre-copy-dirty-page-tracking=off:
warning: 0000:75:00.1: VFIO device doesn't support device and IOMMU dirty tracking
qemu-system-aarch64: -device vfio-pci-nohotplug,host=0000:75:00.1,iommufd=iommufd0,enable-migration=on,x-pre-copy-dirty-page-tracking=off:
vfio 0000:75:00.1: 0000:75:00.1: Migration is currently not supported with vIOMMU enabled

hw/vfio/migration.c:
    if (vfio_viommu_preset(vbasedev)) {
        error_setg(&err, "%s: Migration is currently not supported "
                   "with vIOMMU enabled", vbasedev->name);
        goto add_blocker;
    }

Does this mean live migration with vIOMMU is still not ready? It is not
an error; it is how migration is being blocked until all other related
feature support is added for vIOMMU. And is more work still needed to
enable migration with vIOMMU?

By the way, live migration works if "iommu=nested-smmuv3" is removed.

Any suggestions?

Thanks
On Wed, Oct 30, 2024 at 11:15:02PM +0800, Zhangfei Gao wrote:
> hw/vfio/migration.c
>     if (vfio_viommu_preset(vbasedev)) {
>         error_setg(&err, "%s: Migration is currently not supported "
>                    "with vIOMMU enabled", vbasedev->name);
>         goto add_blocker;
>     }

The viommu driver itself does not support live migration; it would need
to preserve all the guest configuration and bring it all back. It
doesn't know how to do that yet.

Jason
On 30/10/2024 15:36, Jason Gunthorpe wrote:
> On Wed, Oct 30, 2024 at 11:15:02PM +0800, Zhangfei Gao wrote:
>> hw/vfio/migration.c
>>     if (vfio_viommu_preset(vbasedev)) {
>>         error_setg(&err, "%s: Migration is currently not supported "
>>                    "with vIOMMU enabled", vbasedev->name);
>>         goto add_blocker;
>>     }
>
> The viommu driver itself does not support live migration; it would need
> to preserve all the guest configuration and bring it all back. It
> doesn't know how to do that yet.

It's more of a vfio-code limitation, not quite related to the actual hw
vIOMMU.

There are some vfio migration + vIOMMU support patches I have to follow
up on (v5), but unexpected setbacks unrelated to work delayed some of my
plans for QEMU 9.2. I expect to resume in a few weeks. I can point you
to a branch while I don't submit (given that soft-freeze is coming).

Joao
Hi Joao,

> -----Original Message-----
> From: Joao Martins <joao.m.martins@oracle.com>
> Sent: Wednesday, October 30, 2024 3:47 PM
> Subject: Re: [PATCH v6 00/18] IOMMUFD Dirty Tracking
>
> On 30/10/2024 15:36, Jason Gunthorpe wrote:
> > On Wed, Oct 30, 2024 at 11:15:02PM +0800, Zhangfei Gao wrote:
> >> hw/vfio/migration.c
> >>     if (vfio_viommu_preset(vbasedev)) {
> >>         error_setg(&err, "%s: Migration is currently not supported "
> >>                    "with vIOMMU enabled", vbasedev->name);
> >>         goto add_blocker;
> >>     }
> >
> > The viommu driver itself does not support live migration; it would
> > need to preserve all the guest configuration and bring it all back.
> > It doesn't know how to do that yet.
>
> It's more of a vfio-code limitation, not quite related to the actual hw
> vIOMMU.
>
> There are some vfio migration + vIOMMU support patches I have to follow
> up on (v5)

Are you referring to this series here?
https://lore.kernel.org/qemu-devel/d5d30f58-31f0-1103-6956-377de34a790c@redhat.com/T/

Is that enabling migration only if the Guest doesn't do any DMA
translations?

> but unexpected setbacks unrelated to work delayed some of my plans for
> QEMU 9.2. I expect to resume in a few weeks. I can point you to a
> branch while I don't submit (given that soft-freeze is coming).

Also, I think we need a mechanism for page fault handling in case the
Guest handles stage 1, plus dirty tracking for stage 1 as well.

Thanks,
Shameer
On 30/10/2024 15:57, Shameerali Kolothum Thodi wrote:
>> On 30/10/2024 15:36, Jason Gunthorpe wrote:
>>> The viommu driver itself does not support live migration; it would
>>> need to preserve all the guest configuration and bring it all back.
>>> It doesn't know how to do that yet.
>>
>> It's more of a vfio-code limitation, not quite related to the actual
>> hw vIOMMU.
>>
>> There are some vfio migration + vIOMMU support patches I have to
>> follow up on (v5)
>
> Are you referring to this series here?
> https://lore.kernel.org/qemu-devel/d5d30f58-31f0-1103-6956-377de34a790c@redhat.com/T/
>
> Is that enabling migration only if the Guest doesn't do any DMA
> translations?

No, it does it when the guest is using the sw vIOMMU too. To be clear:
this has nothing to do with nested IOMMU, or with the guest doing
(emulated) dirty tracking.

When the guest doesn't do DMA translation, it is this patch:

https://lore.kernel.org/qemu-devel/20230908120521.50903-1-joao.m.martins@oracle.com/

>> but unexpected setbacks unrelated to work delayed some of my plans for
>> QEMU 9.2. I expect to resume in a few weeks. I can point you to a
>> branch while I don't submit (given that soft-freeze is coming).
>
> Also, I think we need a mechanism for page fault handling in case the
> Guest handles stage 1, plus dirty tracking for stage 1 as well.

I have emulation for x86 iommus to do dirty tracking, but that is
unrelated to L0 live migration -- it's more for testing in the lack of
recent hardware. Even emulated page fault handling doesn't affect this
unless you have to re-map/map a new IOVA, which would also be covered in
this series, I think.

Unless you are talking about physical IOPF that QEMU may terminate,
though we don't have such support in QEMU atm.
> -----Original Message-----
> From: Joao Martins <joao.m.martins@oracle.com>
> Sent: Wednesday, October 30, 2024 4:57 PM
> Subject: Re: [PATCH v6 00/18] IOMMUFD Dirty Tracking
>
> On 30/10/2024 15:57, Shameerali Kolothum Thodi wrote:
> > Are you referring to this series here?
> > https://lore.kernel.org/qemu-devel/d5d30f58-31f0-1103-6956-377de34a790c@redhat.com/T/
> >
> > Is that enabling migration only if the Guest doesn't do any DMA
> > translations?
>
> No, it does it when the guest is using the sw vIOMMU too. To be clear:
> this has nothing to do with nested IOMMU, or with the guest doing
> (emulated) dirty tracking.

Ok. Thanks for explaining. So just to clarify, this works for Intel vt-d
with "caching-mode=on", i.e. no real 2-stage setup is required like in
ARM SMMUv3.

> When the guest doesn't do DMA translation, it is this patch:
>
> https://lore.kernel.org/qemu-devel/20230908120521.50903-1-joao.m.martins@oracle.com/

Ok.

> > Also, I think we need a mechanism for page fault handling in case the
> > Guest handles stage 1, plus dirty tracking for stage 1 as well.
>
> I have emulation for x86 iommus to do dirty tracking, but that is
> unrelated to L0 live migration -- it's more for testing in the lack of
> recent hardware. Even emulated page fault handling doesn't affect this
> unless you have to re-map/map a new IOVA, which would also be covered
> in this series I think.
>
> Unless you are talking about physical IOPF that QEMU may terminate,
> though we don't have such support in QEMU atm.

Yeah, I was referring to the ARM SMMUv3 cases, where we need
nested-smmuv3 support for vfio-pci assignment. Another use case we have
is supporting SVA in the Guest, with hardware capable of physical IOPF.

I will take a look at your series above and see what else is required
to support ARM. Please CC me if you plan to respin or have a latest
branch. Thanks for your efforts.

Shameer
On 30/10/2024 18:41, Shameerali Kolothum Thodi wrote:
>> Unless you are talking about physical IOPF that QEMU may terminate,
>> though we don't have such support in QEMU atm.
>
> Yeah, I was referring to the ARM SMMUv3 cases, where we need
> nested-smmuv3 support for vfio-pci assignment. Another use case we have
> is supporting SVA in the Guest, with hardware capable of physical IOPF.
>
> I will take a look at your series above and see what else is required
> to support ARM. Please CC me if you plan to respin or have a latest
> branch. Thanks for your efforts.

Right, the series above works for emulated intel-iommu, and I had some
patches for virtio-iommu (which got simpler thanks to Eric's work on
aw-bits). amd-iommu is easily added too (but it needs to work with guest
non-passthrough mode first before we get there). The only thing you need
to do on the iommu emulation side is to expose IOMMU_ATTR_MAX_IOVA [0],
and that should be all.

For guest hw-nesting / IOPF I don't think I see it affected, considering
the GPA space remains the same and we still have a parent pagetable to
get the stage-2 A/D bits (like the non-nested case).

[0] https://lore.kernel.org/qemu-devel/20230622214845.3980-11-joao.m.martins@oracle.com/

Joao
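[For reference, a rough sketch of what exposing that attribute looks
like on the emulated-IOMMU side, assuming the IOMMU_ATTR_MAX_IOVA
attribute from the proposal in [0] (not in upstream QEMU at the time of
writing) plugged into the existing IOMMUMemoryRegionClass get_attr
hook; the my_viommu_* names are placeholders.]

#include "qemu/osdep.h"
#include "exec/memory.h"

static int my_viommu_get_attr(IOMMUMemoryRegion *iommu_mr,
                              enum IOMMUMemoryRegionAttr attr, void *data)
{
    if (attr == IOMMU_ATTR_MAX_IOVA) {
        /* Placeholder: derive from the emulated IOMMU's address width
         * (e.g. the aw-bits property), here assumed to be 48 bits. */
        *(hwaddr *)data = (1ULL << 48) - 1;
        return 0;
    }
    return -EINVAL;
}

static void my_viommu_memory_region_class_init(ObjectClass *klass, void *data)
{
    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);

    imrc->get_attr = my_viommu_get_attr;
}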