[RFCv2,0/8] vfio/iommufd: IOMMUFD Dirty Tracking

Message ID 20240212135643.5858-1-joao.m.martins@oracle.com (mailing list archive)

Joao Martins Feb. 12, 2024, 1:56 p.m. UTC
This small series adds support for Dirty Tracking in the IOMMUFD backend.
The sole reason I still made it RFC is because of the second patch,
where we are implementing user-managed auto domains.

In essence it is quite similar to the original IOMMUFD series where we
would allocate a HWPT, until we later switched to an IOAS attach.
Patch 2 goes into more detail, but the gist is that there are two modes of
using IOMMUFD, and by continuing to use kernel-managed auto domains we would
end up duplicating the same flags we have in HWPT into the VFIO IOAS
attach. While it is true that just adding a flag is simpler, it also
creates duplication and motivates duplicating what hwpt-alloc already has.
But there's a chance I have the wrong expectation here, so any feedback is
welcome.

The series is divided into:

* Patch 1: Adds a simple helper to get device capabilities;

* Patches 2 - 5: IOMMUFD backend support for dirty tracking;

The workflow is relatively simple:

1) Probe device and allow dirty tracking in the HWPT
2) Toggling dirty tracking on/off
3) Read-and-clear of Dirty IOVAs

The heuristic selected for (1) is to enable it *if* the device supports
migration but doesn't support VF dirty tracking, or if IOMMU dirty tracking
is supported. The latter is for the hotplug case, where we can add a device
without a tracker and thus still support migration.

The unmap case is deferred until further vIOMMU support with migration
is added[3], which will then introduce the usage of
IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in the GET_DIRTY_BITMAP ioctl in the
dma unmap bitmap flow.
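
To make steps 2) and 3) of the workflow and the NO_CLEAR variant concrete,
here is a toy model in plain C. It is only a sketch: struct toy_tracker and
the toy_* functions are hypothetical names invented for illustration, it
tracks just 64 pages in one word, and it does not call the kernel IOMMUFD
UAPI or any QEMU function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: 64 pages, one dirty bit each. Not the kernel/QEMU API. */
struct toy_tracker {
    bool enabled;    /* step 2: dirty tracking toggled on/off */
    uint64_t dirty;  /* bit N set means page N was written */
};

/* Step 2: toggle dirty tracking. */
static void toy_set_tracking(struct toy_tracker *t, bool enable)
{
    t->enabled = enable;
}

/* A DMA write to a page; it is recorded only while tracking is enabled. */
static void toy_dma_write(struct toy_tracker *t, unsigned page)
{
    assert(page < 64);
    if (t->enabled) {
        t->dirty |= UINT64_C(1) << page;
    }
}

/*
 * Step 3: report the dirty bits. By default this is read-and-clear;
 * with no_clear set the bits are left in place, which is the behaviour
 * this toy model assumes for a NO_CLEAR style flag.
 */
static uint64_t toy_read_dirty(struct toy_tracker *t, bool no_clear)
{
    uint64_t snapshot = t->dirty;

    if (!no_clear) {
        t->dirty = 0;
    }
    return snapshot;
}
```

In this model, a write made before tracking is enabled is never reported, a
read with no_clear set can be repeated and returns the same bits, and a
plain read returns the bits once and then reports nothing until new writes
arrive.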

* Patches 6-8: Add disabling of hugepages to allow tracking at base
page granularity; avoid blocking live migration where there's no VF dirty
tracker, considering that we have IOMMU dirty tracking; and allow
disabling the VF dirty tracker via the QEMU command line.
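
As a rough illustration of the granularity trade-off behind disabling
hugepages: with one dirty bit per mapped page, a smaller page size gives
finer-grained dirty information at the cost of a larger bitmap. The helper
below is a sketch with a hypothetical name (dirty_bitmap_bytes is not a QEMU
or kernel function); it only does the arithmetic, assuming the bitmap is
rounded up to whole 64-bit words.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes needed for a dirty bitmap covering `length` bytes of IOVA space
 * with one bit per `page_size` bytes, rounded up to whole 64-bit words.
 * Illustrative helper only; the name is hypothetical.
 */
static uint64_t dirty_bitmap_bytes(uint64_t length, uint64_t page_size)
{
    /* page_size must be a non-zero power of two */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    /* Round up to whole pages, then to whole 64-bit words. */
    uint64_t bits = length / page_size + (length % page_size != 0);
    uint64_t words = bits / 64 + (bits % 64 != 0);

    return words * sizeof(uint64_t);
}
```

For a 1 GiB range this gives 32768 bytes of bitmap at 4 KiB pages versus 64
bytes at 2 MiB pages, but in the 2 MiB case a single written byte marks the
whole 2 MiB as dirty.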

This series builds on top of Zhenzhong's series[0], but only requires the
first 9 patches, i.e. up to ("vfio/pci: Initialize host iommu device
instance after attachment")[1], which are more generic IOMMUFD device
plumbing, and doesn't require the nesting counterpart.

This is stored on github:
https://github.com/jpemartins/qemu/commits/iommufd-v5

Note: While Linux v6.7 has the IOMMU dirty tracking feature, I suggest folks
use the latest for-rc of the iommufd kernel tree as there are some fixes there.

Comments and feedback appreciated.

Cheers,
        Joao

Changes since RFCv1[2]:
* Remove intel/amd dirty tracking emulation enabling
* Remove the dirtyrate improvement for VF/IOMMU dirty tracking
[Will pursue these two in separate series]
* Introduce auto domains support
* Enforce dirty tracking following the IOMMUFD UAPI for this
* Add support for toggling hugepages in IOMMUFD
* Auto-enable IOMMU dirty tracking when the VF supports migration
but doesn't have VF dirty tracking
* Add a parameter to toggle VF dirty tracking

[0] https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
[1] https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
[2] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
[3] https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/

Joao Martins (8):
  backends/iommufd: Introduce helper function
    iommufd_device_get_hw_capabilities()
  vfio/iommufd: Introduce auto domain creation
  vfio/iommufd: Probe and request hwpt dirty tracking capability
  vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
  vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
  backends/iommufd: Add ability to disable hugepages
  vfio/migration: Don't block migration device dirty tracking is
    unsupported
  vfio/common: Allow disabling device dirty page tracking

 backends/iommufd.c            | 133 ++++++++++++++++++++++++++++
 backends/trace-events         |   4 +
 hw/vfio/common.c              |  32 ++++++-
 hw/vfio/iommufd.c             | 162 ++++++++++++++++++++++++++++++++++
 hw/vfio/migration.c           |   5 +-
 hw/vfio/pci.c                 |   3 +
 include/hw/vfio/vfio-common.h |  12 +++
 include/sysemu/iommufd.h      |  17 ++++
 qapi/qom.json                 |   2 +-
 9 files changed, 367 insertions(+), 3 deletions(-)

Comments

Joao Martins Feb. 13, 2024, 11:59 a.m. UTC | #1
On 12/02/2024 13:56, Joao Martins wrote:
> This series builds on top of Zhenzhong's series[0], but only requires the
> first 9 patches, i.e. up to ("vfio/pci: Initialize host iommu device
> instance after attachment")[1], which are more generic IOMMUFD device
> plumbing, and doesn't require the nesting counterpart.
> 
I need to add that this series doesn't *need* to be based on Zhenzhong's series.
Though given that he is consolidating how IOMMUFD device info is represented,
it felt like the correct thing to do. For dirty tracking we mainly need the
dev_id/iommufd available when we are going to attach, that's it.

I've pushed a version of this series that doesn't have that dependency; let me
know if you want me to pursue this version instead going forward:

https://github.com/jpemartins/qemu/commits/iommufd-v5.nodeps

Cédric Le Goater Feb. 14, 2024, 3:40 p.m. UTC | #2
Hello Joao,

On 2/13/24 12:59, Joao Martins wrote:
> I need to add that this series doesn't *need* to be based on Zhenzhong's series.
> Though given that he is consolidating how IOMMUFD device info is represented,
> it felt like the correct thing to do. For dirty tracking we mainly need the
> dev_id/iommufd available when we are going to attach, that's it.
>
> I've pushed a version of this series that doesn't have that dependency; let me
> know if you want me to pursue this version instead going forward:
> 
> https://github.com/jpemartins/qemu/commits/iommufd-v5.nodeps

I feel I have lost track of all the different patchsets.

To recap, there is yours :

* vfio/iommufd: IOMMUFD Dirty Tracking
   https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/

Zhenzhong's :

* [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU
   https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/

Eric's :

* [RFC 0/7] VIRTIO-IOMMU/VFIO: Fix host iommu geometry handling for hotplugged devices
   https://lore.kernel.org/qemu-devel/20240117080414.316890-1-eric.auger@redhat.com/

Steve's:

* [PATCH V3 00/13] allow cpr-reboot for vfio
   https://lore.kernel.org/qemu-devel/1707418446-134863-1-git-send-email-steven.sistare@oracle.com/

Mine, which should be an RFC :

* [PATCH 00/14] migration: Improve error reporting
   https://lore.kernel.org/qemu-devel/20240207133347.1115903-1-clg@redhat.com/

Anything else ?

Thanks,

C.
Joao Martins Feb. 14, 2024, 4:25 p.m. UTC | #3
On 14/02/2024 15:40, Cédric Le Goater wrote:
> Hello Joao,
> 
> I feel I have lost track of all the different patchsets.
> 
> To recap, there is yours :
> 
> * vfio/iommufd: IOMMUFD Dirty Tracking
>  
> https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
> 
> Zhenzhong's :
> 
> * [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU
>  
> https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
> 

There's also this one from Zhenzhong which depends on this set above:

	https://lore.kernel.org/qemu-devel/20240115103735.132209-1-zhenzhong.duan@intel.com/

But I suspect that part of it is stale already, considering that much of
IOMMUFDDevice was reworked. That series is about bringing up intel-iommu
nesting support, though.

> Eric's :
> 
> * [RFC 0/7] VIRTIO-IOMMU/VFIO: Fix host iommu geometry handling for hotplugged
> devices
>   https://lore.kernel.org/qemu-devel/20240117080414.316890-1-eric.auger@redhat.com/
> 
> Steve's:
> 
> * [PATCH V3 00/13] allow cpr-reboot for vfio
>  
> https://lore.kernel.org/qemu-devel/1707418446-134863-1-git-send-email-steven.sistare@oracle.com/
> 
> Mine, which should be an RFC :
> 
> * [PATCH 00/14] migration: Improve error reporting
>   https://lore.kernel.org/qemu-devel/20240207133347.1115903-1-clg@redhat.com/
> 
> Anything else ?

In terms of major series, I think you only forgot one. The rest look to be
what's out there.

Just to avoid confusion, yesterday's message was just providing an alternative
version of this same series that wouldn't be dependent on:

	[PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU

... which is what is posted in this link:

	https://github.com/jpemartins/qemu/commits/iommufd-v5.nodeps

While the series, as posted, is here:

	https://github.com/jpemartins/qemu/commits/iommufd-v5
Eric Auger Feb. 15, 2024, 2:20 p.m. UTC | #4
Hi,

On 2/14/24 17:25, Joao Martins wrote:
> On 14/02/2024 15:40, Cédric Le Goater wrote:
>> Hello Joao,
>>
>> I feel I have lost track of all the different patchsets.
>>
>> To recap, there is yours :
>>
>> * vfio/iommufd: IOMMUFD Dirty Tracking
>>  
>> https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
>>
>> Zhenzhong's :
>>
>> * [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU
>>  
>> https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
>>
> There's also this one from Zhenzhong which depends on this set above:
>
> 	https://lore.kernel.org/qemu-devel/20240115103735.132209-1-zhenzhong.duan@intel.com/
>
> But I suspect that part of it is stale already, considering that much of
> IOMMUFDDevice was reworked. That series is about bringing up intel-iommu
> nesting support, though.
>
>> Eric's :
>>
>> * [RFC 0/7] VIRTIO-IOMMU/VFIO: Fix host iommu geometry handling for hotplugged
>> devices
>>   https://lore.kernel.org/qemu-devel/20240117080414.316890-1-eric.auger@redhat.com/

Don't spend time reviewing my series at this stage. I will review
Zhenzhong's

[PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU

and try to rebase on it.

Thanks

Eric
