| Message ID | 20230209192043.14885-4-avihaih@nvidia.com (mailing list archive) |
| --- | --- |
| State | New, archived |
| Series | vfio/migration: Implement VFIO migration protocol v2 |
Avihai Horon <avihaih@nvidia.com> wrote:
> Currently, if the IOMMU of a VFIO container doesn't support dirty page
> tracking, migration is blocked. This is because a DMA-able VFIO device
> can dirty RAM pages without updating QEMU about it, thus breaking the
> migration.
>
> However, this doesn't mean that migration can't be done at all.
> In such a case, allow migration and let the QEMU VFIO code mark all
> pages dirty.
>
> This guarantees that all pages that might have gotten dirty are
> reported back, and thus guarantees a valid migration even without VFIO
> IOMMU dirty tracking support.
>
> The motivation for this patch is the introduction of iommufd [1].
> iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
> mapping them into its internal ops, allowing the usage of these IOCTLs
> over iommufd. However, VFIO IOMMU dirty tracking is not supported by
> this VFIO compatibility API.
>
> This patch will allow migration by hosts that use the VFIO
> compatibility API and prevent migration regressions caused by the lack
> of VFIO IOMMU dirty tracking support.
>
> [1] https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com/
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

I know why you are doing this, but I think this should print a warning
or an error somewhere. You are just dirtying all pages each time we
arrive here. Even calling the feature "experimental" is an
understatement.

Later, Juan.
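A warning of the kind Juan asks for could be a one-liner in the fallback path. The sketch below is illustrative only, not part of the patch; it assumes QEMU's warn_report_once() helper from "qemu/error-report.h", and the function name, call site, and message wording are invented for the example:

```c
/*
 * Illustrative sketch only: warn once when the dirty-everything
 * fallback is taken.  warn_report_once() is QEMU's standard
 * print-once warning helper; the placement and wording here are
 * assumptions, not part of the actual patch.
 */
#include "qemu/osdep.h"
#include "qemu/error-report.h"
#include "hw/vfio/vfio-common.h"

static void vfio_dirty_tracking_warn(VFIOContainer *container)
{
    if (!container->dirty_pages_supported) {
        warn_report_once("vfio: IOMMU dirty page tracking is not "
                         "supported; QEMU will report all pages as "
                         "dirty during migration");
    }
}
```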
On 15/02/2023 14:43, Juan Quintela wrote:
> Avihai Horon <avihaih@nvidia.com> wrote:
>> [...]
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> I know why you are doing this.
>
> But I think this should print a warning, error, somewhere.

IMHO, I'm not sure it's really necessary.

To enable VFIO migration the user must explicitly set
x-enable-migration=on. I guess in this case the user is well aware of
the dirty tracking capabilities the VFIO device has or doesn't have,
so I don't think adding this error/warning will help much.

Thanks.
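For concreteness, the opt-in Avihai refers to is an experimental vfio-pci device property on the QEMU command line, e.g. `-device vfio-pci,host=0000:65:00.0,x-enable-migration=on`; the PCI address here is only a placeholder.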
Avihai Horon <avihaih@nvidia.com> wrote:
> On 15/02/2023 14:43, Juan Quintela wrote:
>> [...]
>> But I think this should print a warning, error, somewhere.
>
> IMHO, I'm not sure it's really necessary.
>
> To enable VFIO migration the user must explicitly set
> x-enable-migration=on. I guess in this case the user is well aware of
> the dirty tracking capabilities the VFIO device has or doesn't have,
> so I don't think adding this error/warning will help much.

Oops, I missed that bit. I withdraw my objection.

Sorry, Juan.
On Wed, 15 Feb 2023 19:04:33 +0100
Juan Quintela <quintela@redhat.com> wrote:
> Avihai Horon <avihaih@nvidia.com> wrote:
>> [...]
>> To enable VFIO migration the user must explicitly set
>> x-enable-migration=on. I guess in this case the user is well aware of
>> the dirty tracking capabilities the VFIO device has or doesn't have,
>> so I don't think adding this error/warning will help much.
>
> Oops, I missed that bit. I withdraw my objection.

Keep it in mind, though, as hopefully we'll be making VFIO migration
non-experimental soon and enabled by default where devices support it.
We'll need to consider whether we want to keep "dumb" dirty tracking,
or even any form of dirty tracking in the type1 uAPI, under an
experimental opt-in. Thanks,

Alex
On Wed, Feb 15, 2023 at 01:14:35PM -0700, Alex Williamson wrote:
> We'll need to consider whether we want to keep "dumb" dirty tracking,
> or even any form of dirty tracking in the type1 uAPI, under an
> experimental opt-in. Thanks,

I was expecting we'd delete the kernel code for type1 dirty tracking
once the v2 parts are merged into QEMU, since we don't and won't have
any kernel implementation of it.

The big point of this is to allow QEMU to continue on with a future
kernel that no longer reports that it supports this.

Jason
On Wed, 15 Feb 2023 16:38:10 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Wed, Feb 15, 2023 at 01:14:35PM -0700, Alex Williamson wrote:
>> We'll need to consider whether we want to keep "dumb" dirty tracking,
>> or even any form of dirty tracking in the type1 uAPI, under an
>> experimental opt-in. Thanks,
>
> I was expecting we'd delete the kernel code for type1 dirty tracking
> once the v2 parts are merged into QEMU, since we don't and won't have
> any kernel implementation of it.
>
> The big point of this is to allow QEMU to continue on with a future
> kernel that no longer reports that it supports this.

Right. In the part-1 series adding v2 support, we have no other dirty
tracking, so it serves a purpose until we have device-level dirty
tracking in the part-2 series. After that, we can certainly at least
remove the type1 dirty tracking, since there never were, and likely
never will be, any in-kernel implementations.

I could go either way on whether to keep the in-QEMU dirty-everything
tracking around as an experimental option to aid migration driver
development/debug (i.e. maybe enabling it preempts device dirty
tracking as a comparison for debug). Thanks,

Alex
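As a rough illustration of Alex's idea, such a debug aid could take the shape of an experimental qdev property. The sketch below is purely hypothetical: the "x-force-dirty-all" property name and the force_dirty_all field are invented for this example and do not exist in QEMU:

```c
/*
 * Hypothetical sketch (as if added to hw/vfio/pci.c): keep QEMU's
 * dirty-everything fallback selectable so it can be compared against
 * real device dirty tracking during migration driver bring-up.  The
 * property name and field below are invented for illustration.
 */
#include "qemu/osdep.h"
#include "hw/qdev-properties.h"
#include "pci.h"

static Property vfio_pci_dirty_debug_properties[] = {
    /* Off by default; opting in would make every sync dirty all pages. */
    DEFINE_PROP_BOOL("x-force-dirty-all", VFIOPCIDevice,
                     vbasedev.force_dirty_all, false),
    DEFINE_PROP_END_OF_LIST(),
};
```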
```diff
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 130e5d1dc7..f6dd571549 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -488,6 +488,12 @@ static int vfio_dma_unmap(VFIOContainer *container,
         return -errno;
     }
 
+    if (iotlb && vfio_devices_all_running_and_saving(container)) {
+        cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
+                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
+                                            DIRTY_CLIENTS_NOCODE);
+    }
+
     return 0;
 }
 
@@ -1201,6 +1207,10 @@ static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
         .argsz = sizeof(dirty),
     };
 
+    if (!container->dirty_pages_supported) {
+        return;
+    }
+
     if (start) {
         dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
     } else {
@@ -1236,6 +1246,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
     uint64_t pages;
     int ret;
 
+    if (!container->dirty_pages_supported) {
+        cpu_physical_memory_set_dirty_range(ram_addr, size,
+                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
+                                            DIRTY_CLIENTS_NOCODE);
+        return 0;
+    }
+
     dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
 
     dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
@@ -1409,8 +1426,7 @@ static void vfio_listener_log_sync(MemoryListener *listener,
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
 
-    if (vfio_listener_skipped_section(section) ||
-        !container->dirty_pages_supported) {
+    if (vfio_listener_skipped_section(section)) {
         return;
     }
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 02db30fe8d..c9f3117986 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -861,11 +861,10 @@ int64_t vfio_mig_bytes_transferred(void)
 
 int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 {
-    VFIOContainer *container = vbasedev->group->container;
     struct vfio_region_info *info = NULL;
     int ret = -ENOTSUP;
 
-    if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
+    if (!vbasedev->enable_migration) {
         goto add_blocker;
     }
```