[v7,0/9] vfio: Improve error reporting (part 2)

Message ID 20240516124658.850504-1-clg@redhat.com (mailing list archive)

Message

Cédric Le Goater May 16, 2024, 12:46 p.m. UTC
Hello,

The motivation behind these changes is to report more detailed errors
to the upper management layer (libvirt), so that it can decide,
depending on the reported error, whether to try migration again later.
This would be useful in cases where migration fails due to a lack of HW
resources on the host. For instance, some adapters can only initiate a
limited number of simultaneous dirty tracking requests, which limits
the number of VMs that can be migrated simultaneously.

We are not quite ready for such a mechanism, but what we can do first is
clean up the error reporting in the early save_setup sequence. This
is what the following changes propose, by adding an Error** argument to
various handlers and propagating it to the core migration subsystem.
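
As a rough illustration of the Error** convention described above, here
is a minimal, self-contained sketch. It does not use QEMU's real Error
API: DemoError, demo_error_set_errno(), demo_start_dirty_tracking() and
demo_save_setup() are hypothetical stand-ins invented for this example,
and the ENOSPC failure is only an assumed way to model an adapter that
has run out of dirty tracking slots.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for an error object; NOT QEMU's real Error type. */
typedef struct DemoError {
    char msg[256];
} DemoError;

/* Hypothetical helper: store a formatted message plus strerror(os_errno). */
static void demo_error_set_errno(DemoError **errp, int os_errno,
                                 const char *fmt, ...)
{
    DemoError *err;
    va_list ap;
    int n;

    if (!errp) {
        return;                 /* caller does not want the details */
    }
    assert(*errp == NULL);      /* never overwrite an earlier error */

    err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    va_start(ap, fmt);
    n = vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    if (n >= 0 && (size_t)n < sizeof(err->msg)) {
        /* Append the errno description after the caller's message. */
        snprintf(err->msg + n, sizeof(err->msg) - n, ": %s",
                 strerror(os_errno));
    }
    *errp = err;
}

/* Hypothetical device handler: fails when no tracking slot is left. */
static int demo_start_dirty_tracking(int free_slots, DemoError **errp)
{
    if (free_slots <= 0) {
        demo_error_set_errno(errp, ENOSPC, "dirty tracking start failed");
        return -ENOSPC;
    }
    return 0;
}

/* Hypothetical setup step: hands errp straight through to its caller. */
static int demo_save_setup(int free_slots, DemoError **errp)
{
    int ret = demo_start_dirty_tracking(free_slots, errp);

    if (ret < 0) {
        return ret;             /* *errp already describes the failure */
    }
    return 0;
}
```

A caller would declare `DemoError *err = NULL;`, call
`demo_save_setup(0, &err)`, and on a negative return report `err->msg`
before freeing `err`. The point of the pattern is that the lowest layer
describes the failure once, and every layer above only passes the
pointer along instead of printing its own message.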

The first part [1] of this series, modifying the core migration
subsystem, is now merged. This is the second part, changing VFIO, which
was already proposed in March. See [2].

Thanks,

C.

[1] [PATCH for-9.1 v5 00/14] migration: Improve error reporting
    https://lore.kernel.org/qemu-devel/20240320064911.545001-1-clg@redhat.com/

[2] [PATCH v4 00/25] migration: Improve error reporting
    https://lore.kernel.org/qemu-devel/20240306133441.2351700-1-clg@redhat.com/

Changes in v7:

 - Commit log improvements (Eric)
 - vfio_set_migration_error() : err -> ret rename (Eric)
 - vfio_migration_set_state() :
   Introduced an error prefix to remove redundancy in error messages (Eric)
   Commented error_report when setting the device in recover state fails (Eric)
 - vfio_migration_state_notifier() :
   Remove useless assignment of local ret variable (Avihai) 
   Rephrased comment regarding MigrationNotifyFunc API (Avihai)
 - Fixed even more line wrapping of *dirty_bitmap() routines (Avihai)
 - vfio_sync_dirty_bitmap()
   Fixed return when vfio_sync_ram_discard_listener_dirty_bitmap() is called (Avihai)

Changes in v6:

 - Commit log improvements (Avihai)
 - Modified some titles (Avihai)
 - vfio_migration_set_state() : Dropped the error_setg_errno()
   change when setting device in recover state fails (Avihai)
 - vfio_migration_state_notifier() : report local error (Avihai)
 - vfio_save_device_config_state() : Set errp if the migration
   stream is in error (Avihai)
 - vfio_save_state() : Changed error prefix (Avihai)
 - vfio_iommu_map_dirty_notify() : Modified goto label (Avihai)
 - Fixed memory_get_xlat_addr documentation (Avihai)
 - Fixed line wrapping (Avihai)
 - Fixed query_dirty_bitmap documentation (Avihai)
 - Dropped last patch from v5 :
   vfio: Extend vfio_set_migration_error() with Error* argument

Changes in v5:

 - Rebased on 20c64c8a51a4 ("migration: migration_file_set_error")
 - Fixed typo in set_dirty_page_tracking documentation
 - Used error_setg_errno() in vfio_devices_dma_logging_start()
 - Replaced error_setg() by error_setg_errno() in vfio_migration_set_state()
 - Replaced error_setg() by error_setg_errno() in
   vfio_devices_query_dirty_bitmap() and vfio_legacy_query_dirty_bitmap()
 - ':' -> '-' in vfio_iommu_map_dirty_notify()

Cédric Le Goater (9):
  vfio: Add Error** argument to .set_dirty_page_tracking() handler
  vfio: Add Error** argument to vfio_devices_dma_logging_start()
  migration: Extend migration_file_set_error() with Error* argument
  vfio/migration: Add an Error** argument to vfio_migration_set_state()
  vfio/migration: Add Error** argument to .vfio_save_config() handler
  vfio: Reverse test on vfio_get_xlat_addr()
  memory: Add Error** argument to memory_get_xlat_addr()
  vfio: Add Error** argument to .get_dirty_bitmap() handler
  vfio: Also trace event failures in vfio_save_complete_precopy()

 include/exec/memory.h                 |  15 +++-
 include/hw/vfio/vfio-common.h         |  30 ++++++-
 include/hw/vfio/vfio-container-base.h |  37 +++++++--
 include/migration/misc.h              |   2 +-
 hw/vfio/common.c                      | 113 ++++++++++++++++----------
 hw/vfio/container-base.c              |  10 +--
 hw/vfio/container.c                   |  20 +++--
 hw/vfio/migration.c                   | 109 ++++++++++++++++---------
 hw/vfio/pci.c                         |   5 +-
 hw/virtio/vhost-vdpa.c                |   5 +-
 migration/migration.c                 |   6 +-
 system/memory.c                       |  10 +--
 12 files changed, 246 insertions(+), 116 deletions(-)

Comments

Cédric Le Goater May 16, 2024, 4:22 p.m. UTC | #1
On 5/16/24 14:46, Cédric Le Goater wrote:
> Changes in v7:
> 
>   - Commit log improvements (Eric)
>   - vfio_set_migration_error() : err -> ret rename (Eric)
>   - vfio_migration_set_state() :
>     Introduced an error prefix to remove redundancy in error messages (Eric)
>     Commented error_report when setting the device in recover state fails (Eric)
>   - vfio_migration_state_notifier() :
>     Remove useless assignment of local ret variable (Avihai)
>     Rephrased comment regarding MigrationNotifyFunc API (Avihai)
>   - Fixed even more line wrapping of *dirty_bitmap() routines (Avihai)
>   - vfio_sync_dirty_bitmap()
>     Fixed return when vfio_sync_ram_discard_listener_dirty_bitmap() is called (Avihai)

I fixed this last issue as commented in patch 8. Let's address other
issues, if minor, with follow-up patches.

Applied to vfio-next.

Thanks,

C.

