
[v4,00/25] migration: Improve error reporting

Message ID: 20240306133441.2351700-1-clg@redhat.com
Series: migration: Improve error reporting

Message

Cédric Le Goater March 6, 2024, 1:34 p.m. UTC
Hello,

The motivation behind these changes is to improve error reporting to
the upper management layer (libvirt) with a more detailed error, so
that it can decide, depending on the reported error, whether to try
the migration again later. This would be useful in cases where
migration fails due to a lack of HW resources on the host. For
instance, some adapters can only initiate a limited number of
simultaneous dirty tracking requests, and this imposes a limit on the
number of VMs that can be migrated simultaneously.

We are not quite ready for such a mechanism, but what we can do first
is clean up the error reporting in the early save_setup sequence. This
is what the following changes propose, by adding an Error** argument to
various handlers and propagating it to the core migration subsystem.
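To illustrate the convention these patches move toward, here is a minimal, self-contained C sketch: each setup handler returns a bool and, on failure, fills an Error** with a precise message that the core loop simply passes upward. The `DemoError` type and every `demo_*` name below are hypothetical simplifications written for this example; QEMU's real API lives in include/qapi/error.h (error_setg() and friends) and differs in detail.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for QEMU's Error: just a message. */
typedef struct DemoError {
    char msg[128];
} DemoError;

/* Hypothetical helper loosely modelled on error_setg(): fill *errp
 * unless the caller passed NULL to say it does not care. */
static void demo_error_setg(DemoError **errp, const char *fmt, ...)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);          /* never overwrite an existing error */
    DemoError *err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    *errp = err;
}

/* A .save_setup()-style handler: returns false and sets *errp on failure. */
typedef bool (*DemoSaveSetup)(void *opaque, DemoError **errp);

static bool demo_ram_setup(void *opaque, DemoError **errp)
{
    (void)opaque;
    (void)errp;
    return true;                    /* always succeeds in this sketch */
}

static bool demo_vfio_setup(void *opaque, DemoError **errp)
{
    int free_slots = *(int *)opaque;

    if (free_slots <= 0) {
        demo_error_setg(errp, "vfio: no dirty tracking slot left (%d)",
                        free_slots);
        return false;
    }
    return true;
}

/* Core loop: stop at the first failing handler; its error is already in
 * *errp, so the caller can report the precise reason upward. */
static bool demo_state_setup(DemoSaveSetup *handlers, void **opaques,
                             int n, DemoError **errp)
{
    for (int i = 0; i < n; i++) {
        if (!handlers[i](opaques[i], errp)) {
            return false;
        }
    }
    return true;
}
```

With the hypothetical slot count set to 0, demo_state_setup() returns false and the caller receives the VFIO handler's own message rather than a bare error code, which is the kind of detail a management layer could act on.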


The patchset is organized as follows:

* [1-4] are already queued in migration-next.
  
  migration: Report error when shutdown fails
  migration: Remove SaveStateHandler and LoadStateHandler typedefs
  migration: Add documentation for SaveVMHandlers
  migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
  
* [5-9] are prerequisite changes in other components related to the
  migration save_setup() handler. They make sure a failure is not
  returned without setting an error.
  
  s390/stattrib: Add Error** argument to set_migrationmode() handler
  vfio: Always report an error in vfio_save_setup()
  migration: Always report an error in block_save_setup()
  migration: Always report an error in ram_save_setup()
  migration: Add Error** argument to vmstate_save()

* [10-15] are the core changes in migration and memory components to
  propagate an error reported in a save_setup() handler.

  migration: Add Error** argument to qemu_savevm_state_setup()
  migration: Add Error** argument to .save_setup() handler
  migration: Add Error** argument to .load_setup() handler
  memory: Add Error** argument to .log_global_start() handler
  memory: Add Error** argument to the global_dirty_log routines
  migration: Modify ram_init_bitmaps() to report dirty tracking errors

* [16-19] contain the VFIO changes we are interested in. Can go
  through vfio-next.

  vfio: Add Error** argument to .set_dirty_page_tracking() handler
  vfio: Add Error** argument to vfio_devices_dma_logging_start()
  vfio: Add Error** argument to vfio_devices_dma_logging_stop()
  vfio: Use new Error** argument in vfio_save_setup()

* [20-25] are followups for better error handling in VFIO. Good to
  have but not necessary for the issue described in the intro. Can go
  through vfio-next.

  vfio: Add Error** argument to .vfio_save_config() handler
  vfio: Reverse test on vfio_get_dirty_bitmap()
  memory: Add Error** argument to memory_get_xlat_addr()
  vfio: Add Error** argument to .get_dirty_bitmap() handler
  vfio: Also trace event failures in vfio_save_complete_precopy()
  vfio: Extend vfio_set_migration_error() with Error* argument

Thanks,

C.

Changes in v4:

 - Fixed frenchism futur to future
 - Fixed typo in set_migrationmode() handler
 - Added error_free() in hmp_migrationmode()
 - Fixed state name printed out in error returned by vfio_save_setup()
 - Fixed test on error returned by qemu_file_get_error()
 - Added an error when bdrv_nb_sectors() returns a negative value 
 - Dropped log_global_stop() and log_global_sync() changes
 - Dropped MEMORY_LISTENER_CALL_LOG_GLOBAL
 - Modified memory_global_dirty_log_start() to loop on the list of
   listeners and handle errors directly.
 - Introduced memory_global_dirty_log_rollback() to revert operations
   previously done
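The v4 rework of memory_global_dirty_log_start() follows a common start-with-rollback pattern: start each listener in turn and, if one fails, stop the ones already started before returning the error. The sketch below shows only that pattern; `DemoListener` and the `demo_*` functions are hypothetical and do not reflect QEMU's MemoryListener interface or the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical listener; QEMU's real MemoryListener hooks differ. */
typedef struct DemoListener {
    const char *name;
    bool can_start;     /* simulate a device that runs out of resources */
    bool logging;       /* is dirty logging currently enabled? */
} DemoListener;

static bool demo_log_start(DemoListener *l, char *err, size_t errlen)
{
    if (!l->can_start) {
        snprintf(err, errlen, "%s: cannot start dirty tracking", l->name);
        return false;
    }
    l->logging = true;
    return true;
}

static void demo_log_stop(DemoListener *l)
{
    l->logging = false;
}

/* Undo the first 'count' listeners, most recently started first. */
static void demo_dirty_log_rollback(DemoListener *ls, size_t count)
{
    while (count > 0) {
        demo_log_stop(&ls[--count]);
    }
}

static bool demo_dirty_log_start(DemoListener *ls, size_t n,
                                 char *err, size_t errlen)
{
    assert(err && errlen > 0);
    for (size_t i = 0; i < n; i++) {
        if (!demo_log_start(&ls[i], err, errlen)) {
            /* Leave no listener half-enabled behind us. */
            demo_dirty_log_rollback(ls, i);
            return false;
        }
    }
    return true;
}
```

Rolling back in reverse order mirrors how the listeners were started, so a failure on the last listener leaves every earlier one disabled again.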

Changes in v3:

 - New changes to make sure an error is always set in case of failure.
   This is the reason behind the 5/6 extra patches. (Markus)
 - Documentation fixup (Peter + Avihai)
 - Always set the migration state to MIGRATION_STATUS_FAILED
 - Fixed error handling in bg_migration_thread() (Peter)
 - Fixed return value of vfio_listener_log_global_start/stop().
   Went unnoticed because the value is not tested. (Peter)
 - Added ERRP_GUARD() when error_prepend() is used
 - Used error_setg_errno() when possible
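As a rough illustration of the two error helpers mentioned above: QEMU's error_setg_errno() appends the strerror() text for an errno value to the message ("message: reason"), and error_prepend() adds caller context in front of an existing error, which is one of the cases where QEMU expects ERRP_GUARD(). The fixed-size `DemoError` buffer and the `demo_*` helpers are hypothetical stand-ins that only mimic the resulting message text, not QEMU's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size message buffer; QEMU's Error is heap-allocated. */
typedef struct DemoError {
    char msg[160];
} DemoError;

/* Loosely mirrors error_setg_errno(): "<what>: <strerror(os_errno)>". */
static void demo_setg_errno(DemoError *e, int os_errno, const char *what)
{
    snprintf(e->msg, sizeof(e->msg), "%s: %s", what, strerror(os_errno));
}

/* Loosely mirrors error_prepend(): put caller context in front of an
 * error that has already been set. */
static void demo_prepend(DemoError *e, const char *prefix)
{
    char tmp[sizeof(e->msg)];

    assert(e->msg[0] != '\0');      /* only prepend to an existing error */
    snprintf(tmp, sizeof(tmp), "%s%s", prefix, e->msg);
    memcpy(e->msg, tmp, sizeof(tmp));
}
```

For example, a low-level failure reported with EBUSY and then prefixed by a hypothetical device name "dev0: " reads "dev0: DMA logging start failed: " followed by the system's text for EBUSY.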
    
Changes in v2:

- Removed v1 patches addressing the return-path thread termination as
  they are now superseded by:
  https://lore.kernel.org/qemu-devel/20240226203122.22894-1-farosas@suse.de/
- Documentation updates of handlers
- Removed call to PRECOPY_NOTIFY_SETUP notifiers in case of errors
- Modified routines taking an Error** argument to return a bool when
  possible and made adjustments in callers.
- New MEMORY_LISTENER_CALL_LOG_GLOBAL macro for the .log_global*()
  handlers
- Handled SETUP state when migration terminates
- Modified memory_get_xlat_addr() to take an Error** argument
- Various refinements on error handling

Cédric Le Goater (25):
  migration: Report error when shutdown fails
  migration: Remove SaveStateHandler and LoadStateHandler typedefs
  migration: Add documentation for SaveVMHandlers
  migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
  s390/stattrib: Add Error** argument to set_migrationmode() handler
  vfio: Always report an error in vfio_save_setup()
  migration: Always report an error in block_save_setup()
  migration: Always report an error in ram_save_setup()
  migration: Add Error** argument to vmstate_save()
  migration: Add Error** argument to qemu_savevm_state_setup()
  migration: Add Error** argument to .save_setup() handler
  migration: Add Error** argument to .load_setup() handler
  memory: Add Error** argument to .log_global_start() handler
  memory: Add Error** argument to the global_dirty_log routines
  migration: Modify ram_init_bitmaps() to report dirty tracking errors
  vfio: Add Error** argument to .set_dirty_page_tracking() handler
  vfio: Add Error** argument to vfio_devices_dma_logging_start()
  vfio: Add Error** argument to vfio_devices_dma_logging_stop()
  vfio: Use new Error** argument in vfio_save_setup()
  vfio: Add Error** argument to .vfio_save_config() handler
  vfio: Reverse test on vfio_get_dirty_bitmap()
  memory: Add Error** argument to memory_get_xlat_addr()
  vfio: Add Error** argument to .get_dirty_bitmap() handler
  vfio: Also trace event failures in vfio_save_complete_precopy()
  vfio: Extend vfio_set_migration_error() with Error* argument

 include/exec/memory.h                 |  25 ++-
 include/hw/s390x/storage-attributes.h |   2 +-
 include/hw/vfio/vfio-common.h         |  29 ++-
 include/hw/vfio/vfio-container-base.h |  35 +++-
 include/migration/register.h          | 273 +++++++++++++++++++++++---
 include/qemu/typedefs.h               |   2 -
 migration/savevm.h                    |   2 +-
 hw/i386/xen/xen-hvm.c                 |   5 +-
 hw/ppc/spapr.c                        |   2 +-
 hw/s390x/s390-stattrib-kvm.c          |  12 +-
 hw/s390x/s390-stattrib.c              |  15 +-
 hw/vfio/common.c                      | 161 +++++++++------
 hw/vfio/container-base.c              |   9 +-
 hw/vfio/container.c                   |  19 +-
 hw/vfio/migration.c                   |  99 ++++++----
 hw/vfio/pci.c                         |   5 +-
 hw/virtio/vhost-vdpa.c                |   5 +-
 hw/virtio/vhost.c                     |   3 +-
 migration/block-dirty-bitmap.c        |   4 +-
 migration/block.c                     |  19 +-
 migration/dirtyrate.c                 |  13 +-
 migration/migration.c                 |  27 ++-
 migration/qemu-file.c                 |   5 +-
 migration/ram.c                       |  46 ++++-
 migration/savevm.c                    |  59 +++---
 system/memory.c                       |  56 +++++-
 26 files changed, 713 insertions(+), 219 deletions(-)

Comments

Peter Xu March 8, 2024, 8:15 a.m. UTC | #1
On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
> [...]
> 
> * [10-15] are the core changes in migration and memory components to
>   propagate an error reported in a save_setup() handler.
> 
>   migration: Add Error** argument to qemu_savevm_state_setup()
>   migration: Add Error** argument to .save_setup() handler
>   migration: Add Error** argument to .load_setup() handler

Further queued 5-12 in migration-staging (until here), thanks.

Cédric Le Goater March 8, 2024, 1:03 p.m. UTC | #2
On 3/8/24 09:15, Peter Xu wrote:
> On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
>> [...]
> 
> Further queued 5-12 in migration-staging (until here), thanks.

Thanks Peter. All the prereq changes should reach 9.0, which leaves
time to discuss the core changes for 9.1.

C.
Peter Xu March 11, 2024, 8:24 p.m. UTC | #3
On Fri, Mar 08, 2024 at 04:15:08PM +0800, Peter Xu wrote:
> On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
> > [...]
> 
> Further queued 5-12 in migration-staging (until here), thanks.

Just to keep a record: due to the virtio failover test failure and the
other block migration uncertainty in patch 7 (where we may want a fix for
the sectors==0 case), I unqueued this chunk for 9.0.

Thanks,
Cédric Le Goater March 12, 2024, 7:16 a.m. UTC | #4
On 3/11/24 21:24, Peter Xu wrote:
> On Fri, Mar 08, 2024 at 04:15:08PM +0800, Peter Xu wrote:
>> On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
>>> [...]
>>
>> Further queued 5-12 in migration-staging (until here), thanks.
> 
> Just to keep a record: due to the virtio failover test failure and the
> other block migration uncertainty in patch 7 (in which case we may want to
> have a fix on sectors==0 case), I unqueued this chunk for 9.0.

ok. I will ask the block folks for help to understand whether sectors==0
is also an error in the save_setup context. Maybe we can still
merge these in the 9.0 cycle.

Thanks,

C.
Cédric Le Goater March 12, 2024, 9:58 a.m. UTC | #5
On 3/12/24 08:16, Cédric Le Goater wrote:
> On 3/11/24 21:24, Peter Xu wrote:
>> On Fri, Mar 08, 2024 at 04:15:08PM +0800, Peter Xu wrote:
>>> On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
>>>> [...]
>>>
>>> Further queued 5-12 in migration-staging (until here), thanks.
>>
>> Just to keep a record: due to the virtio failover test failure and the
>> other block migration uncertainty in patch 7 (in which case we may want to
>> have a fix on sectors==0 case), I unqueued this chunk for 9.0.
> 
> ok. I will ask the block folks for help to understand if sectors==0
> is also an error in the save_setup context. May be  we can still
> merge these in 9.0 cycle.

I discussed this with Kevin: sectors==0 is not an error case, and the
loop should simply continue. That said, commit 66db46ca83b8 ("migration:
Deprecate block migration") would let us remove all that code in
the next cycle, which is even simpler.

Thanks,

C.
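Kevin's guidance above can be summarized as a loop that treats a negative size as a setup error but simply skips an empty (zero-sector) device. The sketch below assumes a hypothetical `DemoDisk` array in place of iterating real block devices; in QEMU the size would come from bdrv_nb_sectors(), which reports failure with a negative value, and the actual block_save_setup() code is structured differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-device size result (negative errno on failure). */
typedef struct DemoDisk {
    const char *name;
    int64_t nb_sectors;
} DemoDisk;

/* Returns the number of disks queued for migration, or -1 on error. */
static int demo_block_setup(const DemoDisk *disks, int n,
                            char *err, size_t errlen)
{
    int queued = 0;

    assert(err && errlen > 0);
    for (int i = 0; i < n; i++) {
        if (disks[i].nb_sectors < 0) {
            snprintf(err, errlen, "%s: could not get size (%lld)",
                     disks[i].name, (long long)disks[i].nb_sectors);
            return -1;
        }
        if (disks[i].nb_sectors == 0) {
            continue;   /* empty device: nothing to migrate, not a failure */
        }
        queued++;
    }
    return queued;
}
```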
Peter Xu March 12, 2024, 11:50 a.m. UTC | #6
On Tue, Mar 12, 2024 at 10:58:51AM +0100, Cédric Le Goater wrote:
> On 3/12/24 08:16, Cédric Le Goater wrote:
> > On 3/11/24 21:24, Peter Xu wrote:
> > > On Fri, Mar 08, 2024 at 04:15:08PM +0800, Peter Xu wrote:
> > > > On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
> > [...]
> 
> I discussed with Kevin and sectors==0 is not an error case, the loop
> should simply continue. That said, commit 66db46ca83b8 ("migration:
> Deprecate block migration") would let us remove all that code in
> the next cycle which is even simpler.

Thanks for taking a look.  I can try to have a look at removing block
migration in 9.1.

Regarding the failover failure: I still think what you posted as a
"hack" could be an official patch.  Do you plan to send it?  Or do you have
anything else in mind?

For 9.0, we have missed soft freeze. IIUC, we can only merge things like
regression fixes, documentation updates, some test changes, etc. during the
rc windows. With QEMU's heavy reliance on CI now, I don't even think most
test case changes would be suitable for RCs unless they are never run in CI.
So unless there's a strong need, it'll be easier if we wait for 9.1 (though
we can still queue them early, so they will appear in the first 9.1
pull).

Thanks,
Cédric Le Goater March 12, 2024, 12:09 p.m. UTC | #7
On 3/12/24 12:50, Peter Xu wrote:
> On Tue, Mar 12, 2024 at 10:58:51AM +0100, Cédric Le Goater wrote:
>> On 3/12/24 08:16, Cédric Le Goater wrote:
>>> On 3/11/24 21:24, Peter Xu wrote:
>>>> On Fri, Mar 08, 2024 at 04:15:08PM +0800, Peter Xu wrote:
>>>>> On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
>> [...]
> 
> Thanks for taking a look.  I can try to have a look at removing block
> migration in 9.1.

Just sent a 9.0 fix for the block part.

> Regarding to the failover failure - I still think what you posted as a
> "hack" could be an official patch.  Do you plan to send it?  
> Or do you have anything else in mind?

I was hoping to fix the test case instead. I can try to improve the hack
I sent this afternoon.

Thanks,

C.
Peter Xu March 12, 2024, 12:25 p.m. UTC | #8
On Tue, Mar 12, 2024 at 01:09:42PM +0100, Cédric Le Goater wrote:
> I was hoping to fix the test case instead. I can try to improve the hack
> I sent this afternoon.

Thanks, please go whichever way you think is the right approach.