[1/4] migration/savevm: Add a compatibility check for capabilities

Message ID 20250327143934.7935-2-farosas@suse.de (mailing list archive)
State New
Series migration: savevm testing

Commit Message

Fabiano Rosas March 27, 2025, 2:39 p.m. UTC
It has always been possible to enable arbitrary migration capabilities
and attempt to take a snapshot of the VM with the savevm/loadvm
commands as well as their QMP counterparts
snapshot-save/snapshot-load.

Most migration capabilities are not meant to be used with snapshots
and there's a risk of crashing QEMU or producing incorrect
behavior. Ideally, every migration capability would either be
implemented for savevm or explicitly rejected.

Add a compatibility check routine and reject the snapshot command if
an incompatible capability is enabled. For now only act on the two
that actually cause a crash: multifd and mapped-ram.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2881
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/options.c | 26 ++++++++++++++++++++++++++
 migration/options.h |  1 +
 migration/savevm.c  |  8 ++++++++
 3 files changed, 35 insertions(+)

Comments

Daniel P. Berrangé March 27, 2025, 2:54 p.m. UTC | #1
On Thu, Mar 27, 2025 at 11:39:31AM -0300, Fabiano Rosas wrote:
> It has always been possible to enable arbitrary migration capabilities
> and attempt to take a snapshot of the VM with the savevm/loadvm
> commands as well as their QMP counterparts
> snapshot-save/snapshot-load.
> 
> Most migration capabilities are not meant to be used with snapshots
> and there's a risk of crashing QEMU or producing incorrect
> behavior. Ideally, every migration capability would either be
> implemented for savevm or explicitly rejected.

IMHO, this is a prime example of why migration config shouldn't be held
as global state, and instead passed as parameters to the commands
that need them.  The snapshot-save/load commands would then only
be able to accept what few settings are actually relevant, instead
of inheriting any/all global migration state.
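
A minimal sketch of that idea, assuming hypothetical types and function names (`demo_snapshot_opts`, `demo_migrate_opts`, `demo_snapshot_save`, `demo_migrate`) that are not part of QEMU: each command takes its own options struct, so a snapshot has no way to receive multifd or mapped-ram settings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical types, not QEMU's API. Each command gets its own options
 * struct, so a snapshot can only be given settings that apply to it.
 */
struct demo_snapshot_opts {
    const char *tag;   /* snapshot name */
    bool overwrite;    /* replace an existing snapshot with the same tag */
};

struct demo_migrate_opts {
    bool multifd;      /* meaningful for migration only */
    bool mapped_ram;   /* meaningful for migration only */
};

/*
 * No global capability state is consulted here, and the struct has no
 * field through which multifd or mapped-ram could reach this function.
 */
static bool demo_snapshot_save(const struct demo_snapshot_opts *opts)
{
    assert(opts != NULL);
    if (opts->tag == NULL || opts->tag[0] == '\0') {
        return false;  /* a snapshot needs a non-empty tag */
    }
    /* ... a real implementation would write the snapshot here ... */
    return true;
}

/* Migration receives its own settings explicitly as well. */
static bool demo_migrate(const char *uri, const struct demo_migrate_opts *opts)
{
    assert(opts != NULL);
    /* ... a real implementation would act on opts->multifd etc. ... */
    return uri != NULL && uri[0] != '\0';
}
```

With this shape, an incompatible combination is ruled out at the call site rather than rejected by a runtime check.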

> Add a compatibility check routine and reject the snapshot command if
> an incompatible capability is enabled. For now only act on the two
> that actually cause a crash: multifd and mapped-ram.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2818

Issue is 2881 not 2818                                   ^^^^^^^

> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/options.c | 26 ++++++++++++++++++++++++++
>  migration/options.h |  1 +
>  migration/savevm.c  |  8 ++++++++
>  3 files changed, 35 insertions(+)

With regards,
Daniel
Fabiano Rosas March 27, 2025, 3:11 p.m. UTC | #2
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Mar 27, 2025 at 11:39:31AM -0300, Fabiano Rosas wrote:
>> It has always been possible to enable arbitrary migration capabilities
>> and attempt to take a snapshot of the VM with the savevm/loadvm
>> commands as well as their QMP counterparts
>> snapshot-save/snapshot-load.
>> 
>> Most migration capabilities are not meant to be used with snapshots
>> and there's a risk of crashing QEMU or producing incorrect
>> behavior. Ideally, every migration capability would either be
>> implemented for savevm or explicitly rejected.
>
> IMHO, this is a prime example of why migration config shouldn't be held
> as global state, and instead passed as parameters to the commands
> that need them.  The snapshot-save/load commands would then only
> be able to accept what few settings are actually relevant, instead
> of inheriting any/all global migration state.
>

Right, I remember we got caught around the fact that some migration
options are needed during runtime as well... but I don't remember the
details, let me try to find that thread.

>> Add a compatibility check routine and reject the snapshot command if
>> an incompatible capability is enabled. For now only act on the two
>> that actually cause a crash: multifd and mapped-ram.
>> 
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2818
>
> Issue is 2881 not 2818                                   ^^^^^^^
>

It seems that was your own C-t =)

>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/options.c | 26 ++++++++++++++++++++++++++
>>  migration/options.h |  1 +
>>  migration/savevm.c  |  8 ++++++++
>>  3 files changed, 35 insertions(+)
>
> With regards,
> Daniel
Marco Cavenati March 27, 2025, 4:46 p.m. UTC | #3
Hello Fabiano,

First of all, thanks a lot for the quick follow-up to my issue!

I just want to point out that with only mapped-ram enabled (without
multifd) savevm/loadvm do not lead to a crash but just to an error
according to my (few) experiments (on upstream).

Ciao

Marco

On Thursday, March 27, 2025 15:39 CET, Fabiano Rosas <farosas@suse.de> wrote:

> It has always been possible to enable arbitrary migration capabilities
> and attempt to take a snapshot of the VM with the savevm/loadvm
> commands as well as their QMP counterparts
> snapshot-save/snapshot-load.
> 
> Most migration capabilities are not meant to be used with snapshots
> and there's a risk of crashing QEMU or producing incorrect
> behavior. Ideally, every migration capability would either be
> implemented for savevm or explicitly rejected.
> 
> Add a compatibility check routine and reject the snapshot command if
> an incompatible capability is enabled. For now only act on the two
> that actually cause a crash: multifd and mapped-ram.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2881
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/options.c | 26 ++++++++++++++++++++++++++
>  migration/options.h |  1 +
>  migration/savevm.c  |  8 ++++++++
>  3 files changed, 35 insertions(+)
> 
> diff --git a/migration/options.c b/migration/options.c
> index b0ac2ea408..8772b98dca 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -443,11 +443,37 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
>      MIGRATION_CAPABILITY_VALIDATE_UUID,
>      MIGRATION_CAPABILITY_ZERO_COPY_SEND);
>  
> +/* Snapshot compatibility check list */
> +static const
> +INITIALIZE_MIGRATE_CAPS_SET(check_caps_savevm,
> +                            MIGRATION_CAPABILITY_MULTIFD,
> +                            MIGRATION_CAPABILITY_MAPPED_RAM,
> +);
> +
>  static bool migrate_incoming_started(void)
>  {
>      return !!migration_incoming_get_current()->transport_data;
>  }
>  
> +bool migrate_can_snapshot(Error **errp)
> +{
> +    MigrationState *s = migrate_get_current();
> +    int i;
> +
> +    for (i = 0; i < check_caps_savevm.size; i++) {
> +        int incomp_cap = check_caps_savevm.caps[i];
> +
> +        if (s->capabilities[incomp_cap]) {
> +            error_setg(errp,
> +                       "Snapshots are not compatible with %s",
> +                       MigrationCapability_str(incomp_cap));
> +            return false;
> +        }
> +    }
> +
> +    return true;
> +}
> +
>  /**
>   * @migration_caps_check - check capability compatibility
>   *
> diff --git a/migration/options.h b/migration/options.h
> index 762be4e641..20b71b6f2a 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -58,6 +58,7 @@ bool migrate_tls(void);
>  /* capabilities helpers */
>  
>  bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp);
> +bool migrate_can_snapshot(Error **errp);
>  
>  /* parameters */
>  
> diff --git a/migration/savevm.c b/migration/savevm.c
> index ce158c3512..3be13bcfe8 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -3229,6 +3229,10 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
>  
>      GLOBAL_STATE_CODE();
>  
> +    if (!migrate_can_snapshot(errp)) {
> +        return false;
> +    }
> +
>      if (migration_is_blocked(errp)) {
>          return false;
>      }
> @@ -3413,6 +3417,10 @@ bool load_snapshot(const char *name, const char *vmstate,
>      int ret;
>      MigrationIncomingState *mis = migration_incoming_get_current();
>  
> +    if (!migrate_can_snapshot(errp)) {
> +        return false;
> +    }
> +
>      if (!bdrv_all_can_snapshot(has_devices, devices, errp)) {
>          return false;
>      }
> -- 
> 2.35.3
>
Fabiano Rosas March 27, 2025, 5:02 p.m. UTC | #4
"Marco Cavenati" <Marco.Cavenati@eurecom.fr> writes:

> Hello Fabiano,
>
> First of all thanks a lot for the quick follow up to my issue!
>
> I just want to point out that with only mapped-ram enabled (without
> multifd) savevm/loadvm do not lead to a crash but just to an error
> according to my (few) experiments (on upstream).
>

Yes, absolutely. I used imprecise language. Thanks for the correction,
I'll explain it better in the following versions.

Patch

diff --git a/migration/options.c b/migration/options.c
index b0ac2ea408..8772b98dca 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -443,11 +443,37 @@  INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
     MIGRATION_CAPABILITY_VALIDATE_UUID,
     MIGRATION_CAPABILITY_ZERO_COPY_SEND);
 
+/* Snapshot compatibility check list */
+static const
+INITIALIZE_MIGRATE_CAPS_SET(check_caps_savevm,
+                            MIGRATION_CAPABILITY_MULTIFD,
+                            MIGRATION_CAPABILITY_MAPPED_RAM,
+);
+
 static bool migrate_incoming_started(void)
 {
     return !!migration_incoming_get_current()->transport_data;
 }
 
+bool migrate_can_snapshot(Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+    int i;
+
+    for (i = 0; i < check_caps_savevm.size; i++) {
+        int incomp_cap = check_caps_savevm.caps[i];
+
+        if (s->capabilities[incomp_cap]) {
+            error_setg(errp,
+                       "Snapshots are not compatible with %s",
+                       MigrationCapability_str(incomp_cap));
+            return false;
+        }
+    }
+
+    return true;
+}
+
 /**
  * @migration_caps_check - check capability compatibility
  *
diff --git a/migration/options.h b/migration/options.h
index 762be4e641..20b71b6f2a 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -58,6 +58,7 @@  bool migrate_tls(void);
 /* capabilities helpers */
 
 bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp);
+bool migrate_can_snapshot(Error **errp);
 
 /* parameters */
 
diff --git a/migration/savevm.c b/migration/savevm.c
index ce158c3512..3be13bcfe8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3229,6 +3229,10 @@  bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
 
     GLOBAL_STATE_CODE();
 
+    if (!migrate_can_snapshot(errp)) {
+        return false;
+    }
+
     if (migration_is_blocked(errp)) {
         return false;
     }
@@ -3413,6 +3417,10 @@  bool load_snapshot(const char *name, const char *vmstate,
     int ret;
     MigrationIncomingState *mis = migration_incoming_get_current();
 
+    if (!migrate_can_snapshot(errp)) {
+        return false;
+    }
+
     if (!bdrv_all_can_snapshot(has_devices, devices, errp)) {
         return false;
     }
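
The blocklist check added above can be modelled outside QEMU roughly as follows. This is a standalone sketch under stated assumptions: the `demo_*` enum, tables, and functions are hypothetical stand-ins for `MigrationCapability`, `check_caps_savevm`, and `migrate_can_snapshot()`, and a plain character buffer replaces QEMU's `Error **errp`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for QEMU's MigrationCapability enum. */
enum demo_cap {
    DEMO_CAP_XBZRLE,
    DEMO_CAP_MULTIFD,
    DEMO_CAP_MAPPED_RAM,
    DEMO_CAP__MAX,
};

static const char *const demo_cap_names[DEMO_CAP__MAX] = {
    [DEMO_CAP_XBZRLE]     = "xbzrle",
    [DEMO_CAP_MULTIFD]    = "multifd",
    [DEMO_CAP_MAPPED_RAM] = "mapped-ram",
};

/* Capabilities that must be disabled before saving or loading a snapshot. */
static const enum demo_cap demo_snapshot_blocklist[] = {
    DEMO_CAP_MULTIFD,
    DEMO_CAP_MAPPED_RAM,
};

/* Return the first enabled blocklisted capability, or -1 if there is none. */
static int demo_first_blocked_cap(const bool enabled[DEMO_CAP__MAX])
{
    size_t n = sizeof(demo_snapshot_blocklist) /
               sizeof(demo_snapshot_blocklist[0]);

    for (size_t i = 0; i < n; i++) {
        enum demo_cap cap = demo_snapshot_blocklist[i];

        assert(cap < DEMO_CAP__MAX);
        if (enabled[cap]) {
            return (int)cap;
        }
    }
    return -1;
}

/*
 * Return true if a snapshot may proceed. Otherwise write a message naming
 * the offending capability into err (when err is non-NULL) and return false.
 */
static bool demo_can_snapshot(const bool enabled[DEMO_CAP__MAX],
                              char *err, size_t err_len)
{
    int cap = demo_first_blocked_cap(enabled);

    if (cap < 0) {
        return true;
    }
    if (err != NULL && err_len > 0) {
        snprintf(err, err_len, "Snapshots are not compatible with %s",
                 demo_cap_names[cap]);
    }
    return false;
}
```

As in the patch, the caller runs the check before doing any work and returns early on failure. Capabilities outside the blocklist (here `DEMO_CAP_XBZRLE`) are not rejected.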