diff mbox

migration: introduce decompress-error-check

Message ID 20180426091519.26934-1-xiaoguangrong@tencent.com (mailing list archive)
State New, archived
Headers show

Commit Message

Xiao Guangrong April 26, 2018, 9:15 a.m. UTC
From: Xiao Guangrong <xiaoguangrong@tencent.com>

QEMU 2.13 enables strict check for compression & decompression to
make the migration more robuster, that depends on the source to fix
the internal design which triggers the unexpected error conditions

To make it work for migrating old version QEMU to 2.13 QEMU, we
introduce this parameter to disable the error check on the
destination

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
---
 hmp.c                 |  8 ++++++++
 migration/migration.c | 19 +++++++++++++++++++
 migration/migration.h |  1 +
 migration/ram.c       |  2 +-
 qapi/migration.json   | 43 +++++++++++++++++++++++++++++++++++++++----
 5 files changed, 68 insertions(+), 5 deletions(-)

Comments

Xiao Guangrong April 26, 2018, 9:19 a.m. UTC | #1
Hi Dave,

On 04/26/2018 05:15 PM, guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> QEMU 2.13 enables strict check for compression & decompression to
> make the migration more robuster, that depends on the source to fix
> the internal design which triggers the unexpected error conditions
> 
> To make it work for migrating old version QEMU to 2.13 QEMU, we
> introduce this parameter to disable the error check on the
> destination

We noticed this issue when evaluated your pull request, could you
please review this fix and pull it to 2.13 release? Sorry, i did
not realize it before. :(
Dr. David Alan Gilbert April 26, 2018, 9:34 a.m. UTC | #2
* guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> QEMU 2.13 enables strict check for compression & decompression to
> make the migration more robuster, that depends on the source to fix
> the internal design which triggers the unexpected error conditions

Hmm that's painful!

> To make it work for migrating old version QEMU to 2.13 QEMU, we
> introduce this parameter to disable the error check on the
> destination
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> ---
>  hmp.c                 |  8 ++++++++
>  migration/migration.c | 19 +++++++++++++++++++
>  migration/migration.h |  1 +
>  migration/ram.c       |  2 +-
>  qapi/migration.json   | 43 +++++++++++++++++++++++++++++++++++++++----
>  5 files changed, 68 insertions(+), 5 deletions(-)
> 
> diff --git a/hmp.c b/hmp.c
> index 898e25f3e1..f0b934368b 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -329,6 +329,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>          monitor_printf(mon, "%s: %u\n",
>              MigrationParameter_str(MIGRATION_PARAMETER_DECOMPRESS_THREADS),
>              params->decompress_threads);
> +        assert(params->has_decompress_error_check);
> +        monitor_printf(mon, "%s: %s\n",
> +            MigrationParameter_str(MIGRATION_PARAMETER_DECOMPRESS_ERROR_CHECK),
> +            params->decompress_error_check ? "on" : "off");

Since it's a bool, this should be a migration-capability rather than a
parameter I think.

Also, we need to make sure it defaults to off for compatibility.

Other than that, I think it's OK.

Dave

>          assert(params->has_cpu_throttle_initial);
>          monitor_printf(mon, "%s: %u\n",
>              MigrationParameter_str(MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL),
> @@ -1593,6 +1597,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>          p->has_decompress_threads = true;
>          visit_type_int(v, param, &p->decompress_threads, &err);
>          break;
> +    case MIGRATION_PARAMETER_DECOMPRESS_ERROR_CHECK:
> +        p->has_decompress_error_check = true;
> +        visit_type_bool(v, param, &p->decompress_error_check, &err);
> +        break;
>      case MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL:
>          p->has_cpu_throttle_initial = true;
>          visit_type_int(v, param, &p->cpu_throttle_initial, &err);
> diff --git a/migration/migration.c b/migration/migration.c
> index 0bdb28e144..1eef084763 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -534,6 +534,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>      params->compress_threads = s->parameters.compress_threads;
>      params->has_decompress_threads = true;
>      params->decompress_threads = s->parameters.decompress_threads;
> +    params->has_decompress_error_check = true;
> +    params->decompress_error_check = s->parameters.decompress_error_check;
>      params->has_cpu_throttle_initial = true;
>      params->cpu_throttle_initial = s->parameters.cpu_throttle_initial;
>      params->has_cpu_throttle_increment = true;
> @@ -917,6 +919,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>          dest->decompress_threads = params->decompress_threads;
>      }
>  
> +    if (params->has_decompress_error_check) {
> +        dest->decompress_error_check = params->decompress_error_check;
> +    }
> +
>      if (params->has_cpu_throttle_initial) {
>          dest->cpu_throttle_initial = params->cpu_throttle_initial;
>      }
> @@ -979,6 +985,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>          s->parameters.decompress_threads = params->decompress_threads;
>      }
>  
> +    if (params->has_decompress_error_check) {
> +        s->parameters.decompress_error_check = params->decompress_error_check;
> +    }
> +
>      if (params->has_cpu_throttle_initial) {
>          s->parameters.cpu_throttle_initial = params->cpu_throttle_initial;
>      }
> @@ -1620,6 +1630,15 @@ int migrate_decompress_threads(void)
>      return s->parameters.decompress_threads;
>  }
>  
> +bool migrate_decompress_error_check(void)
> +{
> +    MigrationState *s;
> +
> +    s = migrate_get_current();
> +
> +    return s->parameters.decompress_error_check;
> +}
> +
>  bool migrate_dirty_bitmaps(void)
>  {
>      MigrationState *s;
> diff --git a/migration/migration.h b/migration/migration.h
> index 7c69598c54..5efbbaf9d8 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -241,6 +241,7 @@ bool migrate_use_compression(void);
>  int migrate_compress_level(void);
>  int migrate_compress_threads(void);
>  int migrate_decompress_threads(void);
> +bool migrate_decompress_error_check(void);
>  bool migrate_use_events(void);
>  bool migrate_postcopy_blocktime(void);
>  
> diff --git a/migration/ram.c b/migration/ram.c
> index 912810c18e..01cc815410 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2581,7 +2581,7 @@ static void *do_data_decompress(void *opaque)
>  
>              ret = qemu_uncompress_data(&param->stream, des, pagesize,
>                                         param->compbuf, len);
> -            if (ret < 0) {
> +            if (ret < 0 && migrate_decompress_error_check()) {
>                  error_report("decompress data failed");
>                  qemu_file_set_error(decomp_file, ret);
>              }
> diff --git a/qapi/migration.json b/qapi/migration.json
> index f3974c6807..68222358e1 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -455,6 +455,17 @@
>  #          compression, so set the decompress-threads to the number about 1/4
>  #          of compress-threads is adequate.
>  #
> +# @decompress-error-check: check decompression errors. When false, the errors
> +#                          triggered by memory decompression are ignored.
> +#                          When true, migration is aborted if the errors are
> +#                          detected. For the old QEMU versions (< 2.13) the
> +#                          internal design will cause decompression to fail
> +#                          so the destination should completely ignore the
> +#                          error conditions, i.e, make it be false if these
> +#                          QEMUs are going to be migrated. Since 2.13, this
> +#                          design is fixed, make it be true to avoid corrupting
> +#                          the VM silently (Since 2.13)
> +#
>  # @cpu-throttle-initial: Initial percentage of time guest cpus are throttled
>  #                        when migration auto-converge is activated. The
>  #                        default value is 20. (Since 2.7)
> @@ -511,10 +522,10 @@
>  ##
>  { 'enum': 'MigrationParameter',
>    'data': ['compress-level', 'compress-threads', 'decompress-threads',
> -           'cpu-throttle-initial', 'cpu-throttle-increment',
> -           'tls-creds', 'tls-hostname', 'max-bandwidth',
> -           'downtime-limit', 'x-checkpoint-delay', 'block-incremental',
> -           'x-multifd-channels', 'x-multifd-page-count',
> +           'decompress-error-check', 'cpu-throttle-initial',
> +           'cpu-throttle-increment', 'tls-creds', 'tls-hostname',
> +           'max-bandwidth', 'downtime-limit', 'x-checkpoint-delay',
> +           'block-incremental', 'x-multifd-channels', 'x-multifd-page-count',
>             'xbzrle-cache-size' ] }
>  
>  ##
> @@ -526,6 +537,17 @@
>  #
>  # @decompress-threads: decompression thread count
>  #
> +# @decompress-error-check: check decompression errors. When false, the errors
> +#                          triggered by memory decompression are ignored.
> +#                          When true, migration is aborted if the errors are
> +#                          detected. For the old QEMU versions (< 2.13) the
> +#                          internal design will cause decompression to fail
> +#                          so the destination should completely ignore the
> +#                          error conditions, i.e, make it be false if these
> +#                          QEMUs are going to be migrated. Since 2.13, this
> +#                          design is fixed, make it be true to avoid corrupting
> +#                          the VM silently (Since 2.13)
> +#
>  # @cpu-throttle-initial: Initial percentage of time guest cpus are
>  #                        throttled when migration auto-converge is activated.
>  #                        The default value is 20. (Since 2.7)
> @@ -591,6 +613,7 @@
>    'data': { '*compress-level': 'int',
>              '*compress-threads': 'int',
>              '*decompress-threads': 'int',
> +            '*decompress-error-check': 'bool',
>              '*cpu-throttle-initial': 'int',
>              '*cpu-throttle-increment': 'int',
>              '*tls-creds': 'StrOrNull',
> @@ -630,6 +653,17 @@
>  #
>  # @decompress-threads: decompression thread count
>  #
> +# @decompress-error-check: check decompression errors. When false, the errors
> +#                          triggered by memory decompression are ignored.
> +#                          When true, migration is aborted if the errors are
> +#                          detected. For the old QEMU versions (< 2.13) the
> +#                          internal design will cause decompression to fail
> +#                          so the destination should completely ignore the
> +#                          error conditions, i.e, make it be false if these
> +#                          QEMUs are going to be migrated. Since 2.13, this
> +#                          design is fixed, make it be true to avoid corrupting
> +#                          the VM silently (Since 2.13)
> +#
>  # @cpu-throttle-initial: Initial percentage of time guest cpus are
>  #                        throttled when migration auto-converge is activated.
>  #                        (Since 2.7)
> @@ -690,6 +724,7 @@
>    'data': { '*compress-level': 'uint8',
>              '*compress-threads': 'uint8',
>              '*decompress-threads': 'uint8',
> +            '*decompress-error-check': 'bool',
>              '*cpu-throttle-initial': 'uint8',
>              '*cpu-throttle-increment': 'uint8',
>              '*tls-creds': 'str',
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Xiao Guangrong April 26, 2018, 1:18 p.m. UTC | #3
On 04/26/2018 05:34 PM, Dr. David Alan Gilbert wrote:
> * guangrong.xiao@gmail.com (guangrong.xiao@gmail.com) wrote:
>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>
>> QEMU 2.13 enables strict check for compression & decompression to
>> make the migration more robuster, that depends on the source to fix
>> the internal design which triggers the unexpected error conditions
> 
> Hmm that's painful!

Yup, i will think it more carefully next time. :(

> 
>> To make it work for migrating old version QEMU to 2.13 QEMU, we
>> introduce this parameter to disable the error check on the
>> destination
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
>> ---
>>   hmp.c                 |  8 ++++++++
>>   migration/migration.c | 19 +++++++++++++++++++
>>   migration/migration.h |  1 +
>>   migration/ram.c       |  2 +-
>>   qapi/migration.json   | 43 +++++++++++++++++++++++++++++++++++++++----
>>   5 files changed, 68 insertions(+), 5 deletions(-)
>>
>> diff --git a/hmp.c b/hmp.c
>> index 898e25f3e1..f0b934368b 100644
>> --- a/hmp.c
>> +++ b/hmp.c
>> @@ -329,6 +329,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>>           monitor_printf(mon, "%s: %u\n",
>>               MigrationParameter_str(MIGRATION_PARAMETER_DECOMPRESS_THREADS),
>>               params->decompress_threads);
>> +        assert(params->has_decompress_error_check);
>> +        monitor_printf(mon, "%s: %s\n",
>> +            MigrationParameter_str(MIGRATION_PARAMETER_DECOMPRESS_ERROR_CHECK),
>> +            params->decompress_error_check ? "on" : "off");
> 
> Since it's a bool, this should be a migration-capability rather than a
> parameter I think.
> 

The parameter, @block-incremental, it is also a bool, i think it is okay to
decompress-error-check as well, and it is one of the configurations
of decompression, it is better to group them. Right? :)

> Also, we need to make sure it defaults to off for compatibility.

Yes, the parameter is allocated by zalloc, it has already been false on
default, do you think we should make it be false explicitly?

> 
> Other than that, I think it's OK.

Thank you, Dave!
Eric Blake April 26, 2018, 2:01 p.m. UTC | #4
On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> QEMU 2.13 enables strict check for compression & decompression to
> make the migration more robuster, that depends on the source to fix

s/robuster/robust/

> the internal design which triggers the unexpected error conditions

2.13 hasn't been released yet.  Why do we need a knob to explicitly turn
off strict checking?  Can we not instead make 2.13 automatically smart
enough to tell if the incoming stream is coming from an older qemu
(which might fail if the strict checks are enabled) vs. a newer qemu
(the sender gave us what we need to ensure the strict checks are
worthwhile)?

> 
> To make it work for migrating old version QEMU to 2.13 QEMU, we
> introduce this parameter to disable the error check on the
> destination
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> ---

> +++ b/qapi/migration.json
> @@ -455,6 +455,17 @@
>  #          compression, so set the decompress-threads to the number about 1/4
>  #          of compress-threads is adequate.
>  #
> +# @decompress-error-check: check decompression errors. When false, the errors
> +#                          triggered by memory decompression are ignored.

What are the consequences of such an error?  Is it a security hole to
leave this at false, when a malicious migration stream can cause us to
misbehave by ignoring the errors?

> +#                          When true, migration is aborted if the errors are
> +#                          detected. For the old QEMU versions (< 2.13) the
> +#                          internal design will cause decompression to fail
> +#                          so the destination should completely ignore the
> +#                          error conditions, i.e, make it be false if these
> +#                          QEMUs are going to be migrated. Since 2.13, this
> +#                          design is fixed, make it be true to avoid corrupting
> +#                          the VM silently (Since 2.13)

Rather wordy; I'd suggest:

@decompress-error-check: Set to true to abort the migration if
        decompression errors are detected at the destination. Should be
        left at false (default) for qemu older than 2.13, since only
        newer qemu sends streams that do not trigger spurious
        decompression errors. (Since 2.13)

But that's if we even need it (it SHOULD be possible to design something
into the migration stream so that you can detect this property
automatically instead of relying on the user to set the property).
Xiao Guangrong April 27, 2018, 3:15 a.m. UTC | #5
On 04/26/2018 10:01 PM, Eric Blake wrote:
> On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote:
>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>
>> QEMU 2.13 enables strict check for compression & decompression to
>> make the migration more robuster, that depends on the source to fix
> 
> s/robuster/robust/
> 

Will fix, thank you for pointing it out.

>> the internal design which triggers the unexpected error conditions
> 
> 2.13 hasn't been released yet.  Why do we need a knob to explicitly turn
> off strict checking?  Can we not instead make 2.13 automatically smart
> enough to tell if the incoming stream is coming from an older qemu
> (which might fail if the strict checks are enabled) vs. a newer qemu
> (the sender gave us what we need to ensure the strict checks are
> worthwhile)?
> 

Really smart.

How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK,
the destination will do strict check if got this command (i.e, new
QEMU is running on the source), otherwise, turn the check off.

>>
>> To make it work for migrating old version QEMU to 2.13 QEMU, we
>> introduce this parameter to disable the error check on the
>> destination
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
>> ---
> 
>> +++ b/qapi/migration.json
>> @@ -455,6 +455,17 @@
>>   #          compression, so set the decompress-threads to the number about 1/4
>>   #          of compress-threads is adequate.
>>   #
>> +# @decompress-error-check: check decompression errors. When false, the errors
>> +#                          triggered by memory decompression are ignored.
> 
> What are the consequences of such an error?  Is it a security hole to
> leave this at false, when a malicious migration stream can cause us to
> misbehave by ignoring the errors?

The issue fixed by strict error check is avoiding VM corruption if zlib failed
to compress & decompress the memory, i.e, catch error conditions returned by
zlib API.

> 
>> +#                          When true, migration is aborted if the errors are
>> +#                          detected. For the old QEMU versions (< 2.13) the
>> +#                          internal design will cause decompression to fail
>> +#                          so the destination should completely ignore the
>> +#                          error conditions, i.e, make it be false if these
>> +#                          QEMUs are going to be migrated. Since 2.13, this
>> +#                          design is fixed, make it be true to avoid corrupting
>> +#                          the VM silently (Since 2.13)
> 
> Rather wordy; I'd suggest:
> 
> @decompress-error-check: Set to true to abort the migration if
>          decompression errors are detected at the destination. Should be
>          left at false (default) for qemu older than 2.13, since only
>          newer qemu sends streams that do not trigger spurious
>          decompression errors. (Since 2.13)
> 

Yup, much better.

> But that's if we even need it (it SHOULD be possible to design something
> into the migration stream so that you can detect this property
> automatically instead of relying on the user to set the property).
> 

Yes, can not agree with you more. :)
Peter Xu April 27, 2018, 9:31 a.m. UTC | #6
On Fri, Apr 27, 2018 at 11:15:37AM +0800, Xiao Guangrong wrote:
> 
> 
> On 04/26/2018 10:01 PM, Eric Blake wrote:
> > On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote:
> > > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > 
> > > QEMU 2.13 enables strict check for compression & decompression to
> > > make the migration more robuster, that depends on the source to fix
> > 
> > s/robuster/robust/
> > 
> 
> Will fix, thank you for pointing it out.
> 
> > > the internal design which triggers the unexpected error conditions
> > 
> > 2.13 hasn't been released yet.  Why do we need a knob to explicitly turn
> > off strict checking?  Can we not instead make 2.13 automatically smart
> > enough to tell if the incoming stream is coming from an older qemu
> > (which might fail if the strict checks are enabled) vs. a newer qemu
> > (the sender gave us what we need to ensure the strict checks are
> > worthwhile)?
> > 
> 
> Really smart.
> 
> How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK,
> the destination will do strict check if got this command (i.e, new
> QEMU is running on the source), otherwise, turn the check off.

Why not we just introduce a compat bit for that?  I mean something
like: 15c3850325 ("migration: move skip_section_footers",
2017-06-28).  Then we turn that check bit off for <=2.12.

Would that work?

(I would suspect that's what Eric mean too)

Regards,
Xiao Guangrong April 27, 2018, 10:40 a.m. UTC | #7
On 04/27/2018 05:31 PM, Peter Xu wrote:
> On Fri, Apr 27, 2018 at 11:15:37AM +0800, Xiao Guangrong wrote:
>>
>>
>> On 04/26/2018 10:01 PM, Eric Blake wrote:
>>> On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote:
>>>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>>>
>>>> QEMU 2.13 enables strict check for compression & decompression to
>>>> make the migration more robuster, that depends on the source to fix
>>>
>>> s/robuster/robust/
>>>
>>
>> Will fix, thank you for pointing it out.
>>
>>>> the internal design which triggers the unexpected error conditions
>>>
>>> 2.13 hasn't been released yet.  Why do we need a knob to explicitly turn
>>> off strict checking?  Can we not instead make 2.13 automatically smart
>>> enough to tell if the incoming stream is coming from an older qemu
>>> (which might fail if the strict checks are enabled) vs. a newer qemu
>>> (the sender gave us what we need to ensure the strict checks are
>>> worthwhile)?
>>>
>>
>> Really smart.
>>
>> How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK,
>> the destination will do strict check if got this command (i.e, new
>> QEMU is running on the source), otherwise, turn the check off.
> 
> Why not we just introduce a compat bit for that?  I mean something
> like: 15c3850325 ("migration: move skip_section_footers",
> 2017-06-28).  Then we turn that check bit off for <=2.12.
> 
> Would that work?

I am afraid it can not. :(

The compat bit only impacts local behavior, however, in this case, we
need the source QEMU to tell the destination if it supports strict
error check.
Dr. David Alan Gilbert April 27, 2018, 11:29 a.m. UTC | #8
* Xiao Guangrong (guangrong.xiao@gmail.com) wrote:
> 
> 
> On 04/26/2018 10:01 PM, Eric Blake wrote:
> > On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote:
> > > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > 
> > > QEMU 2.13 enables strict check for compression & decompression to
> > > make the migration more robuster, that depends on the source to fix
> > 
> > s/robuster/robust/
> > 
> 
> Will fix, thank you for pointing it out.
> 
> > > the internal design which triggers the unexpected error conditions
> > 
> > 2.13 hasn't been released yet.  Why do we need a knob to explicitly turn
> > off strict checking?  Can we not instead make 2.13 automatically smart
> > enough to tell if the incoming stream is coming from an older qemu
> > (which might fail if the strict checks are enabled) vs. a newer qemu
> > (the sender gave us what we need to ensure the strict checks are
> > worthwhile)?
> > 
> 
> Really smart.
> 
> How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK,
> the destination will do strict check if got this command (i.e, new
> QEMU is running on the source), otherwise, turn the check off.

The problem is that if it's received by an old qemu it will fail,
so that breaks backwards migration which is unfortunate.

> > > 
> > > To make it work for migrating old version QEMU to 2.13 QEMU, we
> > > introduce this parameter to disable the error check on the
> > > destination
> > > 
> > > Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > ---
> > 
> > > +++ b/qapi/migration.json
> > > @@ -455,6 +455,17 @@
> > >   #          compression, so set the decompress-threads to the number about 1/4
> > >   #          of compress-threads is adequate.
> > >   #
> > > +# @decompress-error-check: check decompression errors. When false, the errors
> > > +#                          triggered by memory decompression are ignored.
> > 
> > What are the consequences of such an error?  Is it a security hole to
> > leave this at false, when a malicious migration stream can cause us to
> > misbehave by ignoring the errors?
> 
> The issue fixed by strict error check is avoiding VM corruption if zlib failed
> to compress & decompress the memory, i.e, catch error conditions returned by
> zlib API.
> 
> > 
> > > +#                          When true, migration is aborted if the errors are
> > > +#                          detected. For the old QEMU versions (< 2.13) the
> > > +#                          internal design will cause decompression to fail
> > > +#                          so the destination should completely ignore the
> > > +#                          error conditions, i.e, make it be false if these
> > > +#                          QEMUs are going to be migrated. Since 2.13, this
> > > +#                          design is fixed, make it be true to avoid corrupting
> > > +#                          the VM silently (Since 2.13)
> > 
> > Rather wordy; I'd suggest:
> > 
> > @decompress-error-check: Set to true to abort the migration if
> >          decompression errors are detected at the destination. Should be
> >          left at false (default) for qemu older than 2.13, since only
> >          newer qemu sends streams that do not trigger spurious
> >          decompression errors. (Since 2.13)
> > 
> 
> Yup, much better.
> 
> > But that's if we even need it (it SHOULD be possible to design something
> > into the migration stream so that you can detect this property
> > automatically instead of relying on the user to set the property).
> > 
> 
> Yes, can not agree with you more. :)

The challenge is how to put something into the stream without breaking
an old version of QEMU that's receiving the stream.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Xiao Guangrong April 28, 2018, 6:13 a.m. UTC | #9
On 04/27/2018 07:29 PM, Dr. David Alan Gilbert wrote:

>>
>> Yes, can not agree with you more. :)
> 
> The challenge is how to put something into the stream without breaking
> an old version of QEMU that's receiving the stream.
> 

Er, i did not think this case :(.

The new parameter as this patch did is the only idea now in my mind...
not sure if Eric and Peter have another solution.
Peter Xu May 2, 2018, 3:03 a.m. UTC | #10
On Fri, Apr 27, 2018 at 06:40:09PM +0800, Xiao Guangrong wrote:
> 
> 
> On 04/27/2018 05:31 PM, Peter Xu wrote:
> > On Fri, Apr 27, 2018 at 11:15:37AM +0800, Xiao Guangrong wrote:
> > > 
> > > 
> > > On 04/26/2018 10:01 PM, Eric Blake wrote:
> > > > On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote:
> > > > > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > > > 
> > > > > QEMU 2.13 enables strict check for compression & decompression to
> > > > > make the migration more robuster, that depends on the source to fix
> > > > 
> > > > s/robuster/robust/
> > > > 
> > > 
> > > Will fix, thank you for pointing it out.
> > > 
> > > > > the internal design which triggers the unexpected error conditions
> > > > 
> > > > 2.13 hasn't been released yet.  Why do we need a knob to explicitly turn
> > > > off strict checking?  Can we not instead make 2.13 automatically smart
> > > > enough to tell if the incoming stream is coming from an older qemu
> > > > (which might fail if the strict checks are enabled) vs. a newer qemu
> > > > (the sender gave us what we need to ensure the strict checks are
> > > > worthwhile)?
> > > > 
> > > 
> > > Really smart.
> > > 
> > > How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK,
> > > the destination will do strict check if got this command (i.e, new
> > > QEMU is running on the source), otherwise, turn the check off.
> > 
> > Why not we just introduce a compat bit for that?  I mean something
> > like: 15c3850325 ("migration: move skip_section_footers",
> > 2017-06-28).  Then we turn that check bit off for <=2.12.
> > 
> > Would that work?
> 
> I am afraid it can not. :(
> 
> The compat bit only impacts local behavior, however, in this case, we
> need the source QEMU to tell the destination if it supports strict
> error check.

My understanding is that the new compat bit will only take effect when
at destination.

I'm not sure I'm thinking that correctly. I'll give some examples.

When we migrate from <2.12 to 2.13, on 2.13 QEMU we'll possibly with
(using q35 as example, always) "-M pc-q35-2.12" to make the migration
work, so this will let the destination QEMU stop checking
decompressing errors.  IMHO that's what we want so it's fine (forward
migration).

When we migrate from 2.13 to <2.12, on 2.12 it'll always skip checking
decompression errors, so it's fine too even if we don't send some
compress-errored pages.

Then, would this mean that the compat bit could work too just like
this patch?  AFAIU the compat bit idea is very similar to current
patch, however we don't really need a new parameter to make things
complicated, we just let old QEMUs behave differently and
automatically, then user won't need to worry about manually specify
that parameter.

Thanks,
Dr. David Alan Gilbert May 2, 2018, 2:57 p.m. UTC | #11
* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Apr 27, 2018 at 06:40:09PM +0800, Xiao Guangrong wrote:
> > 
> > 
> > On 04/27/2018 05:31 PM, Peter Xu wrote:
> > > On Fri, Apr 27, 2018 at 11:15:37AM +0800, Xiao Guangrong wrote:
> > > > 
> > > > 
> > > > On 04/26/2018 10:01 PM, Eric Blake wrote:
> > > > > On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote:
> > > > > > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > > > > 
> > > > > > QEMU 2.13 enables strict check for compression & decompression to
> > > > > > make the migration more robuster, that depends on the source to fix
> > > > > 
> > > > > s/robuster/robust/
> > > > > 
> > > > 
> > > > Will fix, thank you for pointing it out.
> > > > 
> > > > > > the internal design which triggers the unexpected error conditions
> > > > > 
> > > > > 2.13 hasn't been released yet.  Why do we need a knob to explicitly turn
> > > > > off strict checking?  Can we not instead make 2.13 automatically smart
> > > > > enough to tell if the incoming stream is coming from an older qemu
> > > > > (which might fail if the strict checks are enabled) vs. a newer qemu
> > > > > (the sender gave us what we need to ensure the strict checks are
> > > > > worthwhile)?
> > > > > 
> > > > 
> > > > Really smart.
> > > > 
> > > > How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK,
> > > > the destination will do strict check if got this command (i.e, new
> > > > QEMU is running on the source), otherwise, turn the check off.
> > > 
> > > Why not we just introduce a compat bit for that?  I mean something
> > > like: 15c3850325 ("migration: move skip_section_footers",
> > > 2017-06-28).  Then we turn that check bit off for <=2.12.
> > > 
> > > Would that work?
> > 
> > I am afraid it can not. :(
> > 
> > The compat bit only impacts local behavior, however, in this case, we
> > need the source QEMU to tell the destination if it supports strict
> > error check.
> 
> My understanding is that the new compat bit will only take effect when
> at destination.
> 
> I'm not sure I'm thinking that correctly. I'll give some examples.
> 
> When we migrate from <2.12 to 2.13, on 2.13 QEMU we'll possibly with
> (using q35 as example, always) "-M pc-q35-2.12" to make the migration
> work, so this will let the destination QEMU stop checking
> decompressing errors.  IMHO that's what we want so it's fine (forward
> migration).
> 
> When we migrate from 2.13 to <2.12, on 2.12 it'll always skip checking
> decompression errors, so it's fine too even if we don't send some
> compress-errored pages.
> 
> Then, would this mean that the compat bit could work too just like
> this patch?  AFAIU the compat bit idea is very similar to current
> patch, however we don't really need a new parameter to make things
> complicated, we just let old QEMUs behave differently and
> automatically, then user won't need to worry about manually specify
> that parameter.

I think you're saying just to wire it to the machine type for receive;
that would work and would be fairly simple, although wouldn't provide
the protection when going from new->new using an old machine type.

Dave

> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Xu May 3, 2018, 2:10 a.m. UTC | #12
On Wed, May 02, 2018 at 03:57:13PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Fri, Apr 27, 2018 at 06:40:09PM +0800, Xiao Guangrong wrote:
> > > 
> > > 
> > > On 04/27/2018 05:31 PM, Peter Xu wrote:
> > > > On Fri, Apr 27, 2018 at 11:15:37AM +0800, Xiao Guangrong wrote:
> > > > > 
> > > > > 
> > > > > On 04/26/2018 10:01 PM, Eric Blake wrote:
> > > > > > On 04/26/2018 04:15 AM, guangrong.xiao@gmail.com wrote:
> > > > > > > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > > > > > 
> > > > > > > QEMU 2.13 enables strict check for compression & decompression to
> > > > > > > make the migration more robuster, that depends on the source to fix
> > > > > > 
> > > > > > s/robuster/robust/
> > > > > > 
> > > > > 
> > > > > Will fix, thank you for pointing it out.
> > > > > 
> > > > > > > the internal design which triggers the unexpected error conditions
> > > > > > 
> > > > > > 2.13 hasn't been released yet.  Why do we need a knob to explicitly turn
> > > > > > off strict checking?  Can we not instead make 2.13 automatically smart
> > > > > > enough to tell if the incoming stream is coming from an older qemu
> > > > > > (which might fail if the strict checks are enabled) vs. a newer qemu
> > > > > > (the sender gave us what we need to ensure the strict checks are
> > > > > > worthwhile)?
> > > > > > 
> > > > > 
> > > > > Really smart.
> > > > > 
> > > > > How about introduce a new command, MIG_CMD_DECOMPRESS_ERR_CHECK,
> > > > > the destination will do strict check if got this command (i.e, new
> > > > > QEMU is running on the source), otherwise, turn the check off.
> > > > 
> > > > Why not we just introduce a compat bit for that?  I mean something
> > > > like: 15c3850325 ("migration: move skip_section_footers",
> > > > 2017-06-28).  Then we turn that check bit off for <=2.12.
> > > > 
> > > > Would that work?
> > > 
> > > I am afraid it can not. :(
> > > 
> > > The compat bit only impacts local behavior, however, in this case, we
> > > need the source QEMU to tell the destination if it supports strict
> > > error check.
> > 
> > My understanding is that the new compat bit will only take effect when
> > at destination.
> > 
> > I'm not sure I'm thinking that correctly. I'll give some examples.
> > 
> > When we migrate from <2.12 to 2.13, on 2.13 QEMU we'll possibly with
> > (using q35 as example, always) "-M pc-q35-2.12" to make the migration
> > work, so this will let the destination QEMU stop checking
> > decompressing errors.  IMHO that's what we want so it's fine (forward
> > migration).
> > 
> > When we migrate from 2.13 to <2.12, on 2.12 it'll always skip checking
> > decompression errors, so it's fine too even if we don't send some
> > compress-errored pages.
> > 
> > Then, would this mean that the compat bit could work too just like
> > this patch?  AFAIU the compat bit idea is very similar to current
> > patch, however we don't really need a new parameter to make things
> > complicated, we just let old QEMUs behave differently and
> > automatically, then user won't need to worry about manually specify
> > that parameter.
> 
> I think you're saying just to wire it to the machine type for receive;
> that would work and would be fairly simple, although wouldn't provide
> the protection when going from new->new using an old machine type.

Yes.  But actually we can still leverage the protection even with
new->new and old machine types - we just need to explicitly override
that parameter on both sides (instead of explicitly disalbe that on
old ones):

  -M pc-q35-2.12 -global migration.x-error-decompress-check=true

After all the user already specified "-M pc-q35-2.12" explicitly
rather than using the default 2.13 one, I would consider he/she an
advanced user.  Then IMHO it would be acceptable to make this explicit
too when the user really wants that.

(Will that happen a lot when people still use old machine types even
 if they are creating new VMs?)
diff mbox

Patch

diff --git a/hmp.c b/hmp.c
index 898e25f3e1..f0b934368b 100644
--- a/hmp.c
+++ b/hmp.c
@@ -329,6 +329,10 @@  void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "%s: %u\n",
             MigrationParameter_str(MIGRATION_PARAMETER_DECOMPRESS_THREADS),
             params->decompress_threads);
+        assert(params->has_decompress_error_check);
+        monitor_printf(mon, "%s: %s\n",
+            MigrationParameter_str(MIGRATION_PARAMETER_DECOMPRESS_ERROR_CHECK),
+            params->decompress_error_check ? "on" : "off");
         assert(params->has_cpu_throttle_initial);
         monitor_printf(mon, "%s: %u\n",
             MigrationParameter_str(MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL),
@@ -1593,6 +1597,10 @@  void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_decompress_threads = true;
         visit_type_int(v, param, &p->decompress_threads, &err);
         break;
+    case MIGRATION_PARAMETER_DECOMPRESS_ERROR_CHECK:
+        p->has_decompress_error_check = true;
+        visit_type_bool(v, param, &p->decompress_error_check, &err);
+        break;
     case MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL:
         p->has_cpu_throttle_initial = true;
         visit_type_int(v, param, &p->cpu_throttle_initial, &err);
diff --git a/migration/migration.c b/migration/migration.c
index 0bdb28e144..1eef084763 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -534,6 +534,8 @@  MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->compress_threads = s->parameters.compress_threads;
     params->has_decompress_threads = true;
     params->decompress_threads = s->parameters.decompress_threads;
+    params->has_decompress_error_check = true;
+    params->decompress_error_check = s->parameters.decompress_error_check;
     params->has_cpu_throttle_initial = true;
     params->cpu_throttle_initial = s->parameters.cpu_throttle_initial;
     params->has_cpu_throttle_increment = true;
@@ -917,6 +919,10 @@  static void migrate_params_test_apply(MigrateSetParameters *params,
         dest->decompress_threads = params->decompress_threads;
     }
 
+    if (params->has_decompress_error_check) {
+        dest->decompress_error_check = params->decompress_error_check;
+    }
+
     if (params->has_cpu_throttle_initial) {
         dest->cpu_throttle_initial = params->cpu_throttle_initial;
     }
@@ -979,6 +985,10 @@  static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
         s->parameters.decompress_threads = params->decompress_threads;
     }
 
+    if (params->has_decompress_error_check) {
+        s->parameters.decompress_error_check = params->decompress_error_check;
+    }
+
     if (params->has_cpu_throttle_initial) {
         s->parameters.cpu_throttle_initial = params->cpu_throttle_initial;
     }
@@ -1620,6 +1630,15 @@  int migrate_decompress_threads(void)
     return s->parameters.decompress_threads;
 }
 
+bool migrate_decompress_error_check(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->parameters.decompress_error_check;
+}
+
 bool migrate_dirty_bitmaps(void)
 {
     MigrationState *s;
diff --git a/migration/migration.h b/migration/migration.h
index 7c69598c54..5efbbaf9d8 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -241,6 +241,7 @@  bool migrate_use_compression(void);
 int migrate_compress_level(void);
 int migrate_compress_threads(void);
 int migrate_decompress_threads(void);
+bool migrate_decompress_error_check(void);
 bool migrate_use_events(void);
 bool migrate_postcopy_blocktime(void);
 
diff --git a/migration/ram.c b/migration/ram.c
index 912810c18e..01cc815410 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2581,7 +2581,7 @@  static void *do_data_decompress(void *opaque)
 
             ret = qemu_uncompress_data(&param->stream, des, pagesize,
                                        param->compbuf, len);
-            if (ret < 0) {
+            if (ret < 0 && migrate_decompress_error_check()) {
                 error_report("decompress data failed");
                 qemu_file_set_error(decomp_file, ret);
             }
diff --git a/qapi/migration.json b/qapi/migration.json
index f3974c6807..68222358e1 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -455,6 +455,17 @@ 
 #          compression, so set the decompress-threads to the number about 1/4
 #          of compress-threads is adequate.
 #
+# @decompress-error-check: check decompression errors. When false, the errors
+#                          triggered by memory decompression are ignored.
+#                          When true, migration is aborted if the errors are
+#                          detected. For the old QEMU versions (< 2.13) the
+#                          internal design will cause decompression to fail
+#                          so the destination should completely ignore the
+#                          error conditions, i.e, make it be false if these
+#                          QEMUs are going to be migrated. Since 2.13, this
+#                          design is fixed, make it be true to avoid corrupting
+#                          the VM silently (Since 2.13)
+#
 # @cpu-throttle-initial: Initial percentage of time guest cpus are throttled
 #                        when migration auto-converge is activated. The
 #                        default value is 20. (Since 2.7)
@@ -511,10 +522,10 @@ 
 ##
 { 'enum': 'MigrationParameter',
   'data': ['compress-level', 'compress-threads', 'decompress-threads',
-           'cpu-throttle-initial', 'cpu-throttle-increment',
-           'tls-creds', 'tls-hostname', 'max-bandwidth',
-           'downtime-limit', 'x-checkpoint-delay', 'block-incremental',
-           'x-multifd-channels', 'x-multifd-page-count',
+           'decompress-error-check', 'cpu-throttle-initial',
+           'cpu-throttle-increment', 'tls-creds', 'tls-hostname',
+           'max-bandwidth', 'downtime-limit', 'x-checkpoint-delay',
+           'block-incremental', 'x-multifd-channels', 'x-multifd-page-count',
            'xbzrle-cache-size' ] }
 
 ##
@@ -526,6 +537,17 @@ 
 #
 # @decompress-threads: decompression thread count
 #
+# @decompress-error-check: check decompression errors. When false, the errors
+#                          triggered by memory decompression are ignored.
+#                          When true, migration is aborted if the errors are
+#                          detected. For the old QEMU versions (< 2.13) the
+#                          internal design will cause decompression to fail
+#                          so the destination should completely ignore the
+#                          error conditions, i.e, make it be false if these
+#                          QEMUs are going to be migrated. Since 2.13, this
+#                          design is fixed, make it be true to avoid corrupting
+#                          the VM silently (Since 2.13)
+#
 # @cpu-throttle-initial: Initial percentage of time guest cpus are
 #                        throttled when migration auto-converge is activated.
 #                        The default value is 20. (Since 2.7)
@@ -591,6 +613,7 @@ 
   'data': { '*compress-level': 'int',
             '*compress-threads': 'int',
             '*decompress-threads': 'int',
+            '*decompress-error-check': 'bool',
             '*cpu-throttle-initial': 'int',
             '*cpu-throttle-increment': 'int',
             '*tls-creds': 'StrOrNull',
@@ -630,6 +653,17 @@ 
 #
 # @decompress-threads: decompression thread count
 #
+# @decompress-error-check: check decompression errors. When false, the errors
+#                          triggered by memory decompression are ignored.
+#                          When true, migration is aborted if the errors are
+#                          detected. For the old QEMU versions (< 2.13) the
+#                          internal design will cause decompression to fail
+#                          so the destination should completely ignore the
+#                          error conditions, i.e, make it be false if these
+#                          QEMUs are going to be migrated. Since 2.13, this
+#                          design is fixed, make it be true to avoid corrupting
+#                          the VM silently (Since 2.13)
+#
 # @cpu-throttle-initial: Initial percentage of time guest cpus are
 #                        throttled when migration auto-converge is activated.
 #                        (Since 2.7)
@@ -690,6 +724,7 @@ 
   'data': { '*compress-level': 'uint8',
             '*compress-threads': 'uint8',
             '*decompress-threads': 'uint8',
+            '*decompress-error-check': 'bool',
             '*cpu-throttle-initial': 'uint8',
             '*cpu-throttle-increment': 'uint8',
             '*tls-creds': 'str',