[v3,10/18] vmstate: Use new JSON output visitor

Message ID	1461903820-3092-11-git-send-email-eblake@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org> From: Eric Blake <eblake@redhat.com> To: qemu-devel@nongnu.org Date: Thu, 28 Apr 2016 22:23:31 -0600 Message-Id: <1461903820-3092-11-git-send-email-eblake@redhat.com> In-Reply-To: <1461903820-3092-1-git-send-email-eblake@redhat.com> References: <1461903820-3092-1-git-send-email-eblake@redhat.com> Subject: [Qemu-devel] [PATCH v3 10/18] vmstate: Use new JSON output visitor Precedence: list Cc: Amit Shah <amit.shah@redhat.com>, famz@redhat.com, armbru@redhat.com, Juan Quintela <quintela@redhat.com> Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Eric Blake April 29, 2016, 4:23 a.m. UTC

Rather than using a QJSON object and converting the QString result
to a char *, we can use the new JSON output visitor and get directly
to a char *.

The conversions are a bit tricky in place (in places, we have to
copy an integer to an int64_t temporary to get the right pointer for
visit_type_int(); and for several strings, we have to copy to a
temporary variable so we can take an address (&char[] is not the
same as &char*) and cast away const), but overall still fairly
mechanical.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

---
v3: retitle, rebase to master
v2: new patch
---
 include/migration/vmstate.h |  4 +--
 migration/savevm.c          | 68 ++++++++++++++++++++++++++++-----------------
 migration/vmstate.c         | 64 ++++++++++++++++++++++++++----------------
 tests/Makefile              |  2 +-
 4 files changed, 86 insertions(+), 52 deletions(-)

Markus Armbruster May 2, 2016, 1:26 p.m. UTC | #1

Eric Blake <eblake@redhat.com> writes:

> Rather than using a QJSON object and converting the QString result
> to a char *, we can use the new JSON output visitor and get directly
> to a char *.
>
> The conversions are a bit tricky in place (in places, we have to
> copy an integer to an int64_t temporary to get the right pointer for
> visit_type_int(); and for several strings, we have to copy to a
> temporary variable so we can take an address (&char[] is not the
> same as &char*) and cast away const), but overall still fairly
> mechanical.
>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>
> ---
> v3: retitle, rebase to master
> v2: new patch
> ---
>  include/migration/vmstate.h |  4 +--
>  migration/savevm.c          | 68 ++++++++++++++++++++++++++++-----------------
>  migration/vmstate.c         | 64 ++++++++++++++++++++++++++----------------
>  tests/Makefile              |  2 +-
>  4 files changed, 86 insertions(+), 52 deletions(-)
>
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 84ee355..2cdfce9 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -29,7 +29,7 @@
>  #ifndef CONFIG_USER_ONLY
>  #include <migration/qemu-file.h>
>  #endif
> -#include <qjson.h>
> +#include "qapi/json-output-visitor.h"
>
>  typedef void SaveStateHandler(QEMUFile *f, void *opaque);
>  typedef int LoadStateHandler(QEMUFile *f, void *opaque, int version_id);
> @@ -925,7 +925,7 @@ void loadvm_free_handlers(MigrationIncomingState *mis);
>  int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
>                         void *opaque, int version_id);
>  void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
> -                        void *opaque, QJSON *vmdesc);
> +                        void *opaque, Visitor *vmdesc);
>
>  bool vmstate_save_needed(const VMStateDescription *vmsd, void *opaque);
>
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 16ba443..c858fe9 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2,7 +2,7 @@
>   * QEMU System Emulator
>   *
>   * Copyright (c) 2003-2008 Fabrice Bellard
> - * Copyright (c) 2009-2015 Red Hat Inc
> + * Copyright (c) 2009-2016 Red Hat Inc
>   *
>   * Authors:
>   *  Juan Quintela <quintela@redhat.com>
> @@ -645,27 +645,32 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry *se, int version_id)
>      return vmstate_load_state(f, se->vmsd, se->opaque, version_id);
>  }
>
> -static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se, QJSON *vmdesc)
> +static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
> +                                   Visitor *vmdesc)
>  {
>      int64_t old_offset, size;
> +    const char *tmp;
>
>      old_offset = qemu_ftell_fast(f);
>      se->ops->save_state(f, se->opaque);
>      size = qemu_ftell_fast(f) - old_offset;
>
>      if (vmdesc) {

Conditionals could be avoided: use a null visitor.  Not sure it's worth
it, though.

> -        json_prop_int(vmdesc, "size", size);
> -        json_start_array(vmdesc, "fields");
> -        json_start_object(vmdesc, NULL);
> -        json_prop_str(vmdesc, "name", "data");
> -        json_prop_int(vmdesc, "size", size);
> -        json_prop_str(vmdesc, "type", "buffer");
> -        json_end_object(vmdesc);
> -        json_end_array(vmdesc);
> +        visit_type_int(vmdesc, "size", &size, &error_abort);
> +        visit_start_list(vmdesc, "fields", NULL, 0, &error_abort);
> +        visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
> +        tmp = "data";
> +        visit_type_str(vmdesc, "name", (char **)&tmp, &error_abort);

The Visitor interface is the same for input and for output.  Convenient
when the code is direction-agnostic.  Inconvenient when it's output: you
have to pass the value by reference even though it's only read.  In
particular, literals need a temporary, and types have to be adjusted via
cast or temporary more frequently than for by-value.

If that bothers us, we can add by-value wrappers to the interface.

Are there other output-only visitor uses?

> +        visit_type_int(vmdesc, "size", &size, &error_abort);
> +        tmp = "buffer";
> +        visit_type_str(vmdesc, "type", (char **)&tmp, &error_abort);
> +        visit_check_struct(vmdesc, &error_abort);
> +        visit_end_struct(vmdesc);
> +        visit_end_list(vmdesc);
>      }
>  }
>
> -static void vmstate_save(QEMUFile *f, SaveStateEntry *se, QJSON *vmdesc)
> +static void vmstate_save(QEMUFile *f, SaveStateEntry *se, Visitor *vmdesc)
>  {
>      trace_vmstate_save(se->idstr, se->vmsd ? se->vmsd->name : "(old)");
>      if (!se->vmsd) {
> @@ -891,7 +896,7 @@ void qemu_savevm_state_header(QEMUFile *f)
>
>      if (!savevm_state.skip_configuration || enforce_config_section()) {
>          qemu_put_byte(f, QEMU_VM_CONFIGURATION);
> -        vmstate_save_state(f, &vmstate_configuration, &savevm_state, 0);
> +        vmstate_save_state(f, &vmstate_configuration, &savevm_state, NULL);

Cleans up use of 0 as pointer literal while there, good.

Note to self: use Coccinelle to find this style bug's buddies.

>      }
>
>  }
[...]

Well, it doesn't exactly make this code prettier, but having a stupid
wrapper just to hide the ugliness isn't so hot, either.

Eric Blake May 2, 2016, 2:23 p.m. UTC | #2

On 05/02/2016 07:26 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> Rather than using a QJSON object and converting the QString result
>> to a char *, we can use the new JSON output visitor and get directly
>> to a char *.
>>
>> The conversions are a bit tricky in place (in places, we have to
>> copy an integer to an int64_t temporary to get the right pointer for
>> visit_type_int(); and for several strings, we have to copy to a
>> temporary variable so we can take an address (&char[] is not the
>> same as &char*) and cast away const), but overall still fairly
>> mechanical.
>>
>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>

>> -static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se, QJSON *vmdesc)
>> +static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
>> +                                   Visitor *vmdesc)
>>  {
>>      int64_t old_offset, size;
>> +    const char *tmp;
>>
>>      old_offset = qemu_ftell_fast(f);
>>      se->ops->save_state(f, se->opaque);
>>      size = qemu_ftell_fast(f) - old_offset;
>>
>>      if (vmdesc) {
> 
> Conditionals could be avoided: use a null visitor.  Not sure it's worth
> it, though.

We could just teach qapi-visit-core.c to be a no-op for v==NULL (thus
hiding the conditionals in the core code, but that then slows down the
common case for more conditionals on every caller.  Maybe a null visitor
is reasonable, after all?

>> +        tmp = "data";
>> +        visit_type_str(vmdesc, "name", (char **)&tmp, &error_abort);
> 
> The Visitor interface is the same for input and for output.  Convenient
> when the code is direction-agnostic.  Inconvenient when it's output: you
> have to pass the value by reference even though it's only read.  In
> particular, literals need a temporary, and types have to be adjusted via
> cast or temporary more frequently than for by-value.
> 
> If that bothers us, we can add by-value wrappers to the interface.
> 
> Are there other output-only visitor uses?

qom-get is output-only, just as qom-set is input-only.  Maybe it's worth
an experiment to see how difficult it would be.


> Well, it doesn't exactly make this code prettier, but having a stupid
> wrapper just to hide the ugliness isn't so hot, either.

And now you see why I posted two alternatives, to see which way we want
to go.  Having convenient wrappers for output-only visits may swing the
vote.

Markus Armbruster May 3, 2016, 8:30 a.m. UTC | #3

Eric Blake <eblake@redhat.com> writes:

> On 05/02/2016 07:26 AM, Markus Armbruster wrote:
>> Eric Blake <eblake@redhat.com> writes:
>> 
>>> Rather than using a QJSON object and converting the QString result
>>> to a char *, we can use the new JSON output visitor and get directly
>>> to a char *.
>>>
>>> The conversions are a bit tricky in place (in places, we have to
>>> copy an integer to an int64_t temporary to get the right pointer for
>>> visit_type_int(); and for several strings, we have to copy to a
>>> temporary variable so we can take an address (&char[] is not the
>>> same as &char*) and cast away const), but overall still fairly
>>> mechanical.
>>>
>>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>>
>
>>> -static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se, QJSON *vmdesc)
>>> +static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
>>> +                                   Visitor *vmdesc)
>>>  {
>>>      int64_t old_offset, size;
>>> +    const char *tmp;
>>>
>>>      old_offset = qemu_ftell_fast(f);
>>>      se->ops->save_state(f, se->opaque);
>>>      size = qemu_ftell_fast(f) - old_offset;
>>>
>>>      if (vmdesc) {
>> 
>> Conditionals could be avoided: use a null visitor.  Not sure it's worth
>> it, though.
>
> We could just teach qapi-visit-core.c to be a no-op for v==NULL (thus
> hiding the conditionals in the core code, but that then slows down the
> common case for more conditionals on every caller.  Maybe a null visitor
> is reasonable, after all?

I'd prefer a null visitor to messing up the core with conditionals.

Again: not sure avoiding the conditionals here is worth a null visitor.
If you want to find out, cook up a patch and we'll see.

>>> +        tmp = "data";
>>> +        visit_type_str(vmdesc, "name", (char **)&tmp, &error_abort);
>> 
>> The Visitor interface is the same for input and for output.  Convenient
>> when the code is direction-agnostic.  Inconvenient when it's output: you
>> have to pass the value by reference even though it's only read.  In
>> particular, literals need a temporary, and types have to be adjusted via
>> cast or temporary more frequently than for by-value.
>> 
>> If that bothers us, we can add by-value wrappers to the interface.
>> 
>> Are there other output-only visitor uses?
>
> qom-get is output-only, just as qom-set is input-only.  Maybe it's worth
> an experiment to see how difficult it would be.
>
>
>> Well, it doesn't exactly make this code prettier, but having a stupid
>> wrapper just to hide the ugliness isn't so hot, either.
>
> And now you see why I posted two alternatives, to see which way we want
> to go.  Having convenient wrappers for output-only visits may swing the
> vote.

Having read your first alternative, my immediate reaction was "why don't
you do X instead?", where X turned out to be your second alternative.  I
really don't like this feature-limited wrapper that is used in just one
place.  I also don't like its confusing git-log, courtesy of its
unwisely chosen filename.

But having read the second alternative, I understand why you're offering
the first one: both are ugly.

So which of the uglies do I prefer?  The experiment could indeed swing
my vote.

Dr. David Alan Gilbert May 3, 2016, 9:44 a.m. UTC | #4

* Eric Blake (eblake@redhat.com) wrote:

> -static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se, QJSON *vmdesc)
> +static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
> +                                   Visitor *vmdesc)
>  {
>      int64_t old_offset, size;
> +    const char *tmp;
> 
>      old_offset = qemu_ftell_fast(f);
>      se->ops->save_state(f, se->opaque);
>      size = qemu_ftell_fast(f) - old_offset;
> 
>      if (vmdesc) {
> -        json_prop_int(vmdesc, "size", size);
> -        json_start_array(vmdesc, "fields");
> -        json_start_object(vmdesc, NULL);
> -        json_prop_str(vmdesc, "name", "data");
> -        json_prop_int(vmdesc, "size", size);
> -        json_prop_str(vmdesc, "type", "buffer");
> -        json_end_object(vmdesc);
> -        json_end_array(vmdesc);
> +        visit_type_int(vmdesc, "size", &size, &error_abort);
> +        visit_start_list(vmdesc, "fields", NULL, 0, &error_abort);
> +        visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);

Please avoid error_abort in migration code, especially on the source side.
You've got an apparently happily working VM, we must never kill it 
while attempting migration.

Dave

> +        tmp = "data";
> +        visit_type_str(vmdesc, "name", (char **)&tmp, &error_abort);
> +        visit_type_int(vmdesc, "size", &size, &error_abort);
> +        tmp = "buffer";
> +        visit_type_str(vmdesc, "type", (char **)&tmp, &error_abort);
> +        visit_check_struct(vmdesc, &error_abort);
> +        visit_end_struct(vmdesc);
> +        visit_end_list(vmdesc);
>      }
>  }
> 
> -static void vmstate_save(QEMUFile *f, SaveStateEntry *se, QJSON *vmdesc)
> +static void vmstate_save(QEMUFile *f, SaveStateEntry *se, Visitor *vmdesc)
>  {
>      trace_vmstate_save(se->idstr, se->vmsd ? se->vmsd->name : "(old)");
>      if (!se->vmsd) {
> @@ -891,7 +896,7 @@ void qemu_savevm_state_header(QEMUFile *f)
> 
>      if (!savevm_state.skip_configuration || enforce_config_section()) {
>          qemu_put_byte(f, QEMU_VM_CONFIGURATION);
> -        vmstate_save_state(f, &vmstate_configuration, &savevm_state, 0);
> +        vmstate_save_state(f, &vmstate_configuration, &savevm_state, NULL);
>      }
> 
>  }
> @@ -1033,11 +1038,15 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
> 
>  void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
>  {
> -    QJSON *vmdesc;
> +    JsonOutputVisitor *vmdesc_jov;
> +    Visitor *vmdesc;
> +    char *vmdesc_str;
>      int vmdesc_len;
>      SaveStateEntry *se;
>      int ret;
>      bool in_postcopy = migration_in_postcopy(migrate_get_current());
> +    int64_t tmp_i;
> +    char *tmp_s;
> 
>      trace_savevm_state_complete_precopy();
> 
> @@ -1073,9 +1082,12 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
>          return;
>      }
> 
> -    vmdesc = qjson_new();
> -    json_prop_int(vmdesc, "page_size", TARGET_PAGE_SIZE);
> -    json_start_array(vmdesc, "devices");
> +    vmdesc_jov = json_output_visitor_new();
> +    vmdesc = json_output_get_visitor(vmdesc_jov);
> +    visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
> +    tmp_i = TARGET_PAGE_SIZE;
> +    visit_type_int(vmdesc, "page_size", &tmp_i, &error_abort);
> +    visit_start_list(vmdesc, "devices", NULL, 0, &error_abort);
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> 
>          if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
> @@ -1088,16 +1100,19 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
> 
>          trace_savevm_section_start(se->idstr, se->section_id);
> 
> -        json_start_object(vmdesc, NULL);
> -        json_prop_str(vmdesc, "name", se->idstr);
> -        json_prop_int(vmdesc, "instance_id", se->instance_id);
> +        visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
> +        tmp_s = se->idstr;
> +        visit_type_str(vmdesc, "name", &tmp_s, &error_abort);
> +        tmp_i = se->instance_id;
> +        visit_type_int(vmdesc, "instance_id", &tmp_i, &error_abort);
> 
>          save_section_header(f, se, QEMU_VM_SECTION_FULL);
>          vmstate_save(f, se, vmdesc);
>          trace_savevm_section_end(se->idstr, se->section_id, 0);
>          save_section_footer(f, se);
> 
> -        json_end_object(vmdesc);
> +        visit_check_struct(vmdesc, &error_abort);
> +        visit_end_struct(vmdesc);
>      }
> 
>      if (!in_postcopy) {
> @@ -1105,16 +1120,19 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
>          qemu_put_byte(f, QEMU_VM_EOF);
>      }
> 
> -    json_end_array(vmdesc);
> -    qjson_finish(vmdesc);
> -    vmdesc_len = strlen(qjson_get_str(vmdesc));
> +    visit_end_list(vmdesc);
> +    visit_check_struct(vmdesc, &error_abort);
> +    visit_end_struct(vmdesc);
> +    vmdesc_str = json_output_get_string(vmdesc_jov);
> +    vmdesc_len = strlen(vmdesc_str);
> 
>      if (should_send_vmdesc()) {
>          qemu_put_byte(f, QEMU_VM_VMDESCRIPTION);
>          qemu_put_be32(f, vmdesc_len);
> -        qemu_put_buffer(f, (uint8_t *)qjson_get_str(vmdesc), vmdesc_len);
> +        qemu_put_buffer(f, (uint8_t *)vmdesc_str, vmdesc_len);
>      }
> -    object_unref(OBJECT(vmdesc));
> +    g_free(vmdesc_str);
> +    json_output_visitor_cleanup(vmdesc_jov);
> 
>      qemu_fflush(f);
>  }
> diff --git a/migration/vmstate.c b/migration/vmstate.c
> index bf3d5db..e7f894c 100644
> --- a/migration/vmstate.c
> +++ b/migration/vmstate.c
> @@ -1,15 +1,15 @@
>  #include "qemu/osdep.h"
>  #include "qemu-common.h"
> +#include "qapi/error.h"
>  #include "migration/migration.h"
>  #include "migration/qemu-file.h"
>  #include "migration/vmstate.h"
>  #include "qemu/bitops.h"
>  #include "qemu/error-report.h"
>  #include "trace.h"
> -#include "qjson.h"
> 
>  static void vmstate_subsection_save(QEMUFile *f, const VMStateDescription *vmsd,
> -                                    void *opaque, QJSON *vmdesc);
> +                                    void *opaque, Visitor *vmdesc);
>  static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd,
>                                     void *opaque);
> 
> @@ -226,12 +226,15 @@ static bool vmsd_can_compress(VMStateField *field)
>      return true;
>  }
> 
> -static void vmsd_desc_field_start(const VMStateDescription *vmsd, QJSON *vmdesc,
> +static void vmsd_desc_field_start(const VMStateDescription *vmsd,
> +                                  Visitor *vmdesc,
>                                    VMStateField *field, int i, int max)
>  {
>      char *name, *old_name;
>      bool is_array = max > 1;
>      bool can_compress = vmsd_can_compress(field);
> +    int64_t tmp;
> +    const char *type;
> 
>      if (!vmdesc) {
>          return;
> @@ -247,38 +250,47 @@ static void vmsd_desc_field_start(const VMStateDescription *vmsd, QJSON *vmdesc,
>          g_free(old_name);
>      }
> 
> -    json_start_object(vmdesc, NULL);
> -    json_prop_str(vmdesc, "name", name);
> +    visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
> +    visit_type_str(vmdesc, "name", &name, &error_abort);
>      if (is_array) {
>          if (can_compress) {
> -            json_prop_int(vmdesc, "array_len", max);
> +            tmp = max;
> +            visit_type_int(vmdesc, "array_len", &tmp, &error_abort);
>          } else {
> -            json_prop_int(vmdesc, "index", i);
> +            tmp = i;
> +            visit_type_int(vmdesc, "index", &tmp, &error_abort);
>          }
>      }
> -    json_prop_str(vmdesc, "type", vmfield_get_type_name(field));
> +    type = vmfield_get_type_name(field);
> +    visit_type_str(vmdesc, "type", (char **)&type, &error_abort);
> 
>      if (field->flags & VMS_STRUCT) {
> -        json_start_object(vmdesc, "struct");
> +        visit_start_struct(vmdesc, "struct", NULL, 0, &error_abort);
>      }
> 
>      g_free(name);
>  }
> 
> -static void vmsd_desc_field_end(const VMStateDescription *vmsd, QJSON *vmdesc,
> +static void vmsd_desc_field_end(const VMStateDescription *vmsd,
> +                                Visitor *vmdesc,
>                                  VMStateField *field, size_t size, int i)
>  {
> +    int64_t tmp;
> +
>      if (!vmdesc) {
>          return;
>      }
> 
>      if (field->flags & VMS_STRUCT) {
>          /* We printed a struct in between, close its child object */
> -        json_end_object(vmdesc);
> +        visit_check_struct(vmdesc, &error_abort);
> +        visit_end_struct(vmdesc);
>      }
> 
> -    json_prop_int(vmdesc, "size", size);
> -    json_end_object(vmdesc);
> +    tmp = size;
> +    visit_type_int(vmdesc, "size", &tmp, &error_abort);
> +    visit_check_struct(vmdesc, &error_abort);
> +    visit_end_struct(vmdesc);
>  }
> 
> 
> @@ -293,7 +305,7 @@ bool vmstate_save_needed(const VMStateDescription *vmsd, void *opaque)
> 
> 
>  void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
> -                        void *opaque, QJSON *vmdesc)
> +                        void *opaque, Visitor *vmdesc)
>  {
>      VMStateField *field = vmsd->fields;
> 
> @@ -302,9 +314,11 @@ void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
>      }
> 
>      if (vmdesc) {
> -        json_prop_str(vmdesc, "vmsd_name", vmsd->name);
> -        json_prop_int(vmdesc, "version", vmsd->version_id);
> -        json_start_array(vmdesc, "fields");
> +        const char *tmp_s = vmsd->name;
> +        int64_t tmp_i = vmsd->version_id;
> +        visit_type_str(vmdesc, "vmsd_name", (char **)&tmp_s, &error_abort);
> +        visit_type_int(vmdesc, "version", &tmp_i, &error_abort);
> +        visit_start_list(vmdesc, "fields", NULL, 0, &error_abort);
>      }
> 
>      while (field->name) {
> @@ -314,7 +328,7 @@ void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
>              int i, n_elems = vmstate_n_elems(opaque, field);
>              int size = vmstate_size(opaque, field);
>              int64_t old_offset, written_bytes;
> -            QJSON *vmdesc_loop = vmdesc;
> +            Visitor *vmdesc_loop = vmdesc;
> 
>              for (i = 0; i < n_elems; i++) {
>                  void *addr = base_addr + size * i;
> @@ -350,7 +364,7 @@ void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
>      }
> 
>      if (vmdesc) {
> -        json_end_array(vmdesc);
> +        visit_end_list(vmdesc);
>      }
> 
>      vmstate_subsection_save(f, vmsd, opaque, vmdesc);
> @@ -420,7 +434,7 @@ static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd,
>  }
> 
>  static void vmstate_subsection_save(QEMUFile *f, const VMStateDescription *vmsd,
> -                                    void *opaque, QJSON *vmdesc)
> +                                    void *opaque, Visitor *vmdesc)
>  {
>      const VMStateDescription **sub = vmsd->subsections;
>      bool subsection_found = false;
> @@ -433,11 +447,12 @@ static void vmstate_subsection_save(QEMUFile *f, const VMStateDescription *vmsd,
>              if (vmdesc) {
>                  /* Only create subsection array when we have any */
>                  if (!subsection_found) {
> -                    json_start_array(vmdesc, "subsections");
> +                    visit_start_list(vmdesc, "subsections", NULL, 0,
> +                                     &error_abort);
>                      subsection_found = true;
>                  }
> 
> -                json_start_object(vmdesc, NULL);
> +                visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
>              }
> 
>              qemu_put_byte(f, QEMU_VM_SUBSECTION);
> @@ -448,14 +463,15 @@ static void vmstate_subsection_save(QEMUFile *f, const VMStateDescription *vmsd,
>              vmstate_save_state(f, vmsd, opaque, vmdesc);
> 
>              if (vmdesc) {
> -                json_end_object(vmdesc);
> +                visit_check_struct(vmdesc, &error_abort);
> +                visit_end_struct(vmdesc);
>              }
>          }
>          sub++;
>      }
> 
>      if (vmdesc && subsection_found) {
> -        json_end_array(vmdesc);
> +        visit_end_list(vmdesc);
>      }
>  }
> 
> diff --git a/tests/Makefile b/tests/Makefile
> index 0b5a7a8..1f8a39d 100644
> --- a/tests/Makefile
> +++ b/tests/Makefile
> @@ -442,7 +442,7 @@ tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
>  	$(test-qapi-obj-y)
>  tests/test-vmstate$(EXESUF): tests/test-vmstate.o \
>  	migration/vmstate.o migration/qemu-file.o migration/qemu-file-buf.o \
> -        migration/qemu-file-unix.o qjson.o \
> +        migration/qemu-file-unix.o qapi/json-output-visitor.o \
>  	$(test-qom-obj-y)
>  tests/test-timed-average$(EXESUF): tests/test-timed-average.o qemu-timer.o \
>  	$(test-util-obj-y)
> -- 
> 2.5.5
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Markus Armbruster May 3, 2016, 12:26 p.m. UTC | #5

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Eric Blake (eblake@redhat.com) wrote:
>
>> -static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se, QJSON *vmdesc)
>> +static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
>> +                                   Visitor *vmdesc)
>>  {
>>      int64_t old_offset, size;
>> +    const char *tmp;
>> 
>>      old_offset = qemu_ftell_fast(f);
>>      se->ops->save_state(f, se->opaque);
>>      size = qemu_ftell_fast(f) - old_offset;
>> 
>>      if (vmdesc) {
>> -        json_prop_int(vmdesc, "size", size);
>> -        json_start_array(vmdesc, "fields");
>> -        json_start_object(vmdesc, NULL);
>> -        json_prop_str(vmdesc, "name", "data");
>> -        json_prop_int(vmdesc, "size", size);
>> -        json_prop_str(vmdesc, "type", "buffer");
>> -        json_end_object(vmdesc);
>> -        json_end_array(vmdesc);
>> +        visit_type_int(vmdesc, "size", &size, &error_abort);
>> +        visit_start_list(vmdesc, "fields", NULL, 0, &error_abort);
>> +        visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
>
> Please avoid error_abort in migration code, especially on the source side.
> You've got an apparently happily working VM, we must never kill it 
> while attempting migration.

These functions cannot fail, and &error_abort is a concise way to
express that.  It's the same as

            visit_type_int(vmdesc, "size", &size, &err);
            assert(!err);

An alternative would be ignoring errors:

            visit_type_int(vmdesc, "size", &size, NULL);

Ignoring violations of design invariants is hardly ever a good idea,
though.

Another alternative would be trying to recover from the violation, like
this:

            visit_type_int(vmdesc, "size", &size, &err);
            if (err) {
                report we're fscked...
                do whatever needs to be done to recover...
                goto out;
            }

Fancy untestable error paths are hardly ever good ideas, either.

Complete list of conditions where the JSON output visitor sets an error:

* Conditions where the visitor core sets an error:

  - visit_type_uintN() when one of the visit_type_uint{8,16,32}() passes
    a value out of bounds.  This is a serious programming error in
    qapi-visit-core.c.  We're almost certainly screwed, and attempting
    to continue is unsafe.

  - visit_type_int(): likewise.

  - output_type_enum() when the numeric value is out of bounds.  This is
    either a serious programming error in qapi-visit-core.c, or
    corrupted state.  Either way, we're almost certainly screwed, and
    attempting to continue is unsafe.

  - input_type_enum() when the string value is unknown.  This is either
    a serious programming error in qapi-visit-core.c, or bad input.
    However, the JSON output visitor isn't supposed to ever call
    input_type_enum(), so it's the former.  Once again, we're almost
    certainly screwed, and attempting to continue is unsafe.

* Conditions where the JSON output visitor itself sets an error:

  - None.

Do you still object to &error_abort?

Eric Blake May 3, 2016, 12:34 p.m. UTC | #6

On 05/03/2016 06:26 AM, Markus Armbruster wrote:

>>> +        visit_type_int(vmdesc, "size", &size, &error_abort);
>>> +        visit_start_list(vmdesc, "fields", NULL, 0, &error_abort);
>>> +        visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
>>
>> Please avoid error_abort in migration code, especially on the source side.
>> You've got an apparently happily working VM, we must never kill it 
>> while attempting migration.
> 
> These functions cannot fail, and &error_abort is a concise way to
> express that.  It's the same as
> 
>             visit_type_int(vmdesc, "size", &size, &err);
>             assert(!err);

&error_abort is ONLY supposed to be used to flag programming errors (ie.
they should never be reachable).  I'm asserting that the errors don't
happen, and therefore this cannot make the migration fail - in other
words, this is NOT going to kill a VM that attempts migration.

> * Conditions where the JSON output visitor itself sets an error:
> 
>   - None.

The JSON output visitor itself may be adding an error for an attempt to
output Inf or NaN for a floating point number - but since vmstate
doesn't use visit_type_number(), this is not possible.  And if we are
really worried about it, then in my next spin of the patch I may make it
user-configurable whether we stick to strict JSON or whether we relax
things and output Inf/NaN anyways.

Dr. David Alan Gilbert May 3, 2016, 1:23 p.m. UTC | #7

* Markus Armbruster (armbru@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> 
> > * Eric Blake (eblake@redhat.com) wrote:
> >
> >> -static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se, QJSON *vmdesc)
> >> +static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
> >> +                                   Visitor *vmdesc)
> >>  {
> >>      int64_t old_offset, size;
> >> +    const char *tmp;
> >> 
> >>      old_offset = qemu_ftell_fast(f);
> >>      se->ops->save_state(f, se->opaque);
> >>      size = qemu_ftell_fast(f) - old_offset;
> >> 
> >>      if (vmdesc) {
> >> -        json_prop_int(vmdesc, "size", size);
> >> -        json_start_array(vmdesc, "fields");
> >> -        json_start_object(vmdesc, NULL);
> >> -        json_prop_str(vmdesc, "name", "data");
> >> -        json_prop_int(vmdesc, "size", size);
> >> -        json_prop_str(vmdesc, "type", "buffer");
> >> -        json_end_object(vmdesc);
> >> -        json_end_array(vmdesc);
> >> +        visit_type_int(vmdesc, "size", &size, &error_abort);
> >> +        visit_start_list(vmdesc, "fields", NULL, 0, &error_abort);
> >> +        visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
> >
> > Please avoid error_abort in migration code, especially on the source side.
> > You've got an apparently happily working VM, we must never kill it 
> > while attempting migration.
> 
> These functions cannot fail,

Hang on though - this takes a Visitor* - that could be any visitor and that
could fail.

 and &error_abort is a concise way to
> express that.  It's the same as
> 
>             visit_type_int(vmdesc, "size", &size, &err);
>             assert(!err);
> 
> An alternative would be ignoring errors:
> 
>             visit_type_int(vmdesc, "size", &size, NULL);
> 
> Ignoring violations of design invariants is hardly ever a good idea,
> though.
> 
> Another alternative would be trying to recover from the violation, like
> this:
> 
>             visit_type_int(vmdesc, "size", &size, &err);
>             if (err) {
>                 report we're fscked...
>                 do whatever needs to be done to recover...
>                 goto out;
>             }
> 
> Fancy untestable error paths are hardly ever good ideas, either.

For an outgoing migration we must never kill the source unless we think the
data structures the source is using are itself corrupt.
We get programming errors both in our migration code and the migration
structures on devices.
If our migration code is broken/failing an invariant that still doesn't mean
you should kill the source - it should kill the migration only.

> Complete list of conditions where the JSON output visitor sets an error:
> 
> * Conditions where the visitor core sets an error:
> 
>   - visit_type_uintN() when one of the visit_type_uint{8,16,32}() passes
>     a value out of bounds.  This is a serious programming error in
>     qapi-visit-core.c.  We're almost certainly screwed, and attempting
>     to continue is unsafe.
> 
>   - visit_type_int(): likewise.
> 
>   - output_type_enum() when the numeric value is out of bounds.  This is
>     either a serious programming error in qapi-visit-core.c, or
>     corrupted state.  Either way, we're almost certainly screwed, and
>     attempting to continue is unsafe.
> 
>   - input_type_enum() when the string value is unknown.  This is either
>     a serious programming error in qapi-visit-core.c, or bad input.
>     However, the JSON output visitor isn't supposed to ever call
>     input_type_enum(), so it's the former.  Once again, we're almost
>     certainly screwed, and attempting to continue is unsafe.
> 
> * Conditions where the JSON output visitor itself sets an error:
> 
>   - None.
> 
> Do you still object to &error_abort?

So at the very least it should be commented as to why it can't happen.
My worry about it is that you've got a fairly long comment about why
it can't happen, and I worry that in 6 months someone adds a feature
to either the visitors or the migration code that means there's now
a case where it can happen.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Dr. David Alan Gilbert May 3, 2016, 1:27 p.m. UTC | #8

* Eric Blake (eblake@redhat.com) wrote:
> On 05/03/2016 06:26 AM, Markus Armbruster wrote:
> 
> >>> +        visit_type_int(vmdesc, "size", &size, &error_abort);
> >>> +        visit_start_list(vmdesc, "fields", NULL, 0, &error_abort);
> >>> +        visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
> >>
> >> Please avoid error_abort in migration code, especially on the source side.
> >> You've got an apparently happily working VM, we must never kill it 
> >> while attempting migration.
> > 
> > These functions cannot fail, and &error_abort is a concise way to
> > express that.  It's the same as
> > 
> >             visit_type_int(vmdesc, "size", &size, &err);
> >             assert(!err);
> 
> &error_abort is ONLY supposed to be used to flag programming errors (ie.
> they should never be reachable).  I'm asserting that the errors don't
> happen, and therefore this cannot make the migration fail - in other
> words, this is NOT going to kill a VM that attempts migration.

OK, but remember that I work on the basis that there are programming errors
in both the migration code and the VMState descriptions for devices.
If those break it still shouldn't kill the source.
(Note this isn't just true of migration - we need to be careful about
it in all cases where we're doing stuff to an otherwise happy VM).

> > * Conditions where the JSON output visitor itself sets an error:
> > 
> >   - None.
> 
> The JSON output visitor itself may be adding an error for an attempt to
> output Inf or NaN for a floating point number - but since vmstate
> doesn't use visit_type_number(), this is not possible.  And if we are
> really worried about it, then in my next spin of the patch I may make it
> user-configurable whether we stick to strict JSON or whether we relax
> things and output Inf/NaN anyways.

If that's the only case, and you're already saying it doesn't use it, then
I don't see there's a point in making that bit any more configurable.

Dave

> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Markus Armbruster May 4, 2016, 8:39 a.m. UTC | #9

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Eric Blake (eblake@redhat.com) wrote:
>> On 05/03/2016 06:26 AM, Markus Armbruster wrote:
>> 
>> >>> +        visit_type_int(vmdesc, "size", &size, &error_abort);
>> >>> +        visit_start_list(vmdesc, "fields", NULL, 0, &error_abort);
>> >>> +        visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
>> >>
>> >> Please avoid error_abort in migration code, especially on the source side.
>> >> You've got an apparently happily working VM, we must never kill it 
>> >> while attempting migration.
>> > 
>> > These functions cannot fail, and &error_abort is a concise way to
>> > express that.  It's the same as
>> > 
>> >             visit_type_int(vmdesc, "size", &size, &err);
>> >             assert(!err);
>> 
>> &error_abort is ONLY supposed to be used to flag programming errors (ie.
>> they should never be reachable).  I'm asserting that the errors don't
>> happen, and therefore this cannot make the migration fail - in other
>> words, this is NOT going to kill a VM that attempts migration.
>
> OK, but remember that I work on the basis that there are programming errors
> in both the migration code and the VMState descriptions for devices.
> If those break it still shouldn't kill the source.
> (Note this isn't just true of migration - we need to be careful about
> it in all cases where we're doing stuff to an otherwise happy VM).

While you can safely recover from certain programming errors, you can't
do it in general.  Worse, deciding whether recovery from a certain
programming error is safe can be intractable.

Example: visit_type_enum(v, name, &enum_val, enum_str, &err), where v is
an output visitor.  This can fail when enum_val is not a valid subscript
of enum_str[].  Can we recover safely?  Assume that we can cleanly fail
the task at hand at this point of its execution.

Perhaps enum_str[] doesn't match the actual enum.  This is a programming
error.  Failing the task is graceful degradation, and safe enough.

But what if enum_str[] is fine, but enum_val got corrupted?  Then
failing the task is still safe as long as enum_val isn't visible outside
the task.  But if it is visible, all bets are off.  The corruption can
spread, and do real damage.  Can be the difference between a crash that
forces a reboot with a filesystem journal replay, and massive data
corruption.

So, should we try to recover here?  Assuming we want to, badly.  If
analysis shows the possible causes of this error are safely isolated by
the recovery, yes.  Without such analysis, the only prudent answer is
no.

Real world examples typically deal with state more complex than just an
enum (all too often a thicket of pointers), and the safety argument gets
much hairier.

If you want more tractable arguments, try Erlang.

>> > * Conditions where the JSON output visitor itself sets an error:
>> > 
>> >   - None.
>> 
>> The JSON output visitor itself may be adding an error for an attempt to
>> output Inf or NaN for a floating point number - but since vmstate
>> doesn't use visit_type_number(), this is not possible.  And if we are
>> really worried about it, then in my next spin of the patch I may make it
>> user-configurable whether we stick to strict JSON or whether we relax
>> things and output Inf/NaN anyways.
>
> If that's the only case, and you're already saying it doesn't use it, then
> I don't see there's a point in making that bit any more configurable.

I listed all possible failures of the JSON output visitor upthread.
This is an additional failure we've considered.  I'm wary of adding it
precisely because I do worry about upsetting apple carts like this one.

Dr. David Alan Gilbert May 4, 2016, 8:54 a.m. UTC | #10

* Markus Armbruster (armbru@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> 
> > * Eric Blake (eblake@redhat.com) wrote:
> >> On 05/03/2016 06:26 AM, Markus Armbruster wrote:
> >> 
> >> >>> +        visit_type_int(vmdesc, "size", &size, &error_abort);
> >> >>> +        visit_start_list(vmdesc, "fields", NULL, 0, &error_abort);
> >> >>> +        visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
> >> >>
> >> >> Please avoid error_abort in migration code, especially on the source side.
> >> >> You've got an apparently happily working VM, we must never kill it 
> >> >> while attempting migration.
> >> > 
> >> > These functions cannot fail, and &error_abort is a concise way to
> >> > express that.  It's the same as
> >> > 
> >> >             visit_type_int(vmdesc, "size", &size, &err);
> >> >             assert(!err);
> >> 
> >> &error_abort is ONLY supposed to be used to flag programming errors (ie.
> >> they should never be reachable).  I'm asserting that the errors don't
> >> happen, and therefore this cannot make the migration fail - in other
> >> words, this is NOT going to kill a VM that attempts migration.
> >
> > OK, but remember that I work on the basis that there are programming errors
> > in both the migration code and the VMState descriptions for devices.
> > If those break it still shouldn't kill the source.
> > (Note this isn't just true of migration - we need to be careful about
> > it in all cases where we're doing stuff to an otherwise happy VM).
> 
> While you can safely recover from certain programming errors, you can't
> do it in general.  Worse, deciding whether recovery from a certain
> programming error is safe can be intractable.
> 
> Example: visit_type_enum(v, name, &enum_val, enum_str, &err), where v is
> an output visitor.  This can fail when enum_val is not a valid subscript
> of enum_str[].  Can we recover safely?  Assume that we can cleanly fail
> the task at hand at this point of its execution.
> 
> Perhaps enum_str[] doesn't match the actual enum.  This is a programming
> error.  Failing the task is graceful degradation, and safe enough.
> 
> But what if enum_str[] is fine, but enum_val got corrupted?  Then
> failing the task is still safe as long as enum_val isn't visible outside
> the task.  But if it is visible, all bets are off.  The corruption can
> spread, and do real damage.  Can be the difference between a crash that
> forces a reboot with a filesystem journal replay, and massive data
> corruption.
> 
> So, should we try to recover here?  Assuming we want to, badly.  If
> analysis shows the possible causes of this error are safely isolated by
> the recovery, yes.  Without such analysis, the only prudent answer is
> no.
> 
> Real world examples typically deal with state more complex than just an
> enum (all too often a thicket of pointers), and the safety argument gets
> much hairier.
> 
> If you want more tractable arguments, try Erlang.

And so my argument here is very simple; if we believe we have a corruption
in migration data then we fail migration - I don't try and do anything
clever about trying to bound what's broken.
This isn't about getting formal/tractable arguments, it's about making
a practical system.

> >> > * Conditions where the JSON output visitor itself sets an error:
> >> > 
> >> >   - None.
> >> 
> >> The JSON output visitor itself may be adding an error for an attempt to
> >> output Inf or NaN for a floating point number - but since vmstate
> >> doesn't use visit_type_number(), this is not possible.  And if we are
> >> really worried about it, then in my next spin of the patch I may make it
> >> user-configurable whether we stick to strict JSON or whether we relax
> >> things and output Inf/NaN anyways.
> >
> > If that's the only case, and you're already saying it doesn't use it, then
> > I don't see there's a point in making that bit any more configurable.
> 
> I listed all possible failures of the JSON output visitor upthread.
> This is an additional failure we've considered.  I'm wary of adding it
> precisely because I do worry about upsetting apple carts like this one.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Markus Armbruster May 4, 2016, 9:11 a.m. UTC | #11

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Markus Armbruster (armbru@redhat.com) wrote:
>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>> 
>> > * Eric Blake (eblake@redhat.com) wrote:
>> >
>> >> -static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se, QJSON *vmdesc)
>> >> +static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
>> >> +                                   Visitor *vmdesc)
>> >>  {
>> >>      int64_t old_offset, size;
>> >> +    const char *tmp;
>> >> 
>> >>      old_offset = qemu_ftell_fast(f);
>> >>      se->ops->save_state(f, se->opaque);
>> >>      size = qemu_ftell_fast(f) - old_offset;
>> >> 
>> >>      if (vmdesc) {
>> >> -        json_prop_int(vmdesc, "size", size);
>> >> -        json_start_array(vmdesc, "fields");
>> >> -        json_start_object(vmdesc, NULL);
>> >> -        json_prop_str(vmdesc, "name", "data");
>> >> -        json_prop_int(vmdesc, "size", size);
>> >> -        json_prop_str(vmdesc, "type", "buffer");
>> >> -        json_end_object(vmdesc);
>> >> -        json_end_array(vmdesc);
>> >> +        visit_type_int(vmdesc, "size", &size, &error_abort);
>> >> +        visit_start_list(vmdesc, "fields", NULL, 0, &error_abort);
>> >> +        visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort);
>> >
>> > Please avoid error_abort in migration code, especially on the source side.
>> > You've got an apparently happily working VM, we must never kill it 
>> > while attempting migration.
>> 
>> These functions cannot fail,
>
> Hang on though - this takes a Visitor* - that could be any visitor and that
> could fail.

vmdesc is either NULL or it's the JSON output visitor created by
qemu_savevm_state_complete_precopy():

    vmdesc_jov = json_output_visitor_new(false);
    vmdesc = json_output_get_visitor(vmdesc_jov);

This is by design: the purpose of this code is *writing* a *JSON*
description of the migration stream.  See commit 8118f09.

>  and &error_abort is a concise way to
>> express that.  It's the same as
>> 
>>             visit_type_int(vmdesc, "size", &size, &err);
>>             assert(!err);
>> 
>> An alternative would be ignoring errors:
>> 
>>             visit_type_int(vmdesc, "size", &size, NULL);
>> 
>> Ignoring violations of design invariants is hardly ever a good idea,
>> though.
>> 
>> Another alternative would be trying to recover from the violation, like
>> this:
>> 
>>             visit_type_int(vmdesc, "size", &size, &err);
>>             if (err) {
>>                 report we're fscked...
>>                 do whatever needs to be done to recover...
>>                 goto out;
>>             }
>> 
>> Fancy untestable error paths are hardly ever good ideas, either.
>
> For an outgoing migration we must never kill the source unless we think the
> data structures the source is using are itself corrupt.
> We get programming errors both in our migration code and the migration
> structures on devices.
> If our migration code is broken/failing an invariant that still doesn't mean
> you should kill the source - it should kill the migration only.

"git-grep assert migration" suggests you do kill the source on certain
programming errors.

I reiterate my point that fancy, untestable error recovery is unlikely
to actually recover.  "Fancy" can work, "untestable" might work (but
color me skeptic), but once you got both, you're a dead man walking.

>> Complete list of conditions where the JSON output visitor sets an error:
>> 
>> * Conditions where the visitor core sets an error:
>> 
>>   - visit_type_uintN() when one of the visit_type_uint{8,16,32}() passes
>>     a value out of bounds.  This is a serious programming error in
>>     qapi-visit-core.c.  We're almost certainly screwed, and attempting
>>     to continue is unsafe.
>> 
>>   - visit_type_int(): likewise.
>> 
>>   - output_type_enum() when the numeric value is out of bounds.  This is
>>     either a serious programming error in qapi-visit-core.c, or
>>     corrupted state.  Either way, we're almost certainly screwed, and
>>     attempting to continue is unsafe.
>> 
>>   - input_type_enum() when the string value is unknown.  This is either
>>     a serious programming error in qapi-visit-core.c, or bad input.
>>     However, the JSON output visitor isn't supposed to ever call
>>     input_type_enum(), so it's the former.  Once again, we're almost
>>     certainly screwed, and attempting to continue is unsafe.
>> 
>> * Conditions where the JSON output visitor itself sets an error:
>> 
>>   - None.
>> 
>> Do you still object to &error_abort?
>
> So at the very least it should be commented as to why it can't happen.
> My worry about it is that you've got a fairly long comment about why
> it can't happen, and I worry that in 6 months someone adds a feature
> to either the visitors or the migration code that means there's now
> a case where it can happen.

Here's why I don't think new failure modes are likely.

What does this helper module do, and how could it possibly fail?  By
"possibly", I mean any conceivable reasonable implementation, not just
the two we have (this patch gets rid of one).

This helper module builds JSON text and returns it as a string.  Its
interface mirrors JSON abstract syntax: start object, end object, start
array, end array, string, ...  Additionally, initialize, finalize, get
the result as a string.

Conceivable failure modes:

* Out of memory.  We die, like we generally do for smallish allocations.

* Data not representable in JSON.  This is basically non-finite numbers,
  and we already chose to extend JSON instead of making this an error.
  Such a decision will not be revised without a thorough analysis of
  impact on existing users.

* Interface misused, e.g. invalid nesting.  Clearly a programming error.
  We can either silently produce garbage output, fail, or die.  Before
  the patch: garbage output.  After the patch: die by assertion failure
  (*not* via &error_abort).

* Anything else?

"Not via &error_abort" leads me to another point.  The &error_abort are
the assertions you can see in the patch.  The ones you can't see are in
the visitor core and the JSON output visitor.  They're all about misuse
of the interface.

The old code is different: it doesn't detect misuse, and produces
invalid JSON instead.  "Never check for an error you don't know how to
handle."

With the new code, misuse should be caught in general migration testing,
"make check" if it's any good.

With the old code, it could more easily escape testing, because you have
to parse the resulting JSON to detect it.

Dr. David Alan Gilbert May 4, 2016, 9:22 a.m. UTC | #12

* Markus Armbruster (armbru@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> "git-grep assert migration" suggests you do kill the source on certain
> programming errors.

I'm just trying hard to reduce them; I know I'm not there, but I'd rather
we didn't have any - especially on the source side.

> I reiterate my point that fancy, untestable error recovery is unlikely
> to actually recover.  "Fancy" can work, "untestable" might work (but
> color me skeptic), but once you got both, you're a dead man walking.

Then we should make the error recovery paths easy; at the moment visitor
error paths are just too painful.

> 
> >> Complete list of conditions where the JSON output visitor sets an error:
> >> 
> >> * Conditions where the visitor core sets an error:
> >> 
> >>   - visit_type_uintN() when one of the visit_type_uint{8,16,32}() passes
> >>     a value out of bounds.  This is a serious programming error in
> >>     qapi-visit-core.c.  We're almost certainly screwed, and attempting
> >>     to continue is unsafe.
> >> 
> >>   - visit_type_int(): likewise.
> >> 
> >>   - output_type_enum() when the numeric value is out of bounds.  This is
> >>     either a serious programming error in qapi-visit-core.c, or
> >>     corrupted state.  Either way, we're almost certainly screwed, and
> >>     attempting to continue is unsafe.
> >> 
> >>   - input_type_enum() when the string value is unknown.  This is either
> >>     a serious programming error in qapi-visit-core.c, or bad input.
> >>     However, the JSON output visitor isn't supposed to ever call
> >>     input_type_enum(), so it's the former.  Once again, we're almost
> >>     certainly screwed, and attempting to continue is unsafe.
> >> 
> >> * Conditions where the JSON output visitor itself sets an error:
> >> 
> >>   - None.
> >> 
> >> Do you still object to &error_abort?
> >
> > So at the very least it should be commented as to why it can't happen.
> > My worry about it is that you've got a fairly long comment about why
> > it can't happen, and I worry that in 6 months someone adds a feature
> > to either the visitors or the migration code that means there's now
> > a case where it can happen.
> 
> Here's why I don't think new failure modes are likely.
> 
> What does this helper module do, and how could it possibly fail?  By
> "possibly", I mean any conceivable reasonable implementation, not just
> the two we have (this patch gets rid of one).
> 
> This helper module builds JSON text and returns it as a string.  Its
> interface mirrors JSON abstract syntax: start object, end object, start
> array, end array, string, ...  Additionally, initialize, finalize, get
> the result as a string.
> 
> Conceivable failure modes:
> 
> * Out of memory.  We die, like we generally do for smallish allocations.
> 
> * Data not representable in JSON.  This is basically non-finite numbers,
>   and we already chose to extend JSON instead of making this an error.
>   Such a decision will not be revised without a thorough analysis of
>   impact on existing users.
> 
> * Interface misused, e.g. invalid nesting.  Clearly a programming error.
>   We can either silently produce garbage output, fail, or die.  Before
>   the patch: garbage output.  After the patch: die by assertion failure
>   (*not* via &error_abort).
> 
> * Anything else?
> 
> "Not via &error_abort" leads me to another point.  The &error_abort are
> the assertions you can see in the patch.  The ones you can't see are in
> the visitor core and the JSON output visitor.  They're all about misuse
> of the interface.
> 
> The old code is different: it doesn't detect misuse, and produces
> invalid JSON instead.  "Never check for an error you don't know how to
> handle."
> 
> With the new code, misuse should be caught in general migration testing,
> "make check" if it's any good.
> 
> With the old code, it could more easily escape testing, because you have
> to parse the resulting JSON to detect it.

And what happens to the users VM if that JSON is invalid? *nothing*
The user doesn't see any problem at all; no corruption, no crash, nothing.
That's what I like users to see.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Markus Armbruster May 4, 2016, 11:37 a.m. UTC | #13

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Markus Armbruster (armbru@redhat.com) wrote:
>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>
>> "git-grep assert migration" suggests you do kill the source on certain
>> programming errors.
>
> I'm just trying hard to reduce them; I know I'm not there, but I'd rather
> we didn't have any - especially on the source side.
>
>> I reiterate my point that fancy, untestable error recovery is unlikely
>> to actually recover.  "Fancy" can work, "untestable" might work (but
>> color me skeptic), but once you got both, you're a dead man walking.
>
> Then we should make the error recovery paths easy; at the moment visitor
> error paths are just too painful.

I've never seen error handling in C that wasn't painful and still
correct.  Surprise me!

>> >> Complete list of conditions where the JSON output visitor sets an error:
>> >> 
>> >> * Conditions where the visitor core sets an error:
>> >> 
>> >>   - visit_type_uintN() when one of the visit_type_uint{8,16,32}() passes
>> >>     a value out of bounds.  This is a serious programming error in
>> >>     qapi-visit-core.c.  We're almost certainly screwed, and attempting
>> >>     to continue is unsafe.
>> >> 
>> >>   - visit_type_int(): likewise.
>> >> 
>> >>   - output_type_enum() when the numeric value is out of bounds.  This is
>> >>     either a serious programming error in qapi-visit-core.c, or
>> >>     corrupted state.  Either way, we're almost certainly screwed, and
>> >>     attempting to continue is unsafe.
>> >> 
>> >>   - input_type_enum() when the string value is unknown.  This is either
>> >>     a serious programming error in qapi-visit-core.c, or bad input.
>> >>     However, the JSON output visitor isn't supposed to ever call
>> >>     input_type_enum(), so it's the former.  Once again, we're almost
>> >>     certainly screwed, and attempting to continue is unsafe.
>> >> 
>> >> * Conditions where the JSON output visitor itself sets an error:
>> >> 
>> >>   - None.
>> >> 
>> >> Do you still object to &error_abort?
>> >
>> > So at the very least it should be commented as to why it can't happen.
>> > My worry about it is that you've got a fairly long comment about why
>> > it can't happen, and I worry that in 6 months someone adds a feature
>> > to either the visitors or the migration code that means there's now
>> > a case where it can happen.
>> 
>> Here's why I don't think new failure modes are likely.
>> 
>> What does this helper module do, and how could it possibly fail?  By
>> "possibly", I mean any conceivable reasonable implementation, not just
>> the two we have (this patch gets rid of one).
>> 
>> This helper module builds JSON text and returns it as a string.  Its
>> interface mirrors JSON abstract syntax: start object, end object, start
>> array, end array, string, ...  Additionally, initialize, finalize, get
>> the result as a string.
>> 
>> Conceivable failure modes:
>> 
>> * Out of memory.  We die, like we generally do for smallish allocations.
>> 
>> * Data not representable in JSON.  This is basically non-finite numbers,
>>   and we already chose to extend JSON instead of making this an error.
>>   Such a decision will not be revised without a thorough analysis of
>>   impact on existing users.
>> 
>> * Interface misused, e.g. invalid nesting.  Clearly a programming error.
>>   We can either silently produce garbage output, fail, or die.  Before
>>   the patch: garbage output.  After the patch: die by assertion failure
>>   (*not* via &error_abort).
>> 
>> * Anything else?
>> 
>> "Not via &error_abort" leads me to another point.  The &error_abort are
>> the assertions you can see in the patch.  The ones you can't see are in
>> the visitor core and the JSON output visitor.  They're all about misuse
>> of the interface.
>> 
>> The old code is different: it doesn't detect misuse, and produces
>> invalid JSON instead.  "Never check for an error you don't know how to
>> handle."
>> 
>> With the new code, misuse should be caught in general migration testing,
>> "make check" if it's any good.
>> 
>> With the old code, it could more easily escape testing, because you have
>> to parse the resulting JSON to detect it.
>
> And what happens to the users VM if that JSON is invalid? *nothing*
> The user doesn't see any problem at all; no corruption, no crash, nothing.
> That's what I like users to see.

This assumes that the root cause of the assertion failure has no further
ill effects.  I call that assumption bold.  But to each his own.

I figure we're unlikely to reach consensus on this, so I'd like to
propose we agree to disagree, and do the following:

* We shelve the de-duplication of JSON formatting (this patch)
  indefinitely.

* We move qjson.c to migration/, next to its only user, and add a
  comment explaining why it migration doesn't want to use general
  infrastructure here (JSON output visitor), but needs its own thing.
  This gets the file covered in MAINTAINERS, and will help prevent it
  growing additional users.

Deal?

Dr. David Alan Gilbert May 4, 2016, 11:56 a.m. UTC | #14

* Markus Armbruster (armbru@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> 
> > * Markus Armbruster (armbru@redhat.com) wrote:
> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> >
> >> "git-grep assert migration" suggests you do kill the source on certain
> >> programming errors.
> >
> > I'm just trying hard to reduce them; I know I'm not there, but I'd rather
> > we didn't have any - especially on the source side.
> >
> >> I reiterate my point that fancy, untestable error recovery is unlikely
> >> to actually recover.  "Fancy" can work, "untestable" might work (but
> >> color me skeptic), but once you got both, you're a dead man walking.
> >
> > Then we should make the error recovery paths easy; at the moment visitor
> > error paths are just too painful.
> 
> I've never seen error handling in C that wasn't painful and still
> correct.  Surprise me!

The thing that makes it hard for the visitor code is the need to check
it after every call and the check is complicated.

> >> >> Complete list of conditions where the JSON output visitor sets an error:
> >> >> 
> >> >> * Conditions where the visitor core sets an error:
> >> >> 
> >> >>   - visit_type_uintN() when one of the visit_type_uint{8,16,32}() passes
> >> >>     a value out of bounds.  This is a serious programming error in
> >> >>     qapi-visit-core.c.  We're almost certainly screwed, and attempting
> >> >>     to continue is unsafe.
> >> >> 
> >> >>   - visit_type_int(): likewise.
> >> >> 
> >> >>   - output_type_enum() when the numeric value is out of bounds.  This is
> >> >>     either a serious programming error in qapi-visit-core.c, or
> >> >>     corrupted state.  Either way, we're almost certainly screwed, and
> >> >>     attempting to continue is unsafe.
> >> >> 
> >> >>   - input_type_enum() when the string value is unknown.  This is either
> >> >>     a serious programming error in qapi-visit-core.c, or bad input.
> >> >>     However, the JSON output visitor isn't supposed to ever call
> >> >>     input_type_enum(), so it's the former.  Once again, we're almost
> >> >>     certainly screwed, and attempting to continue is unsafe.
> >> >> 
> >> >> * Conditions where the JSON output visitor itself sets an error:
> >> >> 
> >> >>   - None.
> >> >> 
> >> >> Do you still object to &error_abort?
> >> >
> >> > So at the very least it should be commented as to why it can't happen.
> >> > My worry about it is that you've got a fairly long comment about why
> >> > it can't happen, and I worry that in 6 months someone adds a feature
> >> > to either the visitors or the migration code that means there's now
> >> > a case where it can happen.
> >> 
> >> Here's why I don't think new failure modes are likely.
> >> 
> >> What does this helper module do, and how could it possibly fail?  By
> >> "possibly", I mean any conceivable reasonable implementation, not just
> >> the two we have (this patch gets rid of one).
> >> 
> >> This helper module builds JSON text and returns it as a string.  Its
> >> interface mirrors JSON abstract syntax: start object, end object, start
> >> array, end array, string, ...  Additionally, initialize, finalize, get
> >> the result as a string.
> >> 
> >> Conceivable failure modes:
> >> 
> >> * Out of memory.  We die, like we generally do for smallish allocations.
> >> 
> >> * Data not representable in JSON.  This is basically non-finite numbers,
> >>   and we already chose to extend JSON instead of making this an error.
> >>   Such a decision will not be revised without a thorough analysis of
> >>   impact on existing users.
> >> 
> >> * Interface misused, e.g. invalid nesting.  Clearly a programming error.
> >>   We can either silently produce garbage output, fail, or die.  Before
> >>   the patch: garbage output.  After the patch: die by assertion failure
> >>   (*not* via &error_abort).
> >> 
> >> * Anything else?
> >> 
> >> "Not via &error_abort" leads me to another point.  The &error_abort are
> >> the assertions you can see in the patch.  The ones you can't see are in
> >> the visitor core and the JSON output visitor.  They're all about misuse
> >> of the interface.
> >> 
> >> The old code is different: it doesn't detect misuse, and produces
> >> invalid JSON instead.  "Never check for an error you don't know how to
> >> handle."
> >> 
> >> With the new code, misuse should be caught in general migration testing,
> >> "make check" if it's any good.
> >> 
> >> With the old code, it could more easily escape testing, because you have
> >> to parse the resulting JSON to detect it.
> >
> > And what happens to the users VM if that JSON is invalid? *nothing*
> > The user doesn't see any problem at all; no corruption, no crash, nothing.
> > That's what I like users to see.
> 
> This assumes that the root cause of the assertion failure has no further
> ill effects.  I call that assumption bold.  But to each his own.

The whole JSON use in migration is just for debug/parsing in external tools - 
even if it's complete rubbish it doesn't affect the VM, which is why I don't
want an error producing it to kill the VM.

> I figure we're unlikely to reach consensus on this, so I'd like to
> propose we agree to disagree, and do the following:
> 
> * We shelve the de-duplication of JSON formatting (this patch)
>   indefinitely.
> 
> * We move qjson.c to migration/, next to its only user, and add a
>   comment explaining why it migration doesn't want to use general
>   infrastructure here (JSON output visitor), but needs its own thing.
>   This gets the file covered in MAINTAINERS, and will help prevent it
>   growing additional users.
> 
> Deal?

No, sorry; the JSON use in the migration is just a debug thing;
we don't want to maintain a separate JSON instance for it.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Markus Armbruster May 4, 2016, 1 p.m. UTC | #15

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Markus Armbruster (armbru@redhat.com) wrote:
>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>> 
>> > * Markus Armbruster (armbru@redhat.com) wrote:
>> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>> >
>> >> "git-grep assert migration" suggests you do kill the source on certain
>> >> programming errors.
>> >
>> > I'm just trying hard to reduce them; I know I'm not there, but I'd rather
>> > we didn't have any - especially on the source side.
>> >
>> >> I reiterate my point that fancy, untestable error recovery is unlikely
>> >> to actually recover.  "Fancy" can work, "untestable" might work (but
>> >> color me skeptic), but once you got both, you're a dead man walking.
>> >
>> > Then we should make the error recovery paths easy; at the moment visitor
>> > error paths are just too painful.
>> 
>> I've never seen error handling in C that wasn't painful and still
>> correct.  Surprise me!
>
> The thing that makes it hard for the visitor code is the need to check
> it after every call and the check is complicated.

Having to check every call is certainly painful, but there's no general
and safe way around it.  Accumulating errors that need to be checked
only at the end of a job can be less painful, but then the job's code
needs to be very carefully written to be safe even in presence of
errors.  Most code isn't, and some code can't.

The check for failure is simple, but annoyingly verbose when the
function's return value is useless:

    Error *err = NULL;
    foo(..., &err);
    if (err) {
        ...
    }

I'm playing with a update to conventions and usage to permit

    if (!foo(..., &err)) {
        ...
    }

Just as simple, but more readable.

[...]
>> I figure we're unlikely to reach consensus on this, so I'd like to
>> propose we agree to disagree, and do the following:
>> 
>> * We shelve the de-duplication of JSON formatting (this patch)
>>   indefinitely.
>> 
>> * We move qjson.c to migration/, next to its only user, and add a
>>   comment explaining why it migration doesn't want to use general
>>   infrastructure here (JSON output visitor), but needs its own thing.
>>   This gets the file covered in MAINTAINERS, and will help prevent it
>>   growing additional users.
>> 
>> Deal?
>
> No, sorry; the JSON use in the migration is just a debug thing;
> we don't want to maintain a separate JSON instance for it.

Well, you already do, except in name.  Who else do you think is
maintaining qjson.[ch], created by migration people, for migration's
use?  Certainly not me.

If you can't use the general JSON output code I maintain because of
special requirements, you get to continue maintaining your own.  All 109
SLOC of it.  All I'm asking is to make it official, and to deter
accidental use of migration's JSON writer instead of the general one.

Dr. David Alan Gilbert May 4, 2016, 1:19 p.m. UTC | #16

* Markus Armbruster (armbru@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> 
> > * Markus Armbruster (armbru@redhat.com) wrote:
> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> >> 
> >> > * Markus Armbruster (armbru@redhat.com) wrote:
> >> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> >> >
> >> >> "git-grep assert migration" suggests you do kill the source on certain
> >> >> programming errors.
> >> >
> >> > I'm just trying hard to reduce them; I know I'm not there, but I'd rather
> >> > we didn't have any - especially on the source side.
> >> >
> >> >> I reiterate my point that fancy, untestable error recovery is unlikely
> >> >> to actually recover.  "Fancy" can work, "untestable" might work (but
> >> >> color me skeptic), but once you got both, you're a dead man walking.
> >> >
> >> > Then we should make the error recovery paths easy; at the moment visitor
> >> > error paths are just too painful.
> >> 
> >> I've never seen error handling in C that wasn't painful and still
> >> correct.  Surprise me!
> >
> > The thing that makes it hard for the visitor code is the need to check
> > it after every call and the check is complicated.
> 
> Having to check every call is certainly painful, but there's no general
> and safe way around it.  Accumulating errors that need to be checked
> only at the end of a job can be less painful, but then the job's code
> needs to be very carefully written to be safe even in presence of
> errors.  Most code isn't, and some code can't.

Yes; output visitors would seem to be the easiest case though?

> The check for failure is simple, but annoyingly verbose when the
> function's return value is useless:
> 
>     Error *err = NULL;
>     foo(..., &err);
>     if (err) {
>         ...
>     }
> 
> I'm playing with a update to conventions and usage to permit
> 
>     if (!foo(..., &err)) {
>         ...
>     }

If that became;
      if (!foo(..., &err) ||
          !foo(..., &err) ||
          !foo(..., &err)) {
          ...
      }

That would be both readable and not verbose.

> Just as simple, but more readable.
> 
> [...]
> >> I figure we're unlikely to reach consensus on this, so I'd like to
> >> propose we agree to disagree, and do the following:
> >> 
> >> * We shelve the de-duplication of JSON formatting (this patch)
> >>   indefinitely.
> >> 
> >> * We move qjson.c to migration/, next to its only user, and add a
> >>   comment explaining why it migration doesn't want to use general
> >>   infrastructure here (JSON output visitor), but needs its own thing.
> >>   This gets the file covered in MAINTAINERS, and will help prevent it
> >>   growing additional users.
> >> 
> >> Deal?
> >
> > No, sorry; the JSON use in the migration is just a debug thing;
> > we don't want to maintain a separate JSON instance for it.
> 
> Well, you already do, except in name.  Who else do you think is
> maintaining qjson.[ch], created by migration people, for migration's
> use?  Certainly not me.

That came from migration? Really? I didn't think we used JSON at
all until last year.

> If you can't use the general JSON output code I maintain because of
> special requirements, you get to continue maintaining your own.  All 109
> SLOC of it.  All I'm asking is to make it official, and to deter
> accidental use of migration's JSON writer instead of the general one.

Yeh; I'd love to share the JSON code; just lets try and avoid anything that
can kill the source, however broken the migration.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Markus Armbruster May 4, 2016, 2:10 p.m. UTC | #17

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Markus Armbruster (armbru@redhat.com) wrote:
>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>> 
>> > * Markus Armbruster (armbru@redhat.com) wrote:
>> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>> >> 
>> >> > * Markus Armbruster (armbru@redhat.com) wrote:
>> >> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>> >> >
>> >> >> "git-grep assert migration" suggests you do kill the source on certain
>> >> >> programming errors.
>> >> >
>> >> > I'm just trying hard to reduce them; I know I'm not there, but I'd rather
>> >> > we didn't have any - especially on the source side.
>> >> >
>> >> >> I reiterate my point that fancy, untestable error recovery is unlikely
>> >> >> to actually recover.  "Fancy" can work, "untestable" might work (but
>> >> >> color me skeptic), but once you got both, you're a dead man walking.
>> >> >
>> >> > Then we should make the error recovery paths easy; at the moment visitor
>> >> > error paths are just too painful.
>> >> 
>> >> I've never seen error handling in C that wasn't painful and still
>> >> correct.  Surprise me!
>> >
>> > The thing that makes it hard for the visitor code is the need to check
>> > it after every call and the check is complicated.
>> 
>> Having to check every call is certainly painful, but there's no general
>> and safe way around it.  Accumulating errors that need to be checked
>> only at the end of a job can be less painful, but then the job's code
>> needs to be very carefully written to be safe even in presence of
>> errors.  Most code isn't, and some code can't.
>
> Yes; output visitors would seem to be the easiest case though?

Here's the example from visitor.h at the end of this series (with a
small mistake corrected):

    Visitor *v;
    Error *err = NULL;
    int value;

    v = ...obtain visitor...
    visit_start_struct(v, NULL, NULL, 0, &err);
    if (err) {
        goto out;
    }
    visit_start_list(v, "list", NULL, 0, &err);
    if (err) {
        goto outobj;
    }
    value = 1;
    visit_type_int(v, NULL, &value, &err);
    if (err) {
        goto outlist;
    }
    value = 2;
    visit_type_int(v, NULL, &value, &err);
    if (err) {
        goto outlist;
    }
   outlist:
    visit_end_list(v, NULL);
    if (!err) {
        visit_check_struct(v, &err);
    }
   outobj:
    visit_end_struct(v, NULL);
   out:
    error_propagate(errp, err);
    ...clean up v...

With accumulating Errors, we could elide some but not all error checks.
In particular, the ones after visit_start_FOO() are still required,
because visit_end_FOO() may only be called after visit_start_FOO()
succeeded.

If we did anything interesting in addition to calling visitors, we'd
have to additionally consider whether doing it is safe after errors.

Accumulating errors *can* make the code easier on the eyes, but they
also make it easy to screw up behavior after error.

>> The check for failure is simple, but annoyingly verbose when the
>> function's return value is useless:
>> 
>>     Error *err = NULL;
>>     foo(..., &err);
>>     if (err) {
>>         ...
>>     }
>> 
>> I'm playing with a update to conventions and usage to permit
>> 
>>     if (!foo(..., &err)) {
>>         ...
>>     }
>
> If that became;
>       if (!foo(..., &err) ||
>           !foo(..., &err) ||
>           !foo(..., &err)) {
>           ...
>       }
>
> That would be both readable and not verbose.

Yes, that could be done then.

>> Just as simple, but more readable.
>> 
>> [...]
>> >> I figure we're unlikely to reach consensus on this, so I'd like to
>> >> propose we agree to disagree, and do the following:
>> >> 
>> >> * We shelve the de-duplication of JSON formatting (this patch)
>> >>   indefinitely.
>> >> 
>> >> * We move qjson.c to migration/, next to its only user, and add a
>> >>   comment explaining why it migration doesn't want to use general
>> >>   infrastructure here (JSON output visitor), but needs its own thing.
>> >>   This gets the file covered in MAINTAINERS, and will help prevent it
>> >>   growing additional users.
>> >> 
>> >> Deal?
>> >
>> > No, sorry; the JSON use in the migration is just a debug thing;
>> > we don't want to maintain a separate JSON instance for it.
>> 
>> Well, you already do, except in name.  Who else do you think is
>> maintaining qjson.[ch], created by migration people, for migration's
>> use?  Certainly not me.
>
> That came from migration? Really? I didn't think we used JSON at
> all until last year.

Commit 0457d07..b174257.

Migration is still the only user of this special JSON writer, and if you
ask me, it better remain the only one.

>> If you can't use the general JSON output code I maintain because of
>> special requirements, you get to continue maintaining your own.  All 109
>> SLOC of it.  All I'm asking is to make it official, and to deter
>> accidental use of migration's JSON writer instead of the general one.
>
> Yeh; I'd love to share the JSON code; just lets try and avoid anything that
> can kill the source, however broken the migration.

Visitors will abort when their preconditions or invariants are violated.
If that's not okay for migration, I'm afraid migration needs to continue
to roll its own JSON writer.  Visitors are pretty heavily used nowadays,
and we very much rely on these assertions to catch mistakes.

Dr. David Alan Gilbert May 4, 2016, 2:53 p.m. UTC | #18

* Markus Armbruster (armbru@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> 
> > * Markus Armbruster (armbru@redhat.com) wrote:
> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> >> 
> >> > * Markus Armbruster (armbru@redhat.com) wrote:
> >> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> >> >> 
> >> >> > * Markus Armbruster (armbru@redhat.com) wrote:
> >> >> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> >> >> >
> >> >> >> "git-grep assert migration" suggests you do kill the source on certain
> >> >> >> programming errors.
> >> >> >
> >> >> > I'm just trying hard to reduce them; I know I'm not there, but I'd rather
> >> >> > we didn't have any - especially on the source side.
> >> >> >
> >> >> >> I reiterate my point that fancy, untestable error recovery is unlikely
> >> >> >> to actually recover.  "Fancy" can work, "untestable" might work (but
> >> >> >> color me skeptic), but once you got both, you're a dead man walking.
> >> >> >
> >> >> > Then we should make the error recovery paths easy; at the moment visitor
> >> >> > error paths are just too painful.
> >> >> 
> >> >> I've never seen error handling in C that wasn't painful and still
> >> >> correct.  Surprise me!
> >> >
> >> > The thing that makes it hard for the visitor code is the need to check
> >> > it after every call and the check is complicated.
> >> 
> >> Having to check every call is certainly painful, but there's no general
> >> and safe way around it.  Accumulating errors that need to be checked
> >> only at the end of a job can be less painful, but then the job's code
> >> needs to be very carefully written to be safe even in presence of
> >> errors.  Most code isn't, and some code can't.
> >
> > Yes; output visitors would seem to be the easiest case though?
> 
> Here's the example from visitor.h at the end of this series (with a
> small mistake corrected):
> 
>     Visitor *v;
>     Error *err = NULL;
>     int value;
> 
>     v = ...obtain visitor...
>     visit_start_struct(v, NULL, NULL, 0, &err);
>     if (err) {
>         goto out;
>     }
>     visit_start_list(v, "list", NULL, 0, &err);
>     if (err) {
>         goto outobj;
>     }
>     value = 1;
>     visit_type_int(v, NULL, &value, &err);
>     if (err) {
>         goto outlist;
>     }
>     value = 2;
>     visit_type_int(v, NULL, &value, &err);
>     if (err) {
>         goto outlist;
>     }
>    outlist:
>     visit_end_list(v, NULL);
>     if (!err) {
>         visit_check_struct(v, &err);
>     }
>    outobj:
>     visit_end_struct(v, NULL);
>    out:
>     error_propagate(errp, err);
>     ...clean up v...
> 
> With accumulating Errors, we could elide some but not all error checks.
> In particular, the ones after visit_start_FOO() are still required,
> because visit_end_FOO() may only be called after visit_start_FOO()
> succeeded.

Hmm the visit_end_* are interesting; I guess we have to be careful
of those, unless that is you could make the visit_end_struct(v, NULL)
to fail nicely in that case.

> If we did anything interesting in addition to calling visitors, we'd
> have to additionally consider whether doing it is safe after errors.
> 
> Accumulating errors *can* make the code easier on the eyes, but they
> also make it easy to screw up behavior after error.
> 
> >> The check for failure is simple, but annoyingly verbose when the
> >> function's return value is useless:
> >> 
> >>     Error *err = NULL;
> >>     foo(..., &err);
> >>     if (err) {
> >>         ...
> >>     }
> >> 
> >> I'm playing with a update to conventions and usage to permit
> >> 
> >>     if (!foo(..., &err)) {
> >>         ...
> >>     }
> >
> > If that became;
> >       if (!foo(..., &err) ||
> >           !foo(..., &err) ||
> >           !foo(..., &err)) {
> >           ...
> >       }
> >
> > That would be both readable and not verbose.
> 
> Yes, that could be done then.

How would we deal with all the visit_end_* - if we've decided
there's an error are we required to call all the end's before we
just free the visitor or something like that?

> >> Just as simple, but more readable.
> >> 
> >> [...]
> >> >> I figure we're unlikely to reach consensus on this, so I'd like to
> >> >> propose we agree to disagree, and do the following:
> >> >> 
> >> >> * We shelve the de-duplication of JSON formatting (this patch)
> >> >>   indefinitely.
> >> >> 
> >> >> * We move qjson.c to migration/, next to its only user, and add a
> >> >>   comment explaining why it migration doesn't want to use general
> >> >>   infrastructure here (JSON output visitor), but needs its own thing.
> >> >>   This gets the file covered in MAINTAINERS, and will help prevent it
> >> >>   growing additional users.
> >> >> 
> >> >> Deal?
> >> >
> >> > No, sorry; the JSON use in the migration is just a debug thing;
> >> > we don't want to maintain a separate JSON instance for it.
> >> 
> >> Well, you already do, except in name.  Who else do you think is
> >> maintaining qjson.[ch], created by migration people, for migration's
> >> use?  Certainly not me.
> >
> > That came from migration? Really? I didn't think we used JSON at
> > all until last year.
> 
> Commit 0457d07..b174257.
> 
> Migration is still the only user of this special JSON writer, and if you
> ask me, it better remain the only one.
> 
> >> If you can't use the general JSON output code I maintain because of
> >> special requirements, you get to continue maintaining your own.  All 109
> >> SLOC of it.  All I'm asking is to make it official, and to deter
> >> accidental use of migration's JSON writer instead of the general one.
> >
> > Yeh; I'd love to share the JSON code; just lets try and avoid anything that
> > can kill the source, however broken the migration.
> 
> Visitors will abort when their preconditions or invariants are violated.
> If that's not okay for migration, I'm afraid migration needs to continue
> to roll its own JSON writer.  Visitors are pretty heavily used nowadays,
> and we very much rely on these assertions to catch mistakes.

OK, lets keep our own writer; if we can't have more control over visitors
failure paths, we'll have to.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Eric Blake May 4, 2016, 3:17 p.m. UTC | #19

On 05/04/2016 08:53 AM, Dr. David Alan Gilbert wrote:

>> Here's the example from visitor.h at the end of this series (with a
>> small mistake corrected):
>>
>>     Visitor *v;
>>     Error *err = NULL;
>>     int value;
>>
>>     v = ...obtain visitor...
>>     visit_start_struct(v, NULL, NULL, 0, &err);
>>     if (err) {
>>         goto out;
>>     }
>>     visit_start_list(v, "list", NULL, 0, &err);
>>     if (err) {
>>         goto outobj;
>>     }
>>     value = 1;
>>     visit_type_int(v, NULL, &value, &err);
>>     if (err) {
>>         goto outlist;
>>     }
>>     value = 2;
>>     visit_type_int(v, NULL, &value, &err);
>>     if (err) {
>>         goto outlist;
>>     }
>>    outlist:
>>     visit_end_list(v, NULL);
>>     if (!err) {
>>         visit_check_struct(v, &err);
>>     }
>>    outobj:
>>     visit_end_struct(v, NULL);

> Hmm the visit_end_* are interesting; I guess we have to be careful
> of those, unless that is you could make the visit_end_struct(v, NULL)
> to fail nicely in that case.

Here, visit_end_struct() _can't_ fail; you are passing NULL to match the
fact that the earlier visit_start_struct() passed NULL for obj (that is,
at the end of the current qapi series, the parameter changes from
Error** to void **, and is NOT an error indicator).

>>
>> Visitors will abort when their preconditions or invariants are violated.

And it's very easy to avoid violating those preconditions. But if it's
not easy enough for you, then the solution is...

>> If that's not okay for migration, I'm afraid migration needs to continue
>> to roll its own JSON writer.  Visitors are pretty heavily used nowadays,
>> and we very much rely on these assertions to catch mistakes.
> 
> OK, lets keep our own writer; if we can't have more control over visitors
> failure paths, we'll have to.

...exactly this. So in the next spin of my series, I will drop (both!)
of my attempts to rework qjson.c to reuse the visitors, and instead just
move it into the migration tree and out of the top level.

Markus Armbruster May 4, 2016, 3:42 p.m. UTC | #20

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Markus Armbruster (armbru@redhat.com) wrote:
>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>> 
>> > * Markus Armbruster (armbru@redhat.com) wrote:
>> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>> >> 
>> >> > * Markus Armbruster (armbru@redhat.com) wrote:
>> >> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>> >> >> 
>> >> >> > * Markus Armbruster (armbru@redhat.com) wrote:
>> >> >> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
>> >> >> >
>> >> >> >> "git-grep assert migration" suggests you do kill the source on certain
>> >> >> >> programming errors.
>> >> >> >
>> >> >> > I'm just trying hard to reduce them; I know I'm not there, but I'd rather
>> >> >> > we didn't have any - especially on the source side.
>> >> >> >
>> >> >> >> I reiterate my point that fancy, untestable error recovery is unlikely
>> >> >> >> to actually recover.  "Fancy" can work, "untestable" might work (but
>> >> >> >> color me skeptic), but once you got both, you're a dead man walking.
>> >> >> >
>> >> >> > Then we should make the error recovery paths easy; at the moment visitor
>> >> >> > error paths are just too painful.
>> >> >> 
>> >> >> I've never seen error handling in C that wasn't painful and still
>> >> >> correct.  Surprise me!
>> >> >
>> >> > The thing that makes it hard for the visitor code is the need to check
>> >> > it after every call and the check is complicated.
>> >> 
>> >> Having to check every call is certainly painful, but there's no general
>> >> and safe way around it.  Accumulating errors that need to be checked
>> >> only at the end of a job can be less painful, but then the job's code
>> >> needs to be very carefully written to be safe even in presence of
>> >> errors.  Most code isn't, and some code can't.
>> >
>> > Yes; output visitors would seem to be the easiest case though?
>> 
>> Here's the example from visitor.h at the end of this series (with a
>> small mistake corrected):
>> 
>>     Visitor *v;
>>     Error *err = NULL;
>>     int value;
>> 
>>     v = ...obtain visitor...
>>     visit_start_struct(v, NULL, NULL, 0, &err);
>>     if (err) {
>>         goto out;
>>     }
>>     visit_start_list(v, "list", NULL, 0, &err);
>>     if (err) {
>>         goto outobj;
>>     }
>>     value = 1;
>>     visit_type_int(v, NULL, &value, &err);
>>     if (err) {
>>         goto outlist;
>>     }
>>     value = 2;
>>     visit_type_int(v, NULL, &value, &err);
>>     if (err) {
>>         goto outlist;
>>     }
>>    outlist:
>>     visit_end_list(v, NULL);
>>     if (!err) {
>>         visit_check_struct(v, &err);
>>     }
>>    outobj:
>>     visit_end_struct(v, NULL);
>>    out:
>>     error_propagate(errp, err);
>>     ...clean up v...
>> 
>> With accumulating Errors, we could elide some but not all error checks.
>> In particular, the ones after visit_start_FOO() are still required,
>> because visit_end_FOO() may only be called after visit_start_FOO()
>> succeeded.
>
> Hmm the visit_end_* are interesting; I guess we have to be careful
> of those, unless that is you could make the visit_end_struct(v, NULL)
> to fail nicely in that case.

Unfortunately, making functions fail "nicely" on programming errors
creates a new burden for users that do not want to recover from
programming errors (because it's *unsafe*): now they have to somehow
find out whether the error they got is a programming error.  That's not
even possible right now, but we could add a "this is a programming
error" predicate to the error type.  Anyway, the error boilerplate would
then explode, and the error handling bugs would multiply.

You can't make the same API serve radically different requirements
equally well.  "Fail as cleanly as possible even on violations of
preconditions and invariants, and do your best to survive regardless of
how I handle them" is radically different from "rely on preconditions
and invariants, but do check them to catch bugs".  An interface doing
the former is bound to have weaker postconditions.  It's also harder to
implement, and harder to use.

>> If we did anything interesting in addition to calling visitors, we'd
>> have to additionally consider whether doing it is safe after errors.
>> 
>> Accumulating errors *can* make the code easier on the eyes, but they
>> also make it easy to screw up behavior after error.
>> 
>> >> The check for failure is simple, but annoyingly verbose when the
>> >> function's return value is useless:
>> >> 
>> >>     Error *err = NULL;
>> >>     foo(..., &err);
>> >>     if (err) {
>> >>         ...
>> >>     }
>> >> 
>> >> I'm playing with a update to conventions and usage to permit
>> >> 
>> >>     if (!foo(..., &err)) {
>> >>         ...
>> >>     }
>> >
>> > If that became;
>> >       if (!foo(..., &err) ||
>> >           !foo(..., &err) ||
>> >           !foo(..., &err)) {
>> >           ...
>> >       }
>> >
>> > That would be both readable and not verbose.
>> 
>> Yes, that could be done then.
>
> How would we deal with all the visit_end_* - if we've decided
> there's an error are we required to call all the end's before we
> just free the visitor or something like that?

After this series, the visitor contract guarantees (in writing) that you
can safely destroy or reset the visitor at any time.

Existing users call the matching visit_end_FOO() instead, because that's
easy enough for them.  But if you find yourself in a situation where you
don't want to, go ahead and destroy the visitor without further ado.

>> >> Just as simple, but more readable.
>> >> 
>> >> [...]
>> >> >> I figure we're unlikely to reach consensus on this, so I'd like to
>> >> >> propose we agree to disagree, and do the following:
>> >> >> 
>> >> >> * We shelve the de-duplication of JSON formatting (this patch)
>> >> >>   indefinitely.
>> >> >> 
>> >> >> * We move qjson.c to migration/, next to its only user, and add a
>> >> >>   comment explaining why it migration doesn't want to use general
>> >> >>   infrastructure here (JSON output visitor), but needs its own thing.
>> >> >>   This gets the file covered in MAINTAINERS, and will help prevent it
>> >> >>   growing additional users.
>> >> >> 
>> >> >> Deal?
>> >> >
>> >> > No, sorry; the JSON use in the migration is just a debug thing;
>> >> > we don't want to maintain a separate JSON instance for it.
>> >> 
>> >> Well, you already do, except in name.  Who else do you think is
>> >> maintaining qjson.[ch], created by migration people, for migration's
>> >> use?  Certainly not me.
>> >
>> > That came from migration? Really? I didn't think we used JSON at
>> > all until last year.
>> 
>> Commit 0457d07..b174257.
>> 
>> Migration is still the only user of this special JSON writer, and if you
>> ask me, it better remain the only one.
>> 
>> >> If you can't use the general JSON output code I maintain because of
>> >> special requirements, you get to continue maintaining your own.  All 109
>> >> SLOC of it.  All I'm asking is to make it official, and to deter
>> >> accidental use of migration's JSON writer instead of the general one.
>> >
>> > Yeh; I'd love to share the JSON code; just lets try and avoid anything that
>> > can kill the source, however broken the migration.
>> 
>> Visitors will abort when their preconditions or invariants are violated.
>> If that's not okay for migration, I'm afraid migration needs to continue
>> to roll its own JSON writer.  Visitors are pretty heavily used nowadays,
>> and we very much rely on these assertions to catch mistakes.
>
> OK, lets keep our own writer; if we can't have more control over visitors
> failure paths, we'll have to.

All right, I'll cook up a patch.

Paolo Bonzini May 24, 2016, 7:15 a.m. UTC | #21

On 04/05/2016 10:54, Dr. David Alan Gilbert wrote:
> And so my argument here is very simple; if we believe we have a corruption
> in migration data then we fail migration - I don't try and do anything
> clever about trying to bound what's broken.
> This isn't about getting formal/tractable arguments, it's about making
> a practical system.

But this code is not driven by the migration data.  It is driven by the
vmstate description which is static.  It's not const, but still it's not
supposed to get corrupted at all.

Thanks,

Paolo

[v3,10/18] vmstate: Use new JSON output visitor

Commit Message

Comments

Patch