[v4,1/4] qapi: Add a 'coroutine' flag for commands

Message ID	20200121181122.15941-2-kwolf@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=WByt=3K=nongnu.org=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EB33324655 From: Kevin Wolf <kwolf@redhat.com> To: qemu-devel@nongnu.org Subject: [PATCH v4 1/4] qapi: Add a 'coroutine' flag for commands Date: Tue, 21 Jan 2020 19:11:19 +0100 Message-Id: <20200121181122.15941-2-kwolf@redhat.com> In-Reply-To: <20200121181122.15941-1-kwolf@redhat.com> References: <20200121181122.15941-1-kwolf@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Precedence: list Cc: kwolf@redhat.com, qemu-block@nongnu.org, armbru@redhat.com, marcandre.lureau@gmail.com, stefanha@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Series	qmp: Optionally run handlers in coroutines \| expand [v4,0/4] qmp: Optionally run handlers in coroutines [v4,1/4] qapi: Add a 'coroutine' flag for commands [v4,2/4] vl: Initialise main loop earlier [v4,3/4] qmp: Move dispatcher to a coroutine [v4,4/4] block: Mark 'block_resize' as coroutine

Kevin Wolf Jan. 21, 2020, 6:11 p.m. UTC

This patch adds a new 'coroutine' flag to QMP command definitions that
tells the QMP dispatcher that the command handler is safe to be run in a
coroutine.

The documentation of the new flag pretends that this flag is already
used as intended, which it isn't yet after this patch. We'll implement
this in another patch in this series.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 docs/devel/qapi-code-gen.txt            | 10 ++++++++++
 include/qapi/qmp/dispatch.h             |  1 +
 tests/test-qmp-cmds.c                   |  4 ++++
 scripts/qapi/commands.py                | 10 +++++++---
 scripts/qapi/doc.py                     |  2 +-
 scripts/qapi/expr.py                    |  7 +++++--
 scripts/qapi/introspect.py              |  2 +-
 scripts/qapi/schema.py                  |  9 ++++++---
 tests/qapi-schema/test-qapi.py          |  7 ++++---
 tests/Makefile.include                  |  1 +
 tests/qapi-schema/oob-coroutine.err     |  2 ++
 tests/qapi-schema/oob-coroutine.json    |  2 ++
 tests/qapi-schema/oob-coroutine.out     |  0
 tests/qapi-schema/qapi-schema-test.json |  1 +
 tests/qapi-schema/qapi-schema-test.out  |  2 ++
 15 files changed, 47 insertions(+), 13 deletions(-)
 create mode 100644 tests/qapi-schema/oob-coroutine.err
 create mode 100644 tests/qapi-schema/oob-coroutine.json
 create mode 100644 tests/qapi-schema/oob-coroutine.out

Markus Armbruster Jan. 22, 2020, 6:32 a.m. UTC | #1

Kevin Wolf <kwolf@redhat.com> writes:

> This patch adds a new 'coroutine' flag to QMP command definitions that
> tells the QMP dispatcher that the command handler is safe to be run in a
> coroutine.

I'm afraid I missed this question in my review of v3: when is a handler
*not* safe to be run in a coroutine?

> The documentation of the new flag pretends that this flag is already
> used as intended, which it isn't yet after this patch. We'll implement
> this in another patch in this series.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

Kevin Wolf Jan. 22, 2020, 10:10 a.m. UTC | #2

Am 22.01.2020 um 07:32 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > This patch adds a new 'coroutine' flag to QMP command definitions that
> > tells the QMP dispatcher that the command handler is safe to be run in a
> > coroutine.
> 
> I'm afraid I missed this question in my review of v3: when is a handler
> *not* safe to be run in a coroutine?

That's a hard one to answer fully.

Basically, I think the biggest problem is with calling functions that
change their behaviour if run in a coroutine compared to running them
outside of coroutine context. In most cases the differences like having
a nested event loop instead of yielding are just fine, but they are
still subtly different.

I know this is vague, but I can assure you that problematic cases exist.
I hit one of them with my initial hack that just moved everything into a
coroutine. It was related to graph modifications and bdrv_drain and
resulted in a hang. For the specifics, I would have to try and reproduce
the problem again.

Kevin

Markus Armbruster Jan. 22, 2020, 12:15 p.m. UTC | #3

Kevin Wolf <kwolf@redhat.com> writes:

> Am 22.01.2020 um 07:32 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > This patch adds a new 'coroutine' flag to QMP command definitions that
>> > tells the QMP dispatcher that the command handler is safe to be run in a
>> > coroutine.
>> 
>> I'm afraid I missed this question in my review of v3: when is a handler
>> *not* safe to be run in a coroutine?
>
> That's a hard one to answer fully.

You're welcome ;)

> Basically, I think the biggest problem is with calling functions that
> change their behaviour if run in a coroutine compared to running them
> outside of coroutine context. In most cases the differences like having
> a nested event loop instead of yielding are just fine, but they are
> still subtly different.
>
> I know this is vague, but I can assure you that problematic cases exist.
> I hit one of them with my initial hack that just moved everything into a
> coroutine. It was related to graph modifications and bdrv_drain and
> resulted in a hang. For the specifics, I would have to try and reproduce
> the problem again.

Interesting.

Is coroutine-incompatible command handler code necessary or accidental?

By "necessary" I mean there are (and likely always will be) commands
that need to do stuff that cannot or should not be done on coroutine
context.

"Accidental" is the opposite: coroutine-incompatibility can be regarded
as a fixable flaw.

How widespread is coroutine-incompatibility?  Common, or just a few
commands?

If coroutine-incompatibility is accidental, then your code to drop out
of coroutine context can be regarded as a temporary work-around.  Such
work-arounds receive a modest extra ugliness & complexity budget.

If coroutine-incompatibility is rare, we'll eventually want "mark the
known-bad ones with 'coroutine': false" instead of "mark the known-good
ones with 'coroutine': true".  I'm okay with marking the known-good ones
initially, and flipping only later.

Inability to recognize coroutine-incompatibility by inspection worries
me.  Can you think of ways to identify causes other than testing things
to death?

Kevin Wolf Jan. 22, 2020, 2:35 p.m. UTC | #4

Am 22.01.2020 um 13:15 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 22.01.2020 um 07:32 hat Markus Armbruster geschrieben:
> >> Kevin Wolf <kwolf@redhat.com> writes:
> >> 
> >> > This patch adds a new 'coroutine' flag to QMP command definitions that
> >> > tells the QMP dispatcher that the command handler is safe to be run in a
> >> > coroutine.
> >> 
> >> I'm afraid I missed this question in my review of v3: when is a handler
> >> *not* safe to be run in a coroutine?
> >
> > That's a hard one to answer fully.
> 
> You're welcome ;)
> 
> > Basically, I think the biggest problem is with calling functions that
> > change their behaviour if run in a coroutine compared to running them
> > outside of coroutine context. In most cases the differences like having
> > a nested event loop instead of yielding are just fine, but they are
> > still subtly different.
> >
> > I know this is vague, but I can assure you that problematic cases exist.
> > I hit one of them with my initial hack that just moved everything into a
> > coroutine. It was related to graph modifications and bdrv_drain and
> > resulted in a hang. For the specifics, I would have to try and reproduce
> > the problem again.
> 
> Interesting.
> 
> Is coroutine-incompatible command handler code necessary or accidental?
> 
> By "necessary" I mean there are (and likely always will be) commands
> that need to do stuff that cannot or should not be done on coroutine
> context.
> 
> "Accidental" is the opposite: coroutine-incompatibility can be regarded
> as a fixable flaw.

I expect it's mostly accidental, but maybe not all of it.

bdrv_drain() is an example of a function that has a problem with running
inside a coroutine: If you schedule another coroutine to run, it will
only run after the currently active coroutine yields. However,
bdrv_drain() doesn't yield, but runs a nested event loop, so this can
lead to deadlocks.

For this reason, bdrv_drain() has code to drop out of coroutine context,
so that you can call it from a coroutine anyway:

    /* Calling bdrv_drain() from a BH ensures the current coroutine yields and
     * other coroutines run if they were queued by aio_co_enter(). */

Now, after reading this, I think the problem in the QMP handler I
observerd was that it called some parts of the drain code without
dropping out of coroutine context first. Well possible that it was the
exact deadlock described by this comment.

(Even in bdrv_drain() it might not be strictly necessary. Maybe it's
possible to redesign bdrv_drain() so that it doesn't have a nested event
loop, but can somehow rely on the normal main loop. Touching it is
scary, though, so I'd rather not unless we have a good reason.)

> How widespread is coroutine-incompatibility?  Common, or just a few
> commands?

Answering this would require a massive code audit. Way out of scope for
this series. This is the reason why I decided to just make it optional
instead of trying to find and fix the cases that would need a fix if we
enabled coroutine handlers unconditionally.

> If coroutine-incompatibility is accidental, then your code to drop out
> of coroutine context can be regarded as a temporary work-around.  Such
> work-arounds receive a modest extra ugliness & complexity budget.
> 
> If coroutine-incompatibility is rare, we'll eventually want "mark the
> known-bad ones with 'coroutine': false" instead of "mark the known-good
> ones with 'coroutine': true".  I'm okay with marking the known-good ones
> initially, and flipping only later.
> 
> Inability to recognize coroutine-incompatibility by inspection worries
> me.  Can you think of ways to identify causes other than testing things
> to death?

I think it is possible to recognise by inspection, it would just be a
massive effort. Maybe a way to start it could be adding something like a
non_coroutine_fn marker to functions that we know don't want to be
called from coroutine context, and propagating it to their callers.

I guess I would start by looking for nested event loops (AIO_WAIT_WHILE
or BDRV_POLL_WHILE), and then you would have to decide for each whether
it could run into problems. I can't promise this will cover everything,
but it would at least cover the case known from bdrv_drain.

Kevin

Markus Armbruster Feb. 17, 2020, 7:08 a.m. UTC | #5

Kevin Wolf <kwolf@redhat.com> writes:

> This patch adds a new 'coroutine' flag to QMP command definitions that
> tells the QMP dispatcher that the command handler is safe to be run in a
> coroutine.
>
> The documentation of the new flag pretends that this flag is already
> used as intended, which it isn't yet after this patch. We'll implement
> this in another patch in this series.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  docs/devel/qapi-code-gen.txt            | 10 ++++++++++
>  include/qapi/qmp/dispatch.h             |  1 +
>  tests/test-qmp-cmds.c                   |  4 ++++
>  scripts/qapi/commands.py                | 10 +++++++---
>  scripts/qapi/doc.py                     |  2 +-
>  scripts/qapi/expr.py                    |  7 +++++--
>  scripts/qapi/introspect.py              |  2 +-
>  scripts/qapi/schema.py                  |  9 ++++++---
>  tests/qapi-schema/test-qapi.py          |  7 ++++---
>  tests/Makefile.include                  |  1 +
>  tests/qapi-schema/oob-coroutine.err     |  2 ++
>  tests/qapi-schema/oob-coroutine.json    |  2 ++
>  tests/qapi-schema/oob-coroutine.out     |  0
>  tests/qapi-schema/qapi-schema-test.json |  1 +
>  tests/qapi-schema/qapi-schema-test.out  |  2 ++
>  15 files changed, 47 insertions(+), 13 deletions(-)
>  create mode 100644 tests/qapi-schema/oob-coroutine.err
>  create mode 100644 tests/qapi-schema/oob-coroutine.json
>  create mode 100644 tests/qapi-schema/oob-coroutine.out
>
> diff --git a/docs/devel/qapi-code-gen.txt b/docs/devel/qapi-code-gen.txt
> index 59d6973e1e..9b65cd3ab3 100644
> --- a/docs/devel/qapi-code-gen.txt
> +++ b/docs/devel/qapi-code-gen.txt
> @@ -457,6 +457,7 @@ Syntax:
>                  '*gen': false,
>                  '*allow-oob': true,
>                  '*allow-preconfig': true,
> +                '*coroutine': true,
>                  '*if': COND,
>                  '*features': FEATURES }
>  
> @@ -581,6 +582,15 @@ before the machine is built.  It defaults to false.  For example:
>  QMP is available before the machine is built only when QEMU was
>  started with --preconfig.
>  
> +Member 'coroutine' tells the QMP dispatcher whether the command handler
> +is safe to be run in a coroutine.  It defaults to false.  If it is true,
> +the command handler is called from coroutine context and may yield while
> +waiting for an external event (such as I/O completion) in order to avoid
> +blocking the guest and other background operations.
> +
> +It is an error to specify both 'coroutine': true and 'allow-oob': true
> +for a command.
> +
>  The optional 'if' member specifies a conditional.  See "Configuring
>  the schema" below for more on this.
>  
> diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h
> index 9aa426a398..d6ce9efc8e 100644
> --- a/include/qapi/qmp/dispatch.h
> +++ b/include/qapi/qmp/dispatch.h
> @@ -24,6 +24,7 @@ typedef enum QmpCommandOptions
>      QCO_NO_SUCCESS_RESP       =  (1U << 0),
>      QCO_ALLOW_OOB             =  (1U << 1),
>      QCO_ALLOW_PRECONFIG       =  (1U << 2),
> +    QCO_COROUTINE             =  (1U << 3),
>  } QmpCommandOptions;
>  
>  typedef struct QmpCommand
> diff --git a/tests/test-qmp-cmds.c b/tests/test-qmp-cmds.c
> index 79507d9e54..6359cc28c7 100644
> --- a/tests/test-qmp-cmds.c
> +++ b/tests/test-qmp-cmds.c
> @@ -35,6 +35,10 @@ void qmp_cmd_success_response(Error **errp)
>  {
>  }
>  
> +void qmp_coroutine_cmd(Error **errp)
> +{
> +}
> +
>  Empty2 *qmp_user_def_cmd0(Error **errp)
>  {
>      return g_new0(Empty2, 1);
> diff --git a/scripts/qapi/commands.py b/scripts/qapi/commands.py
> index afa55b055c..f2f2f8948d 100644
> --- a/scripts/qapi/commands.py
> +++ b/scripts/qapi/commands.py
> @@ -194,7 +194,8 @@ out:
>      return ret
>  
>  
> -def gen_register_command(name, success_response, allow_oob, allow_preconfig):
> +def gen_register_command(name, success_response, allow_oob, allow_preconfig,
> +                         coroutine):
>      options = []
>  
>      if not success_response:
> @@ -203,6 +204,8 @@ def gen_register_command(name, success_response, allow_oob, allow_preconfig):
>          options += ['QCO_ALLOW_OOB']
>      if allow_preconfig:
>          options += ['QCO_ALLOW_PRECONFIG']
> +    if coroutine:
> +        options += ['QCO_COROUTINE']
>  
>      if not options:
>          options = ['QCO_NO_OPTIONS']
> @@ -285,7 +288,7 @@ void %(c_prefix)sqmp_init_marshal(QmpCommandList *cmds);
>  
>      def visit_command(self, name, info, ifcond, arg_type, ret_type, gen,
>                        success_response, boxed, allow_oob, allow_preconfig,
> -                      features):
> +                      coroutine, features):
>          if not gen:
>              return
>          # FIXME: If T is a user-defined type, the user is responsible
> @@ -303,7 +306,8 @@ void %(c_prefix)sqmp_init_marshal(QmpCommandList *cmds);
>              self._genh.add(gen_marshal_decl(name))
>              self._genc.add(gen_marshal(name, arg_type, boxed, ret_type))
>              self._regy.add(gen_register_command(name, success_response,
> -                                                allow_oob, allow_preconfig))
> +                                                allow_oob, allow_preconfig,
> +                                                coroutine))
>  
>  
>  def gen_commands(schema, output_dir, prefix):
> diff --git a/scripts/qapi/doc.py b/scripts/qapi/doc.py
> index 6f1c17f71f..8b6978c81e 100644
> --- a/scripts/qapi/doc.py
> +++ b/scripts/qapi/doc.py
> @@ -265,7 +265,7 @@ class QAPISchemaGenDocVisitor(QAPISchemaVisitor):
>  
>      def visit_command(self, name, info, ifcond, arg_type, ret_type, gen,
>                        success_response, boxed, allow_oob, allow_preconfig,
> -                      features):
> +                      coroutine, features):
>          doc = self.cur_doc
>          self._gen.add(texi_msg('Command', doc, ifcond,
>                                 texi_arguments(doc,
> diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
> index d7a289eded..769c54ebfe 100644
> --- a/scripts/qapi/expr.py
> +++ b/scripts/qapi/expr.py
> @@ -89,10 +89,13 @@ def check_flags(expr, info):
>          if key in expr and expr[key] is not False:
>              raise QAPISemError(
>                  info, "flag '%s' may only use false value" % key)
> -    for key in ['boxed', 'allow-oob', 'allow-preconfig']:
> +    for key in ['boxed', 'allow-oob', 'allow-preconfig', 'coroutine']:
>          if key in expr and expr[key] is not True:
>              raise QAPISemError(
>                  info, "flag '%s' may only use true value" % key)
> +    if 'allow-oob' in expr and 'coroutine' in expr:
> +        raise QAPISemError(info, "flags 'allow-oob' and 'coroutine' "
> +                                 "are incompatible")

Only because we didn't implement coroutine context for exec-oob, for
want of a use case.  I think we should note that somehow, to remind our
future selves that the two aren't actually fundamentally incompatible.
Could put the reminder into the error message, like "'coroutine': true
isn't implement with 'allow-oob': true".  A comment would do as well.

A similar reminder in qapi-code-gen.txt wouldn't hurt.

>  
>  
>  def check_if(expr, info, source):
> @@ -344,7 +347,7 @@ def check_exprs(exprs):
>                         ['command'],
>                         ['data', 'returns', 'boxed', 'if', 'features',
>                          'gen', 'success-response', 'allow-oob',
> -                        'allow-preconfig'])
> +                        'allow-preconfig', 'coroutine'])
>              normalize_members(expr.get('data'))
>              check_command(expr, info)
>          elif meta == 'event':
> diff --git a/scripts/qapi/introspect.py b/scripts/qapi/introspect.py
> index b3a463dd8b..8a296a69d6 100644
> --- a/scripts/qapi/introspect.py
> +++ b/scripts/qapi/introspect.py
> @@ -212,7 +212,7 @@ const QLitObject %(c_name)s = %(c_string)s;
>  
>      def visit_command(self, name, info, ifcond, arg_type, ret_type, gen,
>                        success_response, boxed, allow_oob, allow_preconfig,
> -                      features):
> +                      coroutine, features):
>          arg_type = arg_type or self._schema.the_empty_object_type
>          ret_type = ret_type or self._schema.the_empty_object_type
>          obj = {'arg-type': self._use_type(arg_type),
> diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
> index 0bfc5256fb..87d76b59d3 100644
> --- a/scripts/qapi/schema.py
> +++ b/scripts/qapi/schema.py
> @@ -128,7 +128,7 @@ class QAPISchemaVisitor(object):
>  
>      def visit_command(self, name, info, ifcond, arg_type, ret_type, gen,
>                        success_response, boxed, allow_oob, allow_preconfig,
> -                      features):
> +                      coroutine, features):
>          pass
>  
>      def visit_event(self, name, info, ifcond, arg_type, boxed):
> @@ -690,7 +690,7 @@ class QAPISchemaCommand(QAPISchemaEntity):
>  
>      def __init__(self, name, info, doc, ifcond, arg_type, ret_type,
>                   gen, success_response, boxed, allow_oob, allow_preconfig,
> -                 features):
> +                 coroutine, features):
>          QAPISchemaEntity.__init__(self, name, info, doc, ifcond, features)
>          assert not arg_type or isinstance(arg_type, str)
>          assert not ret_type or isinstance(ret_type, str)
> @@ -703,6 +703,7 @@ class QAPISchemaCommand(QAPISchemaEntity):
>          self.boxed = boxed
>          self.allow_oob = allow_oob
>          self.allow_preconfig = allow_preconfig
> +        self.coroutine = coroutine
>  
>      def check(self, schema):
>          QAPISchemaEntity.check(self, schema)
> @@ -745,7 +746,7 @@ class QAPISchemaCommand(QAPISchemaEntity):
>                                self.arg_type, self.ret_type,
>                                self.gen, self.success_response,
>                                self.boxed, self.allow_oob,
> -                              self.allow_preconfig,
> +                              self.allow_preconfig, self.coroutine,
>                                self.features)
>  
>  
> @@ -1043,6 +1044,7 @@ class QAPISchema(object):
>          boxed = expr.get('boxed', False)
>          allow_oob = expr.get('allow-oob', False)
>          allow_preconfig = expr.get('allow-preconfig', False)
> +        coroutine = expr.get('coroutine', False)
>          ifcond = expr.get('if')
>          features = expr.get('features', [])
>          if isinstance(data, OrderedDict):
> @@ -1054,6 +1056,7 @@ class QAPISchema(object):
>          self._def_entity(QAPISchemaCommand(name, info, doc, ifcond, data, rets,
>                                             gen, success_response,
>                                             boxed, allow_oob, allow_preconfig,
> +                                           coroutine,
>                                             self._make_features(features, info)))
>  
>      def _def_event(self, expr, info, doc):
> diff --git a/tests/qapi-schema/test-qapi.py b/tests/qapi-schema/test-qapi.py
> index bad14edb47..7a8e65188d 100755
> --- a/tests/qapi-schema/test-qapi.py
> +++ b/tests/qapi-schema/test-qapi.py
> @@ -70,12 +70,13 @@ class QAPISchemaTestVisitor(QAPISchemaVisitor):
>  
>      def visit_command(self, name, info, ifcond, arg_type, ret_type, gen,
>                        success_response, boxed, allow_oob, allow_preconfig,
> -                      features):
> +                      coroutine, features):
>          print('command %s %s -> %s'
>                % (name, arg_type and arg_type.name,
>                   ret_type and ret_type.name))
> -        print('    gen=%s success_response=%s boxed=%s oob=%s preconfig=%s'
> -              % (gen, success_response, boxed, allow_oob, allow_preconfig))
> +        print('    gen=%s success_response=%s boxed=%s oob=%s preconfig=%s%s'
> +              % (gen, success_response, boxed, allow_oob, allow_preconfig,
> +                 " coroutine=True" if coroutine else ""))
>          self._print_if(ifcond)
>          self._print_features(features)
>  
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index c6827ce8c2..d111389c03 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -286,6 +286,7 @@ qapi-schema += missing-type.json
>  qapi-schema += nested-struct-data.json
>  qapi-schema += nested-struct-data-invalid-dict.json
>  qapi-schema += non-objects.json
> +qapi-schema += oob-coroutine.json
>  qapi-schema += oob-test.json
>  qapi-schema += allow-preconfig-test.json
>  qapi-schema += pragma-doc-required-crap.json
> diff --git a/tests/qapi-schema/oob-coroutine.err b/tests/qapi-schema/oob-coroutine.err
> new file mode 100644
> index 0000000000..c01a4992bd
> --- /dev/null
> +++ b/tests/qapi-schema/oob-coroutine.err
> @@ -0,0 +1,2 @@
> +oob-coroutine.json: In command 'oob-command-1':
> +oob-coroutine.json:2: flags 'allow-oob' and 'coroutine' are incompatible
> diff --git a/tests/qapi-schema/oob-coroutine.json b/tests/qapi-schema/oob-coroutine.json
> new file mode 100644
> index 0000000000..0f67663bcd
> --- /dev/null
> +++ b/tests/qapi-schema/oob-coroutine.json
> @@ -0,0 +1,2 @@
> +# Check that incompatible flags allow-oob and coroutine are rejected
> +{ 'command': 'oob-command-1', 'allow-oob': true, 'coroutine': true }
> diff --git a/tests/qapi-schema/oob-coroutine.out b/tests/qapi-schema/oob-coroutine.out
> new file mode 100644
> index 0000000000..e69de29bb2

You remembered to cover the error case.  Appreciated!

> diff --git a/tests/qapi-schema/qapi-schema-test.json b/tests/qapi-schema/qapi-schema-test.json
> index 9abf175fe0..1a850fe171 100644
> --- a/tests/qapi-schema/qapi-schema-test.json
> +++ b/tests/qapi-schema/qapi-schema-test.json
> @@ -147,6 +147,7 @@
>    'returns': 'UserDefTwo' }
>  
>  { 'command': 'cmd-success-response', 'data': {}, 'success-response': false }
> +{ 'command': 'coroutine-cmd', 'data': {}, 'coroutine': true }
>  
>  # Returning a non-dictionary requires a name from the whitelist
>  { 'command': 'guest-get-time', 'data': {'a': 'int', '*b': 'int' },
> diff --git a/tests/qapi-schema/qapi-schema-test.out b/tests/qapi-schema/qapi-schema-test.out
> index 9bd3c4a490..fdc349991a 100644
> --- a/tests/qapi-schema/qapi-schema-test.out
> +++ b/tests/qapi-schema/qapi-schema-test.out
> @@ -203,6 +203,8 @@ command user_def_cmd2 q_obj_user_def_cmd2-arg -> UserDefTwo
>      gen=True success_response=True boxed=False oob=False preconfig=False
>  command cmd-success-response None -> None
>      gen=True success_response=False boxed=False oob=False preconfig=False
> +command coroutine-cmd None -> None
> +    gen=True success_response=True boxed=False oob=False preconfig=False coroutine=True
>  object q_obj_guest-get-time-arg
>      member a: int optional=False
>      member b: int optional=True

Preferably with the "not implemented" status clarified:
Reviewed-by: Markus Armbruster <armbru@redhat.com>

Markus Armbruster March 5, 2020, 3:30 p.m. UTC | #6

Kevin Wolf <kwolf@redhat.com> writes:

> Am 22.01.2020 um 07:32 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > This patch adds a new 'coroutine' flag to QMP command definitions that
>> > tells the QMP dispatcher that the command handler is safe to be run in a
>> > coroutine.
>> 
>> I'm afraid I missed this question in my review of v3: when is a handler
>> *not* safe to be run in a coroutine?
>
> That's a hard one to answer fully.
>
> Basically, I think the biggest problem is with calling functions that
> change their behaviour if run in a coroutine compared to running them
> outside of coroutine context. In most cases the differences like having
> a nested event loop instead of yielding are just fine, but they are
> still subtly different.
>
> I know this is vague, but I can assure you that problematic cases exist.
> I hit one of them with my initial hack that just moved everything into a
> coroutine. It was related to graph modifications and bdrv_drain and
> resulted in a hang. For the specifics, I would have to try and reproduce
> the problem again.

I'm afraid it's even more complicated.

Monitors (HMP and QMP) run in the main loop.  Before this patch, HMP and
QMP commands run start to finish, one after the other.

After this patch, QMP commands may elect to yield.  QMP commands still
run one after the other (the shared dispatcher ensures this even when we
have multiple QMP monitors).

However, *HMP* commands can now run interleaved with a coroutine-enabled
QMP command, i.e. between a yield and its re-enter.

Consider an HMP screendump running in the middle of a coroutine-enabled
QMP screendump, using Marc-André's patch.  As far as I can tell, it
works, because:

1. HMP does not run in a coroutine.  If it did, and both dumps dumped
the same @con, then it would overwrite con->screndump_co.  If we ever
decide to make HMP coroutine-capable so it doesn't block the main loop,
this will become unsafe in a nasty way.

2. graphic_hw_update() is safe to call even while another
graphic_hw_update() runs.  qxl_render_update() appears to ensure that
with the help of qxl->ssd.lock.

3. There is no 3[*].

My point is: this is a non-trivial argument.  Whether a QMP command
handler is safe to run in a coroutine depends on how it interacts with
all the stuff that can run interleaved with it.  Typically includes
itself via its HMP buddy.

[*] I'm less confident in 3. than in 1. and 2.

Kevin Wolf March 5, 2020, 4:06 p.m. UTC | #7

Am 05.03.2020 um 16:30 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 22.01.2020 um 07:32 hat Markus Armbruster geschrieben:
> >> Kevin Wolf <kwolf@redhat.com> writes:
> >> 
> >> > This patch adds a new 'coroutine' flag to QMP command definitions that
> >> > tells the QMP dispatcher that the command handler is safe to be run in a
> >> > coroutine.
> >> 
> >> I'm afraid I missed this question in my review of v3: when is a handler
> >> *not* safe to be run in a coroutine?
> >
> > That's a hard one to answer fully.
> >
> > Basically, I think the biggest problem is with calling functions that
> > change their behaviour if run in a coroutine compared to running them
> > outside of coroutine context. In most cases the differences like having
> > a nested event loop instead of yielding are just fine, but they are
> > still subtly different.
> >
> > I know this is vague, but I can assure you that problematic cases exist.
> > I hit one of them with my initial hack that just moved everything into a
> > coroutine. It was related to graph modifications and bdrv_drain and
> > resulted in a hang. For the specifics, I would have to try and reproduce
> > the problem again.
> 
> I'm afraid it's even more complicated.
> 
> Monitors (HMP and QMP) run in the main loop.  Before this patch, HMP and
> QMP commands run start to finish, one after the other.
> 
> After this patch, QMP commands may elect to yield.  QMP commands still
> run one after the other (the shared dispatcher ensures this even when we
> have multiple QMP monitors).
> 
> However, *HMP* commands can now run interleaved with a coroutine-enabled
> QMP command, i.e. between a yield and its re-enter.

I guess that's right. :-(

> Consider an HMP screendump running in the middle of a coroutine-enabled
> QMP screendump, using Marc-André's patch.  As far as I can tell, it
> works, because:
> 
> 1. HMP does not run in a coroutine.  If it did, and both dumps dumped
> the same @con, then it would overwrite con->screndump_co.  If we ever
> decide to make HMP coroutine-capable so it doesn't block the main loop,
> this will become unsafe in a nasty way.

At the same time, switching HMP to coroutines would give us an easy way
to fix the problem: Just use a CoMutex so that HMP and QMP commands
never run in parallel. Unless we actually _want_ to run both at the same
time for some commands, but I don't see why.

While we don't have this, maybe it's worth considering if there is
another simple way to achieve the same thing. Could QMP just suspend all
HMP monitors while it's executing a command? At first sight it seems
that this would work only for "interactive" monitors.

> 2. graphic_hw_update() is safe to call even while another
> graphic_hw_update() runs.  qxl_render_update() appears to ensure that
> with the help of qxl->ssd.lock.
> 
> 3. There is no 3[*].
> 
> My point is: this is a non-trivial argument.  Whether a QMP command
> handler is safe to run in a coroutine depends on how it interacts with
> all the stuff that can run interleaved with it.  Typically includes
> itself via its HMP buddy.

If the handler doesn't yield, it's still trivial. So I think my original
statement that with the existing QMP handlers, the problem is with code
that behaves different between coroutine and non-coroutine calls, is
still true because that is the only code that could possibly yield with
existing QMP command handlers.

Of course, you're right that handlers that actually can yield need to be
careful that they can deal with whatever happens until they are
reentered. And that seems to include HMP handlers.

Kevin

Markus Armbruster March 6, 2020, 7:25 a.m. UTC | #8

Kevin Wolf <kwolf@redhat.com> writes:

> Am 05.03.2020 um 16:30 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > Am 22.01.2020 um 07:32 hat Markus Armbruster geschrieben:
>> >> Kevin Wolf <kwolf@redhat.com> writes:
>> >> 
>> >> > This patch adds a new 'coroutine' flag to QMP command definitions that
>> >> > tells the QMP dispatcher that the command handler is safe to be run in a
>> >> > coroutine.
>> >> 
>> >> I'm afraid I missed this question in my review of v3: when is a handler
>> >> *not* safe to be run in a coroutine?
>> >
>> > That's a hard one to answer fully.
>> >
>> > Basically, I think the biggest problem is with calling functions that
>> > change their behaviour if run in a coroutine compared to running them
>> > outside of coroutine context. In most cases the differences like having
>> > a nested event loop instead of yielding are just fine, but they are
>> > still subtly different.
>> >
>> > I know this is vague, but I can assure you that problematic cases exist.
>> > I hit one of them with my initial hack that just moved everything into a
>> > coroutine. It was related to graph modifications and bdrv_drain and
>> > resulted in a hang. For the specifics, I would have to try and reproduce
>> > the problem again.
>> 
>> I'm afraid it's even more complicated.
>> 
>> Monitors (HMP and QMP) run in the main loop.  Before this patch, HMP and
>> QMP commands run start to finish, one after the other.
>> 
>> After this patch, QMP commands may elect to yield.  QMP commands still
>> run one after the other (the shared dispatcher ensures this even when we
>> have multiple QMP monitors).
>> 
>> However, *HMP* commands can now run interleaved with a coroutine-enabled
>> QMP command, i.e. between a yield and its re-enter.
>
> I guess that's right. :-(
>
>> Consider an HMP screendump running in the middle of a coroutine-enabled
>> QMP screendump, using Marc-André's patch.  As far as I can tell, it
>> works, because:
>> 
>> 1. HMP does not run in a coroutine.  If it did, and both dumps dumped
>> the same @con, then it would overwrite con->screndump_co.  If we ever
>> decide to make HMP coroutine-capable so it doesn't block the main loop,
>> this will become unsafe in a nasty way.
>
> At the same time, switching HMP to coroutines would give us an easy way
> to fix the problem: Just use a CoMutex so that HMP and QMP commands
> never run in parallel. Unless we actually _want_ to run both at the same
> time for some commands, but I don't see why.

As long as running QMP commands from *all* monitors one after the other
is good enough, I can't see why running HMP commands interleaved is
worth the risk.

> While we don't have this, maybe it's worth considering if there is
> another simple way to achieve the same thing. Could QMP just suspend all
> HMP monitors while it's executing a command? At first sight it seems
> that this would work only for "interactive" monitors.

I believe the non-"interactive" HMP code is used only by gdbstub.c.

I keep forgetting our GDB server stub creates an "HMP" monitor.
Possibly because I don't understand why.  Alex, Philippe, can you
enlighten me?

Aside: I figure the distribution of work between monitor_init(),
monitor_init_hmp() and monitor_init_qmp() could be improved.

>> 2. graphic_hw_update() is safe to call even while another
>> graphic_hw_update() runs.  qxl_render_update() appears to ensure that
>> with the help of qxl->ssd.lock.
>> 
>> 3. There is no 3[*].
>> 
>> My point is: this is a non-trivial argument.  Whether a QMP command
>> handler is safe to run in a coroutine depends on how it interacts with
>> all the stuff that can run interleaved with it.  Typically includes
>> itself via its HMP buddy.
>
> If the handler doesn't yield, it's still trivial. So I think my original
> statement that with the existing QMP handlers, the problem is with code
> that behaves different between coroutine and non-coroutine calls, is
> still true because that is the only code that could possibly yield with
> existing QMP command handlers.
>
> Of course, you're right that handlers that actually can yield need to be
> careful that they can deal with whatever happens until they are
> reentered. And that seems to include HMP handlers.
>
> Kevin

Kevin Wolf March 6, 2020, 9:52 a.m. UTC | #9

Am 06.03.2020 um 08:25 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 05.03.2020 um 16:30 hat Markus Armbruster geschrieben:
> >> Kevin Wolf <kwolf@redhat.com> writes:
> >> 
> >> > Am 22.01.2020 um 07:32 hat Markus Armbruster geschrieben:
> >> >> Kevin Wolf <kwolf@redhat.com> writes:
> >> >> 
> >> >> > This patch adds a new 'coroutine' flag to QMP command definitions that
> >> >> > tells the QMP dispatcher that the command handler is safe to be run in a
> >> >> > coroutine.
> >> >> 
> >> >> I'm afraid I missed this question in my review of v3: when is a handler
> >> >> *not* safe to be run in a coroutine?
> >> >
> >> > That's a hard one to answer fully.
> >> >
> >> > Basically, I think the biggest problem is with calling functions that
> >> > change their behaviour if run in a coroutine compared to running them
> >> > outside of coroutine context. In most cases the differences like having
> >> > a nested event loop instead of yielding are just fine, but they are
> >> > still subtly different.
> >> >
> >> > I know this is vague, but I can assure you that problematic cases exist.
> >> > I hit one of them with my initial hack that just moved everything into a
> >> > coroutine. It was related to graph modifications and bdrv_drain and
> >> > resulted in a hang. For the specifics, I would have to try and reproduce
> >> > the problem again.
> >> 
> >> I'm afraid it's even more complicated.
> >> 
> >> Monitors (HMP and QMP) run in the main loop.  Before this patch, HMP and
> >> QMP commands run start to finish, one after the other.
> >> 
> >> After this patch, QMP commands may elect to yield.  QMP commands still
> >> run one after the other (the shared dispatcher ensures this even when we
> >> have multiple QMP monitors).
> >> 
> >> However, *HMP* commands can now run interleaved with a coroutine-enabled
> >> QMP command, i.e. between a yield and its re-enter.
> >
> > I guess that's right. :-(
> >
> >> Consider an HMP screendump running in the middle of a coroutine-enabled
> >> QMP screendump, using Marc-André's patch.  As far as I can tell, it
> >> works, because:
> >> 
> >> 1. HMP does not run in a coroutine.  If it did, and both dumps dumped
> >> the same @con, then it would overwrite con->screndump_co.  If we ever
> >> decide to make HMP coroutine-capable so it doesn't block the main loop,
> >> this will become unsafe in a nasty way.
> >
> > At the same time, switching HMP to coroutines would give us an easy way
> > to fix the problem: Just use a CoMutex so that HMP and QMP commands
> > never run in parallel. Unless we actually _want_ to run both at the same
> > time for some commands, but I don't see why.
> 
> As long as running QMP commands from *all* monitors one after the other
> is good enough, I can't see why running HMP commands interleaved is
> worth the risk.

There is one exception, actually: Obviously, human-monitor-command must
allow HMP commands to run in parallel.

> > While we don't have this, maybe it's worth considering if there is
> > another simple way to achieve the same thing. Could QMP just suspend all
> > HMP monitors while it's executing a command? At first sight it seems
> > that this would work only for "interactive" monitors.
> 
> I believe the non-"interactive" HMP code is used only by gdbstub.c.

monitor_init_hmp() is called from (based on my block branch):

* monitor_init(): This is interactive.
* qemu_chr_new_noreplay(): Interactive, too.
* gdbserver_start(): Non-interactive.

There is also qmp_human_monitor_command(), which manually creates a
MonitorHMP without going through monitor_init_hmp(). It does call
monitor_data_init(), though. There are no additional callers of
monitor_data_init() and I think I would have added it to every creation
of a Monitor object when I did the QMP/HMP split of the struct.

So GDB and human-monitor-command should be the two non-interactive
cases.

> I keep forgetting our GDB server stub creates an "HMP" monitor.
> Possibly because I don't understand why.  Alex, Philippe, can you
> enlighten me?

I think you can send HMP commands from within gdb with it:

(gdb) tar rem:1234
Remote debugging using :1234
[...]
(gdb) monitor info block
ide1-cd0: [not inserted]
    Attached to:      /machine/unattached/device[23]
    Removable device: not locked, tray closed

floppy0: [not inserted]
    Attached to:      /machine/unattached/device[16]
    Removable device: not locked, tray closed

sd0: [not inserted]
    Removable device: not locked, tray closed

So we do want stop it from processing requests while a QMP command is
running.

Kevin

Markus Armbruster March 6, 2020, 12:38 p.m. UTC | #10

Kevin Wolf <kwolf@redhat.com> writes:

> Am 06.03.2020 um 08:25 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > Am 05.03.2020 um 16:30 hat Markus Armbruster geschrieben:
>> >> Kevin Wolf <kwolf@redhat.com> writes:
>> >> 
>> >> > Am 22.01.2020 um 07:32 hat Markus Armbruster geschrieben:
>> >> >> Kevin Wolf <kwolf@redhat.com> writes:
>> >> >> 
>> >> >> > This patch adds a new 'coroutine' flag to QMP command definitions that
>> >> >> > tells the QMP dispatcher that the command handler is safe to be run in a
>> >> >> > coroutine.
>> >> >> 
>> >> >> I'm afraid I missed this question in my review of v3: when is a handler
>> >> >> *not* safe to be run in a coroutine?
>> >> >
>> >> > That's a hard one to answer fully.
>> >> >
>> >> > Basically, I think the biggest problem is with calling functions that
>> >> > change their behaviour if run in a coroutine compared to running them
>> >> > outside of coroutine context. In most cases the differences like having
>> >> > a nested event loop instead of yielding are just fine, but they are
>> >> > still subtly different.
>> >> >
>> >> > I know this is vague, but I can assure you that problematic cases exist.
>> >> > I hit one of them with my initial hack that just moved everything into a
>> >> > coroutine. It was related to graph modifications and bdrv_drain and
>> >> > resulted in a hang. For the specifics, I would have to try and reproduce
>> >> > the problem again.
>> >> 
>> >> I'm afraid it's even more complicated.
>> >> 
>> >> Monitors (HMP and QMP) run in the main loop.  Before this patch, HMP and
>> >> QMP commands run start to finish, one after the other.
>> >> 
>> >> After this patch, QMP commands may elect to yield.  QMP commands still
>> >> run one after the other (the shared dispatcher ensures this even when we
>> >> have multiple QMP monitors).
>> >> 
>> >> However, *HMP* commands can now run interleaved with a coroutine-enabled
>> >> QMP command, i.e. between a yield and its re-enter.
>> >
>> > I guess that's right. :-(
>> >
>> >> Consider an HMP screendump running in the middle of a coroutine-enabled
>> >> QMP screendump, using Marc-André's patch.  As far as I can tell, it
>> >> works, because:
>> >> 
>> >> 1. HMP does not run in a coroutine.  If it did, and both dumps dumped
>> >> the same @con, then it would overwrite con->screndump_co.  If we ever
>> >> decide to make HMP coroutine-capable so it doesn't block the main loop,
>> >> this will become unsafe in a nasty way.
>> >
>> > At the same time, switching HMP to coroutines would give us an easy way
>> > to fix the problem: Just use a CoMutex so that HMP and QMP commands
>> > never run in parallel. Unless we actually _want_ to run both at the same
>> > time for some commands, but I don't see why.
>> 
>> As long as running QMP commands from *all* monitors one after the other
>> is good enough, I can't see why running HMP commands interleaved is
>> worth the risk.
>
> There is one exception, actually: Obviously, human-monitor-command must
> allow HMP commands to run in parallel.

Yes.

>> > While we don't have this, maybe it's worth considering if there is
>> > another simple way to achieve the same thing. Could QMP just suspend all
>> > HMP monitors while it's executing a command? At first sight it seems
>> > that this would work only for "interactive" monitors.
>> 
>> I believe the non-"interactive" HMP code is used only by gdbstub.c.
>
> monitor_init_hmp() is called from (based on my block branch):
>
> * monitor_init(): This is interactive.
> * qemu_chr_new_noreplay(): Interactive, too.
> * gdbserver_start(): Non-interactive.
>
> There is also qmp_human_monitor_command(), which manually creates a
> MonitorHMP without going through monitor_init_hmp(). It does call
> monitor_data_init(), though. There are no additional callers of
> monitor_data_init() and I think I would have added it to every creation
> of a Monitor object when I did the QMP/HMP split of the struct.
>
> So GDB and human-monitor-command should be the two non-interactive
> cases.

Yes.

>> I keep forgetting our GDB server stub creates an "HMP" monitor.
>> Possibly because I don't understand why.  Alex, Philippe, can you
>> enlighten me?
>
> I think you can send HMP commands from within gdb with it:
>
> (gdb) tar rem:1234
> Remote debugging using :1234
> [...]
> (gdb) monitor info block
> ide1-cd0: [not inserted]
>     Attached to:      /machine/unattached/device[23]
>     Removable device: not locked, tray closed
>
> floppy0: [not inserted]
>     Attached to:      /machine/unattached/device[16]
>     Removable device: not locked, tray closed
>
> sd0: [not inserted]
>     Removable device: not locked, tray closed

Argh!

Any idea where we define GDB command "monitor"?

> So we do want stop it from processing requests while a QMP command is
> running.

Then a slow QMP command can interfere with debugging.

Perhaps we can synchronize just the "monitor" command.

Kevin Wolf March 6, 2020, 2:26 p.m. UTC | #11

Am 06.03.2020 um 13:38 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 06.03.2020 um 08:25 hat Markus Armbruster geschrieben:
> >> Kevin Wolf <kwolf@redhat.com> writes:
> >> 
> >> > Am 05.03.2020 um 16:30 hat Markus Armbruster geschrieben:
> >> >> Kevin Wolf <kwolf@redhat.com> writes:
> >> >> 
> >> >> > Am 22.01.2020 um 07:32 hat Markus Armbruster geschrieben:
> >> >> >> Kevin Wolf <kwolf@redhat.com> writes:
> >> >> >> 
> >> >> >> > This patch adds a new 'coroutine' flag to QMP command definitions that
> >> >> >> > tells the QMP dispatcher that the command handler is safe to be run in a
> >> >> >> > coroutine.
> >> >> >> 
> >> >> >> I'm afraid I missed this question in my review of v3: when is a handler
> >> >> >> *not* safe to be run in a coroutine?
> >> >> >
> >> >> > That's a hard one to answer fully.
> >> >> >
> >> >> > Basically, I think the biggest problem is with calling functions that
> >> >> > change their behaviour if run in a coroutine compared to running them
> >> >> > outside of coroutine context. In most cases the differences like having
> >> >> > a nested event loop instead of yielding are just fine, but they are
> >> >> > still subtly different.
> >> >> >
> >> >> > I know this is vague, but I can assure you that problematic cases exist.
> >> >> > I hit one of them with my initial hack that just moved everything into a
> >> >> > coroutine. It was related to graph modifications and bdrv_drain and
> >> >> > resulted in a hang. For the specifics, I would have to try and reproduce
> >> >> > the problem again.
> >> >> 
> >> >> I'm afraid it's even more complicated.
> >> >> 
> >> >> Monitors (HMP and QMP) run in the main loop.  Before this patch, HMP and
> >> >> QMP commands run start to finish, one after the other.
> >> >> 
> >> >> After this patch, QMP commands may elect to yield.  QMP commands still
> >> >> run one after the other (the shared dispatcher ensures this even when we
> >> >> have multiple QMP monitors).
> >> >> 
> >> >> However, *HMP* commands can now run interleaved with a coroutine-enabled
> >> >> QMP command, i.e. between a yield and its re-enter.
> >> >
> >> > I guess that's right. :-(
> >> >
> >> >> Consider an HMP screendump running in the middle of a coroutine-enabled
> >> >> QMP screendump, using Marc-André's patch.  As far as I can tell, it
> >> >> works, because:
> >> >> 
> >> >> 1. HMP does not run in a coroutine.  If it did, and both dumps dumped
> >> >> the same @con, then it would overwrite con->screndump_co.  If we ever
> >> >> decide to make HMP coroutine-capable so it doesn't block the main loop,
> >> >> this will become unsafe in a nasty way.
> >> >
> >> > At the same time, switching HMP to coroutines would give us an easy way
> >> > to fix the problem: Just use a CoMutex so that HMP and QMP commands
> >> > never run in parallel. Unless we actually _want_ to run both at the same
> >> > time for some commands, but I don't see why.
> >> 
> >> As long as running QMP commands from *all* monitors one after the other
> >> is good enough, I can't see why running HMP commands interleaved is
> >> worth the risk.
> >
> > There is one exception, actually: Obviously, human-monitor-command must
> > allow HMP commands to run in parallel.
> 
> Yes.
> 
> >> > While we don't have this, maybe it's worth considering if there is
> >> > another simple way to achieve the same thing. Could QMP just suspend all
> >> > HMP monitors while it's executing a command? At first sight it seems
> >> > that this would work only for "interactive" monitors.
> >> 
> >> I believe the non-"interactive" HMP code is used only by gdbstub.c.
> >
> > monitor_init_hmp() is called from (based on my block branch):
> >
> > * monitor_init(): This is interactive.
> > * qemu_chr_new_noreplay(): Interactive, too.
> > * gdbserver_start(): Non-interactive.
> >
> > There is also qmp_human_monitor_command(), which manually creates a
> > MonitorHMP without going through monitor_init_hmp(). It does call
> > monitor_data_init(), though. There are no additional callers of
> > monitor_data_init() and I think I would have added it to every creation
> > of a Monitor object when I did the QMP/HMP split of the struct.
> >
> > So GDB and human-monitor-command should be the two non-interactive
> > cases.
> 
> Yes.
> 
> >> I keep forgetting our GDB server stub creates an "HMP" monitor.
> >> Possibly because I don't understand why.  Alex, Philippe, can you
> >> enlighten me?
> >
> > I think you can send HMP commands from within gdb with it:
> >
> > (gdb) tar rem:1234
> > Remote debugging using :1234
> > [...]
> > (gdb) monitor info block
> > ide1-cd0: [not inserted]
> >     Attached to:      /machine/unattached/device[23]
> >     Removable device: not locked, tray closed
> >
> > floppy0: [not inserted]
> >     Attached to:      /machine/unattached/device[16]
> >     Removable device: not locked, tray closed
> >
> > sd0: [not inserted]
> >     Removable device: not locked, tray closed
> 
> Argh!
> 
> Any idea where we define GDB command "monitor"?

Just following the s->mon_chr that is assigned in gdbserver_start(), it
looks like handle_query_rcmd() sends the HMP command to the monitor.

gdb_gen_query_table has a function pointer to handle_query_rcmd() for
the gdb protocol command "Rcmd", and I think this is what gdb will send
for the "monitor" command.

> > So we do want stop it from processing requests while a QMP command is
> > running.
> 
> Then a slow QMP command can interfere with debugging.
> 
> Perhaps we can synchronize just the "monitor" command.

I didn't mean stop processing of gdb commands, but only of HMP commands
submitted via gdb (which will probably still block gdb while it's
waiting for a response, but only if a "monitor" command was given).

This is probably still not trivial because so far we have no buffering
involved anywhere and handle_query_rcmd() (or probably the whole gdbstub
command processing) is synchronous. Maybe run a nested event loop until
the QMP command is done or something.

Kevin

[v4,1/4] qapi: Add a 'coroutine' flag for commands

Commit Message

Comments

Patch