diff mbox

[RFC] monitor: turn on Out-Of-Band by default again

Message ID 20180521084249.10640-1-peterx@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Peter Xu May 21, 2018, 8:42 a.m. UTC
We turned Out-Of-Band feature of monitors off for 2.12 release.  Now we
try to turn that on again.

Signed-off-by: Peter Xu <peterx@redhat.com>
--
Now OOB should be okay with all known tests (except iotest qcow2, since
it is still broken on master), and AFAIK now we should also be okay with
ARM+Libvirt (not testsed, but Eric Auger helped to verify that before
the release).  So I think it's now safe to turn OOB on again.  Please
feel free to test this against any of existing testsuites to see whether
it'll still break any stuff.  Thanks,

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 monitor.c        | 13 +++----------
 tests/qmp-test.c |  2 +-
 vl.c             |  9 ++++-----
 3 files changed, 8 insertions(+), 16 deletions(-)

Comments

Daniel P. Berrangé May 21, 2018, 8:59 a.m. UTC | #1
On Mon, May 21, 2018 at 04:42:49PM +0800, Peter Xu wrote:
> We turned Out-Of-Band feature of monitors off for 2.12 release.  Now we
> try to turn that on again.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> --
> Now OOB should be okay with all known tests (except iotest qcow2, since
> it is still broken on master), and AFAIK now we should also be okay with
> ARM+Libvirt (not testsed, but Eric Auger helped to verify that before
> the release).  So I think it's now safe to turn OOB on again.  Please
> feel free to test this against any of existing testsuites to see whether
> it'll still break any stuff.  Thanks,

Can you remind us what the problem was with ARM+Libvirt that this
caused ?

> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  monitor.c        | 13 +++----------
>  tests/qmp-test.c |  2 +-
>  vl.c             |  9 ++++-----
>  3 files changed, 8 insertions(+), 16 deletions(-)
> 
> diff --git a/monitor.c b/monitor.c
> index 46814af533..ce5cc5e34e 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -4560,16 +4560,9 @@ void monitor_init(Chardev *chr, int flags)
>      bool use_readline = flags & MONITOR_USE_READLINE;
>      bool use_oob = flags & MONITOR_USE_OOB;
>  
> -    if (use_oob) {
> -        if (CHARDEV_IS_MUX(chr)) {
> -            error_report("Monitor Out-Of-Band is not supported with "
> -                         "MUX typed chardev backend");
> -            exit(1);
> -        }
> -        if (use_readline) {
> -            error_report("Monitor Out-Of-band is only supported by QMP");
> -            exit(1);
> -        }
> +    if (CHARDEV_IS_MUX(chr)) {
> +        /* MUX is still not supported for Out-Of-Band */
> +        use_oob = false;
>      }
>  
>      monitor_data_init(mon, false, use_oob);
> diff --git a/tests/qmp-test.c b/tests/qmp-test.c
> index 88f867f8c0..c85a3964d9 100644
> --- a/tests/qmp-test.c
> +++ b/tests/qmp-test.c
> @@ -89,7 +89,7 @@ static void test_qmp_protocol(void)
>      g_assert(q);
>      test_version(qdict_get(q, "version"));
>      capabilities = qdict_get_qlist(q, "capabilities");
> -    g_assert(capabilities && qlist_empty(capabilities));
> +    g_assert(capabilities);
>      qobject_unref(resp);
>  
>      /* Test valid command before handshake */
> diff --git a/vl.c b/vl.c
> index 3b39bbd7a8..b71fb8eb25 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2394,6 +2394,10 @@ static int mon_init_func(void *opaque, QemuOpts *opts, Error **errp)
>          flags = MONITOR_USE_READLINE;
>      } else if (strcmp(mode, "control") == 0) {
>          flags = MONITOR_USE_CONTROL;
> +        /* Out-Of-Band is on by default */
> +        if (qemu_opt_get_bool(opts, "x-oob", 1)) {
> +            flags |= MONITOR_USE_OOB;
> +        }
>      } else {
>          error_report("unknown monitor mode \"%s\"", mode);
>          exit(1);
> @@ -2402,11 +2406,6 @@ static int mon_init_func(void *opaque, QemuOpts *opts, Error **errp)
>      if (qemu_opt_get_bool(opts, "pretty", 0))
>          flags |= MONITOR_USE_PRETTY;
>  
> -    /* OOB is off by default */
> -    if (qemu_opt_get_bool(opts, "x-oob", 0)) {
> -        flags |= MONITOR_USE_OOB;
> -    }
> -
>      chardev = qemu_opt_get(opts, "chardev");
>      chr = qemu_chr_find(chardev);
>      if (chr == NULL) {
> -- 
> 2.17.0
> 

Regards,
Daniel
Peter Xu May 21, 2018, 10:41 a.m. UTC | #2
On Mon, May 21, 2018 at 09:59:07AM +0100, Daniel P. Berrangé wrote:
> On Mon, May 21, 2018 at 04:42:49PM +0800, Peter Xu wrote:
> > We turned Out-Of-Band feature of monitors off for 2.12 release.  Now we
> > try to turn that on again.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > --
> > Now OOB should be okay with all known tests (except iotest qcow2, since
> > it is still broken on master), and AFAIK now we should also be okay with
> > ARM+Libvirt (not testsed, but Eric Auger helped to verify that before
> > the release).  So I think it's now safe to turn OOB on again.  Please
> > feel free to test this against any of existing testsuites to see whether
> > it'll still break any stuff.  Thanks,
> 
> Can you remind us what the problem was with ARM+Libvirt that this
> caused ?

Sure.  The regression reported by Eric Auger is:

http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html

The fix is:

https://patchwork.kernel.org/patch/10332289/

And the fix is in master already.

Regards,
Eric Blake May 21, 2018, 2:13 p.m. UTC | #3
On 05/21/2018 03:42 AM, Peter Xu wrote:
> We turned Out-Of-Band feature of monitors off for 2.12 release.  Now we
> try to turn that on again.

"try to turn" sounds weak, like you aren't sure of this patch.  If you 
aren't sure, then why should we feel safe in applying it?  This text is 
going in the permanent git history, so sound bold, rather than hesitant!

"We have resolved the issues from last time (commit 3fd2457d reverted by 
commit a4f90923):
- issue 1 ...
- issue 2 ...
So now we are ready to enable advertisement of the feature by default"

with better descriptions of the issues that you fixed (I can think of at 
least the fixes adding thread-safety to the current monitor, and fixing 
early use of the monitor before qmp_capabilities completes; there may 
also be other issues that you want to call out).

> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> --
> Now OOB should be okay with all known tests (except iotest qcow2, since
> it is still broken on master),

Which tests are still failing for you?  Ideally, you can still 
demonstrate that the tests not failing without this patch continue to 
pass with this patch, even if you call out the tests that have known 
issues to still be resolved.

> and AFAIK now we should also be okay with
> ARM+Libvirt (not testsed, but Eric Auger helped to verify that before
> the release).  So I think it's now safe to turn OOB on again.  Please
> feel free to test this against any of existing testsuites to see whether
> it'll still break any stuff.  Thanks,
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   monitor.c        | 13 +++----------
>   tests/qmp-test.c |  2 +-
>   vl.c             |  9 ++++-----
>   3 files changed, 8 insertions(+), 16 deletions(-)
> 
> diff --git a/monitor.c b/monitor.c
> index 46814af533..ce5cc5e34e 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -4560,16 +4560,9 @@ void monitor_init(Chardev *chr, int flags)
>       bool use_readline = flags & MONITOR_USE_READLINE;
>       bool use_oob = flags & MONITOR_USE_OOB;
>   
> -    if (use_oob) {
> -        if (CHARDEV_IS_MUX(chr)) {
> -            error_report("Monitor Out-Of-Band is not supported with "
> -                         "MUX typed chardev backend");
> -            exit(1);
> -        }
> -        if (use_readline) {
> -            error_report("Monitor Out-Of-band is only supported by QMP");
> -            exit(1);
> -        }
> +    if (CHARDEV_IS_MUX(chr)) {
> +        /* MUX is still not supported for Out-Of-Band */
> +        use_oob = false;

This isn't a mere reinstatement of 3fd2457d, but is now advertising OOB 
when using readline (which presumably is a synonym for using HMP).  Is 
that intentional?  If so, the commit message should mention it.

>       }
>   
>       monitor_data_init(mon, false, use_oob);
> diff --git a/tests/qmp-test.c b/tests/qmp-test.c
> index 88f867f8c0..c85a3964d9 100644
> --- a/tests/qmp-test.c
> +++ b/tests/qmp-test.c
> @@ -89,7 +89,7 @@ static void test_qmp_protocol(void)
>       g_assert(q);
>       test_version(qdict_get(q, "version"));
>       capabilities = qdict_get_qlist(q, "capabilities");
> -    g_assert(capabilities && qlist_empty(capabilities));
> +    g_assert(capabilities);
>       qobject_unref(resp);
>   
>       /* Test valid command before handshake */
> diff --git a/vl.c b/vl.c
> index 3b39bbd7a8..b71fb8eb25 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2394,6 +2394,10 @@ static int mon_init_func(void *opaque, QemuOpts *opts, Error **errp)
>           flags = MONITOR_USE_READLINE;
>       } else if (strcmp(mode, "control") == 0) {
>           flags = MONITOR_USE_CONTROL;
> +        /* Out-Of-Band is on by default */
> +        if (qemu_opt_get_bool(opts, "x-oob", 1)) {
> +            flags |= MONITOR_USE_OOB;
> +        }

Do we really still need the x-oob property, vs. outright deletion of 
this bandaid?  Then again, I guess keeping it for one more release makes 
it easier to forcefully turn things off for temporary testing when 
isolating whether OOB is a culprit in something breaking.

>       } else {
>           error_report("unknown monitor mode \"%s\"", mode);
>           exit(1);
> @@ -2402,11 +2406,6 @@ static int mon_init_func(void *opaque, QemuOpts *opts, Error **errp)
>       if (qemu_opt_get_bool(opts, "pretty", 0))
>           flags |= MONITOR_USE_PRETTY;
>   
> -    /* OOB is off by default */
> -    if (qemu_opt_get_bool(opts, "x-oob", 0)) {
> -        flags |= MONITOR_USE_OOB;
> -    }
> -
>       chardev = qemu_opt_get(opts, "chardev");
>       chr = qemu_chr_find(chardev);
>       if (chr == NULL) {
>
Peter Xu May 22, 2018, 3:39 a.m. UTC | #4
On Mon, May 21, 2018 at 09:13:06AM -0500, Eric Blake wrote:
> On 05/21/2018 03:42 AM, Peter Xu wrote:
> > We turned Out-Of-Band feature of monitors off for 2.12 release.  Now we
> > try to turn that on again.
> 
> "try to turn" sounds weak, like you aren't sure of this patch.  If you
> aren't sure, then why should we feel safe in applying it?  This text is
> going in the permanent git history, so sound bold, rather than hesitant!

Yeah I am not really strong at turn that on by default, that's why I
marked the patch as RFC.  I wanted to hear from your opinions.  For
now IMHO even with x-oob, postcopy can start to work with network
recovery, then the requirement from my part is done.  However I'm
thinking maybe we should still turn that on for all the people.  One
reason is that we already have the QMP capability negotiation so it
seems redundant (as you mentioned before), meanwhile exposing it to
broader users can let broader users to leverage this new bits directly
(meanwhile easier to expose potential issues for OOB too).

Meanwhile I'm not confident too on that there can be other test cases
that has not yet been run with Out-Of-Band, so even if we solved all
existing problems I can't be sure that no further test will broke.
However I don't see it a problem for merging, since AFAIU I can't
really know what will break again (if there is) unless we apply that
to master again... :)

> 
> "We have resolved the issues from last time (commit 3fd2457d reverted by
> commit a4f90923):
> - issue 1 ...
> - issue 2 ...
> So now we are ready to enable advertisement of the feature by default"
> 
> with better descriptions of the issues that you fixed (I can think of at
> least the fixes adding thread-safety to the current monitor, and fixing
> early use of the monitor before qmp_capabilities completes; there may also
> be other issues that you want to call out).

Some of the monitor patches are not really related to previous OOB
breakage, the only one that really matters should be the ARM+Libvirt
one, which I will definitely mention in my next post.  The rest
(including per-thread cur_mon, monitor thread-safety, etc.) should
mostly for future new commands of Out-Of-Band but not for now.  For
example, current OOB commands are rare, now they don't use the
get_fd()/set_fd() interface, then the mon_fdsets won't need to be
protected at all.  But we can't guarantee that new OOB commands won't
use them too, so we still need to protect them with locks.

> 
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > --
> > Now OOB should be okay with all known tests (except iotest qcow2, since
> > it is still broken on master),
> 
> Which tests are still failing for you?  Ideally, you can still demonstrate
> that the tests not failing without this patch continue to pass with this
> patch, even if you call out the tests that have known issues to still be
> resolved.

I didn't remember.  We can first settle down on whether we'd like to
turn on this default value, then I can perform this test for my next
post to make sure good tests won't break.

> 
> > and AFAIK now we should also be okay with
> > ARM+Libvirt (not testsed, but Eric Auger helped to verify that before
> > the release).  So I think it's now safe to turn OOB on again.  Please
> > feel free to test this against any of existing testsuites to see whether
> > it'll still break any stuff.  Thanks,
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   monitor.c        | 13 +++----------
> >   tests/qmp-test.c |  2 +-
> >   vl.c             |  9 ++++-----
> >   3 files changed, 8 insertions(+), 16 deletions(-)
> > 
> > diff --git a/monitor.c b/monitor.c
> > index 46814af533..ce5cc5e34e 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -4560,16 +4560,9 @@ void monitor_init(Chardev *chr, int flags)
> >       bool use_readline = flags & MONITOR_USE_READLINE;
> >       bool use_oob = flags & MONITOR_USE_OOB;
> > -    if (use_oob) {
> > -        if (CHARDEV_IS_MUX(chr)) {
> > -            error_report("Monitor Out-Of-Band is not supported with "
> > -                         "MUX typed chardev backend");
> > -            exit(1);
> > -        }
> > -        if (use_readline) {
> > -            error_report("Monitor Out-Of-band is only supported by QMP");
> > -            exit(1);
> > -        }
> > +    if (CHARDEV_IS_MUX(chr)) {
> > +        /* MUX is still not supported for Out-Of-Band */
> > +        use_oob = false;
> 
> This isn't a mere reinstatement of 3fd2457d, but is now advertising OOB when
> using readline (which presumably is a synonym for using HMP).  Is that
> intentional?  If so, the commit message should mention it.

At [1] below I directly moved the chunk into "mode=control" path, so
the QMP check is already there.  Here I turn OOB off explicitly for
MUX no matter HMP/QMP.  It should have the same affect as 3fd2457d.

> 
> >       }
> >       monitor_data_init(mon, false, use_oob);
> > diff --git a/tests/qmp-test.c b/tests/qmp-test.c
> > index 88f867f8c0..c85a3964d9 100644
> > --- a/tests/qmp-test.c
> > +++ b/tests/qmp-test.c
> > @@ -89,7 +89,7 @@ static void test_qmp_protocol(void)
> >       g_assert(q);
> >       test_version(qdict_get(q, "version"));
> >       capabilities = qdict_get_qlist(q, "capabilities");
> > -    g_assert(capabilities && qlist_empty(capabilities));
> > +    g_assert(capabilities);
> >       qobject_unref(resp);
> >       /* Test valid command before handshake */
> > diff --git a/vl.c b/vl.c
> > index 3b39bbd7a8..b71fb8eb25 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -2394,6 +2394,10 @@ static int mon_init_func(void *opaque, QemuOpts *opts, Error **errp)
> >           flags = MONITOR_USE_READLINE;
> >       } else if (strcmp(mode, "control") == 0) {
> >           flags = MONITOR_USE_CONTROL;
> > +        /* Out-Of-Band is on by default */
> > +        if (qemu_opt_get_bool(opts, "x-oob", 1)) {
> > +            flags |= MONITOR_USE_OOB;
> > +        }

[1]

> 
> Do we really still need the x-oob property, vs. outright deletion of this
> bandaid?  Then again, I guess keeping it for one more release makes it
> easier to forcefully turn things off for temporary testing when isolating
> whether OOB is a culprit in something breaking.

Yes, since we already have it, I didn't have strong opinion to remove
it, so I kept it.  Actually now we can't turn OOB off completely
already - it's fully embeded in QMP internal logic (e.g., now we'll
always have completely isolated parser, dispatcher, responder; we
can't go back to the old func-call ways any more).  So maybe I can
remove that parameter directly next time.

Thanks,

> 
> >       } else {
> >           error_report("unknown monitor mode \"%s\"", mode);
> >           exit(1);
> > @@ -2402,11 +2406,6 @@ static int mon_init_func(void *opaque, QemuOpts *opts, Error **errp)
> >       if (qemu_opt_get_bool(opts, "pretty", 0))
> >           flags |= MONITOR_USE_PRETTY;
> > -    /* OOB is off by default */
> > -    if (qemu_opt_get_bool(opts, "x-oob", 0)) {
> > -        flags |= MONITOR_USE_OOB;
> > -    }
> > -
> >       chardev = qemu_opt_get(opts, "chardev");
> >       chr = qemu_chr_find(chardev);
> >       if (chr == NULL) {
> > 
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
John Snow May 22, 2018, 6:40 p.m. UTC | #5
On 05/21/2018 10:13 AM, Eric Blake wrote:
> On 05/21/2018 03:42 AM, Peter Xu wrote:
>> We turned Out-Of-Band feature of monitors off for 2.12 release.  Now we
>> try to turn that on again.
> 
> "try to turn" sounds weak, like you aren't sure of this patch.  If you
> aren't sure, then why should we feel safe in applying it?  This text is
> going in the permanent git history, so sound bold, rather than hesitant!
> 
> "We have resolved the issues from last time (commit 3fd2457d reverted by
> commit a4f90923):
> - issue 1 ...
> - issue 2 ...
> So now we are ready to enable advertisement of the feature by default"
> 
> with better descriptions of the issues that you fixed (I can think of at
> least the fixes adding thread-safety to the current monitor, and fixing
> early use of the monitor before qmp_capabilities completes; there may
> also be other issues that you want to call out).
> 
>>
>> Signed-off-by: Peter Xu <peterx@redhat.com>
>> -- 
>> Now OOB should be okay with all known tests (except iotest qcow2, since
>> it is still broken on master),
> 
> Which tests are still failing for you?  Ideally, you can still
> demonstrate that the tests not failing without this patch continue to
> pass with this patch, even if you call out the tests that have known
> issues to still be resolved.
> 

Probably 91 and 169. If any others fail that's news to me.

>> and AFAIK now we should also be okay with
>> ARM+Libvirt (not testsed, but Eric Auger helped to verify that before
>> the release).  So I think it's now safe to turn OOB on again.  Please
>> feel free to test this against any of existing testsuites to see whether
>> it'll still break any stuff.  Thanks,
>>
>> Signed-off-by: Peter Xu <peterx@redhat.com>
>> ---
>>   monitor.c        | 13 +++----------
>>   tests/qmp-test.c |  2 +-
>>   vl.c             |  9 ++++-----
>>   3 files changed, 8 insertions(+), 16 deletions(-)
>>
>> diff --git a/monitor.c b/monitor.c
>> index 46814af533..ce5cc5e34e 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -4560,16 +4560,9 @@ void monitor_init(Chardev *chr, int flags)
>>       bool use_readline = flags & MONITOR_USE_READLINE;
>>       bool use_oob = flags & MONITOR_USE_OOB;
>>   -    if (use_oob) {
>> -        if (CHARDEV_IS_MUX(chr)) {
>> -            error_report("Monitor Out-Of-Band is not supported with "
>> -                         "MUX typed chardev backend");
>> -            exit(1);
>> -        }
>> -        if (use_readline) {
>> -            error_report("Monitor Out-Of-band is only supported by
>> QMP");
>> -            exit(1);
>> -        }
>> +    if (CHARDEV_IS_MUX(chr)) {
>> +        /* MUX is still not supported for Out-Of-Band */
>> +        use_oob = false;
> 
> This isn't a mere reinstatement of 3fd2457d, but is now advertising OOB
> when using readline (which presumably is a synonym for using HMP).  Is
> that intentional?  If so, the commit message should mention it.
> 
>>       }
>>         monitor_data_init(mon, false, use_oob);
>> diff --git a/tests/qmp-test.c b/tests/qmp-test.c
>> index 88f867f8c0..c85a3964d9 100644
>> --- a/tests/qmp-test.c
>> +++ b/tests/qmp-test.c
>> @@ -89,7 +89,7 @@ static void test_qmp_protocol(void)
>>       g_assert(q);
>>       test_version(qdict_get(q, "version"));
>>       capabilities = qdict_get_qlist(q, "capabilities");
>> -    g_assert(capabilities && qlist_empty(capabilities));
>> +    g_assert(capabilities);
>>       qobject_unref(resp);
>>         /* Test valid command before handshake */
>> diff --git a/vl.c b/vl.c
>> index 3b39bbd7a8..b71fb8eb25 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -2394,6 +2394,10 @@ static int mon_init_func(void *opaque, QemuOpts
>> *opts, Error **errp)
>>           flags = MONITOR_USE_READLINE;
>>       } else if (strcmp(mode, "control") == 0) {
>>           flags = MONITOR_USE_CONTROL;
>> +        /* Out-Of-Band is on by default */
>> +        if (qemu_opt_get_bool(opts, "x-oob", 1)) {
>> +            flags |= MONITOR_USE_OOB;
>> +        }
> 
> Do we really still need the x-oob property, vs. outright deletion of
> this bandaid?  Then again, I guess keeping it for one more release makes
> it easier to forcefully turn things off for temporary testing when
> isolating whether OOB is a culprit in something breaking.
> 
>>       } else {
>>           error_report("unknown monitor mode \"%s\"", mode);
>>           exit(1);
>> @@ -2402,11 +2406,6 @@ static int mon_init_func(void *opaque, QemuOpts
>> *opts, Error **errp)
>>       if (qemu_opt_get_bool(opts, "pretty", 0))
>>           flags |= MONITOR_USE_PRETTY;
>>   -    /* OOB is off by default */
>> -    if (qemu_opt_get_bool(opts, "x-oob", 0)) {
>> -        flags |= MONITOR_USE_OOB;
>> -    }
>> -
>>       chardev = qemu_opt_get(opts, "chardev");
>>       chr = qemu_chr_find(chardev);
>>       if (chr == NULL) {
>>
>
Markus Armbruster May 24, 2018, 7:08 a.m. UTC | #6
Peter Xu <peterx@redhat.com> writes:

> On Mon, May 21, 2018 at 09:13:06AM -0500, Eric Blake wrote:
>> On 05/21/2018 03:42 AM, Peter Xu wrote:
>> > We turned Out-Of-Band feature of monitors off for 2.12 release.  Now we
>> > try to turn that on again.
>> 
>> "try to turn" sounds weak, like you aren't sure of this patch.  If you
>> aren't sure, then why should we feel safe in applying it?  This text is
>> going in the permanent git history, so sound bold, rather than hesitant!
>
> Yeah I am not really strong at turn that on by default, that's why I
> marked the patch as RFC.  I wanted to hear from your opinions.  For
> now IMHO even with x-oob, postcopy can start to work with network
> recovery, then the requirement from my part is done.  However I'm
> thinking maybe we should still turn that on for all the people.  One
> reason is that we already have the QMP capability negotiation so it
> seems redundant (as you mentioned before), meanwhile exposing it to
> broader users can let broader users to leverage this new bits directly
> (meanwhile easier to expose potential issues for OOB too).
>
> Meanwhile I'm not confident too on that there can be other test cases
> that has not yet been run with Out-Of-Band, so even if we solved all
> existing problems I can't be sure that no further test will broke.
> However I don't see it a problem for merging, since AFAIU I can't
> really know what will break again (if there is) unless we apply that
> to master again... :)

The time to switch it on is now.  I don't think we can find remaining
issues (if any) unless we do.

The time for doubts is the day before rc0 ;)

>> "We have resolved the issues from last time (commit 3fd2457d reverted by
>> commit a4f90923):
>> - issue 1 ...
>> - issue 2 ...
>> So now we are ready to enable advertisement of the feature by default"
>> 
>> with better descriptions of the issues that you fixed (I can think of at
>> least the fixes adding thread-safety to the current monitor, and fixing
>> early use of the monitor before qmp_capabilities completes; there may also
>> be other issues that you want to call out).
>
> Some of the monitor patches are not really related to previous OOB
> breakage, the only one that really matters should be the ARM+Libvirt
> one, which I will definitely mention in my next post.  The rest
> (including per-thread cur_mon, monitor thread-safety, etc.) should
> mostly for future new commands of Out-Of-Band but not for now.  For
> example, current OOB commands are rare, now they don't use the
> get_fd()/set_fd() interface, then the mon_fdsets won't need to be
> protected at all.

A lock works only when all its critical sections are guaranteed to
execute quickly!  Remember, for OOB to work, all OOB commands must
execute quickly, always.  Rule of thumb: treat OOB command code like
realtime code.

>                    But we can't guarantee that new OOB commands won't
> use them too, so we still need to protect them with locks.

When creating or modifying an OOB command, you need to be extra careful
about shared state: you may have to add suitable synchronization.

Please add suitable warnings about use of allow-oob to
docs/qapi-code-gen.txt.  Perhaps even a separate document on OOB,
pointed to by qapi-code-gen.txt.

>> > 
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > --
>> > Now OOB should be okay with all known tests (except iotest qcow2, since
>> > it is still broken on master),
>> 
>> Which tests are still failing for you?  Ideally, you can still demonstrate
>> that the tests not failing without this patch continue to pass with this
>> patch, even if you call out the tests that have known issues to still be
>> resolved.
>
> I didn't remember.  We can first settle down on whether we'd like to
> turn on this default value, then I can perform this test for my next
> post to make sure good tests won't break.
>
>> 
>> > and AFAIK now we should also be okay with
>> > ARM+Libvirt (not testsed, but Eric Auger helped to verify that before
>> > the release).  So I think it's now safe to turn OOB on again.  Please
>> > feel free to test this against any of existing testsuites to see whether
>> > it'll still break any stuff.  Thanks,
>> > 
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> >   monitor.c        | 13 +++----------
>> >   tests/qmp-test.c |  2 +-
>> >   vl.c             |  9 ++++-----
>> >   3 files changed, 8 insertions(+), 16 deletions(-)
>> > 
>> > diff --git a/monitor.c b/monitor.c
>> > index 46814af533..ce5cc5e34e 100644
>> > --- a/monitor.c
>> > +++ b/monitor.c
>> > @@ -4560,16 +4560,9 @@ void monitor_init(Chardev *chr, int flags)
>> >       bool use_readline = flags & MONITOR_USE_READLINE;
>> >       bool use_oob = flags & MONITOR_USE_OOB;
>> > -    if (use_oob) {
>> > -        if (CHARDEV_IS_MUX(chr)) {
>> > -            error_report("Monitor Out-Of-Band is not supported with "
>> > -                         "MUX typed chardev backend");
>> > -            exit(1);
>> > -        }
>> > -        if (use_readline) {
>> > -            error_report("Monitor Out-Of-band is only supported by QMP");
>> > -            exit(1);
>> > -        }
>> > +    if (CHARDEV_IS_MUX(chr)) {
>> > +        /* MUX is still not supported for Out-Of-Band */
>> > +        use_oob = false;
>> 
>> This isn't a mere reinstatement of 3fd2457d, but is now advertising OOB when
>> using readline (which presumably is a synonym for using HMP).

It effectively is a synonm now, but exploiting it would be unclean.  The
clean test for HMP is !(flags & MONITOR_USE_CONTROL).

>>                                                                Is that
>> intentional?  If so, the commit message should mention it.
>
> At [1] below I directly moved the chunk into "mode=control" path, so
> the QMP check is already there.  Here I turn OOB off explicitly for
> MUX no matter HMP/QMP.  It should have the same affect as 3fd2457d.

Switching off OOB silently for mux'ed chardev is ugly.  But I don't have
better ideas.

What's keeping us from supporting OOB there?

Is anyone using QMP over a mux'ed chardev in anger?  I guess we don't
know.

>> >       }
>> >       monitor_data_init(mon, false, use_oob);
>> > diff --git a/tests/qmp-test.c b/tests/qmp-test.c
>> > index 88f867f8c0..c85a3964d9 100644
>> > --- a/tests/qmp-test.c
>> > +++ b/tests/qmp-test.c
>> > @@ -89,7 +89,7 @@ static void test_qmp_protocol(void)
>> >       g_assert(q);
>> >       test_version(qdict_get(q, "version"));
>> >       capabilities = qdict_get_qlist(q, "capabilities");
>> > -    g_assert(capabilities && qlist_empty(capabilities));
>> > +    g_assert(capabilities);
>> >       qobject_unref(resp);
>> >       /* Test valid command before handshake */

Let's check contents of @capabilities matches expectations.

>> > diff --git a/vl.c b/vl.c
>> > index 3b39bbd7a8..b71fb8eb25 100644
>> > --- a/vl.c
>> > +++ b/vl.c
>> > @@ -2394,6 +2394,10 @@ static int mon_init_func(void *opaque, QemuOpts *opts, Error **errp)
>> >           flags = MONITOR_USE_READLINE;
>> >       } else if (strcmp(mode, "control") == 0) {
>> >           flags = MONITOR_USE_CONTROL;
>> > +        /* Out-Of-Band is on by default */
>> > +        if (qemu_opt_get_bool(opts, "x-oob", 1)) {
>> > +            flags |= MONITOR_USE_OOB;
>> > +        }
>
> [1]
>
>> 
>> Do we really still need the x-oob property, vs. outright deletion of this
>> bandaid?  Then again, I guess keeping it for one more release makes it
>> easier to forcefully turn things off for temporary testing when isolating
>> whether OOB is a culprit in something breaking.
>
> Yes, since we already have it, I didn't have strong opinion to remove
> it, so I kept it.  Actually now we can't turn OOB off completely
> already - it's fully embeded in QMP internal logic (e.g., now we'll
> always have completely isolated parser, dispatcher, responder; we
> can't go back to the old func-call ways any more).  So maybe I can
> remove that parameter directly next time.

I'm leaning towards removing it, because:

* There's no test coverage for x-oob=off

* We already have a way to disable OOB: the QMP client can decline the
  capability

* MONITOR_USE_OOB makes monitor flags even uglier: of the four flag
  bits' sixteen combinations, only a few actually occur:

    MONITOR_USE_READLINE  on   off
    MONITOR_USE_CONTROL   off  on
    MONITOR_USE_PRETTY    N/A  either
    MONITOR_USE_OOB       N/A  on unless mux'ed

  That's just three genuine configurations (HMP, QMP, pretty QMP), if we
  ignore the "mux'ed can't do OOB, yet" bit.

  Without x-oob, we can drop MONITOR_USE_OOB, I think.

>
> Thanks,
>
>> 
>> >       } else {
>> >           error_report("unknown monitor mode \"%s\"", mode);
>> >           exit(1);
>> > @@ -2402,11 +2406,6 @@ static int mon_init_func(void *opaque, QemuOpts *opts, Error **errp)
>> >       if (qemu_opt_get_bool(opts, "pretty", 0))
>> >           flags |= MONITOR_USE_PRETTY;
>> > -    /* OOB is off by default */
>> > -    if (qemu_opt_get_bool(opts, "x-oob", 0)) {
>> > -        flags |= MONITOR_USE_OOB;
>> > -    }
>> > -
>> >       chardev = qemu_opt_get(opts, "chardev");
>> >       chr = qemu_chr_find(chardev);
>> >       if (chr == NULL) {
>> > 
>> 
>> -- 
>> Eric Blake, Principal Software Engineer
>> Red Hat, Inc.           +1-919-301-3266
>> Virtualization:  qemu.org | libvirt.org
Peter Xu May 30, 2018, 8:04 a.m. UTC | #7
On Tue, May 22, 2018 at 02:40:26PM -0400, John Snow wrote:
> 
> 
> On 05/21/2018 10:13 AM, Eric Blake wrote:
> > On 05/21/2018 03:42 AM, Peter Xu wrote:
> >> We turned Out-Of-Band feature of monitors off for 2.12 release.  Now we
> >> try to turn that on again.
> > 
> > "try to turn" sounds weak, like you aren't sure of this patch.  If you
> > aren't sure, then why should we feel safe in applying it?  This text is
> > going in the permanent git history, so sound bold, rather than hesitant!
> > 
> > "We have resolved the issues from last time (commit 3fd2457d reverted by
> > commit a4f90923):
> > - issue 1 ...
> > - issue 2 ...
> > So now we are ready to enable advertisement of the feature by default"
> > 
> > with better descriptions of the issues that you fixed (I can think of at
> > least the fixes adding thread-safety to the current monitor, and fixing
> > early use of the monitor before qmp_capabilities completes; there may
> > also be other issues that you want to call out).
> > 
> >>
> >> Signed-off-by: Peter Xu <peterx@redhat.com>
> >> -- 
> >> Now OOB should be okay with all known tests (except iotest qcow2, since
> >> it is still broken on master),
> > 
> > Which tests are still failing for you?  Ideally, you can still
> > demonstrate that the tests not failing without this patch continue to
> > pass with this patch, even if you call out the tests that have known
> > issues to still be resolved.
> > 
> 
> Probably 91 and 169. If any others fail that's news to me.

I just gave it a shot on my workstation too (./check -qcow2):

Not run: 045 059 064 070 075 076 077 078 081 083 084 088 092 093 094 101 106 109 113 116 119 123 128 131 135 136 146 148 149 160 162 171 173 175 199 207 210 3
Failures: 087 188 189 198 206
Failed 5 of 167 tests

I'm testing against master, e609fa7.

Thanks,
Peter Xu May 30, 2018, 12:10 p.m. UTC | #8
On Wed, May 30, 2018 at 04:04:58PM +0800, Peter Xu wrote:
> On Tue, May 22, 2018 at 02:40:26PM -0400, John Snow wrote:
> > 
> > 
> > On 05/21/2018 10:13 AM, Eric Blake wrote:
> > > On 05/21/2018 03:42 AM, Peter Xu wrote:
> > >> We turned Out-Of-Band feature of monitors off for 2.12 release.  Now we
> > >> try to turn that on again.
> > > 
> > > "try to turn" sounds weak, like you aren't sure of this patch.  If you
> > > aren't sure, then why should we feel safe in applying it?  This text is
> > > going in the permanent git history, so sound bold, rather than hesitant!
> > > 
> > > "We have resolved the issues from last time (commit 3fd2457d reverted by
> > > commit a4f90923):
> > > - issue 1 ...
> > > - issue 2 ...
> > > So now we are ready to enable advertisement of the feature by default"
> > > 
> > > with better descriptions of the issues that you fixed (I can think of at
> > > least the fixes adding thread-safety to the current monitor, and fixing
> > > early use of the monitor before qmp_capabilities completes; there may
> > > also be other issues that you want to call out).
> > > 
> > >>
> > >> Signed-off-by: Peter Xu <peterx@redhat.com>
> > >> -- 
> > >> Now OOB should be okay with all known tests (except iotest qcow2, since
> > >> it is still broken on master),
> > > 
> > > Which tests are still failing for you?  Ideally, you can still
> > > demonstrate that the tests not failing without this patch continue to
> > > pass with this patch, even if you call out the tests that have known
> > > issues to still be resolved.
> > > 
> > 
> > Probably 91 and 169. If any others fail that's news to me.
> 
> I just gave it a shot on my workstation too (./check -qcow2):
> 
> Not run: 045 059 064 070 075 076 077 078 081 083 084 088 092 093 094 101 106 109 113 116 119 123 128 131 135 136 146 148 149 160 162 171 173 175 199 207 210 3
> Failures: 087 188 189 198 206
> Failed 5 of 167 tests
> 
> I'm testing against master, e609fa7.

Hmm... I ran again the same test and the same master commit but this
time it passed all 167 tests on my laptop.  So I assume the previous 5
failures are unreproducable at least every time (or there might be
something wrong with my testbed).  I'll rerun some more times, and
when I post the OOB patch I'll cover all correct qcow2 tests.

Regards,
Stefan Hajnoczi June 1, 2018, 12:39 p.m. UTC | #9
On Thu, May 24, 2018 at 09:08:58AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, May 21, 2018 at 09:13:06AM -0500, Eric Blake wrote:
> >> On 05/21/2018 03:42 AM, Peter Xu wrote:
> >> > We turned Out-Of-Band feature of monitors off for 2.12 release.  Now we
> >> > try to turn that on again.
> >> 
> >> "try to turn" sounds weak, like you aren't sure of this patch.  If you
> >> aren't sure, then why should we feel safe in applying it?  This text is
> >> going in the permanent git history, so sound bold, rather than hesitant!
> >
> > Yeah I am not really strong at turn that on by default, that's why I
> > marked the patch as RFC.  I wanted to hear from your opinions.  For
> > now IMHO even with x-oob, postcopy can start to work with network
> > recovery, then the requirement from my part is done.  However I'm
> > thinking maybe we should still turn that on for all the people.  One
> > reason is that we already have the QMP capability negotiation so it
> > seems redundant (as you mentioned before), meanwhile exposing it to
> > broader users can let broader users to leverage this new bits directly
> > (meanwhile easier to expose potential issues for OOB too).
> >
> > Meanwhile I'm not confident too on that there can be other test cases
> > that has not yet been run with Out-Of-Band, so even if we solved all
> > existing problems I can't be sure that no further test will broke.
> > However I don't see it a problem for merging, since AFAIU I can't
> > really know what will break again (if there is) unless we apply that
> > to master again... :)
> 
> The time to switch it on is now.  I don't think we can find remaining
> issues (if any) unless we do.
> 
> The time for doubts is the day before rc0 ;)

I agree with Markus, let's enable OOB by default now.

The path to OOB has been long.  It's an invasive monitor change that
solves the postcopy recovery problem.  There have been lots of
complications but Peter has resolved them.

I personally don't think OOB is suitable for much else beyond postcopy
recovery (real-time code is a pain to write!), but we should switch to
this new code path and make sure it's solid for QEMU 3.0.

Stefan
diff mbox

Patch

diff --git a/monitor.c b/monitor.c
index 46814af533..ce5cc5e34e 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4560,16 +4560,9 @@  void monitor_init(Chardev *chr, int flags)
     bool use_readline = flags & MONITOR_USE_READLINE;
     bool use_oob = flags & MONITOR_USE_OOB;
 
-    if (use_oob) {
-        if (CHARDEV_IS_MUX(chr)) {
-            error_report("Monitor Out-Of-Band is not supported with "
-                         "MUX typed chardev backend");
-            exit(1);
-        }
-        if (use_readline) {
-            error_report("Monitor Out-Of-band is only supported by QMP");
-            exit(1);
-        }
+    if (CHARDEV_IS_MUX(chr)) {
+        /* MUX is still not supported for Out-Of-Band */
+        use_oob = false;
     }
 
     monitor_data_init(mon, false, use_oob);
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 88f867f8c0..c85a3964d9 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -89,7 +89,7 @@  static void test_qmp_protocol(void)
     g_assert(q);
     test_version(qdict_get(q, "version"));
     capabilities = qdict_get_qlist(q, "capabilities");
-    g_assert(capabilities && qlist_empty(capabilities));
+    g_assert(capabilities);
     qobject_unref(resp);
 
     /* Test valid command before handshake */
diff --git a/vl.c b/vl.c
index 3b39bbd7a8..b71fb8eb25 100644
--- a/vl.c
+++ b/vl.c
@@ -2394,6 +2394,10 @@  static int mon_init_func(void *opaque, QemuOpts *opts, Error **errp)
         flags = MONITOR_USE_READLINE;
     } else if (strcmp(mode, "control") == 0) {
         flags = MONITOR_USE_CONTROL;
+        /* Out-Of-Band is on by default */
+        if (qemu_opt_get_bool(opts, "x-oob", 1)) {
+            flags |= MONITOR_USE_OOB;
+        }
     } else {
         error_report("unknown monitor mode \"%s\"", mode);
         exit(1);
@@ -2402,11 +2406,6 @@  static int mon_init_func(void *opaque, QemuOpts *opts, Error **errp)
     if (qemu_opt_get_bool(opts, "pretty", 0))
         flags |= MONITOR_USE_PRETTY;
 
-    /* OOB is off by default */
-    if (qemu_opt_get_bool(opts, "x-oob", 0)) {
-        flags |= MONITOR_USE_OOB;
-    }
-
     chardev = qemu_opt_get(opts, "chardev");
     chr = qemu_chr_find(chardev);
     if (chr == NULL) {