diff mbox series

[12/15] iotests: Disable AQMP logging under non-debug modes

Message ID 20210917054047.2042843-13-jsnow@redhat.com (mailing list archive)
State New, archived
Headers show
Series Switch iotests to using Async QMP | expand

Commit Message

John Snow Sept. 17, 2021, 5:40 a.m. UTC
Disable the aqmp logger, which likes to (at the moment) print out
intermediate warnings and errors that cause session termination; disable
them so they don't interfere with the job output.

Leave any "CRITICAL" warnings enabled though, those are ones that we
should never see, no matter what.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 tests/qemu-iotests/iotests.py | 2 ++
 1 file changed, 2 insertions(+)

Comments

Hanna Reitz Sept. 17, 2021, 2:30 p.m. UTC | #1
On 17.09.21 07:40, John Snow wrote:
> Disable the aqmp logger, which likes to (at the moment) print out
> intermediate warnings and errors that cause session termination; disable
> them so they don't interfere with the job output.
>
> Leave any "CRITICAL" warnings enabled though, those are ones that we
> should never see, no matter what.

I mean, looks OK to me, but from what I understand (i.e. little), 
qmp_client doesn’t log CRITICAL messages, at least I can’t see any. Only 
ERRORs.

I guess I’m missing some CRITICAL messages in external functions called 
from qmp_client.py, but shouldn’t we still keep ERRORs?

Hanna

> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   tests/qemu-iotests/iotests.py | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 273d2777ae..47e5f9738b 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -1383,6 +1383,8 @@ def execute_setup_common(supported_fmts: Sequence[str] = (),
>       if debug:
>           sys.argv.remove('-d')
>       logging.basicConfig(level=(logging.DEBUG if debug else logging.WARN))
> +    if not debug:
> +        logging.getLogger("qemu.aqmp.qmp_client").setLevel(logging.CRITICAL)
>   
>       _verify_image_format(supported_fmts, unsupported_fmts)
>       _verify_protocol(supported_protocols, unsupported_protocols)
John Snow Sept. 18, 2021, 12:58 a.m. UTC | #2
On Fri, Sep 17, 2021 at 10:30 AM Hanna Reitz <hreitz@redhat.com> wrote:

> On 17.09.21 07:40, John Snow wrote:
> > Disable the aqmp logger, which likes to (at the moment) print out
> > intermediate warnings and errors that cause session termination; disable
> > them so they don't interfere with the job output.
> >
> > Leave any "CRITICAL" warnings enabled though, those are ones that we
> > should never see, no matter what.
>
> I mean, looks OK to me, but from what I understand (i.e. little),
> qmp_client doesn’t log CRITICAL messages, at least I can’t see any. Only
> ERRORs.
>
>
There's *one* critical message in protocol.py, used for a circumstance that
I *think* should be impossible. I do not think I currently use any WARNING
level statements.


> I guess I’m missing some CRITICAL messages in external functions called
> from qmp_client.py, but shouldn’t we still keep ERRORs?
>

...Mayyyyyybe?

The errors logged by AQMP are *almost always* raised as Exceptions
somewhere else, eventually. Sometimes when we encounter them in one
context, we need to save them and then re-raise them in a different
execution context. There's one good exception to this: My pal, EOFError.

If the reader context encounters EOF, it raises EOFError and this causes a
disconnect to be scheduled asynchronously. *Any* Exception that causes a
disconnect to be scheduled asynchronously is dutifully logged as an ERROR.
At this point in the code, we don't really know if the user of the library
considers this an "error" yet or not. I've waffled a lot on how exactly to
treat this circumstance. ...Hm, I guess that's really the only case where I
have an error that really ought to be suppressed. I suppose what I will do
here is: if the exception happens to be an EOFError I will drop the
severity of the log message down to INFO. I don't know why it takes being
challenged on this stuff to start thinking clearly about it, but here we
are. Thank you for your feedback :~)

--js
John Snow Sept. 18, 2021, 2:14 a.m. UTC | #3
On Fri, Sep 17, 2021 at 8:58 PM John Snow <jsnow@redhat.com> wrote:

>
>
> On Fri, Sep 17, 2021 at 10:30 AM Hanna Reitz <hreitz@redhat.com> wrote:
>
>> On 17.09.21 07:40, John Snow wrote:
>> > Disable the aqmp logger, which likes to (at the moment) print out
>> > intermediate warnings and errors that cause session termination; disable
>> > them so they don't interfere with the job output.
>> >
>> > Leave any "CRITICAL" warnings enabled though, those are ones that we
>> > should never see, no matter what.
>>
>> I mean, looks OK to me, but from what I understand (i.e. little),
>> qmp_client doesn’t log CRITICAL messages, at least I can’t see any. Only
>> ERRORs.
>>
>>
> There's *one* critical message in protocol.py, used for a circumstance
> that I *think* should be impossible. I do not think I currently use any
> WARNING level statements.
>
>
>> I guess I’m missing some CRITICAL messages in external functions called
>> from qmp_client.py, but shouldn’t we still keep ERRORs?
>>
>
> ...Mayyyyyybe?
>
> The errors logged by AQMP are *almost always* raised as Exceptions
> somewhere else, eventually. Sometimes when we encounter them in one
> context, we need to save them and then re-raise them in a different
> execution context. There's one good exception to this: My pal, EOFError.
>
> If the reader context encounters EOF, it raises EOFError and this causes a
> disconnect to be scheduled asynchronously. *Any* Exception that causes a
> disconnect to be scheduled asynchronously is dutifully logged as an ERROR.
> At this point in the code, we don't really know if the user of the library
> considers this an "error" yet or not. I've waffled a lot on how exactly to
> treat this circumstance. ...Hm, I guess that's really the only case where I
> have an error that really ought to be suppressed. I suppose what I will do
> here is: if the exception happens to be an EOFError I will drop the
> severity of the log message down to INFO. I don't know why it takes being
> challenged on this stuff to start thinking clearly about it, but here we
> are. Thank you for your feedback :~)
>
> --js
>

Oh, CI testing reminds me of why I am a liar here.

the mirror-top-perms test intentionally expects not to be able to connect,
but we're treated to these two additional lines of output:

+ERROR:qemu.aqmp.qmp_client.qemub-2536319:Negotiation failed: EOFError
+ERROR:qemu.aqmp.qmp_client.qemub-2536319:Failed to establish session:
EOFError

Uh. I guess a temporary suppression in mirror-top-perms, then ...? In most
other cases I rather imagine we do want to see this kind of output to help
give more information about how things have failed -- they ARE errors. We
just happen to not care about them right here. The only thing I don't
exactly like about this is that it requires some knowledge by the caller to
know to disable it. It makes writing negative tests a bit more annoying
because the library leans so heavily on yelling loudly when it encounters
problems.
Hanna Reitz Oct. 4, 2021, 9:52 a.m. UTC | #4
On 18.09.21 02:58, John Snow wrote:
>
>
> On Fri, Sep 17, 2021 at 10:30 AM Hanna Reitz <hreitz@redhat.com 
> <mailto:hreitz@redhat.com>> wrote:
>
>     On 17.09.21 07:40, John Snow wrote:
>     > Disable the aqmp logger, which likes to (at the moment) print out
>     > intermediate warnings and errors that cause session termination;
>     disable
>     > them so they don't interfere with the job output.
>     >
>     > Leave any "CRITICAL" warnings enabled though, those are ones that we
>     > should never see, no matter what.
>
>     I mean, looks OK to me, but from what I understand (i.e. little),
>     qmp_client doesn’t log CRITICAL messages, at least I can’t see
>     any. Only
>     ERRORs.
>
>
> There's *one* critical message in protocol.py, used for a circumstance 
> that I *think* should be impossible. I do not think I currently use 
> any WARNING level statements.
>
>     I guess I’m missing some CRITICAL messages in external functions
>     called
>     from qmp_client.py, but shouldn’t we still keep ERRORs?
>
>
> ...Mayyyyyybe?
>
> The errors logged by AQMP are *almost always* raised as Exceptions 
> somewhere else, eventually. Sometimes when we encounter them in one 
> context, we need to save them and then re-raise them in a different 
> execution context. There's one good exception to this: My pal, EOFError.
>
> If the reader context encounters EOF, it raises EOFError and this 
> causes a disconnect to be scheduled asynchronously. *Any* Exception 
> that causes a disconnect to be scheduled asynchronously is dutifully 
> logged as an ERROR. At this point in the code, we don't really know if 
> the user of the library considers this an "error" yet or not. I've 
> waffled a lot on how exactly to treat this circumstance. ...Hm, I 
> guess that's really the only case where I have an error that really 
> ought to be suppressed. I suppose what I will do here is: if the 
> exception happens to be an EOFError I will drop the severity of the 
> log message down to INFO. I don't know why it takes being challenged 
> on this stuff to start thinking clearly about it, but here we are. 
> Thank you for your feedback :~)

Errr... You’re welcome!! *cough*

Hanna
Hanna Reitz Oct. 4, 2021, 10:12 a.m. UTC | #5
On 18.09.21 04:14, John Snow wrote:
>
>
> On Fri, Sep 17, 2021 at 8:58 PM John Snow <jsnow@redhat.com 
> <mailto:jsnow@redhat.com>> wrote:
>
>
>
>     On Fri, Sep 17, 2021 at 10:30 AM Hanna Reitz <hreitz@redhat.com
>     <mailto:hreitz@redhat.com>> wrote:
>
>         On 17.09.21 07:40, John Snow wrote:
>         > Disable the aqmp logger, which likes to (at the moment)
>         print out
>         > intermediate warnings and errors that cause session
>         termination; disable
>         > them so they don't interfere with the job output.
>         >
>         > Leave any "CRITICAL" warnings enabled though, those are ones
>         that we
>         > should never see, no matter what.
>
>         I mean, looks OK to me, but from what I understand (i.e. little),
>         qmp_client doesn’t log CRITICAL messages, at least I can’t see
>         any. Only
>         ERRORs.
>
>
>     There's *one* critical message in protocol.py, used for a
>     circumstance that I *think* should be impossible. I do not think I
>     currently use any WARNING level statements.
>
>         I guess I’m missing some CRITICAL messages in external
>         functions called
>         from qmp_client.py, but shouldn’t we still keep ERRORs?
>
>
>     ...Mayyyyyybe?
>
>     The errors logged by AQMP are *almost always* raised as Exceptions
>     somewhere else, eventually. Sometimes when we encounter them in
>     one context, we need to save them and then re-raise them in a
>     different execution context. There's one good exception to this:
>     My pal, EOFError.
>
>     If the reader context encounters EOF, it raises EOFError and this
>     causes a disconnect to be scheduled asynchronously. *Any*
>     Exception that causes a disconnect to be scheduled asynchronously
>     is dutifully logged as an ERROR. At this point in the code, we
>     don't really know if the user of the library considers this an
>     "error" yet or not. I've waffled a lot on how exactly to treat
>     this circumstance. ...Hm, I guess that's really the only case
>     where I have an error that really ought to be suppressed. I
>     suppose what I will do here is: if the exception happens to be an
>     EOFError I will drop the severity of the log message down to INFO.
>     I don't know why it takes being challenged on this stuff to start
>     thinking clearly about it, but here we are. Thank you for your
>     feedback :~)
>
>     --js
>
>
> Oh, CI testing reminds me of why I am a liar here.
>
> the mirror-top-perms test intentionally expects not to be able to 
> connect, but we're treated to these two additional lines of output:
>
> +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Negotiation failed: EOFError
> +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Failed to establish session: 
> EOFError
>
> Uh. I guess a temporary suppression in mirror-top-perms, then ...?

Sounds right to me, if that’s simple enough.

(By the way, I understand it right that you want to lower the severity 
of EOFErrors to INFO only on disconnect, right?  Which is why they’re 
still logged as ERRORs here, because they aren’t occurring on disconnects?)

> In most other cases I rather imagine we do want to see this kind of 
> output to help give more information about how things have failed -- 
> they ARE errors. We just happen to not care about them right here.

Well, in fact we do expect them here, so we could even log them, but 
first I don’t know whether they’re stable enough for that, and second 
this would break the “choose the AQMP or legacy QMP back-end via an 
environment variable” thing.

> The only thing I don't exactly like about this is that it requires 
> some knowledge by the caller to know to disable it. It makes writing 
> negative tests a bit more annoying because the library leans so 
> heavily on yelling loudly when it encounters problems.

Yeah.  I’m afraid I don’t have a good idea on how to convey test writers 
how to suppress these errors, even if it were a simple one-liner (like 
`self.vm_b.set_log_threshold(logging.CRITICAL)`)...

We could start an “iotests tips and tricks” document, but do we want to?

Hanna
John Snow Oct. 4, 2021, 6:32 p.m. UTC | #6
On Mon, Oct 4, 2021 at 6:12 AM Hanna Reitz <hreitz@redhat.com> wrote:

> On 18.09.21 04:14, John Snow wrote:
> >
> >
> > On Fri, Sep 17, 2021 at 8:58 PM John Snow <jsnow@redhat.com
> > <mailto:jsnow@redhat.com>> wrote:
> >
> >
> >
> >     On Fri, Sep 17, 2021 at 10:30 AM Hanna Reitz <hreitz@redhat.com
> >     <mailto:hreitz@redhat.com>> wrote:
> >
> >         On 17.09.21 07:40, John Snow wrote:
> >         > Disable the aqmp logger, which likes to (at the moment)
> >         print out
> >         > intermediate warnings and errors that cause session
> >         termination; disable
> >         > them so they don't interfere with the job output.
> >         >
> >         > Leave any "CRITICAL" warnings enabled though, those are ones
> >         that we
> >         > should never see, no matter what.
> >
> >         I mean, looks OK to me, but from what I understand (i.e. little),
> >         qmp_client doesn’t log CRITICAL messages, at least I can’t see
> >         any. Only
> >         ERRORs.
> >
> >
> >     There's *one* critical message in protocol.py, used for a
> >     circumstance that I *think* should be impossible. I do not think I
> >     currently use any WARNING level statements.
> >
> >         I guess I’m missing some CRITICAL messages in external
> >         functions called
> >         from qmp_client.py, but shouldn’t we still keep ERRORs?
> >
> >
> >     ...Mayyyyyybe?
> >
> >     The errors logged by AQMP are *almost always* raised as Exceptions
> >     somewhere else, eventually. Sometimes when we encounter them in
> >     one context, we need to save them and then re-raise them in a
> >     different execution context. There's one good exception to this:
> >     My pal, EOFError.
> >
> >     If the reader context encounters EOF, it raises EOFError and this
> >     causes a disconnect to be scheduled asynchronously. *Any*
> >     Exception that causes a disconnect to be scheduled asynchronously
> >     is dutifully logged as an ERROR. At this point in the code, we
> >     don't really know if the user of the library considers this an
> >     "error" yet or not. I've waffled a lot on how exactly to treat
> >     this circumstance. ...Hm, I guess that's really the only case
> >     where I have an error that really ought to be suppressed. I
> >     suppose what I will do here is: if the exception happens to be an
> >     EOFError I will drop the severity of the log message down to INFO.
> >     I don't know why it takes being challenged on this stuff to start
> >     thinking clearly about it, but here we are. Thank you for your
> >     feedback :~)
> >
> >     --js
> >
> >
> > Oh, CI testing reminds me of why I am a liar here.
> >
> > the mirror-top-perms test intentionally expects not to be able to
> > connect, but we're treated to these two additional lines of output:
> >
> > +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Negotiation failed: EOFError
> > +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Failed to establish session:
> > EOFError
> >
> > Uh. I guess a temporary suppression in mirror-top-perms, then ...?
>
> Sounds right to me, if that’s simple enough.
>
> (By the way, I understand it right that you want to lower the severity
> of EOFErrors to INFO only on disconnect, right?  Which is why they’re
> still logged as ERRORs here, because they aren’t occurring on disconnects?)
>
>
More or less, yeah.

When an EOFError causes the reader coroutine to halt (because it can't read
the next message), I decided (in v2) to drop that one particular logging
message down to "INFO", because it might -- or might not be -- an expected
occurrence from the point of view of whoever is managing the QMP
connection. Maybe it was expected (The test used qemu-guest-agent or
something else to make the guest shutdown, taking QEMU down with it without
the knowledge of the QMP library layer) or maybe it was unexpected (the QMP
remote really just disappeared from us on a whim). There's no way to know,
so it probably isn't right to consider it an error.

In the connection case, I left it as an ERROR because the caller asked us
to connect to an endpoint and we were unable to, which feels unambiguous.
It will be ultimately reported via Exceptions as qemu.aqmp.ConnectError,
with additional information available in fields of that exception object.
Even though the exception is reported to the caller, I decided to log the
occurrence anyway, because I felt like it should be the job of the library
to make a good log and not the caller's responsibility to catch the
exception and then log it themselves.

That does leave us with this atypical case though: the caller is
intentionally causing problems :)

(Well, atypical for a user of the library who is using it to make a tool
they expect to work. Quite a typical case for tests in the abstract, though
we only seem to have precisely one usage of this pattern in iotests so far.)


> > In most other cases I rather imagine we do want to see this kind of
> > output to help give more information about how things have failed --
> > they ARE errors. We just happen to not care about them right here.
>
> Well, in fact we do expect them here, so we could even log them, but
> first I don’t know whether they’re stable enough for that, and second
> this would break the “choose the AQMP or legacy QMP back-end via an
> environment variable” thing.
>
>
Yeah, that's the other half of it, I came to realize. Just logging the
expected failure feels the most idiomatic, but it requires a new logging
filter (I have to filter out the PID from the logger name to make it work
consistently) and breaks the ability to switch.

So I suppose "for now" just disabling the logger output for a critical
section and leaving a comment that states that once we're feeling confident
about the new library (maybe after the next release when everything has
really gone through the testing wash cycle) we can remove the suppression
and just log the errors the normal way.


> > The only thing I don't exactly like about this is that it requires
> > some knowledge by the caller to know to disable it. It makes writing
> > negative tests a bit more annoying because the library leans so
> > heavily on yelling loudly when it encounters problems.
>
> Yeah.  I’m afraid I don’t have a good idea on how to convey test writers
> how to suppress these errors, even if it were a simple one-liner (like
> `self.vm_b.set_log_threshold(logging.CRITICAL)`)...
>
> We could start an “iotests tips and tricks” document, but do we want to?
>
> Hanna
>
>
Hm, not just yet. More chances for un-checked stuff to bitrot. There's only
this one caller that poses a problem here, so it's probably not going to be
too difficult to just update tests to expect the error logs.

V2 soon. I already respun it, but it's been a week, so I will need to scrub
it down with a good proof-reading pass. Thanks for your patience, and I
hope you enjoyed your PTO :~)

--js
John Snow Oct. 4, 2021, 9:26 p.m. UTC | #7
On Mon, Oct 4, 2021 at 2:32 PM John Snow <jsnow@redhat.com> wrote:

>
>
> On Mon, Oct 4, 2021 at 6:12 AM Hanna Reitz <hreitz@redhat.com> wrote:
>
>> On 18.09.21 04:14, John Snow wrote:
>> >
>> >
>> > On Fri, Sep 17, 2021 at 8:58 PM John Snow <jsnow@redhat.com
>> > <mailto:jsnow@redhat.com>> wrote:
>> >
>> >
>> >
>> >     On Fri, Sep 17, 2021 at 10:30 AM Hanna Reitz <hreitz@redhat.com
>> >     <mailto:hreitz@redhat.com>> wrote:
>> >
>> >         On 17.09.21 07:40, John Snow wrote:
>> >         > Disable the aqmp logger, which likes to (at the moment)
>> >         print out
>> >         > intermediate warnings and errors that cause session
>> >         termination; disable
>> >         > them so they don't interfere with the job output.
>> >         >
>> >         > Leave any "CRITICAL" warnings enabled though, those are ones
>> >         that we
>> >         > should never see, no matter what.
>> >
>> >         I mean, looks OK to me, but from what I understand (i.e.
>> little),
>> >         qmp_client doesn’t log CRITICAL messages, at least I can’t see
>> >         any. Only
>> >         ERRORs.
>> >
>> >
>> >     There's *one* critical message in protocol.py, used for a
>> >     circumstance that I *think* should be impossible. I do not think I
>> >     currently use any WARNING level statements.
>> >
>> >         I guess I’m missing some CRITICAL messages in external
>> >         functions called
>> >         from qmp_client.py, but shouldn’t we still keep ERRORs?
>> >
>> >
>> >     ...Mayyyyyybe?
>> >
>> >     The errors logged by AQMP are *almost always* raised as Exceptions
>> >     somewhere else, eventually. Sometimes when we encounter them in
>> >     one context, we need to save them and then re-raise them in a
>> >     different execution context. There's one good exception to this:
>> >     My pal, EOFError.
>> >
>> >     If the reader context encounters EOF, it raises EOFError and this
>> >     causes a disconnect to be scheduled asynchronously. *Any*
>> >     Exception that causes a disconnect to be scheduled asynchronously
>> >     is dutifully logged as an ERROR. At this point in the code, we
>> >     don't really know if the user of the library considers this an
>> >     "error" yet or not. I've waffled a lot on how exactly to treat
>> >     this circumstance. ...Hm, I guess that's really the only case
>> >     where I have an error that really ought to be suppressed. I
>> >     suppose what I will do here is: if the exception happens to be an
>> >     EOFError I will drop the severity of the log message down to INFO.
>> >     I don't know why it takes being challenged on this stuff to start
>> >     thinking clearly about it, but here we are. Thank you for your
>> >     feedback :~)
>> >
>> >     --js
>> >
>> >
>> > Oh, CI testing reminds me of why I am a liar here.
>> >
>> > the mirror-top-perms test intentionally expects not to be able to
>> > connect, but we're treated to these two additional lines of output:
>> >
>> > +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Negotiation failed: EOFError
>> > +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Failed to establish session:
>> > EOFError
>> >
>> > Uh. I guess a temporary suppression in mirror-top-perms, then ...?
>>
>> Sounds right to me, if that’s simple enough.
>>
>> (By the way, I understand it right that you want to lower the severity
>> of EOFErrors to INFO only on disconnect, right?  Which is why they’re
>> still logged as ERRORs here, because they aren’t occurring on
>> disconnects?)
>>
>>
> More or less, yeah.
>
> When an EOFError causes the reader coroutine to halt (because it can't
> read the next message), I decided (in v2) to drop that one particular
> logging message down to "INFO", because it might -- or might not be -- an
> expected occurrence from the point of view of whoever is managing the QMP
> connection. Maybe it was expected (The test used qemu-guest-agent or
> something else to make the guest shutdown, taking QEMU down with it without
> the knowledge of the QMP library layer) or maybe it was unexpected (the QMP
> remote really just disappeared from us on a whim). There's no way to know,
> so it probably isn't right to consider it an error.
>
> In the connection case, I left it as an ERROR because the caller asked us
> to connect to an endpoint and we were unable to, which feels unambiguous.
> It will be ultimately reported via Exceptions as qemu.aqmp.ConnectError,
> with additional information available in fields of that exception object.
> Even though the exception is reported to the caller, I decided to log the
> occurrence anyway, because I felt like it should be the job of the library
> to make a good log and not the caller's responsibility to catch the
> exception and then log it themselves.
>
> That does leave us with this atypical case though: the caller is
> intentionally causing problems :)
>
> (Well, atypical for a user of the library who is using it to make a tool
> they expect to work. Quite a typical case for tests in the abstract, though
> we only seem to have precisely one usage of this pattern in iotests so far.)
>
>
>> > In most other cases I rather imagine we do want to see this kind of
>> > output to help give more information about how things have failed --
>> > they ARE errors. We just happen to not care about them right here.
>>
>> Well, in fact we do expect them here, so we could even log them, but
>> first I don’t know whether they’re stable enough for that, and second
>> this would break the “choose the AQMP or legacy QMP back-end via an
>> environment variable” thing.
>>
>>
> Yeah, that's the other half of it, I came to realize. Just logging the
> expected failure feels the most idiomatic, but it requires a new logging
> filter (I have to filter out the PID from the logger name to make it work
> consistently) and breaks the ability to switch.
>
> So I suppose "for now" just disabling the logger output for a critical
> section and leaving a comment that states that once we're feeling confident
> about the new library (maybe after the next release when everything has
> really gone through the testing wash cycle) we can remove the suppression
> and just log the errors the normal way.
>
>
>> > The only thing I don't exactly like about this is that it requires
>> > some knowledge by the caller to know to disable it. It makes writing
>> > negative tests a bit more annoying because the library leans so
>> > heavily on yelling loudly when it encounters problems.
>>
>> Yeah.  I’m afraid I don’t have a good idea on how to convey test writers
>> how to suppress these errors, even if it were a simple one-liner (like
>> `self.vm_b.set_log_threshold(logging.CRITICAL)`)...
>>
>> We could start an “iotests tips and tricks” document, but do we want to?
>>
>> Hanna
>>
>>
> Hm, not just yet. More chances for un-checked stuff to bitrot. There's
> only this one caller that poses a problem here, so it's probably not going
> to be too difficult to just update tests to expect the error logs.
>
> V2 soon. I already respun it, but it's been a week, so I will need to
> scrub it down with a good proof-reading pass. Thanks for your patience, and
> I hope you enjoyed your PTO :~)
>
>
Oh, uh, I got confused and I actually already sent V2 for this series.
Apologies for the confusion.

--js
Hanna Reitz Oct. 5, 2021, 3:12 p.m. UTC | #8
On 04.10.21 20:32, John Snow wrote:
>
>
> On Mon, Oct 4, 2021 at 6:12 AM Hanna Reitz <hreitz@redhat.com 
> <mailto:hreitz@redhat.com>> wrote:
>
>     On 18.09.21 04:14, John Snow wrote:
>     >
>     >
>     > On Fri, Sep 17, 2021 at 8:58 PM John Snow <jsnow@redhat.com
>     <mailto:jsnow@redhat.com>
>     > <mailto:jsnow@redhat.com <mailto:jsnow@redhat.com>>> wrote:
>     >
>     >
>     >
>     >     On Fri, Sep 17, 2021 at 10:30 AM Hanna Reitz
>     <hreitz@redhat.com <mailto:hreitz@redhat.com>
>     >     <mailto:hreitz@redhat.com <mailto:hreitz@redhat.com>>> wrote:
>     >
>     >         On 17.09.21 07:40, John Snow wrote:
>     >         > Disable the aqmp logger, which likes to (at the moment)
>     >         print out
>     >         > intermediate warnings and errors that cause session
>     >         termination; disable
>     >         > them so they don't interfere with the job output.
>     >         >
>     >         > Leave any "CRITICAL" warnings enabled though, those
>     are ones
>     >         that we
>     >         > should never see, no matter what.
>     >
>     >         I mean, looks OK to me, but from what I understand (i.e.
>     little),
>     >         qmp_client doesn’t log CRITICAL messages, at least I
>     can’t see
>     >         any. Only
>     >         ERRORs.
>     >
>     >
>     >     There's *one* critical message in protocol.py, used for a
>     >     circumstance that I *think* should be impossible. I do not
>     think I
>     >     currently use any WARNING level statements.
>     >
>     >         I guess I’m missing some CRITICAL messages in external
>     >         functions called
>     >         from qmp_client.py, but shouldn’t we still keep ERRORs?
>     >
>     >
>     >     ...Mayyyyyybe?
>     >
>     >     The errors logged by AQMP are *almost always* raised as
>     Exceptions
>     >     somewhere else, eventually. Sometimes when we encounter them in
>     >     one context, we need to save them and then re-raise them in a
>     >     different execution context. There's one good exception to this:
>     >     My pal, EOFError.
>     >
>     >     If the reader context encounters EOF, it raises EOFError and
>     this
>     >     causes a disconnect to be scheduled asynchronously. *Any*
>     >     Exception that causes a disconnect to be scheduled
>     asynchronously
>     >     is dutifully logged as an ERROR. At this point in the code, we
>     >     don't really know if the user of the library considers this an
>     >     "error" yet or not. I've waffled a lot on how exactly to treat
>     >     this circumstance. ...Hm, I guess that's really the only case
>     >     where I have an error that really ought to be suppressed. I
>     >     suppose what I will do here is: if the exception happens to
>     be an
>     >     EOFError I will drop the severity of the log message down to
>     INFO.
>     >     I don't know why it takes being challenged on this stuff to
>     start
>     >     thinking clearly about it, but here we are. Thank you for your
>     >     feedback :~)
>     >
>     >     --js
>     >
>     >
>     > Oh, CI testing reminds me of why I am a liar here.
>     >
>     > the mirror-top-perms test intentionally expects not to be able to
>     > connect, but we're treated to these two additional lines of output:
>     >
>     > +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Negotiation failed:
>     EOFError
>     > +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Failed to establish
>     session:
>     > EOFError
>     >
>     > Uh. I guess a temporary suppression in mirror-top-perms, then ...?
>
>     Sounds right to me, if that’s simple enough.
>
>     (By the way, I understand it right that you want to lower the
>     severity
>     of EOFErrors to INFO only on disconnect, right?  Which is why they’re
>     still logged as ERRORs here, because they aren’t occurring on
>     disconnects?)
>
>
> More or less, yeah.
>
> When an EOFError causes the reader coroutine to halt (because it can't 
> read the next message), I decided (in v2) to drop that one particular 
> logging message down to "INFO", because it might -- or might not be -- 
> an expected occurrence from the point of view of whoever is managing 
> the QMP connection. Maybe it was expected (The test used 
> qemu-guest-agent or something else to make the guest shutdown, taking 
> QEMU down with it without the knowledge of the QMP library layer) or 
> maybe it was unexpected (the QMP remote really just disappeared from 
> us on a whim). There's no way to know, so it probably isn't right to 
> consider it an error.
>
> In the connection case, I left it as an ERROR because the caller asked 
> us to connect to an endpoint and we were unable to, which feels 
> unambiguous. It will be ultimately reported via Exceptions as 
> qemu.aqmp.ConnectError, with additional information available in 
> fields of that exception object. Even though the exception is reported 
> to the caller, I decided to log the occurrence anyway, because I felt 
> like it should be the job of the library to make a good log and not 
> the caller's responsibility to catch the exception and then log it 
> themselves.
>
> That does leave us with this atypical case though: the caller is 
> intentionally causing problems :)
>
> (Well, atypical for a user of the library who is using it to make a 
> tool they expect to work. Quite a typical case for tests in the 
> abstract, though we only seem to have precisely one usage of this 
> pattern in iotests so far.)
>
>     > In most other cases I rather imagine we do want to see this kind of
>     > output to help give more information about how things have
>     failed --
>     > they ARE errors. We just happen to not care about them right here.
>
>     Well, in fact we do expect them here, so we could even log them, but
>     first I don’t know whether they’re stable enough for that, and second
>     this would break the “choose the AQMP or legacy QMP back-end via an
>     environment variable” thing.
>
>
> Yeah, that's the other half of it, I came to realize. Just logging the 
> expected failure feels the most idiomatic, but it requires a new 
> logging filter (I have to filter out the PID from the logger name to 
> make it work consistently) and breaks the ability to switch.
>
> So I suppose "for now" just disabling the logger output for a critical 
> section and leaving a comment that states that once we're feeling 
> confident about the new library (maybe after the next release when 
> everything has really gone through the testing wash cycle) we can 
> remove the suppression and just log the errors the normal way.
>
>     > The only thing I don't exactly like about this is that it requires
>     > some knowledge by the caller to know to disable it. It makes
>     writing
>     > negative tests a bit more annoying because the library leans so
>     > heavily on yelling loudly when it encounters problems.
>
>     Yeah.  I’m afraid I don’t have a good idea on how to convey test
>     writers
>     how to suppress these errors, even if it were a simple one-liner
>     (like
>     `self.vm_b.set_log_threshold(logging.CRITICAL)`)...
>
>     We could start an “iotests tips and tricks” document, but do we
>     want to?
>
>     Hanna
>
>
> Hm, not just yet. More chances for un-checked stuff to bitrot. There's 
> only this one caller that poses a problem here, so it's probably not 
> going to be too difficult to just update tests to expect the error logs.

OK.

> V2 soon. I already respun it, but it's been a week, so I will need to 
> scrub it down with a good proof-reading pass. Thanks for your 
> patience, and I hope you enjoyed your PTO :~)

You know how it is.  Some of it was spent enjoying it, the rest was 
spent lamenting that I’m not enjoying it enough.

Hanna
diff mbox series

Patch

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 273d2777ae..47e5f9738b 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -1383,6 +1383,8 @@  def execute_setup_common(supported_fmts: Sequence[str] = (),
     if debug:
         sys.argv.remove('-d')
     logging.basicConfig(level=(logging.DEBUG if debug else logging.WARN))
+    if not debug:
+        logging.getLogger("qemu.aqmp.qmp_client").setLevel(logging.CRITICAL)
 
     _verify_image_format(supported_fmts, unsupported_fmts)
     _verify_protocol(supported_protocols, unsupported_protocols)