diff mbox series

[v3,10/15] qemu_iotests: extend QMP socket timeout when using valgrind

Message ID 20210414170352.29927-11-eesposit@redhat.com (mailing list archive)
State New, archived
Series qemu_iotests: improve debugging options

Commit Message

Emanuele Giuseppe Esposito April 14, 2021, 5:03 p.m. UTC
As with gdbserver, valgrind delays the test execution, so
the default QMP socket timeout expires too soon.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 python/qemu/machine.py        | 2 +-
 tests/qemu-iotests/iotests.py | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

Comments

Max Reitz April 30, 2021, 1:02 p.m. UTC | #1
On 14.04.21 19:03, Emanuele Giuseppe Esposito wrote:
> As with gdbserver, valgrind delays the test execution, so
> the default QMP socket timeout expires too soon.

I’m curious: The default timeouts should be long enough for slow 
systems, too, though (e.g. heavily-loaded CI systems).  I would expect 
that valgrind is used on developer systems where there is more leeway, 
so the timeouts should still work.

But in practice, thinking about that doesn’t matter.  If valgrind leads 
to a timeout being hit, that wouldn’t be nice.  OTOH, if you run 
valgrind to debug a test/qemu, you don’t particularly care about the 
timeouts anyway.

So in principle, this patch sounds good to me, it’s just that it’s based 
on patch 5, which I don’t fully agree with.

Max

> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>   python/qemu/machine.py        | 2 +-
>   tests/qemu-iotests/iotests.py | 4 ++--
>   2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
> index d6142271c2..dce96e1858 100644
> --- a/python/qemu/machine.py
> +++ b/python/qemu/machine.py
> @@ -410,7 +410,7 @@ def _launch(self) -> None:
>                                          shell=False,
>                                          close_fds=False)
>   
> -        if 'gdbserver' in self._wrapper:
> +        if 'gdbserver' in self._wrapper or 'valgrind' in self._wrapper:
>               self._qmp_timer = None
>           self._post_launch()
>   
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index a2e8604674..94597433fa 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -489,7 +489,7 @@ def log(msg: Msg,
>   
>   class Timeout:
>       def __init__(self, seconds, errmsg="Timeout"):
> -        if qemu_gdb:
> +        if qemu_gdb or qemu_valgrind:
>               self.seconds = 3000
>           else:
>               self.seconds = seconds
> @@ -700,7 +700,7 @@ def qmp_to_opts(self, obj):
>           return ','.join(output_list)
>   
>       def get_qmp_events(self, wait: bool = False) -> List[QMPMessage]:
> -        if qemu_gdb:
> +        if qemu_gdb or qemu_valgrind:
>               wait = 0.0
>           return super().get_qmp_events(wait=wait)
>   
>
Emanuele Giuseppe Esposito April 30, 2021, 9:03 p.m. UTC | #2
On 30/04/2021 15:02, Max Reitz wrote:
> On 14.04.21 19:03, Emanuele Giuseppe Esposito wrote:
>> As with gdbserver, valgrind delays the test execution, so
>> the default QMP socket timeout expires too soon.
> 
> I’m curious: The default timeouts should be long enough for slow 
> systems, too, though (e.g. heavily-loaded CI systems).  I would expect 
> that valgrind is used on developer systems where there is more leeway, 
> so the timeouts should still work.

As I said in patch 5, I will check again which timeouts are essential 
to avoid and which are not.

Emanuele
> 
> But in practice, thinking about that doesn’t matter.  If valgrind leads 
> to a timeout being hit, that wouldn’t be nice.  OTOH, if you run 
> valgrind to debug a test/qemu, you don’t particularly care about the 
> timeouts anyway.
> 
> So in principle, this patch sounds good to me, it’s just that it’s based 
> on patch 5, which I don’t fully agree with.
> 
> Max
> 
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>>   python/qemu/machine.py        | 2 +-
>>   tests/qemu-iotests/iotests.py | 4 ++--
>>   2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
>> index d6142271c2..dce96e1858 100644
>> --- a/python/qemu/machine.py
>> +++ b/python/qemu/machine.py
>> @@ -410,7 +410,7 @@ def _launch(self) -> None:
>>                                          shell=False,
>>                                          close_fds=False)
>> -        if 'gdbserver' in self._wrapper:
>> +        if 'gdbserver' in self._wrapper or 'valgrind' in self._wrapper:
>>               self._qmp_timer = None
>>           self._post_launch()
>> diff --git a/tests/qemu-iotests/iotests.py 
>> b/tests/qemu-iotests/iotests.py
>> index a2e8604674..94597433fa 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -489,7 +489,7 @@ def log(msg: Msg,
>>   class Timeout:
>>       def __init__(self, seconds, errmsg="Timeout"):
>> -        if qemu_gdb:
>> +        if qemu_gdb or qemu_valgrind:
>>               self.seconds = 3000
>>           else:
>>               self.seconds = seconds
>> @@ -700,7 +700,7 @@ def qmp_to_opts(self, obj):
>>           return ','.join(output_list)
>>       def get_qmp_events(self, wait: bool = False) -> List[QMPMessage]:
>> -        if qemu_gdb:
>> +        if qemu_gdb or qemu_valgrind:
>>               wait = 0.0
>>           return super().get_qmp_events(wait=wait)
>>
>
John Snow May 13, 2021, 6:47 p.m. UTC | #3
On 4/14/21 1:03 PM, Emanuele Giuseppe Esposito wrote:
> As with gdbserver, valgrind delays the test execution, so
> the default QMP socket timeout expires too soon.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>   python/qemu/machine.py        | 2 +-
>   tests/qemu-iotests/iotests.py | 4 ++--
>   2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
> index d6142271c2..dce96e1858 100644
> --- a/python/qemu/machine.py
> +++ b/python/qemu/machine.py
> @@ -410,7 +410,7 @@ def _launch(self) -> None:
>                                          shell=False,
>                                          close_fds=False)
>   
> -        if 'gdbserver' in self._wrapper:
> +        if 'gdbserver' in self._wrapper or 'valgrind' in self._wrapper:

This brings me close to suggesting that we just change __init__ to 
accept a parameter that lets the caller decide what kind of timeout(s) 
they find acceptable. They know more about what they're trying to run 
than we do.

Certainly after launch occurs, the user is free to just grab the qmp 
object and tinker around with the timeouts, but that does not allow us 
to change the timeout(s) for accept itself.

D'oh.

(Spilled milk: It was probably a mistake to make the default launch 
behavior here have a timeout of 15 seconds. That logic likely belongs to 
the iotests implementation. The default here probably ought to indeed be 
"wait forever".)

In the here and now ... would it be acceptable to change the launch() 
method to add a timeout parameter? It's still a little awkward, because 
conceptually it's a timeout for just QMP and not for the actual duration 
of the entire launch process.

But, I guess, it's *closer* to the truth.

If you wanted to route it that way, I take back what I said about not 
wanting to pass around variables to event loop hooks.

If we defined the timeout as something that applies exclusively to the 
launching process, then it'd be appropriate to route that to the 
launch-related functions ... and subclasses would have to be adjusted to 
be made aware that they're expected to operate within those parameters, 
which is good.

Sorry for my waffling back and forth on this. Let me know what the 
actual requirements are if you figure out which timeouts you need / 
don't need and I'll give you some review priority.

If you attack this series again, can you please split out the python/* 
patches into their own little series and CC me?

--js

>               self._qmp_timer = None
>           self._post_launch()
>   
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index a2e8604674..94597433fa 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -489,7 +489,7 @@ def log(msg: Msg,
>   
>   class Timeout:
>       def __init__(self, seconds, errmsg="Timeout"):
> -        if qemu_gdb:
> +        if qemu_gdb or qemu_valgrind:
>               self.seconds = 3000
>           else:
>               self.seconds = seconds
> @@ -700,7 +700,7 @@ def qmp_to_opts(self, obj):
>           return ','.join(output_list)
>   
>       def get_qmp_events(self, wait: bool = False) -> List[QMPMessage]:
> -        if qemu_gdb:
> +        if qemu_gdb or qemu_valgrind:
>               wait = 0.0
>           return super().get_qmp_events(wait=wait)
>   
>
Emanuele Giuseppe Esposito May 14, 2021, 8:16 a.m. UTC | #4
On 13/05/2021 20:47, John Snow wrote:
> On 4/14/21 1:03 PM, Emanuele Giuseppe Esposito wrote:
>> As with gdbserver, valgrind delays the test execution, so
>> the default QMP socket timeout expires too soon.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>>   python/qemu/machine.py        | 2 +-
>>   tests/qemu-iotests/iotests.py | 4 ++--
>>   2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
>> index d6142271c2..dce96e1858 100644
>> --- a/python/qemu/machine.py
>> +++ b/python/qemu/machine.py
>> @@ -410,7 +410,7 @@ def _launch(self) -> None:
>>                                          shell=False,
>>                                          close_fds=False)
>> -        if 'gdbserver' in self._wrapper:
>> +        if 'gdbserver' in self._wrapper or 'valgrind' in self._wrapper:
> 
> This approaches me suggesting that we just change __init__ to accept a 
> parameter that lets the caller decide what kind of timeout(s) they find 
> acceptable. They know more about what they're trying to run than we do.
> 
> Certainly after launch occurs, the user is free to just grab the qmp 
> object and tinker around with the timeouts, but that does not allow us 
> to change the timeout(s) for accept itself.
> 
> D'oh.
> 
> (Spilled milk: It was probably a mistake to make the default launch 
> behavior here have a timeout of 15 seconds. That logic likely belongs to 
> the iotests implementation. The default here probably ought to indeed be 
> "wait forever".)
> 
> In the here and now ... would it be acceptable to change the launch() 
> method to add a timeout parameter? It's still a little awkward, because 
> conceptually it's a timeout for just QMP and not for the actual duration 
> of the entire launch process.
> 
> But, I guess, it's *closer* to the truth.
> 
> If you wanted to route it that way, I take back what I said about not 
> wanting to pass around variables to event loop hooks.
> 
> If we defined the timeout as something that applies exclusively to the 
> launching process, then it'd be appropriate to route that to the 
> launch-related functions ... and subclasses would have to be adjusted to 
> be made aware that they're expected to operate within those parameters, 
> which is good.
> 
> Sorry for my waffling back and forth on this. Let me know what the 
> actual requirements are if you figure out which timeouts you need / 
> don't need and I'll give you some review priority.

Uhm.. I am getting a little bit confused on what to do too :)

So the current plan I have for _qmp_timer is:

- As Max suggested, move it into __init__ and check the wrapper contents 
there. If we need to block forever (gdb, valgrind), we set it to None, 
otherwise to 15 seconds. I think always setting it to None is not 
ideal, because if you are testing something that deadlocks (see my 
attempts to remove/add locks in QEMU multiqueue) and the socket is set 
to block forever, you cannot tell whether the test is just very slow or 
has deadlocked.

Well, one can argue that in both cases this is not the expected 
behavior, but I think having an upper bound on each QMP command 
execution would be good.

- Pass _qmp_timer directly to self._qmp.accept() in _post_launch(), 
leaving _launch() intact. I think this makes sense because, as you also 
mentioned, making _post_launch() take a parameter would require changing 
all subclasses as well and passing values around.
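
The difference between a bounded and an unbounded wait on accept can be illustrated with Python's standard socket module. This is only a sketch under stated assumptions: the real code goes through the QMP client's accept(), not a raw TCP socket, and the helper name wait_for_peer is hypothetical.

```python
import socket
from typing import Optional


def wait_for_peer(timeout: Optional[float]) -> str:
    """Listen on a loopback port and wait for a connection that never comes.

    Hypothetical helper: it only demonstrates socket timeout semantics.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        # None would block forever; a number bounds the wait in seconds.
        server.settimeout(timeout)
        try:
            conn, _addr = server.accept()
        except socket.timeout:
            return "timed out"
        conn.close()
        return "connected"


# Nobody connects, so a bounded wait ends with a timeout instead of hanging.
result = wait_for_peer(0.2)
```

With timeout=None the same call would never return here, which is the "slow or deadlocked?" ambiguity described above.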

Any opinion on this is very welcome.

Spoiler alert: I haven't tested these changes yet, but I am positive that 
there shouldn't be any problem. (Famous last words)

Emanuele


> 
> If you attack this series again, can you please split out the python/* 
> patches into its own little series and CC me?
> 
> --js
> 
>>               self._qmp_timer = None
>>           self._post_launch()
>> diff --git a/tests/qemu-iotests/iotests.py 
>> b/tests/qemu-iotests/iotests.py
>> index a2e8604674..94597433fa 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -489,7 +489,7 @@ def log(msg: Msg,
>>   class Timeout:
>>       def __init__(self, seconds, errmsg="Timeout"):
>> -        if qemu_gdb:
>> +        if qemu_gdb or qemu_valgrind:
>>               self.seconds = 3000
>>           else:
>>               self.seconds = seconds
>> @@ -700,7 +700,7 @@ def qmp_to_opts(self, obj):
>>           return ','.join(output_list)
>>       def get_qmp_events(self, wait: bool = False) -> List[QMPMessage]:
>> -        if qemu_gdb:
>> +        if qemu_gdb or qemu_valgrind:
>>               wait = 0.0
>>           return super().get_qmp_events(wait=wait)
>>
>
John Snow May 14, 2021, 8:02 p.m. UTC | #5
On 5/14/21 4:16 AM, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 13/05/2021 20:47, John Snow wrote:
>> On 4/14/21 1:03 PM, Emanuele Giuseppe Esposito wrote:
>>> As with gdbserver, valgrind delays the test execution, so
>>> the default QMP socket timeout expires too soon.
>>>
>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>> ---
>>>   python/qemu/machine.py        | 2 +-
>>>   tests/qemu-iotests/iotests.py | 4 ++--
>>>   2 files changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
>>> index d6142271c2..dce96e1858 100644
>>> --- a/python/qemu/machine.py
>>> +++ b/python/qemu/machine.py
>>> @@ -410,7 +410,7 @@ def _launch(self) -> None:
>>>                                          shell=False,
>>>                                          close_fds=False)
>>> -        if 'gdbserver' in self._wrapper:
>>> +        if 'gdbserver' in self._wrapper or 'valgrind' in self._wrapper:
>>
>> This approaches me suggesting that we just change __init__ to accept a 
>> parameter that lets the caller decide what kind of timeout(s) they 
>> find acceptable. They know more about what they're trying to run than 
>> we do.
>>
>> Certainly after launch occurs, the user is free to just grab the qmp 
>> object and tinker around with the timeouts, but that does not allow us 
>> to change the timeout(s) for accept itself.
>>
>> D'oh.
>>
>> (Spilled milk: It was probably a mistake to make the default launch 
>> behavior here have a timeout of 15 seconds. That logic likely belongs 
>> to the iotests implementation. The default here probably ought to 
>> indeed be "wait forever".)
>>
>> In the here and now ... would it be acceptable to change the launch() 
>> method to add a timeout parameter? It's still a little awkward, 
>> because conceptually it's a timeout for just QMP and not for the 
>> actual duration of the entire launch process.
>>
>> But, I guess, it's *closer* to the truth.
>>
>> If you wanted to route it that way, I take back what I said about not 
>> wanting to pass around variables to event loop hooks.
>>
>> If we defined the timeout as something that applies exclusively to the 
>> launching process, then it'd be appropriate to route that to the 
>> launch-related functions ... and subclasses would have to be adjusted 
>> to be made aware that they're expected to operate within those 
>> parameters, which is good.
>>
>> Sorry for my waffling back and forth on this. Let me know what the 
>> actual requirements are if you figure out which timeouts you need / 
>> don't need and I'll give you some review priority.
> 
> Uhm.. I am getting a little bit confused on what to do too :)
> 

SORRY, I hit send too quickly and then changed my mind. I've handed you a 
giant bag of my own confusion. Very unfair of me!

> So the current plan I have for _qmp_timer is:
> 
> - As Max suggested, move it in __init__ and check there for the wrapper 
> contents. If we need to block forever (gdb, valgrind), we set it to 
> None. Otherwise to 15 seconds. I think setting it always to None is not 
> ideal, because if you are testing something that deadlocks (see my 
> attempts to remove/add locks in QEMU multiqueue) and the socket is set 
> to block forever, you don't know if the test is super slow or it just 
> deadlocked.
> 

I agree with your concern on rational defaults, let's focus on that briefly:

Let's have QEMUMachine default to *no timeouts* moving forward, and have 
the timeouts be *opt-in*. This keeps the Machine class somewhat pure and 
free of opinions. The separation of mechanism and policy.

Next, instead of modifying hundreds of tests to opt-in to the timeout, 
let's modify the VM class in iotests.py to opt-in to that timeout, 
restoring the current "safe" behavior of iotests.

The above items can happen in a single commit, preserving behavior in 
the bisect.

Finally, we can add a non-private property that individual tests can 
re-override to opt BACK out of the default.

Something as simple as:

vm.qmp_timeout = None

would be just fine.
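
A minimal sketch of that layering, assuming stand-in classes rather than the real ones: Machine and VM below are simplified placeholders for QEMUMachine and the iotests VM class, and launch() only reports which timeout would be used instead of starting QEMU.

```python
from typing import Optional


class Machine:
    """Stand-in for QEMUMachine: the mechanism, with no timeout policy."""

    def __init__(self) -> None:
        # None means "block forever" while waiting for the QMP connection.
        self.qmp_timeout: Optional[float] = None

    def launch(self) -> str:
        # A real implementation would hand the value to the QMP accept call;
        # this sketch only reports which timeout would be used.
        if self.qmp_timeout is None:
            return "accept: wait forever"
        return f"accept: give up after {self.qmp_timeout}s"


class VM(Machine):
    """Stand-in for the iotests VM class: opts in to the 15-second timeout."""

    def __init__(self) -> None:
        super().__init__()
        self.qmp_timeout = 15.0


vm = VM()
default_behaviour = vm.launch()

vm.qmp_timeout = None  # an individual test opts back out
debug_behaviour = vm.launch()
```

The base class carries no opinion, the subclass restores the current iotests behaviour, and a single test can still override it through the public attribute.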

> Well, one can argue that in both cases this is not the expected 
> behavior, but I think having an upper bound on each QMP command 
> execution would be good.
> 
> - pass _qmp_timer directly to self._qmp.accept() in _post launch, 
> leaving _launch() intact. I think this makes sense because as you also 
> mentioned, changing _post_launch() into taking a parameter requires 
> changing also all subclasses and pass values around.
> 

Sounds OK. If we do change the defaults back to "No Timeout" in a way 
that allows an override by an opinionated class, we'll already have the 
public property, though, so a parameter might not be needed.

(Yes, this is the THIRD time I've changed my mind in 48 hours.)

> Any opinion on this is very welcome.
> 

Brave words!

My last thought here is that I still don't like the idea of QEMUMachine 
class changing its timeout behavior based on the introspection of 
wrapper args.

It feels much more like the case that a caller who is knowingly wrapping 
it with a program that delays its execution should change its parameters 
accordingly based on what the caller knows about what they're trying to 
accomplish.

Does that make the code too messy? I understand you probably want to 
ensure that adding a GDB wrapper is painless and simple, so it might not 
be great to always ask a caller to remember to set some timeout value to 
use it.

I figure that the right place to do this, though, is wherever the 
boilerplate code gets written that knows how to set up the right gdb 
args and so on, and not in machine.py. It sounds like iotests.py code to 
me, maybe in the VM class.

> Spoiler alert I haven't tested these changes yet, but I am positive that 
> there shouldn't be any problem. (Last famous words)
> 
> Emanuele
> 
> 
Clear as mud?

--js
Emanuele Giuseppe Esposito May 18, 2021, 1:58 p.m. UTC | #6
>> So the current plan I have for _qmp_timer is:
>>
>> - As Max suggested, move it in __init__ and check there for the 
>> wrapper contents. If we need to block forever (gdb, valgrind), we set 
>> it to None. Otherwise to 15 seconds. I think setting it always to None 
>> is not ideal, because if you are testing something that deadlocks (see 
>> my attempts to remove/add locks in QEMU multiqueue) and the socket is 
>> set to block forever, you don't know if the test is super slow or it 
>> just deadlocked.
>>
> 
> I agree with your concern on rational defaults, let's focus on that 
> briefly:
> 
> Let's have QEMUMachine default to *no timeouts* moving forward, and have 
> the timeouts be *opt-in*. This keeps the Machine class somewhat pure and 
> free of opinions. The separation of mechanism and policy.
> 
> Next, instead of modifying hundreds of tests to opt-in to the timeout, 
> let's modify the VM class in iotests.py to opt-in to that timeout, 
> restoring the current "safe" behavior of iotests.
> 
> The above items can happen in a single commit, preserving behavior in 
> the bisect.
> 
> Finally, we can add a non-private property that individual tests can 
> re-override to opt BACK out of the default.
> 
> Something as simple as:
> 
> vm.qmp_timeout = None
> 
> would be just fine.
>

I applied these suggested changes, will send v4 and we'll see what comes 
out of it.

>> Well, one can argue that in both cases this is not the expected 
>> behavior, but I think having an upper bound on each QMP command 
>> execution would be good.
>>
>> - pass _qmp_timer directly to self._qmp.accept() in _post launch, 
>> leaving _launch() intact. I think this makes sense because as you also 
>> mentioned, changing _post_launch() into taking a parameter requires 
>> changing also all subclasses and pass values around.
>>
> 
> Sounds OK. If we do change the defaults back to "No Timeout" in a way 
> that allows an override by an opinionated class, we'll already have the 
> public property, though, so a parameter might not be needed.
> 
> (Yes, this is the THIRD time I've changed my mind in 48 hours.)
> 
>> Any opinion on this is very welcome.
>>
> 
> Brave words!
> 
> My last thought here is that I still don't like the idea of QEMUMachine 
> class changing its timeout behavior based on the introspection of 
> wrapper args.
> 
> It feels much more like the case that a caller who is knowingly wrapping 
> it with a program that delays its execution should change its parameters 
> accordingly based on what the caller knows about what they're trying to 
> accomplish.
> 
> Does that make the code too messy? I understand you probably want to 
> ensure that adding a GDB wrapper is painless and simple, so it might not 
> be great to always ask a caller to remember to set some timeout value to 
> use it.
> 
I am not sure I follow you here. Where do you want to move this logic? 
Can you please elaborate? I did not understand what you mean.

I understand that tweaking the timers in iotests.py with checks like

if not (qemu_gdb or qemu_valgrind):
	normal timer

may not be the most beautiful piece of code, but as you said it keeps 
things as simple as they can be.

> I figure that the right place to do this, though, is wherever the 
> boilerplate code gets written that knows how to set up the right gdb 
> args and so on, and not in machine.py. It sounds like iotests.py code to 
> me, maybe in the VM class.

After the changes suggested for qmp_timeout, iotests.py is already the 
only place that performs the setup for gdb and valgrind, and 
machine.py is not touched (except for qmp_timeout). iotests.py will 
essentially contain a couple of ifs like the one above, setting the 
timer when gdb and valgrind are *not* in use.

Emanuele
John Snow May 18, 2021, 2:26 p.m. UTC | #7
On 5/18/21 9:58 AM, Emanuele Giuseppe Esposito wrote:
> 
>>> So the current plan I have for _qmp_timer is:
>>>
>>> - As Max suggested, move it in __init__ and check there for the 
>>> wrapper contents. If we need to block forever (gdb, valgrind), we set 
>>> it to None. Otherwise to 15 seconds. I think setting it always to 
>>> None is not ideal, because if you are testing something that 
>>> deadlocks (see my attempts to remove/add locks in QEMU multiqueue) 
>>> and the socket is set to block forever, you don't know if the test is 
>>> super slow or it just deadlocked.
>>>
>>
>> I agree with your concern on rational defaults, let's focus on that 
>> briefly:
>>
>> Let's have QEMUMachine default to *no timeouts* moving forward, and 
>> have the timeouts be *opt-in*. This keeps the Machine class somewhat 
>> pure and free of opinions. The separation of mechanism and policy.
>>
>> Next, instead of modifying hundreds of tests to opt-in to the timeout, 
>> let's modify the VM class in iotests.py to opt-in to that timeout, 
>> restoring the current "safe" behavior of iotests.
>>
>> The above items can happen in a single commit, preserving behavior in 
>> the bisect.
>>
>> Finally, we can add a non-private property that individual tests can 
>> re-override to opt BACK out of the default.
>>
>> Something as simple as:
>>
>> vm.qmp_timeout = None
>>
>> would be just fine.
>>
> 
> I applied these suggested changes, will send v4 and we'll see what comes 
> out of it.
> 
>>> Well, one can argue that in both cases this is not the expected 
>>> behavior, but I think having an upper bound on each QMP command 
>>> execution would be good.
>>>
>>> - pass _qmp_timer directly to self._qmp.accept() in _post launch, 
>>> leaving _launch() intact. I think this makes sense because as you 
>>> also mentioned, changing _post_launch() into taking a parameter 
>>> requires changing also all subclasses and pass values around.
>>>
>>
>> Sounds OK. If we do change the defaults back to "No Timeout" in a way 
>> that allows an override by an opinionated class, we'll already have 
>> the public property, though, so a parameter might not be needed.
>>
>> (Yes, this is the THIRD time I've changed my mind in 48 hours.)
>>
>>> Any opinion on this is very welcome.
>>>
>>
>> Brave words!
>>
>> My last thought here is that I still don't like the idea of 
>> QEMUMachine class changing its timeout behavior based on the 
>> introspection of wrapper args.
>>
>> It feels much more like the case that a caller who is knowingly 
>> wrapping it with a program that delays its execution should change its 
>> parameters accordingly based on what the caller knows about what 
>> they're trying to accomplish.
>>
>> Does that make the code too messy? I understand you probably want to 
>> ensure that adding a GDB wrapper is painless and simple, so it might 
>> not be great to always ask a caller to remember to set some timeout 
>> value to use it.
>>
> I am not sure I follow you here, where do you want to move this logic? 
> Can you please elaborate more, I did not understand what you mean.
> 
> I understand that tweaking the timers in iotests.py with checks like
> 
> if not (qemu_gdb or qemu_valgrind):
>      normal timer
> 
> may not be the most beautiful piece of code, but as you said it keeps 
> things as simple as they can.
> 

What I mean is that of the two patterns:

caller.py:
     vm = machine(..., wrapper_args=['gdb', ...])

machine.py:
     def __init__(...):
         if 'gdb' in wrapper_args:
             self.timer = None

vs this one:

caller.py:
     vm = machine(..., wrapper_args=['gdb', ...], timer=None)

machine.py:
     def __init__(...):
         ... # No introspection of wrapper_args


I prefer the second. I would assume it's possible to localize the logic 
that creates a GDB-wrapped machine alongside the knowledge that it needs 
the timeout turned off *outside* of the machine class.

I could be *very wrong* about that assumption though. The reason I 
prefer the second pattern is because it avoids having to deal with what 
happens when a caller specifies both a timeout and a gdb-wrapper. In the 
second case, the caller explicitly requested the timeout to be None, so 
anything that occurs afterwards is the fault of the caller, not machine.py.

To me, that's "simpler". (I could be wrong, I don't have a great overall 
view of your series, just the bits that I have seen that touch machine.py.)
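
The ambiguity in the first pattern can be made concrete with a small sketch. Both classes are hypothetical stand-ins, not the real machine.py code, and the 600-second value is an arbitrary choice for illustration.

```python
from typing import List, Optional


class IntrospectingMachine:
    """Pattern 1: the machine inspects the wrapper and sets its own timer."""

    def __init__(self, wrapper_args: List[str],
                 timer: Optional[float] = 15.0) -> None:
        self.timer = timer
        if 'gdb' in wrapper_args:
            # Silently discards whatever timer the caller asked for.
            self.timer = None


class PlainMachine:
    """Pattern 2: no introspection; the caller owns the timeout policy."""

    def __init__(self, wrapper_args: List[str],
                 timer: Optional[float] = 15.0) -> None:
        self.wrapper_args = wrapper_args
        self.timer = timer


# A caller that wants gdb *and* a generous 600-second limit.
surprised = IntrospectingMachine(['gdb', '--args'], timer=600.0)
explicit = PlainMachine(['gdb', '--args'], timer=600.0)

# A caller that knowingly turns the timeout off.
unbounded = PlainMachine(['gdb', '--args'], timer=None)
```

In the first pattern the caller's 600 seconds are lost without notice; in the second, the object holds exactly what the caller passed.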

--js

>> I figure that the right place to do this, though, is wherever the 
>> boilerplate code gets written that knows how to set up the right gdb 
>> args and so on, and not in machine.py. It sounds like iotests.py code 
>> to me, maybe in the VM class.
> 
> After the changes suggested on qmp_timeout, iotests.py already contains 
> the only code to perform the setup right for gdb and valgrind, and 
> machine.py is not touched (except for qmp_timeout). iotests.py will 
> essentially contain a couple of ifs like the one above, changing the 
> timer when gdb and valgrind are *not* needed.
> 
> Emanuele
>
Emanuele Giuseppe Esposito May 18, 2021, 6:20 p.m. UTC | #8
On 18/05/2021 16:26, John Snow wrote:
> On 5/18/21 9:58 AM, Emanuele Giuseppe Esposito wrote:
>>
>>>> So the current plan I have for _qmp_timer is:
>>>>
>>>> - As Max suggested, move it in __init__ and check there for the 
>>>> wrapper contents. If we need to block forever (gdb, valgrind), we 
>>>> set it to None. Otherwise to 15 seconds. I think setting it always 
>>>> to None is not ideal, because if you are testing something that 
>>>> deadlocks (see my attempts to remove/add locks in QEMU multiqueue) 
>>>> and the socket is set to block forever, you don't know if the test 
>>>> is super slow or it just deadlocked.
>>>>
>>>
>>> I agree with your concern on rational defaults, let's focus on that 
>>> briefly:
>>>
>>> Let's have QEMUMachine default to *no timeouts* moving forward, and 
>>> have the timeouts be *opt-in*. This keeps the Machine class somewhat 
>>> pure and free of opinions. The separation of mechanism and policy.
>>>
>>> Next, instead of modifying hundreds of tests to opt-in to the 
>>> timeout, let's modify the VM class in iotests.py to opt-in to that 
>>> timeout, restoring the current "safe" behavior of iotests.
>>>
>>> The above items can happen in a single commit, preserving behavior in 
>>> the bisect.
>>>
>>> Finally, we can add a non-private property that individual tests can 
>>> re-override to opt BACK out of the default.
>>>
>>> Something as simple as:
>>>
>>> vm.qmp_timeout = None
>>>
>>> would be just fine.
>>>
>>
>> I applied these suggested changes, will send v4 and we'll see what 
>> comes out of it.
>>
>>>> Well, one can argue that in both cases this is not the expected 
>>>> behavior, but I think having an upper bound on each QMP command 
>>>> execution would be good.
>>>>
>>>> - pass _qmp_timer directly to self._qmp.accept() in _post launch, 
>>>> leaving _launch() intact. I think this makes sense because as you 
>>>> also mentioned, changing _post_launch() into taking a parameter 
>>>> requires changing also all subclasses and pass values around.
>>>>
>>>
>>> Sounds OK. If we do change the defaults back to "No Timeout" in a way 
>>> that allows an override by an opinionated class, we'll already have 
>>> the public property, though, so a parameter might not be needed.
>>>
>>> (Yes, this is the THIRD time I've changed my mind in 48 hours.)
>>>
>>>> Any opinion on this is very welcome.
>>>>
>>>
>>> Brave words!
>>>
>>> My last thought here is that I still don't like the idea of 
>>> QEMUMachine class changing its timeout behavior based on the 
>>> introspection of wrapper args.
>>>
>>> It feels much more like the case that a caller who is knowingly 
>>> wrapping it with a program that delays its execution should change 
>>> its parameters accordingly based on what the caller knows about what 
>>> they're trying to accomplish.
>>>
>>> Does that make the code too messy? I understand you probably want to 
>>> ensure that adding a GDB wrapper is painless and simple, so it might 
>>> not be great to always ask a caller to remember to set some timeout 
>>> value to use it.
>>>
>> I am not sure I follow you here, where do you want to move this logic? 
>> Can you please elaborate more, I did not understand what you mean.
>>
>> I understand that tweaking the timers in iotests.py with checks like
>>
>> if not (qemu_gdb or qemu_valgrind):
>>      normal timer
>>
>> may not be the most beautiful piece of code, but as you said it keeps 
>> things as simple as they can.
>>
> 
> What I mean is that of the two patterns:
> 
> caller.py:
>      vm = machine(..., wrapper_args=['gdb', ...])
> 
> machine.py:
>      def __init__(...):
>          if 'gdb' in wrapper_args:
>              self.timer = None
> 
> vs this one:
> 
> caller.py:
>      vm = machine(..., wrapper_args=['gdb', ...], timer=None)
> 
> machine.py:
>      def __init__(...):
>          ... # No introspection of wrapper_args
> 
> 
> I prefer the second. I would assume it's possible to localize the logic 
> that creates a GDB-wrapped machine alongside the knowledge that it needs 
> the timeout turned off *outside* of the machine class.
> 
> I could be *very wrong* about that assumption though. The reason I 
> prefer the second pattern is because it avoids having to deal with what 
> happens when a caller specifies both a timeout and a gdb-wrapper. In the 
> second case, the caller explicitly requested the timeout to be None, so 
> anything that occurs afterwards is the fault of the caller, not machine.py.
> 
> To me, that's "simpler". (I could be wrong, I don't have a great overall 
> view of your series, just the bits that I have seen that touch machine.py.)

I think this can be done almost effortlessly. With your suggested changes 
to qmp_timer, we can have:

machine.py
def __init__(self, ..., wrapper, timer=None):
	self._qmp_timer = timer

def _post_launch(self):
	self._qmp.accept(self._qmp_timer)

iotests.py
	timer = None if qemu_gdb or qemu_valgrind else 15.0
	wrapper = qemu_gdb or qemu_valgrind # did not know about this OR trick btw
	vm = machine(..., wrapper, timer)

Thank you,
Emanuele
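
A runnable version of the outline above, as a sketch only. FakeQMP and make_vm are hypothetical names introduced for illustration, process spawning is omitted, and qemu_gdb and qemu_valgrind are assumed to be lists of wrapper arguments (empty when the option is unused).

```python
from typing import List, Optional


class FakeQMP:
    """Hypothetical stand-in for the QMP client; records the accept timeout."""

    def __init__(self) -> None:
        self.accept_timeout: Optional[float] = -1.0  # sentinel: not called yet

    def accept(self, timeout: Optional[float]) -> None:
        self.accept_timeout = timeout


class Machine:
    """Simplified machine: stores the timer, never inspects the wrapper."""

    def __init__(self, wrapper: List[str],
                 qmp_timer: Optional[float] = None) -> None:
        self._wrapper = wrapper
        self._qmp_timer = qmp_timer
        self._qmp = FakeQMP()

    def _post_launch(self) -> None:
        self._qmp.accept(self._qmp_timer)

    def launch(self) -> None:
        # Spawning the QEMU process is omitted in this sketch.
        self._post_launch()


def make_vm(qemu_gdb: List[str], qemu_valgrind: List[str]) -> Machine:
    """iotests-side policy: no timeout under a debugging wrapper."""
    timer = None if qemu_gdb or qemu_valgrind else 15.0
    wrapper = qemu_gdb or qemu_valgrind  # first non-empty list, else []
    return Machine(wrapper, timer)


plain_vm = make_vm([], [])
plain_vm.launch()

gdb_vm = make_vm(['gdbserver', 'localhost:12345'], [])
gdb_vm.launch()
```

All knowledge about gdb and valgrind stays in the caller; the machine only forwards whatever timer it was given.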
> 
> --js
> 
>>> I figure that the right place to do this, though, is wherever the 
>>> boilerplate code gets written that knows how to set up the right gdb 
>>> args and so on, and not in machine.py. It sounds like iotests.py code 
>>> to me, maybe in the VM class.
>>
>> After the changes suggested on qmp_timeout, iotests.py already 
>> contains the only code to perform the setup right for gdb and 
>> valgrind, and machine.py is not touched (except for qmp_timeout). 
>> iotests.py will essentially contain a couple of ifs like the one 
>> above, changing the timer when gdb and valgrind are *not* needed.
>>
>> Emanuele
>>
>

Patch

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index d6142271c2..dce96e1858 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -410,7 +410,7 @@  def _launch(self) -> None:
                                        shell=False,
                                        close_fds=False)
 
-        if 'gdbserver' in self._wrapper:
+        if 'gdbserver' in self._wrapper or 'valgrind' in self._wrapper:
             self._qmp_timer = None
         self._post_launch()
 
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index a2e8604674..94597433fa 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -489,7 +489,7 @@  def log(msg: Msg,
 
 class Timeout:
     def __init__(self, seconds, errmsg="Timeout"):
-        if qemu_gdb:
+        if qemu_gdb or qemu_valgrind:
             self.seconds = 3000
         else:
             self.seconds = seconds
@@ -700,7 +700,7 @@  def qmp_to_opts(self, obj):
         return ','.join(output_list)
 
     def get_qmp_events(self, wait: bool = False) -> List[QMPMessage]:
-        if qemu_gdb:
+        if qemu_gdb or qemu_valgrind:
             wait = 0.0
         return super().get_qmp_events(wait=wait)