tests/acceptance: fix timeout for vm.wait

Message ID	160552129733.22861.2033930860605615947.stgit@pasha-ThinkPad-X280 (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=KwCX=EW=nongnu.org=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1A76A20709 Subject: [PATCH] tests/acceptance: fix timeout for vm.wait From: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru> To: qemu-devel@nongnu.org Date: Mon, 16 Nov 2020 13:08:17 +0300 Message-ID: <160552129733.22861.2033930860605615947.stgit@pasha-ThinkPad-X280> User-Agent: StGit/0.23 MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Received-SPF: pass client-ip=83.149.199.84; envelope-from=pavel.dovgalyuk@ispras.ru; helo=mail.ispras.ru Precedence: list Cc: pavel.dovgalyuk@ispras.ru, philmd@redhat.com, wrampazz@redhat.com, crosa@redhat.com, pbonzini@redhat.com, alex.bennee@linaro.org Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Series	tests/acceptance: fix timeout for vm.wait \| expand tests/acceptance: fix timeout for vm.wait

Pavel Dovgalyuk Nov. 16, 2020, 10:08 a.m. UTC

This patch adds timeout parameter to vm.wait() calls, because the default
value is just 30 seconds, and tests may last for more time.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
---
 tests/acceptance/boot_linux_console.py |    8 ++++----
 tests/acceptance/replay_kernel.py      |    2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

Philippe Mathieu-Daudé Nov. 16, 2020, 11:13 a.m. UTC | #1

Cc'ing John.

On Mon, Nov 16, 2020 at 11:08 AM Pavel Dovgalyuk
<pavel.dovgalyuk@ispras.ru> wrote:
>
> This patch adds timeout parameter to vm.wait() calls, because the default
> value is just 30 seconds, and tests may last for more time.
>
> Signed-off-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
> ---
>  tests/acceptance/boot_linux_console.py |    8 ++++----
>  tests/acceptance/replay_kernel.py      |    2 +-
>  2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
> index cb6086ca6f..1cb8cb7a2a 100644
> --- a/tests/acceptance/boot_linux_console.py
> +++ b/tests/acceptance/boot_linux_console.py
> @@ -208,7 +208,7 @@ class BootLinuxConsole(LinuxKernelTest):
>          exec_command_and_wait_for_pattern(self, 'reboot',
>                                                  'reboot: Restarting system')
>          # Wait for VM to shut down gracefully
> -        self.vm.wait()
> +        self.vm.wait(None)
>
>      @skipUnless(os.getenv('AVOCADO_ALLOW_UNTRUSTED_CODE'), 'untrusted code')
>      def test_mips64el_malta_5KEc_cpio(self):
> @@ -250,7 +250,7 @@ class BootLinuxConsole(LinuxKernelTest):
>          exec_command_and_wait_for_pattern(self, 'reboot',
>                                                  'reboot: Restarting system')
>          # Wait for VM to shut down gracefully
> -        self.vm.wait()
> +        self.vm.wait(None)
>
>      def do_test_mips_malta32el_nanomips(self, kernel_url, kernel_hash):
>          kernel_path_xz = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
> @@ -725,7 +725,7 @@ class BootLinuxConsole(LinuxKernelTest):
>          exec_command_and_wait_for_pattern(self, 'reboot',
>                                                  'reboot: Restarting system')
>          # Wait for VM to shut down gracefully
> -        self.vm.wait()
> +        self.vm.wait(None)
>
>      @skipUnless(os.getenv('ARMBIAN_ARTIFACTS_CACHED'),
>                  'Test artifacts fetched from unreliable apt.armbian.com')
> @@ -778,7 +778,7 @@ class BootLinuxConsole(LinuxKernelTest):
>          exec_command_and_wait_for_pattern(self, 'reboot',
>                                                  'reboot: Restarting system')
>          # Wait for VM to shut down gracefully
> -        self.vm.wait()
> +        self.vm.wait(None)
>
>      @skipUnless(os.getenv('ARMBIAN_ARTIFACTS_CACHED'),
>                  'Test artifacts fetched from unreliable dl.armbian.com')
> diff --git a/tests/acceptance/replay_kernel.py b/tests/acceptance/replay_kernel.py
> index c8b043ac31..02a52b1d72 100644
> --- a/tests/acceptance/replay_kernel.py
> +++ b/tests/acceptance/replay_kernel.py
> @@ -60,7 +60,7 @@ class ReplayKernelBase(LinuxKernelTest):
>              logger.info('finished the recording with log size %s bytes'
>                          % os.path.getsize(replay_path))
>          else:
> -            vm.wait()
> +            vm.wait(None)
>              logger.info('successfully finished the replay')
>          elapsed = time.time() - start_time
>          logger.info('elapsed time %.2f sec' % elapsed)
>

John Snow Dec. 1, 2020, 7:15 p.m. UTC | #2

On 11/16/20 6:13 AM, Philippe Mathieu-Daudé wrote:
> Cc'ing John.
> 
> On Mon, Nov 16, 2020 at 11:08 AM Pavel Dovgalyuk
> <pavel.dovgalyuk@ispras.ru> wrote:
>>
>> This patch adds timeout parameter to vm.wait() calls, because the default
>> value is just 30 seconds, and tests may last for more time.
>>

This doesn't sound right -- the timeout isn't meant to be for the entire 
duration of the test, the timeout is from the time of issuing a shutdown 
command until the time the VM actually shuts down. Ideally, that should 
not take a particularly long time in a well-behaved test.

Why is it lasting longer than 30 seconds?
How long is it actually taking, do we know?

And, are you SURE you want to have *no* timeout for all of these calls? 
(Will something else kill them if they take too long? Ideally before 
gitlab itself kills the job for running too long?)

Cleber?

>> Signed-off-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
>> ---
>>   tests/acceptance/boot_linux_console.py |    8 ++++----
>>   tests/acceptance/replay_kernel.py      |    2 +-
>>   2 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
>> index cb6086ca6f..1cb8cb7a2a 100644
>> --- a/tests/acceptance/boot_linux_console.py
>> +++ b/tests/acceptance/boot_linux_console.py
>> @@ -208,7 +208,7 @@ class BootLinuxConsole(LinuxKernelTest):
>>           exec_command_and_wait_for_pattern(self, 'reboot',
>>                                                   'reboot: Restarting system')
>>           # Wait for VM to shut down gracefully
>> -        self.vm.wait()
>> +        self.vm.wait(None)
>>
>>       @skipUnless(os.getenv('AVOCADO_ALLOW_UNTRUSTED_CODE'), 'untrusted code')
>>       def test_mips64el_malta_5KEc_cpio(self):
>> @@ -250,7 +250,7 @@ class BootLinuxConsole(LinuxKernelTest):
>>           exec_command_and_wait_for_pattern(self, 'reboot',
>>                                                   'reboot: Restarting system')
>>           # Wait for VM to shut down gracefully
>> -        self.vm.wait()
>> +        self.vm.wait(None)
>>
>>       def do_test_mips_malta32el_nanomips(self, kernel_url, kernel_hash):
>>           kernel_path_xz = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
>> @@ -725,7 +725,7 @@ class BootLinuxConsole(LinuxKernelTest):
>>           exec_command_and_wait_for_pattern(self, 'reboot',
>>                                                   'reboot: Restarting system')
>>           # Wait for VM to shut down gracefully
>> -        self.vm.wait()
>> +        self.vm.wait(None)
>>
>>       @skipUnless(os.getenv('ARMBIAN_ARTIFACTS_CACHED'),
>>                   'Test artifacts fetched from unreliable apt.armbian.com')
>> @@ -778,7 +778,7 @@ class BootLinuxConsole(LinuxKernelTest):
>>           exec_command_and_wait_for_pattern(self, 'reboot',
>>                                                   'reboot: Restarting system')
>>           # Wait for VM to shut down gracefully
>> -        self.vm.wait()
>> +        self.vm.wait(None)
>>
>>       @skipUnless(os.getenv('ARMBIAN_ARTIFACTS_CACHED'),
>>                   'Test artifacts fetched from unreliable dl.armbian.com')
>> diff --git a/tests/acceptance/replay_kernel.py b/tests/acceptance/replay_kernel.py
>> index c8b043ac31..02a52b1d72 100644
>> --- a/tests/acceptance/replay_kernel.py
>> +++ b/tests/acceptance/replay_kernel.py
>> @@ -60,7 +60,7 @@ class ReplayKernelBase(LinuxKernelTest):
>>               logger.info('finished the recording with log size %s bytes'
>>                           % os.path.getsize(replay_path))
>>           else:
>> -            vm.wait()
>> +            vm.wait(None)
>>               logger.info('successfully finished the replay')
>>           elapsed = time.time() - start_time
>>           logger.info('elapsed time %.2f sec' % elapsed)
>>
>

Pavel Dovgalyuk Dec. 2, 2020, 6:31 a.m. UTC | #3

On 01.12.2020 22:15, John Snow wrote:
> On 11/16/20 6:13 AM, Philippe Mathieu-Daudé wrote:
>> Cc'ing John.
>>
>> On Mon, Nov 16, 2020 at 11:08 AM Pavel Dovgalyuk
>> <pavel.dovgalyuk@ispras.ru> wrote:
>>>
>>> This patch adds timeout parameter to vm.wait() calls, because the 
>>> default
>>> value is just 30 seconds, and tests may last for more time.
>>>
> 
> This doesn't sound right -- the timeout isn't meant to be for the entire 
> duration of the test, the timeout is from the time of issuing a shutdown 
> command until the time the VM actually shuts down. Ideally, that should 
> not take a particularly long time in a well-behaved test.
> 
> Why is it lasting longer than 30 seconds?

These are complex Linux boot&execution tests.
Such loading process could take more than 30 seconds.
E.g., BootLinux tests have timeout of 900 seconds.

> How long is it actually taking, do we know?
> 
> And, are you SURE you want to have *no* timeout for all of these calls? 
> (Will something else kill them if they take too long? Ideally before 
> gitlab itself kills the job for running too long?)
> 
> Cleber?
> 
>>> Signed-off-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
>>> ---
>>>   tests/acceptance/boot_linux_console.py |    8 ++++----
>>>   tests/acceptance/replay_kernel.py      |    2 +-
>>>   2 files changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/tests/acceptance/boot_linux_console.py 
>>> b/tests/acceptance/boot_linux_console.py
>>> index cb6086ca6f..1cb8cb7a2a 100644
>>> --- a/tests/acceptance/boot_linux_console.py
>>> +++ b/tests/acceptance/boot_linux_console.py
>>> @@ -208,7 +208,7 @@ class BootLinuxConsole(LinuxKernelTest):
>>>           exec_command_and_wait_for_pattern(self, 'reboot',
>>>                                                   'reboot: Restarting 
>>> system')
>>>           # Wait for VM to shut down gracefully
>>> -        self.vm.wait()
>>> +        self.vm.wait(None)
>>>
>>>       @skipUnless(os.getenv('AVOCADO_ALLOW_UNTRUSTED_CODE'), 
>>> 'untrusted code')
>>>       def test_mips64el_malta_5KEc_cpio(self):
>>> @@ -250,7 +250,7 @@ class BootLinuxConsole(LinuxKernelTest):
>>>           exec_command_and_wait_for_pattern(self, 'reboot',
>>>                                                   'reboot: Restarting 
>>> system')
>>>           # Wait for VM to shut down gracefully
>>> -        self.vm.wait()
>>> +        self.vm.wait(None)
>>>
>>>       def do_test_mips_malta32el_nanomips(self, kernel_url, 
>>> kernel_hash):
>>>           kernel_path_xz = self.fetch_asset(kernel_url, 
>>> asset_hash=kernel_hash)
>>> @@ -725,7 +725,7 @@ class BootLinuxConsole(LinuxKernelTest):
>>>           exec_command_and_wait_for_pattern(self, 'reboot',
>>>                                                   'reboot: Restarting 
>>> system')
>>>           # Wait for VM to shut down gracefully
>>> -        self.vm.wait()
>>> +        self.vm.wait(None)
>>>
>>>       @skipUnless(os.getenv('ARMBIAN_ARTIFACTS_CACHED'),
>>>                   'Test artifacts fetched from unreliable 
>>> apt.armbian.com')
>>> @@ -778,7 +778,7 @@ class BootLinuxConsole(LinuxKernelTest):
>>>           exec_command_and_wait_for_pattern(self, 'reboot',
>>>                                                   'reboot: Restarting 
>>> system')
>>>           # Wait for VM to shut down gracefully
>>> -        self.vm.wait()
>>> +        self.vm.wait(None)
>>>
>>>       @skipUnless(os.getenv('ARMBIAN_ARTIFACTS_CACHED'),
>>>                   'Test artifacts fetched from unreliable 
>>> dl.armbian.com')
>>> diff --git a/tests/acceptance/replay_kernel.py 
>>> b/tests/acceptance/replay_kernel.py
>>> index c8b043ac31..02a52b1d72 100644
>>> --- a/tests/acceptance/replay_kernel.py
>>> +++ b/tests/acceptance/replay_kernel.py
>>> @@ -60,7 +60,7 @@ class ReplayKernelBase(LinuxKernelTest):
>>>               logger.info('finished the recording with log size %s 
>>> bytes'
>>>                           % os.path.getsize(replay_path))
>>>           else:
>>> -            vm.wait()
>>> +            vm.wait(None)
>>>               logger.info('successfully finished the replay')
>>>           elapsed = time.time() - start_time
>>>           logger.info('elapsed time %.2f sec' % elapsed)
>>>
>>
>

John Snow Dec. 2, 2020, 3:22 p.m. UTC | #4

On 12/2/20 1:31 AM, Pavel Dovgalyuk wrote:
>>>>
>>>> This patch adds timeout parameter to vm.wait() calls, because the 
>>>> default
>>>> value is just 30 seconds, and tests may last for more time.
>>>>
>>
>> This doesn't sound right -- the timeout isn't meant to be for the 
>> entire duration of the test, the timeout is from the time of issuing a 
>> shutdown command until the time the VM actually shuts down. Ideally, 
>> that should not take a particularly long time in a well-behaved test.
>>
>> Why is it lasting longer than 30 seconds?
> 
> These are complex Linux boot&execution tests.
> Such loading process could take more than 30 seconds.
> E.g., BootLinux tests have timeout of 900 seconds.

This timeout should only count towards the time spent *shutting down*, 
not the time to run the entire test. 30 seconds used to be enough time 
for this to happen on gitlab, if it's taking longer than that I am 
worried that something has gone wrong.

Where were the failures observed, and on what tests? Are there logs I 
can review?

--js

Pavel Dovgalyuk Dec. 3, 2020, 6:29 a.m. UTC | #5

On 02.12.2020 18:22, John Snow wrote:
> On 12/2/20 1:31 AM, Pavel Dovgalyuk wrote:
>>>>>
>>>>> This patch adds timeout parameter to vm.wait() calls, because the 
>>>>> default
>>>>> value is just 30 seconds, and tests may last for more time.
>>>>>
>>>
>>> This doesn't sound right -- the timeout isn't meant to be for the 
>>> entire duration of the test, the timeout is from the time of issuing 
>>> a shutdown command until the time the VM actually shuts down. 
>>> Ideally, that should not take a particularly long time in a 
>>> well-behaved test.
>>>
>>> Why is it lasting longer than 30 seconds?
>>
>> These are complex Linux boot&execution tests.
>> Such loading process could take more than 30 seconds.
>> E.g., BootLinux tests have timeout of 900 seconds.
> 
> This timeout should only count towards the time spent *shutting down*, 
> not the time to run the entire test. 30 seconds used to be enough time 
> for this to happen on gitlab, if it's taking longer than that I am 
> worried that something has gone wrong.
> 
> Where were the failures observed, and on what tests? Are there logs I 
> can review?

I've got your point. You were right.
The problem was with new long-lasting record/replay tests:

if record:
     cloudinit.wait_for_phone_home(('0.0.0.0', self.phone_home_port),
                                   self.name)
     vm.shutdown()
     logger.info('finished the recording with log size %s bytes'
                 % os.path.getsize(replay_path))
else:
     vm.wait(None)
     logger.info('successfully fihished the replay')


Replay phase here waits for shutdown for the whole period of Linux boot 
and execution. We don't check any VM output and just wait for finishing
the replay.

Smaller RR tests include "self.wait_for_console_pattern" during replay 
and therefore can't have problems with this timeout.

Pavel Dovgalyuk

Cleber Rosa Dec. 3, 2020, 4:56 p.m. UTC | #6

On Thu, Dec 03, 2020 at 09:29:10AM +0300, Pavel Dovgalyuk wrote:
> On 02.12.2020 18:22, John Snow wrote:
> > On 12/2/20 1:31 AM, Pavel Dovgalyuk wrote:
> > > > > > 
> > > > > > This patch adds timeout parameter to vm.wait() calls,
> > > > > > because the default
> > > > > > value is just 30 seconds, and tests may last for more time.
> > > > > > 
> > > > 
> > > > This doesn't sound right -- the timeout isn't meant to be for
> > > > the entire duration of the test, the timeout is from the time of
> > > > issuing a shutdown command until the time the VM actually shuts
> > > > down. Ideally, that should not take a particularly long time in
> > > > a well-behaved test.
> > > > 
> > > > Why is it lasting longer than 30 seconds?
> > > 
> > > These are complex Linux boot&execution tests.
> > > Such loading process could take more than 30 seconds.
> > > E.g., BootLinux tests have timeout of 900 seconds.
> > 
> > This timeout should only count towards the time spent *shutting down*,
> > not the time to run the entire test. 30 seconds used to be enough time
> > for this to happen on gitlab, if it's taking longer than that I am
> > worried that something has gone wrong.
> > 
> > Where were the failures observed, and on what tests? Are there logs I
> > can review?
> 
> I've got your point. You were right.
> The problem was with new long-lasting record/replay tests:
>

Hi Pavel,

You mean a new test you're writing based on "boot_linux.py"?

> if record:
>     cloudinit.wait_for_phone_home(('0.0.0.0', self.phone_home_port),
>                                   self.name)
>     vm.shutdown()
>     logger.info('finished the recording with log size %s bytes'
>                 % os.path.getsize(replay_path))
> else:
>     vm.wait(None)
>     logger.info('successfully fihished the replay')
> 
> 
> Replay phase here waits for shutdown for the whole period of Linux boot and
> execution. We don't check any VM output and just wait for finishing
> the replay.
>

I'm missing why a replay be different when it comes to waiting for a
successfull "I'm finished booting" from the guest?  Wouldn't the guest
during the replay act just the same?

> Smaller RR tests include "self.wait_for_console_pattern" during replay and
> therefore can't have problems with this timeout.
>
> Pavel Dovgalyuk
> 

Right.

Thanks,
- Cleber.

Pavel Dovgalyuk Dec. 4, 2020, 6:38 a.m. UTC | #7

On 03.12.2020 19:56, Cleber Rosa wrote:
> On Thu, Dec 03, 2020 at 09:29:10AM +0300, Pavel Dovgalyuk wrote:
>> On 02.12.2020 18:22, John Snow wrote:
>>> On 12/2/20 1:31 AM, Pavel Dovgalyuk wrote:
>>>>>>>
>>>>>>> This patch adds timeout parameter to vm.wait() calls,
>>>>>>> because the default
>>>>>>> value is just 30 seconds, and tests may last for more time.
>>>>>>>
>>>>>
>>>>> This doesn't sound right -- the timeout isn't meant to be for
>>>>> the entire duration of the test, the timeout is from the time of
>>>>> issuing a shutdown command until the time the VM actually shuts
>>>>> down. Ideally, that should not take a particularly long time in
>>>>> a well-behaved test.
>>>>>
>>>>> Why is it lasting longer than 30 seconds?
>>>>
>>>> These are complex Linux boot&execution tests.
>>>> Such loading process could take more than 30 seconds.
>>>> E.g., BootLinux tests have timeout of 900 seconds.
>>>
>>> This timeout should only count towards the time spent *shutting down*,
>>> not the time to run the entire test. 30 seconds used to be enough time
>>> for this to happen on gitlab, if it's taking longer than that I am
>>> worried that something has gone wrong.
>>>
>>> Where were the failures observed, and on what tests? Are there logs I
>>> can review?
>>
>> I've got your point. You were right.
>> The problem was with new long-lasting record/replay tests:
>>
> 
> Hi Pavel,
> 
> You mean a new test you're writing based on "boot_linux.py"?

Right.

> 
>> if record:
>>      cloudinit.wait_for_phone_home(('0.0.0.0', self.phone_home_port),
>>                                    self.name)
>>      vm.shutdown()
>>      logger.info('finished the recording with log size %s bytes'
>>                  % os.path.getsize(replay_path))
>> else:
>>      vm.wait(None)
>>      logger.info('successfully fihished the replay')
>>
>>
>> Replay phase here waits for shutdown for the whole period of Linux boot and
>> execution. We don't check any VM output and just wait for finishing
>> the replay.
>>
> 
> I'm missing why a replay be different when it comes to waiting for a
> successfull "I'm finished booting" from the guest?  Wouldn't the guest
> during the replay act just the same?

Yes, it will act the same.
There is no need in checking the output (because record does not check 
console patterns too), because replay shutdowns automatically after the 
execution ends.

Pavel Dovgalyuk

tests/acceptance: fix timeout for vm.wait

Commit Message

Comments

Patch