diff mbox

iotests: Fix 219's timing

Message ID 20180601123221.29437-1-mreitz@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Max Reitz June 1, 2018, 12:32 p.m. UTC
219 has two issues that may lead to sporadic failure, both of which are
the result of issuing query-jobs too early after a job has been
modified.  This can then lead to different results based on whether the
modification has taken effect already or not.

First, query-jobs is issued right after the job has been created.
Besides its current progress possibly being in any random state (which
has already been taken care of), its total progress too is basically
arbitrary, because the job may not yet have been able to determine it.
This patch addresses this by just filtering the total progress, like
what has been done for the current progress already.

Secondly, query-jobs is issued right after a job has been resumed.  The
job may or may not yet have had the time to actually perform any I/O,
and thus its current progress may or may not have advanced.  To make
sure it has indeed advanced (which is what the reference output already
assumes), insert a sleep of 100 ms before query-jobs is invoked.  With a
slice time of 100 ms, a buffer size of 64 kB and a speed of 256 kB/s,
this should be the right amount of time to let the job advance by
exactly 64 kB.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/219     | 10 +++++++---
 tests/qemu-iotests/219.out | 10 +++++-----
 2 files changed, 12 insertions(+), 8 deletions(-)

Comments

Eric Blake June 1, 2018, 9:36 p.m. UTC | #1
On 06/01/2018 07:32 AM, Max Reitz wrote:
> 219 has two issues that may lead to sporadic failure, both of which are
> the result of issuing query-jobs too early after a job has been
> modified.  This can then lead to different results based on whether the
> modification has taken effect already or not.
> 
> First, query-jobs is issued right after the job has been created.
> Besides its current progress possibly being in any random state (which
> has already been taken care of), its total progress too is basically
> arbitrary, because the job may not yet have been able to determine it.
> This patch addresses this by just filtering the total progress, like
> what has been done for the current progress already.
> 
> Secondly, query-jobs is issued right after a job has been resumed.  The
> job may or may not yet have had the time to actually perform any I/O,
> and thus its current progress may or may not have advanced.  To make
> sure it has indeed advanced (which is what the reference output already
> assumes), insert a sleep of 100 ms before query-jobs is invoked.  With a
> slice time of 100 ms, a buffer size of 64 kB and a speed of 256 kB/s,
> this should be the right amount of time to let the job advance by
> exactly 64 kB.

This still sounds a bit fragile (under heavy load, our 100ms sleep could 
turn into 200 such that more than 64k could transfer before we finally 
get a chance to query), but seems more robust that what we currently have.

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   tests/qemu-iotests/219     | 10 +++++++---
>   tests/qemu-iotests/219.out | 10 +++++-----
>   2 files changed, 12 insertions(+), 8 deletions(-)

> @@ -58,12 +61,13 @@ def test_job_lifecycle(vm, job, job_args, has_ready=False):
>       iotests.log(vm.qmp(job, job_id='job0', **job_args))
>   
>       # Depending on the storage, the first request may or may not have completed
> -    # yet, so filter out the progress. Later query-job calls don't need the
> -    # filtering because the progress is made deterministic by the block job
> -    # speed
> +    # yet (and the total progress may not have been fully determined yet), so
> +    # filter out the progress. Later query-job calls don't need the filtering
> +    # because the progress is made deterministic by the block job speed
>       result = vm.qmp('query-jobs')
>       for j in result['return']:
>           del j['current-progress']
> +        del j['total-progress']

Would this be better as:

j['current-progress'] = "VALUE"
j['total-progress'] = "VALUE"

>       iotests.log(result)
>   
>       # undefined -> created -> running
> diff --git a/tests/qemu-iotests/219.out b/tests/qemu-iotests/219.out
> index 346801b655..740b3d06d7 100644
> --- a/tests/qemu-iotests/219.out
> +++ b/tests/qemu-iotests/219.out
> @@ -3,7 +3,7 @@ Launching VM...
>   
>   Starting block job: drive-mirror (auto-finalize: True; auto-dismiss: True)
>   {u'return': {}}
> -{u'return': [{u'status': u'running', u'total-progress': 4194304, u'id': u'job0', u'type': u'mirror'}]}
> +{u'return': [{u'status': u'running', u'id': u'job0', u'type': u'mirror'}]}

so that the trace shows that the field was present but filtered, rather 
than looking strange by not showing the field at all?

But since it was copied from pre-existing deletion rather than replacement,
Reviewed-by: Eric Blake <eblake@redhat.com>
Max Reitz June 4, 2018, 9:50 a.m. UTC | #2
On 2018-06-01 23:36, Eric Blake wrote:
> On 06/01/2018 07:32 AM, Max Reitz wrote:
>> 219 has two issues that may lead to sporadic failure, both of which are
>> the result of issuing query-jobs too early after a job has been
>> modified.  This can then lead to different results based on whether the
>> modification has taken effect already or not.
>>
>> First, query-jobs is issued right after the job has been created.
>> Besides its current progress possibly being in any random state (which
>> has already been taken care of), its total progress too is basically
>> arbitrary, because the job may not yet have been able to determine it.
>> This patch addresses this by just filtering the total progress, like
>> what has been done for the current progress already.
>>
>> Secondly, query-jobs is issued right after a job has been resumed.  The
>> job may or may not yet have had the time to actually perform any I/O,
>> and thus its current progress may or may not have advanced.  To make
>> sure it has indeed advanced (which is what the reference output already
>> assumes), insert a sleep of 100 ms before query-jobs is invoked.  With a
>> slice time of 100 ms, a buffer size of 64 kB and a speed of 256 kB/s,
>> this should be the right amount of time to let the job advance by
>> exactly 64 kB.
> 
> This still sounds a bit fragile (under heavy load, our 100ms sleep could
> turn into 200 such that more than 64k could transfer before we finally
> get a chance to query), but seems more robust that what we currently have.

Indeed.  I'll just say I'll try to keep it in mind when I see the test
failing somewhen in the future. O:-)

(I guess I'll have to make it slower then.)

>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/219     | 10 +++++++---
>>   tests/qemu-iotests/219.out | 10 +++++-----
>>   2 files changed, 12 insertions(+), 8 deletions(-)
> 
>> @@ -58,12 +61,13 @@ def test_job_lifecycle(vm, job, job_args,
>> has_ready=False):
>>       iotests.log(vm.qmp(job, job_id='job0', **job_args))
>>         # Depending on the storage, the first request may or may not
>> have completed
>> -    # yet, so filter out the progress. Later query-job calls don't
>> need the
>> -    # filtering because the progress is made deterministic by the
>> block job
>> -    # speed
>> +    # yet (and the total progress may not have been fully determined
>> yet), so
>> +    # filter out the progress. Later query-job calls don't need the
>> filtering
>> +    # because the progress is made deterministic by the block job speed
>>       result = vm.qmp('query-jobs')
>>       for j in result['return']:
>>           del j['current-progress']
>> +        del j['total-progress']
> 
> Would this be better as:
> 
> j['current-progress'] = "VALUE"
> j['total-progress'] = "VALUE"
> 
>>       iotests.log(result)
>>         # undefined -> created -> running
>> diff --git a/tests/qemu-iotests/219.out b/tests/qemu-iotests/219.out
>> index 346801b655..740b3d06d7 100644
>> --- a/tests/qemu-iotests/219.out
>> +++ b/tests/qemu-iotests/219.out
>> @@ -3,7 +3,7 @@ Launching VM...
>>     Starting block job: drive-mirror (auto-finalize: True;
>> auto-dismiss: True)
>>   {u'return': {}}
>> -{u'return': [{u'status': u'running', u'total-progress': 4194304,
>> u'id': u'job0', u'type': u'mirror'}]}
>> +{u'return': [{u'status': u'running', u'id': u'job0', u'type':
>> u'mirror'}]}
> 
> so that the trace shows that the field was present but filtered, rather
> than looking strange by not showing the field at all?

Yes, that would be better.  I'll change it.

> But since it was copied from pre-existing deletion rather than replacement,
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks,

Max
diff mbox

Patch

diff --git a/tests/qemu-iotests/219 b/tests/qemu-iotests/219
index 898a26eef0..87ab07b94d 100755
--- a/tests/qemu-iotests/219
+++ b/tests/qemu-iotests/219
@@ -20,6 +20,7 @@ 
 # Check using the job-* QMP commands with block jobs
 
 import iotests
+import time
 
 iotests.verify_image_format(supported_fmts=['qcow2'])
 
@@ -46,6 +47,8 @@  def test_pause_resume(vm):
 
             iotests.log(vm.qmp(resume_cmd, **{resume_arg: 'job0'}))
             iotests.log(iotests.filter_qmp_event(vm.event_wait('JOB_STATUS_CHANGE')))
+            # Let the job proceed before querying its progress
+            time.sleep(0.1)
             iotests.log(vm.qmp('query-jobs'))
 
 def test_job_lifecycle(vm, job, job_args, has_ready=False):
@@ -58,12 +61,13 @@  def test_job_lifecycle(vm, job, job_args, has_ready=False):
     iotests.log(vm.qmp(job, job_id='job0', **job_args))
 
     # Depending on the storage, the first request may or may not have completed
-    # yet, so filter out the progress. Later query-job calls don't need the
-    # filtering because the progress is made deterministic by the block job
-    # speed
+    # yet (and the total progress may not have been fully determined yet), so
+    # filter out the progress. Later query-job calls don't need the filtering
+    # because the progress is made deterministic by the block job speed
     result = vm.qmp('query-jobs')
     for j in result['return']:
         del j['current-progress']
+        del j['total-progress']
     iotests.log(result)
 
     # undefined -> created -> running
diff --git a/tests/qemu-iotests/219.out b/tests/qemu-iotests/219.out
index 346801b655..740b3d06d7 100644
--- a/tests/qemu-iotests/219.out
+++ b/tests/qemu-iotests/219.out
@@ -3,7 +3,7 @@  Launching VM...
 
 Starting block job: drive-mirror (auto-finalize: True; auto-dismiss: True)
 {u'return': {}}
-{u'return': [{u'status': u'running', u'total-progress': 4194304, u'id': u'job0', u'type': u'mirror'}]}
+{u'return': [{u'status': u'running', u'id': u'job0', u'type': u'mirror'}]}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'created', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'running', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 
@@ -93,7 +93,7 @@  Waiting for PENDING state...
 
 Starting block job: drive-backup (auto-finalize: True; auto-dismiss: True)
 {u'return': {}}
-{u'return': [{u'status': u'running', u'total-progress': 4194304, u'id': u'job0', u'type': u'backup'}]}
+{u'return': [{u'status': u'running', u'id': u'job0', u'type': u'backup'}]}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'created', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'running', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 
@@ -144,7 +144,7 @@  Waiting for PENDING state...
 
 Starting block job: drive-backup (auto-finalize: True; auto-dismiss: False)
 {u'return': {}}
-{u'return': [{u'status': u'running', u'total-progress': 4194304, u'id': u'job0', u'type': u'backup'}]}
+{u'return': [{u'status': u'running', u'id': u'job0', u'type': u'backup'}]}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'created', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'running', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 
@@ -203,7 +203,7 @@  Waiting for PENDING state...
 
 Starting block job: drive-backup (auto-finalize: False; auto-dismiss: True)
 {u'return': {}}
-{u'return': [{u'status': u'running', u'total-progress': 4194304, u'id': u'job0', u'type': u'backup'}]}
+{u'return': [{u'status': u'running', u'id': u'job0', u'type': u'backup'}]}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'created', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'running', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 
@@ -262,7 +262,7 @@  Waiting for PENDING state...
 
 Starting block job: drive-backup (auto-finalize: False; auto-dismiss: False)
 {u'return': {}}
-{u'return': [{u'status': u'running', u'total-progress': 4194304, u'id': u'job0', u'type': u'backup'}]}
+{u'return': [{u'status': u'running', u'id': u'job0', u'type': u'backup'}]}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'created', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'running', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}