
iotests: Fix 219's timing

Message ID 20180606183738.31688-1-mreitz@redhat.com (mailing list archive)
State New, archived

Commit Message

Max Reitz June 6, 2018, 6:37 p.m. UTC
219 has two issues that may lead to sporadic failure, both of which are
the result of issuing query-jobs too early after a job has been
modified.  This can then lead to different results based on whether the
modification has taken effect already or not.

First, query-jobs is issued right after the job has been created.
Besides its current progress possibly being in any random state (which
has already been taken care of), its total progress too is basically
arbitrary, because the job may not yet have been able to determine it.
This patch addresses this by just filtering the total progress, like
what has been done for the current progress already.  However, for more
clarity, the filtering is changed to replace the values by a string
'FILTERED' instead of deleting them.

Secondly, query-jobs is issued right after a job has been resumed.  The
job may or may not yet have had the time to actually perform any I/O,
and thus its current progress may or may not have advanced.  To make
sure it has indeed advanced (which is what the reference output already
assumes), insert a sleep of 100 ms before query-jobs is invoked.  With a
slice time of 100 ms, a buffer size of 64 kB and a speed of 256 kB/s,
this should be the right amount of time to let the job advance by
exactly 64 kB.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
v2: Changed the query-jobs progress filtering [Eric]
---
 tests/qemu-iotests/219     | 12 ++++++++----
 tests/qemu-iotests/219.out | 10 +++++-----
 2 files changed, 13 insertions(+), 9 deletions(-)
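
A quick sanity check of the arithmetic behind that 100 ms figure
(illustrative sketch only; the variable names below are made up and do
not appear in the test):

    buf_size   = 64 * 1024    # bytes copied per step (the job's buffer size)
    speed      = 256 * 1024   # rate limit in bytes per second
    slice_time = 0.1          # rate-limit slice length in seconds

    step_interval = float(buf_size) / speed   # time between copy steps
    assert step_interval == 0.25              # one 64 kB step every 250 ms
    assert slice_time < step_interval         # 100 ms cannot reach a second step

So after a resume the job does one step (almost) immediately and then
stays idle for 250 ms; sleeping for 100 ms gives that immediate step
time to land while staying well short of the next one.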

Comments

Max Reitz June 6, 2018, 6:38 p.m. UTC | #1
On 2018-06-06 20:37, Max Reitz wrote:
> 219 has two issues that may lead to sporadic failure, both of which are
> the result of issuing query-jobs too early after a job has been
> modified.  This can then lead to different results based on whether the
> modification has taken effect already or not.
> 
> First, query-jobs is issued right after the job has been created.
> Besides its current progress possibly being in any random state (which
> has already been taken care of), its total progress too is basically
> arbitrary, because the job may not yet have been able to determine it.
> This patch addresses this by just filtering the total progress, like
> what has been done for the current progress already.  However, for more
> clarity, the filtering is changed to replace the values by a string
> 'FILTERED' instead of deleting them.
> 
> Secondly, query-jobs is issued right after a job has been resumed.  The
> job may or may not yet have had the time to actually perform any I/O,
> and thus its current progress may or may not have advanced.  To make
> sure it has indeed advanced (which is what the reference output already
> assumes), insert a sleep of 100 ms before query-jobs is invoked.  With a
> slice time of 100 ms, a buffer size of 64 kB and a speed of 256 kB/s,
> this should be the right amount of time to let the job advance by
> exactly 64 kB.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
> v2: Changed the query-jobs progress filtering [Eric]
> ---
>  tests/qemu-iotests/219     | 12 ++++++++----
>  tests/qemu-iotests/219.out | 10 +++++-----
>  2 files changed, 13 insertions(+), 9 deletions(-)

Oops, forgot the "v2" tag.  Forgive me.

Max
Peter Maydell June 6, 2018, 6:41 p.m. UTC | #2
On 6 June 2018 at 19:37, Max Reitz <mreitz@redhat.com> wrote:
> 219 has two issues that may lead to sporadic failure, both of which are
> the result of issuing query-jobs too early after a job has been
> modified.  This can then lead to different results based on whether the
> modification has taken effect already or not.
>
> First, query-jobs is issued right after the job has been created.
> Besides its current progress possibly being in any random state (which
> has already been taken care of), its total progress too is basically
> arbitrary, because the job may not yet have been able to determine it.
> This patch addresses this by just filtering the total progress, like
> what has been done for the current progress already.  However, for more
> clarity, the filtering is changed to replace the values by a string
> 'FILTERED' instead of deleting them.
>
> Secondly, query-jobs is issued right after a job has been resumed.  The
> job may or may not yet have had the time to actually perform any I/O,
> and thus its current progress may or may not have advanced.  To make
> sure it has indeed advanced (which is what the reference output already
> assumes), insert a sleep of 100 ms before query-jobs is invoked.  With a
> slice time of 100 ms, a buffer size of 64 kB and a speed of 256 kB/s,
> this should be the right amount of time to let the job advance by
> exactly 64 kB.
>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
> v2: Changed the query-jobs progress filtering [Eric]
> ---

I know nothing about the iotests, so this might be off-base,
but this looks rather like "try to fix a race condition by
adding a sleep", which generally doesn't work very well?

thanks
-- PMM
Max Reitz June 6, 2018, 6:50 p.m. UTC | #3
On 2018-06-06 20:41, Peter Maydell wrote:
> On 6 June 2018 at 19:37, Max Reitz <mreitz@redhat.com> wrote:
>> 219 has two issues that may lead to sporadic failure, both of which are
>> the result of issuing query-jobs too early after a job has been
>> modified.  This can then lead to different results based on whether the
>> modification has taken effect already or not.
>>
>> First, query-jobs is issued right after the job has been created.
>> Besides its current progress possibly being in any random state (which
>> has already been taken care of), its total progress too is basically
>> arbitrary, because the job may not yet have been able to determine it.
>> This patch addresses this by just filtering the total progress, like
>> what has been done for the current progress already.  However, for more
>> clarity, the filtering is changed to replace the values by a string
>> 'FILTERED' instead of deleting them.
>>
>> Secondly, query-jobs is issued right after a job has been resumed.  The
>> job may or may not yet have had the time to actually perform any I/O,
>> and thus its current progress may or may not have advanced.  To make
>> sure it has indeed advanced (which is what the reference output already
>> assumes), insert a sleep of 100 ms before query-jobs is invoked.  With a
>> slice time of 100 ms, a buffer size of 64 kB and a speed of 256 kB/s,
>> this should be the right amount of time to let the job advance by
>> exactly 64 kB.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>> v2: Changed the query-jobs progress filtering [Eric]
>> ---
> 
> I know nothing about the iotests, so this might be off-base,
> but this looks rather like "try to fix a race condition by
> adding a sleep", which generally doesn't work very well?

The job tested here already has its own timing (copying 64 kB four times
a second, throttled in 100 ms slices), so a sleep is not too bad.  What
happens is that the job is put to sleep, then reawakened, and it should
do one copy step immediately afterwards.  Then it won't do anything for
another 250 ms.

Now waiting 100 ms should really be enough to make that "immediate" step
actually happen, and it doesn't really hurt because we have 250 ms
anyway.

But I think it should be possible without the sleep (by repeatedly
querying the progress); I'll give it a try.
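
Roughly, such a polling loop might look like this (just a sketch, not a
finished patch; wait_for_progress() is a made-up helper name, only
vm.qmp('query-jobs') is what the test already uses):

    import time

    def wait_for_progress(vm, old_progress, timeout=5.0):
        # Illustrative sketch only -- not the actual follow-up patch.
        # Poll query-jobs until the job's current progress has advanced
        # past the value it had while paused, instead of sleeping a
        # fixed 100 ms.
        deadline = time.time() + timeout
        while time.time() < deadline:
            result = vm.qmp('query-jobs')
            if result['return'][0]['current-progress'] > old_progress:
                return result
            time.sleep(0.01)
        return result  # timed out; return whatever we last saw

The reference output would stay deterministic because the loop only
returns (barring a timeout) once the progress has actually advanced.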

Max

Patch

diff --git a/tests/qemu-iotests/219 b/tests/qemu-iotests/219
index 898a26eef0..6931c5e449 100755
--- a/tests/qemu-iotests/219
+++ b/tests/qemu-iotests/219
@@ -20,6 +20,7 @@ 
 # Check using the job-* QMP commands with block jobs
 
 import iotests
+import time
 
 iotests.verify_image_format(supported_fmts=['qcow2'])
 
@@ -46,6 +47,8 @@  def test_pause_resume(vm):
 
             iotests.log(vm.qmp(resume_cmd, **{resume_arg: 'job0'}))
             iotests.log(iotests.filter_qmp_event(vm.event_wait('JOB_STATUS_CHANGE')))
+            # Let the job proceed before querying its progress
+            time.sleep(0.1)
             iotests.log(vm.qmp('query-jobs'))
 
 def test_job_lifecycle(vm, job, job_args, has_ready=False):
@@ -58,12 +61,13 @@  def test_job_lifecycle(vm, job, job_args, has_ready=False):
     iotests.log(vm.qmp(job, job_id='job0', **job_args))
 
     # Depending on the storage, the first request may or may not have completed
-    # yet, so filter out the progress. Later query-job calls don't need the
-    # filtering because the progress is made deterministic by the block job
-    # speed
+    # yet (and the total progress may not have been fully determined yet), so
+    # filter out the progress. Later query-job calls don't need the filtering
+    # because the progress is made deterministic by the block job speed
     result = vm.qmp('query-jobs')
     for j in result['return']:
-        del j['current-progress']
+        j['current-progress'] = 'FILTERED'
+        j['total-progress'] = 'FILTERED'
     iotests.log(result)
 
     # undefined -> created -> running
diff --git a/tests/qemu-iotests/219.out b/tests/qemu-iotests/219.out
index 346801b655..6dc07bc41e 100644
--- a/tests/qemu-iotests/219.out
+++ b/tests/qemu-iotests/219.out
@@ -3,7 +3,7 @@  Launching VM...
 
 Starting block job: drive-mirror (auto-finalize: True; auto-dismiss: True)
 {u'return': {}}
-{u'return': [{u'status': u'running', u'total-progress': 4194304, u'id': u'job0', u'type': u'mirror'}]}
+{u'return': [{u'status': u'running', u'current-progress': 'FILTERED', u'total-progress': 'FILTERED', u'id': u'job0', u'type': u'mirror'}]}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'created', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'running', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 
@@ -93,7 +93,7 @@  Waiting for PENDING state...
 
 Starting block job: drive-backup (auto-finalize: True; auto-dismiss: True)
 {u'return': {}}
-{u'return': [{u'status': u'running', u'total-progress': 4194304, u'id': u'job0', u'type': u'backup'}]}
+{u'return': [{u'status': u'running', u'current-progress': 'FILTERED', u'total-progress': 'FILTERED', u'id': u'job0', u'type': u'backup'}]}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'created', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'running', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 
@@ -144,7 +144,7 @@  Waiting for PENDING state...
 
 Starting block job: drive-backup (auto-finalize: True; auto-dismiss: False)
 {u'return': {}}
-{u'return': [{u'status': u'running', u'total-progress': 4194304, u'id': u'job0', u'type': u'backup'}]}
+{u'return': [{u'status': u'running', u'current-progress': 'FILTERED', u'total-progress': 'FILTERED', u'id': u'job0', u'type': u'backup'}]}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'created', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'running', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 
@@ -203,7 +203,7 @@  Waiting for PENDING state...
 
 Starting block job: drive-backup (auto-finalize: False; auto-dismiss: True)
 {u'return': {}}
-{u'return': [{u'status': u'running', u'total-progress': 4194304, u'id': u'job0', u'type': u'backup'}]}
+{u'return': [{u'status': u'running', u'current-progress': 'FILTERED', u'total-progress': 'FILTERED', u'id': u'job0', u'type': u'backup'}]}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'created', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'running', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 
@@ -262,7 +262,7 @@  Waiting for PENDING state...
 
 Starting block job: drive-backup (auto-finalize: False; auto-dismiss: False)
 {u'return': {}}
-{u'return': [{u'status': u'running', u'total-progress': 4194304, u'id': u'job0', u'type': u'backup'}]}
+{u'return': [{u'status': u'running', u'current-progress': 'FILTERED', u'total-progress': 'FILTERED', u'id': u'job0', u'type': u'backup'}]}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'created', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}
 {u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'running', u'id': u'job0'}, u'event': u'JOB_STATUS_CHANGE'}