[0/4] job: Allow complete for jobs on standby

Message ID 20210409120422.144040-1-mreitz@redhat.com

Message

Max Reitz April 9, 2021, 12:04 p.m. UTC
Hi,

We sometimes have a problem with jobs remaining on STANDBY after a drain
(for a short duration), so if the user immediately issues a
block-job-complete, that will fail.

(See also
https://lists.nongnu.org/archive/html/qemu-block/2021-04/msg00215.html,
to which this series is an alternative.)

Looking at the only implementation of .complete(), which is
mirror_complete(), it looks like there is basically nothing that would
prevent it from being run while mirror is paused.  The only obstacle is
the job_enter() at the end, which we should not and need not do when the
job is paused.

So make that conditional (patch 2), clean up the function on the way
(patch 1, which moves one of its blocks to mirror_exit_common()), and
then we can allow job_complete() on jobs that are on standby (patch 3).
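
To illustrate the idea behind patches 2 and 3, here is a toy model in C.
It is not the QEMU code: ToyJob, toy_enter(), toy_mirror_complete(),
toy_job_complete() and toy_resume() are all hypothetical names invented
for this sketch, and the state handling is heavily simplified.  It only
shows the intended behaviour: completion is accepted on READY and on
STANDBY, and the job is entered right away only if it is not paused.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy job states; the names echo the cover letter, this is not QEMU code. */
typedef enum { TOY_RUNNING, TOY_READY, TOY_STANDBY } ToyStatus;

typedef struct ToyJob {
    ToyStatus status;
    bool paused;           /* true while the job is paused (e.g. by a drain) */
    bool should_complete;  /* set once completion has been requested */
    int enter_count;       /* how often the job was (re-)entered */
} ToyJob;

/* Stand-in for kicking the job; here it only counts the calls. */
static void toy_enter(ToyJob *job)
{
    job->enter_count++;
}

/* Models the patch 2 idea: record the request, enter only if not paused. */
static void toy_mirror_complete(ToyJob *job)
{
    job->should_complete = true;
    if (!job->paused) {
        toy_enter(job);
    }
    /* If paused, the job is entered later, when it is resumed. */
}

/* Models the patch 3 idea: accept completion on READY and on STANDBY. */
static bool toy_job_complete(ToyJob *job)
{
    if (job->status != TOY_READY && job->status != TOY_STANDBY) {
        return false;
    }
    toy_mirror_complete(job);
    return true;
}

/* Resuming leaves STANDBY and enters the job, which then sees the request. */
static void toy_resume(ToyJob *job)
{
    job->paused = false;
    if (job->status == TOY_STANDBY) {
        job->status = TOY_READY;
    }
    toy_enter(job);
}
```

In this model, a completion request on a paused STANDBY job succeeds
without entering the job; the pending request is picked up on resume.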

Patch 4 is basically the same test as in
https://lists.nongnu.org/archive/html/qemu-block/2021-04/msg00214.html,
except some comments are different and, well, job_complete() just works
on STANDBY jobs.

Patch 5 is an iotest that may or may not show the problem for you.  I’ve
tuned the numbers so that on my machine, it fails about half the time
without this series (i.e., the job is still on STANDBY and job_complete()
refuses to do anything).

I’m not sure we want that iotest, because it does quite a bit of I/O and
it’s unreliable, and I don’t think there’s anything I can do to make it
reliable.


Max Reitz (5):
  mirror: Move open_backing_file to exit_common
  mirror: Do not enter a paused job on completion
  job: Allow complete for jobs on standby
  test-blockjob: Test job_wait_unpaused()
  iotests: Test completion immediately after drain

 block/mirror.c                                |  28 ++--
 job.c                                         |   4 +-
 tests/unit/test-blockjob.c                    | 121 ++++++++++++++++++
 .../tests/mirror-complete-after-drain         |  89 +++++++++++++
 .../tests/mirror-complete-after-drain.out     |  14 ++
 5 files changed, 239 insertions(+), 17 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/mirror-complete-after-drain
 create mode 100644 tests/qemu-iotests/tests/mirror-complete-after-drain.out

Comments

Kevin Wolf April 9, 2021, 4:15 p.m. UTC | #1
On 09.04.2021 at 14:04, Max Reitz wrote:
> Hi,
> 
> We sometimes have a problem with jobs remaining on STANDBY after a drain
> (for a short duration), so if the user immediately issues a
> block-job-complete, that will fail.
> 
> (See also
> https://lists.nongnu.org/archive/html/qemu-block/2021-04/msg00215.html,
> which this series is an alternative for.)
> 
> Looking at the only implementation of .complete(), which is
> mirror_complete(), it looks like there is basically nothing that would
> prevent it from being run while mirror is paused.  Really only the
> job_enter() at the end, which we should not and need not do when the job
> is paused.
> 
> So make that conditional (patch 2), clean up the function on the way
> (patch 1, which moves one of its blocks to mirror_exit_common()), and
> then we can allow job_complete() on jobs that are on standby (patch 3).
> 
> Patch 4 is basically the same test as in
> https://lists.nongnu.org/archive/html/qemu-block/2021-04/msg00214.html,
> except some comments are different and, well, job_complete() just works
> on STANDBY jobs.
> 
> Patch 5 is an iotest that may or may not show the problem for you.  I’ve
> tuned the numbers so that on my machine, it fails about 50/50 without
> this series (i.e., the job is still on STANDBY and job_complete()
> refuses to do anything).
> 
> I’m not sure we want that iotest, because it does quite a bit of I/O and
> it’s unreliable, and I don’t think there’s anything I can do to make it
> reliable.

Thanks, applied patches 1-4 to the block branch.

Kevin