
[v2,09/11] blockjob: add block_job_start

Message ID 1475272849-19990-10-git-send-email-jsnow@redhat.com (mailing list archive)
State New, archived

Commit Message

John Snow Sept. 30, 2016, 10 p.m. UTC
Instead of automatically starting jobs at creation time via backup_start
et al, we'd like to return a job object pointer that can be started
manually at a later point in time.

For now, add the block_job_start mechanism and start the jobs
automatically as we have been doing, with conversions job-by-job coming
in later patches.

Of note: cancellation of unstarted jobs will perform all the normal
cleanup as if the job had started, particularly abort and clean. The
only difference is that we will not emit any events, because the job
never actually started.
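
The conversion pattern itself is small. Taken from the backup change
below, each driver goes from creating and entering the coroutine itself:

    job->common.co = qemu_coroutine_create(job->common.driver->start, job);
    qemu_coroutine_enter(job->common.co);

to handing the fully constructed job to the new helper:

    block_job_start(&job->common);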

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c            |  3 +--
 block/commit.c            |  3 +--
 block/mirror.c            |  3 +--
 block/stream.c            |  3 +--
 blockjob.c                | 48 +++++++++++++++++++++++++++++++++++------------
 include/block/block_int.h |  8 ++++++++
 tests/test-blockjob-txn.c | 12 ++++++------
 7 files changed, 54 insertions(+), 26 deletions(-)

Comments

Kevin Wolf Oct. 5, 2016, 3:17 p.m. UTC | #1
Am 01.10.2016 um 00:00 hat John Snow geschrieben:
> Instead of automatically starting jobs at creation time via backup_start
> et al, we'd like to return a job object pointer that can be started
> manually at a later point in time.
> 
> For now, add the block_job_start mechanism and start the jobs
> automatically as we have been doing, with conversions job-by-job coming
> in later patches.
> 
> Of note: cancellation of unstarted jobs will perform all the normal
> cleanup as if the job had started, particularly abort and clean. The
> only difference is that we will not emit any events, because the job
> never actually started.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>

Should we make sure that jobs are only added to the block_jobs list once
they are started? It doesn't sound like a good idea to make a job
without a coroutine user-accessible.

Kevin
John Snow Oct. 6, 2016, 10:44 p.m. UTC | #2
On 10/05/2016 11:17 AM, Kevin Wolf wrote:
> Am 01.10.2016 um 00:00 hat John Snow geschrieben:
>> Instead of automatically starting jobs at creation time via backup_start
>> et al, we'd like to return a job object pointer that can be started
>> manually at a later point in time.
>>
>> For now, add the block_job_start mechanism and start the jobs
>> automatically as we have been doing, with conversions job-by-job coming
>> in later patches.
>>
>> Of note: cancellation of unstarted jobs will perform all the normal
>> cleanup as if the job had started, particularly abort and clean. The
>> only difference is that we will not emit any events, because the job
>> never actually started.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>
> Should we make sure that jobs are only added to the block_jobs list once
> they are started? It doesn't sound like a good idea to make a job
> without a coroutine user-accessible.
>
> Kevin
>

That would certainly help limit exposure to a potentially dangerous 
object, but some operations on these unstarted jobs are still perfectly 
reasonable, like cancellation. Even things like "set speed" make sense 
on an unstarted job.

I'd rather just verify that having an accessible job cannot be operated 
on unsafely via the public interface, even though that's more work.

So here's the list:

-block_job_next: Harmless.

-block_job_get: Harmless.

-block_job_set_speed: Depends on the set_speed implementation, but 
backup_set_speed makes no assumptions and that's the only job I am 
attempting to convert in this series.

-block_job_cancel: Edited to specifically support pre-started jobs in 
this patch.

-block_job_complete: Edited to check for a coroutine specifically, but 
even without that check, a job cannot become ready without running first.

-block_job_query: Harmless* (*I could probably expose a 'started' 
variable so that libvirt doesn't get confused as to why there are jobs 
that exist but are neither busy nor paused.)

-block_job_pause: Harmless**

-block_job_user_pause: Harmless**

-block_job_user_paused: Harmless, if meaningless.

-block_job_resume: **We will attempt to call block_job_enter, but since 
that is conditional on job->co, we do nothing; so it's harmless, if a 
little misleading, that you can pause/resume to your heart's content. 
(See the block_job_enter sketch after this list.)

-block_job_user_resume: ^ http://i.imgur.com/2zYxrIe.png ^

-block_job_cancel_sync: Safe, due to the edits to block_job_cancel. The 
polling loop WILL complete as normal, because all jobs finish through 
block_job_completed one way or another.

-block_job_cancel_sync_all: As safe as the above.

-block_job_complete_sync: Safe, complete will return error for unstarted 
jobs.

-block_job_iostatus_reset: Harmless, I think -- backup does not 
implement this method. (Huh, *no* jobs implement iostatus_reset at the 
moment...)

-block_job_txn_new: Doesn't operate on jobs.

-block_job_txn_unref: Also doesn't operate on jobs.

-block_job_get_aio_context: Harmless.

-block_job_txn_add_job: Safe and intended! There is trickery here, 
though: once a job is introduced into a transaction, it is opened up to 
the private interface. That brings the following functions into 
consideration:

-block_job_completed: Safe, does not assume a coroutine anywhere.

-block_job_completed_single: Specifically updated to be aware of whether 
we are pre-started or not. This is the "real" completion mechanism for 
BlockJobs that gets run for transactional OR individual jobs.

-block_job_completed_txn_abort: ***BUG***! The problem with the patch as 
I've sent it is that cancel calls completed (under the premise that 
nobody else would ever get the chance to), but we call both cancel AND 
completed from this function, which will cause us to call completed 
twice on pre-started jobs. I will fix this for the next version.

-block_job_completed_txn_success: Should never be called; if it IS, the 
presence of an unstarted job in the transaction will cause an early 
return. And even if I am utterly wrong and every job in the transaction 
completes successfully (somehow?) calling block_job_completed_single is 
perfectly safe by design.
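
(Regarding the pause/resume entries above: the guard in question is the 
one at the top of block_job_enter, which -- paraphrasing from memory, so 
treat this as a sketch rather than a verbatim quote of the current code -- 
looks roughly like:

void block_job_enter(BlockJob *job)
{
    if (job->co && !job->busy) {
        qemu_coroutine_enter(job->co);
    }
}

With job->co still NULL, entering is simply skipped.)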


Everything else is either internal to block job instances or the 
blockjob core.


There may be:
(A) Bugs in my code/thinking, or
(B) Improvements we can make to the readability,

but I believe that this is (apart from the mentioned bug) not dangerous.

--js
John Snow Oct. 17, 2016, 6 p.m. UTC | #3
On 10/06/2016 06:44 PM, John Snow wrote:
>
>
> On 10/05/2016 11:17 AM, Kevin Wolf wrote:
>> Am 01.10.2016 um 00:00 hat John Snow geschrieben:
>>> Instead of automatically starting jobs at creation time via backup_start
>>> et al, we'd like to return a job object pointer that can be started
>>> manually at a later point in time.
>>>
>>> For now, add the block_job_start mechanism and start the jobs
>>> automatically as we have been doing, with conversions job-by-job coming
>>> in later patches.
>>>
>>> Of note: cancellation of unstarted jobs will perform all the normal
>>> cleanup as if the job had started, particularly abort and clean. The
>>> only difference is that we will not emit any events, because the job
>>> never actually started.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>
>> Should we make sure that jobs are only added to the block_jobs list once
>> they are started? It doesn't sound like a good idea to make a job
>> without a coroutine user-accessible.
>>
>> Kevin
>>
>
> That would certainly help limit exposure to a potentially dangerous
> object, but some operations on these un-started jobs are still perfectly
> reasonable, like cancellation. Even things like "set speed" are
> perfectly reasonable on an unstarted job.
>
> I'd rather just verify that having an accessible job cannot be operated
> on unsafely via the public interface, even though that's more work.
>
> So here's the list:
>
> -block_job_next: Harmless.
>
> -block_job_get: Harmless.
>
> -block_job_set_speed: Depends on the set_speed implementation, but
> backup_set_speed makes no assumptions and that's the only job I am
> attempting to convert in this series.
>
> -block_job_cancel: Edited to specifically support pre-started jobs in
> this patch.
>
> -block_job_complete: Edited to check for a coroutine specifically, but
> even without, a job will not be able to become ready without running first.
>
> -block_job_query: Harmless* (*I could probably expose a 'started'
> variable so that libvirt didn't get confused as to why there are jobs
> that exist but are not busy nor paused.)
>
> -block_job_pause: Harmless**
>
> -block_job_user_pause: Harmless**
>
> -block_job_user_paused: Harmless, if meaningless.
>
> -block_job_resume: **We will attempt to call block_job_enter, but
> conditional on job->co, we do nothing, so it's harmless if not
> misleading that you can pause/resume to your heart's content.
>
> -block_job_user_resume: ^ http://i.imgur.com/2zYxrIe.png ^
>
> -block_job_cancel_sync: Safe, due to edits to block_job_cancel. Polling
> loop WILL complete as normal, because all jobs will finish through
> block_job_completed one way or another.
>
> -block_job_cancel_sync_all: As safe as the above.
>
> -block_job_complete_sync: Safe, complete will return error for unstarted
> jobs.
>
> -block_job_iostatus_reset: Harmless, I think -- backup does not
> implement this method. (Huh, *no* jobs implement iostatus_reset at the
> moment...)
>
> -block_job_txn_new: Doesn't operate on jobs.
>
> -block_job_txn_unref: Also doesn't operate on jobs.
>
> -block_job_get_aio_context: Harmless.
>
> -block_job_txn_add_job: Safe and intended! There is trickery here,
> though, as once a job is introduced into transactions it opens it up to
> the private interface. This adds the following functions to considerations:
>
> -block_job_completed: Safe, does not assume a coroutine anywhere.
>
> -block_job_completed_single: Specifically updated to be context-aware of
> if we are pre-started or not. This is the "real" completion mechanism
> for BlockJobs that gets run for transactional OR individual jobs.
>
> -block_job_completed_txn_abort: ***BUG***! The problem with the patch as
> I've sent it is that cancel calls completed (under the premise that
> nobody else would ever get to be able to), but we call both cancel AND
> completed from this function, which will cause us to call completed
> twice on pre-started jobs. I will fix this for the next version.
>

Actually, I was mistaken. The way I wrote it is fine, and it will work 
like this:

qmp_transaction encounters an error during prepare and calls the abort 
method for each action.

- The abort method calls block_job_cancel on the not-yet-started job,
- Which in turn calls block_job_completed with ret = -ECANCELED

which then either calls:
- block_job_completed_single with a failure mode (which calls .abort and 
.clean) if we are not in a transaction, or
- block_job_completed_txn_abort

txn_abort is going to:

1) Set txn->aborting to true, and
2) Cancel every other job in the transaction using block_job_cancel_sync.

These jobs will also come to txn_abort, but since txn->aborting is true, 
those calls become a NOP.

Then, finally, we will call block_job_completed_single on every job in 
the transaction which will perform our proper cleanup by calling .abort 
and .clean for each job.

No duplication, I was mistaken...!
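
To double-check my reasoning, here is a stripped-down, compilable toy 
model of that flow. This is NOT QEMU code -- the types and helpers are 
invented for illustration -- but the control flow (txn->aborting acting 
as a guard, the real cleanup in completed_single running exactly once 
per job) is meant to mirror the description above:

/* Toy model of the txn-abort flow described above; not QEMU code. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct Txn;

typedef struct Job {
    const char *name;
    bool started;       /* stands in for job->co != NULL */
    bool completed;
    bool cancelled;
    int cleanups;       /* counts .abort/.clean invocations */
    struct Txn *txn;
} Job;

typedef struct Txn {
    bool aborting;
    Job *jobs[8];
    int n;
} Txn;

static void completed_single(Job *job)
{
    job->cleanups++;    /* the driver's .abort and .clean would run here */
    printf("%s: abort/clean ran (count now %d)\n", job->name, job->cleanups);
}

static void job_completed(Job *job);

static void job_cancel(Job *job)
{
    job->cancelled = true;
    if (!job->started) {
        /* Unstarted job: no coroutine to wake, complete it right away. */
        job_completed(job);
    }
}

static void completed_txn_abort(Job *job)
{
    Txn *txn = job->txn;
    int i;

    if (txn->aborting) {
        /* Some other job already started the abort; it will run
         * completed_single() for us, so this call is a NOP. */
        return;
    }
    txn->aborting = true;                       /* step 1 */

    for (i = 0; i < txn->n; i++) {              /* step 2: cancel the others */
        Job *other = txn->jobs[i];
        if (other != job && !other->completed) {
            job_cancel(other);  /* lands back here and hits the NOP above */
        }
    }

    for (i = 0; i < txn->n; i++) {              /* proper cleanup, once each */
        completed_single(txn->jobs[i]);
    }
}

static void job_completed(Job *job)
{
    assert(!job->completed);
    job->completed = true;
    /* Only the cancellation/failure path matters for this scenario. */
    completed_txn_abort(job);
}

int main(void)
{
    Txn txn = { .n = 2 };
    Job a = { .name = "job-a", .txn = &txn };
    Job b = { .name = "job-b", .txn = &txn };

    txn.jobs[0] = &a;
    txn.jobs[1] = &b;

    /* qmp_transaction fails during prepare; the first abort action cancels
     * its not-yet-started job and the cascade handles the rest.  Later
     * abort actions find their jobs already completed and deregistered,
     * so they skip the cancel. */
    job_cancel(&a);

    printf("%s cleanups: %d, %s cleanups: %d\n",
           a.name, a.cleanups, b.name, b.cleanups);
    return 0;
}

Running it prints exactly one abort/clean pass per job, which is the 
"no duplication" result above.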

> -block_job_completed_txn_success: Should never be called; if it IS, the
> presence of an unstarted job in the transaction will cause an early
> return. And even if I am utterly wrong and every job in the transaction
> completes successfully (somehow?) calling block_job_completed_single is
> perfectly safe by design.
>
>
> Everything else is either internal to block job instances or the
> blockjob core.
>
>
> There may be:
> (A) Bugs in my code/thinking, or
> (B) Improvements we can make to the readability,
>
> but I believe that this is (Apart from the mentioned bug) not dangerous.
>
> --js

Patch

diff --git a/block/backup.c b/block/backup.c
index 58dfc58..7294169 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -637,9 +637,8 @@  void backup_start(const char *job_id, BlockDriverState *bs,
 
     bdrv_op_block_all(target, job->common.blocker);
     job->common.len = len;
-    job->common.co = qemu_coroutine_create(job->common.driver->start, job);
     block_job_txn_add_job(txn, &job->common);
-    qemu_coroutine_enter(job->common.co);
+    block_job_start(&job->common);
     return;
 
  error:
diff --git a/block/commit.c b/block/commit.c
index dbaf39e..e64fdf3 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -275,10 +275,9 @@  void commit_start(const char *job_id, BlockDriverState *bs,
     s->backing_file_str = g_strdup(backing_file_str);
 
     s->on_error = on_error;
-    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
 
     trace_commit_start(bs, base, top, s, s->common.co, opaque);
-    qemu_coroutine_enter(s->common.co);
+    block_job_start(&s->common);
 }
 
 
diff --git a/block/mirror.c b/block/mirror.c
index ef54e5b..2a6a662 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -969,9 +969,8 @@  static void mirror_start_job(const char *job_id, BlockDriverState *bs,
 
     bdrv_op_block_all(target, s->common.blocker);
 
-    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
     trace_mirror_start(bs, s, s->common.co, opaque);
-    qemu_coroutine_enter(s->common.co);
+    block_job_start(&s->common);
 }
 
 void mirror_start(const char *job_id, BlockDriverState *bs,
diff --git a/block/stream.c b/block/stream.c
index 2a1c814..55cbbe7 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -232,7 +232,6 @@  void stream_start(const char *job_id, BlockDriverState *bs,
     s->backing_file_str = g_strdup(backing_file_str);
 
     s->on_error = on_error;
-    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
     trace_stream_start(bs, base, s, s->common.co, opaque);
-    qemu_coroutine_enter(s->common.co);
+    block_job_start(&s->common);
 }
diff --git a/blockjob.c b/blockjob.c
index 44cbf6c..ac9a68d 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -161,7 +161,8 @@  void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     job->blk           = blk;
     job->cb            = cb;
     job->opaque        = opaque;
-    job->busy          = true;
+    job->busy          = false;
+    job->paused        = true;
     job->refcnt        = 1;
     bs->job = job;
 
@@ -184,6 +185,15 @@  void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     return job;
 }
 
+void block_job_start(BlockJob *job)
+{
+    assert(job && !job->co && job->paused && !job->busy && job->driver->start);
+    job->paused = false;
+    job->busy = true;
+    job->co = qemu_coroutine_create(job->driver->start, job);
+    qemu_coroutine_enter(job->co);
+}
+
 void block_job_ref(BlockJob *job)
 {
     ++job->refcnt;
@@ -220,18 +230,24 @@  static void block_job_completed_single(BlockJob *job)
     if (job->driver->clean) {
         job->driver->clean(job);
     }
+
     if (job->cb) {
         job->cb(job->opaque, job->ret);
     }
-    if (block_job_is_cancelled(job)) {
-        block_job_event_cancelled(job);
-    } else {
-        const char *msg = NULL;
-        if (job->ret < 0) {
-            msg = strerror(-job->ret);
+
+    /* Emit events only if we actually started */
+    if (job->co) {
+        if (block_job_is_cancelled(job)) {
+            block_job_event_cancelled(job);
+        } else {
+            const char *msg = NULL;
+            if (job->ret < 0) {
+                msg = strerror(-job->ret);
+            }
+            block_job_event_completed(job, msg);
         }
-        block_job_event_completed(job, msg);
     }
+
     if (job->txn) {
         QLIST_REMOVE(job, txn_list);
         block_job_txn_unref(job->txn);
@@ -335,7 +351,8 @@  void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 
 void block_job_complete(BlockJob *job, Error **errp)
 {
-    if (job->pause_count || job->cancelled || !job->driver->complete) {
+    if (job->pause_count || job->cancelled ||
+        !job->co || !job->driver->complete) {
         error_setg(errp, "The active block job '%s' cannot be completed",
                    job->id);
         return;
@@ -364,6 +381,8 @@  bool block_job_paused(BlockJob *job)
 
 void coroutine_fn block_job_pause_point(BlockJob *job)
 {
+    assert(job && job->co);
+
     if (!block_job_should_pause(job)) {
         return;
     }
@@ -408,9 +427,14 @@  void block_job_enter(BlockJob *job)
 
 void block_job_cancel(BlockJob *job)
 {
-    job->cancelled = true;
-    block_job_iostatus_reset(job);
-    block_job_enter(job);
+    if (job->co) {
+        job->cancelled = true;
+        block_job_iostatus_reset(job);
+        block_job_enter(job);
+    } else {
+        /* Job was queued but unstarted. Perform cleanup. */
+        block_job_completed(job, -ECANCELED);
+    }
 }
 
 bool block_job_is_cancelled(BlockJob *job)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 3e79228..686f6a8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -763,6 +763,14 @@  void backup_start(const char *job_id, BlockDriverState *bs,
                   BlockCompletionFunc *cb, void *opaque,
                   BlockJobTxn *txn, Error **errp);
 
+/**
+ * block_job_start:
+ * @job: The job object as returned by @block_job_create.
+ *
+ * Begins execution of a block job.
+ */
+void block_job_start(BlockJob *job);
+
 void hmp_drive_add_node(Monitor *mon, const char *optstr);
 
 BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index 736e18f..d694b52 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -24,10 +24,6 @@  typedef struct {
     int *result;
 } TestBlockJob;
 
-static const BlockJobDriver test_block_job_driver = {
-    .instance_size = sizeof(TestBlockJob),
-};
-
 static void test_block_job_complete(BlockJob *job, void *opaque)
 {
     BlockDriverState *bs = blk_bs(job->blk);
@@ -77,6 +73,11 @@  static void test_block_job_cb(void *opaque, int ret)
     g_free(data);
 }
 
+static const BlockJobDriver test_block_job_driver = {
+    .instance_size = sizeof(TestBlockJob),
+    .start = test_block_job_run,
+};
+
 /* Create a block job that completes with a given return code after a given
  * number of event loop iterations.  The return code is stored in the given
  * result pointer.
@@ -103,10 +104,9 @@  static BlockJob *test_block_job_start(unsigned int iterations,
     s->use_timer = use_timer;
     s->rc = rc;
     s->result = result;
-    s->common.co = qemu_coroutine_create(test_block_job_run, s);
     data->job = s;
     data->result = result;
-    qemu_coroutine_enter(s->common.co);
+    block_job_start(&s->common);
     return &s->common;
 }