[v2,for-2.11,1/4] blockjob: do not allow coroutine double entry or entry-after-completion

Message ID d21a7bb5d13a8d8db55bea05e46f5f4e18ed481e.1511230683.git.jcody@redhat.com (mailing list archive)
State New, archived

Commit Message

Jeff Cody Nov. 21, 2017, 2:23 a.m. UTC
When block_job_sleep_ns() is called, the coroutine is scheduled for
future execution.  If we allow the job to be re-entered prior to the
scheduled time, we create a race condition in which a coroutine can be
entered recursively, or even entered after the coroutine is deleted.

Block jobs set the job->busy flag while their coroutine is executing.
The function block_job_enter() honors the busy flag and will not enter
the coroutine while it is set.  When we put a job to sleep, we need to
leave the busy flag set, so that subsequent calls to block_job_enter()
do not enter the sleeping coroutine.
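
The guard described above can be illustrated with a toy model. This is only a sketch, not QEMU code: ToyJob, toy_job_sleep(), toy_job_enter() and toy_job_wake() are hypothetical names, and the timer_pending field merely stands in for "the coroutine is already scheduled to run in the future".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block job; NOT the real QEMU BlockJob. */
typedef struct ToyJob {
    bool busy;           /* set while the coroutine runs or is scheduled */
    bool timer_pending;  /* a wake-up is already scheduled */
    int entries;         /* how many times the coroutine was entered */
    int double_entries;  /* entries made while a wake-up was still pending */
} ToyJob;

/* Models going to sleep. keep_busy == true is the behaviour this patch
 * proposes; keep_busy == false is the behaviour before the patch. */
static void toy_job_sleep(ToyJob *job, bool keep_busy)
{
    assert(job != NULL);
    job->timer_pending = true;
    job->busy = keep_busy;
}

/* Models an external enter request: refused while busy is set. */
static bool toy_job_enter(ToyJob *job)
{
    assert(job != NULL);
    if (job->busy) {
        return false;
    }
    if (job->timer_pending) {
        /* Entered although a wake-up is still queued: the race. */
        job->double_entries++;
    }
    job->entries++;
    job->busy = true;
    return true;
}

/* Models the scheduled wake-up firing; this is the only legitimate
 * way for a sleeping job to resume in this toy model. */
static void toy_job_wake(ToyJob *job)
{
    assert(job != NULL && job->timer_pending);
    job->timer_pending = false;
    job->entries++;
    job->busy = true;
}
```

In this model, a job put to sleep with keep_busy == false accepts an enter request while its wake-up is still pending, and the later wake-up then enters it a second time. With keep_busy == true the enter request is refused and only the wake-up resumes the job.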

This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508708

Also, in block_job_start(), set the relevant job flags (.busy, .paused)
before creating the coroutine, not just before executing it.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 blockjob.c                   | 9 ++++++---
 include/block/blockjob_int.h | 3 ++-
 2 files changed, 8 insertions(+), 4 deletions(-)

Comments

Stefan Hajnoczi Nov. 21, 2017, 10:49 a.m. UTC | #1
On Mon, Nov 20, 2017 at 09:23:23PM -0500, Jeff Cody wrote:
> @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
>  {
>      assert(job && !block_job_started(job) && job->paused &&
>             job->driver && job->driver->start);
> -    job->co = qemu_coroutine_create(block_job_co_entry, job);
>      job->pause_count--;
>      job->busy = true;
>      job->paused = false;
> +    job->co = qemu_coroutine_create(block_job_co_entry, job);
>      bdrv_coroutine_enter(blk_bs(job->blk), job->co);
>  }

Please see discussion on v1 about this hunk.

The rest looks good.

Paolo Bonzini Nov. 21, 2017, 1:12 p.m. UTC | #2
On 21/11/2017 11:49, Stefan Hajnoczi wrote:
> On Mon, Nov 20, 2017 at 09:23:23PM -0500, Jeff Cody wrote:
>> @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
>>  {
>>      assert(job && !block_job_started(job) && job->paused &&
>>             job->driver && job->driver->start);
>> -    job->co = qemu_coroutine_create(block_job_co_entry, job);
>>      job->pause_count--;
>>      job->busy = true;
>>      job->paused = false;
>> +    job->co = qemu_coroutine_create(block_job_co_entry, job);
>>      bdrv_coroutine_enter(blk_bs(job->blk), job->co);
>>  }
> 
> Please see discussion on v1 about this hunk.
> 
> The rest looks good.

I'm okay with this hunk, but I would appreciate it if the commit message
explained why it's okay to delay block job cancellation until after
block_job_sleep_ns returns.

Thanks,

Paolo

Jeff Cody Nov. 21, 2017, 1:26 p.m. UTC | #3
On Tue, Nov 21, 2017 at 02:12:32PM +0100, Paolo Bonzini wrote:
> On 21/11/2017 11:49, Stefan Hajnoczi wrote:
> > On Mon, Nov 20, 2017 at 09:23:23PM -0500, Jeff Cody wrote:
> >> @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
> >>  {
> >>      assert(job && !block_job_started(job) && job->paused &&
> >>             job->driver && job->driver->start);
> >> -    job->co = qemu_coroutine_create(block_job_co_entry, job);
> >>      job->pause_count--;
> >>      job->busy = true;
> >>      job->paused = false;
> >> +    job->co = qemu_coroutine_create(block_job_co_entry, job);
> >>      bdrv_coroutine_enter(blk_bs(job->blk), job->co);
> >>  }
> > 
> > Please see discussion on v1 about this hunk.
> > 
> > The rest looks good.
> 
> I'm okay with this hunk, but I would appreciate it if the commit message
> explained why it's okay to delay block job cancellation until after
> block_job_sleep_ns returns.
> 

Stefan is right in his reply to my v1, so I'll go ahead and drop this hunk
for v3.  I'll also add the info you requested to the commit message.

Jeff

Patch

diff --git a/blockjob.c b/blockjob.c
index 3a0c491..e181295 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -291,10 +291,10 @@  void block_job_start(BlockJob *job)
 {
     assert(job && !block_job_started(job) && job->paused &&
            job->driver && job->driver->start);
-    job->co = qemu_coroutine_create(block_job_co_entry, job);
     job->pause_count--;
     job->busy = true;
     job->paused = false;
+    job->co = qemu_coroutine_create(block_job_co_entry, job);
     bdrv_coroutine_enter(blk_bs(job->blk), job->co);
 }
 
@@ -797,11 +797,14 @@  void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
         return;
     }
 
-    job->busy = false;
+    /* We need to leave job->busy set here, because when we have
+     * put a coroutine to 'sleep', we have scheduled it to run in
+     * the future.  We cannot enter that same coroutine again before
+     * it wakes and runs, otherwise we risk double-entry or entry after
+     * completion. */
     if (!block_job_should_pause(job)) {
         co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
     }
-    job->busy = true;
 
     block_job_pause_point(job);
 }
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index f13ad05..43f3be2 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -143,7 +143,8 @@  void *block_job_create(const char *job_id, const BlockJobDriver *driver,
  * @ns: How many nanoseconds to stop for.
  *
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * nanoseconds.  Canceling the job will interrupt the wait immediately.
+ * nanoseconds.  Canceling the job will not interrupt the wait, so the
+ * cancel will not process until the coroutine wakes up.
  */
 void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
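
The documentation change above means a cancel request issued during a sleep is only acted on once the sleep ends. A minimal sketch of that timing, assuming a simplified model in which a cancel arriving while the job is awake is handled immediately; toy_cancel_processed_at() is a hypothetical helper, not a QEMU function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of QEMU.
 *
 * The job starts sleeping at time 0 for sleep_ns nanoseconds and a
 * cancel request arrives at cancel_at_ns. Returns the time at which
 * the cancel is processed.
 *
 * cancel_interrupts_sleep == true  models the old documented behaviour.
 * cancel_interrupts_sleep == false models the behaviour after this patch.
 */
static int64_t toy_cancel_processed_at(int64_t sleep_ns,
                                       int64_t cancel_at_ns,
                                       bool cancel_interrupts_sleep)
{
    assert(sleep_ns >= 0 && cancel_at_ns >= 0);

    if (cancel_at_ns >= sleep_ns) {
        /* The job is already awake; simplified to immediate handling. */
        return cancel_at_ns;
    }
    /* The cancel arrives mid-sleep: either it cuts the sleep short,
     * or it waits until the scheduled wake-up. */
    return cancel_interrupts_sleep ? cancel_at_ns : sleep_ns;
}
```

Under this model the worst-case extra cancellation latency introduced by the patch is bounded by the sleep duration passed as @ns.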