Message ID | d21a7bb5d13a8d8db55bea05e46f5f4e18ed481e.1511230683.git.jcody@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On Mon, Nov 20, 2017 at 09:23:23PM -0500, Jeff Cody wrote:
> @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
>  {
>      assert(job && !block_job_started(job) && job->paused &&
>             job->driver && job->driver->start);
> -    job->co = qemu_coroutine_create(block_job_co_entry, job);
>      job->pause_count--;
>      job->busy = true;
>      job->paused = false;
> +    job->co = qemu_coroutine_create(block_job_co_entry, job);
>      bdrv_coroutine_enter(blk_bs(job->blk), job->co);
>  }

Please see discussion on v1 about this hunk.

The rest looks good.
On 21/11/2017 11:49, Stefan Hajnoczi wrote:
> On Mon, Nov 20, 2017 at 09:23:23PM -0500, Jeff Cody wrote:
>> @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
>>  {
>>      assert(job && !block_job_started(job) && job->paused &&
>>             job->driver && job->driver->start);
>> -    job->co = qemu_coroutine_create(block_job_co_entry, job);
>>      job->pause_count--;
>>      job->busy = true;
>>      job->paused = false;
>> +    job->co = qemu_coroutine_create(block_job_co_entry, job);
>>      bdrv_coroutine_enter(blk_bs(job->blk), job->co);
>>  }
>
> Please see discussion on v1 about this hunk.
>
> The rest looks good.

I'm okay with this hunk, but I would appreciate that the commit message
said why it's okay to delay block job cancellation after
block_job_sleep_ns returns.

Thanks,

Paolo
On Tue, Nov 21, 2017 at 02:12:32PM +0100, Paolo Bonzini wrote:
> On 21/11/2017 11:49, Stefan Hajnoczi wrote:
> > On Mon, Nov 20, 2017 at 09:23:23PM -0500, Jeff Cody wrote:
> >> @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
> >>  {
> >>      assert(job && !block_job_started(job) && job->paused &&
> >>             job->driver && job->driver->start);
> >> -    job->co = qemu_coroutine_create(block_job_co_entry, job);
> >>      job->pause_count--;
> >>      job->busy = true;
> >>      job->paused = false;
> >> +    job->co = qemu_coroutine_create(block_job_co_entry, job);
> >>      bdrv_coroutine_enter(blk_bs(job->blk), job->co);
> >>  }
> >
> > Please see discussion on v1 about this hunk.
> >
> > The rest looks good.
>
> I'm okay with this hunk, but I would appreciate that the commit message
> said why it's okay to delay block job cancellation after
> block_job_sleep_ns returns.
>

Stefan is right in his reply to my v1, so I'll go ahead and drop this hunk
for v3. I'll also add the info you requested to the commit message.

Jeff
diff --git a/blockjob.c b/blockjob.c
index 3a0c491..e181295 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
 {
     assert(job && !block_job_started(job) && job->paused &&
            job->driver && job->driver->start);
-    job->co = qemu_coroutine_create(block_job_co_entry, job);
     job->pause_count--;
     job->busy = true;
     job->paused = false;
+    job->co = qemu_coroutine_create(block_job_co_entry, job);
     bdrv_coroutine_enter(blk_bs(job->blk), job->co);
 }
 
@@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
         return;
     }
 
-    job->busy = false;
+    /* We need to leave job->busy set here, because when we have
+     * put a coroutine to 'sleep', we have scheduled it to run in
+     * the future.  We cannot enter that same coroutine again before
+     * it wakes and runs, otherwise we risk double-entry or entry after
+     * completion. */
     if (!block_job_should_pause(job)) {
         co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
     }
-    job->busy = true;
 
     block_job_pause_point(job);
 }
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index f13ad05..43f3be2 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -143,7 +143,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
  * @ns: How many nanoseconds to stop for.
  *
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * nanoseconds.  Canceling the job will interrupt the wait immediately.
+ * nanoseconds.  Canceling the job will not interrupt the wait, so the
+ * cancel will not process until the coroutine wakes up.
  */
 void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
When block_job_sleep_ns() is called, the co-routine is scheduled for
future execution.  If we allow the job to be re-entered prior to the
scheduled time, we present a race condition in which a coroutine can be
entered recursively, or even entered after the coroutine is deleted.

The job->busy flag is used by blockjobs when a coroutine is busy
executing.  The function 'block_job_enter()' obeys the busy flag,
and will not enter a coroutine if set.  If we sleep a job, we need to
leave the busy flag set, so that subsequent calls to block_job_enter()
are prevented.

This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508708

Also, in block_job_start(), set the relevant job flags (.busy, .paused)
before creating the coroutine, not just before executing it.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 blockjob.c                   | 9 ++++++---
 include/block/blockjob_int.h | 3 ++-
 2 files changed, 8 insertions(+), 4 deletions(-)
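
For readers following the busy-flag argument, here is a minimal toy model of the race in plain C. It is only a sketch: ToyJob, toy_job_enter(), toy_sleep_old(), toy_sleep_patched() and toy_timer_wake() are hypothetical names invented for illustration and are not QEMU APIs, and no real coroutines or timers are involved. It assumes only what the commit message states, namely that the enter path refuses to enter a job whose busy flag is set, and that a sleeping job already has a wake-up scheduled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for BlockJob; only what this sketch needs. */
typedef struct ToyJob {
    bool busy;            /* plays the role of job->busy */
    bool wake_scheduled;  /* sketch-only: a timer will enter the job later */
    int entries;          /* how many times the "coroutine" was entered */
} ToyJob;

/* Models the rule from the commit message: never enter a busy job.
 * Returns true if the job was entered. */
static bool toy_job_enter(ToyJob *job)
{
    if (job->busy) {
        return false;
    }
    job->entries++;
    return true;
}

/* Pre-patch sleep: busy is cleared while the wake-up is pending. */
static void toy_sleep_old(ToyJob *job)
{
    job->busy = false;
    job->wake_scheduled = true;
}

/* Patched sleep: busy stays set while the wake-up is pending. */
static void toy_sleep_patched(ToyJob *job)
{
    job->wake_scheduled = true;
}

/* The scheduled wake-up fires and enters the job unconditionally. */
static void toy_timer_wake(ToyJob *job)
{
    assert(job->wake_scheduled);
    job->wake_scheduled = false;
    job->entries++;
    job->busy = true;
}
```

In this model, calling toy_job_enter() between toy_sleep_old() and toy_timer_wake() succeeds, so a single sleep ends up with two entries, which is the double-entry case. With toy_sleep_patched() the same call is refused and only the scheduled wake-up enters the job. The cost, as noted in the updated header comment, is that a cancel issued during the sleep is not processed until that wake-up.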