Message ID | 2cd4d33dc68bb3c738e1c9aa39a0ddd4108c401e.1511145863.git.jcody@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On Sun, Nov 19, 2017 at 09:46:42PM -0500, Jeff Cody wrote:
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
>  {
>      assert(job && !block_job_started(job) && job->paused &&
>             job->driver && job->driver->start);
> -    job->co = qemu_coroutine_create(block_job_co_entry, job);
>      job->pause_count--;
>      job->busy = true;
>      job->paused = false;
> +    job->co = qemu_coroutine_create(block_job_co_entry, job);
>      bdrv_coroutine_enter(blk_bs(job->blk), job->co);
>  }
>

This hunk makes no difference.  The coroutine is only entered by
bdrv_coroutine_enter() so the order of job field initialization doesn't
matter.

> @@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
>          return;
>      }
>
> -    job->busy = false;
> +    /* We need to leave job->busy set here, because when we have
> +     * put a coroutine to 'sleep', we have scheduled it to run in
> +     * the future.  We cannot enter that same coroutine again before
> +     * it wakes and runs, otherwise we risk double-entry or entry after
> +     * completion. */
>      if (!block_job_should_pause(job)) {
>          co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
>      }
> -    job->busy = true;
>
>      block_job_pause_point(job);

This leaves a stale doc comment in include/block/blockjob_int.h:

/**
 * block_job_sleep_ns:
 * @job: The job that calls the function.
 * @clock: The clock to sleep on.
 * @ns: How many nanoseconds to stop for.
 *
 * Put the job to sleep (assuming that it wasn't canceled) for @ns
 * nanoseconds.  Canceling the job will interrupt the wait immediately.
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 */
void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);

This raises questions about the ability to cancel sleep:

1. Does something depend on cancelling sleep?

2. Did cancellation work properly in commit
   4513eafe928ff47486f4167c28d364c72b5ff7e3 ("block: add
   block_job_sleep_ns") and was it broken afterwards?

It is possible to fix the recursive coroutine entry without losing sleep
cancellation.  Whether it's worth the trouble depends on the answers to
the above questions.

Stefan
On Mon, Nov 20, 2017 at 11:16:53AM +0000, Stefan Hajnoczi wrote:
> On Sun, Nov 19, 2017 at 09:46:42PM -0500, Jeff Cody wrote:
> > --- a/blockjob.c
> > +++ b/blockjob.c
> > @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
> >  {
> >      assert(job && !block_job_started(job) && job->paused &&
> >             job->driver && job->driver->start);
> > -    job->co = qemu_coroutine_create(block_job_co_entry, job);
> >      job->pause_count--;
> >      job->busy = true;
> >      job->paused = false;
> > +    job->co = qemu_coroutine_create(block_job_co_entry, job);
> >      bdrv_coroutine_enter(blk_bs(job->blk), job->co);
> >  }
> >
>
> This hunk makes no difference.  The coroutine is only entered by
> bdrv_coroutine_enter() so the order of job field initialization doesn't
> matter.
>

It likely makes no difference with the current code (unless there is a
latent bug). However I made the change to protect against the following
scenario - which, perhaps to your point, would be a bug in any case:

1. job->co = qemu_coroutine_create()

    * Now block_job_started() returns true, as it just checks for job->co

2. Another thread calls block_job_enter(), before we call
   bdrv_coroutine_enter().

    * block_job_enter() checks job->busy and block_job_started() to
      determine if coroutine entry is allowed.  Without this change, these
      checks could pass and coroutine entry could occur.

    * I don't think this can happen in the current code, but the above hunk
      change is still correct, and would protect against such an
      occurrence.

I guess the question is, "is it worth doing?", to try and prevent that sort
of buggy behavior.  My thought was "yes" because:

    A) there is no penalty in doing it this way

    B) while a bug, double entry like this can lead to memory and/or
       data corruption, and the checks for co->caller et al. might not
       catch it.  This is particularly true if the coroutine exits
       (COROUTINE_TERMINATE) before the re-entry.

But maybe if we are concerned about that we should figure out a way to
abort() instead.
Of course, that makes allowing recursive coroutines more
difficult in the future.

> > @@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
> >          return;
> >      }
> >
> > -    job->busy = false;
> > +    /* We need to leave job->busy set here, because when we have
> > +     * put a coroutine to 'sleep', we have scheduled it to run in
> > +     * the future.  We cannot enter that same coroutine again before
> > +     * it wakes and runs, otherwise we risk double-entry or entry after
> > +     * completion. */
> >      if (!block_job_should_pause(job)) {
> >          co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
> >      }
> > -    job->busy = true;
> >
> >      block_job_pause_point(job);
>
> This leaves a stale doc comment in include/block/blockjob_int.h:
>
> /**
>  * block_job_sleep_ns:
>  * @job: The job that calls the function.
>  * @clock: The clock to sleep on.
>  * @ns: How many nanoseconds to stop for.
>  *
>  * Put the job to sleep (assuming that it wasn't canceled) for @ns
>  * nanoseconds.  Canceling the job will interrupt the wait immediately.
>                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  */

I didn't catch the doc, that should be changed as well.

> void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
>
> This raises questions about the ability to cancel sleep:
>
> 1. Does something depend on cancelling sleep?
>

Not that I can tell.  The advantage is that you don't have to wait for the
timer, so something like qmp_block_job_cancel() will cancel sooner.

But it is obviously broken with the current coroutine implementation to try
to do that.

> 2. Did cancellation work properly in commit
>    4513eafe928ff47486f4167c28d364c72b5ff7e3 ("block: add
>    block_job_sleep_ns") and was it broken afterwards?
>

With iothreads, the answer is complicated.  It was broken for a while for
other reasons.

It broke after using aio_co_wake() in the sleep timer cb (commit
2f47da5f7f), which added the ability to schedule a coroutine if the timer
callback was called from the wrong AioContext.

Prior to that it "worked" in that the segfault was not present.

But even to bisect back to 2f47da5f7f was not straightforward, because
attempting the stream/cancel with iothreads would not even work until
c324fd0 (so I only bisected back as far as c324fd0 would cleanly apply).

And it is tricky to say if it "works" or not, because it is racy.  What may
have appeared to work may be more attributed to luck and timing.

If the coroutine is going to run at a future time, we cannot enter it
beforehand.  We risk the coroutine not even existing when the timer does run
the sleeping coroutine.  At the very least, early entry with the current
code would require a way to delete the associated timer.

> It is possible to fix the recursive coroutine entry without losing sleep
> cancellation.  Whether it's worth the trouble depends on the answers to
> the above questions.
>

I contemplated the same thing.

At least for 2.11, fixing recursive coroutine entry is probably more than we
want to do.

Long term, my opinion is that we should fix it, because preventing it
becomes more difficult.  It is easy to miss something that might cause a
recursive entry in code reviews, and since it can be racy, casual testing
may often miss it as well.

Jeff
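
[Editorial sketch, not part of the thread.] Jeff's remark that early entry "would require a way to delete the associated timer" can be pictured with a toy model. Everything below (ToyTimer, ToySleeper, toy_wake_early and so on) is hypothetical and is not QEMU's timer or coroutine API; it only assumes that a sleep arms a one-shot wakeup and that resuming a finished sleeper is the hazard being discussed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-shot timer; NOT QEMU's QEMUTimer. */
typedef struct ToyTimer {
    bool armed;
} ToyTimer;

/* Hypothetical sleeping coroutine; NOT QEMU's Coroutine. */
typedef struct ToySleeper {
    ToyTimer timer;
    bool finished;   /* the coroutine has already terminated */
    int wakeups;     /* how many times the sleeper was resumed */
} ToySleeper;

/* Going to sleep arms the wakeup timer. */
static void toy_sleep(ToySleeper *s)
{
    assert(s);
    s->timer.armed = true;
}

/* Early wake. With delete_timer set, the pending wakeup is removed first,
 * so nothing is left to fire later. */
static void toy_wake_early(ToySleeper *s, bool delete_timer)
{
    assert(s);
    if (delete_timer) {
        s->timer.armed = false;
    }
    s->wakeups++;
}

/* Timer callback. Returns false when it would have resumed a sleeper
 * that no longer exists (the hazard; a crash in real code). */
static bool toy_timer_fire(ToySleeper *s)
{
    assert(s);
    if (!s->timer.armed) {
        return true;              /* nothing scheduled, nothing to do */
    }
    s->timer.armed = false;
    if (s->finished) {
        return false;             /* stale wakeup of a dead sleeper */
    }
    s->wakeups++;
    return true;
}
```

In this model, waking early without deleting the timer leaves a stale wakeup behind once the sleeper finishes, whereas deleting it first makes the later timer callback a no-op.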
On 20/11/2017 12:16, Stefan Hajnoczi wrote:
> This raises questions about the ability to cancel sleep:
>
> 1. Does something depend on cancelling sleep?

block_job_cancel does, but in practice the sleep time is so small
(smaller than SLICE_TIME, which is 100 ms) that we probably don't care.

I agree with Jeff that canceling the sleep by force-entering the
coroutine seemed clever but is probably a very bad idea.

Paolo
On Mon, Nov 20, 2017 at 08:36:19AM -0500, Jeff Cody wrote:
> On Mon, Nov 20, 2017 at 11:16:53AM +0000, Stefan Hajnoczi wrote:
> > On Sun, Nov 19, 2017 at 09:46:42PM -0500, Jeff Cody wrote:
> > > --- a/blockjob.c
> > > +++ b/blockjob.c
> > > @@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
> > >  {
> > >      assert(job && !block_job_started(job) && job->paused &&
> > >             job->driver && job->driver->start);
> > > -    job->co = qemu_coroutine_create(block_job_co_entry, job);
> > >      job->pause_count--;
> > >      job->busy = true;
> > >      job->paused = false;
> > > +    job->co = qemu_coroutine_create(block_job_co_entry, job);
> > >      bdrv_coroutine_enter(blk_bs(job->blk), job->co);
> > >  }
> > >
> >
> > This hunk makes no difference.  The coroutine is only entered by
> > bdrv_coroutine_enter() so the order of job field initialization doesn't
> > matter.
> >
>
> It likely makes no difference with the current code (unless there is a
> latent bug). However I made the change to protect against the following
> scenario - which, perhaps to your point, would be a bug in any case:
>
> 1. job->co = qemu_coroutine_create()
>
>     * Now block_job_started() returns true, as it just checks for job->co
>
> 2. Another thread calls block_job_enter(), before we call
>    bdrv_coroutine_enter().

The job is protected by AioContext acquire/release.  Other threads
cannot touch it because the block_job_start() caller has already
acquired the AioContext.

>
>     * block_job_enter() checks job->busy and block_job_started() to
>       determine if coroutine entry is allowed.  Without this change, these
>       checks could pass and coroutine entry could occur.
>
>     * I don't think this can happen in the current code, but the above hunk
>       change is still correct, and would protect against such an
>       occurrence.
>
> I guess the question is, "is it worth doing?", to try and prevent that sort
> of buggy behavior.
> My thought was "yes" because:
>
>     A) there is no penalty in doing it this way
>
>     B) while a bug, double entry like this can lead to memory and/or
>        data corruption, and the checks for co->caller et al. might not
>        catch it.  This is particularly true if the coroutine exits
>        (COROUTINE_TERMINATE) before the re-entry.
>
> But maybe if we are concerned about that we should figure out a way to
> abort() instead.  Of course, that makes allowing recursive coroutines more
> difficult in the future.

The compiler and CPU can reorder memory accesses so simply reordering
assignment statements is ineffective against threads.

I'm against merging this hunk because:

1. There is a proper thread-safety mechanism in place that callers are
   already using, so this is the wrong way to attempt to provide
   thread-safety.

2. This change doesn't protect against the multi-threaded scenario you
   described because the memory order isn't being controlled.

> >
> > > @@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
> > >          return;
> > >      }
> > >
> > > -    job->busy = false;
> > > +    /* We need to leave job->busy set here, because when we have
> > > +     * put a coroutine to 'sleep', we have scheduled it to run in
> > > +     * the future.  We cannot enter that same coroutine again before
> > > +     * it wakes and runs, otherwise we risk double-entry or entry after
> > > +     * completion. */
> > >      if (!block_job_should_pause(job)) {
> > >          co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
> > >      }
> > > -    job->busy = true;
> > >
> > >      block_job_pause_point(job);
> >
> > This leaves a stale doc comment in include/block/blockjob_int.h:
> >
> > /**
> >  * block_job_sleep_ns:
> >  * @job: The job that calls the function.
> >  * @clock: The clock to sleep on.
> >  * @ns: How many nanoseconds to stop for.
> >  *
> >  * Put the job to sleep (assuming that it wasn't canceled) for @ns
> >  * nanoseconds.  Canceling the job will interrupt the wait immediately.
> >                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >  */
>
> I didn't catch the doc, that should be changed as well.
>
> > void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
> >
> > This raises questions about the ability to cancel sleep:
> >
> > 1. Does something depend on cancelling sleep?
> >
>
> Not that I can tell.  The advantage is that you don't have to wait for the
> timer, so something like qmp_block_job_cancel() will cancel sooner.
>
> But it is obviously broken with the current coroutine implementation to try
> to do that.
>
> > 2. Did cancellation work properly in commit
> >    4513eafe928ff47486f4167c28d364c72b5ff7e3 ("block: add
> >    block_job_sleep_ns") and was it broken afterwards?
> >
>
> With iothreads, the answer is complicated.  It was broken for a while for
> other reasons.
>
> It broke after using aio_co_wake() in the sleep timer cb (commit
> 2f47da5f7f), which added the ability to schedule a coroutine if the timer
> callback was called from the wrong AioContext.
>
> Prior to that it "worked" in that the segfault was not present.
>
> But even to bisect back to 2f47da5f7f was not straightforward, because
> attempting the stream/cancel with iothreads would not even work until
> c324fd0 (so I only bisected back as far as c324fd0 would cleanly apply).
>
> And it is tricky to say if it "works" or not, because it is racy.  What may
> have appeared to work may be more attributed to luck and timing.
>
> If the coroutine is going to run at a future time, we cannot enter it
> beforehand.  We risk the coroutine not even existing when the timer does run
> the sleeping coroutine.  At the very least, early entry with the current
> code would require a way to delete the associated timer.
>
> > It is possible to fix the recursive coroutine entry without losing sleep
> > cancellation.  Whether it's worth the trouble depends on the answers to
> > the above questions.
> >
>
> I contemplated the same thing.
>
> At least for 2.11, fixing recursive coroutine entry is probably more than we
> want to do.
>
> Long term, my opinion is that we should fix it, because preventing it
> becomes more difficult.  It is easy to miss something that might cause a
> recursive entry in code reviews, and since it can be racy, casual testing
> may often miss it as well.

I think both your and Paolo's answers show that we don't need to cancel
the timer.  It's okay if the coroutine sleeps for the full duration.

I'm happy with your approach.

Stefan
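
[Editorial sketch, not part of the thread.] Stefan's objection that reordering plain assignments does not control what another thread observes can be illustrated with C11 atomics. This is a generic illustration under stated assumptions, not a proposal for blockjob.c: as the message above says, the job is already protected by AioContext acquire/release. ToyJob and the toy_job_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a job; NOT QEMU's BlockJob. */
typedef struct ToyJob {
    bool busy;             /* plain field, written before publication */
    bool paused;           /* plain field, written before publication */
    _Atomic(void *) co;    /* publication point: non-NULL means "started" */
} ToyJob;

static void toy_job_init(ToyJob *job)
{
    assert(job);
    job->busy = false;
    job->paused = true;
    atomic_init(&job->co, NULL);
}

/* Writer: set the flags, then publish 'co' with release semantics.  The
 * release store is what keeps the flag writes from being reordered after
 * the pointer store; source-code statement order alone does not. */
static void toy_job_publish(ToyJob *job, void *co)
{
    assert(job && co);
    job->busy = true;
    job->paused = false;
    atomic_store_explicit(&job->co, co, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above, so a
 * reader that sees a non-NULL 'co' also sees the flag values written
 * before it was published. */
static bool toy_job_may_enter(ToyJob *job)
{
    assert(job);
    void *co = atomic_load_explicit(&job->co, memory_order_acquire);
    if (co == NULL) {
        return false;      /* not started yet */
    }
    return !job->busy;     /* started: honour the busy flag */
}
```

With plain stores and loads in place of the release/acquire pair, a second thread could in principle observe a non-NULL co together with a stale busy value, whichever order the assignments appear in the source.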
On 20.11.2017 at 23:25, Paolo Bonzini wrote:
> On 20/11/2017 12:16, Stefan Hajnoczi wrote:
> > This raises questions about the ability to cancel sleep:
> >
> > 1. Does something depend on cancelling sleep?
>
> block_job_cancel does, but in practice the sleep time is so small
> (smaller than SLICE_TIME, which is 100 ms) that we probably don't care.

Just note that this is something that can happen during the final
migration phase when the VM is already stopped. In other words, with
non-shared storage, this sleep of up to 100 ms is added to the migration
downtime.

Kevin

> I agree with Jeff that canceling the sleep by force-entering the
> coroutine seemed clever but is probably a very bad idea.
>
> Paolo
diff --git a/blockjob.c b/blockjob.c
index 3a0c491..e181295 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -291,10 +291,10 @@ void block_job_start(BlockJob *job)
 {
     assert(job && !block_job_started(job) && job->paused &&
            job->driver && job->driver->start);
-    job->co = qemu_coroutine_create(block_job_co_entry, job);
     job->pause_count--;
     job->busy = true;
     job->paused = false;
+    job->co = qemu_coroutine_create(block_job_co_entry, job);
     bdrv_coroutine_enter(blk_bs(job->blk), job->co);
 }
 
@@ -797,11 +797,14 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
         return;
     }
 
-    job->busy = false;
+    /* We need to leave job->busy set here, because when we have
+     * put a coroutine to 'sleep', we have scheduled it to run in
+     * the future.  We cannot enter that same coroutine again before
+     * it wakes and runs, otherwise we risk double-entry or entry after
+     * completion. */
     if (!block_job_should_pause(job)) {
         co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
     }
-    job->busy = true;
 
     block_job_pause_point(job);
 }
When block_job_sleep_ns() is called, the coroutine is scheduled for
future execution.  If we allow the job to be re-entered prior to the
scheduled time, we introduce a race condition in which a coroutine can be
entered recursively, or even entered after the coroutine is deleted.

The job->busy flag is used by blockjobs when a coroutine is busy
executing.  The function 'block_job_enter()' obeys the busy flag,
and will not enter a coroutine if it is set.  If we sleep a job, we need
to leave the busy flag set, so that subsequent calls to block_job_enter()
do not enter the coroutine.

This fixes
https://bugzilla.redhat.com/show_bug.cgi?id=1508708

Also, in block_job_start(), set the relevant job flags (.busy, .paused)
before creating the coroutine, not just before executing it.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 blockjob.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
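
[Editorial sketch, not part of the patch.] The effect of leaving the busy flag set across a sleep can be modelled in a few lines of C. The model assumes only what the commit message and thread state: entry is gated on "started and not busy", and a sleep schedules a later wakeup. ToyJob and the toy_job_* functions are hypothetical, and for brevity the toy entry function sets busy itself, which is a simplification rather than a description of blockjob.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a job; NOT QEMU's BlockJob. */
typedef struct ToyJob {
    bool started;        /* stands in for job->co != NULL */
    bool busy;           /* running, or asleep with a wakeup scheduled */
    bool timer_pending;  /* a wakeup has been scheduled */
    int entries;         /* how many times the coroutine was entered */
} ToyJob;

static void toy_job_start(ToyJob *job)
{
    assert(job && !job->started);
    job->busy = true;
    job->started = true;
    job->entries = 1;            /* first entry */
}

/* Gate modelled on the description of block_job_enter():
 * enter only if the job has started and is not busy. */
static bool toy_job_enter(ToyJob *job)
{
    assert(job);
    if (!job->started || job->busy) {
        return false;
    }
    job->busy = true;
    job->entries++;
    return true;
}

/* Sleep: schedule a wakeup.  keep_busy = false models the old code
 * (busy cleared while asleep), keep_busy = true models the patch. */
static void toy_job_sleep(ToyJob *job, bool keep_busy)
{
    assert(job);
    job->timer_pending = true;
    job->busy = keep_busy;
}

/* The scheduled wakeup enters the coroutine regardless of the gate. */
static void toy_job_timer_fire(ToyJob *job)
{
    assert(job);
    if (!job->timer_pending) {
        return;
    }
    job->timer_pending = false;
    job->busy = true;
    job->entries++;
}
```

With keep_busy false, an external toy_job_enter() during the sleep succeeds and the timer then enters the job a second time for the same sleep; with keep_busy true, the external entry is refused and only the timer wakes the job.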