Message ID | 5555A33B.20006@de.ibm.com |
---|---|
State | New, archived |
Christian Borntraeger <borntraeger@de.ibm.com> writes:

> I see a significant latency (can be minutes with 2000 disks and HZ=100)
> when exiting a QEMU process that has lots of disk devices via aio. The
> process sits idle as a zombie in exit_aio, waiting for the completion.
>
> It turns out that
> commit 6098b45b32 ("aio: block exit_aio() until all context requests are
> completed") caused the delay.
>
> The patch description was:
>
> It seems that exit_aio() also needs to wait for all iocbs to complete (like
> io_destroy), but we missed the wait step in current implementation, so fix
> it in the same way as we did in io_destroy.
>
> Now, according to the interface description in its man page, io_destroy
> is required to block until everything is cleaned up:
>
> DESCRIPTION
>     The io_destroy() system call will attempt to cancel all outstanding
>     asynchronous I/O operations against ctx_id, will block on the completion
>     of all operations that could not be canceled, and will destroy the ctx_id.
>
> Does process exit require the same full blocking? We might be able to
> clean up the process and let the aio data structures be freed lazily.
> Opinions or better ideas?

This has already been fixed:

commit dc48e56d761610da4ea1088d1bea0a030b8e3e43
Author: Jens Axboe <axboe@fb.com>
Date:   Wed Apr 15 11:17:23 2015 -0600

    aio: fix serial draining in exit_aio()

Cheers,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 15.05.2015 at 15:42, Jeff Moyer wrote:
> This has already been fixed:
>
> commit dc48e56d761610da4ea1088d1bea0a030b8e3e43
> Author: Jens Axboe <axboe@fb.com>
> Date:   Wed Apr 15 11:17:23 2015 -0600
>
>     aio: fix serial draining in exit_aio()

Cool, thanks. As the original patch was Cc'd to stable, shouldn't the fix also be backported?

Christian
On 05/15/2015 09:26 AM, Christian Borntraeger wrote:
> Cool, thanks. As the original patch was Cc'd to stable, shouldn't the fix also be backported?

I'll email stable.
diff --git a/fs/aio.c b/fs/aio.c
index a793f70..1e6bcdb 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -820,8 +820,6 @@ void exit_aio(struct mm_struct *mm)
 
 	for (i = 0; i < table->nr; ++i) {
 		struct kioctx *ctx = table->table[i];
-		struct completion requests_done =
-			COMPLETION_INITIALIZER_ONSTACK(requests_done);
 
 		if (!ctx)
 			continue;
@@ -833,10 +831,7 @@ void exit_aio(struct mm_struct *mm)
 		 * that it needs to unmap the area, just set it to 0.
 		 */
 		ctx->mmap_size = 0;
-		kill_ioctx(mm, ctx, &requests_done);
-
-		/* Wait until all IO for the context are done. */
-		wait_for_completion(&requests_done);
+		kill_ioctx(mm, ctx, NULL);
 	}
 
 	RCU_INIT_POINTER(mm->ioctx_table, NULL);
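The removed lines in the diff above create a fresh completion inside the per-context loop and wait on it before moving to the next context, so an exiting process pays each context's drain time one after another. The commit Jeff cites is titled "aio: fix serial draining in exit_aio()"; the sketch below does not reproduce that commit. It is a userspace model, assuming POSIX threads, of one way to avoid serial draining: start tearing down every context first, then wait once on a shared counter, so the total wait is bounded by the slowest context rather than the sum of all of them. All names here (`struct rq_wait`, `fake_ctx_drain`, `exit_all_contexts`, `MAX_FAKE_CTX`) are hypothetical and are not kernel APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define MAX_FAKE_CTX 64 /* hypothetical cap for this demo only */

/* Hypothetical shared wait object: one counter for all contexts plus a
 * single wakeup, instead of one completion per context waited on in turn. */
struct rq_wait {
	pthread_mutex_t lock;
	pthread_cond_t  done;
	int             count; /* contexts whose requests are still in flight */
};

static void rq_wait_init(struct rq_wait *w, int nr_ctx)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->done, NULL);
	w->count = nr_ctx;
}

/* Called once per context when its last request has completed. */
static void rq_wait_ctx_done(struct rq_wait *w)
{
	pthread_mutex_lock(&w->lock);
	if (--w->count == 0)
		pthread_cond_broadcast(&w->done);
	pthread_mutex_unlock(&w->lock);
}

/* Called once by the exiting task after every context has been killed. */
static void rq_wait_all(struct rq_wait *w)
{
	pthread_mutex_lock(&w->lock);
	while (w->count > 0)
		pthread_cond_wait(&w->done, &w->lock);
	pthread_mutex_unlock(&w->lock);
}

/* Stand-in for one context draining its in-flight I/O (about 10 ms). */
static void *fake_ctx_drain(void *arg)
{
	struct timespec ts = { 0, 10 * 1000 * 1000 };

	nanosleep(&ts, NULL);
	rq_wait_ctx_done(arg);
	return NULL;
}

/* Kill all simulated contexts, then wait a single time.
 * Returns the number of contexts still pending afterwards (expected 0). */
static int exit_all_contexts(int nr_ctx)
{
	pthread_t tids[MAX_FAKE_CTX];
	int started[MAX_FAKE_CTX];
	struct rq_wait w;
	int i, left;

	assert(nr_ctx >= 0 && nr_ctx <= MAX_FAKE_CTX);
	rq_wait_init(&w, nr_ctx);
	for (i = 0; i < nr_ctx; i++) { /* start draining every context first ... */
		started[i] = pthread_create(&tids[i], NULL, fake_ctx_drain, &w) == 0;
		if (!started[i])
			fake_ctx_drain(&w); /* fall back to draining inline */
	}
	rq_wait_all(&w); /* ... then wait once for all of them */
	for (i = 0; i < nr_ctx; i++)
		if (started[i])
			pthread_join(tids[i], NULL);
	left = w.count; /* safe: all drain threads have been joined */
	pthread_cond_destroy(&w.done);
	pthread_mutex_destroy(&w.lock);
	return left;
}
```

With eight simulated contexts that each take about 10 ms, `exit_all_contexts(8)` should return after roughly one drain period, whereas a loop that waits on each context before killing the next would take roughly eight. The kernel patch above takes a different route again: it passes `NULL` to `kill_ioctx()` and does not wait at all.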