Message ID | 1349764760-21093-2-git-send-email-koverstreet@google.com (mailing list archive) |
---|---|
State | Deferred, archived |
On Mon, Oct 08, 2012 at 11:39:17PM -0700, Kent Overstreet wrote:
> Minor refactoring, to get rid of some duplicated code
Honestly: I wouldn't bother. Nothing of consequence uses cancel.
I have an RFC patch series that tears it out. Let me polish that up and
send it out; I'll cc: you.
- z
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
On Tue, Oct 09, 2012 at 11:26:25AM -0700, Zach Brown wrote:
> On Mon, Oct 08, 2012 at 11:39:17PM -0700, Kent Overstreet wrote:
> > Minor refactoring, to get rid of some duplicated code
>
> Honestly: I wouldn't bother. Nothing of consequence uses cancel.
>
> I have an RFC patch series that tears it out. Let me polish that up
> send it out, I'll cc: you.

Even better :)

I've been looking at aio locking the past few days, and I was getting
ready to write something up about cancellation for the lists.

Short version: supporting cancellation without global synchronization is
possible, but it'd require help from the allocator. If we can just rip
it out, though, that'd really make me happy.
On Tue, Oct 09, 2012 at 02:37:00PM -0700, Kent Overstreet wrote:
> > Honestly: I wouldn't bother. Nothing of consequence uses cancel.
> >
> > I have an RFC patch series that tears it out. Let me polish that up
> > send it out, I'll cc: you.
>
> Even better :)
>
> I've been looking at aio locking the past few days, and I was getting
> ready to write something up about cancellation to the lists.

I can definitely think of use cases (inside of Google) where we could
really use aio_cancel. The issue is that it's hard to use it safely
(i.e., we need to know whether the file system has already extended the
file before we can know whether or not we can safely cancel the I/O).

> Short version, supporting cancellation without global sychronization is
> possible but it'd require help from the allocator.

Well, we would need some kind of flag to indicate whether cancellation
is possible, yes. But if we're doing AIO to a raw disk, or we're
talking about a read request, we wouldn't need any help from the file
system's block allocator.

And maybe the current way of doing things isn't the best way. But it
would be nice if we didn't completely give up on the functionality of
aio_cancel.

					- Ted
> And maybe the current way of doing things isn't the best way. But it
> would be nice if we didn't completely give up on the functionality of
> aio_cancel.

I sympathize, but the reality is that the current infrastructure is
very bad and no one is using it.

It's not like we're getting rid of the syscall. It'll be behaving
exactly as it does today: returning the error code that indicates that
cancellation failed because it lost the race with completion. Every
caller has to cope with that to use cancel safely.

So if someone eventually implements iocb cancellation safely, we'll be
able to plumb it back in under the aio syscalls. But until that day I
see no reason to carry around buggy infrastructure that is only
slowing down the fast path.

- z
On Wed, Oct 10, 2012 at 02:20:51PM -0700, Zach Brown wrote:
> I sympathize, but the reality is that the current infrastructure
> is very bad and no one is using it.
>
> It's not like we're getting rid of the syscall. I'll be behaving
> exactly as it does today: returning the error code that indicates that
> cancellation failed because it lost the race with completion. Every
> caller has to cope with that to use cancel safely.

Yes, I realize it doesn't work today. But hopefully we'll be able to
reimplement it in a cleaner way. That's why I said I hope we don't give
up on the functionality of aio_cancel --- not that I'm against removing
what we currently have.

Cheers,

					- Ted
On Wed, Oct 10, 2012 at 07:03:56AM -0400, Theodore Ts'o wrote:
> On Tue, Oct 09, 2012 at 02:37:00PM -0700, Kent Overstreet wrote:
> > > Honestly: I wouldn't bother. Nothing of consequence uses cancel.
> > >
> > > I have an RFC patch series that tears it out. Let me polish that up
> > > send it out, I'll cc: you.
> >
> > Even better :)
> >
> > I've been looking at aio locking the past few days, and I was getting
> > ready to write something up about cancellation to the lists.
>
> I can definitely think of use cases (inside of Google) where we could
> really use aio_cancel. The issue is it's hard use it safely (i.e., we
> need to know whether the file system has already extended the file
> before we can know whether or not we can safely cancel the I/O).
>
> > Short version, supporting cancellation without global sychronization is
> > possible but it'd require help from the allocator.
>
> Well, we would need some kind of flag to indicate whether cancellation
> is possible, yes. But if we're doing AIO to a raw disk, or we're
> talking about a read request, we wouldn't need any help from the
> file system's block allocator.
>
> And maybe the current way of doing things isn't the best way. But it
> would be nice if we didn't completely give up on the functionality of
> aio_cancel.

I thought about this more, and I think ripping out cancellation is a
bad idea in the long term too.

With what aio is capable of now (O_DIRECT), it's certainly arguable
that cancellation doesn't give you much and isn't worth the hassle of
implementing. But if we ever fix aio (and I certainly hope we do) and
implement something capable of doing arbitrary IO asynchronously, then
we really are going to need cancellation, because of IO to
sockets/pipes that can be blocked for an unbounded amount of time.
Even if it turns out programs don't ever need cancellation in practice,
the real kicker is checkpoint/restore: for checkpoint/restore we've got
to enumerate all the outstanding operations, or wait for them to
complete (for the ones that do complete in bounded time), and that's
the hard part of cancellation right there.

Zach might object here and point out that if we implement asynchronous
operations with kernel threads, we can implement cancellation by just
killing the thread. But this is only relevant if we _only_ implement
asynchronous operations with threads in aio v2, and I personally think
that's a lousy idea.

The reason is that there are a lot of things that need to be fixed with
aio in aio v2, so aio v2 really needs to be a complete replacement for
existing aio users. It won't be if we start using kernel threads where
we weren't before - kernel threads are cheap, but they aren't free;
we'll regress on performance, and that means the new interfaces
probably won't get used at all by the existing users.

This isn't a big deal - we can implement aio v2 as a generic async
syscall mechanism, where the default implementation for any given
syscall is just the thread pool stuff, but use optimized async
implementations for the cases where they're available - this'll let us
make use of much of the existing infrastructure. But it does mean we
need a cancellation mechanism that isn't tied to threads - i.e. one
that does basically what the existing aio cancellation does. So, IMO we
should bite the bullet and do cancellation right now.

Also, I don't think there's anything really terrible about the existing
cancellation mechanism (unlike retries - that code is... wtf), it's
just problematic for locking/performance. But that's fixable. It is
definitely possible to implement cancellation such that it doesn't cost
anything when it's not being used, but we do need some help from the
allocator.
Say we had a way of iterating over all the slabs in the kiocb cache: if
we pass a constructor to kmem_cache_create(), we can guarantee that all
the refcounts are initialized to 0. Combine that with
SLAB_DESTROY_BY_RCU, and we can safely do a try_get_kiocb() when we're
iterating over all the kiocbs looking for the one we want.

The missing piece is a way to iterate over all those slabs; I haven't
been able to find any exposed interface for that. If we could implement
such a mechanism, that would be the way to go (and I don't think it'd
be useful just for aio - this sort of thing comes up elsewhere,
anything that has to time out operations in particular).

The other option would be to use a simpler dedicated allocator. I have
a simple, fast percpu allocator I wrote a while back for some driver
code that allocates out of a fixed-size array (I needed to be able to
reference objects by a 16-bit id, i.e. their index in the array). I
could easily change that to allocate pages (i.e. slabs) lazily; then
we'd have one of these allocator structs per kioctx, so they'd get
freed when the kioctx goes away, and we'd have less to look through
when we want to cancel something.

This'd probably be the shortest path, but - while it's no slower than
the existing slab allocators - it doesn't do NUMA allocation. I don't
know how much that matters in practice with objects this small. Besides
that, adding another general purpose allocator would IMO be a little
silly.

Either way, IMO the right way to do it is with help from the allocator.
diff --git a/fs/aio.c b/fs/aio.c
index 1ad2d97..95419c4 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -229,6 +229,29 @@ static inline void put_ioctx(struct kioctx *kioctx)
 		__put_ioctx(kioctx);
 }
 
+static int kiocb_cancel(struct kioctx *ctx, struct kiocb *kiocb,
+			struct io_event *res)
+{
+	int (*cancel)(struct kiocb *, struct io_event *);
+	int ret = -EINVAL;
+
+	cancel = kiocb->ki_cancel;
+	kiocbSetCancelled(kiocb);
+	if (cancel) {
+		kiocb->ki_users++;
+		spin_unlock_irq(&ctx->ctx_lock);
+
+		memset(res, 0, sizeof(*res));
+		res->obj = (u64) kiocb->ki_obj.user;
+		res->data = kiocb->ki_user_data;
+		ret = cancel(kiocb, res);
+
+		spin_lock_irq(&ctx->ctx_lock);
+	}
+
+	return ret;
+}
+
 /* ioctx_alloc
  *	Allocates and initializes an ioctx.  Returns an ERR_PTR if it failed.
  */
@@ -304,7 +327,6 @@ out_freectx:
  */
 static void kill_ctx(struct kioctx *ctx)
 {
-	int (*cancel)(struct kiocb *, struct io_event *);
 	struct task_struct *tsk = current;
 	DECLARE_WAITQUEUE(wait, tsk);
 	struct io_event res;
@@ -315,14 +337,8 @@ static void kill_ctx(struct kioctx *ctx)
 		struct list_head *pos = ctx->active_reqs.next;
 		struct kiocb *iocb = list_kiocb(pos);
 		list_del_init(&iocb->ki_list);
-		cancel = iocb->ki_cancel;
-		kiocbSetCancelled(iocb);
-		if (cancel) {
-			iocb->ki_users++;
-			spin_unlock_irq(&ctx->ctx_lock);
-			cancel(iocb, &res);
-			spin_lock_irq(&ctx->ctx_lock);
-		}
+
+		kiocb_cancel(ctx, iocb, &res);
 	}
 
 	if (!ctx->reqs_active)
@@ -1709,7 +1725,7 @@ static struct kiocb *lookup_kiocb(struct kioctx *ctx, struct iocb __user *iocb,
 SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 		struct io_event __user *, result)
 {
-	int (*cancel)(struct kiocb *iocb, struct io_event *res);
+	struct io_event res;
 	struct kioctx *ctx;
 	struct kiocb *kiocb;
 	u32 key;
@@ -1724,32 +1740,22 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 		return -EINVAL;
 
 	spin_lock_irq(&ctx->ctx_lock);
-	ret = -EAGAIN;
+
 	kiocb = lookup_kiocb(ctx, iocb, key);
-	if (kiocb && kiocb->ki_cancel) {
-		cancel = kiocb->ki_cancel;
-		kiocb->ki_users ++;
-		kiocbSetCancelled(kiocb);
-	} else
-		cancel = NULL;
+	if (kiocb)
+		ret = kiocb_cancel(ctx, kiocb, &res);
+	else
+		ret = -EAGAIN;
+
 	spin_unlock_irq(&ctx->ctx_lock);
 
-	if (NULL != cancel) {
-		struct io_event tmp;
-		pr_debug("calling cancel\n");
-		memset(&tmp, 0, sizeof(tmp));
-		tmp.obj = (u64)(unsigned long)kiocb->ki_obj.user;
-		tmp.data = kiocb->ki_user_data;
-		ret = cancel(kiocb, &tmp);
-		if (!ret) {
-			/* Cancellation succeeded -- copy the result
-			 * into the user's buffer.
-			 */
-			if (copy_to_user(result, &tmp, sizeof(tmp)))
-				ret = -EFAULT;
-		}
-	} else
-		ret = -EINVAL;
+	if (!ret) {
+		/* Cancellation succeeded -- copy the result
+		 * into the user's buffer.
+		 */
+		if (copy_to_user(result, &res, sizeof(res)))
+			ret = -EFAULT;
+	}
 
 	put_ioctx(ctx);
Minor refactoring, to get rid of some duplicated code

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 fs/aio.c | 72 ++++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 39 insertions(+), 33 deletions(-)