
[2/5] aio: kiocb_cancel()

Message ID: 1349764760-21093-2-git-send-email-koverstreet@google.com (mailing list archive)
State: Deferred, archived

Commit Message

Kent Overstreet Oct. 9, 2012, 6:39 a.m. UTC
Minor refactoring, to get rid of some duplicated code

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
 fs/aio.c |   72 ++++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 39 insertions(+), 33 deletions(-)

Comments

Zach Brown Oct. 9, 2012, 6:26 p.m. UTC | #1
On Mon, Oct 08, 2012 at 11:39:17PM -0700, Kent Overstreet wrote:
> Minor refactoring, to get rid of some duplicated code

Honestly: I wouldn't bother.  Nothing of consequence uses cancel.

I have an RFC patch series that tears it out.  Let me polish that up
and send it out; I'll cc: you.

- z

Kent Overstreet Oct. 9, 2012, 9:37 p.m. UTC | #2
On Tue, Oct 09, 2012 at 11:26:25AM -0700, Zach Brown wrote:
> On Mon, Oct 08, 2012 at 11:39:17PM -0700, Kent Overstreet wrote:
> > Minor refactoring, to get rid of some duplicated code
> 
> Honestly: I wouldn't bother.  Nothing of consequence uses cancel.
> 
> I have an RFC patch series that tears it out.  Let me polish that up
> and send it out; I'll cc: you.

Even better :)

I've been looking at aio locking the past few days, and I was getting
ready to write something up about cancellation to the lists.

Short version: supporting cancellation without global synchronization is
possible, but it'd require help from the allocator.

If we can just rip it out though that'd really make me happy.

Theodore Ts'o Oct. 10, 2012, 11:03 a.m. UTC | #3
On Tue, Oct 09, 2012 at 02:37:00PM -0700, Kent Overstreet wrote:
> > Honestly: I wouldn't bother.  Nothing of consequence uses cancel.
> > 
> > I have an RFC patch series that tears it out.  Let me polish that up
> > and send it out; I'll cc: you.
> 
> Even better :)
> 
> I've been looking at aio locking the past few days, and I was getting
> ready to write something up about cancellation to the lists.

I can definitely think of use cases (inside of Google) where we could
really use aio_cancel.  The issue is it's hard to use it safely (i.e., we
need to know whether the file system has already extended the file
before we can know whether or not we can safely cancel the I/O).

> Short version: supporting cancellation without global synchronization is
> possible, but it'd require help from the allocator.

Well, we would need some kind of flag to indicate whether cancellation
is possible, yes.  But if we're doing AIO to a raw disk, or we're
talking about a read request, we wouldn't need any help from the
file system's block allocator.

And maybe the current way of doing things isn't the best way.  But it
would be nice if we didn't completely give up on the functionality of
aio_cancel.

						- Ted

Zach Brown Oct. 10, 2012, 9:20 p.m. UTC | #4
> And maybe the current way of doing things isn't the best way.  But it
> would be nice if we didn't completely give up on the functionality of
> aio_cancel.

I sympathize, but the reality is that the current infrastructure
is very bad and no one is using it.

It's not like we're getting rid of the syscall.  It'll be behaving
exactly as it does today: returning the error code that indicates that
cancellation failed because it lost the race with completion.  Every
caller has to cope with that to use cancel safely.
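
In userspace terms, every caller already needs to do something like this
(a sketch using libaio; simplified, no real error handling):

	struct io_event ev;
	int ret = io_cancel(ctx, &iocb, &ev);

	if (ret == 0) {
		/* cancelled: ev holds the cancellation result */
	} else {
		/*
		 * Cancel lost the race with completion (-EAGAIN/-EINVAL
		 * today), so the result shows up through io_getevents()
		 * like any other completion.
		 */
		ret = io_getevents(ctx, 1, 1, &ev, NULL);
	}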

So if someone eventually implements iocb cancel safely we'll be able to
plumb it back under the aio syscalls.  But until that day I see no
reason to carry around buggy infrastructure that is only slowing down
the fast path.

- z

Theodore Ts'o Oct. 10, 2012, 11:21 p.m. UTC | #5
On Wed, Oct 10, 2012 at 02:20:51PM -0700, Zach Brown wrote:
> 
> I sympathize, but the reality is that the current infrastructure
> is very bad and no one is using it.
> 
> It's not like we're getting rid of the syscall.  It'll be behaving
> exactly as it does today: returning the error code that indicates that
> cancellation failed because it lost the race with completion.  Every
> caller has to cope with that to use cancel safely.

Yes, I realize it doesn't work today.  But hopefully we'll be able to
reimplement it in a cleaner way.  That's why I said I hope we don't
give up on the functionality of aio_cancel --- not that I'm against
removing what we currently have.

Cheers,

					- Ted

Kent Overstreet Oct. 11, 2012, 2:41 a.m. UTC | #6
On Wed, Oct 10, 2012 at 07:03:56AM -0400, Theodore Ts'o wrote:
> On Tue, Oct 09, 2012 at 02:37:00PM -0700, Kent Overstreet wrote:
> > > Honestly: I wouldn't bother.  Nothing of consequence uses cancel.
> > > 
> > > I have an RFC patch series that tears it out.  Let me polish that up
> > > and send it out; I'll cc: you.
> > 
> > Even better :)
> > 
> > I've been looking at aio locking the past few days, and I was getting
> > ready to write something up about cancellation to the lists.
> 
> I can definitely think of use cases (inside of Google) where we could
> really use aio_cancel.  The issue is it's hard to use it safely (i.e., we
> need to know whether the file system has already extended the file
> before we can know whether or not we can safely cancel the I/O).
> 
> > Short version: supporting cancellation without global synchronization is
> > possible, but it'd require help from the allocator.
> 
> Well, we would need some kind of flag to indicate whether cancellation
> is possible, yes.  But if we're doing AIO to a raw disk, or we're
> talking about a read request, we wouldn't need any help from the
> file system's block allocator.
> 
> And maybe the current way of doing things isn't the best way.  But it
> would be nice if we didn't completely give up on the functionality of
> aio_cancel.

I thought about this more, and I think ripping out cancellation is a bad
idea in the long term too.

With what aio is capable of now (O_DIRECT), it's certainly arguable that
cancellation doesn't give you much and isn't worth the hassle of
implementing.

But, if we ever fix aio (and I certainly hope we do) and implement
something capable of doing arbitrary IO asynchronously, then we really
are going to need cancellation, because of IO to sockets/pipes that can
be blocked for an unbounded amount of time.

Even if it turns out programs don't ever need cancellation in practice,
the real kicker is checkpoint/restore - for checkpoint/restore we've got
to enumerate all the outstanding operations or wait for them to complete
(for the ones that do complete in bounded time), and that's the hard
part of cancellation right there.

Zach might object here and point out that if we implement asynchronous
operations with kernel threads, we can implement cancelling by just
killing the thread. But this is only relevant if we _only_ implement
asynchronous operations with threads in aio v2, and I personally think
that's a lousy idea.

The reason is there are a lot of things that need to be fixed with aio in
aio v2, so aio v2 really needs to be a complete replacement for existing
aio users. It won't be if we start using kernel threads where we weren't
before - kernel threads are cheap, but they aren't free; we'll regress on
performance, and that means the new interfaces probably won't get used at
all by the existing users.

This isn't a big deal - we can implement aio v2 as a generic async
syscall mechanism, where the default implementation for any given
syscall is just the thread pool stuff, with optimized async
implementations used where they're available - this'll let us make use
of much of the existing infrastructure.
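
Roughly, the dispatch could look like this (a sketch; every name here is
hypothetical):

	/* Use a native async implementation when the syscall has one,
	 * fall back to the generic thread pool otherwise. */
	struct async_op {
		int (*submit)(struct async_call *call);	/* NULL = thread pool */
	};

	static struct async_op async_ops[NR_syscalls];

	static int async_submit(struct async_call *call)
	{
		struct async_op *op = &async_ops[call->nr];

		if (op->submit)
			return op->submit(call);	/* optimized path */

		return async_thread_pool_submit(call);	/* fallback */
	}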

But it does mean we need a cancellation mechanism that isn't tied to
threads - i.e. does basically what the existing aio cancellation does.

So, IMO we should bite the bullet and do cancellation right now.

Also, I don't think there's anything really terrible about the existing
cancellation mechanism (unlike retries - that code is... wtf); it's just
problematic for locking/performance. But that's fixable.

It is definitely possible to implement cancellation such that it doesn't
cost anything when it's not being used, but we do need some help from
the allocator.

Say we had a way of iterating over all the slabs in the kiocb cache:
if we pass a constructor to kmem_cache_create() we can guarantee that
all the refcounts are initialized to 0. Combine that with
SLAB_DESTROY_BY_RCU, and we can safely do a try_get_kiocb() when we're
iterating over all the kiocbs looking for the one we want.
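
Something like this, say (a sketch only - it assumes a hypothetical
atomic ki_refcnt that the slab constructor zeroes, that's set to 1 on
allocation and dropped back to 0 before kmem_cache_free()):

	static struct kiocb *try_get_kiocb(struct kiocb *kiocb)
	{
		/*
		 * With SLAB_DESTROY_BY_RCU the memory stays a valid
		 * kiocb while we hold rcu_read_lock(), even if it's
		 * freed and reallocated under us; a free kiocb always
		 * has ki_refcnt == 0, so this fails rather than
		 * resurrecting it.
		 */
		if (!atomic_inc_not_zero(&kiocb->ki_refcnt))
			return NULL;

		return kiocb;
	}

The caller still has to recheck, while under rcu_read_lock(), that the
kiocb it got a ref on is the one it was actually looking for, since it
could have been recycled for a different request in the meantime.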

The missing piece is a way to iterate over all those slabs; I haven't
been able to find any exposed interface for that.

If we could implement such a mechanism, that would be the way to go (and
I don't think it'd be useful for just aio; this sort of thing comes up
elsewhere - anything that has to time out operations, in particular).

The other option would be to use a simpler dedicated allocator - I have
a simple, fast percpu allocator I wrote a while back for some driver code
that allocates out of a fixed-size array (I needed to be able to
reference objects by a 16-bit id, i.e. their index in the array). I could
easily change that to allocate pages (i.e. slabs) lazily... then we'd
have one of these allocator structs per kioctx so they'd get freed when
the kioctx goes away and we'd have less to look through when we want to
cancel something.
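
The shape would be something like (names hypothetical):

	/* One per kioctx: kiocbs live in lazily allocated pages and are
	 * named by a 16-bit id (their index into the table), so cancel
	 * only has to walk this table rather than a global list. */
	struct kiocb_table {
		spinlock_t	lock;
		unsigned	nr;		/* capacity, at most 1 << 16 */
		unsigned long	*bitmap;	/* ids currently in use */
		struct page	**pages;	/* backing pages, allocated lazily */
	};

	static struct kiocb *kiocb_for_id(struct kiocb_table *t, u16 id)
	{
		unsigned per_page = PAGE_SIZE / sizeof(struct kiocb);
		struct kiocb *base = page_address(t->pages[id / per_page]);

		return base + id % per_page;
	}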

This'd probably be the shortest path, but - while it's no slower than
the existing slab allocators - it doesn't do NUMA allocation. I don't
know how much that matters in practice with objects this small.
Besides that, adding another general-purpose allocator would IMO be a
little silly.

Either way, IMO the right way to do it is with help from the allocator.


Patch

diff --git a/fs/aio.c b/fs/aio.c
index 1ad2d97..95419c4 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -229,6 +229,29 @@  static inline void put_ioctx(struct kioctx *kioctx)
 		__put_ioctx(kioctx);
 }
 
+static int kiocb_cancel(struct kioctx *ctx, struct kiocb *kiocb,
+			struct io_event *res)
+{
+	int (*cancel)(struct kiocb *, struct io_event *);
+	int ret = -EINVAL;
+
+	cancel = kiocb->ki_cancel;
+	kiocbSetCancelled(kiocb);
+	if (cancel) {
+		kiocb->ki_users++;
+		spin_unlock_irq(&ctx->ctx_lock);
+
+		memset(res, 0, sizeof(*res));
+		res->obj = (u64) kiocb->ki_obj.user;
+		res->data = kiocb->ki_user_data;
+		ret = cancel(kiocb, res);
+
+		spin_lock_irq(&ctx->ctx_lock);
+	}
+
+	return ret;
+}
+
 /* ioctx_alloc
  *	Allocates and initializes an ioctx.  Returns an ERR_PTR if it failed.
  */
@@ -304,7 +327,6 @@  out_freectx:
  */
 static void kill_ctx(struct kioctx *ctx)
 {
-	int (*cancel)(struct kiocb *, struct io_event *);
 	struct task_struct *tsk = current;
 	DECLARE_WAITQUEUE(wait, tsk);
 	struct io_event res;
@@ -315,14 +337,8 @@  static void kill_ctx(struct kioctx *ctx)
 		struct list_head *pos = ctx->active_reqs.next;
 		struct kiocb *iocb = list_kiocb(pos);
 		list_del_init(&iocb->ki_list);
-		cancel = iocb->ki_cancel;
-		kiocbSetCancelled(iocb);
-		if (cancel) {
-			iocb->ki_users++;
-			spin_unlock_irq(&ctx->ctx_lock);
-			cancel(iocb, &res);
-			spin_lock_irq(&ctx->ctx_lock);
-		}
+
+		kiocb_cancel(ctx, iocb, &res);
 	}
 
 	if (!ctx->reqs_active)
@@ -1709,7 +1725,7 @@  static struct kiocb *lookup_kiocb(struct kioctx *ctx, struct iocb __user *iocb,
 SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 		struct io_event __user *, result)
 {
-	int (*cancel)(struct kiocb *iocb, struct io_event *res);
+	struct io_event res;
 	struct kioctx *ctx;
 	struct kiocb *kiocb;
 	u32 key;
@@ -1724,32 +1740,22 @@  SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 		return -EINVAL;
 
 	spin_lock_irq(&ctx->ctx_lock);
-	ret = -EAGAIN;
+
 	kiocb = lookup_kiocb(ctx, iocb, key);
-	if (kiocb && kiocb->ki_cancel) {
-		cancel = kiocb->ki_cancel;
-		kiocb->ki_users ++;
-		kiocbSetCancelled(kiocb);
-	} else
-		cancel = NULL;
+	if (kiocb)
+		ret = kiocb_cancel(ctx, kiocb, &res);
+	else
+		ret = -EAGAIN;
+
 	spin_unlock_irq(&ctx->ctx_lock);
 
-	if (NULL != cancel) {
-		struct io_event tmp;
-		pr_debug("calling cancel\n");
-		memset(&tmp, 0, sizeof(tmp));
-		tmp.obj = (u64)(unsigned long)kiocb->ki_obj.user;
-		tmp.data = kiocb->ki_user_data;
-		ret = cancel(kiocb, &tmp);
-		if (!ret) {
-			/* Cancellation succeeded -- copy the result
-			 * into the user's buffer.
-			 */
-			if (copy_to_user(result, &tmp, sizeof(tmp)))
-				ret = -EFAULT;
-		}
-	} else
-		ret = -EINVAL;
+	if (!ret) {
+		/* Cancellation succeeded -- copy the result
+		 * into the user's buffer.
+		 */
+		if (copy_to_user(result, &res, sizeof(res)))
+			ret = -EFAULT;
+	}
 
 	put_ioctx(ctx);