Message ID: 20160217230900.GP17997@ZenIV.linux.org.uk (mailing list archive)
State: New, archived
On Wed, Feb 17, 2016 at 11:09:00PM +0000, Al Viro wrote:
> However, that doesn't prevent all unpleasantness here - giving up just as
> it's being copied to userland and going into restart.  Ho-hum...  How about
> the following:
> 	* move increment of op->attempts into the same place where we
> 	  set "given up"
> 	* in addition to the check for "given up" in the request-picking loop
> 	  (as above), fetch op->attempts before dropping op->lock
> 	* after having retaken op->lock (after copy_to_user()) recheck
> 	  op->attempts instead of checking for "given up".

Crap...  There's a similar problem on the other end - in
orangefs_devreq_write_iter(), between the time when op has been fetched from
the hash and the time we finish copying the reply from userland.  Same kind
of "what if clear_... gets all the way through and we resubmit it before we
get around to checking the given_up flag" problem...  Let me think a bit...
diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
index b27ed1c..1938d55 100644
--- a/fs/orangefs/devorangefs-req.c
+++ b/fs/orangefs/devorangefs-req.c
@@ -109,6 +109,7 @@ static ssize_t orangefs_devreq_read(struct file *file,
 	static __s32 magic = ORANGEFS_DEVREQ_MAGIC;
 	struct orangefs_kernel_op_s *cur_op = NULL;
 	unsigned long ret;
+	int attempts;
 
 	/* We do not support blocking IO. */
 	if (!(file->f_flags & O_NONBLOCK)) {
@@ -133,7 +134,7 @@ restart:
 		__s32 fsid;
 		/* This lock is held past the end of the loop when we break. */
 		spin_lock(&op->lock);
-		if (unlikely(op_state_purged(op))) {
+		if (unlikely(op_state_purged(op) || op_state_given_up(op))) {
 			spin_unlock(&op->lock);
 			continue;
 		}
@@ -207,6 +208,7 @@ restart:
 	list_del_init(&cur_op->list);
 	get_op(op);
 	spin_unlock(&orangefs_request_list_lock);
+	attempts = op->attempts;
 
 	spin_unlock(&cur_op->lock);
 
@@ -227,7 +229,8 @@ restart:
 	spin_lock(&htable_ops_in_progress_lock);
 	spin_lock(&cur_op->lock);
 
-	if (unlikely(op_state_given_up(cur_op))) {
+	if (unlikely(cur_op->attempts != attempts)) {
+		/* given up just as we copied to userland */
 		spin_unlock(&cur_op->lock);
 		spin_unlock(&htable_ops_in_progress_lock);
 		op_release(cur_op);
diff --git a/fs/orangefs/waitqueue.c b/fs/orangefs/waitqueue.c
index d980240..cc43ac8 100644
--- a/fs/orangefs/waitqueue.c
+++ b/fs/orangefs/waitqueue.c
@@ -139,7 +139,6 @@ retry_servicing:
 	op->downcall.status = ret;
 	/* retry if operation has not been serviced and if requested */
 	if (ret == -EAGAIN) {
-		op->attempts++;
 		timeout = op_timeout_secs * HZ;
 		gossip_debug(GOSSIP_WAIT_DEBUG,
 			     "orangefs: tag %llu (%s)"
@@ -208,6 +207,7 @@ static void orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s
 	 * Called with op->lock held.
 	 */
 	op->op_state |= OP_VFS_STATE_GIVEN_UP;
+	op->attempts++;
 	if (op_state_waiting(op)) {