
Orangefs ABI documentation

Message ID CA+D=wkgGZ9A8Qa5C6q3cROrr+Gp=jsgowvcbOs-22UU=aVT7Wg@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Martin Brandenburg Feb. 15, 2016, 10:32 p.m. UTC
On 2/15/16, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Feb 15, 2016 at 12:46:51PM -0500, Mike Marshall wrote:
>> I pushed the list_del up to the kernel.org for-next branch...
>>
>> And I've been running tests with the CRUDE bandaid... weird
>> results...
>>
>> No oopses, no WARN_ONs... I was running dbench and ls -R
>> or find and kill-minus-nining different ones of them with no
>> perceived resulting problems, so I moved on to signalling
>> the client-core to abort... it restarted numerous times,
>> and then stuff wedged up differently than I've seen before.
>
> There are other problems with that thing (starting with the fact that
> retrying readdir/wait_for_direct_io can try to grab a slot despite the
> bufmap winding down).  OK, at that point I think we should try to see
> if bufmap rewrite works - I've rebased on top of your branch and pushed
> (head at 8c3bc9a).  Bufmap rewrite is really completely untested -
> it's done pretty much blindly and I'd be surprised as hell if it has no
> brainos at the first try.
>

There's at least one major issue aside from a small typo.

Something that used a slot, such as a reader, would call
service_operation while holding a bufmap. Then the client-core would
crash, and the kernel would end up in run_down, waiting for the slots
to be given up. But the slots are not given up until someone wakes up
all the processes waiting in service_operation, which in turn only
happens after all the slots are given up. Then the client-core hangs
until someone sends a deadly signal to all the processes waiting in
service_operation, or presumably until the timeout expires.

This splits finalize and run_down so that orangefs_devreq_release can
mark the slot map as killed, then purge waiting ops, then wait for all
the slots to be released. Meanwhile, processes which were waiting will
get into orangefs_bufmap_get which will see that the slot map is
shutting down and wait for the client-core to come back.
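
To make the ordering concrete, here is a rough user-space model of the
slot map shutdown. This is only a sketch: a pthread mutex and condition
variable stand in for the kernel's spinlock and wait queue, and the
struct fields and function bodies are illustrative, not the actual
orangefs-bufmap.c code.

#include <pthread.h>
#include <stdbool.h>

struct slot_map_model {
        pthread_mutex_t lock;
        pthread_cond_t wq;      /* stands in for the kernel wait queue */
        int free;               /* slots currently available */
        int total;              /* slots in the map */
        bool killed;            /* set once teardown has started */
};

/* Take a slot, or report that the map is being torn down. */
int slot_get(struct slot_map_model *m)
{
        int ret;

        pthread_mutex_lock(&m->lock);
        while (!m->killed && m->free == 0)
                pthread_cond_wait(&m->wq, &m->lock);
        if (m->killed) {
                ret = -1;       /* map is going away; wait for a new client-core */
        } else {
                m->free--;      /* got a slot */
                ret = 0;
        }
        pthread_mutex_unlock(&m->lock);
        return ret;
}

/* Give a slot back and wake both blocked getters and run_down(). */
void slot_put(struct slot_map_model *m)
{
        pthread_mutex_lock(&m->lock);
        m->free++;
        pthread_cond_broadcast(&m->wq);
        pthread_mutex_unlock(&m->lock);
}

/* Teardown step 1: refuse to hand out new slots. */
void mark_killed(struct slot_map_model *m)
{
        pthread_mutex_lock(&m->lock);
        m->killed = true;
        pthread_cond_broadcast(&m->wq);
        pthread_mutex_unlock(&m->lock);
}

/*
 * Teardown step 3: wait until every outstanding slot has been put back.
 * This must only run after the waiting ops have been purged; otherwise
 * the holders blocked in service_operation never reach slot_put() and
 * this waits forever, which is the hang described above.
 */
void run_down(struct slot_map_model *m)
{
        pthread_mutex_lock(&m->lock);
        while (m->free < m->total)
                pthread_cond_wait(&m->wq, &m->lock);
        pthread_mutex_unlock(&m->lock);
}

With that split, orangefs_devreq_release can mark both maps killed,
purge the ops so that slot holders stop blocking in service_operation,
and only then run_down to wait for the slots to drain.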

This is all at https://www.github.com/martinbrandenburg/linux.git branch slots.

-- Martin


Comments

Al Viro Feb. 15, 2016, 11:04 p.m. UTC | #1
On Mon, Feb 15, 2016 at 05:32:54PM -0500, Martin Brandenburg wrote:

> Something that used a slot, such as a reader, would call
> service_operation while holding a bufmap. Then the client-core would
> crash, and the kernel would end up in run_down, waiting for the slots
> to be given up. But the slots are not given up until someone wakes up
> all the processes waiting in service_operation, which in turn only
> happens after all the slots are given up. Then the client-core hangs
> until someone sends a deadly signal to all the processes waiting in
> service_operation, or presumably until the timeout expires.
> 
> This splits finalize and run_down so that orangefs_devreq_release can
> mark the slot map as killed, then purge waiting ops, then wait for all
> the slots to be released. Meanwhile, processes which were waiting will
> get into orangefs_bufmap_get which will see that the slot map is
> shutting down and wait for the client-core to come back.

D'oh.  Yes, that was exactly the point of separating mark_dead and run_down -
the latter should've been done after purging all requests.  Fixes folded,
branch force-pushed.
Mike Marshall Feb. 16, 2016, 11:15 p.m. UTC | #2
This thing is invulnerable now!

Nothing hangs when I kill the client-core, and the client-core
always restarts.

Sometimes, if you hit it right with a kill while dbench is running,
a file create will fail.

I've been trying to trace down why all day, in case there's
something that can be done...

Here's what I see:

  orangefs_create
    service_operation
      wait_for_matching_downcall purges op and returns -EAGAIN
      orangefs_clean_up_interrupted_operation
      if (EAGAIN)
        ...
        goto retry_servicing
      wait_for_matching_downcall returns 0
    service_operation returns 0
  orangefs_create has good return value from service_operation

   op->khandle: 00000000-0000-0000-0000-000000000000
   op->fs_id: 0

   a subsequent getattr on the bogus object fails orangefs_create with EINVAL.

   seems like the second time around, wait_for_matching_downcall
   must have seen op_state_serviced, but I don't see how yet...
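
In other words, the failure looks like it comes out of a retry loop
shaped roughly like the sketch below. This is heavily simplified;
struct op_sketch and the two extern helpers are stand-ins with made-up
signatures, not the real service_operation machinery.

#include <errno.h>

struct op_sketch {
        int state;              /* purged vs. serviced, greatly simplified */
        unsigned long khandle;  /* stand-in for the downcall's khandle/fs_id */
};

/* Stand-ins for the real helpers that talk to the client-core. */
extern int wait_for_matching_downcall(struct op_sketch *op);
extern void clean_up_interrupted_operation(struct op_sketch *op);

int service_operation_sketch(struct op_sketch *op)
{
        int ret;

retry_servicing:
        ret = wait_for_matching_downcall(op);
        if (ret == -EAGAIN) {
                /* the op was purged, not serviced */
                clean_up_interrupted_operation(op);
                goto retry_servicing;
        }

        /*
         * A return of 0 is only trustworthy if a downcall was really
         * copied into the op on this pass; if a purged op can still
         * look serviced here, the caller gets back the zeroed
         * khandle/fs_id seen above.
         */
        return ret;
}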

I pushed the new patches out to

gitolite.kernel.org:pub/scm/linux/kernel/git/hubcap/linux
for-next

I made a couple of additional patches that make it easier to read
the flow of gossip statements, and also removed a few lines of vestigial
ASYNC code.

-Mike


Patch

diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
index d96bcf10..b27ed1c 100644
--- a/fs/orangefs/devorangefs-req.c
+++ b/fs/orangefs/devorangefs-req.c
@@ -513,6 +513,9 @@  static int orangefs_devreq_release(struct inode *inode, struct file *file)
 	 * them as purged and wake them up
 	 */
 	purge_inprogress_ops();
+
+	orangefs_bufmap_run_down();
+
 	gossip_debug(GOSSIP_DEV_DEBUG,
 		     "pvfs2-client-core: device close complete\n");
 	open_access_count = 0;
diff --git a/fs/orangefs/orangefs-bufmap.c b/fs/orangefs/orangefs-bufmap.c
index 3c6e07c..c544710 100644
--- a/fs/orangefs/orangefs-bufmap.c
+++ b/fs/orangefs/orangefs-bufmap.c
@@ -20,7 +20,7 @@  static struct slot_map rw_map = {
 };
 static struct slot_map readdir_map = {
 	.c = -1,
-	.q = __WAIT_QUEUE_HEAD_INITIALIZER(rw_map.q)
+	.q = __WAIT_QUEUE_HEAD_INITIALIZER(readdir_map.q)
 };


@@ -430,6 +430,15 @@  void orangefs_bufmap_finalize(void)
 	gossip_debug(GOSSIP_BUFMAP_DEBUG, "orangefs_bufmap_finalize: called\n");
 	mark_killed(&rw_map);
 	mark_killed(&readdir_map);
+	gossip_debug(GOSSIP_BUFMAP_DEBUG,
+		     "orangefs_bufmap_finalize: exiting normally\n");
+}
+
+void orangefs_bufmap_run_down(void)
+{
+	struct orangefs_bufmap *bufmap = __orangefs_bufmap;
+	if (!bufmap)
+		return;
 	run_down(&rw_map);
 	run_down(&readdir_map);
 	spin_lock(&orangefs_bufmap_lock);
@@ -437,8 +446,6 @@  void orangefs_bufmap_finalize(void)
 	spin_unlock(&orangefs_bufmap_lock);
 	orangefs_bufmap_unmap(bufmap);
 	orangefs_bufmap_free(bufmap);
-	gossip_debug(GOSSIP_BUFMAP_DEBUG,
-		     "orangefs_bufmap_finalize: exiting normally\n");
 }

 /*
diff --git a/fs/orangefs/orangefs-bufmap.h b/fs/orangefs/orangefs-bufmap.h
index ad8d82a..0be62be 100644
--- a/fs/orangefs/orangefs-bufmap.h
+++ b/fs/orangefs/orangefs-bufmap.h
@@ -17,6 +17,8 @@  int orangefs_bufmap_initialize(struct ORANGEFS_dev_map_desc *user_desc);

 void orangefs_bufmap_finalize(void);

+void orangefs_bufmap_run_down(void);
+
 int orangefs_bufmap_get(struct orangefs_bufmap **mapp, int *buffer_index);