Orangefs ABI documentation

Message ID 20160130172244.GD17997@ZenIV.linux.org.uk (mailing list archive)

Al Viro Jan. 30, 2016, 5:22 p.m. UTC
On Sun, Jan 24, 2016 at 05:12:30PM -0500, Mike Marshall wrote:
> But in my tests, if I kill the client-core bad things happen...
> sometimes the client-core doesn't restart, and the kernel gets
> sick (hangs or slows way down but no oops). When the client-core
> does restart, the activity I had going on (dbench again) fizzles out,
> and the filesystem is corrupted...

> Anyhow, I don't think the "restart the client-core" code is up to snuff <g>.
> 
> I'll look closer at how the out-of-tree module works, maybe it really
> does work and we've broken it with our massive changes to the
> upstream version over the last few years. I see that the client (whose
> job it is to restart the client-core) and the client-core implement
> signal handling with signal(2), whose man page says to use
> sigaction(2) instead...

Could you try this and see if either WARN_ON() actually triggers?
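
As an aside on the signal(2) vs. sigaction(2) remark quoted above: a minimal
userspace sketch of installing a handler with sigaction(2), so that the
handler stays installed after it fires and its behaviour does not depend on
which signal() semantics the platform provides. The names on_signal(),
install_handler() and got_signal are hypothetical and are not taken from the
OrangeFS client or client-core sources; this only illustrates the call, not
how the restart logic there should look.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written from the handler; sig_atomic_t is safe to store in signal context. */
static volatile sig_atomic_t got_signal = 0;

/* Hypothetical handler: only records which signal arrived. */
static void on_signal(int signo)
{
	got_signal = signo;
}

/*
 * Hypothetical helper (an assumption, not OrangeFS code).
 * Installs on_signal() for signo; returns 0 on success, -1 on error.
 */
static int install_handler(int signo)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	sigemptyset(&sa.sa_mask);	/* block no extra signals in the handler */
	sa.sa_flags = SA_RESTART;	/* restart interrupted syscalls where possible */

	return sigaction(signo, &sa, NULL);
}
```

A caller would do something like install_handler(SIGTERM) once at startup and
check got_signal from its main loop; SA_RESTART is a choice made for this
sketch, not something the thread prescribes.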

Patch

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index c585063d..e2ab0d4 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -246,10 +246,7 @@  populate_shared_memory:
 				       iter,
 				       new_op->downcall.resp.io.amt_complete);
 		if (ret < 0) {
-			/*
-			 * put error codes in downcall so that handle_io_error()
-			 * preserves it properly
-			 */
+			WARN_ON(!op_state_serviced(new_op));
 			new_op->downcall.status = ret;
 			handle_io_error();
 			goto out;
diff --git a/fs/orangefs/waitqueue.c b/fs/orangefs/waitqueue.c
index cdbf57b..191d886 100644
--- a/fs/orangefs/waitqueue.c
+++ b/fs/orangefs/waitqueue.c
@@ -205,6 +205,7 @@  retry_servicing:
 
 		/* op uses shared memory */
 		if (orangefs_get_bufmap_init() == 0) {
+			WARN_ON(1);
 			/*
 			 * This operation uses the shared memory system AND
 			 * the system is not yet ready. This situation occurs