BUG at net/sunrpc/svc_xprt.c:921 (another one)

Message ID 20130212205216.GE10267@fieldses.org (mailing list archive)
State New, archived

Commit Message

J. Bruce Fields Feb. 12, 2013, 8:52 p.m. UTC
On Sun, Jan 20, 2013 at 05:51:12PM -0500, Mark Lord wrote:
> Got it again, this time on a different system
> running mostly the same software.

Mark, Paweł, Tom, could any of you confirm whether this helps?

--b.


Comments

Mark Lord Feb. 13, 2013, 3 p.m. UTC | #1
On 13-02-12 03:52 PM, J. Bruce Fields wrote:
> On Sun, Jan 20, 2013 at 05:51:12PM -0500, Mark Lord wrote:
>> Got it again, this time on a different system
>> running mostly the same software.
> 
> Mark, Paweł, Tom, could any of you confirm whether this helps?
..

No, I cannot confirm one way or the other,
because I haven't noticed it again since the most recent
couple of occurrences I posted earlier here.

Cheers
J. Bruce Fields Feb. 15, 2013, 7:22 p.m. UTC | #2
On Wed, Feb 13, 2013 at 10:00:58AM -0500, Mark Lord wrote:
> On 13-02-12 03:52 PM, J. Bruce Fields wrote:
> > On Sun, Jan 20, 2013 at 05:51:12PM -0500, Mark Lord wrote:
> >> Got it again, this time on a different system
> >> running mostly the same software.
> > 
> > Mark, Paweł, Tom, could any of you confirm whether this helps?
> ..
> 
> No, I cannot confirm one way or the other,
> because I haven't noticed it again since the most recent
> couple of occurrences I posted earlier here.

OK, I do have at least one report from someone who previously saw the
bug and then ran for a day with the patch without seeing it:

	https://bugzilla.redhat.com/show_bug.cgi?id=910457

Any more reports, positive or negative, are welcome.

This looks like it should solve the problem, so for now I intend to
commit the following.

--b.
Paweł Sikora Feb. 15, 2013, 7:33 p.m. UTC | #3
On Tuesday 12 of February 2013 15:52:17 J. Bruce Fields wrote:
> On Sun, Jan 20, 2013 at 05:51:12PM -0500, Mark Lord wrote:
> > Got it again, this time on a different system
> > running mostly the same software.
> 
> Mark, Paweł, Tom, could any of you confirm whether this helps?

With this patch I can confirm a 2-day uptime without any oopses.

Tom Horsley Feb. 15, 2013, 7:42 p.m. UTC | #4
On Fri, 15 Feb 2013 14:22:29 -0500
J. Bruce Fields wrote:

> Any more reports positive or negative welcome.

Well, I don't have the time or energy to try patches on my
system at work, but these seem to be concerned with terminating
an NFS connection. My aborts all happen at boot when it
is trying to do the mount to begin with.
J. Bruce Fields Feb. 15, 2013, 7:46 p.m. UTC | #5
On Fri, Feb 15, 2013 at 02:42:08PM -0500, Tom Horsley wrote:
> On Fri, 15 Feb 2013 14:22:29 -0500
> J. Bruce Fields wrote:
> 
> > Any more reports positive or negative welcome.
> 
> Well, I don't have the time or energy to try patches on my
> system at work, but these seem to be concerned with terminating
> an NFS connection. My aborts all happen at boot when it
> is trying to do the mount to begin with.

The rpc server being torn down there is a callback server that runs on
the nfs client.
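
Roughly, the client-side teardown path looks like this (a simplified
sketch modeled on the fs/nfs/callback.c code of this era; the names and
refcounting details are illustrative, not a verbatim excerpt):

static void nfs_callback_down_net(u32 minorversion, struct svc_serv *serv,
				  struct net *net)
{
	struct nfs_net *nn = net_generic(net, nfs_net_id);

	/* Illustrative: drop one user of the per-net callback service. */
	if (--nn->cb_users[minorversion])
		return;

	/* Last user gone: tear down the callback server for this network
	 * namespace, which is what ends up in svc_close_net() below. */
	svc_shutdown_net(serv, net);
}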

--b.
J. Bruce Fields Feb. 17, 2013, 3:54 p.m. UTC | #6
On Fri, Feb 15, 2013 at 08:33:14PM +0100, Paweł Sikora wrote:
> On Tuesday 12 of February 2013 15:52:17 J. Bruce Fields wrote:
> > On Sun, Jan 20, 2013 at 05:51:12PM -0500, Mark Lord wrote:
> > > Got it again, this time on a different system
> > > running mostly the same software.
> > 
> > Mark, Paweł, Tom, could any of you confirm whether this helps?
> 
> With this patch I can confirm a 2-day uptime without any oopses.
> 

Thanks for the report.

--b.
Paweł Sikora Feb. 18, 2013, 7:42 a.m. UTC | #7
On Sunday 17 of February 2013 10:54:20 J. Bruce Fields wrote:
> On Fri, Feb 15, 2013 at 08:33:14PM +0100, Paweł Sikora wrote:
> > On Tuesday 12 of February 2013 15:52:17 J. Bruce Fields wrote:
> > > On Sun, Jan 20, 2013 at 05:51:12PM -0500, Mark Lord wrote:
> > > > Got it again, this time on a different system
> > > > running mostly the same software.
> > > 
> > > Mark, Paweł, Tom, could any of you confirm whether this helps?
> > 
> > With this patch I can confirm a 2-day uptime without any oopses.
> > 
> 
> Thanks for the report.--b.

Please push this patch to the 3.7.y stable kernel.

Patch

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index dbf12ac..2d34b6b 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -515,15 +515,6 @@  EXPORT_SYMBOL_GPL(svc_create_pooled);
 
 void svc_shutdown_net(struct svc_serv *serv, struct net *net)
 {
-	/*
-	 * The set of xprts (contained in the sv_tempsocks and
-	 * sv_permsocks lists) is now constant, since it is modified
-	 * only by accepting new sockets (done by service threads in
-	 * svc_recv) or aging old ones (done by sv_temptimer), or
-	 * configuration changes (excluded by whatever locking the
-	 * caller is using--nfsd_mutex in the case of nfsd).  So it's
-	 * safe to traverse those lists and shut everything down:
-	 */
 	svc_close_net(serv, net);
 
 	if (serv->sv_shutdown)
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index b8e47fa..ca71056 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -856,7 +856,6 @@  static void svc_age_temp_xprts(unsigned long closure)
 	struct svc_serv *serv = (struct svc_serv *)closure;
 	struct svc_xprt *xprt;
 	struct list_head *le, *next;
-	LIST_HEAD(to_be_aged);
 
 	dprintk("svc_age_temp_xprts\n");
 
@@ -877,25 +876,15 @@  static void svc_age_temp_xprts(unsigned long closure)
 		if (atomic_read(&xprt->xpt_ref.refcount) > 1 ||
 		    test_bit(XPT_BUSY, &xprt->xpt_flags))
 			continue;
-		svc_xprt_get(xprt);
-		list_move(le, &to_be_aged);
+		list_del_init(le);
 		set_bit(XPT_CLOSE, &xprt->xpt_flags);
 		set_bit(XPT_DETACHED, &xprt->xpt_flags);
-	}
-	spin_unlock_bh(&serv->sv_lock);
-
-	while (!list_empty(&to_be_aged)) {
-		le = to_be_aged.next;
-		/* fiddling the xpt_list node is safe 'cos we're XPT_DETACHED */
-		list_del_init(le);
-		xprt = list_entry(le, struct svc_xprt, xpt_list);
-
 		dprintk("queuing xprt %p for closing\n", xprt);
 
 		/* a thread will dequeue and close it soon */
 		svc_xprt_enqueue(xprt);
-		svc_xprt_put(xprt);
 	}
+	spin_unlock_bh(&serv->sv_lock);
 
 	mod_timer(&serv->sv_temptimer, jiffies + svc_conn_age_period * HZ);
 }
@@ -959,21 +948,24 @@  void svc_close_xprt(struct svc_xprt *xprt)
 }
 EXPORT_SYMBOL_GPL(svc_close_xprt);
 
-static void svc_close_list(struct svc_serv *serv, struct list_head *xprt_list, struct net *net)
+static int svc_close_list(struct svc_serv *serv, struct list_head *xprt_list, struct net *net)
 {
 	struct svc_xprt *xprt;
+	int ret = 0;
 
 	spin_lock(&serv->sv_lock);
 	list_for_each_entry(xprt, xprt_list, xpt_list) {
 		if (xprt->xpt_net != net)
 			continue;
+		ret++;
 		set_bit(XPT_CLOSE, &xprt->xpt_flags);
-		set_bit(XPT_BUSY, &xprt->xpt_flags);
+		svc_xprt_enqueue(xprt);
 	}
 	spin_unlock(&serv->sv_lock);
+	return ret;
 }
 
-static void svc_clear_pools(struct svc_serv *serv, struct net *net)
+static struct svc_xprt *svc_dequeue_net(struct svc_serv *serv, struct net *net)
 {
 	struct svc_pool *pool;
 	struct svc_xprt *xprt;
@@ -988,42 +980,46 @@  static void svc_clear_pools(struct svc_serv *serv, struct net *net)
 			if (xprt->xpt_net != net)
 				continue;
 			list_del_init(&xprt->xpt_ready);
+			spin_unlock_bh(&pool->sp_lock);
+			return xprt;
 		}
 		spin_unlock_bh(&pool->sp_lock);
 	}
+	return NULL;
 }
 
-static void svc_clear_list(struct svc_serv *serv, struct list_head *xprt_list, struct net *net)
+static void svc_clean_up_xprts(struct svc_serv *serv, struct net *net)
 {
 	struct svc_xprt *xprt;
-	struct svc_xprt *tmp;
-	LIST_HEAD(victims);
-
-	spin_lock(&serv->sv_lock);
-	list_for_each_entry_safe(xprt, tmp, xprt_list, xpt_list) {
-		if (xprt->xpt_net != net)
-			continue;
-		list_move(&xprt->xpt_list, &victims);
-	}
-	spin_unlock(&serv->sv_lock);
 
-	list_for_each_entry_safe(xprt, tmp, &victims, xpt_list)
+	while ((xprt = svc_dequeue_net(serv, net))) {
+		set_bit(XPT_CLOSE, &xprt->xpt_flags);
 		svc_delete_xprt(xprt);
+	}
 }
 
+/*
+ * Server threads may still be running (especially in the case where the
+ * service is still running in other network namespaces).
+ *
+ * So we shut down sockets the same way we would on a running server, by
+ * setting XPT_CLOSE, enqueuing, and letting a thread pick it up to do
+ * the close.  In the case there are no such other threads,
+ * svc_clean_up_xprts() does a simple version of a
+ * server's main event loop, and in the case where there are other
+ * threads, we may need to wait a little while and then check again to
+ * see if they're done.
+ */
 void svc_close_net(struct svc_serv *serv, struct net *net)
 {
-	svc_close_list(serv, &serv->sv_tempsocks, net);
-	svc_close_list(serv, &serv->sv_permsocks, net);
+	int delay = 0;
 
-	svc_clear_pools(serv, net);
-	/*
-	 * At this point the sp_sockets lists will stay empty, since
-	 * svc_xprt_enqueue will not add new entries without taking the
-	 * sp_lock and checking XPT_BUSY.
-	 */
-	svc_clear_list(serv, &serv->sv_tempsocks, net);
-	svc_clear_list(serv, &serv->sv_permsocks, net);
+	while (svc_close_list(serv, &serv->sv_permsocks, net) +
+	       svc_close_list(serv, &serv->sv_tempsocks, net)) {
+
+		svc_clean_up_xprts(serv, net);
+		msleep(delay++);
+	}
 }
 
 /*
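
For reference, the shape of the new svc_close_net() loop, pulled out
into a self-contained user-space sketch: the names mirror the patch,
but every type and helper here is a simplified stand-in for the kernel
code, and this single-threaded toy converges in two passes where the
kernel may genuinely need to retry:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct xprt {
	int open;
	int close_pending;
	struct xprt *next;
};

static struct xprt *xprts;

/* Analogue of svc_close_list(): mark live transports for close and
 * report how many were still present on this pass. */
static int close_list(void)
{
	struct xprt *x;
	int found = 0;

	for (x = xprts; x; x = x->next) {
		if (!x->open)
			continue;
		x->close_pending = 1;	/* like setting XPT_CLOSE */
		found++;
	}
	return found;
}

/* Analogue of svc_clean_up_xprts(): tear down whatever was marked
 * (in the kernel, a server thread may do this instead). */
static void clean_up(void)
{
	struct xprt *x;

	for (x = xprts; x; x = x->next)
		if (x->close_pending)
			x->open = 0;	/* like svc_delete_xprt() */
}

int main(void)
{
	int delay = 0;
	int i;

	/* Build a few fake transports. */
	for (i = 0; i < 3; i++) {
		struct xprt *x = calloc(1, sizeof(*x));

		x->open = 1;
		x->next = xprts;
		xprts = x;
	}

	/* The svc_close_net() loop: retry until a pass finds nothing
	 * left to close, sleeping a little longer on each pass in case
	 * other threads still hold the transports. */
	while (close_list()) {
		clean_up();
		usleep(1000 * delay++);	/* stands in for msleep(delay++) */
	}

	printf("all transports closed\n");
	return 0;
}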