
svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping

Message ID 20120820223746.GL5779@fieldses.org (mailing list archive)
State New, archived

Commit Message

J. Bruce Fields Aug. 20, 2012, 10:37 p.m. UTC
From: "J. Bruce Fields" <bfields@redhat.com>

The rpc server tries to ensure that there will be room to send a reply
before it receives a request.

It does this by tracking, in xpt_reserved, an upper bound on the total
size of the replies that it has already committed to for the socket.

Currently it is adding in the estimate for a new reply *before* it
checks whether there is space available.  If it finds that there is not
space, it then subtracts the estimate back out.

This may lead the subsequent svc_xprt_enqueue to decide that there is
space after all.

The result is a svc_recv() that will repeatedly return -EAGAIN, causing
server threads to loop without doing any actual work.
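
In outline (a simplified sketch of the old ordering, not the literal
kernel code), the sequence was:

	/* reserve first, before knowing whether there is room: */
	rqstp->rq_reserved = serv->sv_max_mesg;
	atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);

	/* then check, with the check seeing its own reservation: */
	if (!xprt->xpt_ops->xpo_has_wspace(xprt)) {
		/* no room: back the estimate out, fail with -EAGAIN */
		atomic_sub(rqstp->rq_reserved, &xprt->xpt_reserved);
		rqstp->rq_reserved = 0;
	}

Once the estimate has been subtracted back out, svc_xprt_enqueue()
can find enough space and requeue the transport, and the cycle
repeats.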

Cc: stable@vger.kernel.org
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 net/sunrpc/svc_xprt.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

Queuing up for 3.6 absent any objections.--b.

Comments

J. Bruce Fields Aug. 20, 2012, 10:49 p.m. UTC | #1
On Mon, Aug 20, 2012 at 06:37:47PM -0400, bfields wrote:
> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> The rpc server tries to ensure that there will be room to send a reply
> before it receives a request.
> 
> It does this by tracking, in xpt_reserved, an upper bound on the total
> size of the replies that it has already committed to for the socket.
> 
> Currently it is adding in the estimate for a new reply *before* it
> checks whether there is space available.  If it finds that there is not
> space, it then subtracts the estimate back out.
> 
> This may lead the subsequent svc_xprt_enqueue to decide that there is
> space after all.
> 
> The result is a svc_recv() that will repeatedly return -EAGAIN, causing
> server threads to loop without doing any actual work.
> 
> Cc: stable@vger.kernel.org
> Reported-by: Michael Tokarev <mjt@tls.msk.ru>
> Tested-by: Michael Tokarev <mjt@tls.msk.ru>
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  net/sunrpc/svc_xprt.c |    7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> Queuing up for 3.6 absent any objections.--b.

By the way, one thing I'm still curious about is how this got
introduced.  mjt bisected it to f03d78db65085609938fdb686238867e65003181
"net: refine {udp|tcp|sctp}_mem limits", which looks like it just made
the problem a little more likely.

The last substantive change to the has_wspace logic was Trond's
47fcb03fefee2501e79176932a4184fc24d6f8ec, but I have a tough time
figuring out whether that would have affected it one way or the other.

As far as I can tell we've always added to xpt_reserved in this way, so
that svc_recv and svc_xprt_enqueue are comparing different things, and
surely this was always wrong even if the problem must have been harder
to trigger before.

But some of the wspace logic I don't understand, so cc'ing Neil and
Trond in case they see any other problem I missed.

--b.

> 
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 0d693a8..bac973a 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -316,7 +316,6 @@ static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
>   */
>  void svc_xprt_enqueue(struct svc_xprt *xprt)
>  {
> -	struct svc_serv	*serv = xprt->xpt_server;
>  	struct svc_pool *pool;
>  	struct svc_rqst	*rqstp;
>  	int cpu;
> @@ -362,8 +361,6 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
>  				rqstp, rqstp->rq_xprt);
>  		rqstp->rq_xprt = xprt;
>  		svc_xprt_get(xprt);
> -		rqstp->rq_reserved = serv->sv_max_mesg;
> -		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
>  		pool->sp_stats.threads_woken++;
>  		wake_up(&rqstp->rq_wait);
>  	} else {
> @@ -640,8 +637,6 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
>  	if (xprt) {
>  		rqstp->rq_xprt = xprt;
>  		svc_xprt_get(xprt);
> -		rqstp->rq_reserved = serv->sv_max_mesg;
> -		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
>  
>  		/* As there is a shortage of threads and this request
>  		 * had to be queued, don't allow the thread to wait so
> @@ -738,6 +733,8 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
>  		else
>  			len = xprt->xpt_ops->xpo_recvfrom(rqstp);
>  		dprintk("svc: got len=%d\n", len);
> +		rqstp->rq_reserved = serv->sv_max_mesg;
> +		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
>  	}
>  	svc_xprt_received(xprt);
>  
> -- 
> 1.7.9.5
> 
Michael Tokarev Aug. 21, 2012, 6:20 a.m. UTC | #2
On 21.08.2012 02:37, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> The rpc server tries to ensure that there will be room to send a reply
> before it receives a request.
> 
> It does this by tracking, in xpt_reserved, an upper bound on the total
> size of the replies that it has already committed to for the socket.
> 
> Currently it is adding in the estimate for a new reply *before* it
> checks whether there is space available.  If it finds that there is not
> space, it then subtracts the estimate back out.
> 
> This may lead the subsequent svc_xprt_enqueue to decide that there is
> space after all.
> 
> The result is a svc_recv() that will repeatedly return -EAGAIN, causing
> server threads to loop without doing any actual work.
> 
> Cc: stable@vger.kernel.org

This is applicable to all 3.0+ stable kernels.  The commit which
made this bug apparent is included in 3.0-rc5 (it changed the
memory buffer sizes for tcp/udp/sctp).  Before that commit, the
bug is not triggered, at least here.  So 3.0, 3.2, 3.4 and 3.5
(for which stable series are maintained at the moment) should
definitely include it.

Should it be applied to 2.6.32 &Co too?

/mjt

> Reported-by: Michael Tokarev <mjt@tls.msk.ru>
> Tested-by: Michael Tokarev <mjt@tls.msk.ru>
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields Aug. 21, 2012, 6:34 p.m. UTC | #3
On Tue, Aug 21, 2012 at 10:20:31AM +0400, Michael Tokarev wrote:
> On 21.08.2012 02:37, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > The rpc server tries to ensure that there will be room to send a reply
> > before it receives a request.
> > 
> > It does this by tracking, in xpt_reserved, an upper bound on the total
> > size of the replies that it has already committed to for the socket.
> > 
> > Currently it is adding in the estimate for a new reply *before* it
> > checks whether there is space available.  If it finds that there is not
> > space, it then subtracts the estimate back out.
> > 
> > This may lead the subsequent svc_xprt_enqueue to decide that there is
> > space after all.
> > 
> > The result is a svc_recv() that will repeatedly return -EAGAIN, causing
> > server threads to loop without doing any actual work.
> > 
> > Cc: stable@vger.kernel.org
> 
> This is applicable to all 3.0+ stable kernels.  The commit which
> made this bug apparent is included in 3.0-rc5 (it changed the
> memory buffer sizes for tcp/udp/sctp).  Before that commit, the
> bug is not triggered, at least here.  So 3.0, 3.2, 3.4 and 3.5
> (for which stable series are maintained at the moment) should
> definitely include it.
> 
> Should it be applied to 2.6.32 &Co too?

Probably.  It doesn't apply cleanly as is, though, so I suspect the
stable cc won't be enough to get it applied by default (that may be
true of some of the more recent kernels too, I haven't checked).

If you want to be sure that would happen, test and post a backported
patch and I'll make sure it gets to Greg.

--b.

> 
> /mjt
> 
> > Reported-by: Michael Tokarev <mjt@tls.msk.ru>
> > Tested-by: Michael Tokarev <mjt@tls.msk.ru>
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NeilBrown Sept. 25, 2012, 5:54 a.m. UTC | #4
On Mon, 20 Aug 2012 18:49:15 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Mon, Aug 20, 2012 at 06:37:47PM -0400, bfields wrote:
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > The rpc server tries to ensure that there will be room to send a reply
> > before it receives a request.
> > 
> > It does this by tracking, in xpt_reserved, an upper bound on the total
> > size of the replies that it has already committed to for the socket.
> > 
> > Currently it is adding in the estimate for a new reply *before* it
> > checks whether there is space available.  If it finds that there is not
> > space, it then subtracts the estimate back out.
> > 
> > This may lead the subsequent svc_xprt_enqueue to decide that there is
> > space after all.
> > 
> > The result is a svc_recv() that will repeatedly return -EAGAIN, causing
> > server threads to loop without doing any actual work.
> > 
> > Cc: stable@vger.kernel.org
> > Reported-by: Michael Tokarev <mjt@tls.msk.ru>
> > Tested-by: Michael Tokarev <mjt@tls.msk.ru>
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> >  net/sunrpc/svc_xprt.c |    7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > Queuing up for 3.6 absent any objections.--b.
> 
> By the way, one thing I'm still curious about is how this got
> introduced.  mjt bisected it to f03d78db65085609938fdb686238867e65003181
> "net: refine {udp|tcp|sctp}_mem limits", which looks like it just made
> the problem a little more likely.
> 
> The last substantive change to the has_wspace logic was Trond's
> 47fcb03fefee2501e79176932a4184fc24d6f8ec, but I have a tough time
> figuring out whether that would have affected it one way or the other.
> 
> As far as I can tell we've always added to xpt_reserved in this way, so
> that svc_recv and svc_xprt_enqueue are comparing different things, and
> surely this was always wrong even if the problem must have been harder
> to trigger before.
> 
> But some of the wspace logic I don't understand, so cc'ing Neil and
> Trond in case they see any other problem I missed.

Hi Bruce et al.

My guess is that 
commit 9c335c0b8daf56b9f73479d00b1dd726e1fcca09
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Tue Oct 26 11:32:03 2010 -0400

    svcrpc: fix wspace-checking race

introduced this problem.  It moved the test for 'do we have enough space'
from before we add in the requirements of a new request, to after.
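
Schematically (a rough sketch from that description, not the literal
code), it went from check-then-reserve:

	if (xprt->xpt_ops->xpo_has_wspace(xprt)) {
		atomic_add(serv->sv_max_mesg, &xprt->xpt_reserved);
		/* hand the transport to a waiting thread */
	}

to reserve-then-check, which is what lets the check trip over its
own reservation.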

But I think your patch looks good.

Thanks,
NeilBrown


> 
> --b.
> 
> > 
> > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > index 0d693a8..bac973a 100644
> > --- a/net/sunrpc/svc_xprt.c
> > +++ b/net/sunrpc/svc_xprt.c
> > @@ -316,7 +316,6 @@ static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
> >   */
> >  void svc_xprt_enqueue(struct svc_xprt *xprt)
> >  {
> > -	struct svc_serv	*serv = xprt->xpt_server;
> >  	struct svc_pool *pool;
> >  	struct svc_rqst	*rqstp;
> >  	int cpu;
> > @@ -362,8 +361,6 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
> >  				rqstp, rqstp->rq_xprt);
> >  		rqstp->rq_xprt = xprt;
> >  		svc_xprt_get(xprt);
> > -		rqstp->rq_reserved = serv->sv_max_mesg;
> > -		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
> >  		pool->sp_stats.threads_woken++;
> >  		wake_up(&rqstp->rq_wait);
> >  	} else {
> > @@ -640,8 +637,6 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
> >  	if (xprt) {
> >  		rqstp->rq_xprt = xprt;
> >  		svc_xprt_get(xprt);
> > -		rqstp->rq_reserved = serv->sv_max_mesg;
> > -		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
> >  
> >  		/* As there is a shortage of threads and this request
> >  		 * had to be queued, don't allow the thread to wait so
> > @@ -738,6 +733,8 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
> >  		else
> >  			len = xprt->xpt_ops->xpo_recvfrom(rqstp);
> >  		dprintk("svc: got len=%d\n", len);
> > +		rqstp->rq_reserved = serv->sv_max_mesg;
> > +		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
> >  	}
> >  	svc_xprt_received(xprt);
> >  
> > -- 
> > 1.7.9.5
> > 
J. Bruce Fields Sept. 25, 2012, 1:33 p.m. UTC | #5
On Tue, Sep 25, 2012 at 03:54:57PM +1000, NeilBrown wrote:
> On Mon, 20 Aug 2012 18:49:15 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
> 
> > On Mon, Aug 20, 2012 at 06:37:47PM -0400, bfields wrote:
> > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > 
> > > The rpc server tries to ensure that there will be room to send a reply
> > > before it receives a request.
> > > 
> > > It does this by tracking, in xpt_reserved, an upper bound on the total
> > > size of the replies that it has already committed to for the socket.
> > > 
> > > Currently it is adding in the estimate for a new reply *before* it
> > > checks whether there is space available.  If it finds that there is not
> > > space, it then subtracts the estimate back out.
> > > 
> > > This may lead the subsequent svc_xprt_enqueue to decide that there is
> > > space after all.
> > > 
> > > The result is a svc_recv() that will repeatedly return -EAGAIN, causing
> > > server threads to loop without doing any actual work.
> > > 
> > > Cc: stable@vger.kernel.org
> > > Reported-by: Michael Tokarev <mjt@tls.msk.ru>
> > > Tested-by: Michael Tokarev <mjt@tls.msk.ru>
> > > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > > ---
> > >  net/sunrpc/svc_xprt.c |    7 ++-----
> > >  1 file changed, 2 insertions(+), 5 deletions(-)
> > > 
> > > Queuing up for 3.6 absent any objections.--b.
> > 
> > By the way, one thing I'm still curious about is how this got
> > introduced.  mjt bisected it to f03d78db65085609938fdb686238867e65003181
> > "net: refine {udp|tcp|sctp}_mem limits", which looks like it just made
> > the problem a little more likely.
> > 
> > The last substantive change to the has_wspace logic was Trond's
> > 47fcb03fefee2501e79176932a4184fc24d6f8ec, but I have a tough time
> > figuring out whether that would have affected it one way or the other.
> > 
> > As far as I can tell we've always added to xpt_reserved in this way, so
> > that svc_recv and svc_xprt_enqueue are comparing different things, and
> > surely this was always wrong even if the problem must have been harder
> > to trigger before.
> > 
> > But some of the wspace logic I don't understand, so cc'ing Neil and
> > Trond in case they see any other problem I missed.
> 
> Hi Bruce et al.
> 
> My guess is that 
> commit 9c335c0b8daf56b9f73479d00b1dd726e1fcca09
> Author: J. Bruce Fields <bfields@redhat.com>
> Date:   Tue Oct 26 11:32:03 2010 -0400
> 
>     svcrpc: fix wspace-checking race
> 
> introduced this problem.  It moved the test for 'do we have enough space'
> from before we add in the requirements of a new request, to after.
> 
> But I think your patch looks good.

Oops--thanks for spotting that!

All I wish I had now is an idea for a patch that would make that mistake
less likely in the future.  Slather on some comments, I guess....
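
For instance (hypothetical wording), something along these lines
above the accounting:

	/*
	 * xpt_reserved is an upper bound on the reply space already
	 * promised on this transport.  Only add to it *after* a
	 * request has actually been received; reserving before the
	 * wspace check makes the check fail against our own
	 * reservation, and the server threads busy-loop.
	 */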

--b.

Patch

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 0d693a8..bac973a 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -316,7 +316,6 @@ static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
  */
 void svc_xprt_enqueue(struct svc_xprt *xprt)
 {
-	struct svc_serv	*serv = xprt->xpt_server;
 	struct svc_pool *pool;
 	struct svc_rqst	*rqstp;
 	int cpu;
@@ -362,8 +361,6 @@ void svc_xprt_enqueue(struct svc_xprt *xprt)
 				rqstp, rqstp->rq_xprt);
 		rqstp->rq_xprt = xprt;
 		svc_xprt_get(xprt);
-		rqstp->rq_reserved = serv->sv_max_mesg;
-		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
 		pool->sp_stats.threads_woken++;
 		wake_up(&rqstp->rq_wait);
 	} else {
@@ -640,8 +637,6 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
 	if (xprt) {
 		rqstp->rq_xprt = xprt;
 		svc_xprt_get(xprt);
-		rqstp->rq_reserved = serv->sv_max_mesg;
-		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
 
 		/* As there is a shortage of threads and this request
 		 * had to be queued, don't allow the thread to wait so
@@ -738,6 +733,8 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
 		else
 			len = xprt->xpt_ops->xpo_recvfrom(rqstp);
 		dprintk("svc: got len=%d\n", len);
+		rqstp->rq_reserved = serv->sv_max_mesg;
+		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
 	}
 	svc_xprt_received(xprt);