Message ID | 20120820215210.GJ5779@fieldses.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
J. Bruce Fields [bfields@fieldses.org] wrote: > From: "J. Bruce Fields" <bfields@redhat.com> > > svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply. > However, the XPT_CLOSE won't be acted on immediately. Meanwhile other > threads could send further replies before the socket is really shut > down. This can manifest as data corruption: for example, if a truncated > read reply is followed by another rpc reply, that second reply will look > to the client like further read data. > > Symptoms were data corruption preceded by svc_tcp_sendto logging > something like > > kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket > > Cc: stable@vger.kernel.org > Reported-by: Malahal Naineni <malahal@us.ibm.com> > Signed-off-by: J. Bruce Fields <bfields@redhat.com> > --- > net/sunrpc/svc_xprt.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > Can anyone see a reason *not* to fail the send on a socket marked with > XPT_CLOSE? > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c > index 88f2bf6..0d693a8 100644 > --- a/net/sunrpc/svc_xprt.c > +++ b/net/sunrpc/svc_xprt.c > @@ -794,7 +794,8 @@ int svc_send(struct svc_rqst *rqstp) > > /* Grab mutex to serialize outgoing data. */ > mutex_lock(&xprt->xpt_mutex); > - if (test_bit(XPT_DEAD, &xprt->xpt_flags)) > + if (test_bit(XPT_DEAD, &xprt->xpt_flags) > + || test_bit(XPT_CLOSE, &xprt->xpt_flags)) > len = -ENOTCONN; > else > len = xprt->xpt_ops->xpo_sendto(rqstp); > -- > 1.7.9.5 Instrumented svc_send_common() to send partial read replies, was able reproduce the corruption easily. After applying this patch, I wasn't able to reproduce the corruption. The patch looks good. Reviewed-by: Malahal Naineni <malahal@us.ibm.com> Tested-by: Malahal Naineni <malahal@us.ibm.com> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Aug 20, 2012 at 05:35:06PM -0500, Malahal Naineni wrote: > J. Bruce Fields [bfields@fieldses.org] wrote: > > From: "J. Bruce Fields" <bfields@redhat.com> > > > > svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply. > > However, the XPT_CLOSE won't be acted on immediately. Meanwhile other > > threads could send further replies before the socket is really shut > > down. ... > Instrumented svc_send_common() to send partial read replies, was able > reproduce the corruption easily. After applying this patch, I wasn't > able to reproduce the corruption. The patch looks good. I wonder, maybe someone who understands the tcp code better could answer: is it really possible for a sendto to fail and then a subsequent send succeed? If so, what are possible causes? I'm curious how we'd reproduce this without the artificial fault injection. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 88f2bf6..0d693a8 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -794,7 +794,8 @@ int svc_send(struct svc_rqst *rqstp) /* Grab mutex to serialize outgoing data. */ mutex_lock(&xprt->xpt_mutex); - if (test_bit(XPT_DEAD, &xprt->xpt_flags)) + if (test_bit(XPT_DEAD, &xprt->xpt_flags) + || test_bit(XPT_CLOSE, &xprt->xpt_flags)) len = -ENOTCONN; else len = xprt->xpt_ops->xpo_sendto(rqstp);