Message ID | 1477322680.14828.6.camel@redhat.com (mailing list archive) |
---|---|
State | New, archived |
> On Oct 24, 2016, at 11:24 AM, Jeff Layton <jlayton@redhat.com> wrote:
>
> On Mon, 2016-10-24 at 11:19 -0400, Jeff Layton wrote:
>> On Mon, 2016-10-24 at 09:51 -0400, Chuck Lever wrote:
>>>
>>>>
>>>>
>>>> On Oct 24, 2016, at 9:31 AM, Jeff Layton <jlayton@redhat.com> wrote:
>>>>
>>>> On Mon, 2016-10-24 at 11:15 +0800, Eryu Guan wrote:
>>>>>
>>>>>
>>>>> On Sun, Oct 23, 2016 at 02:21:15PM -0400, J. Bruce Fields wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm getting an intermittent crash in the nfs server as of
>>>>>> 68778945e46f143ed7974b427a8065f69a4ce944 "SUNRPC: Separate buffer
>>>>>> pointers for RPC Call and Reply messages".
>>>>>>
>>>>>> I haven't tried to understand that commit or why it would be a problem yet, I
>>>>>> don't see an obvious connection--I can take a closer look Monday.
>>>>>>
>>>>>> Could even be that I just landed on this commit by chance, the problem is a
>>>>>> little hard to reproduce so I don't completely trust my testing.
>>>>>
>>>>> I've hit the same crash on 4.9-rc1 kernel, and it's reproduced for me
>>>>> reliably by running xfstests generic/013 case, on a loopback mounted
>>>>> NFSv4.1 (or NFSv4.2), XFS is the underlying exported fs. More details
>>>>> please see
>>>>>
>>>>> http://marc.info/?l=linux-nfs&m=147714320129362&w=2
>>>>>
>>>>
>>>> Looks like you landed at the same commit as Bruce, so that's probably
>>>> legit. That commit is very small though. The only real change that
>>>> doesn't affect the new field is this:
>>>>
>>>>
>>>> @@ -1766,7 +1766,7 @@ rpc_xdr_encode(struct rpc_task *task)
>>>>  		     req->rq_buffer,
>>>>  		     req->rq_callsize);
>>>>  	xdr_buf_init(&req->rq_rcv_buf,
>>>> -		     (char *)req->rq_buffer + req->rq_callsize,
>>>> +		     req->rq_rbuffer,
>>>>  		     req->rq_rcvsize);
>>>>
>>>>
>>>> So I'm guessing this is breaking the callback channel somehow?
>>>
>>> Could be the TCP backchannel code is using rq_buffer in a different
>>> way than RDMA backchannel or the forward channel code.
>>>
>>
>> Well, it basically allocates a page per rpc_rqst and then maps that.
>>
>> One thing I notice is that this patch ensures that rq_rbuffer gets set
>> up in rpc_malloc and xprt_rdma_allocate, but it looks like
>> xprt_alloc_bc_req didn't get the same treatment.
>>
>> I suspect that that may be the problem...
>>
> In fact, maybe we just need this here? (untested and probably
> whitespace damaged):
>
> diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
> index ac701c28f44f..c561aa8ce05b 100644
> --- a/net/sunrpc/backchannel_rqst.c
> +++ b/net/sunrpc/backchannel_rqst.c
> @@ -100,6 +100,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
>  		goto out_free;
>  	}
>  	req->rq_rcv_buf.len = PAGE_SIZE;
> +	req->rq_rbuffer = req->rq_rcv_buf.head[0].iov_base;

That looks plausible!

Basically that is needed after xdr_buf_init() is done for a
backchannel rpc_rqst's receive buffer.

net/sunrpc/xprtrdma/backchannel.c might need a similar change.

I saw crashes with generic/013 at bake-a-thon last week, but
as the iommu was involved with those, I've been looking in
a different place.

Will give this a try.

>  	/* Preallocate one XDR send buffer */
>  	if (xprt_alloc_xdr_buf(&req->rq_snd_buf, gfp_flags) < 0) {

--
Chuck Lever

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Oct 24, 2016 at 11:24:40AM -0400, Jeff Layton wrote:
> On Mon, 2016-10-24 at 11:19 -0400, Jeff Layton wrote:
> > On Mon, 2016-10-24 at 09:51 -0400, Chuck Lever wrote:
> > >
> > > >
> > > >
> > > > On Oct 24, 2016, at 9:31 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > > >
> > > > On Mon, 2016-10-24 at 11:15 +0800, Eryu Guan wrote:
> > > > >
> > > > >
> > > > > On Sun, Oct 23, 2016 at 02:21:15PM -0400, J. Bruce Fields wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > I'm getting an intermittent crash in the nfs server as of
> > > > > > 68778945e46f143ed7974b427a8065f69a4ce944 "SUNRPC: Separate buffer
> > > > > > pointers for RPC Call and Reply messages".
> > > > > >
> > > > > > I haven't tried to understand that commit or why it would be a problem yet, I
> > > > > > don't see an obvious connection--I can take a closer look Monday.
> > > > > >
> > > > > > Could even be that I just landed on this commit by chance, the problem is a
> > > > > > little hard to reproduce so I don't completely trust my testing.
> > > > >
> > > > > I've hit the same crash on 4.9-rc1 kernel, and it's reproduced for me
> > > > > reliably by running xfstests generic/013 case, on a loopback mounted
> > > > > NFSv4.1 (or NFSv4.2), XFS is the underlying exported fs. More details
> > > > > please see
> > > > >
> > > > > http://marc.info/?l=linux-nfs&m=147714320129362&w=2
> > > > >
> > > >
> > > > Looks like you landed at the same commit as Bruce, so that's probably
> > > > legit. That commit is very small though. The only real change that
> > > > doesn't affect the new field is this:
> > > >
> > > >
> > > > @@ -1766,7 +1766,7 @@ rpc_xdr_encode(struct rpc_task *task)
> > > >  		     req->rq_buffer,
> > > >  		     req->rq_callsize);
> > > >  	xdr_buf_init(&req->rq_rcv_buf,
> > > > -		     (char *)req->rq_buffer + req->rq_callsize,
> > > > +		     req->rq_rbuffer,
> > > >  		     req->rq_rcvsize);
> > > >
> > > >
> > > > So I'm guessing this is breaking the callback channel somehow?
> > >
> > > Could be the TCP backchannel code is using rq_buffer in a different
> > > way than RDMA backchannel or the forward channel code.
> > >
> >
> > Well, it basically allocates a page per rpc_rqst and then maps that.
> >
> > One thing I notice is that this patch ensures that rq_rbuffer gets set
> > up in rpc_malloc and xprt_rdma_allocate, but it looks like
> > xprt_alloc_bc_req didn't get the same treatment.
> >
> > I suspect that that may be the problem...
> >
> In fact, maybe we just need this here? (untested and probably
> whitespace damaged):

No change in results for me.

--b.

>
> diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
> index ac701c28f44f..c561aa8ce05b 100644
> --- a/net/sunrpc/backchannel_rqst.c
> +++ b/net/sunrpc/backchannel_rqst.c
> @@ -100,6 +100,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
>  		goto out_free;
>  	}
>  	req->rq_rcv_buf.len = PAGE_SIZE;
> +	req->rq_rbuffer = req->rq_rcv_buf.head[0].iov_base;
>
>  	/* Preallocate one XDR send buffer */
>  	if (xprt_alloc_xdr_buf(&req->rq_snd_buf, gfp_flags) < 0) {
diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
index ac701c28f44f..c561aa8ce05b 100644
--- a/net/sunrpc/backchannel_rqst.c
+++ b/net/sunrpc/backchannel_rqst.c
@@ -100,6 +100,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
 		goto out_free;
 	}
 	req->rq_rcv_buf.len = PAGE_SIZE;
+	req->rq_rbuffer = req->rq_rcv_buf.head[0].iov_base;
 
 	/* Preallocate one XDR send buffer */
 	if (xprt_alloc_xdr_buf(&req->rq_snd_buf, gfp_flags) < 0) {
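
The hypothesis in this thread can be illustrated outside the kernel. What follows is a minimal userspace sketch, not kernel code: every `model_*` name is hypothetical, the structures are heavily simplified stand-ins for `struct rpc_rqst` and `struct xdr_buf`, and it assumes the backchannel request's receive buffer is later re-initialized from `rq_rbuffer`, which is only the suspicion raised above. Note that the patch was reported as making no difference in one test, so this models the idea being discussed rather than a confirmed root cause.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_kvec {
	void *iov_base;
	size_t iov_len;
};

struct model_xdr_buf {
	struct model_kvec head[1];
	size_t buflen;
	size_t len;
};

struct model_rqst {
	void *rq_buffer;	/* Call (send) buffer */
	void *rq_rbuffer;	/* Reply (receive) buffer: the new field */
	size_t rq_callsize;
	size_t rq_rcvsize;
	struct model_xdr_buf rq_rcv_buf;
};

/* Point an xdr_buf at a flat region of memory. */
static void model_xdr_buf_init(struct model_xdr_buf *buf, void *start, size_t len)
{
	buf->head[0].iov_base = start;
	buf->head[0].iov_len = len;
	buf->buflen = len;
	buf->len = 0;
}

/* Forward channel model: one allocation, and both pointers are set. */
static int model_forward_alloc(struct model_rqst *req, size_t callsize, size_t rcvsize)
{
	char *buf = malloc(callsize + rcvsize);

	if (!buf)
		return -1;
	req->rq_callsize = callsize;
	req->rq_rcvsize = rcvsize;
	req->rq_buffer = buf;
	req->rq_rbuffer = buf + callsize;
	return 0;
}

/*
 * Backchannel model: the receive page goes straight into rq_rcv_buf.
 * set_rbuffer selects whether the one-line change from the patch is applied.
 */
static int model_bc_alloc(struct model_rqst *req, size_t page_size, int set_rbuffer)
{
	void *page = malloc(page_size);

	if (!page)
		return -1;
	model_xdr_buf_init(&req->rq_rcv_buf, page, page_size);
	req->rq_rcv_buf.len = page_size;
	req->rq_rcvsize = page_size;
	if (set_rbuffer)
		req->rq_rbuffer = req->rq_rcv_buf.head[0].iov_base;
	return 0;
}

/* Encode step as changed by the commit: the receive buffer comes from rq_rbuffer. */
static void model_encode(struct model_rqst *req)
{
	model_xdr_buf_init(&req->rq_rcv_buf, req->rq_rbuffer, req->rq_rcvsize);
}
```

In this model, calling `model_encode()` on a zero-initialized request set up with `model_bc_alloc(&req, 4096, 0)` leaves `rq_rcv_buf.head[0].iov_base` as NULL, while passing `1` for `set_rbuffer` keeps it pointing at the preallocated page. A request set up with `model_forward_alloc()` is unaffected either way, since `rq_rbuffer` equals `(char *)rq_buffer + rq_callsize` there.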