From patchwork Mon Oct 24 19:17:34 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id: 9393115
Message-ID: <1477336654.21854.9.camel@redhat.com>
Subject: Re: upstream server crash
From: Jeff Layton
To: "J. Bruce Fields"
Cc: Chuck Lever, Eryu Guan, Linux NFS Mailing List
Date: Mon, 24 Oct 2016 15:17:34 -0400
In-Reply-To: <20161024180858.GA27359@fieldses.org>
References: <20161023182115.GA14481@fieldses.org>
 <20161024031519.GN2462@eguan.usersys.redhat.com>
 <1477315868.2625.37.camel@redhat.com>
 <7B3F94BF-CAA1-4001-BEBC-C93965A81DE4@oracle.com>
 <1477322377.14828.4.camel@redhat.com>
 <1477322680.14828.6.camel@redhat.com>
 <20161024180858.GA27359@fieldses.org>
X-Mailer: Evolution 3.20.5 (3.20.5-1.fc24)
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-nfs@vger.kernel.org

On Mon, 2016-10-24 at 14:08 -0400, J. Bruce Fields wrote:
> On Mon, Oct 24, 2016 at 11:24:40AM -0400, Jeff Layton wrote:
> >
> > On Mon, 2016-10-24 at 11:19 -0400, Jeff Layton wrote:
> > >
> > > On Mon, 2016-10-24 at 09:51 -0400, Chuck Lever wrote:
> > > >
> > > > >
> > > > > On Oct 24, 2016, at 9:31 AM, Jeff Layton wrote:
> > > > >
> > > > > On Mon, 2016-10-24 at 11:15 +0800, Eryu Guan wrote:
> > > > > >
> > > > > > On Sun, Oct 23, 2016 at 02:21:15PM -0400, J. Bruce Fields wrote:
> > > > > > >
> > > > > > > I'm getting an intermittent crash in the nfs server as of
> > > > > > > 68778945e46f143ed7974b427a8065f69a4ce944 "SUNRPC: Separate buffer
> > > > > > > pointers for RPC Call and Reply messages".
> > > > > > >
> > > > > > > I haven't tried to understand that commit or why it would be a problem yet, I
> > > > > > > don't see an obvious connection--I can take a closer look Monday.
> > > > > > >
> > > > > > > Could even be that I just landed on this commit by chance, the problem is a
> > > > > > > little hard to reproduce so I don't completely trust my testing.
> > > > > >
> > > > > > I've hit the same crash on 4.9-rc1 kernel, and it's reproduced for me
> > > > > > reliably by running xfstests generic/013 case, on a loopback mounted
> > > > > > NFSv4.1 (or NFSv4.2), XFS is the underlying exported fs. More details
> > > > > > please see
> > > > > >
> > > > > > http://marc.info/?l=linux-nfs&m=147714320129362&w=2
> > > > > >
> > > > >
> > > > > Looks like you landed at the same commit as Bruce, so that's probably
> > > > > legit. That commit is very small though.
> > > > > The only real change that doesn't affect the new field is this:
> > > > >
> > > > > @@ -1766,7 +1766,7 @@ rpc_xdr_encode(struct rpc_task *task)
> > > > >                      req->rq_buffer,
> > > > >                      req->rq_callsize);
> > > > >         xdr_buf_init(&req->rq_rcv_buf,
> > > > > -                    (char *)req->rq_buffer + req->rq_callsize,
> > > > > +                    req->rq_rbuffer,
> > > > >                      req->rq_rcvsize);
> > > > >
> > > > > So I'm guessing this is breaking the callback channel somehow?
> > > >
> > > > Could be the TCP backchannel code is using rq_buffer in a different
> > > > way than RDMA backchannel or the forward channel code.
> > > >
> > >
> > > Well, it basically allocates a page per rpc_rqst and then maps that.
> > >
> > > One thing I notice is that this patch ensures that rq_rbuffer gets set
> > > up in rpc_malloc and xprt_rdma_allocate, but it looks like
> > > xprt_alloc_bc_req didn't get the same treatment.
> > >
> > > I suspect that that may be the problem...
> > >
> > In fact, maybe we just need this here? (untested and probably
> > whitespace damaged):
>
> No change in results for me.
>
> --b.
>
> >
> > diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
> > index ac701c28f44f..c561aa8ce05b 100644
> > --- a/net/sunrpc/backchannel_rqst.c
> > +++ b/net/sunrpc/backchannel_rqst.c
> > @@ -100,6 +100,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
> >                 goto out_free;
> >         }
> >         req->rq_rcv_buf.len = PAGE_SIZE;
> > +       req->rq_rbuffer = req->rq_rcv_buf.head[0].iov_base;
> >
> >         /* Preallocate one XDR send buffer */
> >         if (xprt_alloc_xdr_buf(&req->rq_snd_buf, gfp_flags) < 0) {

Ahh ok, I think I see. We probably also need to set rq_rbuffer in
bc_malloc and xprt_rdma_bc_allocate. My guess is that we're ending up in
rpc_xdr_encode with a NULL rq_rbuffer pointer, so the right fix would
seem to be to ensure that it is properly set whenever rq_buffer is set.

So I think this may be what we want, actually.
I'll plan to test it out but may not get to it before tomorrow.

From ef2a391bc4d8f6b729aacee7cde8d9baf86767c3 Mon Sep 17 00:00:00 2001
From: Jeff Layton
Date: Mon, 24 Oct 2016 15:13:40 -0400
Subject: [PATCH] sunrpc: fix some missing rq_rbuffer assignments

I think we basically need to set rq_rbuffer whenever rq_buffer is set.

Signed-off-by: Jeff Layton
---
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 1 +
 net/sunrpc/xprtsock.c                      | 1 +
 2 files changed, 2 insertions(+)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 2d8545c34095..fc4535ead7c2 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -182,6 +182,7 @@ xprt_rdma_bc_allocate(struct rpc_task *task)
 		return -ENOMEM;
 
 	rqst->rq_buffer = page_address(page);
+	rqst->rq_rbuffer = (char *)rqst->rq_buffer + rqst->rq_callsize;
 	return 0;
 }
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 0137af1c0916..e01c825bc683 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2563,6 +2563,7 @@ static int bc_malloc(struct rpc_task *task)
 	buf->len = PAGE_SIZE;
 
 	rqst->rq_buffer = buf->data;
+	rqst->rq_rbuffer = (char *)rqst->rq_buffer + rqst->rq_callsize;
 	return 0;
 }
 
-- 
2.7.4
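
For illustration only, here is a minimal userspace sketch in C of the invariant the patch above is after: set rq_rbuffer whenever rq_buffer is set. This is not kernel code. struct model_rqst and the model_* helpers are hypothetical stand-ins (assumed here, not taken from the tree) for struct rpc_rqst and the buffer allocators. One allocator sets both pointers, the other forgets rq_rbuffer, which is the suspected failure mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the few struct rpc_rqst fields discussed here. */
struct model_rqst {
	void   *rq_buffer;   /* Call (send) buffer */
	void   *rq_rbuffer;  /* Reply (receive) buffer */
	size_t  rq_callsize; /* bytes reserved for the Call */
	size_t  rq_rcvsize;  /* bytes reserved for the Reply */
};

/*
 * Hypothetical allocator that follows the rule: the Reply area starts
 * right after the Call area, and both pointers are set together.
 * Returns 0 on success, -1 if the allocation fails.
 */
int model_alloc(struct model_rqst *req)
{
	char *buf;

	assert(req != NULL);
	buf = malloc(req->rq_callsize + req->rq_rcvsize);
	if (!buf)
		return -1;

	req->rq_buffer = buf;
	req->rq_rbuffer = buf + req->rq_callsize;
	return 0;
}

/*
 * Hypothetical allocator modelling the suspected bug: rq_buffer is set
 * but rq_rbuffer is left untouched (NULL for a zeroed request).
 */
int model_alloc_buggy(struct model_rqst *req)
{
	char *buf;

	assert(req != NULL);
	buf = malloc(req->rq_callsize + req->rq_rcvsize);
	if (!buf)
		return -1;

	req->rq_buffer = buf;
	return 0;
}

/* Release the single allocation and clear both pointers. */
void model_free(struct model_rqst *req)
{
	free(req->rq_buffer);
	req->rq_buffer = NULL;
	req->rq_rbuffer = NULL;
}

/*
 * Returns 1 if an encoder that reads rq_rbuffer directly would find a
 * usable Reply buffer at the expected offset, 0 otherwise.
 */
int model_reply_buf_ok(const struct model_rqst *req)
{
	return req->rq_rbuffer != NULL &&
	       (char *)req->rq_rbuffer ==
	       (char *)req->rq_buffer + req->rq_callsize;
}
```

With a zero-initialized request, model_reply_buf_ok() returns 1 after model_alloc() and 0 after model_alloc_buggy(). The second case corresponds to the guess above that rpc_xdr_encode ends up with a NULL rq_rbuffer.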