From patchwork Mon Oct 24 15:24:40 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id: 9392329
Message-ID: <1477322680.14828.6.camel@redhat.com>
Subject: Re: upstream server crash
From: Jeff Layton
To: Chuck Lever
Cc: Eryu Guan, "J. Bruce Fields", Linux NFS Mailing List
Date: Mon, 24 Oct 2016 11:24:40 -0400
In-Reply-To: <1477322377.14828.4.camel@redhat.com>
References: <20161023182115.GA14481@fieldses.org>
 <20161024031519.GN2462@eguan.usersys.redhat.com>
 <1477315868.2625.37.camel@redhat.com>
 <7B3F94BF-CAA1-4001-BEBC-C93965A81DE4@oracle.com>
 <1477322377.14828.4.camel@redhat.com>
X-Mailer: Evolution 3.20.5 (3.20.5-1.fc24)
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-nfs@vger.kernel.org

On Mon, 2016-10-24 at 11:19 -0400, Jeff Layton wrote:
> On Mon, 2016-10-24 at 09:51 -0400, Chuck Lever wrote:
> > On Oct 24, 2016, at 9:31 AM, Jeff Layton wrote:
> > > On Mon, 2016-10-24 at 11:15 +0800, Eryu Guan wrote:
> > > > On Sun, Oct 23, 2016 at 02:21:15PM -0400, J. Bruce Fields wrote:
> > > > > I'm getting an intermittent crash in the nfs server as of
> > > > > 68778945e46f143ed7974b427a8065f69a4ce944 "SUNRPC: Separate buffer
> > > > > pointers for RPC Call and Reply messages".
> > > > >
> > > > > I haven't tried to understand that commit or why it would be a problem yet, I
> > > > > don't see an obvious connection--I can take a closer look Monday.
> > > > >
> > > > > Could even be that I just landed on this commit by chance, the problem is a
> > > > > little hard to reproduce so I don't completely trust my testing.
> > > >
> > > > I've hit the same crash on 4.9-rc1 kernel, and it's reproduced for me
> > > > reliably by running xfstests generic/013 case, on a loopback mounted
> > > > NFSv4.1 (or NFSv4.2), XFS is the underlying exported fs. More details
> > > > please see
> > > >
> > > > http://marc.info/?l=linux-nfs&m=147714320129362&w=2
> > > >
> > >
> > > Looks like you landed at the same commit as Bruce, so that's probably
> > > legit. That commit is very small though. The only real change that
> > > doesn't affect the new field is this:
> > >
> > >
> > > @@ -1766,7 +1766,7 @@ rpc_xdr_encode(struct rpc_task *task)
> > >  		     req->rq_buffer,
> > >  		     req->rq_callsize);
> > >  	xdr_buf_init(&req->rq_rcv_buf,
> > > -		     (char *)req->rq_buffer + req->rq_callsize,
> > > +		     req->rq_rbuffer,
> > >  		     req->rq_rcvsize);
> > >
> > >
> > > So I'm guessing this is breaking the callback channel somehow?
> >
> > Could be the TCP backchannel code is using rq_buffer in a different
> > way than RDMA backchannel or the forward channel code.
> >
>
> Well, it basically allocates a page per rpc_rqst and then maps that.
>
> One thing I notice is that this patch ensures that rq_rbuffer gets set
> up in rpc_malloc and xprt_rdma_allocate, but it looks like
> xprt_alloc_bc_req didn't get the same treatment.
>
> I suspect that that may be the problem...
>

In fact, maybe we just need this here? (untested and probably whitespace
damaged):

---
diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
index ac701c28f44f..c561aa8ce05b 100644
--- a/net/sunrpc/backchannel_rqst.c
+++ b/net/sunrpc/backchannel_rqst.c
@@ -100,6 +100,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
 		goto out_free;
 	}
 	req->rq_rcv_buf.len = PAGE_SIZE;
+	req->rq_rbuffer = req->rq_rcv_buf.head[0].iov_base;
 
 	/* Preallocate one XDR send buffer */
 	if (xprt_alloc_xdr_buf(&req->rq_snd_buf, gfp_flags) < 0) {
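
For anyone following along, here is a rough user-space sketch of the hazard suspected above. It is not kernel code. Every mock_* name, the mock_page field, and the apply_fix flag are made up for illustration, and the structs are heavily simplified stand-ins for struct kvec, struct xdr_buf and struct rpc_rqst. It also assumes that the request struct starts out zeroed and that an encode-style step later initializes rq_rcv_buf from rq_rbuffer; whether that is the path the crashing server actually takes is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MOCK_PAGE_SIZE 4096	/* stand-in for the kernel's PAGE_SIZE */

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct mock_kvec {
	void *iov_base;
	size_t iov_len;
};

struct mock_xdr_buf {
	struct mock_kvec head[1];
	size_t len;
	size_t buflen;
};

struct mock_rqst {
	void *rq_rbuffer;		/* reply buffer pointer added by 68778945e46f */
	struct mock_xdr_buf rq_rcv_buf;
	void *mock_page;		/* hypothetical: kept only so the mock can free it */
};

/* Modelled loosely on xdr_buf_init(): point head[0] at the given start. */
void mock_xdr_buf_init(struct mock_xdr_buf *buf, void *start, size_t len)
{
	assert(buf != NULL);
	buf->head[0].iov_base = start;
	buf->head[0].iov_len = len;
	buf->len = 0;
	buf->buflen = len;
}

/* Mock backchannel allocator: one page backs the receive buffer. */
int mock_alloc_bc_req(struct mock_rqst *req, int apply_fix)
{
	void *page = calloc(1, MOCK_PAGE_SIZE);

	if (!page)
		return -1;
	req->mock_page = page;
	req->rq_rbuffer = NULL;		/* assumption: struct was zero-allocated */
	mock_xdr_buf_init(&req->rq_rcv_buf, page, MOCK_PAGE_SIZE);
	req->rq_rcv_buf.len = MOCK_PAGE_SIZE;
	if (apply_fix)			/* the one-line change proposed in the patch */
		req->rq_rbuffer = req->rq_rcv_buf.head[0].iov_base;
	return 0;
}

/* Mock of the post-commit encode step: reply buffer comes from rq_rbuffer. */
void mock_xdr_encode(struct mock_rqst *req, size_t rcvsize)
{
	mock_xdr_buf_init(&req->rq_rcv_buf, req->rq_rbuffer, rcvsize);
}

void mock_free_bc_req(struct mock_rqst *req)
{
	free(req->mock_page);
	req->mock_page = NULL;
}
```

Calling mock_alloc_bc_req() with apply_fix set to 0 and then mock_xdr_encode() leaves rq_rcv_buf.head[0].iov_base NULL in this model, so anything that then copies a reply into the buffer would dereference a NULL pointer. With apply_fix set to 1 the pointer still refers to the preallocated page.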