From patchwork Sun Nov 9 01:14:20 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 5259381 Return-Path: X-Original-To: patchwork-linux-nfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 6D3B0C11AC for ; Sun, 9 Nov 2014 01:14:28 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 88C9520154 for ; Sun, 9 Nov 2014 01:14:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0B88020148 for ; Sun, 9 Nov 2014 01:14:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751320AbaKIBOX (ORCPT ); Sat, 8 Nov 2014 20:14:23 -0500 Received: from mail-ie0-f180.google.com ([209.85.223.180]:39850 "EHLO mail-ie0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751088AbaKIBOX (ORCPT ); Sat, 8 Nov 2014 20:14:23 -0500 Received: by mail-ie0-f180.google.com with SMTP id y20so7445452ier.11 for ; Sat, 08 Nov 2014 17:14:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:subject:from:to:date:message-id:in-reply-to:references :user-agent:mime-version:content-type:content-transfer-encoding; bh=jItTgU2qvs7YhvjJZhuAPM2iJTy9xDul5lQ/Rvf0LfI=; b=dBwJk0mp2jsLrZSJL/9V+Bk2boXJa35vwt7gU8k/dnVM4KXFCyAymp8gnwbFZ3JtZj J5V3vFz6d+0M4pxhLNzlTznqVjCDAJkm2yVnSoBPhukqKyiYQmB4dhFOL8zraIhUMH2s M+KaahPXfO3pg/NdNq3Y5NAY1eNzVZFKXvGnFxXia+VzRf7lCs5UGF+KqFf5pGJrgbQh Wo0pPzz4l/f+BjVJOlmsyozYh1KkaBN5C3M+ssHNeJ3unKgz8aXVNgG0R6NobtNYlB2P qT7AqAz7HrhNy4o6y1edqszyL39YQ6axypSC44+fAKTnfJsT2qerq0fz+0Hbt53+uERS QarQ== X-Received: by 10.50.112.165 with SMTP id ir5mr14963978igb.44.1415495662037; Sat, 08 Nov 2014 17:14:22 -0800 (PST) Received: from manet.1015granger.net ([2604:8800:100:81fc:82ee:73ff:fe43:d64f]) by mx.google.com with ESMTPSA id uu4sm732791igb.19.2014.11.08.17.14.21 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 08 Nov 2014 17:14:21 -0800 (PST) Subject: [PATCH v2 02/10] xprtrdma: Cap req_cqinit From: Chuck Lever To: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org Date: Sat, 08 Nov 2014 20:14:20 -0500 Message-ID: <20141109011420.8806.1849.stgit@manet.1015granger.net> In-Reply-To: <20141109010328.8806.5861.stgit@manet.1015granger.net> References: <20141109010328.8806.5861.stgit@manet.1015granger.net> User-Agent: StGit/0.17.1-3-g7d0f MIME-Version: 1.0 Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org X-Spam-Status: No, score=-5.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY, URIBL_RHS_DOB autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Recent work made FRMR registration and invalidation completions unsignaled. This greatly reduces the adapter interrupt rate. Every so often, however, a posted send Work Request is allowed to signal. Otherwise, the provider's Work Queue will wrap and the workload will hang. The number of Work Requests that are allowed to remain unsignaled is determined by the value of req_cqinit. Currently, this is set to the size of the send Work Queue divided by two, minus 1. For FRMR, the send Work Queue is the maximum number of concurrent RPCs (currently 32) times the maximum number of Work Requests an RPC might use (currently 7, though some adapters may need more). For mlx4, this is 224 entries. This leaves completion signaling disabled for 111 send Work Requests. Some providers hold back dispatching Work Requests until a CQE is generated. If completions are disabled, then no CQEs are generated for quite some time, and that can stall the Work Queue. I've seen this occur running xfstests generic/113 over NFSv4, where eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM because the Work Queue has overflowed. The connection is dropped and re-established. Cap the rep_cqinit setting so completions are not left turned off for too long. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269 Signed-off-by: Chuck Lever --- net/sunrpc/xprtrdma/verbs.c | 4 +++- net/sunrpc/xprtrdma/xprt_rdma.h | 6 ++++++ 2 files changed, 9 insertions(+), 1 deletion(-) -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 6ea2942..af45cf3 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -733,7 +733,9 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia, /* set trigger for requesting send completion */ ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 - 1; - if (ep->rep_cqinit <= 2) + if (ep->rep_cqinit > RPCRDMA_MAX_UNSIGNALED_SENDS) + ep->rep_cqinit = RPCRDMA_MAX_UNSIGNALED_SENDS; + else if (ep->rep_cqinit <= 2) ep->rep_cqinit = 0; INIT_CQCOUNT(ep); ep->rep_ia = ia; diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h index ac7fc9a..b799041 100644 --- a/net/sunrpc/xprtrdma/xprt_rdma.h +++ b/net/sunrpc/xprtrdma/xprt_rdma.h @@ -97,6 +97,12 @@ struct rpcrdma_ep { struct ib_wc rep_recv_wcs[RPCRDMA_POLLSIZE]; }; +/* + * Force a signaled SEND Work Request every so often, + * in case the provider needs to do some housekeeping. + */ +#define RPCRDMA_MAX_UNSIGNALED_SENDS (32) + #define INIT_CQCOUNT(ep) atomic_set(&(ep)->rep_cqcount, (ep)->rep_cqinit) #define DECR_CQCOUNT(ep) atomic_sub_return(1, &(ep)->rep_cqcount)