From patchwork Wed Nov 9 19:05:29 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuck Lever X-Patchwork-Id: 9420055 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id D540C60585 for ; Wed, 9 Nov 2016 19:05:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C5CAB27D9B for ; Wed, 9 Nov 2016 19:05:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BAC9D293C4; Wed, 9 Nov 2016 19:05:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.3 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI, RCVD_IN_SORBS_SPAM, T_DKIM_INVALID autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D483B293C2 for ; Wed, 9 Nov 2016 19:05:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752956AbcKITFd (ORCPT ); Wed, 9 Nov 2016 14:05:33 -0500 Received: from mail-it0-f65.google.com ([209.85.214.65]:32863 "EHLO mail-it0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751594AbcKITFc (ORCPT ); Wed, 9 Nov 2016 14:05:32 -0500 Received: by mail-it0-f65.google.com with SMTP id e187so23703022itc.0; Wed, 09 Nov 2016 11:05:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:subject:from:to:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=MXmuBrRpeM+/2+G3qg895vHs5sbFU4l/HHtNT7/mnkg=; b=gU/d/yiHmGMZIUAyQkzrKr4gFJKBIAwzlQA5hdGcGEm5cvut2MqsY7QPqTS14vM6q+ JOzh+nsexfEHsFXMVadfzzFA+BPnIbkdbO+s25FBHycyyBwJduuDUCc+7Qz4szP9ePvK sTXGGgtKVbvlZ7e7jy06rt3fswNIJLMLAIkLbI4UdnGq6xBFiEWKGY8/GbKqYZxxqbXr lZ9JETq4jlIy7S8ZXy2hKLtkPfOS879CwETJfNIMJrauRqFoNkZ54EawAteK/qylNq0P EIKKM+hHEbhvcdxnjPFnzBULVsMxtzPY0jMiHYNodGlBqwwDrN3aDpVCYNn7a6vD2d33 9Lww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:subject:from:to:date:message-id :in-reply-to:references:user-agent:mime-version :content-transfer-encoding; bh=MXmuBrRpeM+/2+G3qg895vHs5sbFU4l/HHtNT7/mnkg=; b=HzhrOueIZuiPBHolD98+LPT6TZXs6A1uTS2v9kH9DITAV588ANYB0uzjsRs81JyT5x KqJd/YoZBG8pj6ESszxwhUvF79ab+2//Uxkjpb2thpv4xpfi4ZnyvAvjlSca+froBiYG F69XqHXiQIfAVTHBe2azYWfbbKr2Zmc46ylNeotlH5oqck8nQIM918AWLxBhZMT24LNc awgKaPCH39L+7VpbMVKVA4GTZGQ0zPjvWCWyu9s6+GyvEScjXsL1e1BhmYAvnQAruLeU j+gw5bRQumWDGXPDh9Sv3BGa0Zv15LFJic9BW4zniBVJ/C+Nqft2yuUqwn4eAGeWLTCT MOfA== X-Gm-Message-State: ABUngveIAQh89JC+momeWYeI8I66dHgEHEp13NkGjI9Gh4+EaY4AugvE5lv/jhJPC92MMA== X-Received: by 10.36.91.85 with SMTP id g82mr8336094itb.23.1478718331738; Wed, 09 Nov 2016 11:05:31 -0800 (PST) Received: from manet.1015granger.net ([2604:8800:100:81fc:ec4:7aff:fe6c:1dce]) by smtp.gmail.com with ESMTPSA id l14sm564412ioi.18.2016.11.09.11.05.29 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 09 Nov 2016 11:05:29 -0800 (PST) Subject: [PATCH v1 05/14] SUNRPC: Proper metric accounting when RPC is not transmitted From: Chuck Lever To: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org Date: Wed, 09 Nov 2016 14:05:29 -0500 Message-ID: <20161109190529.15007.84163.stgit@manet.1015granger.net> In-Reply-To: <20161109184735.15007.96507.stgit@manet.1015granger.net> References: <20161109184735.15007.96507.stgit@manet.1015granger.net> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP I noticed recently that during an xfstests on a krb5i mount, the retransmit count for certain operations had gone negative, and the backlog value became unreasonably large. I recall that Andy has pointed this out to me in the past. When call_refresh fails to find a valid credential for an RPC, the RPC exits immediately without sending anything on the wire. This leaves rq_ntrans, rq_xtime, and rq_rtt set to zero. The solution for om_queue is to not add the to RPC's running backlog queue total whenever rq_xtime is zero. For om_ntrans, it's a bit more difficult. A zero rq_ntrans causes om_ops to become larger than om_ntrans. The design of the RPC metrics API assumes that ntrans will always be equal to or larger than the ops count. The result is that when an RPC fails to find credentials, the RPC operation's reported retransmit count, which is computed in user space as the difference between ops and ntrans, goes negative. Ideally the kernel API should report a separate retransmit and "exited before initial transmission" metric, so that user space can sort out the difference properly. To avoid kernel API changes and changes to the way rq_ntrans is used when performing transport locking, account for untransmitted RPCs so that om_ntrans keeps up with om_ops: always add one or more to om_ntrans. Signed-off-by: Chuck Lever --- net/sunrpc/stats.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c index 2ecb994..caeb01a 100644 --- a/net/sunrpc/stats.c +++ b/net/sunrpc/stats.c @@ -157,15 +157,17 @@ void rpc_count_iostats_metrics(const struct rpc_task *task, spin_lock(&op_metrics->om_lock); op_metrics->om_ops++; - op_metrics->om_ntrans += req->rq_ntrans; + /* kernel API: om_ops must never become larger than om_ntrans */ + op_metrics->om_ntrans += max(req->rq_ntrans, 1); op_metrics->om_timeouts += task->tk_timeouts; op_metrics->om_bytes_sent += req->rq_xmit_bytes_sent; op_metrics->om_bytes_recv += req->rq_reply_bytes_recvd; - delta = ktime_sub(req->rq_xtime, task->tk_start); - op_metrics->om_queue = ktime_add(op_metrics->om_queue, delta); - + if (ktime_to_ns(req->rq_xtime)) { + delta = ktime_sub(req->rq_xtime, task->tk_start); + op_metrics->om_queue = ktime_add(op_metrics->om_queue, delta); + } op_metrics->om_rtt = ktime_add(op_metrics->om_rtt, req->rq_rtt); delta = ktime_sub(now, task->tk_start);