From patchwork Mon Feb 26 17:22:06 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuck Lever
X-Patchwork-Id: 10242875
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
 by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id B564C602A0
 for ; Mon, 26 Feb 2018 17:22:14 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A90932A1FD
 for ; Mon, 26 Feb 2018 17:22:14 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
 id 9D6652A194; Mon, 26 Feb 2018 17:22:14 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED,
 RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=unavailable version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A5CC22A194
 for ; Mon, 26 Feb 2018 17:22:12 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1751485AbeBZRWL (ORCPT ); Mon, 26 Feb 2018 12:22:11 -0500
Received: from mail-it0-f67.google.com ([209.85.214.67]:37508 "EHLO
 mail-it0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751061AbeBZRWJ (ORCPT );
 Mon, 26 Feb 2018 12:22:09 -0500
Received: by mail-it0-f67.google.com with SMTP id k79so8958694ita.2;
 Mon, 26 Feb 2018 09:22:09 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20161025;
 h=sender:subject:from:to:cc:date:message-id:user-agent:mime-version
 :content-transfer-encoding;
 bh=wfkQ0s1nCUD4LsIv+dRzkG0vgP5ttC6uT0wjTv575hk=;
 b=mop2N+PpT7e5KgIu5ZJGl3/Ea0go5mmqC2Y1kNmte5bOyJdEj1qK2X8qNDoMEuL15e
 mv5b7vKWoOum/ir3uXvejI8O0cyP1gNavGr7u/YliX+LvIMDsF5L/ta5LvSsjDuZIxA/
 mgcysjorrQ4viafqZt0Oli/a7p5SLCPqMVfgQ085M13Rma11Wv5xeUYK98R352I8mG5H
 yqBhsmpNM2oVcZQuEdmU/h2rFS4bLsINDyIQm3sn6woEVrH8DunX8y+JazRh9dV+G+gQ
 eWSKEPQ4wsHWFhJfEZ01wY4YbhpGvFiDjnVk20FEnU+1+6RovEjUY8vtdRCEucKEDwBT
 j3Ew==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:sender:subject:from:to:cc:date:message-id
 :user-agent:mime-version:content-transfer-encoding;
 bh=wfkQ0s1nCUD4LsIv+dRzkG0vgP5ttC6uT0wjTv575hk=;
 b=BmLOrG2+Yr2RRRNAZ/T+A30tEnnDkVYhB0UF+CgSMmya/TUEQDKRrPbC+VTQp5tQMx
 SHsl9GgSNfiZ0Bl96bSIkeA5n2yiHiUDBLBgFaWjHakrjT6RiH9kfjnT30pv5ivmwztZ
 5BmtjbxxAby0Budumeh3FI2BsRBbZG/VXbvOXZwLSulnk2Q+LQL38q3BGVsPzoYblXtK
 W1rzgKhKs7KFswQPzkzJ3ySwz3m12l4PWrNQQdfG6vX2Cojwg6hrlYXipITSQjLRXIDy
 qTDCU8+9hbwc8q5diWOoIHef2Eore0+6LCKSPm24ehqbO7r/U2Q565bZOV6IsAd5Q4R5
 c64Q==
X-Gm-Message-State: APf1xPC1kKtfhIwv1NKyZt7lM1U5UjeZ7xpX+vrn92UvIjb5afh8PIps
 Qpoa4AuttvqOEEJVd13dHj4Hfw==
X-Google-Smtp-Source: AG47ELtjjgrAUz6igE+GqIMDzrWbHLsAdHhmekOD773yw6reXOVTpDSOa/lyHi07go+kRkbumhU/ag==
X-Received: by 10.36.3.197 with SMTP id e188mr7725255ite.74.1519665728981;
 Mon, 26 Feb 2018 09:22:08 -0800 (PST)
Received: from gateway.1015granger.net (c-68-61-232-219.hsd1.mi.comcast.net.
 [68.61.232.219]) by smtp.gmail.com with ESMTPSA
 id u127sm5622026iod.47.2018.02.26.09.22.07
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Mon, 26 Feb 2018 09:22:08 -0800 (PST)
Received: from manet.1015granger.net (manet.1015granger.net [192.168.1.51])
 by gateway.1015granger.net (8.14.7/8.14.7) with ESMTP id w1QHM6CQ018492;
 Mon, 26 Feb 2018 17:22:06 GMT
Subject: [PATCH] xprtrdma: Fix latency regression on NUMA NFS/RDMA clients
From: Chuck Lever
To: tj@kernel.org
Cc: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org
Date: Mon, 26 Feb 2018 12:22:06 -0500
Message-ID: <20180226171008.13286.35869.stgit@manet.1015granger.net>
User-Agent: StGit/0.17.1-dirty
MIME-Version: 1.0
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-nfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

With v4.15, on one of my NFS/RDMA clients I measured a near doubling
of the latency of small read and write system calls. There was no
change in server round trip time. The extra latency appears in the
whole RPC execution path.

"git bisect" settled on commit ccede7598588 ("xprtrdma: Spread reply
processing over more CPUs").

After some experimentation, I found that leaving the WQ bound and
allowing the scheduler to pick the dispatch CPU seems to eliminate
the long latencies, and it does not introduce any new regressions.

The fix is implemented by reverting only the part of commit
ccede7598588 ("xprtrdma: Spread reply processing over more CPUs")
that dispatches RPC replies specifically on the CPU where the
matching RPC call was made.

Interestingly, saving the CPU number and later queuing reply
processing there was effective _only_ for NFS READ and WRITE
requests. On my NUMA client, in-kernel RPC reply processing for
asynchronous RPCs was dispatched on the same CPU where the RPC call
was made, as expected. However, synchronous RPCs seem to get their
reply dispatched on some CPU other than the one where the call was
placed, every time.

Fixes: ccede7598588 ("xprtrdma: Spread reply processing over ... ")
Signed-off-by: Chuck Lever
---

Hi Tejun-

I'm interested in your comments about how rpcrdma_reply_handler
uses queue_work to spread workload away from the CPU core that is
assigned to handle Receive completions, in particular how it might
work on NUMA systems.

The rpcrdma_receive_wq workqueue is no longer WQ_UNBOUND. In this
patch I have changed the mechanism from queuing work on a particular
core to queuing work using WORK_CPU_UNBOUND and letting the scheduler
choose where the work item is dispatched. The purpose is to move work
away from the Receiving CPU when it is busy, to help the workload
scale well on systems with multiple CPUs.

 net/sunrpc/xprtrdma/rpc_rdma.c  |    2 +-
 net/sunrpc/xprtrdma/transport.c |    2 --
 net/sunrpc/xprtrdma/xprt_rdma.h |    1 -
 3 files changed, 1 insertion(+), 4 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index f0855a9..4bc0f4d 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -1366,7 +1366,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
 
 	trace_xprtrdma_reply(rqst->rq_task, rep, req, credits);
 
-	queue_work_on(req->rl_cpu, rpcrdma_receive_wq, &rep->rr_work);
+	queue_work(rpcrdma_receive_wq, &rep->rr_work);
 	return;
 
 out_badstatus:
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 4b1ecfe..f86021e 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -52,7 +52,6 @@
 #include
 #include
 #include
-#include
 
 #include "xprt_rdma.h"
 
@@ -651,7 +650,6 @@
 	if (!rpcrdma_get_recvbuf(r_xprt, req, rqst->rq_rcvsize, flags))
 		goto out_fail;
 
-	req->rl_cpu = smp_processor_id();
 	req->rl_connect_cookie = 0;	/* our reserved value */
 	rpcrdma_set_xprtdata(rqst, req);
 	rqst->rq_buffer = req->rl_sendbuf->rg_base;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 69883a9..430a6de 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -334,7 +334,6 @@ enum {
 struct rpcrdma_buffer;
 struct rpcrdma_req {
 	struct list_head	rl_list;
-	int			rl_cpu;
 	unsigned int		rl_connect_cookie;
 	struct rpcrdma_buffer	*rl_buffer;
 	struct rpcrdma_rep	*rl_reply;