From patchwork Mon Sep 17 13:02:51 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Trond Myklebust
X-Patchwork-Id: 10602651
From: Trond Myklebust
To: linux-nfs@vger.kernel.org
Subject: [PATCH v3 00/44] Convert RPC client transmission to a queued model
Date: Mon, 17 Sep 2018 09:02:51 -0400
Message-Id: <20180917130335.112832-1-trond.myklebust@hammerspace.com>
X-Mailer: git-send-email 2.17.1
X-Mailing-List: linux-nfs@vger.kernel.org

For historical reasons, the RPC client is heavily serialised by the
XPRT_LOCK while it is transmitting a request.
A request must take that lock before it can start XDR encoding, and
must hold it until it has finished transmitting.
In essence the lock protects the following functions:

- Stream based transport connect/reconnect
- RPCSEC_GSS encoding of the RPC message
- Transmission of a single RPC message

The following patch set assumes that we do not need to do much to
improve performance of the connect/reconnect case, as that is supposed
to be a rare occurrence.

The set deals with the RPCSEC_GSS issues by removing serialisation
while encoding. If, after grabbing the XPRT_LOCK, we detect that we are
about to transmit a message whose sequence number has fallen outside
the window allowed by RFC2203, we simply abort the transmission of that
message and schedule it for re-encoding. Since window sizes are
typically expected to lie above 100 messages or so, we expect the cases
where we miss the window to be rare in general.

We try to avoid the requirement that every request must be woken up to
grab the XPRT_LOCK in order to transmit itself. Instead, a request that
currently holds the XPRT_LOCK is allowed to grab other requests from an
ordered queue and to transmit them too. The bulk of the changes in this
patchset are dedicated to providing this functionality.

In addition, the XPRT_LOCK queue currently provides some extra
functionality:

- Throttling of the TCP slot allocation (as Chuck pointed out)
- Fair queuing, to ensure batch jobs don't crowd out interactive ones

The patchset does add functionality to ensure that the resulting
transmission queue is fair, and also fixes up the RPC wait queues to
ensure that they don't compromise fairness.
For now, this patchset discards the TCP slot throttling. We may still
want to throttle in the case where the connection is lost, but if we do
so, we should ensure we do not serialise all requests when in the
connected state.
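
To make the two ideas above concrete (the RFC2203 window check and a
lock holder draining an ordered queue), here is a minimal user-space
sketch. It is not the code from the patches: struct sketch_req,
outside_window() and drain_queue() are hypothetical names invented for
illustration, locking is left out entirely, and "transmitting" is
reduced to setting a flag. The window test assumes that a request is
acceptable only while its sequence number is newer than
(highest sent - window size).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified request; not the kernel's struct rpc_rqst. */
struct sketch_req {
	uint32_t seqno;          /* sequence number chosen at encode time */
	bool sent;               /* set once the request was "transmitted" */
	bool needs_reencode;     /* set if the request missed the window */
	struct sketch_req *next; /* next entry in the ordered transmit queue */
};

/* Wraparound-safe test: is sequence number a newer than b? */
static bool seq_is_newer(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* True if seqno is no longer newer than (highest_sent - window). */
static bool outside_window(uint32_t seqno, uint32_t highest_sent,
			   uint32_t window)
{
	return !seq_is_newer(seqno, highest_sent - window);
}

/*
 * Run by the single caller that holds the (imaginary) transport lock:
 * walk the whole queue instead of waking each request in turn.
 * Requests that missed the window are flagged for re-encoding rather
 * than sent. Returns the number of requests transmitted.
 */
static size_t drain_queue(struct sketch_req *head, uint32_t *highest_sent,
			  uint32_t window)
{
	size_t sent = 0;

	assert(highest_sent != NULL);
	for (struct sketch_req *req = head; req != NULL; req = req->next) {
		if (outside_window(req->seqno, *highest_sent, window)) {
			req->needs_reencode = true;
			continue;
		}
		req->sent = true;
		if (seq_is_newer(req->seqno, *highest_sent))
			*highest_sent = req->seqno;
		sent++;
	}
	return sent;
}
```

With a window of 128 and a highest sent sequence number of 200, a
queued request carrying sequence number 50 would be flagged for
re-encoding, while requests carrying 201 and 202 would go out in a
single pass by whichever caller holds the lock.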

The last few patches also take a new look at the client receive code
now that we have the iterator method for reading socket data into page
buffers. They convert the TCP and the UNIX stream code to use the
iterator method and perform some cleanups.

---
v2:
- Address feedback by Chuck.
- Handle UDP/RDMA credits correctly
- Remove throttling of TCP slot allocations
- Minor nits
- Clean up the write_space handling
- Fair queueing
v3:
- Performance improvements, bugfixes and cleanups
- Socket stream receive queue improvements

Trond Myklebust (44):
  SUNRPC: Clean up initialisation of the struct rpc_rqst
  SUNRPC: If there is no reply expected, bail early from call_decode
  SUNRPC: The transmitted message must lie in the RPCSEC window of validity
  SUNRPC: Simplify identification of when the message send/receive is complete
  SUNRPC: Avoid holding locks across the XDR encoding of the RPC message
  SUNRPC: Rename TCP receive-specific state variables
  SUNRPC: Move reset of TCP state variables into the reconnect code
  SUNRPC: Add socket transmit queue offset tracking
  SUNRPC: Simplify dealing with aborted partially transmitted messages
  SUNRPC: Refactor the transport request pinning
  SUNRPC: Add a helper to wake up a sleeping rpc_task and set its status
  SUNRPC: Test whether the task is queued before grabbing the queue spinlocks
  SUNRPC: Don't wake queued RPC calls multiple times in xprt_transmit
  SUNRPC: Rename xprt->recv_lock to xprt->queue_lock
  SUNRPC: Refactor xprt_transmit() to remove the reply queue code
  SUNRPC: Refactor xprt_transmit() to remove wait for reply code
  SUNRPC: Minor cleanup for call_transmit()
  SUNRPC: Distinguish between the slot allocation list and receive queue
  SUNRPC: Add a transmission queue for RPC requests
  SUNRPC: Refactor RPC call encoding
  SUNRPC: Fix up the back channel transmit
  SUNRPC: Treat the task and request as separate in the xprt_ops->send_request()
  SUNRPC: Don't reset the request 'bytes_sent' counter when releasing XPRT_LOCK
  SUNRPC: Simplify xprt_prepare_transmit()
  SUNRPC: Move RPC retransmission stat counter to xprt_transmit()
  SUNRPC: Improve latency for interactive tasks
  SUNRPC: Support for congestion control when queuing is enabled
  SUNRPC: Enqueue swapper tagged RPCs at the head of the transmit queue
  SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queue
  SUNRPC: Allow soft RPC calls to time out when waiting for the XPRT_LOCK
  SUNRPC: Turn off throttling of RPC slots for TCP sockets
  SUNRPC: Clean up transport write space handling
  SUNRPC: Cleanup: remove the unused 'task' argument from the request_send()
  SUNRPC: Don't take transport->lock unnecessarily when taking XPRT_LOCK
  SUNRPC: Convert xprt receive queue to use an rbtree
  SUNRPC: Fix priority queue fairness
  SUNRPC: Convert the xprt->sending queue back to an ordinary wait queue
  SUNRPC: Add a label for RPC calls that require allocation on receive
  SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter()
  SUNRPC: Simplify TCP receive code by switching to using iterators
  SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive()
  SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive
  SUNRPC: Clean up xs_udp_data_receive()
  SUNRPC: Unexport xdr_partial_copy_from_skb()

 fs/nfs/nfs3xdr.c                           |    4 +-
 include/linux/sunrpc/auth.h                |    2 +
 include/linux/sunrpc/auth_gss.h            |    1 +
 include/linux/sunrpc/bc_xprt.h             |    1 +
 include/linux/sunrpc/sched.h               |   10 +-
 include/linux/sunrpc/svc_xprt.h            |    1 -
 include/linux/sunrpc/xdr.h                 |   11 +-
 include/linux/sunrpc/xprt.h                |   35 +-
 include/linux/sunrpc/xprtsock.h            |   36 +-
 include/trace/events/sunrpc.h              |   37 +-
 net/sunrpc/auth.c                          |   10 +
 net/sunrpc/auth_gss/auth_gss.c             |   41 +
 net/sunrpc/auth_gss/gss_rpc_xdr.c          |    1 +
 net/sunrpc/backchannel_rqst.c              |    1 -
 net/sunrpc/clnt.c                          |  174 ++--
 net/sunrpc/sched.c                         |  178 ++--
 net/sunrpc/socklib.c                       |   10 +-
 net/sunrpc/svc_xprt.c                      |    2 -
 net/sunrpc/svcsock.c                       |    6 +-
 net/sunrpc/xdr.c                           |   34 +
 net/sunrpc/xprt.c                          |  893 ++++++++++++-----
 net/sunrpc/xprtrdma/backchannel.c          |    4 +-
 net/sunrpc/xprtrdma/rpc_rdma.c             |   12 +-
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |   14 +-
 net/sunrpc/xprtrdma/transport.c            |   10 +-
 net/sunrpc/xprtsock.c                      | 1060 +++++++++-----------
 26 files changed, 1474 insertions(+), 1114 deletions(-)