X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 10695753
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Subject: [PATCH 3/4] libceph: use MSG_SENDPAGE_NOTLAST with ceph_tcp_sendpage()
Date: Fri, 23 Nov 2018 13:40:19 +0100
Message-Id: <20181123124020.4637-4-idryomov@gmail.com>
In-Reply-To: <20181123124020.4637-1-idryomov@gmail.com>
References: <20181123124020.4637-1-idryomov@gmail.com>

Prevent do_tcp_sendpages() from calling tcp_push() (at least) once per page.
Instead, arrange for tcp_push() to be called (at least) once per data
payload.  This results in more MSS-sized packets and fewer packets
overall (5-10% reduction in my tests with typical OSD request sizes).
See commits 2f5338442425 ("tcp: allow splice() to build full TSO
packets"), 35f9c09fe9c7 ("tcp: tcp_sendpages() should call tcp_push()
once") and ae62ca7b0321 ("tcp: fix MSG_SENDPAGE_NOTLAST logic") for
details.

Here is an example of a packet size histogram for 128K OSD requests
(MSS = 1448, top 5):

  Before:

    SIZE  COUNT
    1448  777700
     952  127915
    1200   39238
    1219    9806
      21    5675

  After:

    SIZE  COUNT
    1448  897280
      21    6201
    1019    2797
     643    2739
     376    2479

We could do slightly better by explicitly corking the socket, but it's
not clear it's worth it.

Signed-off-by: Ilya Dryomov
---
 net/ceph/messenger.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 21a743a3bd29..649faa626b35 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -560,12 +560,15 @@ static int ceph_tcp_sendmsg(struct socket *sock, struct kvec *iov,
 	return r;
 }
 
+/*
+ * @more: either or both of MSG_MORE and MSG_SENDPAGE_NOTLAST
+ */
 static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
-			     int offset, size_t size, bool more)
+			     int offset, size_t size, int more)
 {
 	ssize_t (*sendpage)(struct socket *sock, struct page *page,
 			    int offset, size_t size, int flags);
-	int flags = MSG_DONTWAIT | MSG_NOSIGNAL | (more ? MSG_MORE : 0);
+	int flags = MSG_DONTWAIT | MSG_NOSIGNAL | more;
 	int ret;
 
 	/*
@@ -1552,6 +1555,7 @@ static int write_partial_message_data(struct ceph_connection *con)
 	struct ceph_msg *msg = con->out_msg;
 	struct ceph_msg_data_cursor *cursor = &msg->cursor;
 	bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC);
+	int more = MSG_MORE | MSG_SENDPAGE_NOTLAST;
 	u32 crc;
 
 	dout("%s %p msg %p\n", __func__, con, msg);
@@ -1580,8 +1584,10 @@ static int write_partial_message_data(struct ceph_connection *con)
 		}
 
 		page = ceph_msg_data_next(cursor, &page_offset, &length, NULL);
+		if (length == cursor->total_resid)
+			more = MSG_MORE;
 		ret = ceph_tcp_sendpage(con->sock, page, page_offset, length,
-					true);
+					more);
 		if (ret <= 0) {
 			if (do_datacrc)
 				msg->footer.data_crc = cpu_to_le32(crc);
@@ -1611,13 +1617,16 @@ static int write_partial_message_data(struct ceph_connection *con)
  */
 static int write_partial_skip(struct ceph_connection *con)
 {
+	int more = MSG_MORE | MSG_SENDPAGE_NOTLAST;
 	int ret;
 
 	dout("%s %p %d left\n", __func__, con, con->out_skip);
 	while (con->out_skip > 0) {
 		size_t size = min(con->out_skip, (int) PAGE_SIZE);
 
-		ret = ceph_tcp_sendpage(con->sock, zero_page, 0, size, true);
+		if (size == con->out_skip)
+			more = MSG_MORE;
+		ret = ceph_tcp_sendpage(con->sock, zero_page, 0, size, more);
 		if (ret <= 0)
 			goto out;
 		con->out_skip -= ret;
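The flag selection this patch adds to both send loops can be modelled outside the kernel. What follows is a userspace simulation under stated assumptions, not kernel code: the SKETCH_* constants and the functions plan_skip_flags() and plan_data_flags() are hypothetical names; MSG_SENDPAGE_NOTLAST is kernel-internal, so the flag values are stand-ins whose only job is to be distinct bits; the 4096-byte page size is an assumption; and every send is assumed to complete in full (partial sends are not modelled). Each loop starts with MORE | NOTLAST and clears NOTLAST only for the piece that ends the payload.

```c
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel's MSG_MORE and
 * MSG_SENDPAGE_NOTLAST flags.  Only their distinctness matters here.
 */
#define SKETCH_MSG_MORE             0x8000
#define SKETCH_MSG_SENDPAGE_NOTLAST 0x20000
#define SKETCH_PAGE_SIZE            4096 /* assumed; real PAGE_SIZE is arch-dependent */

/*
 * Models the write_partial_skip() loop: out_skip bytes of zeroes are
 * sent in page-sized chunks.  Stores the flags each chunk would get in
 * flags[] and returns the number of chunks.  Assumes full sends.
 */
size_t plan_skip_flags(int out_skip, int *flags, size_t max_chunks)
{
	int more = SKETCH_MSG_MORE | SKETCH_MSG_SENDPAGE_NOTLAST;
	size_t n = 0;

	while (out_skip > 0 && n < max_chunks) {
		int size = out_skip < SKETCH_PAGE_SIZE ? out_skip : SKETCH_PAGE_SIZE;

		/* Final chunk: clear NOTLAST but keep MORE, as the diff does. */
		if (size == out_skip)
			more = SKETCH_MSG_MORE;
		flags[n++] = more;
		out_skip -= size;
	}
	return n;
}

/*
 * Models the write_partial_message_data() loop: lens[] holds the
 * (nonzero) length of each data piece in send order.  A piece is the
 * last one when its length equals the bytes still remaining, which
 * plays the role of cursor->total_resid in the diff.
 */
void plan_data_flags(const size_t *lens, size_t npieces, int *flags)
{
	int more = SKETCH_MSG_MORE | SKETCH_MSG_SENDPAGE_NOTLAST;
	size_t total_resid = 0;

	for (size_t i = 0; i < npieces; i++)
		total_resid += lens[i];

	for (size_t i = 0; i < npieces; i++) {
		if (lens[i] == total_resid)
			more = SKETCH_MSG_MORE;
		flags[i] = more;
		total_resid -= lens[i];
	}
}
```

For example, skipping 10000 bytes with the assumed 4096-byte page gives three chunks (4096, 4096 and 1808 bytes), and only the last one has NOTLAST cleared, mirroring the `size == con->out_skip` check. As in the patch, MSG_MORE stays set on the final chunk; only MSG_SENDPAGE_NOTLAST is dropped.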