From patchwork Mon Aug 22 09:11:44 2022
X-Patchwork-Submitter: Peilin Ye
X-Patchwork-Id: 12950379
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Peilin Ye
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Jonathan Corbet, Hideaki YOSHIFUJI, David Ahern, Jamal Hadi Salim,
 Cong Wang, Jiri Pirko
Cc: Peilin Ye, netdev@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, Cong Wang, Stephen Hemminger, Dave Taht,
 Peilin Ye
Subject: [PATCH RFC v2 net-next 1/5] net: Introduce Qdisc backpressure
 infrastructure
Date: Mon, 22 Aug 2022 02:11:44 -0700
Message-Id: <7e5bd29f232d42d6aa94ff818a778de707203406.1661158173.git.peilin.ye@bytedance.com>
List-ID: netdev@vger.kernel.org

From: Peilin Ye

Currently sockets (especially UDP ones) can drop a lot of traffic at TC
egress when rate limited by shaper Qdiscs like HTB.
Improve this by introducing a Qdisc backpressure infrastructure:

a. A new 'sock struct' field, @sk_overlimits, which keeps track of the
   number of bytes in socket send buffer that are currently unavailable
   due to TC egress congestion.  The size of an overlimit socket's
   "effective" send buffer is represented by @sk_sndbuf minus
   @sk_overlimits, with a lower limit of SOCK_MIN_SNDBUF:

       max(@sk_sndbuf - @sk_overlimits, SOCK_MIN_SNDBUF)

b. A new (*backpressure) 'struct proto' callback, which is the
   protocol's private algorithm for Qdisc backpressure.

Working together:

1. When a shaper Qdisc (TBF, HTB, CBQ, etc.) drops a packet that
   belongs to a local socket, it calls qdisc_backpressure().

2. qdisc_backpressure() eventually invokes the socket protocol's
   (*backpressure) callback, which should increase @sk_overlimits.

3. The transport layer then sees a smaller "effective" send buffer and
   will send slower.

4. It is the per-protocol (*backpressure) implementation's
   responsibility to decrease @sk_overlimits when TC egress becomes
   idle again, potentially by using a timer.

Suggested-by: Cong Wang
Signed-off-by: Peilin Ye
---
 include/net/sch_generic.h | 11 +++++++++++
 include/net/sock.h        | 21 +++++++++++++++++++++
 net/core/sock.c           |  1 +
 3 files changed, 33 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index ec693fe7c553..afdf4bf64936 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include

 struct Qdisc_ops;
 struct qdisc_walker;
@@ -1188,6 +1189,16 @@ static inline int qdisc_drop_all(struct sk_buff *skb, struct Qdisc *sch,
 	return NET_XMIT_DROP;
 }

+static inline void qdisc_backpressure(struct sk_buff *skb)
+{
+	struct sock *sk = skb->sk;
+
+	if (!sk || !sk_fullsock(sk))
+		return;
+
+	sk_backpressure(sk);
+}
+
 /* Length to Time (L2T) lookup in a qdisc_rate_table, to determine how
    long it will take to send a packet given its size.
 */
diff --git a/include/net/sock.h b/include/net/sock.h
index 05a1bbdf5805..ef10ca66cf26 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -277,6 +277,7 @@ struct sk_filter;
  *	@sk_pacing_status: Pacing status (requested, handled by sch_fq)
  *	@sk_max_pacing_rate: Maximum pacing rate (%SO_MAX_PACING_RATE)
  *	@sk_sndbuf: size of send buffer in bytes
+ *	@sk_overlimits: size of temporarily unavailable send buffer in bytes
  *	@__sk_flags_offset: empty field used to determine location of bitfield
  *	@sk_padding: unused element for alignment
  *	@sk_no_check_tx: %SO_NO_CHECK setting, set checksum in TX packets
@@ -439,6 +440,7 @@ struct sock {
 	struct dst_entry __rcu	*sk_dst_cache;
 	atomic_t		sk_omem_alloc;
 	int			sk_sndbuf;
+	int			sk_overlimits;

 	/* ===== cache line for TX ===== */
 	int			sk_wmem_queued;
@@ -1264,6 +1266,7 @@ struct proto {
 	bool			(*stream_memory_free)(const struct sock *sk, int wake);
 	bool			(*sock_is_readable)(struct sock *sk);
+	void			(*backpressure)(struct sock *sk);

 	/* Memory pressure */
 	void			(*enter_memory_pressure)(struct sock *sk);
 	void			(*leave_memory_pressure)(struct sock *sk);
@@ -2499,6 +2502,24 @@ static inline void sk_stream_moderate_sndbuf(struct sock *sk)
 	WRITE_ONCE(sk->sk_sndbuf, max_t(u32, val, SOCK_MIN_SNDBUF));
 }

+static inline int sk_sndbuf_avail(struct sock *sk)
+{
+	int overlimits, sndbuf = READ_ONCE(sk->sk_sndbuf);
+
+	if (!sk->sk_prot->backpressure)
+		return sndbuf;
+
+	overlimits = READ_ONCE(sk->sk_overlimits);
+
+	return max_t(int, sndbuf - overlimits, SOCK_MIN_SNDBUF);
+}
+
+static inline void sk_backpressure(struct sock *sk)
+{
+	if (sk->sk_prot->backpressure)
+		sk->sk_prot->backpressure(sk);
+}
+
 /**
  * sk_page_frag - return an appropriate page_frag
  * @sk: socket
diff --git a/net/core/sock.c b/net/core/sock.c
index 4cb957d934a2..167d471b176f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2194,6 +2194,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)

 	/* sk_wmem_alloc set to one (see sk_free() and
	   sock_wfree()) */
 	refcount_set(&newsk->sk_wmem_alloc, 1);
+	newsk->sk_overlimits = 0;
 	atomic_set(&newsk->sk_omem_alloc, 0);

 	sk_init_common(newsk);

From patchwork Mon Aug 22 09:12:20 2022
X-Patchwork-Submitter: Peilin Ye
X-Patchwork-Id: 12950380
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Peilin Ye
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Jonathan Corbet, Hideaki YOSHIFUJI, David Ahern, Jamal Hadi Salim,
 Cong Wang, Jiri Pirko
Cc: Peilin Ye, netdev@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, Cong Wang, Stephen Hemminger, Dave Taht,
 Peilin Ye
Subject: [PATCH RFC v2 net-next 2/5] net/udp: Implement Qdisc backpressure
 algorithm
Date: Mon, 22 Aug 2022 02:12:20 -0700
Message-Id: <881f3d5bf87bdf4c19a0bd0ae0bf51fbeca7978d.1661158173.git.peilin.ye@bytedance.com>

From: Peilin Ye

Support Qdisc backpressure for UDP (IPv4 and IPv6) sockets by
implementing the (*backpressure) callback:

1. When a shaper Qdisc drops a packet due to TC egress congestion,
   halve the effective send buffer [1], then (re)schedule the
   backpressure timer.

   [1] sndbuf - overlimits_new == 1/2 * (sndbuf - overlimits_old)

2. When the timer expires, double the effective send buffer [2].  If
   the socket is still overlimit, reschedule the timer itself.

   [2] sndbuf - overlimits_new == 2 * (sndbuf - overlimits_old)

In sock_wait_for_wmem() and sock_alloc_send_pskb(), check the size of
the effective send buffer instead, so that overlimit sockets send
slower.  See sk_sndbuf_avail().

The timer interval is specified by a new per-net sysctl,
sysctl_udp_backpressure_interval.  Default is 100 milliseconds, meaning
that an overlimit UDP socket will try to double its effective send
buffer every 100 milliseconds.  Use 0 to disable Qdisc backpressure for
UDP sockets.  Generally, a longer interval means a lower packet drop
rate, but also makes overlimit sockets slower to recover when TC egress
becomes idle (or the shaper Qdisc gets removed, etc.)
Test results with TBF + SFQ Qdiscs, 500 Mbits/sec rate limit with 16
iperf UDP '-b 1G' clients:

  Interval      Throughput    Drop Rate    CPU Usage [3]
  0 (disabled)  480.0 Mb/s    96.50%       68.38%
  10 ms         486.4 Mb/s     9.28%        1.30%
  100 ms        486.4 Mb/s     1.10%        1.11%
  1000 ms       486.4 Mb/s     0.13%        0.81%

[3] perf-top, __pv_queued_spin_lock_slowpath()

Signed-off-by: Peilin Ye
---
 Documentation/networking/ip-sysctl.rst | 11 ++++
 include/linux/udp.h                    |  3 ++
 include/net/netns/ipv4.h               |  1 +
 include/net/udp.h                      |  1 +
 net/core/sock.c                        |  4 +-
 net/ipv4/sysctl_net_ipv4.c             |  7 +++
 net/ipv4/udp.c                         | 69 +++++++++++++++++++++++++-
 net/ipv6/udp.c                         |  2 +-
 8 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 56cd4ea059b2..a0d8e9518fda 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1070,6 +1070,17 @@ udp_rmem_min - INTEGER
 udp_wmem_min - INTEGER
 	UDP does not have tx memory accounting and this tunable has no effect.

+udp_backpressure_interval - INTEGER
+	The time interval (in milliseconds) in which an overlimit UDP socket
+	tries to increase its effective send buffer size, used by Qdisc
+	backpressure.  A longer interval typically results in a lower packet
+	drop rate, but also makes it slower for overlimit UDP sockets to
+	recover from backpressure when TC egress becomes idle.
+
+	0 to disable Qdisc backpressure for UDP sockets.
+
+	Default: 100
+
 RAW variables
 =============

diff --git a/include/linux/udp.h b/include/linux/udp.h
index 254a2654400f..dd017994738b 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -86,6 +86,9 @@ struct udp_sock {

 	/* This field is dirtied by udp_recvmsg() */
 	int		forward_deficit;
+
+	/* Qdisc backpressure timer */
+	struct timer_list backpressure_timer;
 };

 #define UDP_MAX_SEGMENTS	(1 << 6UL)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index c7320ef356d9..01f72ddf23e0 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -182,6 +182,7 @@ struct netns_ipv4 {

 	int sysctl_udp_wmem_min;
 	int sysctl_udp_rmem_min;
+	int sysctl_udp_backpressure_interval;

 	u8 sysctl_fib_notify_on_flag_change;
diff --git a/include/net/udp.h b/include/net/udp.h
index 5ee88ddf79c3..82018e58659b 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -279,6 +279,7 @@ int udp_init_sock(struct sock *sk);
 int udp_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 int __udp_disconnect(struct sock *sk, int flags);
 int udp_disconnect(struct sock *sk, int flags);
+void udp_backpressure(struct sock *sk);
 __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait);
 struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 				       netdev_features_t features,
diff --git a/net/core/sock.c b/net/core/sock.c
index 167d471b176f..cb6ba66f80c8 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2614,7 +2614,7 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
 			break;
 		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
-		if (refcount_read(&sk->sk_wmem_alloc) < READ_ONCE(sk->sk_sndbuf))
+		if (refcount_read(&sk->sk_wmem_alloc) < sk_sndbuf_avail(sk))
 			break;
 		if (sk->sk_shutdown & SEND_SHUTDOWN)
 			break;
@@ -2649,7 +2649,7 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 		if (sk->sk_shutdown & SEND_SHUTDOWN)
 			goto failure;

-		if (sk_wmem_alloc_get(sk) < READ_ONCE(sk->sk_sndbuf))
+		if (sk_wmem_alloc_get(sk) < sk_sndbuf_avail(sk))
 			break;

 		sk_set_bit(SOCKWQ_ASYNC_NOSPACE, sk);
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 5490c285668b..1e509a417b92 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -1337,6 +1337,13 @@ static struct ctl_table ipv4_net_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ONE
 	},
+	{
+		.procname	= "udp_backpressure_interval",
+		.data		= &init_net.ipv4.sysctl_udp_backpressure_interval,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_ms_jiffies,
+	},
 	{
 		.procname	= "fib_notify_on_flag_change",
 		.data		= &init_net.ipv4.sysctl_fib_notify_on_flag_change,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 34eda973bbf1..ff58f638c834 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -110,6 +110,7 @@
 #include
 #include
 #include "udp_impl.h"
+#include
 #include
 #include
 #include
@@ -1614,10 +1615,73 @@ void udp_destruct_sock(struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(udp_destruct_sock);

+static inline int udp_backpressure_interval_get(struct sock *sk)
+{
+	return READ_ONCE(sock_net(sk)->ipv4.sysctl_udp_backpressure_interval);
+}
+
+static inline void udp_reset_backpressure_timer(struct sock *sk,
+						unsigned long expires)
+{
+	sk_reset_timer(sk, &udp_sk(sk)->backpressure_timer, expires);
+}
+
+static void udp_backpressure_timer(struct timer_list *t)
+{
+	struct udp_sock *up = from_timer(up, t, backpressure_timer);
+	int interval, sndbuf, overlimits;
+	struct sock *sk = &up->inet.sk;
+
+	interval = udp_backpressure_interval_get(sk);
+	if (!interval) {
+		/* Qdisc backpressure has been turned off */
+		WRITE_ONCE(sk->sk_overlimits, 0);
+		goto out;
+	}
+
+	sndbuf = READ_ONCE(sk->sk_sndbuf);
+	overlimits = READ_ONCE(sk->sk_overlimits);
+
+	/* sndbuf - overlimits_new == 2 * (sndbuf - overlimits_old) */
+	overlimits = min_t(int, overlimits, sndbuf - SOCK_MIN_SNDBUF);
+	overlimits = max_t(int, (2 * overlimits) - sndbuf, 0);
+	WRITE_ONCE(sk->sk_overlimits, overlimits);
+
+	if (overlimits > 0)
+		udp_reset_backpressure_timer(sk, jiffies + interval);
+
+out:
+	sock_put(sk);
+}
+
+void udp_backpressure(struct sock *sk)
+{
+	int interval, sndbuf, overlimits;
+
+	interval = udp_backpressure_interval_get(sk);
+	if (!interval)	/* Qdisc backpressure is off */
+		return;
+
+	sndbuf = READ_ONCE(sk->sk_sndbuf);
+	overlimits = READ_ONCE(sk->sk_overlimits);
+
+	/* sndbuf - overlimits_new == 1/2 * (sndbuf - overlimits_old) */
+	overlimits = min_t(int, overlimits, sndbuf - SOCK_MIN_SNDBUF);
+	overlimits += (sndbuf - overlimits) >> 1;
+	WRITE_ONCE(sk->sk_overlimits, overlimits);
+
+	if (overlimits > 0)
+		udp_reset_backpressure_timer(sk, jiffies + interval);
+}
+EXPORT_SYMBOL_GPL(udp_backpressure);
+
 int udp_init_sock(struct sock *sk)
 {
-	skb_queue_head_init(&udp_sk(sk)->reader_queue);
+	struct udp_sock *up = udp_sk(sk);
+
+	skb_queue_head_init(&up->reader_queue);
 	sk->sk_destruct = udp_destruct_sock;
+	timer_setup(&up->backpressure_timer, udp_backpressure_timer, 0);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(udp_init_sock);
@@ -2653,6 +2717,7 @@ void udp_destroy_sock(struct sock *sk)
 	/* protects from races with udp_abort() */
 	sock_set_flag(sk, SOCK_DEAD);
 	udp_flush_pending_frames(sk);
+	sk_stop_timer(sk, &up->backpressure_timer);
 	unlock_sock_fast(sk, slow);
 	if (static_branch_unlikely(&udp_encap_needed_key)) {
 		if (up->encap_type) {
@@ -2946,6 +3011,7 @@ struct proto udp_prot = {
 #ifdef CONFIG_BPF_SYSCALL
 	.psock_update_sk_prot	= udp_bpf_update_proto,
 #endif
+	.backpressure		= udp_backpressure,
 	.memory_allocated	= &udp_memory_allocated,
 	.per_cpu_fw_alloc	= &udp_memory_per_cpu_fw_alloc,
@@ -3268,6 +3334,7 @@ static int __net_init udp_sysctl_init(struct net *net)
 {
 	net->ipv4.sysctl_udp_rmem_min = PAGE_SIZE;
 	net->ipv4.sysctl_udp_wmem_min = PAGE_SIZE;
+	net->ipv4.sysctl_udp_backpressure_interval = msecs_to_jiffies(100);

 #ifdef CONFIG_NET_L3_MASTER_DEV
 	net->ipv4.sysctl_udp_l3mdev_accept = 0;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 16c176e7c69a..106032af6756 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1735,7 +1735,7 @@ struct proto udpv6_prot = {
 #ifdef CONFIG_BPF_SYSCALL
 	.psock_update_sk_prot	= udp_bpf_update_proto,
 #endif
-
+	.backpressure		= udp_backpressure,
 	.memory_allocated	= &udp_memory_allocated,
 	.per_cpu_fw_alloc	= &udp_memory_per_cpu_fw_alloc,

From patchwork Mon Aug 22 09:12:34 2022
X-Patchwork-Submitter: Peilin Ye
X-Patchwork-Id: 12950381
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Peilin Ye
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Jonathan Corbet, Hideaki YOSHIFUJI, David Ahern, Jamal Hadi Salim,
 Cong Wang, Jiri Pirko
Cc: Peilin Ye, netdev@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, Cong Wang, Stephen Hemminger, Dave Taht,
 Peilin Ye
Subject: [PATCH RFC v2 net-next 3/5] net/sched: sch_tbf: Use Qdisc
 backpressure infrastructure
Date: Mon, 22 Aug 2022 02:12:34 -0700

From: Peilin Ye

Recently we introduced a Qdisc backpressure infrastructure (currently
supports UDP sockets).  Use it in TBF Qdisc.

Tested with 500 Mbits/sec rate limit and SFQ inner Qdisc, using 16 iperf
UDP 1 Gbit/sec clients.  Before:

[  3]  0.0-15.0 sec  53.6 MBytes  30.0 Mbits/sec  0.208 ms 1190234/1228450 (97%)
[  3]  0.0-15.0 sec  54.7 MBytes  30.6 Mbits/sec  0.085 ms  955591/994593 (96%)
[  3]  0.0-15.0 sec  55.4 MBytes  31.0 Mbits/sec  0.170 ms  966364/1005868 (96%)
[  3]  0.0-15.0 sec  55.0 MBytes  30.8 Mbits/sec  0.167 ms  925083/964333 (96%)
<...>                                                       ^^^^^^^^^^^^^^^^^^^

Total throughput is 480.2 Mbits/sec and average drop rate is 96.5%.

Now enable Qdisc backpressure for UDP sockets, with
udp_backpressure_interval default to 100 milliseconds:

[  3]  0.0-15.0 sec  54.4 MBytes  30.4 Mbits/sec  0.097 ms  450/39246 (1.1%)
[  3]  0.0-15.0 sec  54.4 MBytes  30.4 Mbits/sec  0.331 ms  435/39232 (1.1%)
[  3]  0.0-15.0 sec  54.4 MBytes  30.4 Mbits/sec  0.040 ms  435/39212 (1.1%)
[  3]  0.0-15.0 sec  54.4 MBytes  30.4 Mbits/sec  0.031 ms  426/39208 (1.1%)
<...>                                                       ^^^^^^^^^^^^^^^^

Total throughput is 486.4 Mbits/sec (1.29% higher), and average drop
rate is 1.1% (98.86% lower).
However, enabling Qdisc backpressure affects fairness between flows if
we use TBF Qdisc with the default bfifo inner Qdisc:

[  3]  0.0-15.0 sec  46.1 MBytes  25.8 Mbits/sec  1.102 ms  142/33048 (0.43%)
[  3]  0.0-15.0 sec  72.8 MBytes  40.7 Mbits/sec  0.476 ms  145/52081 (0.28%)
[  3]  0.0-15.0 sec  53.2 MBytes  29.7 Mbits/sec  1.047 ms  141/38086 (0.37%)
[  3]  0.0-15.0 sec  45.5 MBytes  25.4 Mbits/sec  1.600 ms  141/32573 (0.43%)
<...>                             ^^^^^^^^^^^^^^^^^

In the test, per-flow throughput ranged from 16.4 to 68.7 Mbits/sec.
However, total throughput was still 486.4 Mbits/sec (0.87% higher than
before), and average drop rate was 0.41% (99.58% lower than before).

Signed-off-by: Peilin Ye
---
 net/sched/sch_tbf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 72102277449e..cf9cc7dbf078 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -222,6 +222,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
 		len += segs->len;
 		ret = qdisc_enqueue(segs, q->qdisc, to_free);
 		if (ret != NET_XMIT_SUCCESS) {
+			qdisc_backpressure(skb);
 			if (net_xmit_drop_count(ret))
 				qdisc_qstats_drop(sch);
 		} else {
@@ -250,6 +251,7 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	}
 	ret = qdisc_enqueue(skb, q->qdisc, to_free);
 	if (ret != NET_XMIT_SUCCESS) {
+		qdisc_backpressure(skb);
 		if (net_xmit_drop_count(ret))
 			qdisc_qstats_drop(sch);
 		return ret;

From patchwork Mon Aug 22 09:12:45 2022
X-Patchwork-Submitter: Peilin Ye
X-Patchwork-Id: 12950382
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Peilin Ye
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Jonathan Corbet, Hideaki YOSHIFUJI, David Ahern, Jamal Hadi Salim,
 Cong Wang, Jiri Pirko
Cc: Peilin Ye, netdev@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, Cong Wang, Stephen Hemminger, Dave Taht,
 Peilin Ye
Subject: [PATCH RFC v2 net-next 4/5] net/sched: sch_htb: Use Qdisc
 backpressure infrastructure
Date: Mon, 22 Aug 2022 02:12:45 -0700

From: Peilin Ye

Recently we introduced a Qdisc backpressure infrastructure (currently
supports UDP sockets).  Use it in HTB Qdisc.

Tested with 500 Mbits/sec rate limit, using 16 iperf UDP 1 Gbit/sec
clients.  Before:

[  3]  0.0-15.0 sec  54.2 MBytes  30.4 Mbits/sec  0.875 ms 1245750/1284444 (97%)
[  3]  0.0-15.0 sec  54.2 MBytes  30.3 Mbits/sec  1.288 ms 1238753/1277402 (97%)
[  3]  0.0-15.0 sec  54.8 MBytes  30.6 Mbits/sec  1.761 ms 1261762/1300817 (97%)
[  3]  0.0-15.0 sec  53.9 MBytes  30.1 Mbits/sec  1.635 ms 1241690/1280133 (97%)
<...>                                                      ^^^^^^^^^^^^^^^^^^^^^

Total throughput is 482.0 Mbits/sec and average drop rate is 97.0%.
Now enable Qdisc backpressure for UDP sockets, with udp_backpressure_interval default to 100 milliseconds: [ 3] 0.0-15.0 sec 53.0 MBytes 29.6 Mbits/sec 1.621 ms 54/37856 (0.14%) [ 3] 0.0-15.0 sec 55.9 MBytes 31.3 Mbits/sec 1.368 ms 6/39895 (0.015%) [ 3] 0.0-15.0 sec 52.3 MBytes 29.2 Mbits/sec 1.560 ms 56/37340 (0.15%) [ 3] 0.0-15.0 sec 52.7 MBytes 29.5 Mbits/sec 1.495 ms 57/37677 (0.15%) <...> ^^^^^^^^^^^^^^^^ Total throughput is 485.9 Mbits/sec (0.81% higher) and average drop rate is 0.1% (99.9% lower). Fairness between flows is slightly affected, with per-flow average throughput ranging from 29.2 to 31.8 Mbits/sec (compared with 29.7 to 30.6 Mbits/sec). Signed-off-by: Peilin Ye --- net/sched/sch_htb.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 23a9d6242429..e337b3d0dab3 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -623,6 +623,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch, __qdisc_enqueue_tail(skb, &q->direct_queue); q->direct_pkts++; } else { + qdisc_backpressure(skb); return qdisc_drop(skb, sch, to_free); } #ifdef CONFIG_NET_CLS_ACT @@ -634,6 +635,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch, #endif } else if ((ret = qdisc_enqueue(skb, cl->leaf.q, to_free)) != NET_XMIT_SUCCESS) { + qdisc_backpressure(skb); if (net_xmit_drop_count(ret)) { qdisc_qstats_drop(sch); cl->drops++; From patchwork Mon Aug 22 09:12:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peilin Ye X-Patchwork-Id: 12950383 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC21EC32789 for ; Mon, 22 Aug 2022 09:14:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand 
From: Peilin Ye
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Jonathan Corbet, Hideaki YOSHIFUJI, David Ahern, Jamal Hadi Salim,
 Cong Wang, Jiri Pirko
Cc: Peilin Ye, netdev@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, Cong Wang, Stephen Hemminger, Dave Taht,
 Peilin Ye
Subject: [PATCH RFC v2 net-next 5/5] net/sched: sch_cbq: Use Qdisc
 backpressure infrastructure
Date: Mon, 22 Aug 2022 02:12:57 -0700
Message-Id: <614f8f31e3b62dfebb8cb4707c81918a6c7e381d.1661158173.git.peilin.ye@bytedance.com>
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC

From: Peilin Ye

Recently we introduced a Qdisc backpressure infrastructure (currently
supporting UDP sockets). Use it in the CBQ Qdisc.

Tested with a 500 Mbits/sec rate limit and 16 iperf UDP clients, each
sending at 1 Gbit/sec. Before:

[  3]  0.0-15.0 sec  55.8 MBytes  31.2 Mbits/sec   1.185 ms 1073326/1113110 (96%)
[  3]  0.0-15.0 sec  55.9 MBytes  31.3 Mbits/sec   1.001 ms 1080330/1120201 (96%)
[  3]  0.0-15.0 sec  55.6 MBytes  31.1 Mbits/sec   1.750 ms 1078292/1117980 (96%)
[  3]  0.0-15.0 sec  55.3 MBytes  30.9 Mbits/sec   0.895 ms 1089200/1128640 (97%)
<...>                                                       ^^^^^^^^^^^^^^^^^^^^^

Total throughput is 493.7 Mbits/sec, and the average drop rate is 96.13%.
Now enable Qdisc backpressure for UDP sockets, with
udp_backpressure_interval defaulting to 100 milliseconds:

[  3]  0.0-15.0 sec  54.2 MBytes  30.3 Mbits/sec   2.302 ms   54/38692 (0.14%)
[  3]  0.0-15.0 sec  54.1 MBytes  30.2 Mbits/sec   2.227 ms   54/38671 (0.14%)
[  3]  0.0-15.0 sec  53.5 MBytes  29.9 Mbits/sec   2.043 ms   57/38203 (0.15%)
[  3]  0.0-15.0 sec  58.1 MBytes  32.5 Mbits/sec   1.843 ms    1/41480 (0.0024%)
<...>                                                        ^^^^^^^^^^^^^^^^^

Total throughput is 497.1 Mbits/sec (0.69% higher), and the average drop
rate is 0.08% (99.9% lower). Fairness between flows is slightly affected,
with per-flow average throughput ranging from 29.9 to 32.6 Mbits/sec
(compared with 30.3 to 31.3 Mbits/sec before).

Signed-off-by: Peilin Ye
---
 net/sched/sch_cbq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 91a0dc463c48..42e44f570988 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -381,6 +381,7 @@ cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		return ret;
 	}

+	qdisc_backpressure(skb);
 	if (net_xmit_drop_count(ret)) {
 		qdisc_qstats_drop(sch);
 		cbq_mark_toplevel(q, cl);