[net-next,v2,3/3] Add test for the use of new args in cong_control

Message ID 20240501074338.362361-3-miaxu@meta.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series [net-next,v2,1/3] Add new args for cong_control in tcp_congestion_ops

Checks

Context Check Description
netdev/series_format warning Series does not have a cover letter
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 932 this patch: 932
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 16 maintainers not CCed: kuniyu@amazon.com andrii@kernel.org eddyz87@gmail.com daniel@iogearbox.net linux-kselftest@vger.kernel.org ast@kernel.org haoluo@google.com kpsingh@kernel.org john.fastabend@gmail.com shuah@kernel.org jolsa@kernel.org yonghong.song@linux.dev song@kernel.org martin.lau@linux.dev mykolal@fb.com sdf@google.com
netdev/build_clang success Errors and warnings before: 938 this patch: 938
netdev/verify_signedoff fail author Signed-off-by missing
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 944 this patch: 944
netdev/checkpatch fail
  CHECK: Alignment should match open parenthesis
  CHECK: Lines should not end with a '('
  CHECK: Please don't use multiple blank lines
  CHECK: Please use a blank line after function/struct/union/enum declarations
  CHECK: spaces preferred around that '-' (ctx:VxV)
  CHECK: spaces preferred around that '<<' (ctx:VxV)
  ERROR: Macros with complex values should be enclosed in parentheses
  WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
  WARNING: line length of 81 exceeds 80 columns
  WARNING: line length of 83 exceeds 80 columns
  WARNING: line length of 84 exceeds 80 columns
  WARNING: line length of 85 exceeds 80 columns
  WARNING: line length of 89 exceeds 80 columns
  WARNING: line length of 92 exceeds 80 columns
  WARNING: line length of 99 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2024-05-01--09-00 (tests: 1001)

Commit Message

Miao Xu May 1, 2024, 7:43 a.m. UTC
This patch adds a selftest to show the usage of the new arguments in
cong_control. For simplicity's sake, the testing example reuses cubic's
kernel functions.
--
Changes in v2:
* Added highlights to explain major differences between the bpf program
  and tcp_cubic.c.
* bpf_tcp_helpers.h should not be further extended, so remove the
  dependency on this file. Use vmlinux.h instead.
* Minor changes such as indentation.

Signed-off-by: Miao Xu <miaxu@meta.com>
---
 .../bpf/progs/bpf_cubic_cong_control.c        | 207 ++++++++++++++++++
 .../selftests/bpf/progs/bpf_tracing_net.h     |  10 +
 2 files changed, 217 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_cubic_cong_control.c

Comments

Martin KaFai Lau May 1, 2024, 8:19 p.m. UTC | #1
On 5/1/24 12:43 AM, Miao Xu wrote:
> This patch adds a selftest to show the usage of the new arguments in
> cong_control. For simplicity's sake, the testing example reuses cubic's
> kernel functions.

Jakub, is it ok to target the set for the bpf-next?

The bpf_tcp_ca test failed (Jakub also mentioned). The progs/tcp_ca_kfunc.c 
requires changes. The func signature of bbr_main and the BPF_PROG(cong_control, 
...) has to be adjusted.

Since it needs a respin, a few nits.

Please add "selftests/bpf:" to the subject line of this patch 3. I think Patch 1 
can use "tcp:" and patch 2 can use "bpf: tcp:" also.

Please also add a cover letter, git format-patch --cover-letter ...

[ ... ]

> +void BPF_STRUCT_OPS(bpf_cubic_cong_control, struct sock *sk, __u32 ack, int flag,
> +		const struct rate_sample *rs)
> +{
> +	struct tcp_sock *tp = tcp_sk(sk);
> +
> +	if (((1<<TCP_CA_CWR) | (1<<TCP_CA_Recovery)) &
> +			(1 << inet_csk(sk)->icsk_ca_state)) {
> +		/* Reduce cwnd if state mandates */
> +		tcp_cwnd_reduction(sk, rs->acked_sacked, rs->losses, flag);
> +
> +		if (!before(tp->snd_una, tp->high_seq)) {
> +			/* Reset cwnd to ssthresh in CWR or Recovery (unless it's undone) */
> +			if (tp->snd_ssthresh < TCP_INFINITE_SSTHRESH &&
> +					inet_csk(sk)->icsk_ca_state == TCP_CA_CWR) {
> +				tp->snd_cwnd = tp->snd_ssthresh;
> +				tp->snd_cwnd_stamp = tcp_jiffies32;
> +			}
> +			// __cwnd_event(sk, CA_EVENT_COMPLETE_CWR);

Remove the commented out code.

> +		}
> +	} else if (tcp_may_raise_cwnd(sk, flag)) {
> +		/* Advance cwnd if state allows */
> +		cubictcp_cong_avoid(sk, ack, rs->acked_sacked);
> +		tp->snd_cwnd_stamp = tcp_jiffies32;
> +	}
> +
> +	tcp_update_pacing_rate(sk);
> +}
> +
> +__u32 BPF_STRUCT_OPS(bpf_cubic_recalc_ssthresh, struct sock *sk)
> +{
> +	return cubictcp_recalc_ssthresh(sk);
> +}
> +
> +void BPF_STRUCT_OPS(bpf_cubic_state, struct sock *sk, __u8 new_state)
> +{
> +	cubictcp_state(sk, new_state);
> +}
> +
> +void BPF_STRUCT_OPS(bpf_cubic_acked, struct sock *sk,
> +		const struct ack_sample *sample)
> +{
> +	cubictcp_acked(sk, sample);
> +}
> +
> +__u32 BPF_STRUCT_OPS(bpf_cubic_undo_cwnd, struct sock *sk)
> +{
> +	return tcp_reno_undo_cwnd(sk);
> +}
> +
> +
> +SEC(".struct_ops")
> +struct tcp_congestion_ops cubic = {
> +	.init		= (void *)bpf_cubic_init,
> +	.ssthresh	= (void *)bpf_cubic_recalc_ssthresh,
> +	.cong_control	= (void *)bpf_cubic_cong_control,
> +	.set_state	= (void *)bpf_cubic_state,
> +	.undo_cwnd	= (void *)bpf_cubic_undo_cwnd,
> +	.cwnd_event	= (void *)bpf_cubic_cwnd_event,
> +	.pkts_acked     = (void *)bpf_cubic_acked,
> +	.name		= "bpf_cubic",

nit. It has the same name as the tcp-cc in bpf_cubic.c. Rename it to 
"bpf_cc_cubic" ?

Jakub Kicinski May 2, 2024, 12:39 a.m. UTC | #2
On Wed, 1 May 2024 13:19:38 -0700 Martin KaFai Lau wrote:
> On 5/1/24 12:43 AM, Miao Xu wrote:
> > This patch adds a selftest to show the usage of the new arguments in
> > cong_control. For simplicity's sake, the testing example reuses cubic's
> > kernel functions.  
> 
> Jakub, is it ok to target the set for the bpf-next?

SGTM!

Patch

diff --git a/tools/testing/selftests/bpf/progs/bpf_cubic_cong_control.c b/tools/testing/selftests/bpf/progs/bpf_cubic_cong_control.c
new file mode 100644
index 000000000000..7ec9da0356c3
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_cubic_cong_control.c
@@ -0,0 +1,207 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Highlights:
+ * 1. The major difference between this bpf program and tcp_cubic.c
+ *    is that this bpf program relies on `cong_control` rather than
+ *    `cong_avoid` in the struct tcp_congestion_ops.
+ * 2. Logic such as tcp_cwnd_reduction, tcp_cong_avoid, and
+ *    tcp_update_pacing_rate is bypassed when `cong_control` is
+ *    defined, so this logic is moved into `cong_control`.
+ * 3. WARNING: This bpf program is NOT the same as tcp_cubic.c.
+ *    The main purpose is to show use cases of the arguments in
+ *    `cong_control`. For simplicity's sake, it reuses tcp cubic's
+ *    kernel functions.
+ */
+
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_tracing_net.h"
+
+#define BPF_STRUCT_OPS(name, args...) \
+SEC("struct_ops/"#name) \
+BPF_PROG(name, args)
+
+
+#define min(a, b) ((a) < (b) ? (a) : (b))
+#define max(a, b) ((a) > (b) ? (a) : (b))
+
+static __always_inline struct inet_connection_sock *inet_csk(const struct sock *sk)
+{
+	return (struct inet_connection_sock *)sk;
+}
+
+static __always_inline struct tcp_sock *tcp_sk(const struct sock *sk)
+{
+	return (struct tcp_sock *)sk;
+}
+
+static __always_inline bool before(__u32 seq1, __u32 seq2)
+{
+	return (__s32)(seq1-seq2) < 0;
+}
+#define after(seq2, seq1) before(seq1, seq2)
+
+char _license[] SEC("license") = "GPL";
+
+extern void cubictcp_init(struct sock *sk) __ksym;
+extern void cubictcp_cwnd_event(struct sock *sk, enum tcp_ca_event event) __ksym;
+extern __u32 cubictcp_recalc_ssthresh(struct sock *sk) __ksym;
+extern void cubictcp_state(struct sock *sk, __u8 new_state) __ksym;
+extern __u32 tcp_reno_undo_cwnd(struct sock *sk) __ksym;
+extern void cubictcp_acked(struct sock *sk, const struct ack_sample *sample) __ksym;
+extern void cubictcp_cong_avoid(struct sock *sk, __u32 ack, __u32 acked) __ksym;
+
+
+void BPF_STRUCT_OPS(bpf_cubic_init, struct sock *sk)
+{
+	cubictcp_init(sk);
+}
+
+void BPF_STRUCT_OPS(bpf_cubic_cwnd_event, struct sock *sk, enum tcp_ca_event event)
+{
+	cubictcp_cwnd_event(sk, event);
+}
+
+#define USEC_PER_SEC 1000000UL
+#define TCP_PACING_SS_RATIO (200)
+#define TCP_PACING_CA_RATIO (120)
+#define TCP_REORDERING (12)
+#define likely(x) (__builtin_expect(!!(x), 1))
+
+static __always_inline __u64 div64_u64(__u64 dividend, __u64 divisor)
+{
+	return dividend / divisor;
+}
+
+static __always_inline void tcp_update_pacing_rate(struct sock *sk)
+{
+	const struct tcp_sock *tp = tcp_sk(sk);
+	__u64 rate;
+
+	/* set sk_pacing_rate to 200 % of current rate (mss * cwnd / srtt) */
+	rate = (__u64)tp->mss_cache * ((USEC_PER_SEC / 100) << 3);
+
+	/* current rate is (cwnd * mss) / srtt
+	 * In Slow Start [1], set sk_pacing_rate to 200 % the current rate.
+	 * In Congestion Avoidance phase, set it to 120 % the current rate.
+	 *
+	 * [1] : Normal Slow Start condition is (tp->snd_cwnd < tp->snd_ssthresh)
+	 *	 If snd_cwnd >= (tp->snd_ssthresh / 2), we are approaching
+	 *	 end of slow start and should slow down.
+	 */
+	if (tp->snd_cwnd < tp->snd_ssthresh / 2)
+		rate *= TCP_PACING_SS_RATIO;
+	else
+		rate *= TCP_PACING_CA_RATIO;
+
+	rate *= max(tp->snd_cwnd, tp->packets_out);
+
+	if (likely(tp->srtt_us))
+		rate = div64_u64(rate, (__u64)tp->srtt_us);
+
+	sk->sk_pacing_rate = min(rate, sk->sk_max_pacing_rate);
+}
+
+static __always_inline void tcp_cwnd_reduction(
+		struct sock *sk,
+		int newly_acked_sacked,
+		int newly_lost,
+		int flag) {
+	struct tcp_sock *tp = tcp_sk(sk);
+	int sndcnt = 0;
+	__u32 pkts_in_flight = tp->packets_out - (tp->sacked_out + tp->lost_out) + tp->retrans_out;
+	int delta = tp->snd_ssthresh - pkts_in_flight;
+
+	if (newly_acked_sacked <= 0 || !tp->prior_cwnd)
+		return;
+
+	__u32 prr_delivered = tp->prr_delivered + newly_acked_sacked;
+
+	if (delta < 0) {
+		__u64 dividend =
+			(__u64)tp->snd_ssthresh * prr_delivered + tp->prior_cwnd - 1;
+		sndcnt = (__u32)div64_u64(dividend, (__u64)tp->prior_cwnd) - tp->prr_out;
+	} else {
+		sndcnt = max(prr_delivered - tp->prr_out, newly_acked_sacked);
+		if (flag & FLAG_SND_UNA_ADVANCED && !newly_lost)
+			sndcnt++;
+		sndcnt = min(delta, sndcnt);
+	}
+	/* Force a fast retransmit upon entering fast recovery */
+	sndcnt = max(sndcnt, (tp->prr_out ? 0 : 1));
+	tp->snd_cwnd = pkts_in_flight + sndcnt;
+}
+
+/* Decide whether to run the increase function of congestion control. */
+static __always_inline bool tcp_may_raise_cwnd(
+		const struct sock *sk,
+		const int flag) {
+	if (tcp_sk(sk)->reordering > TCP_REORDERING)
+		return flag & FLAG_FORWARD_PROGRESS;
+
+	return flag & FLAG_DATA_ACKED;
+}
+
+void BPF_STRUCT_OPS(bpf_cubic_cong_control, struct sock *sk, __u32 ack, int flag,
+		const struct rate_sample *rs)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (((1<<TCP_CA_CWR) | (1<<TCP_CA_Recovery)) &
+			(1 << inet_csk(sk)->icsk_ca_state)) {
+		/* Reduce cwnd if state mandates */
+		tcp_cwnd_reduction(sk, rs->acked_sacked, rs->losses, flag);
+
+		if (!before(tp->snd_una, tp->high_seq)) {
+			/* Reset cwnd to ssthresh in CWR or Recovery (unless it's undone) */
+			if (tp->snd_ssthresh < TCP_INFINITE_SSTHRESH &&
+					inet_csk(sk)->icsk_ca_state == TCP_CA_CWR) {
+				tp->snd_cwnd = tp->snd_ssthresh;
+				tp->snd_cwnd_stamp = tcp_jiffies32;
+			}
+			// __cwnd_event(sk, CA_EVENT_COMPLETE_CWR);
+		}
+	} else if (tcp_may_raise_cwnd(sk, flag)) {
+		/* Advance cwnd if state allows */
+		cubictcp_cong_avoid(sk, ack, rs->acked_sacked);
+		tp->snd_cwnd_stamp = tcp_jiffies32;
+	}
+
+	tcp_update_pacing_rate(sk);
+}
+
+__u32 BPF_STRUCT_OPS(bpf_cubic_recalc_ssthresh, struct sock *sk)
+{
+	return cubictcp_recalc_ssthresh(sk);
+}
+
+void BPF_STRUCT_OPS(bpf_cubic_state, struct sock *sk, __u8 new_state)
+{
+	cubictcp_state(sk, new_state);
+}
+
+void BPF_STRUCT_OPS(bpf_cubic_acked, struct sock *sk,
+		const struct ack_sample *sample)
+{
+	cubictcp_acked(sk, sample);
+}
+
+__u32 BPF_STRUCT_OPS(bpf_cubic_undo_cwnd, struct sock *sk)
+{
+	return tcp_reno_undo_cwnd(sk);
+}
+
+
+SEC(".struct_ops")
+struct tcp_congestion_ops cubic = {
+	.init		= (void *)bpf_cubic_init,
+	.ssthresh	= (void *)bpf_cubic_recalc_ssthresh,
+	.cong_control	= (void *)bpf_cubic_cong_control,
+	.set_state	= (void *)bpf_cubic_state,
+	.undo_cwnd	= (void *)bpf_cubic_undo_cwnd,
+	.cwnd_event	= (void *)bpf_cubic_cwnd_event,
+	.pkts_acked     = (void *)bpf_cubic_acked,
+	.name		= "bpf_cubic",
+};
diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
index 7001965d1cc3..f9ec630dfcd5 100644
--- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
+++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
@@ -80,6 +80,14 @@ 
 #define TCP_INFINITE_SSTHRESH	0x7fffffff
 #define TCP_PINGPONG_THRESH	3
 
+#define FLAG_DATA_ACKED 0x04 /* This ACK acknowledged new data.		*/
+#define FLAG_SYN_ACKED 0x10 /* This ACK acknowledged SYN.		*/
+#define FLAG_DATA_SACKED 0x20 /* New SACK.				*/
+#define FLAG_SND_UNA_ADVANCED \
+	0x400 /* Snd_una was changed (!= FLAG_DATA_ACKED) */
+#define FLAG_ACKED (FLAG_DATA_ACKED | FLAG_SYN_ACKED)
+#define FLAG_FORWARD_PROGRESS (FLAG_ACKED | FLAG_DATA_SACKED)
+
 #define fib_nh_dev		nh_common.nhc_dev
 #define fib_nh_gw_family	nh_common.nhc_gw_family
 #define fib_nh_gw6		nh_common.nhc_gw.ipv6
@@ -119,4 +127,6 @@ 
 #define tw_v6_daddr		__tw_common.skc_v6_daddr
 #define tw_v6_rcv_saddr		__tw_common.skc_v6_rcv_saddr
 
+#define tcp_jiffies32 ((__u32)bpf_jiffies64())
+
 #endif