From patchwork Thu May 2 04:23:16 2024
X-Patchwork-Submitter: Miao Xu
X-Patchwork-Id: 13651262
From: Miao Xu
To: Eric Dumazet , "David S . Miller" , Jakub Kicinski , Paolo Abeni , David Ahern , Martin Lau
CC: , , Miao Xu
Subject: [PATCH net-next v3 1/3] tcp: Add new args for cong_control in tcp_congestion_ops
Date: Wed, 1 May 2024 21:23:16 -0700
Message-ID: <20240502042318.801932-2-miaxu@meta.com>
In-Reply-To: <20240502042318.801932-1-miaxu@meta.com>
References: <20240502042318.801932-1-miaxu@meta.com>
X-Mailing-List: bpf@vger.kernel.org

This patch adds two new arguments to cong_control in struct
tcp_congestion_ops:
 - ack
 - flag

Both arguments are passed through from the caller, tcp_cong_control()
in tcp_input.c. One use case is updating cwnd and pacing rate inside
cong_control based on the information they carry. For example, flag can
be used to decide whether it is the right time to raise or reduce the
sender's cwnd.

Reviewed-by: Eric Dumazet
---
Changes in v3:
* Fixed the broken selftest

Changes in v2:
* Split the v1 patch into 2 separate patches. In particular, spin out
  bpf_tcp_ca.c as a separate patch because it is bpf specific.

Signed-off-by: Miao Xu
---
 include/net/tcp.h                                | 2 +-
 net/ipv4/bpf_tcp_ca.c                            | 3 ++-
 net/ipv4/tcp_bbr.c                               | 2 +-
 net/ipv4/tcp_input.c                             | 2 +-
 tools/testing/selftests/bpf/progs/tcp_ca_kfunc.c | 6 +++---
 5 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fe98fb01879b..7294da8fb780 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1172,7 +1172,7 @@ struct tcp_congestion_ops {
 	/* call when packets are delivered to update cwnd and pacing rate,
 	 * after all the ca_state processing. (optional)
 	 */
-	void (*cong_control)(struct sock *sk, const struct rate_sample *rs);
+	void (*cong_control)(struct sock *sk, u32 ack, int flag, const struct rate_sample *rs);
 
 
 	/* new value of cwnd after loss (required) */
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 7f518ea5f4ac..6bd7f8db189a 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -307,7 +307,8 @@ static u32 bpf_tcp_ca_min_tso_segs(struct sock *sk)
 	return 0;
 }
 
-static void bpf_tcp_ca_cong_control(struct sock *sk, const struct rate_sample *rs)
+static void bpf_tcp_ca_cong_control(struct sock *sk, u32 ack, int flag,
+				    const struct rate_sample *rs)
 {
 }
 
diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index 7e52ab24e40a..760941e55153 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -1024,7 +1024,7 @@ static void bbr_update_model(struct sock *sk, const struct rate_sample *rs)
 	bbr_update_gains(sk);
 }
 
-__bpf_kfunc static void bbr_main(struct sock *sk, const struct rate_sample *rs)
+__bpf_kfunc static void bbr_main(struct sock *sk, u32 ack, int flag, const struct rate_sample *rs)
 {
 	struct bbr *bbr = inet_csk_ca(sk);
 	u32 bw;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 53e1150f706f..23ccfc7b1d3c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3541,7 +3541,7 @@ static void tcp_cong_control(struct sock *sk, u32 ack, u32 acked_sacked,
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 
 	if (icsk->icsk_ca_ops->cong_control) {
-		icsk->icsk_ca_ops->cong_control(sk, rs);
+		icsk->icsk_ca_ops->cong_control(sk, ack, flag, rs);
 		return;
 	}
 
diff --git a/tools/testing/selftests/bpf/progs/tcp_ca_kfunc.c b/tools/testing/selftests/bpf/progs/tcp_ca_kfunc.c
index fcfbfe0336b4..52b610357309 100644
--- a/tools/testing/selftests/bpf/progs/tcp_ca_kfunc.c
+++ b/tools/testing/selftests/bpf/progs/tcp_ca_kfunc.c
@@ -5,7 +5,7 @@
 #include
 
 extern void bbr_init(struct sock *sk) __ksym;
-extern void bbr_main(struct sock *sk, const struct rate_sample *rs) __ksym;
+extern void bbr_main(struct sock *sk, u32 ack, int flag, const struct rate_sample *rs) __ksym;
 extern u32 bbr_sndbuf_expand(struct sock *sk) __ksym;
 extern u32 bbr_undo_cwnd(struct sock *sk) __ksym;
 extern void bbr_cwnd_event(struct sock *sk, enum tcp_ca_event event) __ksym;
@@ -42,9 +42,9 @@ void BPF_PROG(in_ack_event, struct sock *sk, u32 flags)
 }
 
 SEC("struct_ops/cong_control")
-void BPF_PROG(cong_control, struct sock *sk, const struct rate_sample *rs)
+void BPF_PROG(cong_control, struct sock *sk, u32 ack, int flag, const struct rate_sample *rs)
 {
-	bbr_main(sk, rs);
+	bbr_main(sk, ack, flag, rs);
 }
 
 SEC("struct_ops/cong_avoid")

From patchwork Thu May 2 04:23:17 2024
X-Patchwork-Submitter: Miao Xu
X-Patchwork-Id: 13651260
From: Miao Xu
To: Eric Dumazet , "David S . Miller" , Jakub Kicinski , Paolo Abeni , David Ahern , Martin Lau
CC: , , Miao Xu
Subject: [PATCH net-next v3 2/3] bpf: tcp: Allow to write tp->snd_cwnd_stamp in bpf_tcp_ca
Date: Wed, 1 May 2024 21:23:17 -0700
Message-ID: <20240502042318.801932-3-miaxu@meta.com>
In-Reply-To: <20240502042318.801932-1-miaxu@meta.com>
References: <20240502042318.801932-1-miaxu@meta.com>
X-Mailing-List: bpf@vger.kernel.org

This patch allows a bpf tcp ca program to write tp->snd_cwnd_stamp.
A use case is recording the time at which tp->snd_cwnd is raised or
reduced inside the `cong_control` callback.

Reviewed-by: Eric Dumazet
---
Changes in v3:
* Updated the title.

Changes in v2:
* None. It is a spinout from the original 1st patch.
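For readers unfamiliar with bpf_tcp_ca_btf_struct_access(): judging from the hunk's context lines, it switches on the write offset, picks the end of the member starting at that offset with offsetofend(), and then (as far as I can tell) rejects writes that would run past that end. The userspace sketch below shows that allowlist pattern in isolation. struct mock_tcp_sock, check_write() and the local offsetofend() macro are hypothetical illustrations, not kernel APIs, and the in-kernel version likely also logs the rejection reason through the verifier log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tcp_sock, reduced to a few fields. */
struct mock_tcp_sock {
	unsigned int snd_cwnd_cnt;
	unsigned int snd_cwnd_stamp;	/* mirrors the field this patch opens up */
	unsigned int snd_ssthresh;
	unsigned int srtt_us;		/* deliberately NOT writable in this sketch */
};

/* Offset of the first byte after MEMBER, like the kernel's offsetofend(). */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Return 0 if writing `size` bytes at `off` stays inside one allowed
 * member, or -EACCES otherwise. */
static int check_write(size_t off, size_t size)
{
	size_t end;

	assert(size > 0);

	switch (off) {
	case offsetof(struct mock_tcp_sock, snd_cwnd_cnt):
		end = offsetofend(struct mock_tcp_sock, snd_cwnd_cnt);
		break;
	case offsetof(struct mock_tcp_sock, snd_cwnd_stamp):
		end = offsetofend(struct mock_tcp_sock, snd_cwnd_stamp);
		break;
	case offsetof(struct mock_tcp_sock, snd_ssthresh):
		end = offsetofend(struct mock_tcp_sock, snd_ssthresh);
		break;
	default:
		/* Not the start of an allowed member. */
		return -EACCES;
	}

	/* Reject writes that would spill into the next member. */
	if (off + size > end)
		return -EACCES;
	return 0;
}
```

With this layout, a sizeof(unsigned int) write at offsetof(struct mock_tcp_sock, snd_cwnd_stamp) is accepted, while a write twice that size at the same offset, or any write at srtt_us, returns -EACCES. Adding a field to the allowlist is then just one more case label, which is the shape of the three-line change in this patch.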
Signed-off-by: Miao Xu
---
 net/ipv4/bpf_tcp_ca.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 6bd7f8db189a..18227757ec0c 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -107,6 +107,9 @@ static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log,
 	case offsetof(struct tcp_sock, snd_cwnd_cnt):
 		end = offsetofend(struct tcp_sock, snd_cwnd_cnt);
 		break;
+	case offsetof(struct tcp_sock, snd_cwnd_stamp):
+		end = offsetofend(struct tcp_sock, snd_cwnd_stamp);
+		break;
 	case offsetof(struct tcp_sock, snd_ssthresh):
 		end = offsetofend(struct tcp_sock, snd_ssthresh);
 		break;

From patchwork Thu May 2 04:23:18 2024
X-Patchwork-Submitter: Miao Xu
X-Patchwork-Id: 13651261
From: Miao Xu
To: Eric Dumazet , "David S . Miller" , Jakub Kicinski , Paolo Abeni , David Ahern , Martin Lau
CC: , , Miao Xu
Subject: [PATCH net-next v3 3/3] selftests/bpf: Add test for the use of new args in cong_control
Date: Wed, 1 May 2024 21:23:18 -0700
Message-ID: <20240502042318.801932-4-miaxu@meta.com>
In-Reply-To: <20240502042318.801932-1-miaxu@meta.com>
References: <20240502042318.801932-1-miaxu@meta.com>
X-Mailing-List: bpf@vger.kernel.org

This patch adds a selftest that shows how the new arguments of
cong_control can be used. For simplicity's sake, the test program
reuses cubic's kernel functions.
---
Changes in v3:
* Renamed the selftest file and the bpf struct_ops' name.
* Minor changes such as removing unused comments.

Changes in v2:
* Added highlights to explain major differences between the bpf program
  and tcp_cubic.c.
* bpf_tcp_helpers.h should not be further extended, so remove the
  dependency on this file. Use vmlinux.h instead.
* Minor changes such as indentation.

Signed-off-by: Miao Xu
---
 .../selftests/bpf/progs/bpf_cc_cubic.c        | 206 ++++++++++++++++++
 .../selftests/bpf/progs/bpf_tracing_net.h     |  10 +
 2 files changed, 216 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_cc_cubic.c

diff --git a/tools/testing/selftests/bpf/progs/bpf_cc_cubic.c b/tools/testing/selftests/bpf/progs/bpf_cc_cubic.c
new file mode 100644
index 000000000000..e37868c05794
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_cc_cubic.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Highlights:
+ * 1. The major difference between this bpf program and tcp_cubic.c
+ *    is that this bpf program relies on `cong_control` rather than
+ *    `cong_avoid` in the struct tcp_congestion_ops.
+ * 2. Logic such as tcp_cwnd_reduction, tcp_cong_avoid, and
+ *    tcp_update_pacing_rate is bypassed when `cong_control` is
+ *    defined, so that logic is reimplemented in `cong_control`.
+ * 3. WARNING: This bpf program is NOT the same as tcp_cubic.c.
+ *    The main purpose is to show use cases of the arguments in
+ *    `cong_control`. For simplicity's sake, it reuses tcp cubic's
+ *    kernel functions.
+ */
+
+#include "vmlinux.h"
+
+#include
+#include
+#include "bpf_tracing_net.h"
+
+#define BPF_STRUCT_OPS(name, args...) \
+SEC("struct_ops/"#name) \
+BPF_PROG(name, args)
+
+
+#define min(a, b) ((a) < (b) ? (a) : (b))
+#define max(a, b) ((a) > (b) ? (a) : (b))
+
+static __always_inline struct inet_connection_sock *inet_csk(const struct sock *sk)
+{
+	return (struct inet_connection_sock *)sk;
+}
+
+static __always_inline struct tcp_sock *tcp_sk(const struct sock *sk)
+{
+	return (struct tcp_sock *)sk;
+}
+
+static __always_inline bool before(__u32 seq1, __u32 seq2)
+{
+	return (__s32)(seq1-seq2) < 0;
+}
+#define after(seq2, seq1) before(seq1, seq2)
+
+char _license[] SEC("license") = "GPL";
+
+extern void cubictcp_init(struct sock *sk) __ksym;
+extern void cubictcp_cwnd_event(struct sock *sk, enum tcp_ca_event event) __ksym;
+extern __u32 cubictcp_recalc_ssthresh(struct sock *sk) __ksym;
+extern void cubictcp_state(struct sock *sk, __u8 new_state) __ksym;
+extern __u32 tcp_reno_undo_cwnd(struct sock *sk) __ksym;
+extern void cubictcp_acked(struct sock *sk, const struct ack_sample *sample) __ksym;
+extern void cubictcp_cong_avoid(struct sock *sk, __u32 ack, __u32 acked) __ksym;
+
+
+void BPF_STRUCT_OPS(bpf_cubic_init, struct sock *sk)
+{
+	cubictcp_init(sk);
+}
+
+void BPF_STRUCT_OPS(bpf_cubic_cwnd_event, struct sock *sk, enum tcp_ca_event event)
+{
+	cubictcp_cwnd_event(sk, event);
+}
+
+#define USEC_PER_SEC 1000000UL
+#define TCP_PACING_SS_RATIO (200)
+#define TCP_PACING_CA_RATIO (120)
+#define TCP_REORDERING (12)
+#define likely(x) (__builtin_expect(!!(x), 1))
+
+static __always_inline __u64 div64_u64(__u64 dividend, __u64 divisor)
+{
+	return dividend / divisor;
+}
+
+static __always_inline void tcp_update_pacing_rate(struct sock *sk)
+{
+	const struct tcp_sock *tp = tcp_sk(sk);
+	__u64 rate;
+
+	/* set sk_pacing_rate to 200 % of current rate (mss * cwnd / srtt) */
+	rate = (__u64)tp->mss_cache * ((USEC_PER_SEC / 100) << 3);
+
+	/* current rate is (cwnd * mss) / srtt
+	 * In Slow Start [1], set sk_pacing_rate to 200 % the current rate.
+	 * In Congestion Avoidance phase, set it to 120 % the current rate.
+	 *
+	 * [1] : Normal Slow Start condition is (tp->snd_cwnd < tp->snd_ssthresh)
+	 *	 If snd_cwnd >= (tp->snd_ssthresh / 2), we are approaching
+	 *	 end of slow start and should slow down.
+	 */
+	if (tp->snd_cwnd < tp->snd_ssthresh / 2)
+		rate *= TCP_PACING_SS_RATIO;
+	else
+		rate *= TCP_PACING_CA_RATIO;
+
+	rate *= max(tp->snd_cwnd, tp->packets_out);
+
+	if (likely(tp->srtt_us))
+		rate = div64_u64(rate, (__u64)tp->srtt_us);
+
+	sk->sk_pacing_rate = min(rate, sk->sk_max_pacing_rate);
+}
+
+static __always_inline void tcp_cwnd_reduction(
+		struct sock *sk,
+		int newly_acked_sacked,
+		int newly_lost,
+		int flag) {
+	struct tcp_sock *tp = tcp_sk(sk);
+	int sndcnt = 0;
+	__u32 pkts_in_flight = tp->packets_out - (tp->sacked_out + tp->lost_out) + tp->retrans_out;
+	int delta = tp->snd_ssthresh - pkts_in_flight;
+
+	if (newly_acked_sacked <= 0 || !tp->prior_cwnd)
+		return;
+
+	__u32 prr_delivered = tp->prr_delivered + newly_acked_sacked;
+
+	if (delta < 0) {
+		__u64 dividend =
+			(__u64)tp->snd_ssthresh * prr_delivered + tp->prior_cwnd - 1;
+		sndcnt = (__u32)div64_u64(dividend, (__u64)tp->prior_cwnd) - tp->prr_out;
+	} else {
+		sndcnt = max(prr_delivered - tp->prr_out, newly_acked_sacked);
+		if (flag & FLAG_SND_UNA_ADVANCED && !newly_lost)
+			sndcnt++;
+		sndcnt = min(delta, sndcnt);
+	}
+	/* Force a fast retransmit upon entering fast recovery */
+	sndcnt = max(sndcnt, (tp->prr_out ? 0 : 1));
+	tp->snd_cwnd = pkts_in_flight + sndcnt;
+}
+
+/* Decide whether to run the increase function of congestion control. */
+static __always_inline bool tcp_may_raise_cwnd(
+		const struct sock *sk,
+		const int flag) {
+	if (tcp_sk(sk)->reordering > TCP_REORDERING)
+		return flag & FLAG_FORWARD_PROGRESS;
+
+	return flag & FLAG_DATA_ACKED;
+}
+
+void BPF_STRUCT_OPS(bpf_cubic_cong_control, struct sock *sk, __u32 ack, int flag,
+		const struct rate_sample *rs)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (((1<<TCP_CA_CWR) | (1<<TCP_CA_Recovery)) &
+			(1<<inet_csk(sk)->icsk_ca_state)) {
+		/* Reduce cwnd if state mandates */
+		tcp_cwnd_reduction(sk, rs->acked_sacked, rs->losses, flag);
+
+		if (!before(tp->snd_una, tp->high_seq)) {
+			/* Reset cwnd to ssthresh in CWR or Recovery (unless it's undone) */
+			if (tp->snd_ssthresh < TCP_INFINITE_SSTHRESH &&
+					inet_csk(sk)->icsk_ca_state == TCP_CA_CWR) {
+				tp->snd_cwnd = tp->snd_ssthresh;
+				tp->snd_cwnd_stamp = tcp_jiffies32;
+			}
+		}
+	} else if (tcp_may_raise_cwnd(sk, flag)) {
+		/* Advance cwnd if state allows */
+		cubictcp_cong_avoid(sk, ack, rs->acked_sacked);
+		tp->snd_cwnd_stamp = tcp_jiffies32;
+	}
+
+	tcp_update_pacing_rate(sk);
+}
+
+__u32 BPF_STRUCT_OPS(bpf_cubic_recalc_ssthresh, struct sock *sk)
+{
+	return cubictcp_recalc_ssthresh(sk);
+}
+
+void BPF_STRUCT_OPS(bpf_cubic_state, struct sock *sk, __u8 new_state)
+{
+	cubictcp_state(sk, new_state);
+}
+
+void BPF_STRUCT_OPS(bpf_cubic_acked, struct sock *sk,
+		const struct ack_sample *sample)
+{
+	cubictcp_acked(sk, sample);
+}
+
+__u32 BPF_STRUCT_OPS(bpf_cubic_undo_cwnd, struct sock *sk)
+{
+	return tcp_reno_undo_cwnd(sk);
+}
+
+
+SEC(".struct_ops")
+struct tcp_congestion_ops cubic = {
+	.init = (void *)bpf_cubic_init,
+	.ssthresh = (void *)bpf_cubic_recalc_ssthresh,
+	.cong_control = (void *)bpf_cubic_cong_control,
+	.set_state = (void *)bpf_cubic_state,
+	.undo_cwnd = (void *)bpf_cubic_undo_cwnd,
+	.cwnd_event = (void *)bpf_cubic_cwnd_event,
+	.pkts_acked = (void *)bpf_cubic_acked,
+	.name = "bpf_cc_cubic",
+};
diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
index 7001965d1cc3..f9ec630dfcd5 100644
--- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
+++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
@@ -80,6 +80,14 @@
 #define TCP_INFINITE_SSTHRESH 0x7fffffff
 #define TCP_PINGPONG_THRESH 3
 
+#define FLAG_DATA_ACKED 0x04 /* This ACK acknowledged new data. */
+#define FLAG_SYN_ACKED 0x10 /* This ACK acknowledged SYN. */
+#define FLAG_DATA_SACKED 0x20 /* New SACK. */
+#define FLAG_SND_UNA_ADVANCED \
+	0x400 /* Snd_una was changed (!= FLAG_DATA_ACKED) */
+#define FLAG_ACKED (FLAG_DATA_ACKED | FLAG_SYN_ACKED)
+#define FLAG_FORWARD_PROGRESS (FLAG_ACKED | FLAG_DATA_SACKED)
+
 #define fib_nh_dev nh_common.nhc_dev
 #define fib_nh_gw_family nh_common.nhc_gw_family
 #define fib_nh_gw6 nh_common.nhc_gw.ipv6
@@ -119,4 +127,6 @@
 #define tw_v6_daddr __tw_common.skc_v6_daddr
 #define tw_v6_rcv_saddr __tw_common.skc_v6_rcv_saddr
 
+#define tcp_jiffies32 ((__u32)bpf_jiffies64())
+
 #endif
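The arithmetic in the selftest's tcp_update_pacing_rate() is easy to misread: tp->srtt_us holds the smoothed RTT in microseconds shifted left by 3 (8 * SRTT), which the `<< 3` in the initial multiplier cancels, while USEC_PER_SEC / 100 turns the percentage ratios (200 and 120) into a per-second rate. The net result is mss * max(cwnd, packets_out) / SRTT * ratio / 100 bytes per second. Below is a userspace sketch of the same computation; pacing_rate() and its parameter names are hypothetical, chosen only for illustration, and the example inputs are made-up values.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL
#define TCP_PACING_SS_RATIO 200	/* percent, used early in slow start */
#define TCP_PACING_CA_RATIO 120	/* percent, used otherwise */

/* Hypothetical userspace copy of the rate computation in the selftest's
 * tcp_update_pacing_rate(). srtt_x8_us is 8 * SRTT in microseconds, the
 * same scaling as tp->srtt_us. Returns bytes per second. */
static uint64_t pacing_rate(uint32_t mss, uint32_t cwnd, uint32_t packets_out,
			    uint32_t ssthresh, uint32_t srtt_x8_us,
			    uint64_t max_rate)
{
	/* mss * 10^4 * 8: the 10^4 pairs with the percent ratio, the 8
	 * cancels the << 3 stored in srtt_x8_us. */
	uint64_t rate = (uint64_t)mss * ((USEC_PER_SEC / 100) << 3);

	if (cwnd < ssthresh / 2)
		rate *= TCP_PACING_SS_RATIO;
	else
		rate *= TCP_PACING_CA_RATIO;

	rate *= cwnd > packets_out ? cwnd : packets_out;

	/* Skip the division when no RTT sample exists yet. */
	if (srtt_x8_us)
		rate /= srtt_x8_us;

	return rate < max_rate ? rate : max_rate;
}
```

For example, with mss = 1448, cwnd = packets_out = 10 and a 10 ms SRTT (srtt_x8_us = 80000), the flow currently delivers 10 * 1448 B / 0.01 s = 1,448,000 B/s. With ssthresh = 100 (still early in slow start) the sketch returns 2,896,000 B/s, i.e. 200 %; with ssthresh = 10 it returns 1,737,600 B/s, i.e. 120 %. A smaller max_rate caps the result, just as sk_max_pacing_rate does in the program.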