diff mbox series

[net] tcp: Set ECT0 bit in tos/tclass for synack when BPF needs ECN

Message ID 160593039663.2604.1374502006916871573.stgit@localhost.localdomain (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Headers show
Series [net] tcp: Set ECT0 bit in tos/tclass for synack when BPF needs ECN | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 15 this patch: 15
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 39 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 15 this patch: 15
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Alexander Duyck Nov. 21, 2020, 3:47 a.m. UTC
From: Alexander Duyck <alexanderduyck@fb.com>

When a BPF program is used to select between a type of TCP congestion
control algorithm that uses either ECN or not there is a case where the
synack for the frame was coming up without the ECT0 bit set. A bit of
research found that this was due to the final socket being configured to
dctcp while the listener socket was staying in cubic.

To reproduce it all that is needed is to monitor TCP traffic while running
the sample bpf program "samples/bpf/tcp_cong_kern.c". What is observed,
assuming tcp_dctcp module is loaded or compiled in and the traffic matches
the rules in the sample file, is that for all frames with the exception of
the synack the ECT0 bit is set.

To address that it is necessary to make one additional call to
tcp_bpf_ca_needs_ecn using the request socket and then use the output of
that to set the ECT0 bit for the tos/tclass of the packet.

Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 net/ipv4/tcp_ipv4.c |   12 ++++++++----
 net/ipv6/tcp_ipv6.c |    9 +++++++--
 2 files changed, 15 insertions(+), 6 deletions(-)

Comments

Jakub Kicinski Nov. 24, 2020, 10:14 p.m. UTC | #1
On Fri, 20 Nov 2020 19:47:44 -0800 Alexander Duyck wrote:
> From: Alexander Duyck <alexanderduyck@fb.com>
> 
> When a BPF program is used to select between a type of TCP congestion
> control algorithm that uses either ECN or not there is a case where the
> synack for the frame was coming up without the ECT0 bit set. A bit of
> research found that this was due to the final socket being configured to
> dctcp while the listener socket was staying in cubic.
> 
> To reproduce it all that is needed is to monitor TCP traffic while running
> the sample bpf program "samples/bpf/tcp_cong_kern.c". What is observed,
> assuming tcp_dctcp module is loaded or compiled in and the traffic matches
> the rules in the sample file, is that for all frames with the exception of
> the synack the ECT0 bit is set.
> 
> To address that it is necessary to make one additional call to
> tcp_bpf_ca_needs_ecn using the request socket and then use the output of
> that to set the ECT0 bit for the tos/tclass of the packet.
> 
> Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>

Applied, thank you!
diff mbox series

Patch

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index c5f8b686aa82..d20d779b18f3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -980,13 +980,17 @@  static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
 
 	skb = tcp_make_synack(sk, dst, req, foc, synack_type, syn_skb);
 
-	tos = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ?
-			tcp_rsk(req)->syn_tos & ~INET_ECN_MASK :
-			inet_sk(sk)->tos;
-
 	if (skb) {
 		__tcp_v4_send_check(skb, ireq->ir_loc_addr, ireq->ir_rmt_addr);
 
+		tos = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ?
+				tcp_rsk(req)->syn_tos & ~INET_ECN_MASK :
+				inet_sk(sk)->tos;
+
+		if (!INET_ECN_is_capable(tos) &&
+		    tcp_bpf_ca_needs_ecn((struct sock *)req))
+			tos |= INET_ECN_ECT_0;
+
 		rcu_read_lock();
 		err = ip_build_and_send_pkt(skb, sk, ireq->ir_loc_addr,
 					    ireq->ir_rmt_addr,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 3d49e8d0afee..e2b360e9bec4 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -527,11 +527,16 @@  static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
 		if (np->repflow && ireq->pktopts)
 			fl6->flowlabel = ip6_flowlabel(ipv6_hdr(ireq->pktopts));
 
-		rcu_read_lock();
-		opt = ireq->ipv6_opt;
 		tclass = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ?
 				tcp_rsk(req)->syn_tos & ~INET_ECN_MASK :
 				np->tclass;
+
+		if (!INET_ECN_is_capable(tclass) &&
+		    tcp_bpf_ca_needs_ecn((struct sock *)req))
+			tclass |= INET_ECN_ECT_0;
+
+		rcu_read_lock();
+		opt = ireq->ipv6_opt;
 		if (!opt)
 			opt = rcu_dereference(np->opt);
 		err = ip6_xmit(sk, skb, fl6, sk->sk_mark, opt,