[v2] tcp: adjust rcvbuff according copied rate of user space

Message ID 1660117763-38322-1-git-send-email-liyonglong@chinatelecom.cn (mailing list archive)
State Rejected
Delegated to: Netdev Maintainers
Series [v2] tcp: adjust rcvbuff according copied rate of user space

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2195 this patch: 2195
netdev/cc_maintainers warning 1 maintainers not CCed: yoshfuji@linux-ipv6.org
netdev/build_clang success Errors and warnings before: 579 this patch: 579
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 2323 this patch: 2323
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 53 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

YonglongLi Aug. 10, 2022, 7:49 a.m. UTC
tcp_rcv_space_adjust() is called every time data is copied to user space.
Currently it adjusts the receive buffer based on the length of the data
copied to user space. If the interval at which user space copies data
from the socket is not stable, the length of the copied data does not
accurately reflect how fast data is being drained from the receive
buffer. It is therefore more reasonable for tcp_rcv_space_adjust() to
adjust the receive buffer based on the copied rate (length of copied
data / interval) instead of the copied data length.

I tested this patch in a simulated environment with Mininet:
with an 80~120ms RTT / 1% loss link, 100 runs
of (netperf -t TCP_STREAM -l 5) got an average throughput
of 17715 Kbit instead of 17703 Kbit.
With an 80~120ms RTT link without loss, 100 runs of (netperf -t
TCP_STREAM -l 5) got an average throughput of 18272 Kbit
instead of 18248 Kbit.

Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
---
 include/linux/tcp.h  |  1 +
 net/ipv4/tcp_input.c | 16 +++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)
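
For illustration, here is a minimal user-space sketch of the comparison the
commit message describes: decide whether user space drained the receive queue
faster than in the previous measurement, instead of comparing raw byte counts.
The helper name and standalone form are illustrative only; this is not the
kernel code.

#include <stdbool.h>
#include <stdint.h>

/* Decide whether the receiver drained its queue faster than in the previous
 * measurement. 'copied' is bytes copied to user space in this interval,
 * 'interval_us' is the elapsed time in microseconds, and '*prior_rate' is
 * the rate (bytes/sec) seen in the previous interval (updated on return).
 */
static bool drained_faster(uint64_t copied, uint64_t interval_us,
                           uint64_t *prior_rate)
{
        uint64_t rate;
        bool faster;

        if (!interval_us)
                return false;

        rate = copied * 1000000ULL / interval_us;   /* bytes per second */

        faster = *prior_rate && rate > *prior_rate;
        *prior_rate = rate;

        return faster;
}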

Comments

Neal Cardwell Aug. 10, 2022, 12:43 p.m. UTC | #1
On Wed, Aug 10, 2022 at 3:49 AM Yonglong Li <liyonglong@chinatelecom.cn> wrote:
>
> tcp_rcv_space_adjust() is called every time data is copied to user space.
> Currently it adjusts the receive buffer based on the length of the data
> copied to user space. If the interval at which user space copies data
> from the socket is not stable, the length of the copied data does not
> accurately reflect how fast data is being drained from the receive
> buffer. It is therefore more reasonable for tcp_rcv_space_adjust() to
> adjust the receive buffer based on the copied rate (length of copied
> data / interval) instead of the copied data length.
>
> I tested this patch in a simulated environment with Mininet:
> with an 80~120ms RTT / 1% loss link, 100 runs
> of (netperf -t TCP_STREAM -l 5) got an average throughput
> of 17715 Kbit instead of 17703 Kbit.
> With an 80~120ms RTT link without loss, 100 runs of (netperf -t
> TCP_STREAM -l 5) got an average throughput of 18272 Kbit
> instead of 18248 Kbit.

So with 1% emulated loss that's a 0.06% throughput improvement and
without emulated loss that's a 0.13% improvement. That sounds like it
may well be statistical noise, particularly given that we would expect
the steady-state impact of this change to be negligible.

IMHO these results do not justify the added complexity and state.

best regards,
neal
YonglongLi Aug. 11, 2022, 8:14 a.m. UTC | #2
On 8/10/2022 8:43 PM, Neal Cardwell wrote:
> On Wed, Aug 10, 2022 at 3:49 AM Yonglong Li <liyonglong@chinatelecom.cn> wrote:
>>
>> tcp_rcv_space_adjust() is called every time data is copied to user space.
>> Currently it adjusts the receive buffer based on the length of the data
>> copied to user space. If the interval at which user space copies data
>> from the socket is not stable, the length of the copied data does not
>> accurately reflect how fast data is being drained from the receive
>> buffer. It is therefore more reasonable for tcp_rcv_space_adjust() to
>> adjust the receive buffer based on the copied rate (length of copied
>> data / interval) instead of the copied data length.
>>
>> I tested this patch in a simulated environment with Mininet:
>> with an 80~120ms RTT / 1% loss link, 100 runs
>> of (netperf -t TCP_STREAM -l 5) got an average throughput
>> of 17715 Kbit instead of 17703 Kbit.
>> With an 80~120ms RTT link without loss, 100 runs of (netperf -t
>> TCP_STREAM -l 5) got an average throughput of 18272 Kbit
>> instead of 18248 Kbit.
> 
> So with 1% emulated loss that's a 0.06% throughput improvement and
> without emulated loss that's a 0.13% improvement. That sounds like it
> may well be statistical noise, particularly given that we would expect
> the steady-state impact of this change to be negligible.
> 
Hi Neal,

Thank you for your feedback.
I don't think the improvement is statistical noise, because I get a small
improvement with the patch every time I test.
Neal Cardwell Aug. 11, 2022, 1:26 p.m. UTC | #3
On Thu, Aug 11, 2022 at 4:15 AM Yonglong Li <liyonglong@chinatelecom.cn> wrote:
>
>
>
> On 8/10/2022 8:43 PM, Neal Cardwell wrote:
> > On Wed, Aug 10, 2022 at 3:49 AM Yonglong Li <liyonglong@chinatelecom.cn> wrote:
> >>
> >> tcp_rcv_space_adjust() is called every time data is copied to user space.
> >> Currently it adjusts the receive buffer based on the length of the data
> >> copied to user space. If the interval at which user space copies data
> >> from the socket is not stable, the length of the copied data does not
> >> accurately reflect how fast data is being drained from the receive
> >> buffer. It is therefore more reasonable for tcp_rcv_space_adjust() to
> >> adjust the receive buffer based on the copied rate (length of copied
> >> data / interval) instead of the copied data length.
> >>
> >> I tested this patch in a simulated environment with Mininet:
> >> with an 80~120ms RTT / 1% loss link, 100 runs
> >> of (netperf -t TCP_STREAM -l 5) got an average throughput
> >> of 17715 Kbit instead of 17703 Kbit.
> >> With an 80~120ms RTT link without loss, 100 runs of (netperf -t
> >> TCP_STREAM -l 5) got an average throughput of 18272 Kbit
> >> instead of 18248 Kbit.
> >
> > So with 1% emulated loss that's a 0.06% throughput improvement and
> > without emulated loss that's a 0.13% improvement. That sounds like it
> > may well be statistical noise, particularly given that we would expect
> > the steady-state impact of this change to be negligible.
> >
> Hi Neal,
>
> Thank you for your feedback.
> I don't think the improvement is statistical noise, because I get a small
> improvement with the patch every time I test.

Interesting. To help us all understand the dynamics, can you please
share a sender-side tcpdump binary .pcap trace of the emulated tests
without loss, with:

(a) one baseline pcap of a test without the patch, and

(b) one experimental pcap of a test with the patch showing the roughly
0.13% throughput improvement.

It will be interesting to compare the receive window and transmit
behavior in both cases.

thanks,
neal
Patch

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index a9fbe22..18e091c 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -410,6 +410,7 @@  struct tcp_sock {
 		u32	space;
 		u32	seq;
 		u64	time;
+		u32	prior_rate;
 	} rcvq_space;
 
 /* TCP-specific MTU probe information. */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ab5f0ea..b97aa36 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -544,6 +544,7 @@  static void tcp_init_buffer_space(struct sock *sk)
 	tcp_mstamp_refresh(tp);
 	tp->rcvq_space.time = tp->tcp_mstamp;
 	tp->rcvq_space.seq = tp->copied_seq;
+	tp->rcvq_space.prior_rate = 0;
 
 	maxwin = tcp_full_space(sk);
 
@@ -701,6 +702,7 @@  static inline void tcp_rcv_rtt_measure_ts(struct sock *sk,
 void tcp_rcv_space_adjust(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	u64 pre_copied_rate, copied_rate;
 	u32 copied;
 	int time;
 
@@ -713,7 +715,14 @@  void tcp_rcv_space_adjust(struct sock *sk)
 
 	/* Number of bytes copied to user in last RTT */
 	copied = tp->copied_seq - tp->rcvq_space.seq;
-	if (copied <= tp->rcvq_space.space)
+	copied_rate = copied * USEC_PER_SEC;
+	do_div(copied_rate, time);
+	pre_copied_rate = tp->rcvq_space.prior_rate;
+	if (!tp->rcvq_space.prior_rate) {
+		pre_copied_rate = tp->rcvq_space.space * USEC_PER_SEC;
+		do_div(pre_copied_rate, time);
+	}
+	if (copied_rate <= pre_copied_rate || !pre_copied_rate)
 		goto new_measure;
 
 	/* A bit of theory :
@@ -736,8 +745,8 @@  void tcp_rcv_space_adjust(struct sock *sk)
 		rcvwin = ((u64)copied << 1) + 16 * tp->advmss;
 
 		/* Accommodate for sender rate increase (eg. slow start) */
-		grow = rcvwin * (copied - tp->rcvq_space.space);
-		do_div(grow, tp->rcvq_space.space);
+		grow = rcvwin * (copied_rate - pre_copied_rate);
+		do_div(grow, pre_copied_rate);
 		rcvwin += (grow << 1);
 
 		rcvmem = SKB_TRUESIZE(tp->advmss + MAX_TCP_HEADER);
@@ -755,6 +764,7 @@  void tcp_rcv_space_adjust(struct sock *sk)
 		}
 	}
 	tp->rcvq_space.space = copied;
+	tp->rcvq_space.prior_rate = copied_rate;
 
 new_measure:
 	tp->rcvq_space.seq = tp->copied_seq;
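
For reference, a stripped-down user-space rendering of the arithmetic the
patched tcp_rcv_space_adjust() performs. do_div() is replaced by plain
64-bit division, and the tcp_moderate_rcvbuf check, rcvmem/rcvbuf clamping
and the first-measurement fallback (deriving a prior rate from
rcvq_space.space) are omitted, so this only sketches the formula, not the
kernel code.

#include <stdint.h>

#define USEC_PER_SEC    1000000ULL

/* Return the new candidate receive window in bytes, or 0 when no growth is
 * warranted (the copy rate did not increase, or there is no valid prior
 * rate / elapsed time).
 */
static uint64_t rcvwin_candidate(uint64_t copied, uint64_t time_us,
                                 uint64_t prior_rate, uint32_t advmss)
{
        uint64_t copied_rate, rcvwin, grow;

        if (!time_us)
                return 0;

        /* bytes per second copied to user space in the last measurement */
        copied_rate = copied * USEC_PER_SEC / time_us;

        if (!prior_rate || copied_rate <= prior_rate)
                return 0;

        /* base window: twice the bytes copied in the last RTT plus slack */
        rcvwin = (copied << 1) + 16 * (uint64_t)advmss;

        /* accommodate a sender rate increase (e.g. slow start), now scaled
         * by the rate delta rather than by the byte-count delta
         */
        grow = rcvwin * (copied_rate - prior_rate) / prior_rate;
        rcvwin += grow << 1;

        return rcvwin;
}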