[RFC,v2,net-next] tcp: remove 64 KByte limit for initial tp->rcv_wnd value

Message ID 20240518025008.70689-1-kerneljasonxing@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [RFC,v2,net-next] tcp: remove 64 KByte limit for initial tp->rcv_wnd value

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 922 this patch: 922
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 927 this patch: 927
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 927 this patch: 927
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0

Commit Message

Jason Xing May 18, 2024, 2:50 a.m. UTC
From: Jason Xing <kernelxing@tencent.com>

In 2018, commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin
to around 64KB") limited the initial value of tp->rcv_wnd to 65535. As a
result, CDN workloads lost the ability to open a large initial receive
window for big payloads, which slows transfers down, especially on
long-RTT paths.

RFC 7323 says:
  "The maximum receive window, and therefore the scale factor, is
   determined by the maximum receive buffer space."

So we can drop this 64 KByte limitation and let the window be tunable, as
long as it stays within the receive buffer space. Many deployments can
then regain the behaviour they had before 2018. Besides, several papers
have run experiments involving this window and report good results in
some cases, for example the Yahoo! CDN study [1].

To avoid future confusion, this change does not affect the initial
receive window advertised on the wire in a SYN or SYN+ACK packet, which
stays within 65535 bytes per RFC 7323, thanks to the limit in
__tcp_transmit_skb():

    th->window      = htons(min(tp->rcv_wnd, 65535U));

In short, __tcp_transmit_skb() already ensures that constraint is
respected, no matter how large tp->rcv_wnd is.
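
For instance (illustration only, using the post-handshake value from the
example below): even if tp->rcv_wnd is 175232, the SYN and SYN+ACK still
carry

    th->window      = htons(min(175232U, 65535U));  /* 65535 on the wire */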

Here is an example with and without the patch:
Before:
client   --- SYN: rwindow=65535 ---> server
client   <--- SYN+ACK: rwindow=65535 ----  server
client   --- ACK: rwindow=65536 ---> server
Note: for the last ACK, the calculation is 512 << 7.

After:
client   --- SYN: rwindow=65535 ---> server
client   <--- SYN+ACK: rwindow=65535 ----  server
client   --- ACK: rwindow=175232 ---> server
Note: I used the following command to make it work:
ip route change default via [ip] dev eth0 metric 100 initrwnd 120
For the last ACK, the calculation is 1369 << 7.

Looking at the last ACK of the 3-way handshake, notice that with the
patch applied the window can exceed 64 KByte.
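
For reference, the scaled-window arithmetic above can be checked with a
tiny standalone C program (illustration only, not kernel code; the
wscale value of 7 is taken from the traces above):

    #include <stdio.h>

    int main(void)
    {
            unsigned int wscale = 7; /* window scale from the traces above */

            /* Before: rcv_wnd is capped at U16_MAX, so the last ACK
             * carries 512 window units: 512 << 7 = 65536 bytes.
             */
            printf("before: %u\n", 512U << wscale);

            /* After, with "initrwnd 120": 1369 window units,
             * 1369 << 7 = 175232 bytes.
             */
            printf("after:  %u\n", 1369U << wscale);
            return 0;
    }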

[1]: https://conferences.sigcomm.org/imc/2011/docs/p569.pdf

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
v2
Link: https://lore.kernel.org/all/20240517085031.18896-1-kerneljasonxing@gmail.com/
1. revise the title and body messages (Neal)
---
 net/ipv4/tcp_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Dumazet May 20, 2024, 4:51 p.m. UTC | #1
On Sat, May 18, 2024 at 4:50 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> In 2018 commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin
> to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most
> CDN team would not benefit from this change because they cannot have a
> large window to receive a big packet, which will be slowed down especially
> in long RTT.
>
> According to RFC 7323, it says:
>   "The maximum receive window, and therefore the scale factor, is
>    determined by the maximum receive buffer space."

This seems not relevant? The wscale factor is not changed in this patch,
and tp->rcv_wnd is not the maximum receive window either.

>
> So we can get rid of this 64k limitation and let the window be tunable if
> the user wants to do it within the control of buffer space. Then many
> companies, I believe, can have the same behaviour as old days.

Not sure this has ever worked, see below.

Also, the "many companies ..." mention has nothing to do in a changelog.


> Besides,
> there are many papers conducting various interesting experiments which
> have something to do with this window and show good outputs in some cases,
> say, paper [1] in Yahoo! CDN.

I think this changelog is trying hard to sell something, but in reality
the nature of the TCP 3WHS makes your claims wrong.

Instead, you should clearly document that this problem can _not_ be
solved for both active _and_ passive connections.

In the first RTT, a client (active connection) can not send more than
64KB, if TCP specs are properly applied.

>
> To avoid future confusion, current change doesn't affect the initial
> receive window on the wire in a SYN or SYN+ACK packet which are set within
> 65535 bytes according to RFC 7323 also due to the limit in
> __tcp_transmit_skb():
>
>     th->window      = htons(min(tp->rcv_wnd, 65535U));
>
> In one word, __tcp_transmit_skb() already ensures that constraint is
> respected, no matter how large tp->rcv_wnd is.
>
> Let me provide one example if with or without the patch:
> Before:
> client   --- SYN: rwindow=65535 ---> server
> client   <--- SYN+ACK: rwindow=65535 ----  server
> client   --- ACK: rwindow=65536 ---> server
> Note: for the last ACK, the calculation is 512 << 7.
>
> After:
> client   --- SYN: rwindow=65535 ---> server
> client   <--- SYN+ACK: rwindow=65535 ----  server
> client   --- ACK: rwindow=175232 ---> server
> Note: I use the following command to make it work:
> ip route change default via [ip] dev eth0 metric 100 initrwnd 120
> For the last ACK, the calculation is 1369 << 7.
>
> We can pay attention to the last ACK in 3-way shakehand and notice that
> with the patch applied the window can reach more than 64 KByte.

You carefully avoided mentioning the asymmetry.
I do not think this is needed in the changelog, because it only adds
confusion.

>
> [1]: https://conferences.sigcomm.org/imc/2011/docs/p569.pdf
>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> v2
> Link: https://lore.kernel.org/all/20240517085031.18896-1-kerneljasonxing@gmail.com/
> 1. revise the title and body messages (Neal)
> ---
>  net/ipv4/tcp_output.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 95caf8aaa8be..95618d0e78e4 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
>         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
>                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
>         else
> -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
> +               (*rcv_wnd) = space;

This is probably breaking some packetdrill tests, but your change might
[1] be good, especially because it allows DRS behavior to be consistent
for large MTU (e.g. MTU 9000) and bigger tcp_rmem[1], even without
playing with the initrwnd attribute.

"ss -temoi " would display after connection setup  rcv_space:89600
instead of a capped value.

[1] This is hard to say, DRS is full of surprises.
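
For illustration, a minimal standalone sketch (not the kernel code path)
of the capping difference when space works out to 89600:

    #include <stdio.h>

    int main(void)
    {
            unsigned int space = 89600; /* e.g. MTU 9000, bigger tcp_rmem[1] */
            unsigned int before = space < 65535U ? space : 65535U; /* old U16_MAX cap */
            unsigned int after  = space;                           /* cap removed */

            printf("before: %u after: %u\n", before, after);
            return 0;
    }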
Jason Xing May 21, 2024, 12:36 a.m. UTC | #2
Hello Eric,

On Tue, May 21, 2024 at 12:51 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, May 18, 2024 at 4:50 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > In 2018 commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin
> > to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most
> > CDN team would not benefit from this change because they cannot have a
> > large window to receive a big packet, which will be slowed down especially
> > in long RTT.
> >
> > According to RFC 7323, it says:
> >   "The maximum receive window, and therefore the scale factor, is
> >    determined by the maximum receive buffer space."
>
> This seems not relevant ?  wscale factor is not changed in this patch ?
> tp->rcv_wnd is also not the maximum receive window.

Thanks for your review.

I can remove this part. I was trying to show that I do not break the RFC.

>
> >
> > So we can get rid of this 64k limitation and let the window be tunable if
> > the user wants to do it within the control of buffer space. Then many
> > companies, I believe, can have the same behaviour as old days.
>
> Not sure this has ever worked, see below.
>
> Also, the "many companies ..." mention has nothing to do in a changelog.

Oh, I just copied/translated that from my initial notes on rcv_wnd,
gathered while reading many papers.

I can also remove this sentence.

>
>
> > Besides,
> > there are many papers conducting various interesting experiments which
> > have something to do with this window and show good outputs in some cases,
> > say, paper [1] in Yahoo! CDN.
>
> I think this changelog is trying hard to sell something, but in
> reality TCP 3WHS nature
> makes your claims wrong.
>
> Instead, you should clearly document that this problem can _not_ be
> solved for both
> active _and_ passive connections.
>
> In the first RTT, a client (active connection) can not send more than
> 64KB, if TCP specs
> are properly applied.

Letting the user tweak this knob to get a large rcv_wnd can help
transfer data more rapidly. I'm not referring to the first RTT.

>
> >
> > To avoid future confusion, current change doesn't affect the initial
> > receive window on the wire in a SYN or SYN+ACK packet which are set within
> > 65535 bytes according to RFC 7323 also due to the limit in
> > __tcp_transmit_skb():
> >
> >     th->window      = htons(min(tp->rcv_wnd, 65535U));
> >
> > In one word, __tcp_transmit_skb() already ensures that constraint is
> > respected, no matter how large tp->rcv_wnd is.
> >
> > Let me provide one example if with or without the patch:
> > Before:
> > client   --- SYN: rwindow=65535 ---> server
> > client   <--- SYN+ACK: rwindow=65535 ----  server
> > client   --- ACK: rwindow=65536 ---> server
> > Note: for the last ACK, the calculation is 512 << 7.
> >
> > After:
> > client   --- SYN: rwindow=65535 ---> server
> > client   <--- SYN+ACK: rwindow=65535 ----  server
> > client   --- ACK: rwindow=175232 ---> server
> > Note: I use the following command to make it work:
> > ip route change default via [ip] dev eth0 metric 100 initrwnd 120
> > For the last ACK, the calculation is 1369 << 7.
> >
> > We can pay attention to the last ACK in 3-way shakehand and notice that
> > with the patch applied the window can reach more than 64 KByte.
>
> You carefully avoided mentioning the asymmetry.
> I do not think this is needed in the changelog, because this is adding
> confusion.

The case I've met in production is only about whether we can send more
data at once at the very beginning, so I care much more about the
sending side right now.

>
> >
> > [1]: https://conferences.sigcomm.org/imc/2011/docs/p569.pdf
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> > v2
> > Link: https://lore.kernel.org/all/20240517085031.18896-1-kerneljasonxing@gmail.com/
> > 1. revise the title and body messages (Neal)
> > ---
> >  net/ipv4/tcp_output.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 95caf8aaa8be..95618d0e78e4 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
> >         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
> >                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
> >         else
> > -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
> > +               (*rcv_wnd) = space;
>
> This is probably breaking some  packetdrill tests, but your change
> might [1] be good,

I'll do some packetdrill tests and get back some information soon.

> especially because it allows DRS behavior to be consistent for large
> MTU (eg MTU 9000) and bigger tcp_rmem[1],
> even without playing with initrwnd attribute.
>
> "ss -temoi " would display after connection setup  rcv_space:89600
> instead of a capped value.
>
> [1] This is hard to say, DRS is full of surprises.

To avoid confusion, I will remove this link and relevant statements.

To conclude, here are my opinions:
1) this change doesn't break the rules, I mean, the various RFCs.
2) this change restores the behaviour we had before the 2018 commit in
this case.
3) this change benefits certain use cases, especially CDN workloads.

I'll refine the changelog as much as I can, hoping it will not confuse
readers.

Thanks,
Jason
Jason Xing May 21, 2024, 12:58 a.m. UTC | #3
On Tue, May 21, 2024 at 8:36 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Hello Eric,
>
> On Tue, May 21, 2024 at 12:51 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Sat, May 18, 2024 at 4:50 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > In 2018 commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin
> > > to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most
> > > CDN team would not benefit from this change because they cannot have a
> > > large window to receive a big packet, which will be slowed down especially
> > > in long RTT.
> > >
> > > According to RFC 7323, it says:
> > >   "The maximum receive window, and therefore the scale factor, is
> > >    determined by the maximum receive buffer space."
> >
> > This seems not relevant ?  wscale factor is not changed in this patch ?
> > tp->rcv_wnd is also not the maximum receive window.
>
> Thanks for your review.
>
> I can remove this part. I was trying to claim I do not break RFC.
>
> >
> > >
> > > So we can get rid of this 64k limitation and let the window be tunable if
> > > the user wants to do it within the control of buffer space. Then many
> > > companies, I believe, can have the same behaviour as old days.
> >
> > Not sure this has ever worked, see below.
> >
> > Also, the "many companies ..." mention has nothing to do in a changelog.
>
> Oh, I just copied/translated from my initial studies of this rcv_wnd
> by reading many papers something like this.
>
> I can also remove this sentence.
>
> >
> >
> > > Besides,
> > > there are many papers conducting various interesting experiments which
> > > have something to do with this window and show good outputs in some cases,
> > > say, paper [1] in Yahoo! CDN.
> >
> > I think this changelog is trying hard to sell something, but in
> > reality TCP 3WHS nature
> > makes your claims wrong.
> >
> > Instead, you should clearly document that this problem can _not_ be
> > solved for both
> > active _and_ passive connections.
> >
> > In the first RTT, a client (active connection) can not send more than
> > 64KB, if TCP specs
> > are properly applied.
>
> Having a large rcv_wnd if the user can tweak this knob can help
> transfer data more rapidly. I'm not referring to the first RTT.

For the first RTT, it is surely limited to 64 KB at most, as you said.

For the whole transfer, though, the change can accelerate sending and
save some RTTs.

How did I find this? We had some servers upgraded to the latest kernel
and noticed that the user-side metrics showed worse results than before.
Because of this, I spent some time digging into this part.

After applying this patch, the metrics show normal/good results like
before, which proves that it works.

CDN teams are very sensitive to the latency of sending big data over
long-RTT paths.

Thanks,
Jason

>
> >
> > >
> > > To avoid future confusion, current change doesn't affect the initial
> > > receive window on the wire in a SYN or SYN+ACK packet which are set within
> > > 65535 bytes according to RFC 7323 also due to the limit in
> > > __tcp_transmit_skb():
> > >
> > >     th->window      = htons(min(tp->rcv_wnd, 65535U));
> > >
> > > In one word, __tcp_transmit_skb() already ensures that constraint is
> > > respected, no matter how large tp->rcv_wnd is.
> > >
> > > Let me provide one example if with or without the patch:
> > > Before:
> > > client   --- SYN: rwindow=65535 ---> server
> > > client   <--- SYN+ACK: rwindow=65535 ----  server
> > > client   --- ACK: rwindow=65536 ---> server
> > > Note: for the last ACK, the calculation is 512 << 7.
> > >
> > > After:
> > > client   --- SYN: rwindow=65535 ---> server
> > > client   <--- SYN+ACK: rwindow=65535 ----  server
> > > client   --- ACK: rwindow=175232 ---> server
> > > Note: I use the following command to make it work:
> > > ip route change default via [ip] dev eth0 metric 100 initrwnd 120
> > > For the last ACK, the calculation is 1369 << 7.
> > >
> > > We can pay attention to the last ACK in 3-way shakehand and notice that
> > > with the patch applied the window can reach more than 64 KByte.
> >
> > You carefully avoided mentioning the asymmetry.
> > I do not think this is needed in the changelog, because this is adding
> > confusion.
>
> What kind of case I've met in production is only about whether we're
> capable of sending more data at the same time at the very beginning,
> so I care much more about the sending process right now.
>
> >
> > >
> > > [1]: https://conferences.sigcomm.org/imc/2011/docs/p569.pdf
> > >
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > > v2
> > > Link: https://lore.kernel.org/all/20240517085031.18896-1-kerneljasonxing@gmail.com/
> > > 1. revise the title and body messages (Neal)
> > > ---
> > >  net/ipv4/tcp_output.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index 95caf8aaa8be..95618d0e78e4 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
> > >         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
> > >                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
> > >         else
> > > -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
> > > +               (*rcv_wnd) = space;
> >
> > This is probably breaking some  packetdrill tests, but your change
> > might [1] be good,
>
> I'll do some packetdrill tests and get back some information soon.
>
> > especially because it allows DRS behavior to be consistent for large
> > MTU (eg MTU 9000) and bigger tcp_rmem[1],
> > even without playing with initrwnd attribute.
> >
> > "ss -temoi " would display after connection setup  rcv_space:89600
> > instead of a capped value.
> >
> > [1] This is hard to say, DRS is full of surprises.
>
> To avoid confusion, I will remove this link and relevant statements.
>
> Here are my opinions in conclusion:
> 1) this change doesn't break the law, I mean, various RFCs.
> 2) this change allows us to have the same behaviour as 2018 in this case.
> 3) this change does some good things to certain cases, especially for
> the CDN team.
>
> I'll refine the changelog as far as I can, hoping it will not confuse
> the readers.
>
> Thanks,
> Jason
Jason Xing May 21, 2024, 6:56 a.m. UTC | #4
Hello Eric,

On Tue, May 21, 2024 at 8:36 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Hello Eric,
>
> On Tue, May 21, 2024 at 12:51 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Sat, May 18, 2024 at 4:50 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > In 2018 commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin
> > > to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most
> > > CDN team would not benefit from this change because they cannot have a
> > > large window to receive a big packet, which will be slowed down especially
> > > in long RTT.
> > >
> > > According to RFC 7323, it says:
> > >   "The maximum receive window, and therefore the scale factor, is
> > >    determined by the maximum receive buffer space."
> >
> > This seems not relevant ?  wscale factor is not changed in this patch ?
> > tp->rcv_wnd is also not the maximum receive window.
>
> Thanks for your review.
>
> I can remove this part. I was trying to claim I do not break RFC.
>
> >
> > >
> > > So we can get rid of this 64k limitation and let the window be tunable if
> > > the user wants to do it within the control of buffer space. Then many
> > > companies, I believe, can have the same behaviour as old days.
> >
> > Not sure this has ever worked, see below.
> >
> > Also, the "many companies ..." mention has nothing to do in a changelog.
>
> Oh, I just copied/translated from my initial studies of this rcv_wnd
> by reading many papers something like this.
>
> I can also remove this sentence.
>
> >
> >
> > > Besides,
> > > there are many papers conducting various interesting experiments which
> > > have something to do with this window and show good outputs in some cases,
> > > say, paper [1] in Yahoo! CDN.
> >
> > I think this changelog is trying hard to sell something, but in
> > reality TCP 3WHS nature
> > makes your claims wrong.
> >
> > Instead, you should clearly document that this problem can _not_ be
> > solved for both
> > active _and_ passive connections.
> >
> > In the first RTT, a client (active connection) can not send more than
> > 64KB, if TCP specs
> > are properly applied.
>
> Having a large rcv_wnd if the user can tweak this knob can help
> transfer data more rapidly. I'm not referring to the first RTT.
>
> >
> > >
> > > To avoid future confusion, current change doesn't affect the initial
> > > receive window on the wire in a SYN or SYN+ACK packet which are set within
> > > 65535 bytes according to RFC 7323 also due to the limit in
> > > __tcp_transmit_skb():
> > >
> > >     th->window      = htons(min(tp->rcv_wnd, 65535U));
> > >
> > > In one word, __tcp_transmit_skb() already ensures that constraint is
> > > respected, no matter how large tp->rcv_wnd is.
> > >
> > > Let me provide one example if with or without the patch:
> > > Before:
> > > client   --- SYN: rwindow=65535 ---> server
> > > client   <--- SYN+ACK: rwindow=65535 ----  server
> > > client   --- ACK: rwindow=65536 ---> server
> > > Note: for the last ACK, the calculation is 512 << 7.
> > >
> > > After:
> > > client   --- SYN: rwindow=65535 ---> server
> > > client   <--- SYN+ACK: rwindow=65535 ----  server
> > > client   --- ACK: rwindow=175232 ---> server
> > > Note: I use the following command to make it work:
> > > ip route change default via [ip] dev eth0 metric 100 initrwnd 120
> > > For the last ACK, the calculation is 1369 << 7.
> > >
> > > We can pay attention to the last ACK in 3-way shakehand and notice that
> > > with the patch applied the window can reach more than 64 KByte.
> >
> > You carefully avoided mentioning the asymmetry.
> > I do not think this is needed in the changelog, because this is adding
> > confusion.
>
> What kind of case I've met in production is only about whether we're
> capable of sending more data at the same time at the very beginning,
> so I care much more about the sending process right now.
>
> >
> > >
> > > [1]: https://conferences.sigcomm.org/imc/2011/docs/p569.pdf
> > >
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > > v2
> > > Link: https://lore.kernel.org/all/20240517085031.18896-1-kerneljasonxing@gmail.com/
> > > 1. revise the title and body messages (Neal)
> > > ---
> > >  net/ipv4/tcp_output.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index 95caf8aaa8be..95618d0e78e4 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
> > >         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
> > >                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
> > >         else
> > > -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
> > > +               (*rcv_wnd) = space;
> >
> > This is probably breaking some  packetdrill tests, but your change
> > might [1] be good,
>
> I'll do some packetdrill tests and get back some information soon.

I'm done with the packetdrill tests [1]. Here are two tests that fail
with this patch applied but pass without it:
1) ./packetdrill/run_all.py -S -v -L -l tcp/ioctl/ioctl-siocinq-fin.pkt
2) ./packetdrill/run_all.py -S -v -L -l
tcp/fastopen/server/client-ack-dropped-then-recovery-ms-timestamps.pkt

For the first one, it shows:
"FAIL [/data/home/kernelxing/source_code/packetdrill/gtests/net/tcp/ioctl/ioctl-siocinq-fin.pkt
(ipv6)]
stdout:
stderr:
ioctl-siocinq-fin.pkt:28: error handling packet: timing error:
expected outbound packet at 0.302321 sec but happened at 0.342759 sec;
tolerance 0.004000 sec
script packet:  0.302321 . 1:1(0) ack 2002
actual packet:  0.342759 . 1:1(0) ack 2002 win 65535"

For the second one, it shows:
"client-ack-dropped-then-recovery-ms-timestamps.pkt:33: error handling
packet: live packet field tcp_window: expected: 256 (0x100) vs actual:
532 (0x214)
script packet:  0.012251 P. 1:5001(5000) ack 1001 win 256 <nop,nop,TS
val 2010 ecr 1000>
actual packet:  0.012242 P. 1:5001(5000) ack 1001 win 532 <nop,nop,TS
val 2010 ecr 1000>
Ran    3 tests:    0 passing,    3 failing,    0 timed out (0.91 sec):
tcp/fastopen/server/client-ack-dropped-then-recovery-ms-timestamps.pkt"

The reason is the unexpected window size. Since I removed the 64 KB
limit, this is expected from my point of view.
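
As a rough sanity check on those numbers (assuming the negotiated wscale
in that fastopen test is 8; that is my guess, the script output does not
state it):

    256 << 8 =  65536   /* the old, capped window */
    532 << 8 = 136192   /* the uncapped window with the patch */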

[1]: https://github.com/google/packetdrill
Running: ./packetdrill/run_all.py -S -v -L -l tcp/

If you are concerned that this change might be unpredictable, how about
this one instead:
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 95caf8aaa8be..3bf7d9fd2d6b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -231,11 +231,13 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
         */
        if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
                (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
-       else
-               (*rcv_wnd) = min_t(u32, space, U16_MAX);

-       if (init_rcv_wnd)
+       if (init_rcv_wnd) {
+               *rcv_wnd = space;
                *rcv_wnd = min(*rcv_wnd, init_rcv_wnd * mss);
+       } else {
+               *rcv_wnd = min_t(u32, space, U16_MAX);
+       }

        *rcv_wscale = 0;
        if (wscale_ok) {
?

It changes the TCP stack only when the user sets initrwnd via 'ip
route'.
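
Distilled into a standalone sketch, the proposal reads roughly like this
(an illustration only, omitting the sysctl_tcp_workaround_signed_windows
branch for brevity; not the actual kernel function):

    /* Hypothetical distillation of the alternative diff above. */
    unsigned int select_rcv_wnd(unsigned int space, unsigned int mss,
                                unsigned int init_rcv_wnd)
    {
            unsigned int rcv_wnd;

            if (init_rcv_wnd) {
                    /* "ip route ... initrwnd N": start from the full
                     * buffer space, capped at N segments of size mss.
                     */
                    rcv_wnd = space;
                    if (rcv_wnd > init_rcv_wnd * mss)
                            rcv_wnd = init_rcv_wnd * mss;
            } else {
                    /* Default path keeps the historical 64 KByte cap. */
                    rcv_wnd = space < 65535U ? space : 65535U;
            }
            return rcv_wnd;
    }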

Thanks,
Jason

>
> > especially because it allows DRS behavior to be consistent for large
> > MTU (eg MTU 9000) and bigger tcp_rmem[1],
> > even without playing with initrwnd attribute.
> >
> > "ss -temoi " would display after connection setup  rcv_space:89600
> > instead of a capped value.
> >
> > [1] This is hard to say, DRS is full of surprises.
>
> To avoid confusion, I will remove this link and relevant statements.
>
> Here are my opinions in conclusion:
> 1) this change doesn't break the law, I mean, various RFCs.
> 2) this change allows us to have the same behaviour as 2018 in this case.
> 3) this change does some good things to certain cases, especially for
> the CDN team.
>
> I'll refine the changelog as far as I can, hoping it will not confuse
> the readers.
>
> Thanks,
> Jason
Eric Dumazet May 21, 2024, 9:43 a.m. UTC | #5
On Tue, May 21, 2024 at 8:56 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Hello Eric,
>
> On Tue, May 21, 2024 at 8:36 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Hello Eric,
> >
> > On Tue, May 21, 2024 at 12:51 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Sat, May 18, 2024 at 4:50 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > In 2018 commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin
> > > > to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most
> > > > CDN team would not benefit from this change because they cannot have a
> > > > large window to receive a big packet, which will be slowed down especially
> > > > in long RTT.
> > > >
> > > > According to RFC 7323, it says:
> > > >   "The maximum receive window, and therefore the scale factor, is
> > > >    determined by the maximum receive buffer space."
> > >
> > > This seems not relevant ?  wscale factor is not changed in this patch ?
> > > tp->rcv_wnd is also not the maximum receive window.
> >
> > Thanks for your review.
> >
> > I can remove this part. I was trying to claim I do not break RFC.
> >
> > >
> > > >
> > > > So we can get rid of this 64k limitation and let the window be tunable if
> > > > the user wants to do it within the control of buffer space. Then many
> > > > companies, I believe, can have the same behaviour as old days.
> > >
> > > Not sure this has ever worked, see below.
> > >
> > > Also, the "many companies ..." mention has nothing to do in a changelog.
> >
> > Oh, I just copied/translated from my initial studies of this rcv_wnd
> > by reading many papers something like this.
> >
> > I can also remove this sentence.
> >
> > >
> > >
> > > > Besides,
> > > > there are many papers conducting various interesting experiments which
> > > > have something to do with this window and show good outputs in some cases,
> > > > say, paper [1] in Yahoo! CDN.
> > >
> > > I think this changelog is trying hard to sell something, but in
> > > reality TCP 3WHS nature
> > > makes your claims wrong.
> > >
> > > Instead, you should clearly document that this problem can _not_ be
> > > solved for both
> > > active _and_ passive connections.
> > >
> > > In the first RTT, a client (active connection) can not send more than
> > > 64KB, if TCP specs
> > > are properly applied.
> >
> > Having a large rcv_wnd if the user can tweak this knob can help
> > transfer data more rapidly. I'm not referring to the first RTT.
> >
> > >
> > > >
> > > > To avoid future confusion, current change doesn't affect the initial
> > > > receive window on the wire in a SYN or SYN+ACK packet which are set within
> > > > 65535 bytes according to RFC 7323 also due to the limit in
> > > > __tcp_transmit_skb():
> > > >
> > > >     th->window      = htons(min(tp->rcv_wnd, 65535U));
> > > >
> > > > In one word, __tcp_transmit_skb() already ensures that constraint is
> > > > respected, no matter how large tp->rcv_wnd is.
> > > >
> > > > Let me provide one example if with or without the patch:
> > > > Before:
> > > > client   --- SYN: rwindow=65535 ---> server
> > > > client   <--- SYN+ACK: rwindow=65535 ----  server
> > > > client   --- ACK: rwindow=65536 ---> server
> > > > Note: for the last ACK, the calculation is 512 << 7.
> > > >
> > > > After:
> > > > client   --- SYN: rwindow=65535 ---> server
> > > > client   <--- SYN+ACK: rwindow=65535 ----  server
> > > > client   --- ACK: rwindow=175232 ---> server
> > > > Note: I use the following command to make it work:
> > > > ip route change default via [ip] dev eth0 metric 100 initrwnd 120
> > > > For the last ACK, the calculation is 1369 << 7.
> > > >
> > > > We can pay attention to the last ACK in 3-way shakehand and notice that
> > > > with the patch applied the window can reach more than 64 KByte.
> > >
> > > You carefully avoided mentioning the asymmetry.
> > > I do not think this is needed in the changelog, because this is adding
> > > confusion.
> >
> > What kind of case I've met in production is only about whether we're
> > capable of sending more data at the same time at the very beginning,
> > so I care much more about the sending process right now.
> >
> > >
> > > >
> > > > [1]: https://conferences.sigcomm.org/imc/2011/docs/p569.pdf
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > ---
> > > > v2
> > > > Link: https://lore.kernel.org/all/20240517085031.18896-1-kerneljasonxing@gmail.com/
> > > > 1. revise the title and body messages (Neal)
> > > > ---
> > > >  net/ipv4/tcp_output.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > > index 95caf8aaa8be..95618d0e78e4 100644
> > > > --- a/net/ipv4/tcp_output.c
> > > > +++ b/net/ipv4/tcp_output.c
> > > > @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
> > > >         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
> > > >                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
> > > >         else
> > > > -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
> > > > +               (*rcv_wnd) = space;
> > >
> > > This is probably breaking some  packetdrill tests, but your change
> > > might [1] be good,
> >
> > I'll do some packetdrill tests and get back some information soon.
>
> I'm done with the packetdrill tests[1]. Here are two tests failed
> after comparing with/without this patch:
> 1) ./packetdrill/run_all.py -S -v -L -l tcp/ioctl/ioctl-siocinq-fin.pkt
> 2) ./packetdrill/run_all.py -S -v -L -l
> tcp/fastopen/server/client-ack-dropped-then-recovery-ms-timestamps.pkt
>
> For the first one, it shows:
> "FAIL [/data/home/kernelxing/source_code/packetdrill/gtests/net/tcp/ioctl/ioctl-siocinq-fin.pkt
> (ipv6)]
> stdout:
> stderr:
> ioctl-siocinq-fin.pkt:28: error handling packet: timing error:
> expected outbound packet at 0.302321 sec but happened at 0.342759 sec;
> tolerance 0.004000 sec
> script packet:  0.302321 . 1:1(0) ack 2002
> actual packet:  0.342759 . 1:1(0) ack 2002 win 65535"
>
> For the second one, it shows:
> "client-ack-dropped-then-recovery-ms-timestamps.pkt:33: error handling
> packet: live packet field tcp_window: expected: 256 (0x100) vs actual:
> 532 (0x214)
> script packet:  0.012251 P. 1:5001(5000) ack 1001 win 256 <nop,nop,TS
> val 2010 ecr 1000>
> actual packet:  0.012242 P. 1:5001(5000) ack 1001 win 532 <nop,nop,TS
> val 2010 ecr 1000>
> Ran    3 tests:    0 passing,    3 failing,    0 timed out (0.91 sec):
> tcp/fastopen/server/client-ack-dropped-then-recovery-ms-timestamps.pkt"
>
> The reason is unexpected window size. Since I removed the limit of
> 64KB, It is expected from my view.

I think you misunderstood what I was saying.

Basically, this change will break some packetdrill tests, and this is
fine, because those tests were relying on a prior kernel behavior that
was not set in stone (and certainly not documented).

>
> [1]: https://github.com/google/packetdrill
> Running: ./packetdrill/run_all.py -S -v -L -l tcp/
>
> I wonder if you mind this change which might be unpredictable, how
> about this one:
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 95caf8aaa8be..3bf7d9fd2d6b 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -231,11 +231,13 @@ void tcp_select_initial_window(const struct sock
> *sk, int __space, __u32 mss,
>          */
>         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
>                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
> -       else
> -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
>
> -       if (init_rcv_wnd)
> +       if (init_rcv_wnd) {
> +               *rcv_wnd = space;
>                 *rcv_wnd = min(*rcv_wnd, init_rcv_wnd * mss);
> +       } else {
> +               *rcv_wnd = min_t(u32, space, U16_MAX);
> +       }
>
>         *rcv_wscale = 0;
>         if (wscale_ok) {
> ?
>
> It affects/changes the TCP stack only when the user tries to use 'ip
> route' to set initrwnd.

I much prefer the prior and simpler version.

Only the changelog was not very good IMO.

Also, I think this is fixing a bug and should target the net tree.

If it took 6 years to discover the unwanted side effects, we should
make sure the fix is backported by stable teams, thanks to an
appropriate Fixes: tag.
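
Presumably something like the following, pointing at the 2018 commit
discussed in the changelog:

    Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")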


>
> Thanks,
> Jason
>
> >
> > > especially because it allows DRS behavior to be consistent for large
> > > MTU (eg MTU 9000) and bigger tcp_rmem[1],
> > > even without playing with initrwnd attribute.
> > >
> > > "ss -temoi " would display after connection setup  rcv_space:89600
> > > instead of a capped value.
> > >
> > > [1] This is hard to say, DRS is full of surprises.
> >
> > To avoid confusion, I will remove this link and relevant statements.
> >
> > Here are my opinions in conclusion:
> > 1) this change doesn't break the law, I mean, various RFCs.
> > 2) this change allows us to have the same behaviour as 2018 in this case.
> > 3) this change does some good things to certain cases, especially for
> > the CDN team.
> >
> > I'll refine the changelog as far as I can, hoping it will not confuse
> > the readers.
> >
> > Thanks,
> > Jason
Jason Xing May 21, 2024, 1:11 p.m. UTC | #6
On Tue, May 21, 2024 at 5:43 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, May 21, 2024 at 8:56 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Hello Eric,
> >
> > On Tue, May 21, 2024 at 8:36 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > Hello Eric,
> > >
> > > On Tue, May 21, 2024 at 12:51 AM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Sat, May 18, 2024 at 4:50 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > >
> > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > >
> > > > > In 2018 commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin
> > > > > to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most
> > > > > CDN team would not benefit from this change because they cannot have a
> > > > > large window to receive a big packet, which will be slowed down especially
> > > > > in long RTT.
> > > > >
> > > > > According to RFC 7323, it says:
> > > > >   "The maximum receive window, and therefore the scale factor, is
> > > > >    determined by the maximum receive buffer space."
> > > >
> > > > This seems not relevant ?  wscale factor is not changed in this patch ?
> > > > tp->rcv_wnd is also not the maximum receive window.
> > >
> > > Thanks for your review.
> > >
> > > I can remove this part. I was trying to claim I do not break RFC.
> > >
> > > >
> > > > >
> > > > > So we can get rid of this 64k limitation and let the window be tunable if
> > > > > the user wants to do it within the control of buffer space. Then many
> > > > > companies, I believe, can have the same behaviour as old days.
> > > >
> > > > Not sure this has ever worked, see below.
> > > >
> > > > Also, the "many companies ..." mention has nothing to do in a changelog.
> > >
> > > Oh, I just copied/translated from my initial studies of this rcv_wnd
> > > by reading many papers something like this.
> > >
> > > I can also remove this sentence.
> > >
> > > >
> > > >
> > > > > Besides,
> > > > > there are many papers conducting various interesting experiments which
> > > > > have something to do with this window and show good outputs in some cases,
> > > > > say, paper [1] in Yahoo! CDN.
> > > >
> > > > I think this changelog is trying hard to sell something, but in
> > > > reality TCP 3WHS nature
> > > > makes your claims wrong.
> > > >
> > > > Instead, you should clearly document that this problem can _not_ be
> > > > solved for both
> > > > active _and_ passive connections.
> > > >
> > > > In the first RTT, a client (active connection) can not send more than
> > > > 64KB, if TCP specs
> > > > are properly applied.
> > >
> > > Having a large rcv_wnd if the user can tweak this knob can help
> > > transfer data more rapidly. I'm not referring to the first RTT.
> > >
> > > >
> > > > >
> > > > > To avoid future confusion, current change doesn't affect the initial
> > > > > receive window on the wire in a SYN or SYN+ACK packet which are set within
> > > > > 65535 bytes according to RFC 7323 also due to the limit in
> > > > > __tcp_transmit_skb():
> > > > >
> > > > >     th->window      = htons(min(tp->rcv_wnd, 65535U));
> > > > >
> > > > > In one word, __tcp_transmit_skb() already ensures that constraint is
> > > > > respected, no matter how large tp->rcv_wnd is.
> > > > >
> > > > > Let me provide one example if with or without the patch:
> > > > > Before:
> > > > > client   --- SYN: rwindow=65535 ---> server
> > > > > client   <--- SYN+ACK: rwindow=65535 ----  server
> > > > > client   --- ACK: rwindow=65536 ---> server
> > > > > Note: for the last ACK, the calculation is 512 << 7.
> > > > >
> > > > > After:
> > > > > client   --- SYN: rwindow=65535 ---> server
> > > > > client   <--- SYN+ACK: rwindow=65535 ----  server
> > > > > client   --- ACK: rwindow=175232 ---> server
> > > > > Note: I use the following command to make it work:
> > > > > ip route change default via [ip] dev eth0 metric 100 initrwnd 120
> > > > > For the last ACK, the calculation is 1369 << 7.
> > > > >
> > > > > We can pay attention to the last ACK in 3-way shakehand and notice that
> > > > > with the patch applied the window can reach more than 64 KByte.
> > > >
> > > > You carefully avoided mentioning the asymmetry.
> > > > I do not think this is needed in the changelog, because this is adding
> > > > confusion.
> > >
> > > What kind of case I've met in production is only about whether we're
> > > capable of sending more data at the same time at the very beginning,
> > > so I care much more about the sending process right now.
> > >
> > > >
> > > > >
> > > > > [1]: https://conferences.sigcomm.org/imc/2011/docs/p569.pdf
> > > > >
> > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > > ---
> > > > > v2
> > > > > Link: https://lore.kernel.org/all/20240517085031.18896-1-kerneljasonxing@gmail.com/
> > > > > 1. revise the title and body messages (Neal)
> > > > > ---
> > > > >  net/ipv4/tcp_output.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > > > index 95caf8aaa8be..95618d0e78e4 100644
> > > > > --- a/net/ipv4/tcp_output.c
> > > > > +++ b/net/ipv4/tcp_output.c
> > > > > @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
> > > > >         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
> > > > >                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
> > > > >         else
> > > > > -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
> > > > > +               (*rcv_wnd) = space;
> > > >
> > > > This is probably breaking some  packetdrill tests, but your change
> > > > might [1] be good,
> > >
> > > I'll do some packetdrill tests and get back some information soon.
> >
> > I'm done with the packetdrill tests[1]. Here are two tests failed
> > after comparing with/without this patch:
> > 1) ./packetdrill/run_all.py -S -v -L -l tcp/ioctl/ioctl-siocinq-fin.pkt
> > 2) ./packetdrill/run_all.py -S -v -L -l
> > tcp/fastopen/server/client-ack-dropped-then-recovery-ms-timestamps.pkt
> >
> > For the first one, it shows:
> > "FAIL [/data/home/kernelxing/source_code/packetdrill/gtests/net/tcp/ioctl/ioctl-siocinq-fin.pkt
> > (ipv6)]
> > stdout:
> > stderr:
> > ioctl-siocinq-fin.pkt:28: error handling packet: timing error:
> > expected outbound packet at 0.302321 sec but happened at 0.342759 sec;
> > tolerance 0.004000 sec
> > script packet:  0.302321 . 1:1(0) ack 2002
> > actual packet:  0.342759 . 1:1(0) ack 2002 win 65535"
> >
> > For the second one, it shows:
> > "client-ack-dropped-then-recovery-ms-timestamps.pkt:33: error handling
> > packet: live packet field tcp_window: expected: 256 (0x100) vs actual:
> > 532 (0x214)
> > script packet:  0.012251 P. 1:5001(5000) ack 1001 win 256 <nop,nop,TS
> > val 2010 ecr 1000>
> > actual packet:  0.012242 P. 1:5001(5000) ack 1001 win 532 <nop,nop,TS
> > val 2010 ecr 1000>
> > Ran    3 tests:    0 passing,    3 failing,    0 timed out (0.91 sec):
> > tcp/fastopen/server/client-ack-dropped-then-recovery-ms-timestamps.pkt"
> >
> > The reason is unexpected window size. Since I removed the limit of
> > 64KB, It is expected from my view.
>
> I think you misunderstood what I was saying.
>
> Basically, this change will break some packetdrill tests, and this is fine,
> because those packetdrill tests were relying on a prior kernel behavior that
> was not set in stone (certainly not documented)
>
> >
> > [1]: https://github.com/google/packetdrill
> > Running: ./packetdrill/run_all.py -S -v -L -l tcp/
> >
> > I wonder if you mind this change which might be unpredictable, how
> > about this one:
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 95caf8aaa8be..3bf7d9fd2d6b 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -231,11 +231,13 @@ void tcp_select_initial_window(const struct sock
> > *sk, int __space, __u32 mss,
> >          */
> >         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
> >                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
> > -       else
> > -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
> >
> > -       if (init_rcv_wnd)
> > +       if (init_rcv_wnd) {
> > +               *rcv_wnd = space;
> >                 *rcv_wnd = min(*rcv_wnd, init_rcv_wnd * mss);
> > +       } else {
> > +               *rcv_wnd = min_t(u32, space, U16_MAX);
> > +       }
> >
> >         *rcv_wscale = 0;
> >         if (wscale_ok) {
> > ?
> >
> > It affects/changes the TCP stack only when the user tries to use 'ip
> > route' to set initrwnd.
>
> I much prefer the prior and simpler version.
>
> Only the changelog was not very good IMO.
>
> Also, I think this is fixing a bug and should target the net tree.
>
> If it took 6 years to discover the unwanted side effects, we should
> make sure the fix
> is backported by stable teams, thanks to an appropriate Fixes: tag.

Oh, I see. I'll submit a new one soon.

Thanks,
Jason

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 95caf8aaa8be..95618d0e78e4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -232,7 +232,7 @@  void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
 	if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
 		(*rcv_wnd) = min(space, MAX_TCP_WINDOW);
 	else
-		(*rcv_wnd) = min_t(u32, space, U16_MAX);
+		(*rcv_wnd) = space;
 
 	if (init_rcv_wnd)
 		*rcv_wnd = min(*rcv_wnd, init_rcv_wnd * mss);