diff mbox series

[RFC,net-next] tcp: break the limitation of initial receive window

Message ID 20240517085031.18896-1-kerneljasonxing@gmail.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [RFC,net-next] tcp: break the limitation of initial receive window | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 922 this patch: 922
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 927 this patch: 927
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 927 this patch: 927
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0

Commit Message

Jason Xing May 17, 2024, 8:50 a.m. UTC
From: Jason Xing <kernelxing@tencent.com>

Since in 2018 one commit a337531b942b ("tcp: up initial rmem to 128KB and
SYN rwin to around 64KB") limited received window within 65535, most CDN
team would not benefit from this change because they cannot have a large
window to receive a big packet one time especially in long RTT.

According to RFC 7323, it says:
  "The maximum receive window, and therefore the scale factor, is
   determined by the maximum receive buffer space."

So we can get rid of this 64k limitation and let the window be tunable if
the user wants to do it within the control of buffer space. Then many
companies, I believe, can have the same behaviour as old days. Besides,
there are many papers conducting various interesting experiments which
have something to do with this window and show good outputs in some cases.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/ipv4/tcp_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Neal Cardwell May 17, 2024, 2:42 p.m. UTC | #1
On Fri, May 17, 2024 at 4:50 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> Since in 2018 one commit a337531b942b ("tcp: up initial rmem to 128KB and
> SYN rwin to around 64KB") limited received window within 65535, most CDN
> team would not benefit from this change because they cannot have a large
> window to receive a big packet one time especially in long RTT.
>
> According to RFC 7323, it says:
>   "The maximum receive window, and therefore the scale factor, is
>    determined by the maximum receive buffer space."
>
> So we can get rid of this 64k limitation and let the window be tunable if
> the user wants to do it within the control of buffer space. Then many
> companies, I believe, can have the same behaviour as old days. Besides,
> there are many papers conducting various interesting experiments which
> have something to do with this window and show good outputs in some cases.
>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  net/ipv4/tcp_output.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 95caf8aaa8be..95618d0e78e4 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
>         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
>                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
>         else
> -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
> +               (*rcv_wnd) = space;

Hmm, has this patch been tested? This doesn't look like it would work.

Please note that RFC 7323 says in
https://datatracker.ietf.org/doc/html/rfc7323#section-2.2 :

   The window field in a segment where the SYN bit is set (i.e., a <SYN>
   or <SYN,ACK>) MUST NOT be scaled.

Since the receive window field in a SYN is unscaled, that means the
TCP wire protocol has no way to convey a receive window in the SYN
that is bigger than 64KBytes.

That is why this code places a limit of U16_MAX on the value here.

If you want to advertise a bigger receive window in the SYN, you'll
need to define a new TCP option type, and write an IETF Internet Draft
and/or RFC standardizing the new option.

If you would like to, instead, submit a patch with a comment
explaining that this U16_MAX limit is inherent in the RFC 7323 wire
protocol specification, that could make sense.

best regards,
neal
Jason Xing May 17, 2024, 5:41 p.m. UTC | #2
On Fri, May 17, 2024 at 10:42 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Fri, May 17, 2024 at 4:50 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Since in 2018 one commit a337531b942b ("tcp: up initial rmem to 128KB and
> > SYN rwin to around 64KB") limited received window within 65535, most CDN
> > team would not benefit from this change because they cannot have a large
> > window to receive a big packet one time especially in long RTT.
> >
> > According to RFC 7323, it says:
> >   "The maximum receive window, and therefore the scale factor, is
> >    determined by the maximum receive buffer space."
> >
> > So we can get rid of this 64k limitation and let the window be tunable if
> > the user wants to do it within the control of buffer space. Then many
> > companies, I believe, can have the same behaviour as old days. Besides,
> > there are many papers conducting various interesting experiments which
> > have something to do with this window and show good outputs in some cases.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >  net/ipv4/tcp_output.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 95caf8aaa8be..95618d0e78e4 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
> >         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
> >                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
> >         else
> > -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
> > +               (*rcv_wnd) = space;
>
> Hmm, has this patch been tested? This doesn't look like it would work.

Hello Neal,

Thanks for the comment.

Sure, I provided such a patch a few months ago which has been tested
in production for the customers.

One example of using a much bigger initial receive window:
client   ---window=65535---> server
client   <---window=14600----  server
client   ---window=175616---> server

Then the client could send more data than before in fewer rtt.

Above is the output of tcpdump.

Oh, I just found a similar case:
https://lore.kernel.org/all/20220213040545.365600-1-tilan7663@gmail.com/

Before this, I always believed I'm not the only one who had such an issue.

>
> Please note that RFC 7323 says in
> https://datatracker.ietf.org/doc/html/rfc7323#section-2.2 :
>
>    The window field in a segment where the SYN bit is set (i.e., a <SYN>
>    or <SYN,ACK>) MUST NOT be scaled.
>
> Since the receive window field in a SYN is unscaled, that means the
> TCP wire protocol has no way to convey a receive window in the SYN
> that is bigger than 64KBytes.
>
> That is why this code places a limit of U16_MAX on the value here.
>
> If you want to advertise a bigger receive window in the SYN, you'll

No. It's not my original intention.

For SYN packet itself is limited in the __tcp_transmit_skb() as below:

    th->window      = htons(min(tp->rcv_wnd, 65535U));

> need to define a new TCP option type, and write an IETF Internet Draft
> and/or RFC standardizing the new option.
>
> If you would like to, instead, submit a patch with a comment
> explaining that this U16_MAX limit is inherent in the RFC 7323 wire
> protocol specification, that could make sense.

I quoted from that link:
--------
    I'm not trying to make the sender to send more than 64Kib in the
first RTT. The change will only make the sender to send more starting
on the second RTT(after first ack received on the data). Instead of
having the rcv_wnd to grow from 64Kib, the rcv_wnd can start from a
much larger base value.

    Without the patch:

    RTT:                                1,                   2,
    3,  ...
    rcv_wnd:                64KiB,        192KiB,         576KiB,  ...

    With the patch (assume rcv_wnd is set to 512KiB):

    RTT:                                1,                    2,
     3,   ...
   rcv_wnd:                64KiB,    1.536MiB,    4.608MiB,  ...
--------

Another quotation from the paper [1] which has been deployed in Yahoo:
"By increasing this ICW, small objects stand to be transferred in
fewer RTTs, which when compounded across all objects on a page can cut
down the total page load time significantly.
...
Luckily, on popular operating systems (except Linux which has a much
smaller receive window), the initial receive window is quite large
(64KB-256KB), which would allow for utilizing a larger ICW"

[1]: https://conferences.sigcomm.org/imc/2011/docs/p569.pdf

Thanks,
Jason

>
> best regards,
> neal
Jason Xing May 17, 2024, 5:48 p.m. UTC | #3
On Sat, May 18, 2024 at 1:41 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Fri, May 17, 2024 at 10:42 PM Neal Cardwell <ncardwell@google.com> wrote:
> >
> > On Fri, May 17, 2024 at 4:50 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Since in 2018 one commit a337531b942b ("tcp: up initial rmem to 128KB and
> > > SYN rwin to around 64KB") limited received window within 65535, most CDN
> > > team would not benefit from this change because they cannot have a large
> > > window to receive a big packet one time especially in long RTT.
> > >
> > > According to RFC 7323, it says:
> > >   "The maximum receive window, and therefore the scale factor, is
> > >    determined by the maximum receive buffer space."
> > >
> > > So we can get rid of this 64k limitation and let the window be tunable if
> > > the user wants to do it within the control of buffer space. Then many
> > > companies, I believe, can have the same behaviour as old days. Besides,
> > > there are many papers conducting various interesting experiments which
> > > have something to do with this window and show good outputs in some cases.
> > >
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > >  net/ipv4/tcp_output.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index 95caf8aaa8be..95618d0e78e4 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
> > >         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
> > >                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
> > >         else
> > > -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
> > > +               (*rcv_wnd) = space;
> >
> > Hmm, has this patch been tested? This doesn't look like it would work.
>
> Hello Neal,
>
> Thanks for the comment.
>
> Sure, I provided such a patch a few months ago which has been tested
> in production for the customers.
>
> One example of using a much bigger initial receive window:
> client   ---window=65535---> server
> client   <---window=14600----  server
> client   ---window=175616---> server
>
> Then the client could send more data than before in fewer rtt.
>
> Above is the output of tcpdump.
>
> Oh, I just found a similar case:
> https://lore.kernel.org/all/20220213040545.365600-1-tilan7663@gmail.com/
>
> Before this, I always believed I'm not the only one who had such an issue.
>
> >
> > Please note that RFC 7323 says in
> > https://datatracker.ietf.org/doc/html/rfc7323#section-2.2 :
> >
> >    The window field in a segment where the SYN bit is set (i.e., a <SYN>
> >    or <SYN,ACK>) MUST NOT be scaled.
> >
> > Since the receive window field in a SYN is unscaled, that means the
> > TCP wire protocol has no way to convey a receive window in the SYN
> > that is bigger than 64KBytes.
> >
> > That is why this code places a limit of U16_MAX on the value here.
> >
> > If you want to advertise a bigger receive window in the SYN, you'll
>
> No. It's not my original intention.
>
> For SYN packet itself is limited in the __tcp_transmit_skb() as below:
>
>     th->window      = htons(min(tp->rcv_wnd, 65535U));

With this limitation/protection of the window in SYN packet, It would
not break RFC with this patch applied. I try to advertise a bigger
initRwnd of ACK in a 3-way shakehand process.

Thanks,
Jason
Neal Cardwell May 17, 2024, 6:35 p.m. UTC | #4
On Fri, May 17, 2024 at 1:49 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Sat, May 18, 2024 at 1:41 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Fri, May 17, 2024 at 10:42 PM Neal Cardwell <ncardwell@google.com> wrote:
> > >
> > > On Fri, May 17, 2024 at 4:50 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > Since in 2018 one commit a337531b942b ("tcp: up initial rmem to 128KB and
> > > > SYN rwin to around 64KB") limited received window within 65535, most CDN
> > > > team would not benefit from this change because they cannot have a large
> > > > window to receive a big packet one time especially in long RTT.
> > > >
> > > > According to RFC 7323, it says:
> > > >   "The maximum receive window, and therefore the scale factor, is
> > > >    determined by the maximum receive buffer space."
> > > >
> > > > So we can get rid of this 64k limitation and let the window be tunable if
> > > > the user wants to do it within the control of buffer space. Then many
> > > > companies, I believe, can have the same behaviour as old days. Besides,
> > > > there are many papers conducting various interesting experiments which
> > > > have something to do with this window and show good outputs in some cases.
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > ---
> > > >  net/ipv4/tcp_output.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > > index 95caf8aaa8be..95618d0e78e4 100644
> > > > --- a/net/ipv4/tcp_output.c
> > > > +++ b/net/ipv4/tcp_output.c
> > > > @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
> > > >         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
> > > >                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
> > > >         else
> > > > -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
> > > > +               (*rcv_wnd) = space;
> > >
> > > Hmm, has this patch been tested? This doesn't look like it would work.
> >
> > Hello Neal,
> >
> > Thanks for the comment.
> >
> > Sure, I provided such a patch a few months ago which has been tested
> > in production for the customers.
> >
> > One example of using a much bigger initial receive window:
> > client   ---window=65535---> server
> > client   <---window=14600----  server
> > client   ---window=175616---> server
> >
> > Then the client could send more data than before in fewer rtt.
> >
> > Above is the output of tcpdump.
> >
> > Oh, I just found a similar case:
> > https://lore.kernel.org/all/20220213040545.365600-1-tilan7663@gmail.com/
> >
> > Before this, I always believed I'm not the only one who had such an issue.
> >
> > >
> > > Please note that RFC 7323 says in
> > > https://datatracker.ietf.org/doc/html/rfc7323#section-2.2 :
> > >
> > >    The window field in a segment where the SYN bit is set (i.e., a <SYN>
> > >    or <SYN,ACK>) MUST NOT be scaled.
> > >
> > > Since the receive window field in a SYN is unscaled, that means the
> > > TCP wire protocol has no way to convey a receive window in the SYN
> > > that is bigger than 64KBytes.
> > >
> > > That is why this code places a limit of U16_MAX on the value here.
> > >
> > > If you want to advertise a bigger receive window in the SYN, you'll
> >
> > No. It's not my original intention.
> >
> > For SYN packet itself is limited in the __tcp_transmit_skb() as below:
> >
> >     th->window      = htons(min(tp->rcv_wnd, 65535U));
>
> With this limitation/protection of the window in SYN packet, It would
> not break RFC with this patch applied. I try to advertise a bigger
> initRwnd of ACK in a 3-way shakehand process.

Thanks for the explanation.

I think the confusion arose because in your title ("tcp: break the
limitation of initial receive window"), I interpreted "initial receive
window" as the initial receive window advertised on the wire (which is
limited by protocol spec to 64 kbytes), when you meant the initial
value of tp->rcv_wnd. There are similar ambiguities in the commit
message body.

I would suggest resubmitting a version of the patch with a revised
commit title and commit description, to clarify at least the following
issues:

+ For the patch title, perhaps something like:
  tcp: remove 64 KByte limit for initial tp->rcv_wnd value

+ For the commit description, in the sentence 'Since in 2018 one
commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to
around 64KB") limited received window within 65535', please revise
this to clarify that you are talking about tp->rcv_wnd and not the
receive window on the wire. For example: 'In 2018 commit a337531b942b
("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") limited
the initial value of tp->rcv_wnd to 65535,.'

+ For the commit description, please add a note that RFC 7323 limits
the initial receive window on the wire in a SYN or SYN+ACK to 65535
bytes, and __tcp_transmit_skb() already ensures that constraint is
respected, no matter how large tp->rcv_wnd is.

+ For the commit description, please include some version of your
example of the receive window values in a handshake, like:

One example of using a much bigger initial receive window:
client   --- SYN: rwindow=65535 ---> server
client   <--- SYN+ACK: rwindow=14600 ----  server
client   --- ACK: rwindow=175616 ---> server

+ For the commit description, please include some version of your
example of the evolution of the receive window over the first 3 round
trips, before and after your patch.

thanks,
neal
Jason Xing May 18, 2024, 12:04 a.m. UTC | #5
Hello Neal,

On Sat, May 18, 2024 at 2:35 AM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Fri, May 17, 2024 at 1:49 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Sat, May 18, 2024 at 1:41 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > On Fri, May 17, 2024 at 10:42 PM Neal Cardwell <ncardwell@google.com> wrote:
> > > >
> > > > On Fri, May 17, 2024 at 4:50 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > >
> > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > >
> > > > > Since in 2018 one commit a337531b942b ("tcp: up initial rmem to 128KB and
> > > > > SYN rwin to around 64KB") limited received window within 65535, most CDN
> > > > > team would not benefit from this change because they cannot have a large
> > > > > window to receive a big packet one time especially in long RTT.
> > > > >
> > > > > According to RFC 7323, it says:
> > > > >   "The maximum receive window, and therefore the scale factor, is
> > > > >    determined by the maximum receive buffer space."
> > > > >
> > > > > So we can get rid of this 64k limitation and let the window be tunable if
> > > > > the user wants to do it within the control of buffer space. Then many
> > > > > companies, I believe, can have the same behaviour as old days. Besides,
> > > > > there are many papers conducting various interesting experiments which
> > > > > have something to do with this window and show good outputs in some cases.
> > > > >
> > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > > ---
> > > > >  net/ipv4/tcp_output.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > > > index 95caf8aaa8be..95618d0e78e4 100644
> > > > > --- a/net/ipv4/tcp_output.c
> > > > > +++ b/net/ipv4/tcp_output.c
> > > > > @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
> > > > >         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
> > > > >                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
> > > > >         else
> > > > > -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
> > > > > +               (*rcv_wnd) = space;
> > > >
> > > > Hmm, has this patch been tested? This doesn't look like it would work.
> > >
> > > Hello Neal,
> > >
> > > Thanks for the comment.
> > >
> > > Sure, I provided such a patch a few months ago which has been tested
> > > in production for the customers.
> > >
> > > One example of using a much bigger initial receive window:
> > > client   ---window=65535---> server
> > > client   <---window=14600----  server
> > > client   ---window=175616---> server
> > >
> > > Then the client could send more data than before in fewer rtt.
> > >
> > > Above is the output of tcpdump.
> > >
> > > Oh, I just found a similar case:
> > > https://lore.kernel.org/all/20220213040545.365600-1-tilan7663@gmail.com/
> > >
> > > Before this, I always believed I'm not the only one who had such an issue.
> > >
> > > >
> > > > Please note that RFC 7323 says in
> > > > https://datatracker.ietf.org/doc/html/rfc7323#section-2.2 :
> > > >
> > > >    The window field in a segment where the SYN bit is set (i.e., a <SYN>
> > > >    or <SYN,ACK>) MUST NOT be scaled.
> > > >
> > > > Since the receive window field in a SYN is unscaled, that means the
> > > > TCP wire protocol has no way to convey a receive window in the SYN
> > > > that is bigger than 64KBytes.
> > > >
> > > > That is why this code places a limit of U16_MAX on the value here.
> > > >
> > > > If you want to advertise a bigger receive window in the SYN, you'll
> > >
> > > No. It's not my original intention.
> > >
> > > For SYN packet itself is limited in the __tcp_transmit_skb() as below:
> > >
> > >     th->window      = htons(min(tp->rcv_wnd, 65535U));
> >
> > With this limitation/protection of the window in SYN packet, It would
> > not break RFC with this patch applied. I try to advertise a bigger
> > initRwnd of ACK in a 3-way shakehand process.
>
> Thanks for the explanation.
>
> I think the confusion arose because in your title ("tcp: break the
> limitation of initial receive window"), I interpreted "initial receive
> window" as the initial receive window advertised on the wire (which is
> limited by protocol spec to 64 kbytes), when you meant the initial
> value of tp->rcv_wnd. There are similar ambiguities in the commit
> message body.
>
> I would suggest resubmitting a version of the patch with a revised
> commit title and commit description, to clarify at least the following
> issues:
>
> + For the patch title, perhaps something like:
>   tcp: remove 64 KByte limit for initial tp->rcv_wnd value

Thanks for the help.

I'll update it.

>
> + For the commit description, in the sentence 'Since in 2018 one
> commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to
> around 64KB") limited received window within 65535', please revise
> this to clarify that you are talking about tp->rcv_wnd and not the
> receive window on the wire. For example: 'In 2018 commit a337531b942b
> ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") limited
> the initial value of tp->rcv_wnd to 65535,.'

Got it :)

>
> + For the commit description, please add a note that RFC 7323 limits
> the initial receive window on the wire in a SYN or SYN+ACK to 65535
> bytes, and __tcp_transmit_skb() already ensures that constraint is
> respected, no matter how large tp->rcv_wnd is.

Okay, I will add it to avoid future confusion.

>
> + For the commit description, please include some version of your
> example of the receive window values in a handshake, like:
>
> One example of using a much bigger initial receive window:
> client   --- SYN: rwindow=65535 ---> server
> client   <--- SYN+ACK: rwindow=14600 ----  server
> client   --- ACK: rwindow=175616 ---> server

Agreed.

>
> + For the commit description, please include some version of your
> example of the evolution of the receive window over the first 3 round
> trips, before and after your patch.

I'll revise the body message according to your comments.

Thanks,
Jason
diff mbox series

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 95caf8aaa8be..95618d0e78e4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -232,7 +232,7 @@  void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
 	if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
 		(*rcv_wnd) = min(space, MAX_TCP_WINDOW);
 	else
-		(*rcv_wnd) = min_t(u32, space, U16_MAX);
+		(*rcv_wnd) = space;
 
 	if (init_rcv_wnd)
 		*rcv_wnd = min(*rcv_wnd, init_rcv_wnd * mss);