| Message ID | 20230601211732.1606062-1-edumazet@google.com (mailing list archive) |
|---|---|
| State | Superseded |
| Delegated to: | Netdev Maintainers |
| Series | [net] tcp: gso: really support BIG TCP |
> -----Original Message----- > From: Eric Dumazet <edumazet@google.com> > Sent: Thursday, June 1, 2023 2:18 PM > To: David S . Miller <davem@davemloft.net>; Jakub Kicinski > <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com> > Cc: netdev@vger.kernel.org; Xin Long <lucien.xin@gmail.com>; David Ahern > <dsahern@kernel.org>; eric.dumazet@gmail.com; Eric Dumazet > <edumazet@google.com>; Alexander Duyck <alexanderduyck@meta.com> > Subject: [PATCH net] tcp: gso: really support BIG TCP > > > > We missed that tcp_gso_segment() was assuming skb->len was smaller than > 65535 : > > oldlen = (u16)~skb->len; > > This part came with commit 0718bcc09b35 ("[NET]: Fix CHECKSUM_HW GSO > problems.") > > This leads to wrong TCP checksum. > > Simply use csum_fold() to support 32bit packet lengthes. > > oldlen name is a bit misleading, as it is the contribution of skb->len on the > input skb TCP checksum. I added a comment to clarify this point. > > Fixes: 09f3d1a3a52c ("ipv6/gso: remove temporary HBH/jumbo header") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Alexander Duyck <alexanderduyck@fb.com> > --- > net/ipv4/tcp_offload.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index > 45dda788938704c3f762256266d9ea29b6ded4a5..5a1a163b2d859696df8f204b5 > 0e3fc76c14b64e9 100644 > --- a/net/ipv4/tcp_offload.c > +++ b/net/ipv4/tcp_offload.c > @@ -75,7 +75,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb, > if (!pskb_may_pull(skb, thlen)) > goto out; > > - oldlen = (u16)~skb->len; > + /* Contribution of skb->len in current TCP checksum */ > + oldlen = (__force u32)csum_fold((__force __wsum)skb->len); > __skb_pull(skb, thlen); > > mss = skb_shinfo(skb)->gso_size; The only thing I am not sure about with this change is if we risk overflowing a u32 with all the math that may occur. The original code couldn't exceed a u16 for delta since we were essentially adding -oldlen + new header + mss. 
With us now allowing the use of a value larger than 16 bit we should be able to have the resultant value exceed 16b which means we might overflow when we add it to the current checksum. As such we may want to look at holding off on the csum_fold until after we have added the new header and mss and then store the folded value in delta.
On Thu, Jun 1, 2023 at 11:46 PM Alexander Duyck <alexanderduyck@meta.com> wrote:
<...>
> The only thing I am not sure about with this change is whether we risk overflowing a u32 with all the math that may occur. The original code couldn't exceed a u16 for delta, since we were essentially adding -oldlen + new header + mss. With us now allowing a value larger than 16 bits, the resultant value could exceed 16 bits, which means we might overflow when we add it to the current checksum.
>
> As such we may want to look at holding off on the csum_fold until after we have added the new header and mss, and then store the folded value in delta.

I think you missed that the csum_fold() result is also a 16bit value. The logic Herbert added years ago will work the same.

Here is the pure C implementation of csum_fold(); I hope this will clear your concerns.

static inline __sum16 csum_fold(__wsum csum)
{
	u32 sum = (__force u32)csum;

	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (__force __sum16)~sum;
}

The TCP ipv6 stack populates th->check with tcp_v6_send_check()
  -> __tcp_v6_send_check(skb, &sk->sk_v6_rcv_saddr, &sk->sk_v6_daddr);
  -> th->check = ~tcp_v6_check(skb->len, saddr, daddr, 0);

Thanks !
On Fri, 2023-06-02 at 04:30 +0200, Eric Dumazet wrote:
<...>
> I think you missed that the csum_fold() result is also a 16bit value.

I saw that. My concern was more about delta versus the oldlen value itself, though. Specifically, your folded value is added to thlen + mss, which can then overflow past a 16-bit value, and when byteswapped and then added to the original checksum there is a risk of potential overflow.

The general idea was that ~skb->len + (segment length) will always technically be less than 0, since the original skb->len should always be larger than or equal to the new segmented length. So the code as it was would always generate a value 16 bits or fewer in length.

This was important when we computed delta and added it to the original value: since we were using htonl, which would byteswap things, we could potentially generate a 32-bit value, but it couldn't overflow since the two addends consisted of the upper 16 bits and lower 16 bits.

That is why I am thinking we are better off just dropping the "(u16)" cast and passing ~skb->len as the old_len.

To address this we then have a couple different approaches we could take:
1. use csum_fold on the "delta" value either before or after the htonl.
2. use csum_add instead of "+" for the addition of (th->check + delta).

I'm thinking option 2 may be the better way to go, as it would just add 2 addc operations, one for newcheck and one at the end of the function for th->check.
On Fri, Jun 2, 2023 at 4:24 PM Alexander H Duyck <alexander.duyck@gmail.com> wrote:
<...>
> I saw that. My concern was more about delta versus the oldlen value itself, though. Specifically, your folded value is added to thlen + mss, which can then overflow past a 16-bit value, and when byteswapped and then added to the original checksum there is a risk of potential overflow.

I do not think it matters. Herbert Xu said that what matters is that oldlen + (thlen + mss) would not overflow a 32bit value.

> To address this we then have a couple different approaches we could take:
> 1. use csum_fold on the "delta" value either before or after the htonl.
> 2. use csum_add instead of "+" for the addition of (th->check + delta).
>
> I'm thinking option 2 may be the better way to go, as it would just add 2 addc operations, one for newcheck and one at the end of the function for th->check.

Are you working on a patch yourself ? What would be the ETA ?

Thanks.
On Fri, Jun 2, 2023 at 5:21 PM Eric Dumazet <edumazet@google.com> wrote:
<...>
> Are you working on a patch yourself ? What would be the ETA ?

To be clear, I spent a lot of time trying to code something like the following, then I chose a much simpler solution (the patch as you see it).

This was the WIP (and not working at all):

diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 3568607588b107f06b8fb7b1143d5417bb2a3ac2..19bc2d9ae10d45aa5cbb35add4aa8e9f6b46a6ab 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -60,11 +60,11 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	unsigned int sum_truesize = 0;
+	unsigned int oldlen, newlen;
 	struct tcphdr *th;
 	unsigned int thlen;
 	unsigned int seq;
-	__be32 delta;
-	unsigned int oldlen;
+	__wsum delta;
 	unsigned int mss;
 	struct sk_buff *gso_skb = skb;
 	__sum16 newcheck;
@@ -78,7 +78,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 	if (!pskb_may_pull(skb, thlen))
 		goto out;
 
-	oldlen = (u16)~skb->len;
+	oldlen = skb->len;
 	__skb_pull(skb, thlen);
 
 	mss = skb_shinfo(skb)->gso_size;
@@ -113,7 +113,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 	if (skb_is_gso(segs))
 		mss *= skb_shinfo(segs)->gso_segs;
 
-	delta = htonl(oldlen + (thlen + mss));
+	newlen = thlen + mss;
+	delta = csum_sub(htonl(newlen), htonl(oldlen));
+	newcheck = csum_fold(csum_add(csum_unfold(th->check), delta));
 
 	skb = segs;
 	th = tcp_hdr(skb);
@@ -122,8 +124,6 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 	if (unlikely(skb_shinfo(gso_skb)->tx_flags & SKBTX_SW_TSTAMP))
 		tcp_gso_tstamp(segs, skb_shinfo(gso_skb)->tskey, seq, mss);
 
-	newcheck = ~csum_fold((__force __wsum)((__force u32)th->check +
-					       (__force u32)delta));
 
 	while (skb->next) {
 		th->fin = th->psh = 0;
@@ -168,11 +168,11 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 		WARN_ON_ONCE(refcount_sub_and_test(-delta,
 						   &skb->sk->sk_wmem_alloc));
 	}
 
-	delta = htonl(oldlen + (skb_tail_pointer(skb) -
-				skb_transport_header(skb)) +
-		      skb->data_len);
-	th->check = ~csum_fold((__force __wsum)((__force u32)th->check +
-						(__force u32)delta));
+	newlen = skb_tail_pointer(skb) - skb_transport_header(skb) +
+		 skb->data_len;
+	delta = csum_sub(htonl(newlen), htonl(oldlen));
+	th->check = csum_fold(csum_add(csum_unfold(th->check), delta));
+
 	if (skb->ip_summed == CHECKSUM_PARTIAL)
 		gso_reset_checksum(skb, ~th->check);
 	else
On Fri, Jun 2, 2023 at 8:25 AM Eric Dumazet <edumazet@google.com> wrote:
<...>
> This was the WIP (and not working at all)

This is way more than what I was thinking. Some of what I had stated was just pseudocode. No need for the addition of newlen, for instance.

> @@ -60,11 +60,11 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
<...>
> -	__be32 delta;
> -	unsigned int oldlen;
> +	__wsum delta;

I wouldn't even bother with any of these changes.

> @@ -78,7 +78,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
<...>
> -	oldlen = (u16)~skb->len;
> +	oldlen = skb->len;

As I stated before, I would just drop the "(u16)" cast. We are expanding oldlen to a u32, but we don't have to worry about it overflowing because skb->len should always be greater than or equal to our segmented length. So the most it could reach is ~0 if skb->len == segmented length.

> @@ -113,7 +113,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
<...>
> -	delta = htonl(oldlen + (thlen + mss));
> +	newlen = thlen + mss;
> +	delta = csum_sub(htonl(newlen), htonl(oldlen));
> +	newcheck = csum_fold(csum_add(csum_unfold(th->check), delta));

I think all of this can just be dropped.

> @@ -122,8 +124,6 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
<...>
> -	newcheck = ~csum_fold((__force __wsum)((__force u32)th->check +
> -					       (__force u32)delta));

So the change I would make here is:

	newcheck = ~csum_fold(csum_add(csum_unfold(th->check),
				       (__force __wsum)delta));

> @@ -168,11 +168,11 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
<...>
> -	th->check = ~csum_fold((__force __wsum)((__force u32)th->check +
> -						(__force u32)delta));

Likewise here, the only thing that has to be replaced is the th->check line, so it would be:

	th->check = ~csum_fold(csum_add(csum_unfold(th->check),
					(__force __wsum)delta));
From: Alexander Duyck
> Sent: 02 June 2023 16:39
...
> As I stated before, I would just drop the "(u16)" cast. We are expanding
> oldlen to a u32, but we don't have to worry about it overflowing because
> skb->len should always be greater than or equal to our segmented length.
> So the most it could reach is ~0 if skb->len == segmented length.

Have you missed the ~ ? The (u16) makes it equivalent to skb->len ^ 0xffff. Otherwise all the high bits are set and any arithmetic will generate a carry and the wrong value.

FWIW this C version of csum_fold() (eg from arch/arc) generates better code than the existing versions on pretty much every architecture:

	return (~sum - ror32(sum, 16)) >> 16;

The main exception will be sparc (which has a carry flag but no rotate) and arm with its barrel shifter.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 45dda788938704c3f762256266d9ea29b6ded4a5..5a1a163b2d859696df8f204b50e3fc76c14b64e9 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -75,7 +75,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 	if (!pskb_may_pull(skb, thlen))
 		goto out;
 
-	oldlen = (u16)~skb->len;
+	/* Contribution of skb->len in current TCP checksum */
+	oldlen = (__force u32)csum_fold((__force __wsum)skb->len);
 	__skb_pull(skb, thlen);
 
 	mss = skb_shinfo(skb)->gso_size;
We missed that tcp_gso_segment() was assuming skb->len was smaller than 65535 :

	oldlen = (u16)~skb->len;

This part came with commit 0718bcc09b35 ("[NET]: Fix CHECKSUM_HW GSO problems.")

This leads to a wrong TCP checksum.

Simply use csum_fold() to support 32bit packet lengths.

The oldlen name is a bit misleading, as it is the contribution of skb->len to the input skb TCP checksum. I added a comment to clarify this point.

Fixes: 09f3d1a3a52c ("ipv6/gso: remove temporary HBH/jumbo header")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexanderduyck@fb.com>
---
 net/ipv4/tcp_offload.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)