Message ID | 20211124202446.2917972-3-eric.dumazet@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 29c3002644bdd653f6ec6407d25135d0a4f7cefb |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: small csum optimizations | expand |
From: Eric Dumazet > Sent: 24 November 2021 20:25 > > From: Eric Dumazet <edumazet@google.com> > > Remove one pair of add/adc instructions and their dependency > against carry flag. > > We can leverage third argument to csum_partial(): > > X = csum_block_sub(X, csum_partial(start, len, 0), 0); > > --> > > X = csum_block_add(X, ~csum_partial(start, len, 0), 0); > > --> > > X = ~csum_partial(start, len, ~X); That doesn't seem to refer to the change in this file. David > > Signed-off-by: Eric Dumazet <edumazet@google.com> > --- > include/linux/skbuff.h | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index eba256af64a577b458998845f2dc01a5ec80745a..eae4bd3237a41cc1b60b44c91fbfe21dfdd8f117 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -3485,7 +3485,11 @@ __skb_postpull_rcsum(struct sk_buff *skb, const void *start, unsigned int len, > static inline void skb_postpull_rcsum(struct sk_buff *skb, > const void *start, unsigned int len) > { > - __skb_postpull_rcsum(skb, start, len, 0); > + if (skb->ip_summed == CHECKSUM_COMPLETE) > + skb->csum = ~csum_partial(start, len, ~skb->csum); > + else if (skb->ip_summed == CHECKSUM_PARTIAL && > + skb_checksum_start_offset(skb) < 0) > + skb->ip_summed = CHECKSUM_NONE; > } > > static __always_inline void > -- > 2.34.0.rc2.393.gf8c9666880-goog - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Thu, Nov 25, 2021 at 1:41 AM David Laight <David.Laight@aculab.com> wrote: > > From: Eric Dumazet > > Sent: 24 November 2021 20:25 > > > > From: Eric Dumazet <edumazet@google.com> > > > > Remove one pair of add/adc instructions and their dependency > > against carry flag. > > > > We can leverage third argument to csum_partial(): > > > > X = csum_block_sub(X, csum_partial(start, len, 0), 0); > > > > --> > > > > X = csum_block_add(X, ~csum_partial(start, len, 0), 0); > > > > --> > > > > X = ~csum_partial(start, len, ~X); > > That doesn't seem to refer to the change in this file. > It is describing the change. The first step, of copying first the __skb_postpull_rcsum(skb, start, len, 0) content into __skb_postpull_rcsum() was kind of obvious. > David > > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > --- > > include/linux/skbuff.h | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > > index eba256af64a577b458998845f2dc01a5ec80745a..eae4bd3237a41cc1b60b44c91fbfe21dfdd8f117 100644 > > --- a/include/linux/skbuff.h > > +++ b/include/linux/skbuff.h > > @@ -3485,7 +3485,11 @@ __skb_postpull_rcsum(struct sk_buff *skb, const void *start, unsigned int len, > > static inline void skb_postpull_rcsum(struct sk_buff *skb, > > const void *start, unsigned int len) > > { > > - __skb_postpull_rcsum(skb, start, len, 0); > > + if (skb->ip_summed == CHECKSUM_COMPLETE) > > + skb->csum = ~csum_partial(start, len, ~skb->csum); > > + else if (skb->ip_summed == CHECKSUM_PARTIAL && > > + skb_checksum_start_offset(skb) < 0) > > + skb->ip_summed = CHECKSUM_NONE; > > } > > > > static __always_inline void > > -- > > 2.34.0.rc2.393.gf8c9666880-goog > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales) >
There is another optimisation you can do that removes a conditional from the end of the checksum generation. The 'adc' sum is reduced to 16 bits leaving a value [1..0xffff]. This is then inverted to get the checksum, but the range is then [0..0xfffe]. So 0 then has to be converted to 0xffff. If you add 1 to thw inut checksum one of the csum_partial() calls the adc sum is one too big, so the inverted value is one too small. Adding 1 to the inverted value fixes this and leaves a checksum in the correct range. Potentially the invert+increment can be done as a negate prior to the final masking with 0xffff. (Which the compiler may well sort out for you.) You do need to know 'early' that the checksum is going to get inverted - or too many places might add in the extra 'one'. On 64bit systems the 'input checksum' to csum_partial() can (almost certainly) be made a long - with a proviso that the value must not exceed 2**56 because the function might want to add a partial word to it. I'm also not sure how well any of this runs on mips-like cpu that don't have a carry flag (I think this includes riscV). On 64bit cpu it may be best to add 32bit values to 64bit registers. With 2 memory read ports it is even possibly that an x86 cpu can do 8 bytes/clock by adding 32 bit values to two registers. However the reads would have to be aligned and arranged to avoid cache bank conflicts. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Hello, On Wed, Nov 24, 2021 at 12:24:46PM -0800, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > Remove one pair of add/adc instructions and their dependency > against carry flag. > > We can leverage third argument to csum_partial(): > > X = csum_block_sub(X, csum_partial(start, len, 0), 0); > > --> > > X = csum_block_add(X, ~csum_partial(start, len, 0), 0); > > --> > > X = ~csum_partial(start, len, ~X); > > Signed-off-by: Eric Dumazet <edumazet@google.com> > --- > include/linux/skbuff.h | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index eba256af64a577b458998845f2dc01a5ec80745a..eae4bd3237a41cc1b60b44c91fbfe21dfdd8f117 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -3485,7 +3485,11 @@ __skb_postpull_rcsum(struct sk_buff *skb, const void *start, unsigned int len, > static inline void skb_postpull_rcsum(struct sk_buff *skb, > const void *start, unsigned int len) > { > - __skb_postpull_rcsum(skb, start, len, 0); > + if (skb->ip_summed == CHECKSUM_COMPLETE) > + skb->csum = ~csum_partial(start, len, ~skb->csum); > + else if (skb->ip_summed == CHECKSUM_PARTIAL && > + skb_checksum_start_offset(skb) < 0) > + skb->ip_summed = CHECKSUM_NONE; > } > > static __always_inline void > -- > 2.34.0.rc2.393.gf8c9666880-goog > I am seeing some errors after this patch, and I am not able to understand why. Specifically, __skb_gro_checksum_complete() hits this condition: wsum = skb_checksum(skb, skb_gro_offset(skb), skb_gro_len(skb), 0); /* NAPI_GRO_CB(skb)->csum holds pseudo checksum */ sum = csum_fold(csum_add(NAPI_GRO_CB(skb)->csum, wsum)); /* See comments in __skb_checksum_complete(). */ if (likely(!sum)) { if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) && !skb->csum_complete_sw) netdev_rx_csum_fault(skb->dev, skb); } To test, I am using a DSA switch network interface with an IPv4 address and pinging through it. The idea is as follows: an enetc port is attached to a switch, and that switch places a frame header before the Ethernet header. The enetc calculates the L2 payload (actually what it perceives as L2 payload, since it has no insight into the switch header format) checksum and puts it in skb->csum: https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/freescale/enetc/enetc.c#L726 Then, the switch driver packet type handler is invoked, and this strips that header and recalculates the checksum (then it changes skb->dev and this is how pinging is done through the DSA interface, but that is irrelevant). https://elixir.bootlin.com/linux/latest/source/net/dsa/tag_ocelot.c#L56 There seems to be a disparity when the skb->csum is calculated by skb_postpull_rcsum as zero. Before, it was calculated as 0xffff. Below is a dump added by me in skb_postpull_rcsum when the checksums calculated through both methods differ. I've kept the old implementation inside skb->csum and this is what skb_dump() sees: [ 99.891524] skb csum of 20 bytes (20 to left of skb->data) using method 1: 0x0 method 2 0xffffffff, orig 0xf470 [ 99.901701] skb len=84 headroom=98 headlen=84 tailroom=1546 [ 99.901701] mac=(84,-6) net=(78,0) trans=78 [ 99.901701] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) [ 99.901701] csum(0xffffffff ip_summed=2 complete_sw=0 valid=0 level=0) [ 99.901701] hash(0x0 sw=0 l4=0) proto=0x00f8 pkttype=3 iif=7 [ 99.929916] dev name=eno2 feat=0x0x00020100001149a9 [ 99.934831] skb headroom: 00000000: 00 c0 b1 a4 ff ff 00 00 00 e0 b1 a4 ff ff 00 00 [ 99.942533] skb headroom: 00000010: 00 6f 5b 02 f8 14 ff ff 40 62 5b 02 f8 14 ff ff [ 99.950232] skb headroom: 00000020: 21 6f 5b 02 f8 14 ff ff 00 00 00 00 00 00 00 00 [ 99.957931] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 [ 99.965631] skb headroom: 00000040: 88 80 00 0a 00 33 9d 40 f0 41 01 80 00 00 08 0f [ 99.973330] skb headroom: 00000050: 00 10 00 00 00 04 9f 05 f6 28 ba ae e4 b6 2c 3d [ 99.981028] skb headroom: 00000060: 08 00 [ 99.985062] skb linear: 00000000: 45 00 00 54 27 ac 00 00 40 01 09 a8 c0 a8 64 03 [ 99.992762] skb linear: 00000010: c0 a8 64 01 00 00 10 e6 01 5c 00 04 49 30 a7 61 [ 100.000462] skb linear: 00000020: 00 00 00 00 3d 55 01 00 00 00 00 00 10 11 12 13 [ 100.008162] skb linear: 00000030: 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 [ 100.015862] skb linear: 00000040: 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 [ 100.023561] skb linear: 00000050: 34 35 36 37 And below is the same output as above, but annotated by me with some comments: [ 99.891524] skb csum of 20 bytes (20 to left of skb->data) using method 1: 0x0 method 2 0xffffffff, orig 0xf470 [ 99.901701] skb len=84 headroom=98 headlen=84 tailroom=1546 [ 99.901701] mac=(84,-6) net=(78,0) trans=78 ^^^^^^^^^^^^^^^^^^^^^^ since the print is done from ocelot_rcv, the network and transport headers haven't yet been updated [ 99.901701] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) [ 99.901701] csum(0xffffffff ip_summed=2 complete_sw=0 valid=0 level=0) [ 99.901701] hash(0x0 sw=0 l4=0) proto=0x00f8 pkttype=3 iif=7 [ 99.929916] dev name=eno2 feat=0x0x00020100001149a9 [ 99.934831] skb headroom: 00000000: 00 c0 b1 a4 ff ff 00 00 00 e0 b1 a4 ff ff 00 00 [ 99.942533] skb headroom: 00000010: 00 6f 5b 02 f8 14 ff ff 40 62 5b 02 f8 14 ff ff [ 99.950232] skb headroom: 00000020: 21 6f 5b 02 f8 14 ff ff 00 00 00 00 00 00 00 00 [ 99.957931] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 here is where the enetc saw the the "start" variable (old skb->data) beginning of the frame points here v v [ 99.965631] skb headroom: 00000040: 88 80 00 0a 00 33 9d 40 f0 41 01 80 00 00 08 0f OCELOT_TAG_LEN bytes later is the real MAC header v [ 99.973330] skb headroom: 00000050: 00 10 00 00 00 04 9f 05 f6 28 ba ae e4 b6 2c 3d [ 99.981028] skb headroom: 00000060: 08 00 ^ the skb_postpull_rcsum is called from "start" until the first byte prior to this one [ 99.985062] skb linear: 00000000: 45 00 00 54 27 ac 00 00 40 01 09 a8 c0 a8 64 03 [ 99.992762] skb linear: 00000010: c0 a8 64 01 00 00 10 e6 01 5c 00 04 49 30 a7 61 [ 100.000462] skb linear: 00000020: 00 00 00 00 3d 55 01 00 00 00 00 00 10 11 12 13 [ 100.008162] skb linear: 00000030: 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 [ 100.015862] skb linear: 00000040: 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 [ 100.023561] skb linear: 00000050: 34 35 36 37 Do you have some suggestions as to what may be wrong? Thanks.
Hi Vladimir On Thu, Dec 2, 2021 at 5:10 AM Vladimir Oltean <olteanv@gmail.com> wrote: > > Hello, > > On Wed, Nov 24, 2021 at 12:24:46PM -0800, Eric Dumazet wrote: > > From: Eric Dumazet <edumazet@google.com> > > > > Remove one pair of add/adc instructions and their dependency > > against carry flag. > > > > We can leverage third argument to csum_partial(): > > > > X = csum_block_sub(X, csum_partial(start, len, 0), 0); > > > > --> > > > > X = csum_block_add(X, ~csum_partial(start, len, 0), 0); > > > > --> > > > > X = ~csum_partial(start, len, ~X); > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > --- > > include/linux/skbuff.h | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > > index eba256af64a577b458998845f2dc01a5ec80745a..eae4bd3237a41cc1b60b44c91fbfe21dfdd8f117 100644 > > --- a/include/linux/skbuff.h > > +++ b/include/linux/skbuff.h > > @@ -3485,7 +3485,11 @@ __skb_postpull_rcsum(struct sk_buff *skb, const void *start, unsigned int len, > > static inline void skb_postpull_rcsum(struct sk_buff *skb, > > const void *start, unsigned int len) > > { > > - __skb_postpull_rcsum(skb, start, len, 0); > > + if (skb->ip_summed == CHECKSUM_COMPLETE) > > + skb->csum = ~csum_partial(start, len, ~skb->csum); > > + else if (skb->ip_summed == CHECKSUM_PARTIAL && > > + skb_checksum_start_offset(skb) < 0) > > + skb->ip_summed = CHECKSUM_NONE; > > } > > > > static __always_inline void > > -- > > 2.34.0.rc2.393.gf8c9666880-goog > > > > I am seeing some errors after this patch, and I am not able to > understand why. Specifically, __skb_gro_checksum_complete() hits this > condition: There were two patches, one for GRO, one for skb_postpull_rcsum() I am a bit confused by your report. Which one is causing problems ? > > wsum = skb_checksum(skb, skb_gro_offset(skb), skb_gro_len(skb), 0); > > /* NAPI_GRO_CB(skb)->csum holds pseudo checksum */ > sum = csum_fold(csum_add(NAPI_GRO_CB(skb)->csum, wsum)); > /* See comments in __skb_checksum_complete(). */ > if (likely(!sum)) { > if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) && > !skb->csum_complete_sw) > netdev_rx_csum_fault(skb->dev, skb); > } > > To test, I am using a DSA switch network interface with an IPv4 address > and pinging through it. > > The idea is as follows: an enetc port is attached to a switch, and that > switch places a frame header before the Ethernet header. > The enetc calculates the L2 payload (actually what it perceives as L2 > payload, since it has no insight into the switch header format) checksum > and puts it in skb->csum: > https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/freescale/enetc/enetc.c#L726 > > Then, the switch driver packet type handler is invoked, and this strips > that header and recalculates the checksum (then it changes skb->dev and > this is how pinging is done through the DSA interface, but that is > irrelevant). > https://elixir.bootlin.com/linux/latest/source/net/dsa/tag_ocelot.c#L56 > > There seems to be a disparity when the skb->csum is calculated by > skb_postpull_rcsum as zero. Before, it was calculated as 0xffff. skb->csum is 32bit, so there are about 2^16 different values for a given Internet checksum > > Below is a dump added by me in skb_postpull_rcsum when the checksums > calculated through both methods differ. I've kept the old implementation > inside skb->csum and this is what skb_dump() sees: > > [ 99.891524] skb csum of 20 bytes (20 to left of skb->data) using method 1: 0x0 method 2 0xffffffff, orig 0xf470 > [ 99.901701] skb len=84 headroom=98 headlen=84 tailroom=1546 > [ 99.901701] mac=(84,-6) net=(78,0) trans=78 > [ 99.901701] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > [ 99.901701] csum(0xffffffff ip_summed=2 complete_sw=0 valid=0 level=0) > [ 99.901701] hash(0x0 sw=0 l4=0) proto=0x00f8 pkttype=3 iif=7 > [ 99.929916] dev name=eno2 feat=0x0x00020100001149a9 > [ 99.934831] skb headroom: 00000000: 00 c0 b1 a4 ff ff 00 00 00 e0 b1 a4 ff ff 00 00 > [ 99.942533] skb headroom: 00000010: 00 6f 5b 02 f8 14 ff ff 40 62 5b 02 f8 14 ff ff > [ 99.950232] skb headroom: 00000020: 21 6f 5b 02 f8 14 ff ff 00 00 00 00 00 00 00 00 > [ 99.957931] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 > [ 99.965631] skb headroom: 00000040: 88 80 00 0a 00 33 9d 40 f0 41 01 80 00 00 08 0f > [ 99.973330] skb headroom: 00000050: 00 10 00 00 00 04 9f 05 f6 28 ba ae e4 b6 2c 3d > [ 99.981028] skb headroom: 00000060: 08 00 > [ 99.985062] skb linear: 00000000: 45 00 00 54 27 ac 00 00 40 01 09 a8 c0 a8 64 03 > [ 99.992762] skb linear: 00000010: c0 a8 64 01 00 00 10 e6 01 5c 00 04 49 30 a7 61 > [ 100.000462] skb linear: 00000020: 00 00 00 00 3d 55 01 00 00 00 00 00 10 11 12 13 > [ 100.008162] skb linear: 00000030: 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 > [ 100.015862] skb linear: 00000040: 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 > [ 100.023561] skb linear: 00000050: 34 35 36 37 > > And below is the same output as above, but annotated by me with some comments: > > [ 99.891524] skb csum of 20 bytes (20 to left of skb->data) using method 1: 0x0 method 2 0xffffffff, orig 0xf470 > [ 99.901701] skb len=84 headroom=98 headlen=84 tailroom=1546 > [ 99.901701] mac=(84,-6) net=(78,0) trans=78 > ^^^^^^^^^^^^^^^^^^^^^^ > since the print is done from ocelot_rcv, the network and > transport headers haven't yet been updated > > [ 99.901701] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > [ 99.901701] csum(0xffffffff ip_summed=2 complete_sw=0 valid=0 level=0) > [ 99.901701] hash(0x0 sw=0 l4=0) proto=0x00f8 pkttype=3 iif=7 > [ 99.929916] dev name=eno2 feat=0x0x00020100001149a9 > [ 99.934831] skb headroom: 00000000: 00 c0 b1 a4 ff ff 00 00 00 e0 b1 a4 ff ff 00 00 > [ 99.942533] skb headroom: 00000010: 00 6f 5b 02 f8 14 ff ff 40 62 5b 02 f8 14 ff ff > [ 99.950232] skb headroom: 00000020: 21 6f 5b 02 f8 14 ff ff 00 00 00 00 00 00 00 00 > [ 99.957931] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 > > here is where the enetc saw the the "start" variable (old skb->data) > beginning of the frame points here > v v > [ 99.965631] skb headroom: 00000040: 88 80 00 0a 00 33 9d 40 f0 41 01 80 00 00 08 0f > > OCELOT_TAG_LEN bytes > later is the real MAC header > v > [ 99.973330] skb headroom: 00000050: 00 10 00 00 00 04 9f 05 f6 28 ba ae e4 b6 2c 3d > [ 99.981028] skb headroom: 00000060: 08 00 > ^ > the skb_postpull_rcsum is called from "start" > until the first byte prior to this one > > [ 99.985062] skb linear: 00000000: 45 00 00 54 27 ac 00 00 40 01 09 a8 c0 a8 64 03 > [ 99.992762] skb linear: 00000010: c0 a8 64 01 00 00 10 e6 01 5c 00 04 49 30 a7 61 > [ 100.000462] skb linear: 00000020: 00 00 00 00 3d 55 01 00 00 00 00 00 10 11 12 13 > [ 100.008162] skb linear: 00000030: 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 > [ 100.015862] skb linear: 00000040: 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 > [ 100.023561] skb linear: 00000050: 34 35 36 37 > > Do you have some suggestions as to what may be wrong? Thanks. What kind of traffic is triggering the fault ? TCP, UDP, something else ? Do you have a stack trace to provide, because it is not clear from where the issue is detected. Thanks.
From: Vladimir Oltean > Sent: 02 December 2021 13:11 ... > > --- a/include/linux/skbuff.h > > +++ b/include/linux/skbuff.h > > @@ -3485,7 +3485,11 @@ __skb_postpull_rcsum(struct sk_buff *skb, const void *start, unsigned int > len, > > static inline void skb_postpull_rcsum(struct sk_buff *skb, > > const void *start, unsigned int len) > > { > > - __skb_postpull_rcsum(skb, start, len, 0); > > + if (skb->ip_summed == CHECKSUM_COMPLETE) > > + skb->csum = ~csum_partial(start, len, ~skb->csum); You can't do that, the domain is 1..0xffff (or maybe 0xffffffff). The invert has to convert ~0 to ~0 not zero. ... > There seems to be a disparity when the skb->csum is calculated by > skb_postpull_rcsum as zero. Before, it was calculated as 0xffff. Which is what that will do for some inputs at least. Maybe: skb->csum = 1 + ~csum_partial(start, len, ~skb->csum + 1); is right. I think that is the same as: skb->csum = -csum_partial(start, len, -skb->csum); Although letting the compiler do that transform probably makes the code easier to read. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Thu, Dec 2, 2021 at 7:06 AM David Laight <David.Laight@aculab.com> wrote: > > From: Vladimir Oltean > > Sent: 02 December 2021 13:11 > ... > > > --- a/include/linux/skbuff.h > > > +++ b/include/linux/skbuff.h > > > @@ -3485,7 +3485,11 @@ __skb_postpull_rcsum(struct sk_buff *skb, const void *start, unsigned int > > len, > > > static inline void skb_postpull_rcsum(struct sk_buff *skb, > > > const void *start, unsigned int len) > > > { > > > - __skb_postpull_rcsum(skb, start, len, 0); > > > + if (skb->ip_summed == CHECKSUM_COMPLETE) > > > + skb->csum = ~csum_partial(start, len, ~skb->csum); > > You can't do that, the domain is 1..0xffff (or maybe 0xffffffff). > The invert has to convert ~0 to ~0 not zero. > ... > > There seems to be a disparity when the skb->csum is calculated by > > skb_postpull_rcsum as zero. Before, it was calculated as 0xffff. > > Which is what that will do for some inputs at least. > Maybe: > skb->csum = 1 + ~csum_partial(start, len, ~skb->csum + 1); > is right. > I think that is the same as: > skb->csum = -csum_partial(start, len, -skb->csum); > Although letting the compiler do that transform probably makes > the code easier to read. > > Interesting, update_csum_diff4() and update_csum_diff16() seem to both use. skb->csum = ~csum_partial((char *)diff, sizeof(diff), ~skb->csum);
On Thu, Dec 02, 2021 at 06:51:47AM -0800, Eric Dumazet wrote: > Hi Vladimir > > I am seeing some errors after this patch, and I am not able to > > understand why. Specifically, __skb_gro_checksum_complete() hits this > > condition: > > There were two patches, one for GRO, one for skb_postpull_rcsum() > > I am a bit confused by your report. Which one is causing problems ? I'm sorry, indeed it seems that I missed to provide that info. Anyway, it is the skb_postpull_rcsum() call from the DSA switch driver, that I pointed to, which seems to be problematic. [ 754.211845] mscc_felix 0000:00:00.5 swp0: hw csum failure [ 754.217670] skb len=64 headroom=118 headlen=64 tailroom=1546 [ 754.217670] mac=(84,14) net=(98,20) trans=118 [ 754.217670] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) [ 754.217670] csum(0x0 ip_summed=2 complete_sw=0 valid=0 level=0) [ 754.217670] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=9 [ 754.246905] dev name=swp0 feat=0x0x0002000000195829 [ 754.253200] skb headroom: 00000000: ef be ad de ef be ad de ef be ad de ef be ad de [ 754.261751] skb headroom: 00000010: ef be ad de ef be ad de ef be ad de ef be ad de [ 754.269444] skb headroom: 00000020: ef be ad de ef be ad de ef be ad de ef be ad de [ 754.277135] skb headroom: 00000030: ef be ad de ef be ad de ef be ad de ef be ad de [ 754.284826] skb headroom: 00000040: 88 80 00 0a 00 3e 6b e3 36 a1 01 80 00 00 00 0f [ 754.292516] skb headroom: 00000050: 00 10 00 00 d2 ee 27 92 2d 6c 6a b6 a6 22 19 47 [ 754.300207] skb headroom: 00000060: 08 00 45 00 00 54 2d 7c 00 00 40 01 03 d9 c0 a8 [ 754.307897] skb headroom: 00000070: 64 02 c0 a8 64 01 [ 754.312971] skb linear: 00000000: 00 00 60 4d 03 af 00 01 77 eb a8 61 00 00 00 00 [ 754.320662] skb linear: 00000010: b6 e2 06 00 00 00 00 00 10 11 12 13 14 15 16 17 [ 754.328352] skb linear: 00000020: 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 [ 754.336042] skb linear: 00000030: 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37 [ 754.343732] skb tailroom: 00000000: 00 00 00 00 00 00 00 00 00 00 ef be ad de ef be [ 754.351423] skb tailroom: 00000010: ad de ef be ad de ef be ad de ef be ad de ef be (irrelevant tailroom trimmed) [ 755.088130] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc3-next-20211202-07010-ga9b9500ffaac-dirty #1531 [ 755.098169] Hardware name: LS1028A RDB Board (DT) [ 755.102885] Call trace: [ 755.105333] dump_backtrace+0x0/0x1ac [ 755.109016] show_stack+0x18/0x70 [ 755.112341] dump_stack_lvl+0x68/0x84 [ 755.116022] dump_stack+0x18/0x34 [ 755.119345] netdev_rx_csum_fault+0x60/0x64 [ 755.123549] __skb_checksum_complete+0x104/0x10c [ 755.128180] icmp_rcv+0x9c/0x3f0 [ 755.131421] ip_protocol_deliver_rcu+0x40/0x220 [ 755.135965] ip_local_deliver_finish+0x68/0x84 [ 755.140421] ip_local_deliver+0x7c/0x120 [ 755.144353] ip_sublist_rcv_finish+0x48/0x70 [ 755.148643] ip_sublist_rcv+0x168/0x1f0 [ 755.152489] ip_list_rcv+0xf8/0x1a0 [ 755.155985] __netif_receive_skb_list_core+0x184/0x214 [ 755.161142] netif_receive_skb_list_internal+0x180/0x29c [ 755.166471] napi_complete_done+0x68/0x1bc [ 755.170581] gro_cell_poll+0x80/0xa0 [ 755.174176] __napi_poll+0x38/0x184 [ 755.177674] net_rx_action+0xe8/0x280 [ 755.181347] __do_softirq+0x124/0x2a0 [ 755.185019] __irq_exit_rcu+0xe4/0x100 [ 755.188782] irq_exit_rcu+0x10/0x1c [ 755.192278] el1_interrupt+0x38/0x84 [ 755.195864] el1h_64_irq_handler+0x18/0x24 [ 755.199972] el1h_64_irq+0x78/0x7c [ 755.203380] cpuidle_enter_state+0x12c/0x2f0 [ 755.207671] cpuidle_enter+0x38/0x50 [ 755.211256] do_idle+0x214/0x29c [ 755.214495] cpu_startup_entry+0x24/0x80 [ 755.218428] rest_init+0xe4/0xf4 [ 755.221664] arch_call_rest_init+0x10/0x1c [ 755.225778] start_kernel+0x628/0x668 [ 755.229450] __primary_switched+0xc0/0xc8 > > There seems to be a disparity when the skb->csum is calculated by > > skb_postpull_rcsum as zero. Before, it was calculated as 0xffff. > > skb->csum is 32bit, so there are about 2^16 different values for a > given Internet checksum I meant 0xffffffff, sorry. It is visible in the skb_dump output that it was 0xffffffff before and now it is 0. > > Do you have some suggestions as to what may be wrong? Thanks. > > What kind of traffic is triggering the fault ? TCP, UDP, something else ? The simplest to reproduce would be for ICMP. I'm pretty sure I had a stack trace with TCP as well, but I don't seem to be able to reproduce that right now. > Do you have a stack trace to provide, because it is not clear from > where the issue is detected.
On Thu, Dec 2, 2021 at 8:29 AM Vladimir Oltean <olteanv@gmail.com> wrote: > > On Thu, Dec 02, 2021 at 06:51:47AM -0800, Eric Dumazet wrote: > > Hi Vladimir > > > I am seeing some errors after this patch, and I am not able to > > > understand why. Specifically, __skb_gro_checksum_complete() hits this > > > condition: > > > > There were two patches, one for GRO, one for skb_postpull_rcsum() > > > > I am a bit confused by your report. Which one is causing problems ? > > I'm sorry, indeed it seems that I missed to provide that info. > Anyway, it is the skb_postpull_rcsum() call from the DSA switch driver, > that I pointed to, which seems to be problematic. > > [ 754.211845] mscc_felix 0000:00:00.5 swp0: hw csum failure > [ 754.217670] skb len=64 headroom=118 headlen=64 tailroom=1546 > [ 754.217670] mac=(84,14) net=(98,20) trans=118 > [ 754.217670] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > [ 754.217670] csum(0x0 ip_summed=2 complete_sw=0 valid=0 level=0) > [ 754.217670] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=9 > [ 754.246905] dev name=swp0 feat=0x0x0002000000195829 > [ 754.253200] skb headroom: 00000000: ef be ad de ef be ad de ef be ad de ef be ad de > [ 754.261751] skb headroom: 00000010: ef be ad de ef be ad de ef be ad de ef be ad de > [ 754.269444] skb headroom: 00000020: ef be ad de ef be ad de ef be ad de ef be ad de > [ 754.277135] skb headroom: 00000030: ef be ad de ef be ad de ef be ad de ef be ad de > [ 754.284826] skb headroom: 00000040: 88 80 00 0a 00 3e 6b e3 36 a1 01 80 00 00 00 0f > [ 754.292516] skb headroom: 00000050: 00 10 00 00 d2 ee 27 92 2d 6c 6a b6 a6 22 19 47 > [ 754.300207] skb headroom: 00000060: 08 00 45 00 00 54 2d 7c 00 00 40 01 03 d9 c0 a8 > [ 754.307897] skb headroom: 00000070: 64 02 c0 a8 64 01 > [ 754.312971] skb linear: 00000000: 00 00 60 4d 03 af 00 01 77 eb a8 61 00 00 00 00 > [ 754.320662] skb linear: 00000010: b6 e2 06 00 00 00 00 00 10 11 12 13 14 15 16 17 > [ 754.328352] skb linear: 00000020: 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 > [ 754.336042] skb linear: 00000030: 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37 > [ 754.343732] skb tailroom: 00000000: 00 00 00 00 00 00 00 00 00 00 ef be ad de ef be > [ 754.351423] skb tailroom: 00000010: ad de ef be ad de ef be ad de ef be ad de ef be > (irrelevant tailroom trimmed) > [ 755.088130] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc3-next-20211202-07010-ga9b9500ffaac-dirty #1531 > [ 755.098169] Hardware name: LS1028A RDB Board (DT) > [ 755.102885] Call trace: > [ 755.105333] dump_backtrace+0x0/0x1ac > [ 755.109016] show_stack+0x18/0x70 > [ 755.112341] dump_stack_lvl+0x68/0x84 > [ 755.116022] dump_stack+0x18/0x34 > [ 755.119345] netdev_rx_csum_fault+0x60/0x64 > [ 755.123549] __skb_checksum_complete+0x104/0x10c > [ 755.128180] icmp_rcv+0x9c/0x3f0 > [ 755.131421] ip_protocol_deliver_rcu+0x40/0x220 > [ 755.135965] ip_local_deliver_finish+0x68/0x84 > [ 755.140421] ip_local_deliver+0x7c/0x120 > [ 755.144353] ip_sublist_rcv_finish+0x48/0x70 > [ 755.148643] ip_sublist_rcv+0x168/0x1f0 > [ 755.152489] ip_list_rcv+0xf8/0x1a0 > [ 755.155985] __netif_receive_skb_list_core+0x184/0x214 > [ 755.161142] netif_receive_skb_list_internal+0x180/0x29c > [ 755.166471] napi_complete_done+0x68/0x1bc > [ 755.170581] gro_cell_poll+0x80/0xa0 > [ 755.174176] __napi_poll+0x38/0x184 > [ 755.177674] net_rx_action+0xe8/0x280 > [ 755.181347] __do_softirq+0x124/0x2a0 > [ 755.185019] __irq_exit_rcu+0xe4/0x100 > [ 755.188782] irq_exit_rcu+0x10/0x1c > [ 755.192278] el1_interrupt+0x38/0x84 > [ 755.195864] el1h_64_irq_handler+0x18/0x24 > [ 755.199972] el1h_64_irq+0x78/0x7c > [ 755.203380] cpuidle_enter_state+0x12c/0x2f0 > [ 755.207671] cpuidle_enter+0x38/0x50 > [ 755.211256] do_idle+0x214/0x29c > [ 755.214495] cpu_startup_entry+0x24/0x80 > [ 755.218428] rest_init+0xe4/0xf4 > [ 755.221664] arch_call_rest_init+0x10/0x1c > [ 755.225778] start_kernel+0x628/0x668 > [ 755.229450] __primary_switched+0xc0/0xc8 > > > > There seems to be a disparity when the skb->csum is calculated by > > > skb_postpull_rcsum as zero. Before, it was calculated as 0xffff. > > > > skb->csum is 32bit, so there are about 2^16 different values for a > > given Internet checksum > > I meant 0xffffffff, sorry. It is visible in the skb_dump output that it > was 0xffffffff before and now it is 0. > > > > Do you have some suggestions as to what may be wrong? Thanks. > > > > What kind of traffic is triggering the fault ? TCP, UDP, something else ? > > The simplest to reproduce would be for ICMP. I'm pretty sure I had a > stack trace with TCP as well, but I don't seem to be able to reproduce > that right now. > > > Do you have a stack trace to provide, because it is not clear from > > where the issue is detected. Thanks Vladimir I think that maybe the issue is that the initial skb->csum is zero, and the csum_parttial(removed_block) is also zero. But the initial skb->csum should not be zero if you have a non " all zero" frame. Can you double check this in drivers/net/ethernet/freescale/enetc/enetc.c ?
On Thu, Dec 2, 2021 at 11:32 AM Eric Dumazet <edumazet@google.com> wrote: > > Thanks Vladimir > > I think that maybe the issue is that the initial skb->csum is zero, > and the csum_parttial(removed_block) is also zero. > > But the initial skb->csum should not be zero if you have a non " all > zero" frame. > > Can you double check this in drivers/net/ethernet/freescale/enetc/enetc.c ? Yes, I am not sure why the csum is inverted in enetc_get_offloads() Perhaps diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c index 504e12554079e306e477b9619f272d6e96527377..72524f14cae0093763f8a3f57b1e08e31bc4df1a 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc.c +++ b/drivers/net/ethernet/freescale/enetc/enetc.c @@ -986,7 +986,7 @@ static void enetc_get_offloads(struct enetc_bdr *rx_ring, if (rx_ring->ndev->features & NETIF_F_RXCSUM) { u16 inet_csum = le16_to_cpu(rxbd->r.inet_csum); - skb->csum = csum_unfold((__force __sum16)~htons(inet_csum)); + skb->csum = csum_unfold((__force __sum16)htons(inet_csum)); skb->ip_summed = CHECKSUM_COMPLETE; } If this does not work, then maybe : diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c index 504e12554079e306e477b9619f272d6e96527377..d190faa9a8242f9f3f962dd30b9f4409a83ee697 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc.c +++ b/drivers/net/ethernet/freescale/enetc/enetc.c @@ -987,7 +987,8 @@ static void enetc_get_offloads(struct enetc_bdr *rx_ring, u16 inet_csum = le16_to_cpu(rxbd->r.inet_csum); skb->csum = csum_unfold((__force __sum16)~htons(inet_csum)); - skb->ip_summed = CHECKSUM_COMPLETE; + if (likely(skb->csum)) + skb->ip_summed = CHECKSUM_COMPLETE; } if (le16_to_cpu(rxbd->r.flags) & ENETC_RXBD_FLAG_VLAN) {
On Thu, Dec 02, 2021 at 11:32:09AM -0800, Eric Dumazet wrote: > On Thu, Dec 2, 2021 at 8:29 AM Vladimir Oltean <olteanv@gmail.com> wrote: > > > > On Thu, Dec 02, 2021 at 06:51:47AM -0800, Eric Dumazet wrote: > > > Hi Vladimir > > > > I am seeing some errors after this patch, and I am not able to > > > > understand why. Specifically, __skb_gro_checksum_complete() hits this > > > > condition: > > > > > > There were two patches, one for GRO, one for skb_postpull_rcsum() > > > > > > I am a bit confused by your report. Which one is causing problems ? > > > > I'm sorry, indeed it seems that I missed to provide that info. > > Anyway, it is the skb_postpull_rcsum() call from the DSA switch driver, > > that I pointed to, which seems to be problematic. > > > > [ 754.211845] mscc_felix 0000:00:00.5 swp0: hw csum failure > > [ 754.217670] skb len=64 headroom=118 headlen=64 tailroom=1546 > > [ 754.217670] mac=(84,14) net=(98,20) trans=118 > > [ 754.217670] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > > [ 754.217670] csum(0x0 ip_summed=2 complete_sw=0 valid=0 level=0) > > [ 754.217670] hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=9 > > [ 754.246905] dev name=swp0 feat=0x0x0002000000195829 > > [ 754.253200] skb headroom: 00000000: ef be ad de ef be ad de ef be ad de ef be ad de > > [ 754.261751] skb headroom: 00000010: ef be ad de ef be ad de ef be ad de ef be ad de > > [ 754.269444] skb headroom: 00000020: ef be ad de ef be ad de ef be ad de ef be ad de > > [ 754.277135] skb headroom: 00000030: ef be ad de ef be ad de ef be ad de ef be ad de > > [ 754.284826] skb headroom: 00000040: 88 80 00 0a 00 3e 6b e3 36 a1 01 80 00 00 00 0f > > [ 754.292516] skb headroom: 00000050: 00 10 00 00 d2 ee 27 92 2d 6c 6a b6 a6 22 19 47 > > [ 754.300207] skb headroom: 00000060: 08 00 45 00 00 54 2d 7c 00 00 40 01 03 d9 c0 a8 > > [ 754.307897] skb headroom: 00000070: 64 02 c0 a8 64 01 > > [ 754.312971] skb linear: 00000000: 00 00 60 4d 03 af 00 01 77 eb a8 61 00 00 00 00 > > [ 754.320662] skb linear: 00000010: b6 e2 06 00 00 00 00 00 10 11 12 13 14 15 16 17 > > [ 754.328352] skb linear: 00000020: 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 > > [ 754.336042] skb linear: 00000030: 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37 > > [ 754.343732] skb tailroom: 00000000: 00 00 00 00 00 00 00 00 00 00 ef be ad de ef be > > [ 754.351423] skb tailroom: 00000010: ad de ef be ad de ef be ad de ef be ad de ef be > > (irrelevant tailroom trimmed) > > [ 755.088130] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc3-next-20211202-07010-ga9b9500ffaac-dirty #1531 > > [ 755.098169] Hardware name: LS1028A RDB Board (DT) > > [ 755.102885] Call trace: > > [ 755.105333] dump_backtrace+0x0/0x1ac > > [ 755.109016] show_stack+0x18/0x70 > > [ 755.112341] dump_stack_lvl+0x68/0x84 > > [ 755.116022] dump_stack+0x18/0x34 > > [ 755.119345] netdev_rx_csum_fault+0x60/0x64 > > [ 755.123549] __skb_checksum_complete+0x104/0x10c > > [ 755.128180] icmp_rcv+0x9c/0x3f0 > > [ 755.131421] ip_protocol_deliver_rcu+0x40/0x220 > > [ 755.135965] ip_local_deliver_finish+0x68/0x84 > > [ 755.140421] ip_local_deliver+0x7c/0x120 > > [ 755.144353] ip_sublist_rcv_finish+0x48/0x70 > > [ 755.148643] ip_sublist_rcv+0x168/0x1f0 > > [ 755.152489] ip_list_rcv+0xf8/0x1a0 > > [ 755.155985] __netif_receive_skb_list_core+0x184/0x214 > > [ 755.161142] netif_receive_skb_list_internal+0x180/0x29c > > [ 755.166471] napi_complete_done+0x68/0x1bc > > [ 755.170581] gro_cell_poll+0x80/0xa0 > > [ 755.174176] __napi_poll+0x38/0x184 > > [ 755.177674] net_rx_action+0xe8/0x280 > > [ 755.181347] __do_softirq+0x124/0x2a0 > > [ 755.185019] __irq_exit_rcu+0xe4/0x100 > > [ 755.188782] irq_exit_rcu+0x10/0x1c > > [ 755.192278] el1_interrupt+0x38/0x84 > > [ 755.195864] el1h_64_irq_handler+0x18/0x24 > > [ 755.199972] el1h_64_irq+0x78/0x7c > > [ 755.203380] cpuidle_enter_state+0x12c/0x2f0 > > [ 755.207671] cpuidle_enter+0x38/0x50 > > [ 755.211256] do_idle+0x214/0x29c > > [ 755.214495] cpu_startup_entry+0x24/0x80 > > [ 755.218428] rest_init+0xe4/0xf4 > > [ 755.221664] arch_call_rest_init+0x10/0x1c > > [ 755.225778] start_kernel+0x628/0x668 > > [ 755.229450] __primary_switched+0xc0/0xc8 > > > > > > There seems to be a disparity when the skb->csum is calculated by > > > > skb_postpull_rcsum as zero. Before, it was calculated as 0xffff. > > > > > > skb->csum is 32bit, so there are about 2^16 different values for a > > > given Internet checksum > > > > I meant 0xffffffff, sorry. It is visible in the skb_dump output that it > > was 0xffffffff before and now it is 0. > > > > > > Do you have some suggestions as to what may be wrong? Thanks. > > > > > > What kind of traffic is triggering the fault ? TCP, UDP, something else ? > > > > The simplest to reproduce would be for ICMP. I'm pretty sure I had a > > stack trace with TCP as well, but I don't seem to be able to reproduce > > that right now. > > > > > Do you have a stack trace to provide, because it is not clear from > > > where the issue is detected. > > Thanks Vladimir > > I think that maybe the issue is that the initial skb->csum is zero, > and the csum_parttial(removed_block) is also zero. > > But the initial skb->csum should not be zero if you have a non " all > zero" frame. > > Can you double check this in drivers/net/ethernet/freescale/enetc/enetc.c ? To me it looks like the strange part is that the checksum of the removed block (printed by me as "csum_partial(start, len, 0)" inside skb_postpull_rcsum()) is the same as the skb->csum itself. [ 66.287583] fsl_enetc 0000:00:00.2 eno2: enetc_get_offloads 991: skb 0xffff4050c3671f00 csum 0x3c1d [ 66.296716] skb csum of 20 bytes (20 to the left of skb->data) using old method: 0x0, new method: 0xffffffff, orig csum 0x3c1d, csum of removed block 0x3c1d [ 66.310786] skb len=84 headroom=98 headlen=84 tailroom=1546 [ 66.310786] mac=(84,-6) net=(78,0) trans=78 [ 66.310786] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) [ 66.310786] csum(0xffffffff ip_summed=2 complete_sw=0 valid=0 level=0) [ 66.310786] hash(0x0 sw=0 l4=0) proto=0x00f8 pkttype=3 iif=7 [ 66.338997] dev name=eno2 feat=0x0x00020100001149a9 [ 66.343904] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 66.351600] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 66.359295] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 66.366990] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 66.374685] skb headroom: 00000040: 88 80 00 0a 00 38 c1 17 3d 01 01 80 00 00 08 0f [ 66.382379] skb headroom: 00000050: 00 10 00 00 52 f3 98 af f9 8c d2 ee 27 92 2d 6c [ 66.390073] skb headroom: 00000060: 08 00 [ 66.394105] skb linear: 00000000: 45 00 00 54 c8 59 40 00 40 01 28 fb c0 a8 64 01 [ 66.401799] skb linear: 00000010: c0 a8 64 02 08 00 ee 94 06 98 00 04 03 2e a9 61 [ 66.409493] skb linear: 00000020: 00 00 00 00 8b 6c 0c 00 00 00 00 00 10 11 12 13 [ 66.417187] skb linear: 00000030: 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 [ 66.424880] skb linear: 00000040: 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 [ 66.432574] skb linear: 00000050: 34 35 36 37 [ 66.437128] skb tailroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.181735] fsl_enetc 0000:00:00.2 eno2: ocelot_rcv 131: skb 0xffff4050c3671f00 csum before skb_postpull_rcsum 0x3c1d, after 0xffffffff This isn't the case for some other traffic streams, here is some iperf3 TCP: [ 52.336847] fsl_enetc 0000:00:00.2 eno2: enetc_get_offloads 991: skb 0xffff4050c44ab500 csum 0x18ce [ 52.345930] fsl_enetc 0000:00:00.2 eno2: enetc_get_offloads 991: skb 0xffff4050c3671f00 csum 0x18ce [ 52.355014] fsl_enetc 0000:00:00.2 eno2: enetc_get_offloads 991: skb 0xffff4050c3671e00 csum 0x18ce [ 52.397629] fsl_enetc 0000:00:00.2 eno2: ocelot_rcv 131: skb 0xffff4050c44ab500 csum before skb_postpull_rcsum 0x18ce, after 0xffffdcb0 [ 52.409853] fsl_enetc 0000:00:00.2 eno2: ocelot_rcv 131: skb 0xffff4050c3671f00 csum before skb_postpull_rcsum 0x18ce, after 0xffffdcb0 [ 52.422076] fsl_enetc 0000:00:00.2 eno2: ocelot_rcv 131: skb 0xffff4050c3671e00 csum before skb_postpull_rcsum 0x18ce, after 0xffffdcb0 When skb->csum isn't equal to the csum of the removed block, the two implementations of skb_postpull_rcsum() agree.
On Thu, Dec 02, 2021 at 10:40:36PM +0200, Vladimir Oltean wrote: > To me it looks like the strange part is that the checksum of the removed > block (printed by me as "csum_partial(start, len, 0)" inside > skb_postpull_rcsum()) is the same as the skb->csum itself. > > [ 66.287583] fsl_enetc 0000:00:00.2 eno2: enetc_get_offloads 991: skb 0xffff4050c3671f00 csum 0x3c1d > [ 66.296716] skb csum of 20 bytes (20 to the left of skb->data) using old method: 0x0, new method: 0xffffffff, orig csum 0x3c1d, csum of removed block 0x3c1d sorry, this line is confusing, what is printed as the "old method" is in fact the "new method" and viceversa. I tried to rename them from something clearer than "method 1" and "method 2" and failed. > [ 66.310786] skb len=84 headroom=98 headlen=84 tailroom=1546 > [ 66.310786] mac=(84,-6) net=(78,0) trans=78 > [ 66.310786] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > [ 66.310786] csum(0xffffffff ip_summed=2 complete_sw=0 valid=0 level=0) > [ 66.310786] hash(0x0 sw=0 l4=0) proto=0x00f8 pkttype=3 iif=7 > [ 66.338997] dev name=eno2 feat=0x0x00020100001149a9 > [ 66.343904] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > [ 66.351600] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > [ 66.359295] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > [ 66.366990] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > [ 66.374685] skb headroom: 00000040: 88 80 00 0a 00 38 c1 17 3d 01 01 80 00 00 08 0f > [ 66.382379] skb headroom: 00000050: 00 10 00 00 52 f3 98 af f9 8c d2 ee 27 92 2d 6c > [ 66.390073] skb headroom: 00000060: 08 00 > [ 66.394105] skb linear: 00000000: 45 00 00 54 c8 59 40 00 40 01 28 fb c0 a8 64 01 > [ 66.401799] skb linear: 00000010: c0 a8 64 02 08 00 ee 94 06 98 00 04 03 2e a9 61 > [ 66.409493] skb linear: 00000020: 00 00 00 00 8b 6c 0c 00 00 00 00 00 10 11 12 13 > [ 66.417187] skb linear: 00000030: 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 > [ 66.424880] skb linear: 00000040: 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 > [ 66.432574] skb linear: 00000050: 34 35 36 37 > [ 66.437128] skb tailroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > [ 67.181735] fsl_enetc 0000:00:00.2 eno2: ocelot_rcv 131: skb 0xffff4050c3671f00 csum before skb_postpull_rcsum 0x3c1d, after 0xffffffff
> To me it looks like the strange part is that the checksum of the removed > block (printed by me as "csum_partial(start, len, 0)" inside > skb_postpull_rcsum()) is the same as the skb->csum itself. If you are removing all the bytes that made the original checksum that will happen. And that might be true for the packets you are building. Try replacing both ~ with -. So replace: skb->csum = ~csum_partial(start, len, ~skb->csum); with: skb->csum = -csum_partial(start, len, -skb->csum); That should geneate ~0u instead 0 (if I've got my maths right). David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Thu, Dec 02, 2021 at 12:37:35PM -0800, Eric Dumazet wrote: > On Thu, Dec 2, 2021 at 11:32 AM Eric Dumazet <edumazet@google.com> wrote: > > > > Thanks Vladimir > > > > I think that maybe the issue is that the initial skb->csum is zero, > > and the csum_parttial(removed_block) is also zero. > > > > But the initial skb->csum should not be zero if you have a non " all > > zero" frame. > > > > Can you double check this in drivers/net/ethernet/freescale/enetc/enetc.c ? > > Yes, I am not sure why the csum is inverted in enetc_get_offloads() I guess it's inverted because the hardware doesn't provide its one's complement. > Perhaps > > diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c > b/drivers/net/ethernet/freescale/enetc/enetc.c > index 504e12554079e306e477b9619f272d6e96527377..72524f14cae0093763f8a3f57b1e08e31bc4df1a > 100644 > --- a/drivers/net/ethernet/freescale/enetc/enetc.c > +++ b/drivers/net/ethernet/freescale/enetc/enetc.c > @@ -986,7 +986,7 @@ static void enetc_get_offloads(struct enetc_bdr *rx_ring, > if (rx_ring->ndev->features & NETIF_F_RXCSUM) { > u16 inet_csum = le16_to_cpu(rxbd->r.inet_csum); > > - skb->csum = csum_unfold((__force __sum16)~htons(inet_csum)); > + skb->csum = csum_unfold((__force __sum16)htons(inet_csum)); > skb->ip_summed = CHECKSUM_COMPLETE; > } > > If this does not work, then maybe : > > diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c > b/drivers/net/ethernet/freescale/enetc/enetc.c > index 504e12554079e306e477b9619f272d6e96527377..d190faa9a8242f9f3f962dd30b9f4409a83ee697 > 100644 > --- a/drivers/net/ethernet/freescale/enetc/enetc.c > +++ b/drivers/net/ethernet/freescale/enetc/enetc.c > @@ -987,7 +987,8 @@ static void enetc_get_offloads(struct enetc_bdr *rx_ring, > u16 inet_csum = le16_to_cpu(rxbd->r.inet_csum); > > skb->csum = csum_unfold((__force __sum16)~htons(inet_csum)); > - skb->ip_summed = CHECKSUM_COMPLETE; > + if (likely(skb->csum)) > + skb->ip_summed = CHECKSUM_COMPLETE; > } > > if (le16_to_cpu(rxbd->r.flags) & ENETC_RXBD_FLAG_VLAN) { I guess you aren't interested any longer in the result of these changes, since the csum isn't zero in enetc?
On Thu, Dec 02, 2021 at 08:58:46PM +0000, David Laight wrote: > > To me it looks like the strange part is that the checksum of the removed > > block (printed by me as "csum_partial(start, len, 0)" inside > > skb_postpull_rcsum()) is the same as the skb->csum itself. > > If you are removing all the bytes that made the original checksum > that will happen. > And that might be true for the packets you are building. Yes, I am not removing all the bytes that made up the original L2 payload csum. Let me pull up the skb_dump from my original message: here is where the enetc saw the the "start" variable (old skb->data) beginning of the frame points here v v skb headroom: 00000040: 88 80 00 0a 00 33 9d 40 f0 41 01 80 00 00 08 0f OCELOT_TAG_LEN bytes into the frame, the real MAC header can be found v skb headroom: 00000050: 00 10 00 00 00 04 9f 05 f6 28 ba ae e4 b6 2c 3d skb headroom: 00000060: 08 00 skb linear: 00000000: 45 00 00 54 27 ac 00 00 40 01 09 a8 c0 a8 64 03 ^ the skb_postpull_rcsum is called from "start" pointer until the first byte prior to this one skb linear: 00000010: c0 a8 64 01 00 00 10 e6 01 5c 00 04 49 30 a7 61 skb linear: 00000020: 00 00 00 00 3d 55 01 00 00 00 00 00 10 11 12 13 skb linear: 00000030: 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 skb linear: 00000040: 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 skb linear: 00000050: 34 35 36 37 So skb_postpull_rcsum() is called from "skb headroom" offset 0x4e to offset 0x61 inclusive (0x61 - 0x4e + 1 = 20 == OCELOT_TAG_LEN). However as I understand it, the CHECKSUM_COMPLETE of this packet is calculated by enetc from "skb headroom" offset 0x4e and all the way until "skb linear" offset 0x53. So there is still a good chunk of packet to go. That's why it is still a mystery to me why the checksums would be equal. They still are, with your change suggested below, of course, but at least there is no splat now. > > Try replacing both ~ with -. > So replace: > skb->csum = ~csum_partial(start, len, ~skb->csum); > with: > skb->csum = -csum_partial(start, len, -skb->csum); > > That should geneate ~0u instead 0 (if I've got my maths right). Indeed, replacing both one's complement operations with two's complement seems to produce correct results (consistent with old code) in all cases that I am testing with (ICMP, TCP, UDP). Thanks! > > David > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales) >
From: Vladimir Oltean > Sent: 02 December 2021 21:40 > > On Thu, Dec 02, 2021 at 08:58:46PM +0000, David Laight wrote: > > > To me it looks like the strange part is that the checksum of the removed > > > block (printed by me as "csum_partial(start, len, 0)" inside > > > skb_postpull_rcsum()) is the same as the skb->csum itself. > > > > If you are removing all the bytes that made the original checksum > > that will happen. > > And that might be true for the packets you are building. > > Yes, I am not removing all the bytes that made up the original L2 > payload csum. Let me pull up the skb_dump from my original message: > > here is where the enetc saw the the "start" variable (old skb->data) > beginning of the frame points here > v v > skb headroom: 00000040: 88 80 00 0a 00 33 9d 40 f0 41 01 80 00 00 08 0f > > OCELOT_TAG_LEN bytes into the frame, > the real MAC header can be found > v > skb headroom: 00000050: 00 10 00 00 00 04 9f 05 f6 28 ba ae e4 b6 2c 3d > skb headroom: 00000060: 08 00 > skb linear: 00000000: 45 00 00 54 27 ac 00 00 40 01 09 a8 c0 a8 64 03 > ^ > the skb_postpull_rcsum is called from "start" > pointer until the first byte prior to this one > > skb linear: 00000010: c0 a8 64 01 00 00 10 e6 01 5c 00 04 49 30 a7 61 > skb linear: 00000020: 00 00 00 00 3d 55 01 00 00 00 00 00 10 11 12 13 > skb linear: 00000030: 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 > skb linear: 00000040: 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 > skb linear: 00000050: 34 35 36 37 > > So skb_postpull_rcsum() is called from "skb headroom" offset 0x4e to > offset 0x61 inclusive (0x61 - 0x4e + 1 = 20 == OCELOT_TAG_LEN). > > However as I understand it, the CHECKSUM_COMPLETE of this packet is > calculated by enetc from "skb headroom" offset 0x4e and all the way > until "skb linear" offset 0x53. So there is still a good chunk of packet > to go. That's why it is still a mystery to me why the checksums would be > equal ... Possibly because the rest of the packet actually has a valid checksum (ie 0xffff) that (somewhere) got reduced to 16 bits. If the checksum of the header were then added, and later removed you'd end up inverting ~0u. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
From: Vladimir Oltean > Sent: 02 December 2021 21:40 ... > > > > Try replacing both ~ with -. > > So replace: > > skb->csum = ~csum_partial(start, len, ~skb->csum); > > with: > > skb->csum = -csum_partial(start, len, -skb->csum); > > > > That should geneate ~0u instead 0 (if I've got my maths right). > > Indeed, replacing both one's complement operations with two's complement > seems to produce correct results (consistent with old code) in all cases > that I am testing with (ICMP, TCP, UDP). Thanks! You need to generate (or persuade Eric to generate) a patch. I don't have the right source tree. Any code that does ~csum_partial() is 'dubious' unless followed by a check for 0. The two's compliment negate save the conditional - provided the offset of 1 can be added in earlier. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Fri, Dec 03, 2021 at 02:57:04PM +0000, David Laight wrote: > From: Vladimir Oltean > > Sent: 02 December 2021 21:40 > ... > > > > > > Try replacing both ~ with -. > > > So replace: > > > skb->csum = ~csum_partial(start, len, ~skb->csum); > > > with: > > > skb->csum = -csum_partial(start, len, -skb->csum); > > > > > > That should geneate ~0u instead 0 (if I've got my maths right). > > > > Indeed, replacing both one's complement operations with two's complement > > seems to produce correct results (consistent with old code) in all cases > > that I am testing with (ICMP, TCP, UDP). Thanks! > > You need to generate (or persuade Eric to generate) a patch. > I don't have the right source tree. > > Any code that does ~csum_partial() is 'dubious' unless followed > by a check for 0. > The two's compliment negate save the conditional - provided the > offset of 1 can be added in earlier. > > David > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales) > Eric, could you please send a patch with this change? If you want and if it helps, I can also help you reproduce this locally using the dsa_loop mockup driver.
On Fri, Dec 3, 2021 at 8:14 AM Vladimir Oltean <olteanv@gmail.com> wrote: > > On Fri, Dec 03, 2021 at 02:57:04PM +0000, David Laight wrote: > > From: Vladimir Oltean > > > Sent: 02 December 2021 21:40 > > ... > > > > > > > > Try replacing both ~ with -. > > > > So replace: > > > > skb->csum = ~csum_partial(start, len, ~skb->csum); > > > > with: > > > > skb->csum = -csum_partial(start, len, -skb->csum); > > > > > > > > That should geneate ~0u instead 0 (if I've got my maths right). > > > > > > Indeed, replacing both one's complement operations with two's complement > > > seems to produce correct results (consistent with old code) in all cases > > > that I am testing with (ICMP, TCP, UDP). Thanks! > > > > You need to generate (or persuade Eric to generate) a patch. > > I don't have the right source tree. > > > > Any code that does ~csum_partial() is 'dubious' unless followed > > by a check for 0. > > The two's compliment negate save the conditional - provided the > > offset of 1 can be added in earlier. > > > > David > > > > - > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > > Registration No: 1397386 (Wales) > > > > Eric, could you please send a patch with this change? Sure, I will do this today, after more testing. > If you want and if it helps, I can also help you reproduce this locally > using the dsa_loop mockup driver. No need, thanks !
> > Eric, could you please send a patch with this change? > > Sure, I will do this today, after more testing. I've just done a quick grep and found two ~csum_partial() in include/net/seg6.h. Both are wrong (and completely horrid). There are also 40 csum_partial(buf, len, 0). If all the buffer is zero they'll return zero - invalid. They ought to be changed to csum_partial(buf, len, 0xffff). David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Fri, Dec 3, 2021 at 8:47 AM David Laight <David.Laight@aculab.com> wrote: > > > > Eric, could you please send a patch with this change? > > > > Sure, I will do this today, after more testing. > > I've just done a quick grep and found two ~csum_partial() in > include/net/seg6.h. This is what I already mentioned in this email thread, and the reason I have CCed David Lebrun. https://marc.info/?l=linux-netdev&m=163845851801840&w=2 David, can you comment on this ? > Both are wrong (and completely horrid). > > There are also 40 csum_partial(buf, len, 0). > If all the buffer is zero they'll return zero - invalid. > They ought to be changed to csum_partial(buf, len, 0xffff). > Please point where all zero buffers can be valid in the first place.
... > > There are also 40 csum_partial(buf, len, 0). > > If all the buffer is zero they'll return zero - invalid. > > They ought to be changed to csum_partial(buf, len, 0xffff). > > > > Please point where all zero buffers can be valid in the first place. They probably can't be. But I bet a lot of the code was written without realising it mattered. A quick scan does imply that they are all probably ok. But I have spotted 18 ~csum_fold() - they may be dubious. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index eba256af64a577b458998845f2dc01a5ec80745a..eae4bd3237a41cc1b60b44c91fbfe21dfdd8f117 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3485,7 +3485,11 @@ __skb_postpull_rcsum(struct sk_buff *skb, const void *start, unsigned int len, static inline void skb_postpull_rcsum(struct sk_buff *skb, const void *start, unsigned int len) { - __skb_postpull_rcsum(skb, start, len, 0); + if (skb->ip_summed == CHECKSUM_COMPLETE) + skb->csum = ~csum_partial(start, len, ~skb->csum); + else if (skb->ip_summed == CHECKSUM_PARTIAL && + skb_checksum_start_offset(skb) < 0) + skb->ip_summed = CHECKSUM_NONE; } static __always_inline void