Message ID | 1ed21e6d-7cbc-43e3-8933-fc40562b70b2@gmail.com (mailing list archive)
---|---
State | New
Series | net: gro: remove network_header use, move p->{flush/flush_id} calculations to L4
Richard Gobert wrote:
> {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
> iph->id, ...) against all packets in a loop. These flush checks are used in
> all merging UDP and TCP flows.
>
> These checks need to be done only once and only against the found p skb,
> since they only affect flush and not same_flow.
>
> This patch leverages correct network header offsets from the cb for both
> outer and inner network headers - allowing these checks to be done only
> once, in tcp_gro_receive and udp_gro_receive_segment. As a result,
> NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are
> more declarative and contained in inet_gro_flush, thus removing the need
> for flush_id in napi_gro_cb.
>
> This results in less parsing code for non-loop flush tests for TCP and UDP
> flows.
>
> To make sure results are not within noise range - I've made netfilter drop
> all TCP packets, and measured CPU performance in GRO (in this case GRO is
> responsible for about 50% of the CPU utilization).
>
> perf top while replaying 64 parallel IP/TCP streams merging in GRO:
> (gro_network_flush is compiled inline to tcp_gro_receive)
> net-next:
>         6.94% [kernel] [k] inet_gro_receive
>         3.02% [kernel] [k] tcp_gro_receive
>
> patch applied:
>         4.27% [kernel] [k] tcp_gro_receive
>         4.22% [kernel] [k] inet_gro_receive
>
> perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same
> results for any encapsulation, in this case inet_gro_receive is top
> offender in net-next)
> net-next:
>         10.09% [kernel] [k] inet_gro_receive
>         2.08% [kernel] [k] tcp_gro_receive
>
> patch applied:
>         6.97% [kernel] [k] inet_gro_receive
>         3.68% [kernel] [k] tcp_gro_receive

Thanks for getting the additional numbers. The savings are not huge.

But +1 on the change also because it simplifies this non-obvious
logic. It makes sense to separate flow matching and flush logic.

Btw please include Alexander Duyck in the Cc: of this series.

> +static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
> +				 struct sk_buff *p, bool outer)
> +{
> +	const u32 id = ntohl(*(__be32 *)&iph->id);
> +	const u32 id2 = ntohl(*(__be32 *)&iph2->id);
> +	const u16 flush_id = (id >> 16) - (id2 >> 16);
> +	const u16 count = NAPI_GRO_CB(p)->count;
> +	const u32 df = id & IP_DF;
> +	u32 is_atomic;
> +	int flush;
> +
> +	/* All fields must match except length and checksum. */
> +	flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | (df ^ (id2 & IP_DF));
> +
> +	if (outer && df)
> +		return flush;

Does the fixed id logic apply equally to inner and outer IPv4?

> +
> +	/* When we receive our second frame we can make a decision on if we
> +	 * continue this flow as an atomic flow with a fixed ID or if we use
> +	 * an incrementing ID.
> +	 */
> +	NAPI_GRO_CB(p)->is_atomic |= (count == 1 && df && flush_id == 0);
> +	is_atomic = (df && NAPI_GRO_CB(p)->is_atomic) - 1;
> +
> +	return flush | (flush_id ^ (count & is_atomic));

This is a good time to consider making this logic more obvious.

First off, the flush check can be part of the outer && df above, as
flush is not modified after.

Subjective, but I find the following more readable, and not worth
saving a few branches.

	if (count == 1 && df && !flush_id)
		NAPI_GRO_CB(p)->is_atomic = true;

	ip_fixedid_matches = NAPI_GRO_CB(p)->is_atomic ^ df;
	ipid_offset_matches = ipid_offset - count;

	return ip_fixedid_matches & ipid_offset_matches;

Have to be a bit careful about types. Have not checked that in detail.

And while nitpicking:
ipid_offset may be a more descriptive variable name than flush_id, and
ip_fixedid than is_atomic. If changing those does not result in a lot
of code churn.

> +}
> +
> +static inline int ipv6_gro_flush(const struct ipv6hdr *iph, const struct ipv6hdr *iph2)
> +{
> +	/* <Version:4><Traffic_Class:8><Flow_Label:20> */
> +	__be32 first_word = *(__be32 *)iph ^ *(__be32 *)iph2;
> +
> +	/* Flush if Traffic Class fields are different. */
> +	return !!((first_word & htonl(0x0FF00000)) |
> +		  (__force __be32)(iph->hop_limit ^ iph2->hop_limit));
> +}
> +
> +static inline int gro_network_flush(const void *th, const void *th2, struct sk_buff *p, int off)
> +{
> +	const bool encap_mark = NAPI_GRO_CB(p)->encap_mark;

Is this correct when udp_gro_complete clears this for tunnels?

> +	int flush = 0;
> +	int i;
> +
> +	for (i = 0; i <= encap_mark; i++) {
> +		const u16 diff = off - NAPI_GRO_CB(p)->network_offsets[i];
> +		const void *nh = th - diff;
> +		const void *nh2 = th2 - diff;
> +
> +		if (((struct iphdr *)nh)->version == 6)
> +			flush |= ipv6_gro_flush(nh, nh2);
> +		else
> +			flush |= inet_gro_flush(nh, nh2, p, i != encap_mark);
> +	}

Maybe slightly better for branch prediction, and more obvious, if
creating a helper function __gro_network_flush and calling

	__gro_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->network_offsets[0])
	if (NAPI_GRO_CB(p)->encap_mark)
		__gro_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->network_offsets[1])

> +
> +	return flush;
> +}
> +
>  int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);
>
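A rough sketch of how the suggested __gro_network_flush split could look
follows; untested, and the outer-header flag plumbing is an assumption,
since the suggestion above only outlines the two call sites:

	static inline int __gro_network_flush(const void *th, const void *th2,
					      struct sk_buff *p, u16 diff, bool outer)
	{
		const void *nh = th - diff;
		const void *nh2 = th2 - diff;

		/* The version nibble sits at the same spot for IPv4 and IPv6. */
		if (((struct iphdr *)nh)->version == 6)
			return ipv6_gro_flush(nh, nh2);
		else
			return inet_gro_flush(nh, nh2, p, outer);
	}

	static inline int gro_network_flush(const void *th, const void *th2,
					    struct sk_buff *p, int off)
	{
		/* For an encapsulated flow, network_offsets[0] is the outer header. */
		int flush = __gro_network_flush(th, th2, p,
						off - NAPI_GRO_CB(p)->network_offsets[0],
						NAPI_GRO_CB(p)->encap_mark);

		if (NAPI_GRO_CB(p)->encap_mark)
			flush |= __gro_network_flush(th, th2, p,
						     off - NAPI_GRO_CB(p)->network_offsets[1],
						     false);
		return flush;
	}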
Willem de Bruijn wrote:
> Richard Gobert wrote:
>> {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
>> iph->id, ...) against all packets in a loop. These flush checks are used in
>> all merging UDP and TCP flows.
>>
>> These checks need to be done only once and only against the found p skb,
>> since they only affect flush and not same_flow.
>>
>> This patch leverages correct network header offsets from the cb for both
>> outer and inner network headers - allowing these checks to be done only
>> once, in tcp_gro_receive and udp_gro_receive_segment. As a result,
>> NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are
>> more declarative and contained in inet_gro_flush, thus removing the need
>> for flush_id in napi_gro_cb.
>>
>> This results in less parsing code for non-loop flush tests for TCP and UDP
>> flows.
>>
>> To make sure results are not within noise range - I've made netfilter drop
>> all TCP packets, and measured CPU performance in GRO (in this case GRO is
>> responsible for about 50% of the CPU utilization).
>>
>> perf top while replaying 64 parallel IP/TCP streams merging in GRO:
>> (gro_network_flush is compiled inline to tcp_gro_receive)
>> net-next:
>>         6.94% [kernel] [k] inet_gro_receive
>>         3.02% [kernel] [k] tcp_gro_receive
>>
>> patch applied:
>>         4.27% [kernel] [k] tcp_gro_receive
>>         4.22% [kernel] [k] inet_gro_receive
>>
>> perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same
>> results for any encapsulation, in this case inet_gro_receive is top
>> offender in net-next)
>> net-next:
>>         10.09% [kernel] [k] inet_gro_receive
>>         2.08% [kernel] [k] tcp_gro_receive
>>
>> patch applied:
>>         6.97% [kernel] [k] inet_gro_receive
>>         3.68% [kernel] [k] tcp_gro_receive
>
> Thanks for getting the additional numbers. The savings are not huge.
>
> But +1 on the change also because it simplifies this non-obvious
> logic. It makes sense to separate flow matching and flush logic.
>
> Btw please include Alexander Duyck in the Cc: of this series.

Thanks, will do that when I re-post.

>> +static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
>> +				 struct sk_buff *p, bool outer)
>> +{
>> +	const u32 id = ntohl(*(__be32 *)&iph->id);
>> +	const u32 id2 = ntohl(*(__be32 *)&iph2->id);
>> +	const u16 flush_id = (id >> 16) - (id2 >> 16);
>> +	const u16 count = NAPI_GRO_CB(p)->count;
>> +	const u32 df = id & IP_DF;
>> +	u32 is_atomic;
>> +	int flush;
>> +
>> +	/* All fields must match except length and checksum. */
>> +	flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | (df ^ (id2 & IP_DF));
>> +
>> +	if (outer && df)
>> +		return flush;
>
> Does the fixed id logic apply equally to inner and outer IPv4?
>

Fixed id logic is not applied equally to inner and outer IPv4:
innermost IDs are checked, but outer IPv4 IDs are not checked at all
if DF is set. This is the current logic in the code, and this patch
preserves it. To my understanding this is explained as intentional by
the original commit author
(20160407223218.11142.26592.stgit@ahduyck-xeon-server).

Alexander - could you maybe elaborate further?

>> +
>> +	/* When we receive our second frame we can make a decision on if we
>> +	 * continue this flow as an atomic flow with a fixed ID or if we use
>> +	 * an incrementing ID.
>> +	 */
>> +	NAPI_GRO_CB(p)->is_atomic |= (count == 1 && df && flush_id == 0);
>> +	is_atomic = (df && NAPI_GRO_CB(p)->is_atomic) - 1;
>> +
>> +	return flush | (flush_id ^ (count & is_atomic));
>
> This is a good time to consider making this logic more obvious.
>
> First off, the flush check can be part of the outer && df above, as
> flush is not modified after.
>
> Subjective, but I find the following more readable, and not worth
> saving a few branches.
>
>	if (count == 1 && df && !flush_id)
>		NAPI_GRO_CB(p)->is_atomic = true;
>
>	ip_fixedid_matches = NAPI_GRO_CB(p)->is_atomic ^ df;
>	ipid_offset_matches = ipid_offset - count;
>
>	return ip_fixedid_matches & ipid_offset_matches;
>
> Have to be a bit careful about types. Have not checked that in detail.
>

ip_fixedid_matches should also account for checking whether flush_id
is 0 or not. If we're going for readability, I'd suggest checking
which check should be done (fixedid or offset) and doing only the
appropriate one:

	if (count == 1 && df && !ipid_offset)
		NAPI_GRO_CB(p)->is_atomic = true;

	if (NAPI_GRO_CB(p)->is_atomic && df)
		return flush | ipid_offset;

	return flush | (ipid_offset ^ count);

> And while nitpicking:
> ipid_offset may be a more descriptive variable name than flush_id, and
> ip_fixedid than is_atomic. If changing those does not result in a lot
> of code churn.
>

I also think is_atomic is not the best name; I'll change both names in
the next patch.

>> +}
>> +
>> +static inline int ipv6_gro_flush(const struct ipv6hdr *iph, const struct ipv6hdr *iph2)
>> +{
>> +	/* <Version:4><Traffic_Class:8><Flow_Label:20> */
>> +	__be32 first_word = *(__be32 *)iph ^ *(__be32 *)iph2;
>> +
>> +	/* Flush if Traffic Class fields are different. */
>> +	return !!((first_word & htonl(0x0FF00000)) |
>> +		  (__force __be32)(iph->hop_limit ^ iph2->hop_limit));
>> +}
>> +
>> +static inline int gro_network_flush(const void *th, const void *th2, struct sk_buff *p, int off)
>> +{
>> +	const bool encap_mark = NAPI_GRO_CB(p)->encap_mark;
>
> Is this correct when udp_gro_complete clears this for tunnels?
>

gro_network_flush is called in the receive flow, so udp_gro_complete
cannot be called after gro_network_flush. I think the function name
should be changed to gro_receive_network_flush.

>> +	int flush = 0;
>> +	int i;
>> +
>> +	for (i = 0; i <= encap_mark; i++) {
>> +		const u16 diff = off - NAPI_GRO_CB(p)->network_offsets[i];
>> +		const void *nh = th - diff;
>> +		const void *nh2 = th2 - diff;
>> +
>> +		if (((struct iphdr *)nh)->version == 6)
>> +			flush |= ipv6_gro_flush(nh, nh2);
>> +		else
>> +			flush |= inet_gro_flush(nh, nh2, p, i != encap_mark);
>> +	}
>
> Maybe slightly better for branch prediction, and more obvious, if
> creating a helper function __gro_network_flush and calling
>
>	__gro_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->network_offsets[0])
>	if (NAPI_GRO_CB(p)->encap_mark)
>		__gro_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->network_offsets[1])
>
>> +
>> +	return flush;
>> +}
>> +
>>  int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);
>>
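For reference, the ipid check being discussed reduces to simple 16-bit
arithmetic: an incrementing-ID flow that has already merged count
segments expects id == id2 + count, so (id - id2) ^ count is zero
exactly on a match, while a fixed-ID flow expects the offset itself to
be zero. A standalone userspace illustration with made-up values (not
kernel code, and ipid_flush is a hypothetical helper name):

	#include <assert.h>
	#include <stdint.h>

	/* Mirrors the merge condition: returns 0 when the new id fits the flow. */
	static uint16_t ipid_flush(uint16_t id, uint16_t id2, uint16_t count,
				   int fixedid)
	{
		uint16_t ipid_offset = id - id2;

		return fixedid ? ipid_offset : (uint16_t)(ipid_offset ^ count);
	}

	int main(void)
	{
		assert(ipid_flush(103, 100, 3, 0) == 0); /* incrementing IDs merge */
		assert(ipid_flush(100, 100, 3, 1) == 0); /* fixed-ID (DF) flow merges */
		assert(ipid_flush(105, 100, 3, 0) != 0); /* ID gap forces a flush */
		return 0;
	}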
diff --git a/include/net/gro.h b/include/net/gro.h
index 1faff23b66e8..0565dd716ab7 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -36,15 +36,15 @@ struct napi_gro_cb {
 	/* This is non-zero if the packet cannot be merged with the new skb. */
 	u16	flush;
 
-	/* Save the IP ID here and check when we get to the transport layer */
-	u16	flush_id;
-
 	/* Number of segments aggregated. */
 	u16	count;
 
 	/* Used in ipv6_gro_receive() and foo-over-udp and esp-in-udp */
 	u16	proto;
 
+	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
+	__wsum	csum;
+
 	/* Used in napi_gro_cb::free */
 #define NAPI_GRO_FREE		  1
 #define NAPI_GRO_FREE_STOLEN_HEAD 2
@@ -85,9 +85,6 @@ struct napi_gro_cb {
 		u8	is_flist:1;
 	);
 
-	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
-	__wsum	csum;
-
 	/* L3 offsets */
 	union {
 		struct {
@@ -442,6 +439,63 @@ static inline __wsum ip6_gro_compute_pseudo(const struct sk_buff *skb,
 					    skb_gro_len(skb), proto, 0));
 }
 
+static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
+				 struct sk_buff *p, bool outer)
+{
+	const u32 id = ntohl(*(__be32 *)&iph->id);
+	const u32 id2 = ntohl(*(__be32 *)&iph2->id);
+	const u16 flush_id = (id >> 16) - (id2 >> 16);
+	const u16 count = NAPI_GRO_CB(p)->count;
+	const u32 df = id & IP_DF;
+	u32 is_atomic;
+	int flush;
+
+	/* All fields must match except length and checksum. */
+	flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | (df ^ (id2 & IP_DF));
+
+	if (outer && df)
+		return flush;
+
+	/* When we receive our second frame we can make a decision on if we
+	 * continue this flow as an atomic flow with a fixed ID or if we use
+	 * an incrementing ID.
+	 */
+	NAPI_GRO_CB(p)->is_atomic |= (count == 1 && df && flush_id == 0);
+	is_atomic = (df && NAPI_GRO_CB(p)->is_atomic) - 1;
+
+	return flush | (flush_id ^ (count & is_atomic));
+}
+
+static inline int ipv6_gro_flush(const struct ipv6hdr *iph, const struct ipv6hdr *iph2)
+{
+	/* <Version:4><Traffic_Class:8><Flow_Label:20> */
+	__be32 first_word = *(__be32 *)iph ^ *(__be32 *)iph2;
+
+	/* Flush if Traffic Class fields are different. */
+	return !!((first_word & htonl(0x0FF00000)) |
+		  (__force __be32)(iph->hop_limit ^ iph2->hop_limit));
+}
+
+static inline int gro_network_flush(const void *th, const void *th2, struct sk_buff *p, int off)
+{
+	const bool encap_mark = NAPI_GRO_CB(p)->encap_mark;
+	int flush = 0;
+	int i;
+
+	for (i = 0; i <= encap_mark; i++) {
+		const u16 diff = off - NAPI_GRO_CB(p)->network_offsets[i];
+		const void *nh = th - diff;
+		const void *nh2 = th2 - diff;
+
+		if (((struct iphdr *)nh)->version == 6)
+			flush |= ipv6_gro_flush(nh, nh2);
+		else
+			flush |= inet_gro_flush(nh, nh2, p, i != encap_mark);
+	}
+
+	return flush;
+}
+
 int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);
 
 /* Pass the currently batched GRO_NORMAL SKBs up to the stack.
  */
diff --git a/net/core/gro.c b/net/core/gro.c
index 99a45a5211c9..3e9422c23bc9 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -331,8 +331,6 @@ static void gro_list_prepare(const struct list_head *head,
 	list_for_each_entry(p, head, list) {
 		unsigned long diffs;
 
-		NAPI_GRO_CB(p)->flush = 0;
-
 		if (hash != skb_get_hash_raw(p)) {
 			NAPI_GRO_CB(p)->same_flow = 0;
 			continue;
 		}
@@ -472,7 +470,6 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 					sizeof(u32))); /* Avoid slow unaligned acc */
 	*(u32 *)&NAPI_GRO_CB(skb)->zeroed = 0;
 	NAPI_GRO_CB(skb)->flush = skb_has_frag_list(skb);
-	NAPI_GRO_CB(skb)->is_atomic = 1;
 	NAPI_GRO_CB(skb)->count = 1;
 	if (unlikely(skb_is_gso(skb))) {
 		NAPI_GRO_CB(skb)->count = skb_shinfo(skb)->gso_segs;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 428196e1541f..44564d009e95 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1482,7 +1482,6 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 	struct sk_buff *p;
 	unsigned int hlen;
 	unsigned int off;
-	unsigned int id;
 	int flush = 1;
 	int proto;
 
@@ -1508,13 +1507,10 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 		goto out;
 
 	NAPI_GRO_CB(skb)->proto = proto;
-	id = ntohl(*(__be32 *)&iph->id);
-	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF));
-	id >>= 16;
+	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (ntohl(*(__be32 *)&iph->id) & ~IP_DF));
 
 	list_for_each_entry(p, head, list) {
 		struct iphdr *iph2;
-		u16 flush_id;
 
 		if (!NAPI_GRO_CB(p)->same_flow)
 			continue;
@@ -1531,43 +1527,8 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 			NAPI_GRO_CB(p)->same_flow = 0;
 			continue;
 		}
-
-		/* All fields must match except length and checksum. */
-		NAPI_GRO_CB(p)->flush |=
-			(iph->ttl ^ iph2->ttl) |
-			(iph->tos ^ iph2->tos) |
-			((iph->frag_off ^ iph2->frag_off) & htons(IP_DF));
-
-		NAPI_GRO_CB(p)->flush |= flush;
-
-		/* We need to store of the IP ID check to be included later
-		 * when we can verify that this packet does in fact belong
-		 * to a given flow.
-		 */
-		flush_id = (u16)(id - ntohs(iph2->id));
-
-		/* This bit of code makes it much easier for us to identify
-		 * the cases where we are doing atomic vs non-atomic IP ID
-		 * checks. Specifically an atomic check can return IP ID
-		 * values 0 - 0xFFFF, while a non-atomic check can only
-		 * return 0 or 0xFFFF.
-		 */
-		if (!NAPI_GRO_CB(p)->is_atomic ||
-		    !(iph->frag_off & htons(IP_DF))) {
-			flush_id ^= NAPI_GRO_CB(p)->count;
-			flush_id = flush_id ? 0xFFFF : 0;
-		}
-
-		/* If the previous IP ID value was based on an atomic
-		 * datagram we can overwrite the value and ignore it.
-		 */
-		if (NAPI_GRO_CB(skb)->is_atomic)
-			NAPI_GRO_CB(p)->flush_id = flush_id;
-		else
-			NAPI_GRO_CB(p)->flush_id |= flush_id;
 	}
 
-	NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));
 	NAPI_GRO_CB(skb)->flush |= flush;
 	NAPI_GRO_CB(skb)->inner_network_offset = off;
 
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index b70ae50e658d..625b4800b3ed 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -232,9 +232,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
 	goto out_check_final;
 
 found:
-	/* Include the IP ID check below from the inner most IP hdr */
-	flush = NAPI_GRO_CB(p)->flush;
-	flush |= (__force int)(flags & TCP_FLAG_CWR);
+	flush = (__force int)(flags & TCP_FLAG_CWR);
 	flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
 		  ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
 	flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
@@ -242,16 +240,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
 		flush |= *(u32 *)((u8 *)th + i) ^
 			 *(u32 *)((u8 *)th2 + i);
 
-	/* When we receive our second frame we can made a decision on if we
-	 * continue this flow as an atomic flow with a fixed ID or if we use
-	 * an incrementing ID.
-	 */
-	if (NAPI_GRO_CB(p)->flush_id != 1 ||
-	    NAPI_GRO_CB(p)->count != 1 ||
-	    !NAPI_GRO_CB(p)->is_atomic)
-		flush |= NAPI_GRO_CB(p)->flush_id;
-	else
-		NAPI_GRO_CB(p)->is_atomic = false;
+	flush |= gro_network_flush(th, th2, p, off);
 
 	mss = skb_shinfo(p)->gso_size;
 
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 8721fe5beca2..5d9696eaab8a 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -466,6 +466,7 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 					       struct sk_buff *skb)
 {
 	struct udphdr *uh = udp_gro_udphdr(skb);
+	int off = skb_gro_offset(skb);
 	struct sk_buff *pp = NULL;
 	struct udphdr *uh2;
 	struct sk_buff *p;
@@ -505,14 +506,7 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 			return p;
 		}
 
-		flush = NAPI_GRO_CB(p)->flush;
-
-		if (NAPI_GRO_CB(p)->flush_id != 1 ||
-		    NAPI_GRO_CB(p)->count != 1 ||
-		    !NAPI_GRO_CB(p)->is_atomic)
-			flush |= NAPI_GRO_CB(p)->flush_id;
-		else
-			NAPI_GRO_CB(p)->is_atomic = false;
+		flush = gro_network_flush(uh, uh2, p, off);
 
 		/* Terminate the flow on len mismatch or if it grow "too much".
 		 * Under small packet flood GRO count could elsewhere grow a lot
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 5d6b875a4638..72991a02cb30 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -290,19 +290,8 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 				 nlen - sizeof(struct ipv6hdr)))
 			goto not_same_flow;
 		}
-		/* flush if Traffic Class fields are different */
-		NAPI_GRO_CB(p)->flush |= !!((first_word & htonl(0x0FF00000)) |
-			(__force __be32)(iph->hop_limit ^ iph2->hop_limit));
-		NAPI_GRO_CB(p)->flush |= flush;
-
-		/* If the previous IP ID value was based on an atomic
-		 * datagram we can overwrite the value and ignore it.
-		 */
-		if (NAPI_GRO_CB(skb)->is_atomic)
-			NAPI_GRO_CB(p)->flush_id = 0;
 	}
 
-	NAPI_GRO_CB(skb)->is_atomic = true;
 	NAPI_GRO_CB(skb)->flush |= flush;
 
 	skb_gro_postpull_rcsum(skb, iph, nlen);
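As an aside on the ipv6_gro_flush hunk above: the htonl(0x0FF00000) mask
picks the 8 Traffic Class bits out of the first word
(<Version:4><Traffic_Class:8><Flow_Label:20>), so flow label differences
are ignored. A standalone sketch with made-up values (not kernel code;
tc_differs is a hypothetical helper, and the real check also compares
hop_limit):

	#include <arpa/inet.h>
	#include <assert.h>
	#include <stdint.h>

	/* Both words are in network byte order, as on the wire. */
	static int tc_differs(uint32_t w1, uint32_t w2)
	{
		return !!((w1 ^ w2) & htonl(0x0FF00000));
	}

	int main(void)
	{
		uint32_t a = htonl(0x600A1234); /* version 6, TC 0x00, label 0xA1234 */
		uint32_t b = htonl(0x60FA1234); /* version 6, TC 0x0F, same label */

		assert(tc_differs(a, b));  /* different Traffic Class => flush */
		assert(!tc_differs(a, a)); /* identical first words => no flush */
		return 0;
	}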
{inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
iph->id, ...) against all packets in a loop. These flush checks are used in
all merging UDP and TCP flows.

These checks need to be done only once and only against the found p skb,
since they only affect flush and not same_flow.

This patch leverages correct network header offsets from the cb for both
outer and inner network headers - allowing these checks to be done only
once, in tcp_gro_receive and udp_gro_receive_segment. As a result,
NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are
more declarative and contained in inet_gro_flush, thus removing the need
for flush_id in napi_gro_cb.

This results in less parsing code for non-loop flush tests for TCP and UDP
flows.

To make sure results are not within noise range - I've made netfilter drop
all TCP packets, and measured CPU performance in GRO (in this case GRO is
responsible for about 50% of the CPU utilization).

perf top while replaying 64 parallel IP/TCP streams merging in GRO:
(gro_network_flush is compiled inline to tcp_gro_receive)
net-next:
        6.94% [kernel] [k] inet_gro_receive
        3.02% [kernel] [k] tcp_gro_receive

patch applied:
        4.27% [kernel] [k] tcp_gro_receive
        4.22% [kernel] [k] inet_gro_receive

perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same
results for any encapsulation, in this case inet_gro_receive is top
offender in net-next)
net-next:
        10.09% [kernel] [k] inet_gro_receive
        2.08% [kernel] [k] tcp_gro_receive

patch applied:
        6.97% [kernel] [k] inet_gro_receive
        3.68% [kernel] [k] tcp_gro_receive

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
---
 include/net/gro.h      | 66 ++++++++++++++++++++++++++++++++++++++----
 net/core/gro.c         |  3 --
 net/ipv4/af_inet.c     | 41 +-------------------------
 net/ipv4/tcp_offload.c | 15 ++--------
 net/ipv4/udp_offload.c | 10 ++-----
 net/ipv6/ip6_offload.c | 11 -------
 6 files changed, 65 insertions(+), 81 deletions(-)
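Putting the review threads together, the next revision's inet_gro_flush
might plausibly look as follows (an untested sketch combining the agreed
renames, ipid_offset for flush_id and ip_fixedid for is_atomic, the
latter a hypothetical napi_gro_cb field name, with the branchier logic
proposed in the discussion; this is not the posted patch):

	static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
					 struct sk_buff *p, bool outer)
	{
		const u32 id = ntohl(*(__be32 *)&iph->id);
		const u32 id2 = ntohl(*(__be32 *)&iph2->id);
		const u16 ipid_offset = (id >> 16) - (id2 >> 16);
		const u16 count = NAPI_GRO_CB(p)->count;
		const u32 df = id & IP_DF;
		int flush;

		/* All fields must match except length and checksum. */
		flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) |
			(df ^ (id2 & IP_DF));

		/* Outer headers with DF set need no IP ID check. */
		if (outer && df)
			return flush;

		/* On the second frame, decide whether this flow uses a fixed ID. */
		if (count == 1 && df && !ipid_offset)
			NAPI_GRO_CB(p)->ip_fixedid = true;

		if (NAPI_GRO_CB(p)->ip_fixedid && df)
			return flush | ipid_offset;

		return flush | (ipid_offset ^ count);
	}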