diff mbox series

[net-next,v7,2/3] net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment

Message ID 20240412155533.115507-3-richardbgobert@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net: gro: move p->{flush/flush_id} calculations to L4 | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/apply fail Patch does not apply to net-next-1

Commit Message

Richard Gobert April 12, 2024, 3:55 p.m. UTC
{inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
iph->id, ...) against all packets in a loop. These flush checks are used
currently in all tcp flows and in some UDP flows in GRO.

These checks need to be done only once and only against the found p skb,
since they only affect flush and not same_flow.

Leveraging the previous commit in the series, in which correct network
header offsets are saved for both outer and inner network headers -
allowing these checks to be done only once, in tcp_gro_receive and
udp_gro_receive_segment. As a result, NAPI_GRO_CB(p)->flush is not used at
all. In addition, flush_id checks are more declarative and contained in
inet_gro_flush, thus removing the need for flush_id in napi_gro_cb.

This results in less parsing code for UDP flows and non-loop flush tests
for TCP flows.

To make sure results are not within noise range - I've made netfilter drop
all TCP packets, and measured CPU performance in GRO (in this case GRO is
responsible for about 50% of the CPU utilization).

L3 flush/flush_id checks are not relevant to UDP connections where
skb_gro_receive_list is called. The only code change relevant to this flow
is inet_gro_receive. The rest of the code parsing this flow stays the
same.

All concurrent connections tested are with the same ip srcaddr and
dstaddr.

perf top while replaying 64 concurrent IP/UDP connections (UDP fwd flow):
net-next:
        3.03%  [kernel]  [k] inet_gro_receive

patch applied:
        2.78%  [kernel]  [k] inet_gro_receive

perf top while replaying encapsulated load - 64 concurrent IP/IP/UDP
connections (rx-gro-list and rx-udp-gro-forwarding are enabled):
net-next:
        10.50%  [kernel]  [k] inet_gro_receive

patch applied:
        8.19%  [kernel]  [k] inet_gro_receive

perf top while replaying 64 parallel IP/TCP streams merging in GRO:
(gro_network_flush is compiled inline to tcp_gro_receive)
net-next:
        6.94% [kernel] [k] inet_gro_receive
        3.02% [kernel] [k] tcp_gro_receive

patch applied:
        4.27% [kernel] [k] tcp_gro_receive
        4.22% [kernel] [k] inet_gro_receive

perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same
results for any encapsulation, in this case inet_gro_receive is top
offender in net-next)
net-next:
        10.09% [kernel] [k] inet_gro_receive
        2.08% [kernel] [k] tcp_gro_receive

patch applied:
        6.97% [kernel] [k] inet_gro_receive
        3.68% [kernel] [k] tcp_gro_receive

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
---
 include/net/gro.h      | 66 ++++++++++++++++++++++++++++++++++++++----
 net/core/gro.c         |  4 ---
 net/ipv4/af_inet.c     | 41 +-------------------------
 net/ipv4/tcp_offload.c | 15 ++--------
 net/ipv4/udp_offload.c | 16 +++-------
 net/ipv6/ip6_offload.c | 11 -------
 6 files changed, 67 insertions(+), 86 deletions(-)

Comments

Paolo Abeni April 16, 2024, 9:21 a.m. UTC | #1
On Fri, 2024-04-12 at 17:55 +0200, Richard Gobert wrote:
> {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
> iph->id, ...) against all packets in a loop. These flush checks are used
> currently in all tcp flows and in some UDP flows in GRO.
> 
> These checks need to be done only once and only against the found p skb,
> since they only affect flush and not same_flow.
> 
> Leveraging the previous commit in the series, in which correct network
> header offsets are saved for both outer and inner network headers -
> allowing these checks to be done only once, in tcp_gro_receive and
> udp_gro_receive_segment. As a result, NAPI_GRO_CB(p)->flush is not used at
> all. In addition, flush_id checks are more declarative and contained in
> inet_gro_flush, thus removing the need for flush_id in napi_gro_cb.
> 
> This results in less parsing code for UDP flows and non-loop flush tests
> for TCP flows.
> 
> To make sure results are not within noise range - I've made netfilter drop
> all TCP packets, and measured CPU performance in GRO (in this case GRO is
> responsible for about 50% of the CPU utilization).
> 
> L3 flush/flush_id checks are not relevant to UDP connections where
> skb_gro_receive_list is called. The only code change relevant to this flow
> is inet_gro_receive. The rest of the code parsing this flow stays the
> same.
> 
> All concurrent connections tested are with the same ip srcaddr and
> dstaddr.
> 
> perf top while replaying 64 concurrent IP/UDP connections (UDP fwd flow):
> net-next:
>         3.03%  [kernel]  [k] inet_gro_receive
> 
> patch applied:
>         2.78%  [kernel]  [k] inet_gro_receive

Why there are no figures for
udp_gro_receive_segment()/gro_network_flush() here?

Also you should be able to observer a very high amount of CPU usage by
GRO even with TCP with very high speed links, keeping the BH/GRO on a
CPU and the user-space/data copy on a different one (or using rx zero
copy).

Thanks,

Paolo
Paolo Abeni April 16, 2024, 9:58 a.m. UTC | #2
On Tue, 2024-04-16 at 11:21 +0200, Paolo Abeni wrote:
> On Fri, 2024-04-12 at 17:55 +0200, Richard Gobert wrote:
> > {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
> > iph->id, ...) against all packets in a loop. These flush checks are used
> > currently in all tcp flows and in some UDP flows in GRO.
> > 
> > These checks need to be done only once and only against the found p skb,
> > since they only affect flush and not same_flow.
> > 
> > Leveraging the previous commit in the series, in which correct network
> > header offsets are saved for both outer and inner network headers -
> > allowing these checks to be done only once, in tcp_gro_receive and
> > udp_gro_receive_segment. As a result, NAPI_GRO_CB(p)->flush is not used at
> > all. In addition, flush_id checks are more declarative and contained in
> > inet_gro_flush, thus removing the need for flush_id in napi_gro_cb.
> > 
> > This results in less parsing code for UDP flows and non-loop flush tests
> > for TCP flows.
> > 
> > To make sure results are not within noise range - I've made netfilter drop
> > all TCP packets, and measured CPU performance in GRO (in this case GRO is
> > responsible for about 50% of the CPU utilization).
> > 
> > L3 flush/flush_id checks are not relevant to UDP connections where
> > skb_gro_receive_list is called. The only code change relevant to this flow
> > is inet_gro_receive. The rest of the code parsing this flow stays the
> > same.
> > 
> > All concurrent connections tested are with the same ip srcaddr and
> > dstaddr.
> > 
> > perf top while replaying 64 concurrent IP/UDP connections (UDP fwd flow):
> > net-next:
> >         3.03%  [kernel]  [k] inet_gro_receive
> > 
> > patch applied:
> >         2.78%  [kernel]  [k] inet_gro_receive
> 
> Why there are no figures for
> udp_gro_receive_segment()/gro_network_flush() here?
> 
> Also you should be able to observer a very high amount of CPU usage by
> GRO even with TCP with very high speed links, keeping the BH/GRO on a
> CPU and the user-space/data copy on a different one (or using rx zero
> copy).

To be more explicit: I think at least the above figures are required, 
and I still fear the real gain in that case would range from zero to
negative. 

If you can't do the TCP part of the testing, please provide at least
the figures for a single UDP flow, that should give more indication WRT
the result we can expect with TCP.

Note that GRO is used mainly by TCP and TCP packets with different
src/dst port will land into different GRO hash buckets, having
different RX hash.

That will happen even for UDP, at least for some (most?) nics include
the UDP ports in the RX hash.

Thanks,

Paolo
Richard Gobert April 18, 2024, 3:05 p.m. UTC | #3
Paolo Abeni wrote:
> On Tue, 2024-04-16 at 11:21 +0200, Paolo Abeni wrote:
>> On Fri, 2024-04-12 at 17:55 +0200, Richard Gobert wrote:
>>> {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
>>> iph->id, ...) against all packets in a loop. These flush checks are used
>>> currently in all tcp flows and in some UDP flows in GRO.
>>>
>>> These checks need to be done only once and only against the found p skb,
>>> since they only affect flush and not same_flow.
>>>
>>> Leveraging the previous commit in the series, in which correct network
>>> header offsets are saved for both outer and inner network headers -
>>> allowing these checks to be done only once, in tcp_gro_receive and
>>> udp_gro_receive_segment. As a result, NAPI_GRO_CB(p)->flush is not used at
>>> all. In addition, flush_id checks are more declarative and contained in
>>> inet_gro_flush, thus removing the need for flush_id in napi_gro_cb.
>>>
>>> This results in less parsing code for UDP flows and non-loop flush tests
>>> for TCP flows.
>>>
>>> To make sure results are not within noise range - I've made netfilter drop
>>> all TCP packets, and measured CPU performance in GRO (in this case GRO is
>>> responsible for about 50% of the CPU utilization).
>>>
>>> L3 flush/flush_id checks are not relevant to UDP connections where
>>> skb_gro_receive_list is called. The only code change relevant to this flow
>>> is inet_gro_receive. The rest of the code parsing this flow stays the
>>> same.
>>>
>>> All concurrent connections tested are with the same ip srcaddr and
>>> dstaddr.
>>>
>>> perf top while replaying 64 concurrent IP/UDP connections (UDP fwd flow):
>>> net-next:
>>>         3.03%  [kernel]  [k] inet_gro_receive
>>>
>>> patch applied:
>>>         2.78%  [kernel]  [k] inet_gro_receive
>>
>> Why there are no figures for
>> udp_gro_receive_segment()/gro_network_flush() here?
>>
>> Also you should be able to observer a very high amount of CPU usage by
>> GRO even with TCP with very high speed links, keeping the BH/GRO on a
>> CPU and the user-space/data copy on a different one (or using rx zero
>> copy).
> 
> To be more explicit: I think at least the above figures are required, 
> and I still fear the real gain in that case would range from zero to
> negative. 
> 

I wrote about it in the commit message in short, sorry if I wasn't clear
enough.

gro_network_flush is compiled in-line to both udp_gro_receive_segment and
tcp_gro_receive. udp_gro_receive_segment is compiled in-line to
udp_gro_receive.

The UDP numbers I posted are not relevant anymore after Willem and
Alexander's thread, after which we understood flush and flush_id should be
calculated for all UDP flows.

I can post new numbers for the UDP fwd path after implementing the correct
change. As for TCP - the numbers I posted stay the same.

You should note there is an increase in CPU utilization in tcp_gro_receive
because of the inline compilation of gro_network_flush. The numbers make
sense and show performance enhancement in the case I showed when both
inet_gro_receive and tcp_gro_receive are accounted for.

> If you can't do the TCP part of the testing, please provide at least
> the figures for a single UDP flow, that should give more indication WRT
> the result we can expect with TCP.
> 
> Note that GRO is used mainly by TCP and TCP packets with different
> src/dst port will land into different GRO hash buckets, having
> different RX hash.
> 
> That will happen even for UDP, at least for some (most?) nics include
> the UDP ports in the RX hash.
> 
> Thanks,
> 
> Paolo
>
diff mbox series

Patch

diff --git a/include/net/gro.h b/include/net/gro.h
index a1cc8e8c2ebd..c1f80f1156d6 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -36,15 +36,15 @@  struct napi_gro_cb {
 	/* This is non-zero if the packet cannot be merged with the new skb. */
 	u16	flush;
 
-	/* Save the IP ID here and check when we get to the transport layer */
-	u16	flush_id;
-
 	/* Number of segments aggregated. */
 	u16	count;
 
 	/* Used in ipv6_gro_receive() and foo-over-udp and esp-in-udp */
 	u16	proto;
 
+	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
+	__wsum	csum;
+
 /* Used in napi_gro_cb::free */
 #define NAPI_GRO_FREE             1
 #define NAPI_GRO_FREE_STOLEN_HEAD 2
@@ -85,9 +85,6 @@  struct napi_gro_cb {
 		u8	is_flist:1;
 	);
 
-	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
-	__wsum	csum;
-
 	/* L3 offsets */
 	union {
 		struct {
@@ -443,6 +440,63 @@  static inline __wsum ip6_gro_compute_pseudo(const struct sk_buff *skb,
 					    skb_gro_len(skb), proto, 0));
 }
 
+static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
+				 struct sk_buff *p, bool outer)
+{
+	const u32 id = ntohl(*(__be32 *)&iph->id);
+	const u32 id2 = ntohl(*(__be32 *)&iph2->id);
+	const u16 flush_id = (id >> 16) - (id2 >> 16);
+	const u16 count = NAPI_GRO_CB(p)->count;
+	const u32 df = id & IP_DF;
+	u32 is_atomic;
+	int flush;
+
+	/* All fields must match except length and checksum. */
+	flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | (df ^ (id2 & IP_DF));
+
+	if (outer && df)
+		return flush;
+
+	/* When we receive our second frame we can make a decision on if we
+	 * continue this flow as an atomic flow with a fixed ID or if we use
+	 * an incrementing ID.
+	 */
+	NAPI_GRO_CB(p)->is_atomic |= (count == 1 && df && flush_id == 0);
+	is_atomic = (df && NAPI_GRO_CB(p)->is_atomic) - 1;
+
+	return flush | (flush_id ^ (count & is_atomic));
+}
+
+static inline int ipv6_gro_flush(const struct ipv6hdr *iph, const struct ipv6hdr *iph2)
+{
+	/* <Version:4><Traffic_Class:8><Flow_Label:20> */
+	__be32 first_word = *(__be32 *)iph ^ *(__be32 *)iph2;
+
+	/* Flush if Traffic Class fields are different. */
+	return !!((first_word & htonl(0x0FF00000)) |
+		(__force __be32)(iph->hop_limit ^ iph2->hop_limit));
+}
+
+static inline int gro_network_flush(const void *th, const void *th2, struct sk_buff *p, int off)
+{
+	const bool encap_mark = NAPI_GRO_CB(p)->encap_mark;
+	int flush = 0;
+	int i;
+
+	for (i = 0; i <= encap_mark; i++) {
+		const u16 diff = off - NAPI_GRO_CB(p)->network_offsets[i];
+		const void *nh = th - diff;
+		const void *nh2 = th2 - diff;
+
+		if (((struct iphdr *)nh)->version == 6)
+			flush |= ipv6_gro_flush(nh, nh2);
+		else
+			flush |= inet_gro_flush(nh, nh2, p, i != encap_mark);
+	}
+
+	return flush;
+}
+
 int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);
 
 /* Pass the currently batched GRO_NORMAL SKBs up to the stack. */
diff --git a/net/core/gro.c b/net/core/gro.c
index b2156e6cc4ad..3bfdfefe4a24 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -89,7 +89,6 @@  void dev_remove_offload(struct packet_offload *po)
 }
 EXPORT_SYMBOL(dev_remove_offload);
 
-
 int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 {
 	struct skb_shared_info *pinfo, *skbinfo = skb_shinfo(skb);
@@ -330,8 +329,6 @@  static void gro_list_prepare(const struct list_head *head,
 	list_for_each_entry(p, head, list) {
 		unsigned long diffs;
 
-		NAPI_GRO_CB(p)->flush = 0;
-
 		if (hash != skb_get_hash_raw(p)) {
 			NAPI_GRO_CB(p)->same_flow = 0;
 			continue;
@@ -471,7 +468,6 @@  static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 					sizeof(u32))); /* Avoid slow unaligned acc */
 	*(u32 *)&NAPI_GRO_CB(skb)->zeroed = 0;
 	NAPI_GRO_CB(skb)->flush = skb_has_frag_list(skb);
-	NAPI_GRO_CB(skb)->is_atomic = 1;
 	NAPI_GRO_CB(skb)->count = 1;
 	if (unlikely(skb_is_gso(skb))) {
 		NAPI_GRO_CB(skb)->count = skb_shinfo(skb)->gso_segs;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 6546bf376b24..af094aecf38c 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1481,7 +1481,6 @@  struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 	struct sk_buff *p;
 	unsigned int hlen;
 	unsigned int off;
-	unsigned int id;
 	int flush = 1;
 	int proto;
 
@@ -1507,13 +1506,10 @@  struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 		goto out;
 
 	NAPI_GRO_CB(skb)->proto = proto;
-	id = ntohl(*(__be32 *)&iph->id);
-	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF));
-	id >>= 16;
+	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (ntohl(*(__be32 *)&iph->id) & ~IP_DF));
 
 	list_for_each_entry(p, head, list) {
 		struct iphdr *iph2;
-		u16 flush_id;
 
 		if (!NAPI_GRO_CB(p)->same_flow)
 			continue;
@@ -1530,43 +1526,8 @@  struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 			NAPI_GRO_CB(p)->same_flow = 0;
 			continue;
 		}
-
-		/* All fields must match except length and checksum. */
-		NAPI_GRO_CB(p)->flush |=
-			(iph->ttl ^ iph2->ttl) |
-			(iph->tos ^ iph2->tos) |
-			((iph->frag_off ^ iph2->frag_off) & htons(IP_DF));
-
-		NAPI_GRO_CB(p)->flush |= flush;
-
-		/* We need to store of the IP ID check to be included later
-		 * when we can verify that this packet does in fact belong
-		 * to a given flow.
-		 */
-		flush_id = (u16)(id - ntohs(iph2->id));
-
-		/* This bit of code makes it much easier for us to identify
-		 * the cases where we are doing atomic vs non-atomic IP ID
-		 * checks.  Specifically an atomic check can return IP ID
-		 * values 0 - 0xFFFF, while a non-atomic check can only
-		 * return 0 or 0xFFFF.
-		 */
-		if (!NAPI_GRO_CB(p)->is_atomic ||
-		    !(iph->frag_off & htons(IP_DF))) {
-			flush_id ^= NAPI_GRO_CB(p)->count;
-			flush_id = flush_id ? 0xFFFF : 0;
-		}
-
-		/* If the previous IP ID value was based on an atomic
-		 * datagram we can overwrite the value and ignore it.
-		 */
-		if (NAPI_GRO_CB(skb)->is_atomic)
-			NAPI_GRO_CB(p)->flush_id = flush_id;
-		else
-			NAPI_GRO_CB(p)->flush_id |= flush_id;
 	}
 
-	NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));
 	NAPI_GRO_CB(skb)->flush |= flush;
 
 	/* Note : No need to call skb_gro_postpull_rcsum() here,
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 7f045b881dd4..1b10ab3b0f6a 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -232,9 +232,7 @@  struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
 	goto out_check_final;
 
 found:
-	/* Include the IP ID check below from the inner most IP hdr */
-	flush = NAPI_GRO_CB(p)->flush;
-	flush |= (__force int)(flags & TCP_FLAG_CWR);
+	flush = (__force int)(flags & TCP_FLAG_CWR);
 	flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
 		  ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
 	flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
@@ -242,16 +240,7 @@  struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
 		flush |= *(u32 *)((u8 *)th + i) ^
 			 *(u32 *)((u8 *)th2 + i);
 
-	/* When we receive our second frame we can made a decision on if we
-	 * continue this flow as an atomic flow with a fixed ID or if we use
-	 * an incrementing ID.
-	 */
-	if (NAPI_GRO_CB(p)->flush_id != 1 ||
-	    NAPI_GRO_CB(p)->count != 1 ||
-	    !NAPI_GRO_CB(p)->is_atomic)
-		flush |= NAPI_GRO_CB(p)->flush_id;
-	else
-		NAPI_GRO_CB(p)->is_atomic = false;
+	flush |= gro_network_flush(th, th2, p, off);
 
 	mss = skb_shinfo(p)->gso_size;
 
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index ad4c88fe7d15..c5a5155904cf 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -466,12 +466,12 @@  static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 					       struct sk_buff *skb)
 {
 	struct udphdr *uh = udp_gro_udphdr(skb);
+	int off = skb_gro_offset(skb);
 	struct sk_buff *pp = NULL;
 	struct udphdr *uh2;
 	struct sk_buff *p;
 	unsigned int ulen;
 	int ret = 0;
-	int flush;
 
 	/* requires non zero csum, for symmetry with GSO */
 	if (!uh->check) {
@@ -529,17 +529,9 @@  static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 				skb_gro_postpull_rcsum(skb, uh,
 						       sizeof(struct udphdr));
 
-				flush = NAPI_GRO_CB(p)->flush;
-
-				if (NAPI_GRO_CB(p)->flush_id != 1 ||
-				    NAPI_GRO_CB(p)->count != 1 ||
-				    !NAPI_GRO_CB(p)->is_atomic)
-					flush |= NAPI_GRO_CB(p)->flush_id;
-				else
-					NAPI_GRO_CB(p)->is_atomic = false;
-
-				if (flush || skb_gro_receive(p, skb))
-					ret = 1;
+				ret = gro_network_flush(uh, uh2, p, off);
+				if (!ret)
+					ret = skb_gro_receive(p, skb);
 			}
 		}
 
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index ba41939537f2..c9a6bc1afc9a 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -288,19 +288,8 @@  INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 				   nlen - sizeof(struct ipv6hdr)))
 				goto not_same_flow;
 		}
-		/* flush if Traffic Class fields are different */
-		NAPI_GRO_CB(p)->flush |= !!((first_word & htonl(0x0FF00000)) |
-			(__force __be32)(iph->hop_limit ^ iph2->hop_limit));
-		NAPI_GRO_CB(p)->flush |= flush;
-
-		/* If the previous IP ID value was based on an atomic
-		 * datagram we can overwrite the value and ignore it.
-		 */
-		if (NAPI_GRO_CB(skb)->is_atomic)
-			NAPI_GRO_CB(p)->flush_id = 0;
 	}
 
-	NAPI_GRO_CB(skb)->is_atomic = true;
 	NAPI_GRO_CB(skb)->flush |= flush;
 
 	skb_gro_postpull_rcsum(skb, iph, nlen);