Message ID | 20240419153542.121087-2-richardbgobert@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | net: gro: add flush/flush_id checks and fix wrong offset in udp |
Richard Gobert wrote:
> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
> makes sure both are set correctly. In the common path there's only one
> write (skb_gro_reset_offset, which replaces skb_set_network_header).
>
> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
> ---
>  drivers/net/geneve.c           |  1 +
>  drivers/net/vxlan/vxlan_core.c |  1 +
>  include/net/gro.h              | 18 ++++++++++++++++--
>  net/8021q/vlan_core.c          |  2 ++
>  net/core/gro.c                 |  1 +
>  net/ethernet/eth.c             |  1 +
>  net/ipv4/af_inet.c             |  5 +----
>  net/ipv4/gre_offload.c         |  1 +
>  net/ipv6/ip6_offload.c         |  8 ++++----
>  9 files changed, 28 insertions(+), 10 deletions(-)
>
> +static inline int skb_gro_network_offset(const struct sk_buff *skb)
> +{
> +	return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
> +}
> +
>
> @@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
>  	if (unlikely(!iph))
>  		goto out;
>
> -	skb_set_network_header(skb, off);
> -

Especially for net, this is still a large patch.

Can we avoid touching all those tunnel callbacks and just set the
offsets in inet_gro_receive and ipv6_gro_receive themselves, just
as skb_set_network_header now:

@@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 	if (unlikely(!iph))
 		goto out;

-	skb_set_network_header(skb, off);
+	NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;

Willem de Bruijn wrote:
> Especially for net, this is still a large patch.
>
> Can we avoid touching all those tunnel callbacks and just set the
> offsets in inet_gro_receive and ipv6_gro_receive themselves, just
> as skb_set_network_header now:
>
> @@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
>  	if (unlikely(!iph))
>  		goto out;
>
> -	skb_set_network_header(skb, off);
> +	NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
>

Thanks for the reply!

Setting network_offset on dev_gro_receive and inner_network_offset only
in the tunnel callbacks is the best option IMO. I agree that
we want a small patch to net that solves the problem, although I
think always using ->encap_mark in the common path is not ideal.

We can avoid changing all the tunnel callbacks by always setting
inner_network_offset in {ipv6,inet}_gro_receive and initialize
network_offset to 0 in dev_gro_receive. It will result in a small
change, without using ->encap_mark.

What are your thoughts?

Richard Gobert wrote:
> Thanks for the reply!
>
> Setting network_offset on dev_gro_receive and inner_network_offset only
> in the tunnel callbacks is the best option IMO. I agree that
> we want a small patch to net that solves the problem, although I
> think always using ->encap_mark in the common path is not ideal.
>
> We can avoid changing all the tunnel callbacks by always setting
> inner_network_offset in {ipv6,inet}_gro_receive and initialize
> network_offset to 0 in dev_gro_receive. It will result in a small
> change, without using ->encap_mark.
>
> What are your thoughts?

That works. It's a bit ugly that inner_network_offset will always be
set, even if a packet only traverses inet_gro_receive once. What is
your concern with testing encap_mark?

How do you want to detect in udp[46]_lib_lookup_skb which of the two
offsets to use? That would still be encap_mark based?

Willem de Bruijn wrote:
> That works. It's a bit ugly that inner_network_offset will always be
> set, even if a packet only traverses inet_gro_receive once. What is
> your concern with testing encap_mark?
>
> How do you want to detect in udp[46]_lib_lookup_skb which of the two
> offsets to use? That would still be encap_mark based?
>

I'd like to minimize any potential overhead, even a small one, and this way
we do not need to access encap_mark at all in the common path.

NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;

compiles to:

movzx   eax, byte ptr [rbx+46h]
shr     al, 1
and     eax, 1
mov     [rbx+rax*2+4Ch], r14w

while

NAPI_GRO_CB(skb)->inner_network_offset = off;

compiles to:

mov     [rbx+4Eh], r14w

I do plan to add a patch to net-next after this to remove the access
entirely from inet gro callbacks, in the meantime, it looks to me like a
reasonable patch and small enough to not raise concerns.

For udp_lib_lookup I don't see a way around it so yes, it would still be
dependent on encap_mark. Since this runs in the complete phase it's less
concerning.

Let me know that you're ok with that and I'll post a v3.

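The two store patterns compared in this exchange can be sketched in user-space C. This is only a simplified model under stated assumptions, not the kernel code: `struct gro_cb_model` and the three helper names are hypothetical, and `encap_mark` is modeled as a plain byte (the `shr`/`and` pair in the disassembly suggests it is a bitfield in the real structure). It shows why the indexed form needs a load of `encap_mark` before the store, while the direct form does not.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct napi_gro_cb. */
struct gro_cb_model {
	uint8_t encap_mark;	/* modeled as a byte; only bit 0 is used here */
	union {
		struct {
			uint16_t network_offset;	/* aliases network_offsets[0] */
			uint16_t inner_network_offset;	/* aliases network_offsets[1] */
		};
		uint16_t network_offsets[2];
	};
};

/* Indexed store: must read encap_mark first to pick the slot. */
static inline void set_offset_indexed(struct gro_cb_model *cb, uint16_t off)
{
	cb->network_offsets[cb->encap_mark & 1] = off;
}

/* Direct store: always writes the inner slot and never reads encap_mark. */
static inline void set_offset_direct(struct gro_cb_model *cb, uint16_t off)
{
	cb->inner_network_offset = off;
}

/* Reader side: encap_mark selects which of the two offsets applies. */
static inline int get_network_offset(const struct gro_cb_model *cb)
{
	return cb->network_offsets[cb->encap_mark & 1];
}
```

Because the anonymous struct and the array share storage, a direct write to `inner_network_offset` is visible through `network_offsets[1]`, so a reader that still indexes by `encap_mark` keeps working with either store pattern.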
Richard Gobert wrote:
> I'd like to minimize any potential overhead, even a small one, and this way
> we do not need to access encap_mark at all in the common path.
>
> NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
>
> compiles to:
>
> movzx   eax, byte ptr [rbx+46h]
> shr     al, 1
> and     eax, 1
> mov     [rbx+rax*2+4Ch], r14w
>
> while
>
> NAPI_GRO_CB(skb)->inner_network_offset = off;
>
> compiles to:
>
> mov     [rbx+4Eh], r14w
>
> I do plan to add a patch to net-next after this to remove the access
> entirely from inet gro callbacks, in the meantime, it looks to me like a
> reasonable patch and small enough to not raise concerns.
>
> For udp_lib_lookup I don't see a way around it so yes, it would still be
> dependent on encap_mark. Since this runs in the complete phase it's less
> concerning.
>
> Let me know that you're ok with that and I'll post a v3.

Yes, looks fine.

Main cost is memory access, and that encap_mark will be accessed soon
after in udp4_lib_lookup. I don't expect two arithmetic instructions to
matter.

But this code does now have one more store: the one in dev_gro_receive.
Either way, in the noise.

Both approaches look fine to me: very concise and essentially equivalent.
Choose your preferred option.

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 6c2835086b57..6549348cc24e 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -542,6 +542,7 @@ static struct sk_buff *geneve_gro_receive(struct sock *sk,
 	if (!ptype)
 		goto out;

+	NAPI_GRO_CB(skb)->inner_network_offset = hlen;
 	pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
 	flush = 0;

diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index ba319fc21957..c649a82eeca7 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -754,6 +754,7 @@ static struct sk_buff *vxlan_gpe_gro_receive(struct sock *sk,

 	vh = vxlan_gro_prepare_receive(sk, head, skb, &grc);
 	if (vh) {
+		NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);
 		if (!vxlan_parse_gpe_proto(vh, &protocol))
 			goto out;
 		ptype = gro_find_receive_by_type(protocol);
diff --git a/include/net/gro.h b/include/net/gro.h
index 50f1e403dbbb..1faff23b66e8 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -87,6 +87,15 @@ struct napi_gro_cb {

 	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
 	__wsum	csum;
+
+	/* L3 offsets */
+	union {
+		struct {
+			u16 network_offset;
+			u16 inner_network_offset;
+		};
+		u16 network_offsets[2];
+	};
 };

 #define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb)
@@ -172,12 +181,17 @@ static inline void *skb_gro_header(struct sk_buff *skb, unsigned int hlen,
 	return ptr;
 }

+static inline int skb_gro_network_offset(const struct sk_buff *skb)
+{
+	return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
+}
+
 static inline void *skb_gro_network_header(const struct sk_buff *skb)
 {
 	if (skb_gro_may_pull(skb, skb_gro_offset(skb)))
-		return skb_gro_header_fast(skb, skb_network_offset(skb));
+		return skb_gro_header_fast(skb, skb_gro_network_offset(skb));

-	return skb_network_header(skb);
+	return skb->data + skb_gro_network_offset(skb);
 }

 static inline __wsum inet_gro_compute_pseudo(const struct sk_buff *skb,
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index f00158234505..9404dd551dfd 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -478,6 +478,8 @@ static struct sk_buff *vlan_gro_receive(struct list_head *head,
 	if (unlikely(!vhdr))
 		goto out;

+	NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = hlen;
+
 	type = vhdr->h_vlan_encapsulated_proto;

 	ptype = gro_find_receive_by_type(type);
diff --git a/net/core/gro.c b/net/core/gro.c
index 83f35d99a682..c7901253a1a8 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -371,6 +371,7 @@ static inline void skb_gro_reset_offset(struct sk_buff *skb, u32 nhoff)
 	const skb_frag_t *frag0;
 	unsigned int headlen;

+	NAPI_GRO_CB(skb)->network_offset = 0;
 	NAPI_GRO_CB(skb)->data_offset = 0;
 	headlen = skb_headlen(skb);
 	NAPI_GRO_CB(skb)->frag0 = skb->data;
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 2edc8b796a4e..ea589e8cde2a 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -441,6 +441,7 @@ struct sk_buff *eth_gro_receive(struct list_head *head, struct sk_buff *skb)

 	skb_gro_pull(skb, sizeof(*eh));
 	skb_gro_postpull_rcsum(skb, eh, sizeof(*eh));
+	NAPI_GRO_CB(skb)->inner_network_offset = hlen;

 	pp = indirect_call_gro_receive_inet(ptype->callbacks.gro_receive,
 					    ipv6_gro_receive, inet_gro_receive,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 55bd72997b31..7899cbd5b263 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1568,10 +1568,6 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)

 	NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));
 	NAPI_GRO_CB(skb)->flush |= flush;
-	skb_set_network_header(skb, off);
-	/* The above will be needed by the transport layer if there is one
-	 * immediately following this IP hdr.
-	 */

 	/* Note : No need to call skb_gro_postpull_rcsum() here,
 	 * as we already checked checksum over ipv4 header was 0
@@ -1597,6 +1593,7 @@ static struct sk_buff *ipip_gro_receive(struct list_head *head,
 	}

 	NAPI_GRO_CB(skb)->encap_mark = 1;
+	NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);

 	return inet_gro_receive(head, skb);
 }
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 5028c72d494a..a1ff2bdf6206 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -224,6 +224,7 @@ static struct sk_buff *gre_gro_receive(struct list_head *head,
 	/* Adjusted NAPI_GRO_CB(skb)->csum after skb_gro_pull()*/
 	skb_gro_postpull_rcsum(skb, greh, grehlen);

+	NAPI_GRO_CB(skb)->inner_network_offset = hlen;
 	pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
 	flush = 0;

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index b41e35af69ea..765797ca729c 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -67,7 +67,7 @@ static int ipv6_gro_pull_exthdrs(struct sk_buff *skb, int off, int proto)
 		off += len;
 	}

-	skb_gro_pull(skb, off - skb_network_offset(skb));
+	skb_gro_pull(skb, off - skb_gro_network_offset(skb));
 	return proto;
 }

@@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 	if (unlikely(!iph))
 		goto out;

-	skb_set_network_header(skb, off);
-
 	flush += ntohs(iph->payload_len) != skb->len - hlen;

 	proto = iph->nexthdr;
@@ -259,7 +257,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 	NAPI_GRO_CB(skb)->proto = proto;

 	flush--;
-	nlen = skb_network_header_len(skb);
+	nlen = skb_gro_offset(skb) - off;

 	list_for_each_entry(p, head, list) {
 		const struct ipv6hdr *iph2;
@@ -327,6 +325,7 @@ static struct sk_buff *sit_ip6ip6_gro_receive(struct list_head *head,
 	}

 	NAPI_GRO_CB(skb)->encap_mark = 1;
+	NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);

 	return ipv6_gro_receive(head, skb);
 }
@@ -342,6 +341,7 @@ static struct sk_buff *ip4ip6_gro_receive(struct list_head *head,
 	}

 	NAPI_GRO_CB(skb)->encap_mark = 1;
+	NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);

 	return inet_gro_receive(head, skb);
 }

This patch adds network_offset and inner_network_offset to napi_gro_cb, and
makes sure both are set correctly. In the common path there's only one
write (skb_gro_reset_offset, which replaces skb_set_network_header).

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
---
 drivers/net/geneve.c           |  1 +
 drivers/net/vxlan/vxlan_core.c |  1 +
 include/net/gro.h              | 18 ++++++++++++++++--
 net/8021q/vlan_core.c          |  2 ++
 net/core/gro.c                 |  1 +
 net/ethernet/eth.c             |  1 +
 net/ipv4/af_inet.c             |  5 +----
 net/ipv4/gre_offload.c         |  1 +
 net/ipv6/ip6_offload.c         |  8 ++++----
 9 files changed, 28 insertions(+), 10 deletions(-)
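
As a rough illustration of the bookkeeping the commit message describes, the following user-space C sketch models how the two offsets might evolve while an IP-in-IP packet is parsed. It is a toy model under stated assumptions, not the kernel implementation: `struct gro_state` and every `model_*` function are hypothetical names, and only the order of operations (reset, pull outer header, tunnel callback records the inner offset) is taken from the patch.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the per-packet GRO state. */
struct gro_state {
	uint16_t network_offset;	/* outer L3 header, written once at reset */
	uint16_t inner_network_offset;	/* inner L3 header, written by tunnel callbacks */
	uint16_t data_offset;		/* current parse position */
	uint8_t encap_mark;		/* 1 once a tunnel header has been seen */
};

/* Model of the reset step: parsing starts at the outer L3 header. */
static inline void model_reset(struct gro_state *st)
{
	st->network_offset = 0;
	st->inner_network_offset = 0;
	st->data_offset = 0;
	st->encap_mark = 0;
}

/* Model of pulling a header of hdr_len bytes off the front. */
static inline void model_pull(struct gro_state *st, uint16_t hdr_len)
{
	st->data_offset += hdr_len;
}

/* Model of a tunnel callback: mark encapsulation and record where the
 * inner L3 header starts, i.e. the current parse position. */
static inline void model_enter_tunnel(struct gro_state *st)
{
	st->encap_mark = 1;
	st->inner_network_offset = st->data_offset;
}

/* Model of the reader: encap_mark picks the offset that applies. */
static inline uint16_t model_network_offset(const struct gro_state *st)
{
	return st->encap_mark ? st->inner_network_offset : st->network_offset;
}
```

In this model a plain packet only ever sees the single write in `model_reset`, while a tunneled packet gets one extra write when the tunnel callback runs.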