Message ID | CAJGXZLgcH6bjmj7YR-hAWpEQW1CPjEcOdMN01hqsVk18E4ScZQ@mail.gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [BUG] In af_packet.c::dev_parse_header() skb.network_header does not point to the network header | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
Aleksey Shumnik wrote: > Dear maintainers, > > I wrote the ip6gre_header_parser() function in ip6_gre.c, the > implementation is similar to ipgre_header_parser() in ip_gre.c. (By > the way, it is strange that this function is not implemented in > ip6_gre.c) > The implementation of the function is presented below. > It should parse the ip6 header and take the source address and its > length from there. To get a pointer to the ip header, it is logical to > use skb_network_header(), but it does not work correctly and returns a > pointer to payload (skb.data). > Also in ip_gre.c::ipgre_header_parser() skb_mac_header() returns a > pointer to the ip header and everything works correctly (although it > seems that this is also an error, because the pointer to the mac > header should have been returned, and logically the > skb_network_header() function should be applied), For a device of type ARPHRD_IPGRE or ARPHRD_IP6GRE there is no other MAC header. This is not ARPHRD_ETHER. The link layer header can be seen by looking for header_ops.create if it exists. This creates the header for packet sockets of type SOCK_DGRAM. > but in ip6_gre.c all > skb_mac_header(), skb_network_header(), skb_tranport_header() returns > a pointer to payload (skb.data). > This function is called when receiving a packet and parsing it in > af_packet.c::packet_rcv() in dev_parse_header(). > The problem is that there is no way to accurately determine the > beginning of the ip header. The issue happens when comparing packet_rcv on an ip_gre tunnel vs an ip6_gre tunnel. The packet_rcv call does the same in both cases, e.g., setting skb->data at mac or network header depending on SOCK_DGRAM or SOCK_RAW. The issue then is likely with a difference in tunnel implementations. Both implement header_ops and header_ops.create (which is used on receive by dev_has_header, but on transmit by dev_hard_header). They return different lengths: one with and one without the IP header. We've seen inconsistency in this before between tunnels. See also commit aab1e898c26c. ipgre_xmit has special logic to optionally pull the headers, but only if header_ops is set, which it isn't for all variants of GRE tunnels. Probably particularly relevant is this section in __ipgre_rcv: /* Special case for ipgre_header_parse(), which expects the * mac_header to point to the outer IP header. */ if (tunnel->dev->header_ops == &ipgre_header_ops) skb_pop_mac_header(skb); else skb_reset_mac_header(skb); and see this comment in the mentioned commit: ipgre_header_parse() seems to be the only case that requires mac_header to point to the outer header. We can detect this case accurately by checking ->header_ops. For all other cases, we can reset mac_header. > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c > index 90565b8..0d0c37b 100644 > --- a/net/ipv6/ip6_gre.c > +++ b/net/ipv6/ip6_gre.c > @@ -1404,8 +1404,16 @@ static int ip6gre_header(struct sk_buff *skb, > struct net_device *dev, > return -t->hlen; > } > > +static int ip6gre_header_parse(const struct sk_buff *skb, unsigned char *haddr) > +{ > + const struct ipv6hdr *ipv6h = (const struct ipv6hdr *) skb_mac_header(skb); > + memcpy(haddr, &ipv6h->saddr, 16); > + return 16; > +} > + > static const struct header_ops ip6gre_header_ops = { > .create = ip6gre_header, > + .parse = ip6gre_header_parse, > }; > > static const struct net_device_ops ip6gre_netdev_ops = { > > Would you answer whether this behavior is an error and why the > behavior in ip_gre.c and ip6_gre.c differs? > > Regards, > Aleksey
On Tue, Apr 11, 2023 at 11:19:53PM -0400, Willem de Bruijn wrote: > Aleksey Shumnik wrote: > > but in ip6_gre.c all > > skb_mac_header(), skb_network_header(), skb_tranport_header() returns > > a pointer to payload (skb.data). > > This function is called when receiving a packet and parsing it in > > af_packet.c::packet_rcv() in dev_parse_header(). > > The problem is that there is no way to accurately determine the > > beginning of the ip header. > > The issue happens when comparing packet_rcv on an ip_gre tunnel vs an > ip6_gre tunnel. > > The packet_rcv call does the same in both cases, e.g., setting > skb->data at mac or network header depending on SOCK_DGRAM or > SOCK_RAW. > > The issue then is likely with a difference in tunnel implementations. > Both implement header_ops and header_ops.create (which is used on > receive by dev_has_header, but on transmit by dev_hard_header). They > return different lengths: one with and one without the IP header. The problem is that, upon reception on an af_packet socket, ip_gre wants to set the outer source IP address in sll->sll_addr. That is, it considers the outer IP header as the mac header of the gre device. As far as I know, ip_gre is the only tunnel that does that. > We've seen inconsistency in this before between tunnels. See also > commit aab1e898c26c. ipgre_xmit has special logic to optionally pull > the headers, but only if header_ops is set, which it isn't for all > variants of GRE tunnels. > > Probably particularly relevant is this section in __ipgre_rcv: > > /* Special case for ipgre_header_parse(), which expects the > * mac_header to point to the outer IP header. > */ > if (tunnel->dev->header_ops == &ipgre_header_ops) > skb_pop_mac_header(skb); > else > skb_reset_mac_header(skb); > > and see this comment in the mentioned commit: > > ipgre_header_parse() seems to be the only case that requires mac_header > to point to the outer header. We can detect this case accurately by > checking ->header_ops. For all other cases, we can reset mac_header. The problem was about unifying the different ip tunnel behaviours, as described in the cover letter of the series (merge commit 8eb517a2a4ae ("Merge branch 'reset-mac'") has all the details). The idea is to make all tunnel devices consistently set ->mac_header and ->network_header to the corresponding inner headers. For tunnels that directly transport network protocols, ->mac_header equals ->network_header (that is, the mac header length is 0). But there's a problem with ip_gre, as it wants to access the outer headers again, even though it has already pulled them. To do that, ip_gre saves the offset of the outer ip header in the ->mac_header, so that ipgre_header_parse() can find it again later. That's why ip_gre can't properly set ->mac_header to the inner mac header offset, as the other tunnels do. I personally find this use of ->mac_header a bit hacky, but it's used to implement a feature that's required for some users (see commit 0e3da5bb8da4 ("ip_gre: fix msg_name parsing for recvfrom/recvmsg")). We could probably store the outer IP header offset elsewhere and reset ->mac_header the way all other tunnels do. But I didn't find a satisfying solution, so I just kept ip_gre as an exception. > > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c > > index 90565b8..0d0c37b 100644 > > --- a/net/ipv6/ip6_gre.c > > +++ b/net/ipv6/ip6_gre.c > > @@ -1404,8 +1404,16 @@ static int ip6gre_header(struct sk_buff *skb, > > struct net_device *dev, > > return -t->hlen; > > } > > > > +static int ip6gre_header_parse(const struct sk_buff *skb, unsigned char *haddr) > > +{ > > + const struct ipv6hdr *ipv6h = (const struct ipv6hdr *) skb_mac_header(skb); > > + memcpy(haddr, &ipv6h->saddr, 16); > > + return 16; > > +} > > + > > static const struct header_ops ip6gre_header_ops = { > > .create = ip6gre_header, > > + .parse = ip6gre_header_parse, > > }; > > > > static const struct net_device_ops ip6gre_netdev_ops = { > > > > Would you answer whether this behavior is an error and why the > > behavior in ip_gre.c and ip6_gre.c differs? > > > > Regards, > > Aleksey > >
On Tue, Apr 11, 2023 at 05:47:34PM +0300, Aleksey Shumnik wrote: > Dear maintainers, > > I wrote the ip6gre_header_parser() function in ip6_gre.c, the > implementation is similar to ipgre_header_parser() in ip_gre.c. (By > the way, it is strange that this function is not implemented in > ip6_gre.c) > The implementation of the function is presented below. > It should parse the ip6 header and take the source address and its > length from there. To get a pointer to the ip header, it is logical to > use skb_network_header(), but it does not work correctly and returns a > pointer to payload (skb.data). At this point, the tunnel device has already removed the outer headers. So skb_network_header() points to the _inner_ network header. > Also in ip_gre.c::ipgre_header_parser() skb_mac_header() returns a > pointer to the ip header and everything works correctly (although it > seems that this is also an error, because the pointer to the mac > header should have been returned, and logically the > skb_network_header() function should be applied), Well, skb_mac_header() and skb_network_header() should point to the inner mac and network headers respectively. However, because ip_gre has no inner mac header, skb_mac_header() was repurposed to save the position of the outer network header (so that ipgre_header_parse() can find it). > but in ip6_gre.c all > skb_mac_header(), skb_network_header(), skb_tranport_header() returns > a pointer to payload (skb.data). The packet has already been decapsulated by the tunnel device: the outer headers are gone. Therefore, the packet now starts right after the gre header. So skb_mac_header() points there. And since ip6_gre transports packets with no mac header, the mac header length is zero, which means skb_network_header() equals skb_mac_header() and points to the inner network header. Finally, as the inner network header hasn't been parsed yet, we don't know where it ends, so skb_tranport_header() isn't usable yet. > This function is called when receiving a packet and parsing it in > af_packet.c::packet_rcv() in dev_parse_header(). > The problem is that there is no way to accurately determine the > beginning of the ip header. > > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c > index 90565b8..0d0c37b 100644 > --- a/net/ipv6/ip6_gre.c > +++ b/net/ipv6/ip6_gre.c > @@ -1404,8 +1404,16 @@ static int ip6gre_header(struct sk_buff *skb, > struct net_device *dev, > return -t->hlen; > } > > +static int ip6gre_header_parse(const struct sk_buff *skb, unsigned char *haddr) > +{ > + const struct ipv6hdr *ipv6h = (const struct ipv6hdr *) skb_mac_header(skb); > + memcpy(haddr, &ipv6h->saddr, 16); > + return 16; > +} > + > static const struct header_ops ip6gre_header_ops = { > .create = ip6gre_header, > + .parse = ip6gre_header_parse, > }; > > static const struct net_device_ops ip6gre_netdev_ops = { > > Would you answer whether this behavior is an error and why the > behavior in ip_gre.c and ip6_gre.c differs? For me, ip_gre should make the mac header point to the inner mac header, which incidentally is also the inner network header. The difference in behaviour between ip_gre and ip6_gre certainly comes from the fact that both modules were implemented at different times, by different people. > Regards, > Aleksey >
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 90565b8..0d0c37b 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -1404,8 +1404,16 @@ static int ip6gre_header(struct sk_buff *skb, struct net_device *dev, return -t->hlen; } +static int ip6gre_header_parse(const struct sk_buff *skb, unsigned char *haddr) +{ + const struct ipv6hdr *ipv6h = (const struct ipv6hdr *) skb_mac_header(skb); + memcpy(haddr, &ipv6h->saddr, 16); + return 16; +} + static const struct header_ops ip6gre_header_ops = { .create = ip6gre_header, + .parse = ip6gre_header_parse, };