diff mbox series

[BUG] In af_packet.c::dev_parse_header() skb.network_header does not point to the network header

Message ID CAJGXZLgcH6bjmj7YR-hAWpEQW1CPjEcOdMN01hqsVk18E4ScZQ@mail.gmail.com (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Headers show
Series [BUG] In af_packet.c::dev_parse_header() skb.network_header does not point to the network header | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Aleksey Shumnik April 11, 2023, 2:47 p.m. UTC
Dear maintainers,

I wrote the ip6gre_header_parser() function in ip6_gre.c, the
implementation is similar to ipgre_header_parser() in ip_gre.c. (By
the way, it is strange that this function is not implemented in
ip6_gre.c)
The implementation of the function is presented below.
It should parse the ip6 header and take the source address and its
length from there. To get a pointer to the ip header, it is logical to
use skb_network_header(), but it does not work correctly and returns a
pointer to payload (skb.data).
Also in ip_gre.c::ipgre_header_parser() skb_mac_header() returns a
pointer to the ip header and everything works correctly (although it
seems that this is also an error, because the pointer to the mac
header should have been returned, and logically the
skb_network_header() function should be applied), but in ip6_gre.c all
skb_mac_header(), skb_network_header(), skb_tranport_header() returns
a pointer to payload (skb.data).
This function is called when receiving a packet and parsing it in
af_packet.c::packet_rcv() in dev_parse_header().
The problem is that there is no way to accurately determine the
beginning of the ip header.

 static const struct net_device_ops ip6gre_netdev_ops = {

Would you answer whether this behavior is an error and why the
behavior in ip_gre.c and ip6_gre.c differs?

Regards,
Aleksey

Comments

Willem de Bruijn April 12, 2023, 3:19 a.m. UTC | #1
Aleksey Shumnik wrote:
> Dear maintainers,
> 
> I wrote the ip6gre_header_parser() function in ip6_gre.c, the
> implementation is similar to ipgre_header_parser() in ip_gre.c. (By
> the way, it is strange that this function is not implemented in
> ip6_gre.c)
> The implementation of the function is presented below.
> It should parse the ip6 header and take the source address and its
> length from there. To get a pointer to the ip header, it is logical to
> use skb_network_header(), but it does not work correctly and returns a
> pointer to payload (skb.data).
> Also in ip_gre.c::ipgre_header_parser() skb_mac_header() returns a
> pointer to the ip header and everything works correctly (although it
> seems that this is also an error, because the pointer to the mac
> header should have been returned, and logically the
> skb_network_header() function should be applied), 

For a device of type ARPHRD_IPGRE or ARPHRD_IP6GRE there is no other
MAC header. This is not ARPHRD_ETHER.

The link layer header can be seen by looking for header_ops.create
if it exists. This creates the header for packet sockets of type
SOCK_DGRAM.

> but in ip6_gre.c all
> skb_mac_header(), skb_network_header(), skb_tranport_header() returns
> a pointer to payload (skb.data).
> This function is called when receiving a packet and parsing it in
> af_packet.c::packet_rcv() in dev_parse_header().
> The problem is that there is no way to accurately determine the
> beginning of the ip header.

The issue happens when comparing packet_rcv on an ip_gre tunnel vs an
ip6_gre tunnel.

The packet_rcv call does the same in both cases, e.g., setting
skb->data at mac or network header depending on SOCK_DGRAM or
SOCK_RAW.

The issue then is likely with a difference in tunnel implementations.
Both implement header_ops and header_ops.create (which is used on
receive by dev_has_header, but on transmit by dev_hard_header). They
return different lengths: one with and one without the IP header.

We've seen inconsistency in this before between tunnels. See also
commit aab1e898c26c. ipgre_xmit has special logic to optionally pull
the headers, but only if header_ops is set, which it isn't for all
variants of GRE tunnels.

Probably particularly relevant is this section in __ipgre_rcv:

                /* Special case for ipgre_header_parse(), which expects the
                 * mac_header to point to the outer IP header.
                 */
                if (tunnel->dev->header_ops == &ipgre_header_ops)
                        skb_pop_mac_header(skb);
                else
                        skb_reset_mac_header(skb);

and see this comment in the mentioned commit:

    ipgre_header_parse() seems to be the only case that requires mac_header
    to point to the outer header. We can detect this case accurately by
    checking ->header_ops. For all other cases, we can reset mac_header.

> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index 90565b8..0d0c37b 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -1404,8 +1404,16 @@ static int ip6gre_header(struct sk_buff *skb,
> struct net_device *dev,
>   return -t->hlen;
>  }
> 
> +static int ip6gre_header_parse(const struct sk_buff *skb, unsigned char *haddr)
> +{
> + const struct ipv6hdr *ipv6h = (const struct ipv6hdr *) skb_mac_header(skb);
> + memcpy(haddr, &ipv6h->saddr, 16);
> + return 16;
> +}
> +
>  static const struct header_ops ip6gre_header_ops = {
>   .create = ip6gre_header,
> + .parse = ip6gre_header_parse,
>  };
> 
>  static const struct net_device_ops ip6gre_netdev_ops = {
> 
> Would you answer whether this behavior is an error and why the
> behavior in ip_gre.c and ip6_gre.c differs?
> 
> Regards,
> Aleksey
Guillaume Nault April 20, 2023, 4:19 p.m. UTC | #2
On Tue, Apr 11, 2023 at 11:19:53PM -0400, Willem de Bruijn wrote:
> Aleksey Shumnik wrote:
> > but in ip6_gre.c all
> > skb_mac_header(), skb_network_header(), skb_tranport_header() returns
> > a pointer to payload (skb.data).
> > This function is called when receiving a packet and parsing it in
> > af_packet.c::packet_rcv() in dev_parse_header().
> > The problem is that there is no way to accurately determine the
> > beginning of the ip header.
> 
> The issue happens when comparing packet_rcv on an ip_gre tunnel vs an
> ip6_gre tunnel.
> 
> The packet_rcv call does the same in both cases, e.g., setting
> skb->data at mac or network header depending on SOCK_DGRAM or
> SOCK_RAW.
> 
> The issue then is likely with a difference in tunnel implementations.
> Both implement header_ops and header_ops.create (which is used on
> receive by dev_has_header, but on transmit by dev_hard_header). They
> return different lengths: one with and one without the IP header.

The problem is that, upon reception on an af_packet socket, ip_gre
wants to set the outer source IP address in sll->sll_addr. That is, it
considers the outer IP header as the mac header of the gre device.

As far as I know, ip_gre is the only tunnel that does that.

> We've seen inconsistency in this before between tunnels. See also
> commit aab1e898c26c. ipgre_xmit has special logic to optionally pull
> the headers, but only if header_ops is set, which it isn't for all
> variants of GRE tunnels.
> 
> Probably particularly relevant is this section in __ipgre_rcv:
> 
>                 /* Special case for ipgre_header_parse(), which expects the
>                  * mac_header to point to the outer IP header.
>                  */
>                 if (tunnel->dev->header_ops == &ipgre_header_ops)
>                         skb_pop_mac_header(skb);
>                 else
>                         skb_reset_mac_header(skb);
> 
> and see this comment in the mentioned commit:
> 
>     ipgre_header_parse() seems to be the only case that requires mac_header
>     to point to the outer header. We can detect this case accurately by
>     checking ->header_ops. For all other cases, we can reset mac_header.

The problem was about unifying the different ip tunnel behaviours, as
described in the cover letter of the series (merge commit 8eb517a2a4ae
("Merge branch 'reset-mac'") has all the details).

The idea is to make all tunnel devices consistently set ->mac_header
and ->network_header to the corresponding inner headers. For tunnels
that directly transport network protocols, ->mac_header equals
->network_header (that is, the mac header length is 0).

But there's a problem with ip_gre, as it wants to access the outer
headers again, even though it has already pulled them. To do that,
ip_gre saves the offset of the outer ip header in the ->mac_header, so
that ipgre_header_parse() can find it again later. That's why ip_gre
can't properly set ->mac_header to the inner mac header offset, as the
other tunnels do.

I personally find this use of ->mac_header a bit hacky, but it's used
to implement a feature that's required for some users (see commit
0e3da5bb8da4 ("ip_gre: fix msg_name parsing for recvfrom/recvmsg")). We
could probably store the outer IP header offset elsewhere and reset
->mac_header the way all other tunnels do. But I didn't find a
satisfying solution, so I just kept ip_gre as an exception.

> > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> > index 90565b8..0d0c37b 100644
> > --- a/net/ipv6/ip6_gre.c
> > +++ b/net/ipv6/ip6_gre.c
> > @@ -1404,8 +1404,16 @@ static int ip6gre_header(struct sk_buff *skb,
> > struct net_device *dev,
> >   return -t->hlen;
> >  }
> > 
> > +static int ip6gre_header_parse(const struct sk_buff *skb, unsigned char *haddr)
> > +{
> > + const struct ipv6hdr *ipv6h = (const struct ipv6hdr *) skb_mac_header(skb);
> > + memcpy(haddr, &ipv6h->saddr, 16);
> > + return 16;
> > +}
> > +
> >  static const struct header_ops ip6gre_header_ops = {
> >   .create = ip6gre_header,
> > + .parse = ip6gre_header_parse,
> >  };
> > 
> >  static const struct net_device_ops ip6gre_netdev_ops = {
> > 
> > Would you answer whether this behavior is an error and why the
> > behavior in ip_gre.c and ip6_gre.c differs?
> > 
> > Regards,
> > Aleksey
> 
>
Guillaume Nault April 20, 2023, 4:53 p.m. UTC | #3
On Tue, Apr 11, 2023 at 05:47:34PM +0300, Aleksey Shumnik wrote:
> Dear maintainers,
> 
> I wrote the ip6gre_header_parser() function in ip6_gre.c, the
> implementation is similar to ipgre_header_parser() in ip_gre.c. (By
> the way, it is strange that this function is not implemented in
> ip6_gre.c)
> The implementation of the function is presented below.
> It should parse the ip6 header and take the source address and its
> length from there. To get a pointer to the ip header, it is logical to
> use skb_network_header(), but it does not work correctly and returns a
> pointer to payload (skb.data).

At this point, the tunnel device has already removed the outer headers.
So skb_network_header() points to the _inner_ network header.

> Also in ip_gre.c::ipgre_header_parser() skb_mac_header() returns a
> pointer to the ip header and everything works correctly (although it
> seems that this is also an error, because the pointer to the mac
> header should have been returned, and logically the
> skb_network_header() function should be applied),

Well, skb_mac_header() and skb_network_header() should point to the
inner mac and network headers respectively. However, because ip_gre
has no inner mac header, skb_mac_header() was repurposed to save the
position of the outer network header (so that ipgre_header_parse() can
find it).

> but in ip6_gre.c all
> skb_mac_header(), skb_network_header(), skb_tranport_header() returns
> a pointer to payload (skb.data).

The packet has already been decapsulated by the tunnel device: the
outer headers are gone. Therefore, the packet now starts right after
the gre header. So skb_mac_header() points there. And since ip6_gre
transports packets with no mac header, the mac header length is zero,
which means skb_network_header() equals skb_mac_header() and points to
the inner network header. Finally, as the inner network header hasn't
been parsed yet, we don't know where it ends, so skb_tranport_header()
isn't usable yet.

> This function is called when receiving a packet and parsing it in
> af_packet.c::packet_rcv() in dev_parse_header().
> The problem is that there is no way to accurately determine the
> beginning of the ip header.
> 
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index 90565b8..0d0c37b 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -1404,8 +1404,16 @@ static int ip6gre_header(struct sk_buff *skb,
> struct net_device *dev,
>   return -t->hlen;
>  }
> 
> +static int ip6gre_header_parse(const struct sk_buff *skb, unsigned char *haddr)
> +{
> + const struct ipv6hdr *ipv6h = (const struct ipv6hdr *) skb_mac_header(skb);
> + memcpy(haddr, &ipv6h->saddr, 16);
> + return 16;
> +}
> +
>  static const struct header_ops ip6gre_header_ops = {
>   .create = ip6gre_header,
> + .parse = ip6gre_header_parse,
>  };
> 
>  static const struct net_device_ops ip6gre_netdev_ops = {
> 
> Would you answer whether this behavior is an error and why the
> behavior in ip_gre.c and ip6_gre.c differs?

For me, ip_gre should make the mac header point to the inner mac
header, which incidentally is also the inner network header.

The difference in behaviour between ip_gre and ip6_gre certainly comes
from the fact that both modules were implemented at different times, by
different people.

> Regards,
> Aleksey
>
diff mbox series

Patch

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 90565b8..0d0c37b 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1404,8 +1404,16 @@  static int ip6gre_header(struct sk_buff *skb,
struct net_device *dev,
  return -t->hlen;
 }

+static int ip6gre_header_parse(const struct sk_buff *skb, unsigned char *haddr)
+{
+ const struct ipv6hdr *ipv6h = (const struct ipv6hdr *) skb_mac_header(skb);
+ memcpy(haddr, &ipv6h->saddr, 16);
+ return 16;
+}
+
 static const struct header_ops ip6gre_header_ops = {
  .create = ip6gre_header,
+ .parse = ip6gre_header_parse,
 };