diff mbox series

IPv6: Set SIT tunnel hard_header_len to zero

Message ID 20201103104133.GA1573211@tws (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Headers show
Series IPv6: Set SIT tunnel hard_header_len to zero | expand

Commit Message

Oliver Herms Nov. 3, 2020, 10:41 a.m. UTC
Due to the legacy usage of hard_header_len for SIT tunnels while
already using infrastructure from net/ipv4/ip_tunnel.c the
calculation of the path MTU in tnl_update_pmtu is incorrect.
This leads to unnecessary creation of MTU exceptions for any
flow going over a SIT tunnel.

As SIT tunnels do not have a header themsevles other than their
transport (L3, L2) headers we're leaving hard_header_len set to zero
as tnl_update_pmtu is already taking care of the transport headers
sizes.

This will also help avoiding unnecessary IPv6 GC runs and spinlock
contention seen when using SIT tunnels and for more than
net.ipv6.route.gc_thresh flows.

Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>
---
 net/ipv6/sit.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Willem de Bruijn Nov. 3, 2020, 6:42 p.m. UTC | #1
On Tue, Nov 3, 2020 at 5:41 AM Oliver Herms
<oliver.peter.herms@gmail.com> wrote:
>
> Due to the legacy usage of hard_header_len for SIT tunnels while
> already using infrastructure from net/ipv4/ip_tunnel.c the
> calculation of the path MTU in tnl_update_pmtu is incorrect.
> This leads to unnecessary creation of MTU exceptions for any
> flow going over a SIT tunnel.
>
> As SIT tunnels do not have a header themsevles other than their
> transport (L3, L2) headers we're leaving hard_header_len set to zero
> as tnl_update_pmtu is already taking care of the transport headers
> sizes.
>
> This will also help avoiding unnecessary IPv6 GC runs and spinlock
> contention seen when using SIT tunnels and for more than
> net.ipv6.route.gc_thresh flows.

Thanks. Yes, this is long overdue.

The hard_header_len issue was also recently discussed in the context
of GRE in commit fdafed459998 ("ip_gre: set dev->hard_header_len and
dev->needed_headroom properly").

Question is whether we should reserve room in needed_headroom instead.
AFAIK this existing update logic in ip6_tnl_xmit is sufficient

"
        /* Calculate max headroom for all the headers and adjust
         * needed_headroom if necessary.
         */
        max_headroom = LL_RESERVED_SPACE(dst->dev) + sizeof(struct ipv6hdr)
                        + dst->header_len + t->hlen;
        if (max_headroom > dev->needed_headroom)
                dev->needed_headroom = max_headroom;
"

> Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")

How did you arrive at this SHA1?

> Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>
> ---
>  net/ipv6/sit.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index 5e2c34c0ac97..5e7983cb6154 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -1128,7 +1128,6 @@ static void ipip6_tunnel_bind_dev(struct net_device *dev)
>         if (tdev && !netif_is_l3_master(tdev)) {
>                 int t_hlen = tunnel->hlen + sizeof(struct iphdr);
>
> -               dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
>                 dev->mtu = tdev->mtu - t_hlen;
>                 if (dev->mtu < IPV6_MIN_MTU)
>                         dev->mtu = IPV6_MIN_MTU;
> @@ -1426,7 +1425,6 @@ static void ipip6_tunnel_setup(struct net_device *dev)
>         dev->priv_destructor    = ipip6_dev_free;
>
>         dev->type               = ARPHRD_SIT;
> -       dev->hard_header_len    = LL_MAX_HEADER + t_hlen;
>         dev->mtu                = ETH_DATA_LEN - t_hlen;
>         dev->min_mtu            = IPV6_MIN_MTU;
>         dev->max_mtu            = IP6_MAX_MTU - t_hlen;
> --
> 2.25.1
>
Oliver Herms Nov. 4, 2020, 7:30 p.m. UTC | #2
On 03.11.20 19:42, Willem de Bruijn wrote:
> Thanks. Yes, this is long overdue.
> 
> The hard_header_len issue was also recently discussed in the context
> of GRE in commit fdafed459998 ("ip_gre: set dev->hard_header_len and
> dev->needed_headroom properly").
> 
> Question is whether we should reserve room in needed_headroom instead.
> AFAIK this existing update logic in ip6_tnl_xmit is sufficient
> 
> "
>         /* Calculate max headroom for all the headers and adjust
>          * needed_headroom if necessary.
>          */
>         max_headroom = LL_RESERVED_SPACE(dst->dev) + sizeof(struct ipv6hdr)
>                         + dst->header_len + t->hlen;
>         if (max_headroom > dev->needed_headroom)
>                 dev->needed_headroom = max_headroom;
> "I think that's enough. At least it definitely works in my test and prod environment.
Would be good to get another opinion on this though.

>> Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
> 
> How did you arrive at this SHA1?
I think the legacy usage of hard_header_len in ipv6/sit.c was overseen in c54419321455.
Please correct me if I'm wrong.
Willem de Bruijn Nov. 4, 2020, 7:52 p.m. UTC | #3
On Wed, Nov 4, 2020 at 2:30 PM Oliver Herms
<oliver.peter.herms@gmail.com> wrote:
>
> On 03.11.20 19:42, Willem de Bruijn wrote:
> > Thanks. Yes, this is long overdue.
> >
> > The hard_header_len issue was also recently discussed in the context
> > of GRE in commit fdafed459998 ("ip_gre: set dev->hard_header_len and
> > dev->needed_headroom properly").
> >
> > Question is whether we should reserve room in needed_headroom instead.
> > AFAIK this existing update logic in ip6_tnl_xmit is sufficient
> >
> > "
> >         /* Calculate max headroom for all the headers and adjust
> >          * needed_headroom if necessary.
> >          */
> >         max_headroom = LL_RESERVED_SPACE(dst->dev) + sizeof(struct ipv6hdr)
> >                         + dst->header_len + t->hlen;
> >         if (max_headroom > dev->needed_headroom)
> >                 dev->needed_headroom = max_headroom;
> > "I think that's enough. At least it definitely works in my test and prod environment.
> Would be good to get another opinion on this though.

Sit should behave the same as other tunnels. At least in ip6_tunnel.c
and ip6_gre.c I do not see explicit initialization of needed_headroom.

> >> Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
> >
> > How did you arrive at this SHA1?
> I think the legacy usage of hard_header_len in ipv6/sit.c was overseen in c54419321455.
> Please correct me if I'm wrong.

I don't see anything in that patch assign or modify hard_header_len.
Oliver Herms Nov. 9, 2020, 9:05 a.m. UTC | #4
On 04.11.20 20:52, Willem de Bruijn wrote:
>>>> Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
>>>
>>> How did you arrive at this SHA1?
>> I think the legacy usage of hard_header_len in ipv6/sit.c was overseen in c54419321455.
>> Please correct me if I'm wrong.
> 
> I don't see anything in that patch assign or modify hard_header_len.
> 
It's not assigning or modifying it but changing expectations about how dev->hard_header_len is to be used.

The patch changed the MTU calculation from:
mtu = dst_mtu(&rt->dst) - dev->hard_header_len - tunnel->hlen;

to this:
mtu = dst_mtu(&rt->dst) - dev->hard_header_len - sizeof(struct iphdr);

Later is became this (in patch 23a3647. This is the current implementation.):
mtu = dst_mtu(&rt->dst) - dev->hard_header_len - sizeof(struct iphdr) - tunnel_hlen;

Apparently the initial usage of dev->hard_header_len was that it contains the length
of all headers before the tunnel payload. c54419321455 changed it to assuming dev->hard_header_len 
does not contain the tunnels outter IP header. Thus I think the bug was introduced by c54419321455.

Kind Regards
Oliver
Willem de Bruijn Nov. 9, 2020, 3:10 p.m. UTC | #5
On Mon, Nov 9, 2020 at 4:05 AM Oliver Herms
<oliver.peter.herms@gmail.com> wrote:
>
>
> On 04.11.20 20:52, Willem de Bruijn wrote:
> >>>> Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
> >>>
> >>> How did you arrive at this SHA1?
> >> I think the legacy usage of hard_header_len in ipv6/sit.c was overseen in c54419321455.
> >> Please correct me if I'm wrong.
> >
> > I don't see anything in that patch assign or modify hard_header_len.
> >
> It's not assigning or modifying it but changing expectations about how dev->hard_header_len is to be used.
>
> The patch changed the MTU calculation from:
> mtu = dst_mtu(&rt->dst) - dev->hard_header_len - tunnel->hlen;
>
> to this:
> mtu = dst_mtu(&rt->dst) - dev->hard_header_len - sizeof(struct iphdr);
>
> Later is became this (in patch 23a3647. This is the current implementation.):
> mtu = dst_mtu(&rt->dst) - dev->hard_header_len - sizeof(struct iphdr) - tunnel_hlen;
>
> Apparently the initial usage of dev->hard_header_len was that it contains the length
> of all headers before the tunnel payload. c54419321455 changed it to assuming dev->hard_header_len
> does not contain the tunnels outter IP header. Thus I think the bug was introduced by c54419321455.

And the only header in the case of SIT is that outer ip header. Got it, thanks.

Overly conservative MTU calculation is one issue. Packet sockets also
expect read/write link layer access with SOCK_RAW, which does not work
correctly for sit. I'm not sure that it ever did.

The chosen commit predates all stable trees, which is the most important point.

Acked-by: Willem de Bruijn <willemb@google.com>

Could ip6 tunnels have the same issue? In ip6_tnl_dev_init_gen,

        dev->hard_header_len = LL_MAX_HEADER + t_hlen;
Jakub Kicinski Nov. 9, 2020, 11:16 p.m. UTC | #6
On Tue, 3 Nov 2020 11:41:33 +0100 Oliver Herms wrote:
> Due to the legacy usage of hard_header_len for SIT tunnels while
> already using infrastructure from net/ipv4/ip_tunnel.c the
> calculation of the path MTU in tnl_update_pmtu is incorrect.
> This leads to unnecessary creation of MTU exceptions for any
> flow going over a SIT tunnel.
> 
> As SIT tunnels do not have a header themsevles other than their
> transport (L3, L2) headers we're leaving hard_header_len set to zero
> as tnl_update_pmtu is already taking care of the transport headers
> sizes.
> 
> This will also help avoiding unnecessary IPv6 GC runs and spinlock
> contention seen when using SIT tunnels and for more than
> net.ipv6.route.gc_thresh flows.
> 
> Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
> Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>

Applied, thanks! Let's carry on the discussion about ipv6 tunnels,
though, it does look questionable.
diff mbox series

Patch

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 5e2c34c0ac97..5e7983cb6154 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1128,7 +1128,6 @@  static void ipip6_tunnel_bind_dev(struct net_device *dev)
 	if (tdev && !netif_is_l3_master(tdev)) {
 		int t_hlen = tunnel->hlen + sizeof(struct iphdr);
 
-		dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
 		dev->mtu = tdev->mtu - t_hlen;
 		if (dev->mtu < IPV6_MIN_MTU)
 			dev->mtu = IPV6_MIN_MTU;
@@ -1426,7 +1425,6 @@  static void ipip6_tunnel_setup(struct net_device *dev)
 	dev->priv_destructor	= ipip6_dev_free;
 
 	dev->type		= ARPHRD_SIT;
-	dev->hard_header_len	= LL_MAX_HEADER + t_hlen;
 	dev->mtu		= ETH_DATA_LEN - t_hlen;
 	dev->min_mtu		= IPV6_MIN_MTU;
 	dev->max_mtu		= IP6_MAX_MTU - t_hlen;