Message ID | cover.1624572003.git.gnault@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | net: reset MAC header consistently across L3 virtual devices | expand |
On 6/25/21 7:32 AM, Guillaume Nault wrote: > Some virtual L3 devices, like vxlan-gpe and gre (in collect_md mode), > reset the MAC header pointer after they parsed the outer headers. This > accurately reflects the fact that the decapsulated packet is pure L3 > packet, as that makes the MAC header 0 bytes long (the MAC and network > header pointers are equal). > > However, many L3 devices only adjust the network header after > decapsulation and leave the MAC header pointer to its original value. > This can confuse other parts of the networking stack, like TC, which > then considers the outer headers as one big MAC header. > > This patch series makes the following L3 tunnels behave like VXLAN-GPE: > bareudp, ipip, sit, gre, ip6gre, ip6tnl, gtp. > > The case of gre is a bit special. It already resets the MAC header > pointer in collect_md mode, so only the classical mode needs to be > adjusted. However, gre also has a special case that expects the MAC > header pointer to keep pointing to the outer header even after > decapsulation. Therefore, patch 4 keeps an exception for this case. > > Ideally, we'd centralise the call to skb_reset_mac_header() in > ip_tunnel_rcv(), to avoid manual calls in ipip (patch 2), > sit (patch 3) and gre (patch 4). That's unfortunately not feasible > currently, because of the gre special case discussed above that > precludes us from resetting the MAC header unconditionally. What about adding a flag to ip_tunnel indicating if it can be done (or should not be done since doing it is the most common)? > > The original motivation is to redirect bareudp packets to Ethernet > devices (as described in patch 1). The rest of this series aims at > bringing consistency across all L3 devices (apart from gre's special > case unfortunately). Can you add a selftests that covers the use cases you mention in the commit logs?
On Sat, Jun 26, 2021 at 11:50:19AM -0600, David Ahern wrote: > On 6/25/21 7:32 AM, Guillaume Nault wrote: > > Some virtual L3 devices, like vxlan-gpe and gre (in collect_md mode), > > reset the MAC header pointer after they parsed the outer headers. This > > accurately reflects the fact that the decapsulated packet is pure L3 > > packet, as that makes the MAC header 0 bytes long (the MAC and network > > header pointers are equal). > > > > However, many L3 devices only adjust the network header after > > decapsulation and leave the MAC header pointer to its original value. > > This can confuse other parts of the networking stack, like TC, which > > then considers the outer headers as one big MAC header. > > > > This patch series makes the following L3 tunnels behave like VXLAN-GPE: > > bareudp, ipip, sit, gre, ip6gre, ip6tnl, gtp. > > > > The case of gre is a bit special. It already resets the MAC header > > pointer in collect_md mode, so only the classical mode needs to be > > adjusted. However, gre also has a special case that expects the MAC > > header pointer to keep pointing to the outer header even after > > decapsulation. Therefore, patch 4 keeps an exception for this case. > > > > Ideally, we'd centralise the call to skb_reset_mac_header() in > > ip_tunnel_rcv(), to avoid manual calls in ipip (patch 2), > > sit (patch 3) and gre (patch 4). That's unfortunately not feasible > > currently, because of the gre special case discussed above that > > precludes us from resetting the MAC header unconditionally. > > What about adding a flag to ip_tunnel indicating if it can be done (or > should not be done since doing it is the most common)? That's feasible. I didn't do it here because I wanted to keep the patch series focused on L3 tunnels. Modifying ip_tunnel_rcv()'s prototype would also require updating erspan_rcv(), which isn't L3 (erspan carries Ethernet frames). I was feeling such consolidation would be best done in a follow up patch series. I can repost if you feel strongly about it. Otherwise, I'll follow up with the ip_tunnel_rcv() consolidation in a later patch. Just let me know if you have any preference. > > The original motivation is to redirect bareudp packets to Ethernet > > devices (as described in patch 1). The rest of this series aims at > > bringing consistency across all L3 devices (apart from gre's special > > case unfortunately). > > Can you add a selftests that covers the use cases you mention in the > commit logs? I'm already having a selftests for most of the tunnels (and their different operating modes), gtp being the main exception. But it's not yet ready for upstream, as I'm trying to move the topology in its own .sh file, to keep the main selftests as simple as possible. I'll post it as soon as I get it in good shape. Thanks for the rewiew.
On 6/26/21 2:53 PM, Guillaume Nault wrote: > On Sat, Jun 26, 2021 at 11:50:19AM -0600, David Ahern wrote: >> On 6/25/21 7:32 AM, Guillaume Nault wrote: >>> Some virtual L3 devices, like vxlan-gpe and gre (in collect_md mode), >>> reset the MAC header pointer after they parsed the outer headers. This >>> accurately reflects the fact that the decapsulated packet is pure L3 >>> packet, as that makes the MAC header 0 bytes long (the MAC and network >>> header pointers are equal). >>> >>> However, many L3 devices only adjust the network header after >>> decapsulation and leave the MAC header pointer to its original value. >>> This can confuse other parts of the networking stack, like TC, which >>> then considers the outer headers as one big MAC header. >>> >>> This patch series makes the following L3 tunnels behave like VXLAN-GPE: >>> bareudp, ipip, sit, gre, ip6gre, ip6tnl, gtp. >>> >>> The case of gre is a bit special. It already resets the MAC header >>> pointer in collect_md mode, so only the classical mode needs to be >>> adjusted. However, gre also has a special case that expects the MAC >>> header pointer to keep pointing to the outer header even after >>> decapsulation. Therefore, patch 4 keeps an exception for this case. >>> >>> Ideally, we'd centralise the call to skb_reset_mac_header() in >>> ip_tunnel_rcv(), to avoid manual calls in ipip (patch 2), >>> sit (patch 3) and gre (patch 4). That's unfortunately not feasible >>> currently, because of the gre special case discussed above that >>> precludes us from resetting the MAC header unconditionally. >> >> What about adding a flag to ip_tunnel indicating if it can be done (or >> should not be done since doing it is the most common)? > > That's feasible. I didn't do it here because I wanted to keep the > patch series focused on L3 tunnels. Modifying ip_tunnel_rcv()'s > prototype would also require updating erspan_rcv(), which isn't L3 > (erspan carries Ethernet frames). I was feeling such consolidation > would be best done in a follow up patch series. I was thinking a flag in 'struct ip_tunnel'. It's the private data for those netdevices, so a per-instance setting. I haven't walked through the details to know if it would work. > > I can repost if you feel strongly about it. Otherwise, I'll follow up > with the ip_tunnel_rcv() consolidation in a later patch. Just let me > know if you have any preference. > >>> The original motivation is to redirect bareudp packets to Ethernet >>> devices (as described in patch 1). The rest of this series aims at >>> bringing consistency across all L3 devices (apart from gre's special >>> case unfortunately). >> >> Can you add a selftests that covers the use cases you mention in the >> commit logs? > > I'm already having a selftests for most of the tunnels (and their > different operating modes), gtp being the main exception. But it's not > yet ready for upstream, as I'm trying to move the topology in its own > .sh file, to keep the main selftests as simple as possible. > I'll post it as soon as I get it in good shape. > That works. Thanks,
On Sun, Jun 27, 2021 at 09:56:53AM -0600, David Ahern wrote: > On 6/26/21 2:53 PM, Guillaume Nault wrote: > > On Sat, Jun 26, 2021 at 11:50:19AM -0600, David Ahern wrote: > >> On 6/25/21 7:32 AM, Guillaume Nault wrote: > >>> Some virtual L3 devices, like vxlan-gpe and gre (in collect_md mode), > >>> reset the MAC header pointer after they parsed the outer headers. This > >>> accurately reflects the fact that the decapsulated packet is pure L3 > >>> packet, as that makes the MAC header 0 bytes long (the MAC and network > >>> header pointers are equal). > >>> > >>> However, many L3 devices only adjust the network header after > >>> decapsulation and leave the MAC header pointer to its original value. > >>> This can confuse other parts of the networking stack, like TC, which > >>> then considers the outer headers as one big MAC header. > >>> > >>> This patch series makes the following L3 tunnels behave like VXLAN-GPE: > >>> bareudp, ipip, sit, gre, ip6gre, ip6tnl, gtp. > >>> > >>> The case of gre is a bit special. It already resets the MAC header > >>> pointer in collect_md mode, so only the classical mode needs to be > >>> adjusted. However, gre also has a special case that expects the MAC > >>> header pointer to keep pointing to the outer header even after > >>> decapsulation. Therefore, patch 4 keeps an exception for this case. > >>> > >>> Ideally, we'd centralise the call to skb_reset_mac_header() in > >>> ip_tunnel_rcv(), to avoid manual calls in ipip (patch 2), > >>> sit (patch 3) and gre (patch 4). That's unfortunately not feasible > >>> currently, because of the gre special case discussed above that > >>> precludes us from resetting the MAC header unconditionally. > >> > >> What about adding a flag to ip_tunnel indicating if it can be done (or > >> should not be done since doing it is the most common)? > > > > That's feasible. I didn't do it here because I wanted to keep the > > patch series focused on L3 tunnels. Modifying ip_tunnel_rcv()'s > > prototype would also require updating erspan_rcv(), which isn't L3 > > (erspan carries Ethernet frames). I was feeling such consolidation > > would be best done in a follow up patch series. > > I was thinking a flag in 'struct ip_tunnel'. It's the private data for > those netdevices, so a per-instance setting. I haven't walked through > the details to know if it would work. I didn't think about that. Good idea, that looks perfectly doable. But I'd still prefer to centralise the skb_reset_mac_header() call in a dedicated patch set. I we did it here, we'd have to not reset the mac header by default, to guarantee that unrelated tunnels wouldn't be affected. However, I think that the default behaviour should be to reset the mac header and that only special cases, like the one in ip_gre, should explicitely turn that off. Therefore, we'd need a follow up patch anyway, to change the way this "reset_mac" flag would be set. IMHO, the current approach has the advantage of clearly separating the new feature from the refactoring. But if you feel strongly about using a flag in struct ip_tunnel, I can rework this series.
On 6/28/21 9:04 AM, Guillaume Nault wrote: > On Sun, Jun 27, 2021 at 09:56:53AM -0600, David Ahern wrote: >> On 6/26/21 2:53 PM, Guillaume Nault wrote: >>> On Sat, Jun 26, 2021 at 11:50:19AM -0600, David Ahern wrote: >>>> On 6/25/21 7:32 AM, Guillaume Nault wrote: >>>>> Some virtual L3 devices, like vxlan-gpe and gre (in collect_md mode), >>>>> reset the MAC header pointer after they parsed the outer headers. This >>>>> accurately reflects the fact that the decapsulated packet is pure L3 >>>>> packet, as that makes the MAC header 0 bytes long (the MAC and network >>>>> header pointers are equal). >>>>> >>>>> However, many L3 devices only adjust the network header after >>>>> decapsulation and leave the MAC header pointer to its original value. >>>>> This can confuse other parts of the networking stack, like TC, which >>>>> then considers the outer headers as one big MAC header. >>>>> >>>>> This patch series makes the following L3 tunnels behave like VXLAN-GPE: >>>>> bareudp, ipip, sit, gre, ip6gre, ip6tnl, gtp. >>>>> >>>>> The case of gre is a bit special. It already resets the MAC header >>>>> pointer in collect_md mode, so only the classical mode needs to be >>>>> adjusted. However, gre also has a special case that expects the MAC >>>>> header pointer to keep pointing to the outer header even after >>>>> decapsulation. Therefore, patch 4 keeps an exception for this case. >>>>> >>>>> Ideally, we'd centralise the call to skb_reset_mac_header() in >>>>> ip_tunnel_rcv(), to avoid manual calls in ipip (patch 2), >>>>> sit (patch 3) and gre (patch 4). That's unfortunately not feasible >>>>> currently, because of the gre special case discussed above that >>>>> precludes us from resetting the MAC header unconditionally. >>>> >>>> What about adding a flag to ip_tunnel indicating if it can be done (or >>>> should not be done since doing it is the most common)? >>> >>> That's feasible. I didn't do it here because I wanted to keep the >>> patch series focused on L3 tunnels. Modifying ip_tunnel_rcv()'s >>> prototype would also require updating erspan_rcv(), which isn't L3 >>> (erspan carries Ethernet frames). I was feeling such consolidation >>> would be best done in a follow up patch series. >> >> I was thinking a flag in 'struct ip_tunnel'. It's the private data for >> those netdevices, so a per-instance setting. I haven't walked through >> the details to know if it would work. > > I didn't think about that. Good idea, that looks perfectly doable. But > I'd still prefer to centralise the skb_reset_mac_header() call in a > dedicated patch set. I we did it here, we'd have to not reset the mac > header by default, to guarantee that unrelated tunnels wouldn't be > affected. > However, I think that the default behaviour should be to reset the mac > header and that only special cases, like the one in ip_gre, should > explicitely turn that off. Therefore, we'd need a follow up patch > anyway, to change the way this "reset_mac" flag would be set. > > IMHO, the current approach has the advantage of clearly separating the > new feature from the refactoring. But if you feel strongly about using > a flag in struct ip_tunnel, I can rework this series. > I am accustomed to doing refactoring first to add some new feature in the simplest and clearest way.
Hello: This series was applied to netdev/net-next.git (refs/heads/master): On Fri, 25 Jun 2021 15:32:59 +0200 you wrote: > Some virtual L3 devices, like vxlan-gpe and gre (in collect_md mode), > reset the MAC header pointer after they parsed the outer headers. This > accurately reflects the fact that the decapsulated packet is pure L3 > packet, as that makes the MAC header 0 bytes long (the MAC and network > header pointers are equal). > > However, many L3 devices only adjust the network header after > decapsulation and leave the MAC header pointer to its original value. > This can confuse other parts of the networking stack, like TC, which > then considers the outer headers as one big MAC header. > > [...] Here is the summary with links: - [net-next,1/6] bareudp: allow redirecting bareudp packets to eth devices https://git.kernel.org/netdev/net-next/c/99c8719b7981 - [net-next,2/6] ipip: allow redirecting ipip and mplsip packets to eth devices https://git.kernel.org/netdev/net-next/c/7ad136fd288c - [net-next,3/6] sit: allow redirecting ip6ip, ipip and mplsip packets to eth devices https://git.kernel.org/netdev/net-next/c/730eed2772e7 - [net-next,4/6] gre: let mac_header point to outer header only when necessary https://git.kernel.org/netdev/net-next/c/aab1e898c26c - [net-next,5/6] ip6_tunnel: allow redirecting ip6gre and ipxip6 packets to eth devices https://git.kernel.org/netdev/net-next/c/da5a2e49f064 - [net-next,6/6] gtp: reset mac_header after decap https://git.kernel.org/netdev/net-next/c/b2d898c8a523 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
On Mon, Jun 28, 2021 at 12:46:24PM -0600, David Ahern wrote: > On 6/28/21 9:04 AM, Guillaume Nault wrote: > > On Sun, Jun 27, 2021 at 09:56:53AM -0600, David Ahern wrote: > >> On 6/26/21 2:53 PM, Guillaume Nault wrote: > >>> On Sat, Jun 26, 2021 at 11:50:19AM -0600, David Ahern wrote: > >>>> On 6/25/21 7:32 AM, Guillaume Nault wrote: > >>>>> Some virtual L3 devices, like vxlan-gpe and gre (in collect_md mode), > >>>>> reset the MAC header pointer after they parsed the outer headers. This > >>>>> accurately reflects the fact that the decapsulated packet is pure L3 > >>>>> packet, as that makes the MAC header 0 bytes long (the MAC and network > >>>>> header pointers are equal). > >>>>> > >>>>> However, many L3 devices only adjust the network header after > >>>>> decapsulation and leave the MAC header pointer to its original value. > >>>>> This can confuse other parts of the networking stack, like TC, which > >>>>> then considers the outer headers as one big MAC header. > >>>>> > >>>>> This patch series makes the following L3 tunnels behave like VXLAN-GPE: > >>>>> bareudp, ipip, sit, gre, ip6gre, ip6tnl, gtp. > >>>>> > >>>>> The case of gre is a bit special. It already resets the MAC header > >>>>> pointer in collect_md mode, so only the classical mode needs to be > >>>>> adjusted. However, gre also has a special case that expects the MAC > >>>>> header pointer to keep pointing to the outer header even after > >>>>> decapsulation. Therefore, patch 4 keeps an exception for this case. > >>>>> > >>>>> Ideally, we'd centralise the call to skb_reset_mac_header() in > >>>>> ip_tunnel_rcv(), to avoid manual calls in ipip (patch 2), > >>>>> sit (patch 3) and gre (patch 4). That's unfortunately not feasible > >>>>> currently, because of the gre special case discussed above that > >>>>> precludes us from resetting the MAC header unconditionally. > >>>> > >>>> What about adding a flag to ip_tunnel indicating if it can be done (or > >>>> should not be done since doing it is the most common)? > >>> > >>> That's feasible. I didn't do it here because I wanted to keep the > >>> patch series focused on L3 tunnels. Modifying ip_tunnel_rcv()'s > >>> prototype would also require updating erspan_rcv(), which isn't L3 > >>> (erspan carries Ethernet frames). I was feeling such consolidation > >>> would be best done in a follow up patch series. > >> > >> I was thinking a flag in 'struct ip_tunnel'. It's the private data for > >> those netdevices, so a per-instance setting. I haven't walked through > >> the details to know if it would work. > > > > I didn't think about that. Good idea, that looks perfectly doable. But > > I'd still prefer to centralise the skb_reset_mac_header() call in a > > dedicated patch set. I we did it here, we'd have to not reset the mac > > header by default, to guarantee that unrelated tunnels wouldn't be > > affected. > > However, I think that the default behaviour should be to reset the mac > > header and that only special cases, like the one in ip_gre, should > > explicitely turn that off. Therefore, we'd need a follow up patch > > anyway, to change the way this "reset_mac" flag would be set. > > > > IMHO, the current approach has the advantage of clearly separating the > > new feature from the refactoring. But if you feel strongly about using > > a flag in struct ip_tunnel, I can rework this series. > > > > I am accustomed to doing refactoring first to add some new feature in > the simplest and clearest way. I agree on the general statement. Anyway, I've just received the patchwork-bot message saying the series has been applied. I'll follow up with the struct ip_tunnel flag once net-next reopens (as I guess it's about to close).