Message ID | 20210826140150.19920-1-justin.iurman@uliege.be (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [RFC,net-next] ipv6: Support for anonymous tunnel decapsulation | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 4 maintainers not CCed: zxu@linkedin.com praveen5582@gmail.com Jason@zx2c4.com linmiaohe@huawei.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | fail | Errors and warnings before: 2215 this patch: 2217 |
netdev/kdoc | success | Errors and warnings before: 38 this patch: 38 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | CHECK: spaces preferred around that '|' (ctx:VxV) WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? |
netdev/build_allmodconfig_warn | fail | Errors and warnings before: 2293 this patch: 2295 |
netdev/header_inline | success | Link |
Le 26/08/2021 à 16:01, Justin Iurman a écrit : > Nowadays, there are more and more private domains where a lot of ingresses and > egresses must be linked altogether. Configuring each possible tunnel explicitly > could quickly become a nightmare in such use case. Therefore, introducing > support for ip6ip6 decapsulation without an explicit tunnel configuration looks > like the best solution (e.g., for IOAM). For now, this patch only adds support > for ip6ip6 decap, but ip6ip4 could probably be added too if needed. > > Last year, we had an interesting discussion [1] with Tom about this topic, and > especially on how such solution could be implemented in a more generic way. Here > is the summary of the thread. > > Tom said: > "This is just IP in IP encapsulation that happens to be terminated at > an egress node of the IOAM domain. The fact that it's IOAM isn't > germaine, this IP in IP is done in a variety of ways. We should be > using the normal protocol handler for NEXTHDR_IPV6 instead of special > case code." > > He also said: > "The current implementation might not be what you're looking for since > ip6ip6 wants a tunnel configured. What we really want is more like > anonymous decapsulation, that is just decap the ip6ip6 packet and > resubmit the packet into the stack (this is what you patch is doing). > The idea has been kicked around before, especially in the use case > where we're tunneling across a domain and there could be hundreds of > such tunnels to some device. I think it's generally okay to do this, > although someone might raise security concerns since it sort of > obfuscates the "real packet". Probably makes sense to have a sysctl to > enable this and probably could default to on. Of course, if we do this > the next question is should we also implement anonymous decapsulation > for 44,64,46 tunnels." > > Based on the above, here is a generic solution to introduce anonymous tunnels > for IPv6. We know that the tunnel6 module is, when loaded, already responsible > for handling IPPROTO_IPV6 from an IPv6 context (= ip6ip6). Therefore, when > tunnel6 is loaded, it handles ip6ip6 with its tunnel6_rcv handler. Inside the > handler, we add a check for anonymous tunnel decapsulation and, if enabled, > perform the decap. When tunnel6 is unloaded, it gives the responsability back to > tunnel6_anonymous and its own handler. Note that the introduced sysctl to > enable anonymous decapsulation is equal to 0 (= disabled) by default. Indeed, > as opposed to what Tom suggested, I think it should be disabled by default in > order to make sure that users won't have it enabled without knowing it (for > security reasons, obviously). > > Thoughts? I'm not sure to understand why the current code isn't enough. The fallback tunnels created by legacy IP tunnels drivers are able to receive and decapsulate any encapsulated packets. I made a quick try with an ip6 tunnel and it works perfectly: host1 -- router1 -- router2 -- host2 A ping is done from host1 to host2. router1 is configured with a standard ip6 tunnel and screenshots are done on router2: $ modprobe ip6_tunnel $ ip l s ip6tnl0 up $ tcpdump -ni ip6tnl0 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on ip6tnl0, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes 17:22:22.749246 IP6 fd00:100::1 > fd00:200::1: ICMP6, echo request, seq 0, length 64 And a tcpdump on the link interface: $ tcpdump -ni ntfp2 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on ntfp2, link-type EN10MB (Ethernet), capture size 262144 bytes 17:23:41.587252 IP6 fd00:125::1 > fd00:125::2: IP6 fd00:100::1 > fd00:200::1: ICMP6, echo request, seq 0, length 64 17:23:41.589291 IP6 fd00:200::1 > fd00:100::1: ICMP6, echo reply, seq 0, length 64 $ ip -d a l dev ip6tnl0 6: ip6tnl0@NONE: <NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN group default qlen 1000 link/tunnel6 :: brd :: promiscuity 0 minmtu 68 maxmtu 65407 ip6tnl ip6ip6 remote any local any hoplimit inherit encaplimit 0 tclass 0x00 flowlabel 0x00000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 inet6 fe80::b47d:abff:feac:ec09/64 scope link valid_lft forever preferred_lft forever Am I missing something? Regards, Nicolas > > Some feedback would be really appreciated, specifically on these points: > - Should the anonymous decapsulation happen before (as it is right now) or > after tunnel6 handlers? "Before" looks like the most logical solution as, > even if you configure a tunnel and enable anonymous decap, the latter will > take precedence. > - Any comments on the choice of the sysctl name ("tunnel66_decap_enabled")? > - Any comments on the patch in general? > > [1] https://lore.kernel.org/netdev/CALx6S374PQ7GGA_ey6wCwc55hUzOx+2kWT=96TzyF0=g=8T=WA@mail.gmail.com
Nicolas, >> Based on the above, here is a generic solution to introduce anonymous tunnels >> for IPv6. We know that the tunnel6 module is, when loaded, already responsible >> for handling IPPROTO_IPV6 from an IPv6 context (= ip6ip6). Therefore, when >> tunnel6 is loaded, it handles ip6ip6 with its tunnel6_rcv handler. Inside the >> handler, we add a check for anonymous tunnel decapsulation and, if enabled, >> perform the decap. When tunnel6 is unloaded, it gives the responsability back to >> tunnel6_anonymous and its own handler. Note that the introduced sysctl to >> enable anonymous decapsulation is equal to 0 (= disabled) by default. Indeed, >> as opposed to what Tom suggested, I think it should be disabled by default in >> order to make sure that users won't have it enabled without knowing it (for >> security reasons, obviously). >> >> Thoughts? > I'm not sure to understand why the current code isn't enough. The fallback > tunnels created by legacy IP tunnels drivers are able to receive and decapsulate > any encapsulated packets. Because, right now, you need to use the ip6_tunnel module and explicitly configure a tunnel, as you described below. The goal of this patch is to provide a way to apply an ip6ip6 decapsulation *without* having to configure a tunnel. > I made a quick try with an ip6 tunnel and it works perfectly: > > host1 -- router1 -- router2 -- host2 > > A ping is done from host1 to host2. router1 is configured with a standard ip6 > tunnel and screenshots are done on router2: > > $ modprobe ip6_tunnel > $ ip l s ip6tnl0 up > $ tcpdump -ni ip6tnl0 > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode > listening on ip6tnl0, link-type LINUX_SLL (Linux cooked), capture size 262144 > bytes > 17:22:22.749246 IP6 fd00:100::1 > fd00:200::1: ICMP6, echo request, seq 0, > length 64 > > And a tcpdump on the link interface: > $ tcpdump -ni ntfp2 > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode > listening on ntfp2, link-type EN10MB (Ethernet), capture size 262144 bytes > 17:23:41.587252 IP6 fd00:125::1 > fd00:125::2: IP6 fd00:100::1 > fd00:200::1: > ICMP6, echo request, seq 0, length 64 > 17:23:41.589291 IP6 fd00:200::1 > fd00:100::1: ICMP6, echo reply, seq 0, length > 64 > > $ ip -d a l dev ip6tnl0 > 6: ip6tnl0@NONE: <NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN group > default qlen 1000 > link/tunnel6 :: brd :: promiscuity 0 minmtu 68 maxmtu 65407 > ip6tnl ip6ip6 remote any local any hoplimit inherit encaplimit 0 tclass 0x00 > flowlabel 0x00000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs > 65535 > inet6 fe80::b47d:abff:feac:ec09/64 scope link > valid_lft forever preferred_lft forever > > Am I missing something?
Le 26/08/2021 à 18:23, Justin Iurman a écrit : [snip] >>> Thoughts? >> I'm not sure to understand why the current code isn't enough. The fallback >> tunnels created by legacy IP tunnels drivers are able to receive and decapsulate >> any encapsulated packets. > > Because, right now, you need to use the ip6_tunnel module and explicitly configure a tunnel, as you described below. The goal of this patch is to provide a way to apply an ip6ip6 decapsulation *without* having to configure a tunnel. What is the difference between setting a sysctl somewhere and putting an interface up?
> [snip] > >>>> Thoughts? >>> I'm not sure to understand why the current code isn't enough. The fallback >>> tunnels created by legacy IP tunnels drivers are able to receive and decapsulate >>> any encapsulated packets. >> >> Because, right now, you need to use the ip6_tunnel module and explicitly >> configure a tunnel, as you described below. The goal of this patch is to >> provide a way to apply an ip6ip6 decapsulation *without* having to configure a >> tunnel. > > What is the difference between setting a sysctl somewhere and putting an > interface up? Well, correct me if I'm wrong but, it's more than just putting an interface up. You'd first need ip6_tunnel (and so tunnel6) module loaded, but you'd also need to configure a tunnel on the decap node. Indeed, the current ip6_tunnel fallback handler only works if a tunnel matches the packet (i.e., ipxip6_rcv will return -1 since ip6_tnl_lookup will return NULL, leading to *no* decapsulation from this handler). So, again, think about the case where you have lots of ingresses and egresses that should be linked (= a tunnel for each pair) altogether in a domain. You'd need to configure N tunnels on the decap node, where N is the number of ingresses. Well, actually no, you could just configure one tunnel with "remote any", but you'd still depend on the ip6_tunnel module and play with tunnel configuration and its interface. This patch provides a way to avoid that by just enabling the ip6ip6 decapsulation through a per interface sysctl.
Le 27/08/2021 à 09:51, Justin Iurman a écrit : >> [snip] >> >>>>> Thoughts? >>>> I'm not sure to understand why the current code isn't enough. The fallback >>>> tunnels created by legacy IP tunnels drivers are able to receive and decapsulate >>>> any encapsulated packets. >>> >>> Because, right now, you need to use the ip6_tunnel module and explicitly >>> configure a tunnel, as you described below. The goal of this patch is to >>> provide a way to apply an ip6ip6 decapsulation *without* having to configure a >>> tunnel. >> >> What is the difference between setting a sysctl somewhere and putting an >> interface up? > > Well, correct me if I'm wrong but, it's more than just putting an interface up. You'd first need ip6_tunnel (and so tunnel6) module loaded, but you'd also need to configure a tunnel on the decap node. No, you just need to have the module. The fallback device is automatically created. And if the module is built-in, there is nothing to do. Indeed, the current ip6_tunnel fallback handler only works if a tunnel matches the packet (i.e., ipxip6_rcv will return -1 since ip6_tnl_lookup will return NULL, leading to *no* decapsulation from this handler). No. ip6_tnl_lookup() won't return NULL if the fallback device exists and is up. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv6/ip6_tunnel.c#n168 The tunnels lookup algorithm has several steps: - try to match local and remote addr - try to match only local addr - try to match only dst addr - return the lwt tunnel if it exists - return the fallback device if it exists and is up https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv6/ip6_tunnel.c#n100 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv6/sit.c#n96 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/ip_tunnel.c#n72 > > So, again, think about the case where you have lots of ingresses and egresses that should be linked (= a tunnel for each pair) altogether in a domain. You'd need to configure N tunnels on the decap node, where N is the number of ingresses. Well, actually no, you could just configure one tunnel with "remote any", but you'd still depend on the ip6_tunnel module and play with tunnel configuration and its interface. This patch provides a way to avoid that by just enabling the ip6ip6 decapsulation through a per interface sysctl. > I don't understand the problem of depending to the ip6_tunnel module. Duplicating a subset of the existing code to avoid a dependency to an existing module seems a bad idea for me, from a maintenance point of view. Regards, Nicolas
>>> [snip] >>> >>>>>> Thoughts? >>>>> I'm not sure to understand why the current code isn't enough. The fallback >>>>> tunnels created by legacy IP tunnels drivers are able to receive and decapsulate >>>>> any encapsulated packets. >>>> >>>> Because, right now, you need to use the ip6_tunnel module and explicitly >>>> configure a tunnel, as you described below. The goal of this patch is to >>>> provide a way to apply an ip6ip6 decapsulation *without* having to configure a >>>> tunnel. >>> >>> What is the difference between setting a sysctl somewhere and putting an >>> interface up? >> >> Well, correct me if I'm wrong but, it's more than just putting an interface up. >> You'd first need ip6_tunnel (and so tunnel6) module loaded, but you'd also need >> to configure a tunnel on the decap node. > No, you just need to have the module. The fallback device is automatically > created. And if the module is built-in, there is nothing to do. > > Indeed, the current ip6_tunnel fallback handler only works if a tunnel matches > the packet (i.e., ipxip6_rcv will return -1 since ip6_tnl_lookup will return > NULL, leading to *no* decapsulation from this handler). > No. ip6_tnl_lookup() won't return NULL if the fallback device exists and is up. > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv6/ip6_tunnel.c#n168 My bad, I missed the condition at the end and didn't test it. Indeed, you're correct. > The tunnels lookup algorithm has several steps: > - try to match local and remote addr > - try to match only local addr > - try to match only dst addr > - return the lwt tunnel if it exists > - return the fallback device if it exists and is up > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv6/ip6_tunnel.c#n100 > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv6/sit.c#n96 > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/ip_tunnel.c#n72 > >> >> So, again, think about the case where you have lots of ingresses and egresses >> that should be linked (= a tunnel for each pair) altogether in a domain. You'd >> need to configure N tunnels on the decap node, where N is the number of >> ingresses. Well, actually no, you could just configure one tunnel with "remote >> any", but you'd still depend on the ip6_tunnel module and play with tunnel >> configuration and its interface. This patch provides a way to avoid that by >> just enabling the ip6ip6 decapsulation through a per interface sysctl. >> > I don't understand the problem of depending to the ip6_tunnel module. > Duplicating a subset of the existing code to avoid a dependency to an existing > module seems a bad idea for me, from a maintenance point of view. Totally agree, I know this is usually not ideal, especially now that I know ip6_tunnel can already do the job without tunnel configurations. The only "downside" is that you need the ip6_tunnel module but that's fine. Thanks for the feedback and clarification.
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index ef4a69865737..119bce49b254 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -79,6 +79,7 @@ struct ipv6_devconf { __u32 ioam6_id; __u32 ioam6_id_wide; __u8 ioam6_enabled; + __u8 tunnel66_decap_enabled; struct ctl_table_header *sysctl_header; }; diff --git a/include/net/tunnel6_anonymous.h b/include/net/tunnel6_anonymous.h new file mode 100644 index 000000000000..990a0ca63edf --- /dev/null +++ b/include/net/tunnel6_anonymous.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * Anonymous tunnels for IPv6 + * + * Author: + * Justin Iurman <justin.iurman@uliege.be> + */ + +#ifndef _NET_TUNNEL6_ANONYMOUS_H +#define _NET_TUNNEL6_ANONYMOUS_H + +#include <linux/skbuff.h> + +int tunnel6_anonymous_init(void); +void tunnel6_anonymous_exit(void); + +int tunnel6_anonymous_register(void); +int tunnel6_anonymous_unregister(void); + +bool anonymous66_enabled(struct sk_buff *skb); +int anonymous66_decap(struct sk_buff *skb); + +#endif /* _NET_TUNNEL6_ANONYMOUS_H */ diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h index b243a53fa985..8b17a26ab661 100644 --- a/include/uapi/linux/ipv6.h +++ b/include/uapi/linux/ipv6.h @@ -193,6 +193,7 @@ enum { DEVCONF_IOAM6_ENABLED, DEVCONF_IOAM6_ID, DEVCONF_IOAM6_ID_WIDE, + DEVCONF_TUNNEL66_DECAP_ENABLED, DEVCONF_MAX }; diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile index 1bc7e143217b..efeaeced17db 100644 --- a/net/ipv6/Makefile +++ b/net/ipv6/Makefile @@ -10,7 +10,8 @@ ipv6-objs := af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o \ route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \ raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \ exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \ - udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o + udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o \ + tunnel6_anonymous.o ipv6-offload := ip6_offload.o tcpv6_offload.o exthdrs_offload.o diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 8381288a0d6e..22e14f84b12e 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -241,6 +241,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = { .ioam6_enabled = 0, .ioam6_id = IOAM6_DEFAULT_IF_ID, .ioam6_id_wide = IOAM6_DEFAULT_IF_ID_WIDE, + .tunnel66_decap_enabled = 0, }; static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { @@ -300,6 +301,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { .ioam6_enabled = 0, .ioam6_id = IOAM6_DEFAULT_IF_ID, .ioam6_id_wide = IOAM6_DEFAULT_IF_ID_WIDE, + .tunnel66_decap_enabled = 0, }; /* Check if link is ready: is it up and is a valid qdisc available */ @@ -5532,6 +5534,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf, array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled; array[DEVCONF_IOAM6_ID] = cnf->ioam6_id; array[DEVCONF_IOAM6_ID_WIDE] = cnf->ioam6_id_wide; + array[DEVCONF_TUNNEL66_DECAP_ENABLED] = cnf->tunnel66_decap_enabled; } static inline size_t inet6_ifla6_size(void) @@ -6965,6 +6968,15 @@ static const struct ctl_table addrconf_sysctl[] = { .mode = 0644, .proc_handler = proc_douintvec, }, + { + .procname = "tunnel66_decap_enabled", + .data = &ipv6_devconf.tunnel66_decap_enabled, + .maxlen = sizeof(u8), + .mode = 0644, + .proc_handler = proc_dou8vec_minmax, + .extra1 = (void *)SYSCTL_ZERO, + .extra2 = (void *)SYSCTL_ONE, + }, { /* sentinel */ } diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index d92c90d97763..abb2e504b15e 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -63,6 +63,7 @@ #include <net/compat.h> #include <net/xfrm.h> #include <net/ioam6.h> +#include <net/tunnel6_anonymous.h> #include <linux/uaccess.h> #include <linux/mroute6.h> @@ -1199,6 +1200,10 @@ static int __init inet6_init(void) if (err) goto ioam6_fail; + err = tunnel6_anonymous_init(); + if (err) + goto tunnel6_anonymous_fail; + err = igmp6_late_init(); if (err) goto igmp6_late_err; @@ -1221,6 +1226,8 @@ static int __init inet6_init(void) igmp6_late_cleanup(); #endif igmp6_late_err: + tunnel6_anonymous_exit(); +tunnel6_anonymous_fail: ioam6_exit(); ioam6_fail: rpl_exit(); diff --git a/net/ipv6/tunnel6.c b/net/ipv6/tunnel6.c index 00e8d8b1c9a7..b1a1cfd1e7f1 100644 --- a/net/ipv6/tunnel6.c +++ b/net/ipv6/tunnel6.c @@ -17,6 +17,7 @@ #include <linux/slab.h> #include <net/ipv6.h> #include <net/protocol.h> +#include <net/tunnel6_anonymous.h> #include <net/xfrm.h> static struct xfrm6_tunnel __rcu *tunnel6_handlers __read_mostly; @@ -144,6 +145,12 @@ static int tunnel6_rcv(struct sk_buff *skb) if (!pskb_may_pull(skb, sizeof(struct ipv6hdr))) goto drop; + /* Anonymous tunnel decapsulation + * has a higher priority (if enabled) + */ + if (anonymous66_enabled(skb)) + return anonymous66_decap(skb); + for_each_tunnel_rcu(tunnel6_handlers, handler) if (!handler->handler(skb)) return 0; @@ -257,8 +264,11 @@ static const struct inet6_protocol tunnelmpls6_protocol = { static int __init tunnel6_init(void) { if (inet6_add_protocol(&tunnel6_protocol, IPPROTO_IPV6)) { - pr_err("%s: can't add protocol\n", __func__); - return -EAGAIN; + if (tunnel6_anonymous_unregister() || + inet6_add_protocol(&tunnel6_protocol, IPPROTO_IPV6)) { + pr_err("%s: can't add protocol\n", __func__); + return -EAGAIN; + } } if (inet6_add_protocol(&tunnel46_protocol, IPPROTO_IPIP)) { pr_err("%s: can't add protocol\n", __func__); @@ -295,6 +305,8 @@ static void __exit tunnel6_fini(void) pr_err("%s: can't remove protocol\n", __func__); if (inet6_del_protocol(&tunnel6_protocol, IPPROTO_IPV6)) pr_err("%s: can't remove protocol\n", __func__); + else + tunnel6_anonymous_register(); if (xfrm6_tunnel_mpls_supported() && inet6_del_protocol(&tunnelmpls6_protocol, IPPROTO_MPLS)) pr_err("%s: can't remove protocol\n", __func__); diff --git a/net/ipv6/tunnel6_anonymous.c b/net/ipv6/tunnel6_anonymous.c new file mode 100644 index 000000000000..c28cfb090ef0 --- /dev/null +++ b/net/ipv6/tunnel6_anonymous.c @@ -0,0 +1,83 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Anonymous tunnels for IPv6 + * + * Handle the decapsulation process of anonymous tunnels (i.e., not + * explicitly configured). This behavior is needed for architectures + * where a lot of ingresses and egresses must be linked altogether, + * leading to a solution to avoid configuring all possible tunnels. + * + * Author: + * Justin Iurman <justin.iurman@uliege.be> + */ + +#include <linux/export.h> +#include <linux/icmpv6.h> +#include <linux/init.h> +#include <linux/netdevice.h> +#include <net/addrconf.h> +#include <net/protocol.h> +#include <net/tunnel6_anonymous.h> +#include <uapi/linux/in.h> + +/* called with rcu_read_lock() */ +int anonymous66_rcv(struct sk_buff *skb) +{ + if (!pskb_may_pull(skb, sizeof(struct ipv6hdr))) + goto drop; + + if (anonymous66_enabled(skb)) + return anonymous66_decap(skb); + + icmpv6_send(skb, ICMPV6_PARAMPROB, ICMPV6_UNK_NEXTHDR, 0); +drop: + kfree_skb(skb); + return 0; +} + +static const struct inet6_protocol anonymous66_protocol = { + .handler = anonymous66_rcv, + .flags = INET6_PROTO_NOPOLICY|INET6_PROTO_FINAL, +}; + +bool anonymous66_enabled(struct sk_buff *skb) +{ + return __in6_dev_get(skb->dev)->cnf.tunnel66_decap_enabled; +} +EXPORT_SYMBOL(anonymous66_enabled); + +int anonymous66_decap(struct sk_buff *skb) +{ + skb_reset_network_header(skb); + skb_reset_transport_header(skb); + skb->encapsulation = 0; + + __skb_tunnel_rx(skb, skb->dev, dev_net(skb->dev)); + netif_rx(skb); + + return 0; +} +EXPORT_SYMBOL(anonymous66_decap); + +int tunnel6_anonymous_register(void) +{ + return inet6_add_protocol(&anonymous66_protocol, IPPROTO_IPV6); +} +EXPORT_SYMBOL(tunnel6_anonymous_register); + +int tunnel6_anonymous_unregister(void) +{ + return inet6_del_protocol(&anonymous66_protocol, IPPROTO_IPV6); +} +EXPORT_SYMBOL(tunnel6_anonymous_unregister); + +int __init tunnel6_anonymous_init(void) +{ + tunnel6_anonymous_register(); + return 0; +} + +void tunnel6_anonymous_exit(void) +{ + tunnel6_anonymous_unregister(); +}
Nowadays, there are more and more private domains where a lot of ingresses and egresses must be linked altogether. Configuring each possible tunnel explicitly could quickly become a nightmare in such use case. Therefore, introducing support for ip6ip6 decapsulation without an explicit tunnel configuration looks like the best solution (e.g., for IOAM). For now, this patch only adds support for ip6ip6 decap, but ip6ip4 could probably be added too if needed. Last year, we had an interesting discussion [1] with Tom about this topic, and especially on how such solution could be implemented in a more generic way. Here is the summary of the thread. Tom said: "This is just IP in IP encapsulation that happens to be terminated at an egress node of the IOAM domain. The fact that it's IOAM isn't germaine, this IP in IP is done in a variety of ways. We should be using the normal protocol handler for NEXTHDR_IPV6 instead of special case code." He also said: "The current implementation might not be what you're looking for since ip6ip6 wants a tunnel configured. What we really want is more like anonymous decapsulation, that is just decap the ip6ip6 packet and resubmit the packet into the stack (this is what you patch is doing). The idea has been kicked around before, especially in the use case where we're tunneling across a domain and there could be hundreds of such tunnels to some device. I think it's generally okay to do this, although someone might raise security concerns since it sort of obfuscates the "real packet". Probably makes sense to have a sysctl to enable this and probably could default to on. Of course, if we do this the next question is should we also implement anonymous decapsulation for 44,64,46 tunnels." Based on the above, here is a generic solution to introduce anonymous tunnels for IPv6. We know that the tunnel6 module is, when loaded, already responsible for handling IPPROTO_IPV6 from an IPv6 context (= ip6ip6). Therefore, when tunnel6 is loaded, it handles ip6ip6 with its tunnel6_rcv handler. Inside the handler, we add a check for anonymous tunnel decapsulation and, if enabled, perform the decap. When tunnel6 is unloaded, it gives the responsability back to tunnel6_anonymous and its own handler. Note that the introduced sysctl to enable anonymous decapsulation is equal to 0 (= disabled) by default. Indeed, as opposed to what Tom suggested, I think it should be disabled by default in order to make sure that users won't have it enabled without knowing it (for security reasons, obviously). Thoughts? Some feedback would be really appreciated, specifically on these points: - Should the anonymous decapsulation happen before (as it is right now) or after tunnel6 handlers? "Before" looks like the most logical solution as, even if you configure a tunnel and enable anonymous decap, the latter will take precedence. - Any comments on the choice of the sysctl name ("tunnel66_decap_enabled")? - Any comments on the patch in general? [1] https://lore.kernel.org/netdev/CALx6S374PQ7GGA_ey6wCwc55hUzOx+2kWT=96TzyF0=g=8T=WA@mail.gmail.com Signed-off-by: Justin Iurman <justin.iurman@uliege.be> --- include/linux/ipv6.h | 1 + include/net/tunnel6_anonymous.h | 23 +++++++++ include/uapi/linux/ipv6.h | 1 + net/ipv6/Makefile | 3 +- net/ipv6/addrconf.c | 12 +++++ net/ipv6/af_inet6.c | 7 +++ net/ipv6/tunnel6.c | 16 ++++++- net/ipv6/tunnel6_anonymous.c | 83 +++++++++++++++++++++++++++++++++ 8 files changed, 143 insertions(+), 3 deletions(-) create mode 100644 include/net/tunnel6_anonymous.h create mode 100644 net/ipv6/tunnel6_anonymous.c