diff mbox series

[RFC,14/16] gtp: add support for flow based tunneling

Message ID 20210123195916.2765481-15-jonas@norrbonn.se (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series GTP: flow based | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count fail Series longer than 15 patches
netdev/tree_selection success Guessed tree name to be net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 9 maintainers not CCed: mstarovo@pm.me toke@redhat.com irusskikh@marvell.com atenart@kernel.org horatiu.vultur@microchip.com osmocom-net-gprs@lists.osmocom.org thomas.karlsson@paneda.se nicolas.dichtel@6wind.com davem@davemloft.net
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff fail Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit fail Errors and warnings before: 404 this patch: 405
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch warning CHECK: Alignment should match open parenthesis CHECK: Please don't use multiple blank lines WARNING: line length of 96 exceeds 80 columns
netdev/build_allmodconfig_warn fail Errors and warnings before: 430 this patch: 431
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Jonas Bonn Jan. 23, 2021, 7:59 p.m. UTC
From: Pravin B Shelar <pbshelar@fb.com>

This patch adds support for flow based tunneling, allowing to send and
receive GTP tunneled packets via the (lightweight) tunnel metadata
mechanism.  This would allow integration with OVS and eBPF using flow
based tunneling APIs.

The mechanism used here is to get the required GTP tunnel parameters
from the tunnel metadata instead of looking up a pre-configured PDP
context.  The tunnel metadata contains the necessary information for
creating the GTP header.

Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
---
 drivers/net/gtp.c                  | 160 +++++++++++++++++++++++++----
 include/uapi/linux/gtp.h           |  12 +++
 include/uapi/linux/if_tunnel.h     |   1 +
 tools/include/uapi/linux/if_link.h |   1 +
 4 files changed, 156 insertions(+), 18 deletions(-)

Comments

Jonas Bonn Jan. 24, 2021, 2:27 p.m. UTC | #1
Hi Pravin,

A couple more comments around the GTP_METADATA bits:

On 23/01/2021 20:59, Jonas Bonn wrote:
> From: Pravin B Shelar <pbshelar@fb.com>
> 
> This patch adds support for flow based tunneling, allowing to send and
> receive GTP tunneled packets via the (lightweight) tunnel metadata
> mechanism.  This would allow integration with OVS and eBPF using flow
> based tunneling APIs.
> 
> The mechanism used here is to get the required GTP tunnel parameters
> from the tunnel metadata instead of looking up a pre-configured PDP
> context.  The tunnel metadata contains the necessary information for
> creating the GTP header.
> 
> Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
> ---
>   drivers/net/gtp.c                  | 160 +++++++++++++++++++++++++----
>   include/uapi/linux/gtp.h           |  12 +++
>   include/uapi/linux/if_tunnel.h     |   1 +
>   tools/include/uapi/linux/if_link.h |   1 +
>   4 files changed, 156 insertions(+), 18 deletions(-)
> 

<...>

>   
> +static int gtp_set_tun_dst(struct pdp_ctx *pctx, struct sk_buff *skb,
> +			   unsigned int hdrlen)
> +{
> +	struct metadata_dst *tun_dst;
> +	struct gtp1_header *gtp1;
> +	int opts_len = 0;
> +	__be64 tid;
> +
> +	gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
> +
> +	tid = key32_to_tunnel_id(gtp1->tid);
> +
> +	if (unlikely(gtp1->flags & GTP1_F_MASK))
> +		opts_len = sizeof(struct gtpu_metadata);

So if there are GTP flags sets, you're saying that this no longer a 
T-PDU but something else.  That's wrong... the flags indicate the 
presence of extensions to the GTP header itself.

> +
> +	tun_dst = udp_tun_rx_dst(skb,
> +			pctx->sk->sk_family, TUNNEL_KEY, tid, opts_len);
> +	if (!tun_dst) {
> +		netdev_dbg(pctx->dev, "Failed to allocate tun_dst");
> +		goto err;
> +	}
> +
> +	netdev_dbg(pctx->dev, "attaching metadata_dst to skb, gtp ver %d hdrlen %d\n",
> +		   pctx->gtp_version, hdrlen);
> +	if (unlikely(opts_len)) {
> +		struct gtpu_metadata *opts;
> +
> +		opts = ip_tunnel_info_opts(&tun_dst->u.tun_info);
> +		opts->ver = GTP_METADATA_V1;
> +		opts->flags = gtp1->flags;
> +		opts->type = gtp1->type;
> +		netdev_dbg(pctx->dev, "recved control pkt: flag %x type: %d\n",
> +			   opts->flags, opts->type);
> +		tun_dst->u.tun_info.key.tun_flags |= TUNNEL_GTPU_OPT;
> +		tun_dst->u.tun_info.options_len = opts_len;
> +		skb->protocol = htons(0xffff);         /* Unknown */

It might be relevant to set protocol to unknown for 'end markers' (i.e. 
gtp1->type == 0xfe), but not for everything that happens to have flag 
bits set.  'flags' and 'type' are independent of each other and need to 
handled as such.

/Jonas
Jonas Bonn Jan. 24, 2021, 3:11 p.m. UTC | #2
Hi,

On 23/01/2021 20:59, Jonas Bonn wrote:
> From: Pravin B Shelar <pbshelar@fb.com>
> 
> This patch adds support for flow based tunneling, allowing to send and
> receive GTP tunneled packets via the (lightweight) tunnel metadata
> mechanism.  This would allow integration with OVS and eBPF using flow
> based tunneling APIs.
> 
> The mechanism used here is to get the required GTP tunnel parameters
> from the tunnel metadata instead of looking up a pre-configured PDP
> context.  The tunnel metadata contains the necessary information for
> creating the GTP header.

The GTP driver operates in two modes:  GGSN/PGW/UPF and SGSN/SGW/NG-U. 
For simplicity we'll just refer to these as 'ggsn' and 'sgsn', but these 
are essentially just upstream and downstream nodes.

So, the classic way of adding a tunnel to the driver is:

gtp-tunnel add gtp v1 100 200 192.168.100.1 172.99.0.1

That encapsulates a lot of information about both ends of the tunnel 
including TEID's for both ends of the tunnel, the local IP in use, and 
the remote end.

With this new approach we have ('ggsn' side):

ip route add 192.168.100.1/32 encap ip id 200 dst 172.99.0.1 dev gtp

That has all the information required for sending packets from 'ggsn' to 
'sgsn', but it's missing everything required for the validation of 
incoming packets from the 'sgsn'.  The implementation just ignores the 
TEID on incoming packets and doesn't validate TEID against IP like the 
PDP context variant does.

'sgsn' side we have:

ip route add SOME_IP encap ip id 100 dst 172.99.0.2 dev gtp

'sgsn' side is intended for testing eNodeB-type entities behind which 
there are a large number of simulated UE's.  The PDP context setup 
allows a form of 'source routing' internally in the driver whereby the 
MS/UE address of the PDP context is matched to the source IP of the 
outgoing packet in order to determine the required TEID.  How is 
something similar achievable with the ip route example above; can a 
'src' parameter be added in order to get the right 'id' (TEID) from a 
set of otherwise similar routing rules?


In the one example you posted 
(https://github.com/pshelar/iproute2/commit/d6e99f8342672e6e9ce0b71e153296f8e2b41cfc) 
you have, on the 'ggsn' side:

ip route add 1.1.1.0/24 encap id 0 dst 10.1.0.2 dev gtp1

This amounts to mapping a route to an entire network of 'devices' 
through a single TEID at host 10.1.0.2.  This might work because you 
just ignore the TEID on incoming packets and inject them into the 
receiving host in your 'sgsn' variant of the driver, but with any real 
downstream GTP device I don't think the above is of any practical value.

If I were to consider the above as an 'sgsn' side route setup, then 
you've got an entire network of devices talking to an upstream GTP 
entity (UPF) through a single TEID... security-wise, I'd be surprised if 
anybody actually allowed this.

Thanks for taking the time to consider the above.  I look forward to 
your comments.

/Jonas

> 
> Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
> ---
>   drivers/net/gtp.c                  | 160 +++++++++++++++++++++++++----
>   include/uapi/linux/gtp.h           |  12 +++
>   include/uapi/linux/if_tunnel.h     |   1 +
>   tools/include/uapi/linux/if_link.h |   1 +
>   4 files changed, 156 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
> index 8aab46ec8a94..668ed8a4836e 100644
> --- a/drivers/net/gtp.c
> +++ b/drivers/net/gtp.c
> @@ -21,6 +21,7 @@
>   #include <linux/file.h>
>   #include <linux/gtp.h>
>   
> +#include <net/dst_metadata.h>
>   #include <net/net_namespace.h>
>   #include <net/protocol.h>
>   #include <net/ip.h>
> @@ -74,6 +75,9 @@ struct gtp_dev {
>   	unsigned int		hash_size;
>   	struct hlist_head	*tid_hash;
>   	struct hlist_head	*addr_hash;
> +	/* Used by LWT tunnel. */
> +	bool			collect_md;
> +	struct socket		*collect_md_sock;
>   };
>   
>   static unsigned int gtp_net_id __read_mostly;
> @@ -224,6 +228,51 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
>   	return -1;
>   }
>   
> +static int gtp_set_tun_dst(struct pdp_ctx *pctx, struct sk_buff *skb,
> +			   unsigned int hdrlen)
> +{
> +	struct metadata_dst *tun_dst;
> +	struct gtp1_header *gtp1;
> +	int opts_len = 0;
> +	__be64 tid;
> +
> +	gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
> +
> +	tid = key32_to_tunnel_id(gtp1->tid);
> +
> +	if (unlikely(gtp1->flags & GTP1_F_MASK))
> +		opts_len = sizeof(struct gtpu_metadata);
> +
> +	tun_dst = udp_tun_rx_dst(skb,
> +			pctx->sk->sk_family, TUNNEL_KEY, tid, opts_len);
> +	if (!tun_dst) {
> +		netdev_dbg(pctx->dev, "Failed to allocate tun_dst");
> +		goto err;
> +	}
> +
> +	netdev_dbg(pctx->dev, "attaching metadata_dst to skb, gtp ver %d hdrlen %d\n",
> +		   pctx->gtp_version, hdrlen);
> +	if (unlikely(opts_len)) {
> +		struct gtpu_metadata *opts;
> +
> +		opts = ip_tunnel_info_opts(&tun_dst->u.tun_info);
> +		opts->ver = GTP_METADATA_V1;
> +		opts->flags = gtp1->flags;
> +		opts->type = gtp1->type;
> +		netdev_dbg(pctx->dev, "recved control pkt: flag %x type: %d\n",
> +			   opts->flags, opts->type);
> +		tun_dst->u.tun_info.key.tun_flags |= TUNNEL_GTPU_OPT;
> +		tun_dst->u.tun_info.options_len = opts_len;
> +		skb->protocol = htons(0xffff);         /* Unknown */
> +	}
> +
> +	skb_dst_set(skb, &tun_dst->dst);
> +	return 0;
> +err:
> +	return -1;
> +}
> +
> +
>   /* 1 means pass up to the stack, -1 means drop and 0 means decapsulated. */
>   static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
>   {
> @@ -262,6 +311,7 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
>   	unsigned int hdrlen = sizeof(struct udphdr) +
>   			      sizeof(struct gtp1_header);
>   	struct gtp1_header *gtp1;
> +	struct pdp_ctx md_pctx;
>   	struct pdp_ctx *pctx;
>   
>   	if (!pskb_may_pull(skb, hdrlen))
> @@ -272,6 +322,24 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
>   	if ((gtp1->flags >> 5) != GTP_V1)
>   		return 1;
>   
> +	if (ip_tunnel_collect_metadata() || gtp->collect_md) {
> +		int err;
> +
> +		pctx = &md_pctx;
> +
> +		pctx->gtp_version = GTP_V1;
> +		pctx->sk = gtp->sk1u;
> +		pctx->dev = gtp->dev;
> +
> +		err = gtp_set_tun_dst(pctx, skb, hdrlen);
> +		if (err) {
> +			gtp->dev->stats.rx_dropped++;
> +			return -1;
> +		}
> +
> +		return gtp_rx(pctx, skb, hdrlen);
> +	}
> +
>   	if (gtp1->type != GTP_TPDU)
>   		return 1;
>   
> @@ -353,7 +421,8 @@ static int gtp_encap_recv(struct sock *sk, struct sk_buff *skb)
>   	if (!gtp)
>   		return 1;
>   
> -	netdev_dbg(gtp->dev, "encap_recv sk=%p\n", sk);
> +	netdev_dbg(gtp->dev, "encap_recv sk=%p type %d\n",
> +		   sk, udp_sk(sk)->encap_type);
>   
>   	switch (udp_sk(sk)->encap_type) {
>   	case UDP_ENCAP_GTP0:
> @@ -539,7 +608,7 @@ static struct rtable *gtp_get_v4_rt(struct sk_buff *skb,
>   	memset(&fl4, 0, sizeof(fl4));
>   	fl4.flowi4_oif		= sk->sk_bound_dev_if;
>   	fl4.daddr		= pctx->peer_addr_ip4.s_addr;
> -	fl4.saddr		= inet_sk(sk)->inet_saddr;
> +	fl4.saddr		= *saddr;
>   	fl4.flowi4_tos		= RT_CONN_FLAGS(sk);
>   	fl4.flowi4_proto	= sk->sk_protocol;
>   
> @@ -617,29 +686,84 @@ static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx,
>   static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   {
>   	struct gtp_dev *gtp = netdev_priv(dev);
> +	struct gtpu_metadata *opts = NULL;
> +	struct pdp_ctx md_pctx;
>   	struct pdp_ctx *pctx;
> +	__be16 port;
>   	struct rtable *rt;
> -	__be32 saddr;
>   	struct iphdr *iph;
> +	__be32 saddr;
>   	int headroom;
> -	__be16 port;
> +	__u8 tos;
>   	int r;
>   
> -	/* Read the IP destination address and resolve the PDP context.
> -	 * Prepend PDP header with TEI/TID from PDP ctx.
> -	 */
> -	iph = ip_hdr(skb);
> -	if (gtp->role == GTP_ROLE_SGSN)
> -		pctx = ipv4_pdp_find(gtp, iph->saddr);
> -	else
> -		pctx = ipv4_pdp_find(gtp, iph->daddr);
> +	if (gtp->collect_md) {
> +		/* LWT GTP1U encap */
> +		struct ip_tunnel_info *info = NULL;
>   
> -	if (!pctx) {
> -		netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
> -			   &iph->daddr);
> -		return -ENOENT;
> +		info = skb_tunnel_info(skb);
> +		if (!info) {
> +			netdev_dbg(dev, "missing tunnel info");
> +			return -ENOENT;
> +		}
> +		if (info->key.tp_dst && ntohs(info->key.tp_dst) != GTP1U_PORT) {
> +			netdev_dbg(dev, "unexpected GTP dst port: %d", ntohs(info->key.tp_dst));
> +			return -EOPNOTSUPP;
> +		}
> +
> +		if (!gtp->sk1u) {
> +			netdev_dbg(dev, "missing tunnel sock");
> +			return -EOPNOTSUPP;
> +		}
> +
> +		pctx = &md_pctx;
> +		memset(pctx, 0, sizeof(*pctx));
> +		pctx->sk = gtp->sk1u;
> +		pctx->gtp_version = GTP_V1;
> +		pctx->u.v1.o_tei = ntohl(tunnel_id_to_key32(info->key.tun_id));
> +		pctx->peer_addr_ip4.s_addr = info->key.u.ipv4.dst;
> +
> +		saddr = info->key.u.ipv4.src;
> +		tos = info->key.tos;
> +
> +		if (info->options_len != 0) {
> +			if (info->key.tun_flags & TUNNEL_GTPU_OPT) {
> +				opts = ip_tunnel_info_opts(info);
> +			} else {
> +				netdev_dbg(dev, "missing tunnel metadata for control pkt");
> +				return -EOPNOTSUPP;
> +			}
> +		}
> +		netdev_dbg(dev, "flow-based GTP1U encap: tunnel id %d\n",
> +			   pctx->u.v1.o_tei);
> +	} else {
> +		struct iphdr *iph;
> +
> +		if (ntohs(skb->protocol) != ETH_P_IP)
> +			return -EOPNOTSUPP;
> +
> +		iph = ip_hdr(skb);
> +
> +		/* Read the IP destination address and resolve the PDP context.
> +		 * Prepend PDP header with TEI/TID from PDP ctx.
> +		 */
> +		if (gtp->role == GTP_ROLE_SGSN)
> +			pctx = ipv4_pdp_find(gtp, iph->saddr);
> +		else
> +			pctx = ipv4_pdp_find(gtp, iph->daddr);
> +
> +		if (!pctx) {
> +			netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
> +				   &iph->daddr);
> +			return -ENOENT;
> +		}
> +		netdev_dbg(dev, "found PDP context %p\n", pctx);
> +
> +		saddr = inet_sk(pctx->sk)->inet_saddr;
> +		tos = iph->tos;
> +		netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
> +			   &iph->saddr, &iph->daddr);
>   	}
> -	netdev_dbg(dev, "found PDP context %p\n", pctx);
>   
>   	rt = gtp_get_v4_rt(skb, dev, pctx, &saddr);
>   	if (IS_ERR(rt)) {
> @@ -691,7 +815,7 @@ static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   
>   	udp_tunnel_xmit_skb(rt, pctx->sk, skb,
>   			    saddr, pctx->peer_addr_ip4.s_addr,
> -			    iph->tos,
> +			    tos,
>   			    ip4_dst_hoplimit(&rt->dst),
>   			    0,
>   			    port, port,
> diff --git a/include/uapi/linux/gtp.h b/include/uapi/linux/gtp.h
> index 79f9191bbb24..62aff78b7c56 100644
> --- a/include/uapi/linux/gtp.h
> +++ b/include/uapi/linux/gtp.h
> @@ -2,6 +2,8 @@
>   #ifndef _UAPI_LINUX_GTP_H_
>   #define _UAPI_LINUX_GTP_H_
>   
> +#include <linux/types.h>
> +
>   #define GTP_GENL_MCGRP_NAME	"gtp"
>   
>   enum gtp_genl_cmds {
> @@ -34,4 +36,14 @@ enum gtp_attrs {
>   };
>   #define GTPA_MAX (__GTPA_MAX + 1)
>   
> +enum {
> +	GTP_METADATA_V1
> +};
> +
> +struct gtpu_metadata {
> +	__u8    ver;
> +	__u8    flags;
> +	__u8    type;
> +};
> +
>   #endif /* _UAPI_LINUX_GTP_H_ */
> diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
> index 7d9105533c7b..802da679fab1 100644
> --- a/include/uapi/linux/if_tunnel.h
> +++ b/include/uapi/linux/if_tunnel.h
> @@ -176,6 +176,7 @@ enum {
>   #define TUNNEL_VXLAN_OPT	__cpu_to_be16(0x1000)
>   #define TUNNEL_NOCACHE		__cpu_to_be16(0x2000)
>   #define TUNNEL_ERSPAN_OPT	__cpu_to_be16(0x4000)
> +#define TUNNEL_GTPU_OPT		__cpu_to_be16(0x8000)
>   
>   #define TUNNEL_OPTIONS_PRESENT \
>   		(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
> diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
> index d208b2af697f..28d649bda686 100644
> --- a/tools/include/uapi/linux/if_link.h
> +++ b/tools/include/uapi/linux/if_link.h
> @@ -617,6 +617,7 @@ enum {
>   	IFLA_GTP_FD1,
>   	IFLA_GTP_PDP_HASHSIZE,
>   	IFLA_GTP_ROLE,
> +	IFLA_GTP_COLLECT_METADATA,
>   	__IFLA_GTP_MAX,
>   };
>   #define IFLA_GTP_MAX (__IFLA_GTP_MAX - 1)
>
Jonas Bonn Jan. 25, 2021, 8:12 a.m. UTC | #3
Hi Pravin,

I'm going to submit a new series without the GTP_METADATA bits.  I think 
the whole "collect metadata" approach is fine, but the way GTP header 
information is passed through the tunnel via metadata needs a bit more 
thought.  See below...

On 23/01/2021 20:59, Jonas Bonn wrote:
> From: Pravin B Shelar <pbshelar@fb.com>
>   
> +static int gtp_set_tun_dst(struct pdp_ctx *pctx, struct sk_buff *skb,
> +			   unsigned int hdrlen)
> +{
> +	struct metadata_dst *tun_dst;
> +	struct gtp1_header *gtp1;
> +	int opts_len = 0;
> +	__be64 tid;
> +
> +	gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
> +
> +	tid = key32_to_tunnel_id(gtp1->tid);
> +
> +	if (unlikely(gtp1->flags & GTP1_F_MASK))
> +		opts_len = sizeof(struct gtpu_metadata);

This decides that GTP metadata is required if any of the S, E, and PN 
bits are set in the header.  However:

i) even when any of those bits are set, none of the extra headers are 
actually added to the metadata so it's somewhat pointless to even bother 
reporting that they're set

ii) the more interesting case is that you might want to report reception 
of an end marker through the tunnel; that however, is signalled by way 
of the GTP header type and not via the flags; but, see below...


> +
> +	tun_dst = udp_tun_rx_dst(skb,
> +			pctx->sk->sk_family, TUNNEL_KEY, tid, opts_len);
> +	if (!tun_dst) {
> +		netdev_dbg(pctx->dev, "Failed to allocate tun_dst");
> +		goto err;
> +	}

The problem, as I see it, is that end marker messages don't actually 
contain an inner packet, so you won't be able to set up a destination 
for them.  The above fails and you never hit the metadata path below.

> +
> +	netdev_dbg(pctx->dev, "attaching metadata_dst to skb, gtp ver %d hdrlen %d\n",
> +		   pctx->gtp_version, hdrlen);
> +	if (unlikely(opts_len)) {
> +		struct gtpu_metadata *opts;
> +
> +		opts = ip_tunnel_info_opts(&tun_dst->u.tun_info);
> +		opts->ver = GTP_METADATA_V1;
> +		opts->flags = gtp1->flags;
> +		opts->type = gtp1->type;
> +		netdev_dbg(pctx->dev, "recved control pkt: flag %x type: %d\n",
> +			   opts->flags, opts->type);
> +		tun_dst->u.tun_info.key.tun_flags |= TUNNEL_GTPU_OPT;
> +		tun_dst->u.tun_info.options_len = opts_len;
> +		skb->protocol = htons(0xffff);         /* Unknown */
> +	}

Assuming that you do hit this code and are able to set the 'type' field 
in the metadata, who is going to be the recipient.  After you pull the 
GTP headers, the SKB is presumably zero-length...

/Jonas
Jonas Bonn Jan. 25, 2021, 8:47 a.m. UTC | #4
Hi Pravin,

On 23/01/2021 20:59, Jonas Bonn wrote:
> From: Pravin B Shelar <pbshelar@fb.com>
> 
> @@ -617,29 +686,84 @@ static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx,
>   static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   {
>   	struct gtp_dev *gtp = netdev_priv(dev);
> +	struct gtpu_metadata *opts = NULL;
> +	struct pdp_ctx md_pctx;
>   	struct pdp_ctx *pctx;
> +	__be16 port;
>   	struct rtable *rt;
> -	__be32 saddr;
>   	struct iphdr *iph;
> +	__be32 saddr;
>   	int headroom;
> -	__be16 port;
> +	__u8 tos;
>   	int r;
>   
> -	/* Read the IP destination address and resolve the PDP context.
> -	 * Prepend PDP header with TEI/TID from PDP ctx.
> -	 */
> -	iph = ip_hdr(skb);
> -	if (gtp->role == GTP_ROLE_SGSN)
> -		pctx = ipv4_pdp_find(gtp, iph->saddr);
> -	else
> -		pctx = ipv4_pdp_find(gtp, iph->daddr);
> +	if (gtp->collect_md) {

Why do we have this restriction that the device be exclusively "collect 
metadata" mode or PDP context mode?  Why are we not able to mix the two?

Furthermore, since the collect_md_sock will effectively always be 
listening on INADDR_ANY, that precludes any other PDP context devices 
from co-existing with it.  So setting up a secondary device for PDP 
contexts isn't a feasible workaround.

If mixing isn't possible, then I suppose PDP context management needs to 
be made to fail gracefully in "collect_md" mode... with the current 
patches I think that contexts can be set up but they are just silently 
ignored, which seems like a potential source of confusion.

/Jonas


> +		/* LWT GTP1U encap */
> +		struct ip_tunnel_info *info = NULL;
>   
> -	if (!pctx) {
> -		netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
> -			   &iph->daddr);
> -		return -ENOENT;
> +		info = skb_tunnel_info(skb);
> +		if (!info) {
> +			netdev_dbg(dev, "missing tunnel info");
> +			return -ENOENT;
> +		}
> +		if (info->key.tp_dst && ntohs(info->key.tp_dst) != GTP1U_PORT) {
> +			netdev_dbg(dev, "unexpected GTP dst port: %d", ntohs(info->key.tp_dst));
> +			return -EOPNOTSUPP;
> +		}
> +
> +		if (!gtp->sk1u) {
> +			netdev_dbg(dev, "missing tunnel sock");
> +			return -EOPNOTSUPP;
> +		}
> +
> +		pctx = &md_pctx;
> +		memset(pctx, 0, sizeof(*pctx));
> +		pctx->sk = gtp->sk1u;
> +		pctx->gtp_version = GTP_V1;
> +		pctx->u.v1.o_tei = ntohl(tunnel_id_to_key32(info->key.tun_id));
> +		pctx->peer_addr_ip4.s_addr = info->key.u.ipv4.dst;
> +
> +		saddr = info->key.u.ipv4.src;
> +		tos = info->key.tos;
> +
> +		if (info->options_len != 0) {
> +			if (info->key.tun_flags & TUNNEL_GTPU_OPT) {
> +				opts = ip_tunnel_info_opts(info);
> +			} else {
> +				netdev_dbg(dev, "missing tunnel metadata for control pkt");
> +				return -EOPNOTSUPP;
> +			}
> +		}
> +		netdev_dbg(dev, "flow-based GTP1U encap: tunnel id %d\n",
> +			   pctx->u.v1.o_tei);
> +	} else {
> +		struct iphdr *iph;
> +
> +		if (ntohs(skb->protocol) != ETH_P_IP)
> +			return -EOPNOTSUPP;
> +
> +		iph = ip_hdr(skb);
> +
> +		/* Read the IP destination address and resolve the PDP context.
> +		 * Prepend PDP header with TEI/TID from PDP ctx.
> +		 */
> +		if (gtp->role == GTP_ROLE_SGSN)
> +			pctx = ipv4_pdp_find(gtp, iph->saddr);
> +		else
> +			pctx = ipv4_pdp_find(gtp, iph->daddr);
> +
> +		if (!pctx) {
> +			netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
> +				   &iph->daddr);
> +			return -ENOENT;
> +		}
> +		netdev_dbg(dev, "found PDP context %p\n", pctx);
> +
> +		saddr = inet_sk(pctx->sk)->inet_saddr;
> +		tos = iph->tos;
> +		netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
> +			   &iph->saddr, &iph->daddr);
>   	}
> -	netdev_dbg(dev, "found PDP context %p\n", pctx);
>   
>   	rt = gtp_get_v4_rt(skb, dev, pctx, &saddr);
>   	if (IS_ERR(rt)) {
> @@ -691,7 +815,7 @@ static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   
>   	udp_tunnel_xmit_skb(rt, pctx->sk, skb,
>   			    saddr, pctx->peer_addr_ip4.s_addr,
> -			    iph->tos,
> +			    tos,
>   			    ip4_dst_hoplimit(&rt->dst),
>   			    0,
>   			    port, port,
> diff --git a/include/uapi/linux/gtp.h b/include/uapi/linux/gtp.h
> index 79f9191bbb24..62aff78b7c56 100644
> --- a/include/uapi/linux/gtp.h
> +++ b/include/uapi/linux/gtp.h
> @@ -2,6 +2,8 @@
>   #ifndef _UAPI_LINUX_GTP_H_
>   #define _UAPI_LINUX_GTP_H_
>   
> +#include <linux/types.h>
> +
>   #define GTP_GENL_MCGRP_NAME	"gtp"
>   
>   enum gtp_genl_cmds {
> @@ -34,4 +36,14 @@ enum gtp_attrs {
>   };
>   #define GTPA_MAX (__GTPA_MAX + 1)
>   
> +enum {
> +	GTP_METADATA_V1
> +};
> +
> +struct gtpu_metadata {
> +	__u8    ver;
> +	__u8    flags;
> +	__u8    type;
> +};
> +
>   #endif /* _UAPI_LINUX_GTP_H_ */
> diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
> index 7d9105533c7b..802da679fab1 100644
> --- a/include/uapi/linux/if_tunnel.h
> +++ b/include/uapi/linux/if_tunnel.h
> @@ -176,6 +176,7 @@ enum {
>   #define TUNNEL_VXLAN_OPT	__cpu_to_be16(0x1000)
>   #define TUNNEL_NOCACHE		__cpu_to_be16(0x2000)
>   #define TUNNEL_ERSPAN_OPT	__cpu_to_be16(0x4000)
> +#define TUNNEL_GTPU_OPT		__cpu_to_be16(0x8000)
>   
>   #define TUNNEL_OPTIONS_PRESENT \
>   		(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
> diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
> index d208b2af697f..28d649bda686 100644
> --- a/tools/include/uapi/linux/if_link.h
> +++ b/tools/include/uapi/linux/if_link.h
> @@ -617,6 +617,7 @@ enum {
>   	IFLA_GTP_FD1,
>   	IFLA_GTP_PDP_HASHSIZE,
>   	IFLA_GTP_ROLE,
> +	IFLA_GTP_COLLECT_METADATA,
>   	__IFLA_GTP_MAX,
>   };
>   #define IFLA_GTP_MAX (__IFLA_GTP_MAX - 1)
>
Jonas Bonn Feb. 6, 2021, 6:04 p.m. UTC | #5
Hi Pravin et al;

TL;DR:  we don't need to introduce an entire collect_md mode to the 
driver; we just need to tweak what we've got so that metadata is 
"always" added on RX and respected on TX; make the userspace socket 
optional and dump GTP packets to netstack if it's not present; and make 
a decision on whether to allow packets into netstack without validating 
their IP address.

On 23/01/2021 20:59, Jonas Bonn wrote:
> From: Pravin B Shelar <pbshelar@fb.com>
> 
> This patch adds support for flow based tunneling, allowing to send and
> receive GTP tunneled packets via the (lightweight) tunnel metadata
> mechanism.  This would allow integration with OVS and eBPF using flow
> based tunneling APIs.
> 
> The mechanism used here is to get the required GTP tunnel parameters
> from the tunnel metadata instead of looking up a pre-configured PDP
> context.  The tunnel metadata contains the necessary information for
> creating the GTP header.


So, I've been investigating this a bit further...

What is being introduced in this patch is a variant of "normal 
functionality" that resembles something called "collect metadata" mode 
in other drivers (GRE, VXLAN, etc).  These other drivers operate in one 
of two modes:  more-or-less-point-to-point mode, or this alternative 
collect metadata varaint.  The latter is something that has been bolted 
on to an existing implementation of the former.

For iproute2, a parameter called 'external' is added to the setup of 
links of the above type to switch between the two modes; the 
point-to-point parameters are invalid in 'external' (or 'collect 
metadata') mode.  The name 'external' is because the driver is 
externally controlled, meaning that the tunnel information is not 
hardcoded into the device instance itself.

The GTP driver, however, has never been a point-to-point device.  It is 
already 'externally controlled' in the sense that tunnels can be added 
and removed at any time.  Adding this 'external' mode doesn't 
immediately make sense because that's roughly what we already have.

Looking into how ip_tunnel_collect_metadata() works, it looks to me like 
it's always true if there's _ANY_ routing rule in the system with 
'tunnel ID' set.  Is that correct?  I'll assume it is in order to 
continue my thoughts.

So, with that, either the system is collecting SKB metadata or it isn't. 
  If it is, it seems reasonable that we populate incoming packets with 
the tunnel ID as in this patch.  That said, I'm not convinced that we 
should bypass the PDP context mechanism entirely... there should still 
be a PDP context set up or we drop the packet for security reasons.

For outgoing packets, it seems reasonable that the remote TEID may come 
from metadata OR a PDP context.  That would allow some routing 
alternatives to what we have right now which just does a PDP context 
lookup based on the destination/source address of the packet.  It would 
be nice for OVS/BPF to be able to direct a packet to a remote GTP 
endpoint by providing that endpoint/TEID via a metadata structure.

So we end up with, roughly:

On RX:
i) check TEID in GTP header
ii) lookup PDP context
iii) validate IP of encapsulated packet
iv) if ip(tunnel_collect_metadata()) { /* add tun info */ }
v) decapsulate and pass to network stack

On TX:
i) if SKB has metadata, get destination and TEID from metadata (tunnel ID)
ii) otherwise, lookup PDP context for packet

For RX, only iv) is new; for TX only step i) is new.  The rest is what 
we already have.

The one thing that this complicates is the case where an external entity 
(OVS or BPF) is doing validation of packet IP against incoming TEID.  In 
that case, we've got double validation of the incoming address and, I 
suppose, OVS/BPF would perhaps be more efficient (?)...  In that case, 
holding a PDP context is a bit of a waste of memory.

It's somewhat of a security issue to allow un-checked packets into the 
system, so I'm not super keen on dropping the validation of incoming 
packets; given the previous paragraph, however, we might add a flag when 
creating the link to decide whether or not we allow packets through even 
if we can't validate them.  This would also apply to packets without a 
PDP context for an incoming TEID, too.  This flag, I suppose, looks a 
bit like the 'collect_metadata' flag that Pravin has added here.

The other difference to what we currently have is that this patch sets 
up a socket exclusively in kernel space for the tunnel; currently, all 
sockets terminate in userspace:  userspace receives all packets that 
can't be decapsulated and re-injected in to the network stack.  With 
this patch, ALL packets (without a userspace termination) are 
re-injected into the network stack; it's just that anything that can't 
be decapsulated gets injected as a "GTP packet" with some metadata and 
an UNKNOWN protocol type.  If nothing is looking at the metadata and 
acting on it, then these will just get dropped; and that's what would 
happen if nothing was listening on the userspace end, too.  So we might 
as well just make the FD1 socket parameter to the link setup optional 
and be done with it.

So, thoughts?  What am I missing?

/Jonas

> 
> Signed-off-by: Jonas Bonn <jonas@norrbonn.se>
> ---
>   drivers/net/gtp.c                  | 160 +++++++++++++++++++++++++----
>   include/uapi/linux/gtp.h           |  12 +++
>   include/uapi/linux/if_tunnel.h     |   1 +
>   tools/include/uapi/linux/if_link.h |   1 +
>   4 files changed, 156 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
> index 8aab46ec8a94..668ed8a4836e 100644
> --- a/drivers/net/gtp.c
> +++ b/drivers/net/gtp.c
> @@ -21,6 +21,7 @@
>   #include <linux/file.h>
>   #include <linux/gtp.h>
>   
> +#include <net/dst_metadata.h>
>   #include <net/net_namespace.h>
>   #include <net/protocol.h>
>   #include <net/ip.h>
> @@ -74,6 +75,9 @@ struct gtp_dev {
>   	unsigned int		hash_size;
>   	struct hlist_head	*tid_hash;
>   	struct hlist_head	*addr_hash;
> +	/* Used by LWT tunnel. */
> +	bool			collect_md;
> +	struct socket		*collect_md_sock;
>   };
>   
>   static unsigned int gtp_net_id __read_mostly;
> @@ -224,6 +228,51 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
>   	return -1;
>   }
>   
> +static int gtp_set_tun_dst(struct pdp_ctx *pctx, struct sk_buff *skb,
> +			   unsigned int hdrlen)
> +{
> +	struct metadata_dst *tun_dst;
> +	struct gtp1_header *gtp1;
> +	int opts_len = 0;
> +	__be64 tid;
> +
> +	gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
> +
> +	tid = key32_to_tunnel_id(gtp1->tid);
> +
> +	if (unlikely(gtp1->flags & GTP1_F_MASK))
> +		opts_len = sizeof(struct gtpu_metadata);
> +
> +	tun_dst = udp_tun_rx_dst(skb,
> +			pctx->sk->sk_family, TUNNEL_KEY, tid, opts_len);
> +	if (!tun_dst) {
> +		netdev_dbg(pctx->dev, "Failed to allocate tun_dst");
> +		goto err;
> +	}
> +
> +	netdev_dbg(pctx->dev, "attaching metadata_dst to skb, gtp ver %d hdrlen %d\n",
> +		   pctx->gtp_version, hdrlen);
> +	if (unlikely(opts_len)) {
> +		struct gtpu_metadata *opts;
> +
> +		opts = ip_tunnel_info_opts(&tun_dst->u.tun_info);
> +		opts->ver = GTP_METADATA_V1;
> +		opts->flags = gtp1->flags;
> +		opts->type = gtp1->type;
> +		netdev_dbg(pctx->dev, "recved control pkt: flag %x type: %d\n",
> +			   opts->flags, opts->type);
> +		tun_dst->u.tun_info.key.tun_flags |= TUNNEL_GTPU_OPT;
> +		tun_dst->u.tun_info.options_len = opts_len;
> +		skb->protocol = htons(0xffff);         /* Unknown */
> +	}
> +
> +	skb_dst_set(skb, &tun_dst->dst);
> +	return 0;
> +err:
> +	return -1;
> +}
> +
> +
>   /* 1 means pass up to the stack, -1 means drop and 0 means decapsulated. */
>   static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
>   {
> @@ -262,6 +311,7 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
>   	unsigned int hdrlen = sizeof(struct udphdr) +
>   			      sizeof(struct gtp1_header);
>   	struct gtp1_header *gtp1;
> +	struct pdp_ctx md_pctx;
>   	struct pdp_ctx *pctx;
>   
>   	if (!pskb_may_pull(skb, hdrlen))
> @@ -272,6 +322,24 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
>   	if ((gtp1->flags >> 5) != GTP_V1)
>   		return 1;
>   
> +	if (ip_tunnel_collect_metadata() || gtp->collect_md) {
> +		int err;
> +
> +		pctx = &md_pctx;
> +
> +		pctx->gtp_version = GTP_V1;
> +		pctx->sk = gtp->sk1u;
> +		pctx->dev = gtp->dev;
> +
> +		err = gtp_set_tun_dst(pctx, skb, hdrlen);
> +		if (err) {
> +			gtp->dev->stats.rx_dropped++;
> +			return -1;
> +		}
> +
> +		return gtp_rx(pctx, skb, hdrlen);
> +	}
> +
>   	if (gtp1->type != GTP_TPDU)
>   		return 1;
>   
> @@ -353,7 +421,8 @@ static int gtp_encap_recv(struct sock *sk, struct sk_buff *skb)
>   	if (!gtp)
>   		return 1;
>   
> -	netdev_dbg(gtp->dev, "encap_recv sk=%p\n", sk);
> +	netdev_dbg(gtp->dev, "encap_recv sk=%p type %d\n",
> +		   sk, udp_sk(sk)->encap_type);
>   
>   	switch (udp_sk(sk)->encap_type) {
>   	case UDP_ENCAP_GTP0:
> @@ -539,7 +608,7 @@ static struct rtable *gtp_get_v4_rt(struct sk_buff *skb,
>   	memset(&fl4, 0, sizeof(fl4));
>   	fl4.flowi4_oif		= sk->sk_bound_dev_if;
>   	fl4.daddr		= pctx->peer_addr_ip4.s_addr;
> -	fl4.saddr		= inet_sk(sk)->inet_saddr;
> +	fl4.saddr		= *saddr;
>   	fl4.flowi4_tos		= RT_CONN_FLAGS(sk);
>   	fl4.flowi4_proto	= sk->sk_protocol;
>   
> @@ -617,29 +686,84 @@ static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx,
>   static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   {
>   	struct gtp_dev *gtp = netdev_priv(dev);
> +	struct gtpu_metadata *opts = NULL;
> +	struct pdp_ctx md_pctx;
>   	struct pdp_ctx *pctx;
> +	__be16 port;
>   	struct rtable *rt;
> -	__be32 saddr;
>   	struct iphdr *iph;
> +	__be32 saddr;
>   	int headroom;
> -	__be16 port;
> +	__u8 tos;
>   	int r;
>   
> -	/* Read the IP destination address and resolve the PDP context.
> -	 * Prepend PDP header with TEI/TID from PDP ctx.
> -	 */
> -	iph = ip_hdr(skb);
> -	if (gtp->role == GTP_ROLE_SGSN)
> -		pctx = ipv4_pdp_find(gtp, iph->saddr);
> -	else
> -		pctx = ipv4_pdp_find(gtp, iph->daddr);
> +	if (gtp->collect_md) {
> +		/* LWT GTP1U encap */
> +		struct ip_tunnel_info *info = NULL;
>   
> -	if (!pctx) {
> -		netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
> -			   &iph->daddr);
> -		return -ENOENT;
> +		info = skb_tunnel_info(skb);
> +		if (!info) {
> +			netdev_dbg(dev, "missing tunnel info");
> +			return -ENOENT;
> +		}
> +		if (info->key.tp_dst && ntohs(info->key.tp_dst) != GTP1U_PORT) {
> +			netdev_dbg(dev, "unexpected GTP dst port: %d", ntohs(info->key.tp_dst));
> +			return -EOPNOTSUPP;
> +		}
> +
> +		if (!gtp->sk1u) {
> +			netdev_dbg(dev, "missing tunnel sock");
> +			return -EOPNOTSUPP;
> +		}
> +
> +		pctx = &md_pctx;
> +		memset(pctx, 0, sizeof(*pctx));
> +		pctx->sk = gtp->sk1u;
> +		pctx->gtp_version = GTP_V1;
> +		pctx->u.v1.o_tei = ntohl(tunnel_id_to_key32(info->key.tun_id));
> +		pctx->peer_addr_ip4.s_addr = info->key.u.ipv4.dst;
> +
> +		saddr = info->key.u.ipv4.src;
> +		tos = info->key.tos;
> +
> +		if (info->options_len != 0) {
> +			if (info->key.tun_flags & TUNNEL_GTPU_OPT) {
> +				opts = ip_tunnel_info_opts(info);
> +			} else {
> +				netdev_dbg(dev, "missing tunnel metadata for control pkt");
> +				return -EOPNOTSUPP;
> +			}
> +		}
> +		netdev_dbg(dev, "flow-based GTP1U encap: tunnel id %d\n",
> +			   pctx->u.v1.o_tei);
> +	} else {
> +		struct iphdr *iph;
> +
> +		if (ntohs(skb->protocol) != ETH_P_IP)
> +			return -EOPNOTSUPP;
> +
> +		iph = ip_hdr(skb);
> +
> +		/* Read the IP destination address and resolve the PDP context.
> +		 * Prepend PDP header with TEI/TID from PDP ctx.
> +		 */
> +		if (gtp->role == GTP_ROLE_SGSN)
> +			pctx = ipv4_pdp_find(gtp, iph->saddr);
> +		else
> +			pctx = ipv4_pdp_find(gtp, iph->daddr);
> +
> +		if (!pctx) {
> +			netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
> +				   &iph->daddr);
> +			return -ENOENT;
> +		}
> +		netdev_dbg(dev, "found PDP context %p\n", pctx);
> +
> +		saddr = inet_sk(pctx->sk)->inet_saddr;
> +		tos = iph->tos;
> +		netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
> +			   &iph->saddr, &iph->daddr);
>   	}
> -	netdev_dbg(dev, "found PDP context %p\n", pctx);
>   
>   	rt = gtp_get_v4_rt(skb, dev, pctx, &saddr);
>   	if (IS_ERR(rt)) {
> @@ -691,7 +815,7 @@ static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
>   
>   	udp_tunnel_xmit_skb(rt, pctx->sk, skb,
>   			    saddr, pctx->peer_addr_ip4.s_addr,
> -			    iph->tos,
> +			    tos,
>   			    ip4_dst_hoplimit(&rt->dst),
>   			    0,
>   			    port, port,
> diff --git a/include/uapi/linux/gtp.h b/include/uapi/linux/gtp.h
> index 79f9191bbb24..62aff78b7c56 100644
> --- a/include/uapi/linux/gtp.h
> +++ b/include/uapi/linux/gtp.h
> @@ -2,6 +2,8 @@
>   #ifndef _UAPI_LINUX_GTP_H_
>   #define _UAPI_LINUX_GTP_H_
>   
> +#include <linux/types.h>
> +
>   #define GTP_GENL_MCGRP_NAME	"gtp"
>   
>   enum gtp_genl_cmds {
> @@ -34,4 +36,14 @@ enum gtp_attrs {
>   };
>   #define GTPA_MAX (__GTPA_MAX + 1)
>   
> +enum {
> +	GTP_METADATA_V1
> +};
> +
> +struct gtpu_metadata {
> +	__u8    ver;
> +	__u8    flags;
> +	__u8    type;
> +};
> +
>   #endif /* _UAPI_LINUX_GTP_H_ */
> diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
> index 7d9105533c7b..802da679fab1 100644
> --- a/include/uapi/linux/if_tunnel.h
> +++ b/include/uapi/linux/if_tunnel.h
> @@ -176,6 +176,7 @@ enum {
>   #define TUNNEL_VXLAN_OPT	__cpu_to_be16(0x1000)
>   #define TUNNEL_NOCACHE		__cpu_to_be16(0x2000)
>   #define TUNNEL_ERSPAN_OPT	__cpu_to_be16(0x4000)
> +#define TUNNEL_GTPU_OPT		__cpu_to_be16(0x8000)
>   
>   #define TUNNEL_OPTIONS_PRESENT \
>   		(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
> diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
> index d208b2af697f..28d649bda686 100644
> --- a/tools/include/uapi/linux/if_link.h
> +++ b/tools/include/uapi/linux/if_link.h
> @@ -617,6 +617,7 @@ enum {
>   	IFLA_GTP_FD1,
>   	IFLA_GTP_PDP_HASHSIZE,
>   	IFLA_GTP_ROLE,
> +	IFLA_GTP_COLLECT_METADATA,
>   	__IFLA_GTP_MAX,
>   };
>   #define IFLA_GTP_MAX (__IFLA_GTP_MAX - 1)
>
Pravin Shelar Feb. 7, 2021, 5:56 p.m. UTC | #6
On Sat, Feb 6, 2021 at 10:06 AM Jonas Bonn <jonas@norrbonn.se> wrote:
>
> Hi Pravin et al;
>
> TL;DR:  we don't need to introduce an entire collect_md mode to the
> driver; we just need to tweak what we've got so that metadata is
> "always" added on RX and respected on TX; make the userspace socket
> optional and dump GTP packets to netstack if it's not present; and make
> a decision on whether to allow packets into netstack without validating
> their IP address.
>
Thanks for summarizing the LWT mechanism below. Overall I am fine with
the changes, a couple of comments inlined.


> On 23/01/2021 20:59, Jonas Bonn wrote:
> > From: Pravin B Shelar <pbshelar@fb.com>
> >
> > This patch adds support for flow based tunneling, allowing to send and
> > receive GTP tunneled packets via the (lightweight) tunnel metadata
> > mechanism.  This would allow integration with OVS and eBPF using flow
> > based tunneling APIs.
> >
> > The mechanism used here is to get the required GTP tunnel parameters
> > from the tunnel metadata instead of looking up a pre-configured PDP
> > context.  The tunnel metadata contains the necessary information for
> > creating the GTP header.
>
>
> So, I've been investigating this a bit further...
>
> What is being introduced in this patch is a variant of "normal
> functionality" that resembles something called "collect metadata" mode
> in other drivers (GRE, VXLAN, etc).  These other drivers operate in one
> of two modes:  more-or-less-point-to-point mode, or this alternative
> collect metadata varaint.  The latter is something that has been bolted
> on to an existing implementation of the former.
>
> For iproute2, a parameter called 'external' is added to the setup of
> links of the above type to switch between the two modes; the
> point-to-point parameters are invalid in 'external' (or 'collect
> metadata') mode.  The name 'external' is because the driver is
> externally controlled, meaning that the tunnel information is not
> hardcoded into the device instance itself.
>
> The GTP driver, however, has never been a point-to-point device.  It is
> already 'externally controlled' in the sense that tunnels can be added
> and removed at any time.  Adding this 'external' mode doesn't
> immediately make sense because that's roughly what we already have.
>
> Looking into how ip_tunnel_collect_metadata() works, it looks to me like
> it's always true if there's _ANY_ routing rule in the system with
> 'tunnel ID' set.  Is that correct?  I'll assume it is in order to
> continue my thoughts.
>
Right. It is just optimization to avoid allocating tun-dst in datapath.

> So, with that, either the system is collecting SKB metadata or it isn't.
>   If it is, it seems reasonable that we populate incoming packets with
> the tunnel ID as in this patch.  That said, I'm not convinced that we
> should bypass the PDP context mechanism entirely... there should still
> be a PDP context set up or we drop the packet for security reasons.
>
> For outgoing packets, it seems reasonable that the remote TEID may come
> from metadata OR a PDP context.  That would allow some routing
> alternatives to what we have right now which just does a PDP context
> lookup based on the destination/source address of the packet.  It would
> be nice for OVS/BPF to be able to direct a packet to a remote GTP
> endpoint by providing that endpoint/TEID via a metadata structure.
>
> So we end up with, roughly:
>
> On RX:
> i) check TEID in GTP header
> ii) lookup PDP context
> iii) validate IP of encapsulated packet
> iv) if ip(tunnel_collect_metadata()) { /* add tun info */ }
> v) decapsulate and pass to network stack
>
> On TX:
> i) if SKB has metadata, get destination and TEID from metadata (tunnel ID)
> ii) otherwise, lookup PDP context for packet
>
> For RX, only iv) is new; for TX only step i) is new.  The rest is what
> we already have.
>
> The one thing that this complicates is the case where an external entity
> (OVS or BPF) is doing validation of packet IP against incoming TEID.  In
> that case, we've got double validation of the incoming address and, I
> suppose, OVS/BPF would perhaps be more efficient (?)...  In that case,
> holding a PDP context is a bit of a waste of memory.
>
> It's somewhat of a security issue to allow un-checked packets into the
> system, so I'm not super keen on dropping the validation of incoming
> packets; given the previous paragraph, however, we might add a flag when
> creating the link to decide whether or not we allow packets through even
> if we can't validate them.  This would also apply to packets without a
> PDP context for an incoming TEID, too.  This flag, I suppose, looks a
> bit like the 'collect_metadata' flag that Pravin has added here.
>
Yes. user should have the option to use GTP devices in LTW mode and
bypass PDP session context completely. Lets add a flag for GTP link to
have consistent behaviour for all types of LWT devices.

> The other difference to what we currently have is that this patch sets
> up a socket exclusively in kernel space for the tunnel; currently, all
> sockets terminate in userspace:  userspace receives all packets that
> can't be decapsulated and re-injected in to the network stack.  With
> this patch, ALL packets (without a userspace termination) are
> re-injected into the network stack; it's just that anything that can't
> be decapsulated gets injected as a "GTP packet" with some metadata and
> an UNKNOWN protocol type.  If nothing is looking at the metadata and
> acting on it, then these will just get dropped; and that's what would
> happen if nothing was listening on the userspace end, too.  So we might
> as well just make the FD1 socket parameter to the link setup optional
> and be done with it.
>
Good idea to make FD1 optional argument.

Thanks.
diff mbox series

Patch

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 8aab46ec8a94..668ed8a4836e 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -21,6 +21,7 @@ 
 #include <linux/file.h>
 #include <linux/gtp.h>
 
+#include <net/dst_metadata.h>
 #include <net/net_namespace.h>
 #include <net/protocol.h>
 #include <net/ip.h>
@@ -74,6 +75,9 @@  struct gtp_dev {
 	unsigned int		hash_size;
 	struct hlist_head	*tid_hash;
 	struct hlist_head	*addr_hash;
+	/* Used by LWT tunnel. */
+	bool			collect_md;
+	struct socket		*collect_md_sock;
 };
 
 static unsigned int gtp_net_id __read_mostly;
@@ -224,6 +228,51 @@  static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
 	return -1;
 }
 
+static int gtp_set_tun_dst(struct pdp_ctx *pctx, struct sk_buff *skb,
+			   unsigned int hdrlen)
+{
+	struct metadata_dst *tun_dst;
+	struct gtp1_header *gtp1;
+	int opts_len = 0;
+	__be64 tid;
+
+	gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
+
+	tid = key32_to_tunnel_id(gtp1->tid);
+
+	if (unlikely(gtp1->flags & GTP1_F_MASK))
+		opts_len = sizeof(struct gtpu_metadata);
+
+	tun_dst = udp_tun_rx_dst(skb,
+			pctx->sk->sk_family, TUNNEL_KEY, tid, opts_len);
+	if (!tun_dst) {
+		netdev_dbg(pctx->dev, "Failed to allocate tun_dst");
+		goto err;
+	}
+
+	netdev_dbg(pctx->dev, "attaching metadata_dst to skb, gtp ver %d hdrlen %d\n",
+		   pctx->gtp_version, hdrlen);
+	if (unlikely(opts_len)) {
+		struct gtpu_metadata *opts;
+
+		opts = ip_tunnel_info_opts(&tun_dst->u.tun_info);
+		opts->ver = GTP_METADATA_V1;
+		opts->flags = gtp1->flags;
+		opts->type = gtp1->type;
+		netdev_dbg(pctx->dev, "recved control pkt: flag %x type: %d\n",
+			   opts->flags, opts->type);
+		tun_dst->u.tun_info.key.tun_flags |= TUNNEL_GTPU_OPT;
+		tun_dst->u.tun_info.options_len = opts_len;
+		skb->protocol = htons(0xffff);         /* Unknown */
+	}
+
+	skb_dst_set(skb, &tun_dst->dst);
+	return 0;
+err:
+	return -1;
+}
+
+
 /* 1 means pass up to the stack, -1 means drop and 0 means decapsulated. */
 static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 {
@@ -262,6 +311,7 @@  static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 	unsigned int hdrlen = sizeof(struct udphdr) +
 			      sizeof(struct gtp1_header);
 	struct gtp1_header *gtp1;
+	struct pdp_ctx md_pctx;
 	struct pdp_ctx *pctx;
 
 	if (!pskb_may_pull(skb, hdrlen))
@@ -272,6 +322,24 @@  static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 	if ((gtp1->flags >> 5) != GTP_V1)
 		return 1;
 
+	if (ip_tunnel_collect_metadata() || gtp->collect_md) {
+		int err;
+
+		pctx = &md_pctx;
+
+		pctx->gtp_version = GTP_V1;
+		pctx->sk = gtp->sk1u;
+		pctx->dev = gtp->dev;
+
+		err = gtp_set_tun_dst(pctx, skb, hdrlen);
+		if (err) {
+			gtp->dev->stats.rx_dropped++;
+			return -1;
+		}
+
+		return gtp_rx(pctx, skb, hdrlen);
+	}
+
 	if (gtp1->type != GTP_TPDU)
 		return 1;
 
@@ -353,7 +421,8 @@  static int gtp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (!gtp)
 		return 1;
 
-	netdev_dbg(gtp->dev, "encap_recv sk=%p\n", sk);
+	netdev_dbg(gtp->dev, "encap_recv sk=%p type %d\n",
+		   sk, udp_sk(sk)->encap_type);
 
 	switch (udp_sk(sk)->encap_type) {
 	case UDP_ENCAP_GTP0:
@@ -539,7 +608,7 @@  static struct rtable *gtp_get_v4_rt(struct sk_buff *skb,
 	memset(&fl4, 0, sizeof(fl4));
 	fl4.flowi4_oif		= sk->sk_bound_dev_if;
 	fl4.daddr		= pctx->peer_addr_ip4.s_addr;
-	fl4.saddr		= inet_sk(sk)->inet_saddr;
+	fl4.saddr		= *saddr;
 	fl4.flowi4_tos		= RT_CONN_FLAGS(sk);
 	fl4.flowi4_proto	= sk->sk_protocol;
 
@@ -617,29 +686,84 @@  static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx,
 static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
 {
 	struct gtp_dev *gtp = netdev_priv(dev);
+	struct gtpu_metadata *opts = NULL;
+	struct pdp_ctx md_pctx;
 	struct pdp_ctx *pctx;
+	__be16 port;
 	struct rtable *rt;
-	__be32 saddr;
 	struct iphdr *iph;
+	__be32 saddr;
 	int headroom;
-	__be16 port;
+	__u8 tos;
 	int r;
 
-	/* Read the IP destination address and resolve the PDP context.
-	 * Prepend PDP header with TEI/TID from PDP ctx.
-	 */
-	iph = ip_hdr(skb);
-	if (gtp->role == GTP_ROLE_SGSN)
-		pctx = ipv4_pdp_find(gtp, iph->saddr);
-	else
-		pctx = ipv4_pdp_find(gtp, iph->daddr);
+	if (gtp->collect_md) {
+		/* LWT GTP1U encap */
+		struct ip_tunnel_info *info = NULL;
 
-	if (!pctx) {
-		netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
-			   &iph->daddr);
-		return -ENOENT;
+		info = skb_tunnel_info(skb);
+		if (!info) {
+			netdev_dbg(dev, "missing tunnel info");
+			return -ENOENT;
+		}
+		if (info->key.tp_dst && ntohs(info->key.tp_dst) != GTP1U_PORT) {
+			netdev_dbg(dev, "unexpected GTP dst port: %d", ntohs(info->key.tp_dst));
+			return -EOPNOTSUPP;
+		}
+
+		if (!gtp->sk1u) {
+			netdev_dbg(dev, "missing tunnel sock");
+			return -EOPNOTSUPP;
+		}
+
+		pctx = &md_pctx;
+		memset(pctx, 0, sizeof(*pctx));
+		pctx->sk = gtp->sk1u;
+		pctx->gtp_version = GTP_V1;
+		pctx->u.v1.o_tei = ntohl(tunnel_id_to_key32(info->key.tun_id));
+		pctx->peer_addr_ip4.s_addr = info->key.u.ipv4.dst;
+
+		saddr = info->key.u.ipv4.src;
+		tos = info->key.tos;
+
+		if (info->options_len != 0) {
+			if (info->key.tun_flags & TUNNEL_GTPU_OPT) {
+				opts = ip_tunnel_info_opts(info);
+			} else {
+				netdev_dbg(dev, "missing tunnel metadata for control pkt");
+				return -EOPNOTSUPP;
+			}
+		}
+		netdev_dbg(dev, "flow-based GTP1U encap: tunnel id %d\n",
+			   pctx->u.v1.o_tei);
+	} else {
+		struct iphdr *iph;
+
+		if (ntohs(skb->protocol) != ETH_P_IP)
+			return -EOPNOTSUPP;
+
+		iph = ip_hdr(skb);
+
+		/* Read the IP destination address and resolve the PDP context.
+		 * Prepend PDP header with TEI/TID from PDP ctx.
+		 */
+		if (gtp->role == GTP_ROLE_SGSN)
+			pctx = ipv4_pdp_find(gtp, iph->saddr);
+		else
+			pctx = ipv4_pdp_find(gtp, iph->daddr);
+
+		if (!pctx) {
+			netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
+				   &iph->daddr);
+			return -ENOENT;
+		}
+		netdev_dbg(dev, "found PDP context %p\n", pctx);
+
+		saddr = inet_sk(pctx->sk)->inet_saddr;
+		tos = iph->tos;
+		netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
+			   &iph->saddr, &iph->daddr);
 	}
-	netdev_dbg(dev, "found PDP context %p\n", pctx);
 
 	rt = gtp_get_v4_rt(skb, dev, pctx, &saddr);
 	if (IS_ERR(rt)) {
@@ -691,7 +815,7 @@  static int gtp_xmit_ip4(struct sk_buff *skb, struct net_device *dev)
 
 	udp_tunnel_xmit_skb(rt, pctx->sk, skb,
 			    saddr, pctx->peer_addr_ip4.s_addr,
-			    iph->tos,
+			    tos,
 			    ip4_dst_hoplimit(&rt->dst),
 			    0,
 			    port, port,
diff --git a/include/uapi/linux/gtp.h b/include/uapi/linux/gtp.h
index 79f9191bbb24..62aff78b7c56 100644
--- a/include/uapi/linux/gtp.h
+++ b/include/uapi/linux/gtp.h
@@ -2,6 +2,8 @@ 
 #ifndef _UAPI_LINUX_GTP_H_
 #define _UAPI_LINUX_GTP_H_
 
+#include <linux/types.h>
+
 #define GTP_GENL_MCGRP_NAME	"gtp"
 
 enum gtp_genl_cmds {
@@ -34,4 +36,14 @@  enum gtp_attrs {
 };
 #define GTPA_MAX (__GTPA_MAX + 1)
 
+enum {
+	GTP_METADATA_V1
+};
+
+struct gtpu_metadata {
+	__u8    ver;
+	__u8    flags;
+	__u8    type;
+};
+
 #endif /* _UAPI_LINUX_GTP_H_ */
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 7d9105533c7b..802da679fab1 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -176,6 +176,7 @@  enum {
 #define TUNNEL_VXLAN_OPT	__cpu_to_be16(0x1000)
 #define TUNNEL_NOCACHE		__cpu_to_be16(0x2000)
 #define TUNNEL_ERSPAN_OPT	__cpu_to_be16(0x4000)
+#define TUNNEL_GTPU_OPT		__cpu_to_be16(0x8000)
 
 #define TUNNEL_OPTIONS_PRESENT \
 		(TUNNEL_GENEVE_OPT | TUNNEL_VXLAN_OPT | TUNNEL_ERSPAN_OPT)
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index d208b2af697f..28d649bda686 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -617,6 +617,7 @@  enum {
 	IFLA_GTP_FD1,
 	IFLA_GTP_PDP_HASHSIZE,
 	IFLA_GTP_ROLE,
+	IFLA_GTP_COLLECT_METADATA,
 	__IFLA_GTP_MAX,
 };
 #define IFLA_GTP_MAX (__IFLA_GTP_MAX - 1)