From patchwork Mon Apr 1 15:32:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Maguire X-Patchwork-Id: 10880157 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3F354139A for ; Mon, 1 Apr 2019 15:33:13 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2892F285BE for ; Mon, 1 Apr 2019 15:33:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1C77C28779; Mon, 1 Apr 2019 15:33:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 38C7F285BE for ; Mon, 1 Apr 2019 15:33:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727190AbfDAPdM (ORCPT ); Mon, 1 Apr 2019 11:33:12 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:59860 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726514AbfDAPdL (ORCPT ); Mon, 1 Apr 2019 11:33:11 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x31FTEKU040108; Mon, 1 Apr 2019 15:32:45 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2018-07-02; bh=wyg8y8hW62LbVSF7WSMB9NfbtvwpobZD/x2375SBSgY=; b=jKk3XFf1WC/rWZ/X8r3WOYlO/20Q4r9BlhRlae0xWFUD31eqEyJ5fAnXljTSZ9H2yS9X vrIqpAfdxQiVWQ7eNviP3/uWFs7SzxSuLfRSB1kNcQEU6E/OSLfM6u0YqjaR8DH9Av7I 75cmi4YJiBk1AKLr7ziDBlwJKSWQR/2N0vkj+DQAV6D+fzKUo937pS5nvqNhMs/k4zyk 9ae0FE7zdBlNdAaBdHuZtlbzc9iS63CyXD/pnX8HOC9k0g4i+S5X4tMzRKrsOgdCisyl 9kPJqNfaxjJZjSTChdVH7kquLEsCIMFfCLy/fLVLrL5XIBzux/AJvNS8SIEw2Nav6k/2 kQ== Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by userp2130.oracle.com with ESMTP id 2rhyvsyu0b-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 01 Apr 2019 15:32:45 +0000 Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id x31FWjMx028771 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 1 Apr 2019 15:32:45 GMT Received: from abhmp0022.oracle.com (abhmp0022.oracle.com [141.146.116.28]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id x31FWio8015621; Mon, 1 Apr 2019 15:32:44 GMT Received: from dhcp-10-175-171-252.vpn.oracle.com (/10.175.171.252) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 01 Apr 2019 08:32:44 -0700 From: Alan Maguire To: willemb@google.com, ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net, shuah@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, quentin.monnet@netronome.com, john.fastabend@gmail.com, rdna@fb.com, linux-kselftest@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org Cc: Alan Maguire Subject: [PATCH bpf-next 1/4] selftests_bpf: extend test_tc_tunnel for UDP encap Date: Mon, 1 Apr 2019 16:32:08 +0100 Message-Id: <1554132731-3095-2-git-send-email-alan.maguire@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1554132731-3095-1-git-send-email-alan.maguire@oracle.com> References: <1554132731-3095-1-git-send-email-alan.maguire@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9214 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=4 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904010104 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In commit 868d523535c2 ("bpf: add bpf_skb_adjust_room encap flags") ...Willem introduced support to bpf_skb_adjust_room for GSO-friendly GRE and UDP encapsulation and later introduced associated test_tc_tunnel tests. Here those tests are extended to cover UDP encapsulation also. Signed-off-by: Alan Maguire --- tools/testing/selftests/bpf/progs/test_tc_tunnel.c | 149 ++++++++++++++------- tools/testing/selftests/bpf/test_tc_tunnel.sh | 72 +++++++--- 2 files changed, 157 insertions(+), 64 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/test_tc_tunnel.c b/tools/testing/selftests/bpf/progs/test_tc_tunnel.c index f541c2d..cc88379 100644 --- a/tools/testing/selftests/bpf/progs/test_tc_tunnel.c +++ b/tools/testing/selftests/bpf/progs/test_tc_tunnel.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -20,16 +21,27 @@ static const int cfg_port = 8000; -struct grev4hdr { - struct iphdr ip; +static const int cfg_udp_src = 20000; +static const int cfg_udp_dst = 5555; + +struct gre_hdr { __be16 flags; __be16 protocol; } __attribute__((packed)); -struct grev6hdr { +union l4hdr { + struct udphdr udp; + struct gre_hdr gre; +}; + +struct v4hdr { + struct iphdr ip; + union l4hdr l4hdr; +} __attribute__((packed)); + +struct v6hdr { struct ipv6hdr ip; - __be16 flags; - __be16 protocol; + union l4hdr l4hdr; } __attribute__((packed)); static __always_inline void set_ipv4_csum(struct iphdr *iph) @@ -47,10 +59,11 @@ static __always_inline void set_ipv4_csum(struct iphdr *iph) iph->check = ~((csum & 0xffff) + (csum >> 16)); } -static __always_inline int encap_ipv4(struct __sk_buff *skb, bool with_gre) +static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 encap_proto) { - struct grev4hdr h_outer; struct iphdr iph_inner; + struct v4hdr h_outer; + struct udphdr *udph; struct tcphdr tcph; __u64 flags; int olen; @@ -70,12 +83,29 @@ static __always_inline int encap_ipv4(struct __sk_buff *skb, bool with_gre) if (tcph.dest != __bpf_constant_htons(cfg_port)) return TC_ACT_OK; - flags = BPF_F_ADJ_ROOM_FIXED_GSO | BPF_F_ADJ_ROOM_ENCAP_L3_IPV4; - if (with_gre) { - flags |= BPF_F_ADJ_ROOM_ENCAP_L4_GRE; - olen = sizeof(h_outer); - } else { - olen = sizeof(h_outer.ip); + olen = sizeof(h_outer.ip); + + flags = BPF_F_ADJ_ROOM_ENCAP_L3_IPV4; + switch (encap_proto) { + case IPPROTO_GRE: + flags |= BPF_F_ADJ_ROOM_ENCAP_L4_GRE | BPF_F_ADJ_ROOM_FIXED_GSO; + olen += sizeof(h_outer.l4hdr.gre); + h_outer.l4hdr.gre.protocol = bpf_htons(ETH_P_IP); + h_outer.l4hdr.gre.flags = 0; + break; + case IPPROTO_UDP: + flags |= BPF_F_ADJ_ROOM_ENCAP_L4_UDP; + olen += sizeof(h_outer.l4hdr.udp); + h_outer.l4hdr.udp.source = __bpf_constant_htons(cfg_udp_src); + h_outer.l4hdr.udp.dest = __bpf_constant_htons(cfg_udp_dst); + h_outer.l4hdr.udp.check = 0; + h_outer.l4hdr.udp.len = bpf_htons(bpf_ntohs(iph_inner.tot_len) + + sizeof(h_outer.l4hdr.udp)); + break; + case IPPROTO_IPIP: + break; + default: + return TC_ACT_OK; } /* add room between mac and network header */ @@ -85,16 +115,10 @@ static __always_inline int encap_ipv4(struct __sk_buff *skb, bool with_gre) /* prepare new outer network header */ h_outer.ip = iph_inner; h_outer.ip.tot_len = bpf_htons(olen + - bpf_htons(h_outer.ip.tot_len)); - if (with_gre) { - h_outer.ip.protocol = IPPROTO_GRE; - h_outer.protocol = bpf_htons(ETH_P_IP); - h_outer.flags = 0; - } else { - h_outer.ip.protocol = IPPROTO_IPIP; - } + bpf_htons(h_outer.ip.tot_len)); + h_outer.ip.protocol = encap_proto; - set_ipv4_csum((void *)&h_outer.ip); + set_ipv4_csum(&h_outer.ip); /* store new outer network header */ if (bpf_skb_store_bytes(skb, ETH_HLEN, &h_outer, olen, @@ -104,11 +128,12 @@ static __always_inline int encap_ipv4(struct __sk_buff *skb, bool with_gre) return TC_ACT_OK; } -static __always_inline int encap_ipv6(struct __sk_buff *skb, bool with_gre) +static __always_inline int encap_ipv6(struct __sk_buff *skb, __u8 encap_proto) { struct ipv6hdr iph_inner; - struct grev6hdr h_outer; + struct v6hdr h_outer; struct tcphdr tcph; + __u16 tot_len; __u64 flags; int olen; @@ -124,14 +149,31 @@ static __always_inline int encap_ipv6(struct __sk_buff *skb, bool with_gre) if (tcph.dest != __bpf_constant_htons(cfg_port)) return TC_ACT_OK; - flags = BPF_F_ADJ_ROOM_FIXED_GSO | BPF_F_ADJ_ROOM_ENCAP_L3_IPV6; - if (with_gre) { - flags |= BPF_F_ADJ_ROOM_ENCAP_L4_GRE; - olen = sizeof(h_outer); - } else { - olen = sizeof(h_outer.ip); - } + olen = sizeof(h_outer.ip); + flags = BPF_F_ADJ_ROOM_ENCAP_L3_IPV6; + switch (encap_proto) { + case IPPROTO_GRE: + flags |= BPF_F_ADJ_ROOM_ENCAP_L4_GRE | BPF_F_ADJ_ROOM_FIXED_GSO; + olen += sizeof(h_outer.l4hdr.gre); + h_outer.l4hdr.gre.protocol = bpf_htons(ETH_P_IPV6); + h_outer.l4hdr.gre.flags = 0; + break; + case IPPROTO_UDP: + flags |= BPF_F_ADJ_ROOM_ENCAP_L4_UDP; + olen += sizeof(h_outer.l4hdr.udp); + h_outer.l4hdr.udp.source = __bpf_constant_htons(cfg_udp_src); + h_outer.l4hdr.udp.dest = __bpf_constant_htons(cfg_udp_dst); + h_outer.l4hdr.udp.check = 0; + tot_len = bpf_ntohs(iph_inner.payload_len) + sizeof(iph_inner); + h_outer.l4hdr.udp.len = bpf_htons(tot_len + + sizeof(h_outer.l4hdr.udp)); + break; + case IPPROTO_IPV6: + break; + default: + return TC_ACT_OK; + } /* add room between mac and network header */ if (bpf_skb_adjust_room(skb, olen, BPF_ADJ_ROOM_MAC, flags)) @@ -141,13 +183,8 @@ static __always_inline int encap_ipv6(struct __sk_buff *skb, bool with_gre) h_outer.ip = iph_inner; h_outer.ip.payload_len = bpf_htons(olen + bpf_ntohs(h_outer.ip.payload_len)); - if (with_gre) { - h_outer.ip.nexthdr = IPPROTO_GRE; - h_outer.protocol = bpf_htons(ETH_P_IPV6); - h_outer.flags = 0; - } else { - h_outer.ip.nexthdr = IPPROTO_IPV6; - } + + h_outer.ip.nexthdr = encap_proto; /* store new outer network header */ if (bpf_skb_store_bytes(skb, ETH_HLEN, &h_outer, olen, @@ -161,7 +198,7 @@ static __always_inline int encap_ipv6(struct __sk_buff *skb, bool with_gre) int __encap_ipip(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) - return encap_ipv4(skb, false); + return encap_ipv4(skb, IPPROTO_IPIP); else return TC_ACT_OK; } @@ -170,7 +207,16 @@ int __encap_ipip(struct __sk_buff *skb) int __encap_gre(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) - return encap_ipv4(skb, true); + return encap_ipv4(skb, IPPROTO_GRE); + else + return TC_ACT_OK; +} + +SEC("encap_udp") +int __encap_udp(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) + return encap_ipv4(skb, IPPROTO_UDP); else return TC_ACT_OK; } @@ -179,7 +225,7 @@ int __encap_gre(struct __sk_buff *skb) int __encap_ip6tnl(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) - return encap_ipv6(skb, false); + return encap_ipv6(skb, IPPROTO_IPV6); else return TC_ACT_OK; } @@ -188,23 +234,34 @@ int __encap_ip6tnl(struct __sk_buff *skb) int __encap_ip6gre(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) - return encap_ipv6(skb, true); + return encap_ipv6(skb, IPPROTO_GRE); + else + return TC_ACT_OK; +} + +SEC("encap_ip6udp") +int __encap_ip6udp(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) + return encap_ipv6(skb, IPPROTO_UDP); else return TC_ACT_OK; } static int decap_internal(struct __sk_buff *skb, int off, int len, char proto) { - char buf[sizeof(struct grev6hdr)]; - int olen; + char buf[sizeof(struct v6hdr)]; + int olen = len; switch (proto) { case IPPROTO_IPIP: case IPPROTO_IPV6: - olen = len; break; case IPPROTO_GRE: - olen = len + 4 /* gre hdr */; + olen += sizeof(struct gre_hdr); + break; + case IPPROTO_UDP: + olen += sizeof(struct udphdr); break; default: return TC_ACT_OK; diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh index c805adb..3ae54f0 100755 --- a/tools/testing/selftests/bpf/test_tc_tunnel.sh +++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh @@ -15,6 +15,9 @@ readonly ns2_v4=192.168.1.2 readonly ns1_v6=fd::1 readonly ns2_v6=fd::2 +# Must match port used by bpf program +readonly udpport=5555 + readonly infile="$(mktemp)" readonly outfile="$(mktemp)" @@ -103,6 +106,18 @@ if [[ "$#" -eq "0" ]]; then echo "ip6 gre gso" $0 ipv6 ip6gre 2000 + echo "ip udp" + $0 ipv4 udp 100 + + echo "ip6 udp" + $0 ipv6 ip6udp 100 + + echo "ip udp gso" + $0 ipv4 udp 2000 + + echo "ip6 udp gso" + $0 ipv6 ip6udp 2000 + echo "OK. All tests passed" exit 0 fi @@ -117,12 +132,14 @@ case "$1" in "ipv4") readonly addr1="${ns1_v4}" readonly addr2="${ns2_v4}" - readonly netcat_opt=-4 + readonly ipproto=4 + readonly netcat_opt=-${ipproto} ;; "ipv6") readonly addr1="${ns1_v6}" readonly addr2="${ns2_v6}" - readonly netcat_opt=-6 + readonly ipproto=6 + readonly netcat_opt=-${ipproto} ;; *) echo "unknown arg: $1" @@ -158,27 +175,46 @@ server_listen # serverside, insert decap module # server is still running # client can connect again -ip netns exec "${ns2}" ip link add dev testtun0 type "${tuntype}" \ - remote "${addr1}" local "${addr2}" -# Because packets are decapped by the tunnel they arrive on testtun0 from -# the IP stack perspective. Ensure reverse path filtering is disabled -# otherwise we drop the TCP SYN as arriving on testtun0 instead of the -# expected veth2 (veth2 is where 192.168.1.2 is configured). -ip netns exec "${ns2}" sysctl -qw net.ipv4.conf.all.rp_filter=0 -# rp needs to be disabled for both all and testtun0 as the rp value is -# selected as the max of the "all" and device-specific values. -ip netns exec "${ns2}" sysctl -qw net.ipv4.conf.testtun0.rp_filter=0 -ip netns exec "${ns2}" ip link set dev testtun0 up -echo "test bpf encap with tunnel device decap" -client_connect -verify_data + +# Skip tunnel tests for ip6udp. For IPv6, a UDP checksum is required +# and there seems to be no way to tell a fou6 tunnel to allow 0 +# checksums. Accordingly for both these cases, we skip tests against +# tunnel peer, and test encap using BPF decap only. +if [[ "$tuntype" != "ip6udp" ]]; then + if [[ "$tuntype" == "udp" ]]; then + # Set up fou tunnel. + ttype=ipip + targs="encap fou encap-sport auto encap-dport $udpport" + # fou may be a module; allow this to fail. + modprobe fou ||true + ip netns exec "${ns2}" ip fou add port 5555 ipproto "${ipproto}" + else + ttype=$tuntype + targs="" + fi + ip netns exec "${ns2}" ip link add name testtun0 type "${ttype}" \ + remote "${addr1}" local "${addr2}" $targs + # Because packets are decapped by the tunnel they arrive on testtun0 + # from the IP stack perspective. Ensure reverse path filtering is + # disabled otherwise we drop the TCP SYN as arriving on testtun0 + # instead of the expected veth2 (veth2 is where 192.168.1.2 is + # configured). + ip netns exec "${ns2}" sysctl -qw net.ipv4.conf.all.rp_filter=0 + # rp needs to be disabled for both all and testtun0 as the rp value is + # selected as the max of the "all" and device-specific values. + ip netns exec "${ns2}" sysctl -qw net.ipv4.conf.testtun0.rp_filter=0 + ip netns exec "${ns2}" ip link set dev testtun0 up + echo "test bpf encap with tunnel device decap" + client_connect + verify_data + ip netns exec "${ns2}" ip link del dev testtun0 + server_listen +fi # serverside, use BPF for decap -ip netns exec "${ns2}" ip link del dev testtun0 ip netns exec "${ns2}" tc qdisc add dev veth2 clsact ip netns exec "${ns2}" tc filter add dev veth2 ingress \ bpf direct-action object-file ./test_tc_tunnel.o section decap -server_listen echo "test bpf encap with bpf decap" client_connect verify_data From patchwork Mon Apr 1 15:32:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Maguire X-Patchwork-Id: 10880161 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 52E7414DE for ; Mon, 1 Apr 2019 15:33:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3CFDC285BE for ; Mon, 1 Apr 2019 15:33:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 30AB428779; Mon, 1 Apr 2019 15:33:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B049F285BE for ; Mon, 1 Apr 2019 15:33:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728255AbfDAPda (ORCPT ); Mon, 1 Apr 2019 11:33:30 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:60200 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726514AbfDAPda (ORCPT ); Mon, 1 Apr 2019 11:33:30 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x31FTK2I040180; Mon, 1 Apr 2019 15:33:05 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2018-07-02; bh=VoMXM1QWgyjyznRLlltmKjzTr7D78C6JCyvieJydhNE=; b=ReW0ntEH4VEhrA6C31oTasjuB5E7o+QAdF0Uf18khMo1mXAe0sRB0RVFwNRLuQElSJrH O0PG3zPaaBa7fXDN4vRfLkyUhH9T4K53FqZbEJZF/PY0P8J0z5c9T4ZMUwXPE5OfmbTv rmXVzlwQFEpowzskJnPEZJXVQrWs0DtYBLgUNCbAZ4KfXKLP0+C1lHxi5zjF5m3iJ8H+ BW4hQjXwjtdIklhAxik2F0w9Ut1pqz74lzuPMNJ1+AVXJDpVJKIfAbxYzASNmSONL0oU h4/vNhj1JxA88Rzn7L0T/JpM8bqcZQ+DuS0vwj2ZtqZN1lcDWrr//tlj8Bd5orNd4rnn xA== Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by userp2130.oracle.com with ESMTP id 2rhyvsyu32-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 01 Apr 2019 15:33:05 +0000 Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id x31FWttk028990 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 1 Apr 2019 15:32:55 GMT Received: from abhmp0022.oracle.com (abhmp0022.oracle.com [141.146.116.28]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id x31FWsfk015704; Mon, 1 Apr 2019 15:32:54 GMT Received: from dhcp-10-175-171-252.vpn.oracle.com (/10.175.171.252) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 01 Apr 2019 08:32:53 -0700 From: Alan Maguire To: willemb@google.com, ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net, shuah@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, quentin.monnet@netronome.com, john.fastabend@gmail.com, rdna@fb.com, linux-kselftest@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org Cc: Alan Maguire Subject: [PATCH bpf-next 2/4] bpf: add layer 2 encap support to bpf_skb_adjust_room Date: Mon, 1 Apr 2019 16:32:09 +0100 Message-Id: <1554132731-3095-3-git-send-email-alan.maguire@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1554132731-3095-1-git-send-email-alan.maguire@oracle.com> References: <1554132731-3095-1-git-send-email-alan.maguire@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9214 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=3 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904010104 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In commit 868d523535c2 ("bpf: add bpf_skb_adjust_room encap flags") ...Willem introduced support to bpf_skb_adjust_room for GSO-friendly GRE and UDP encapsulation. For GSO to work for skbs, the inner headers (mac and network) need to be marked. For L3 encapsulation using bpf_skb_adjust_room, the mac and network headers are identical. Here we provide a way of specifying the inner mac header length for cases where L2 encap is desired. Such an approach can support encapsulated ethernet headers, MPLS headers etc. For example to convert from a packet of form [eth][ip][tcp] to [eth][ip][udp][inner mac][ip][tcp], something like the following could be done: headroom = sizeof(iph) + sizeof(struct udphdr) + inner_maclen; ret = bpf_skb_adjust_room(skb, headroom, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_ENCAP_L4_UDP | BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 | BPF_F_ADJ_ROOM_ENCAP_L2(inner_maclen)); Signed-off-by: Alan Maguire --- include/uapi/linux/bpf.h | 5 +++++ net/core/filter.c | 19 ++++++++++++++----- 2 files changed, 19 insertions(+), 5 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 8370245..6d8346a 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1500,6 +1500,10 @@ struct bpf_stack_build_id { * * **BPF_F_ADJ_ROOM_ENCAP_L4_UDP **: * Use with ENCAP_L3 flags to further specify the tunnel type. * + * * **BPF_F_ADJ_ROOM_ENCAP_L2(len) **: + * Use with ENCAP_L3/L4 flags to further specify the tunnel + * type; **len** is the length of the inner MAC header. + * * A call to this helper is susceptible to change the underlaying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be @@ -2645,6 +2649,7 @@ enum bpf_func_id { #define BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 (1ULL << 2) #define BPF_F_ADJ_ROOM_ENCAP_L4_GRE (1ULL << 3) #define BPF_F_ADJ_ROOM_ENCAP_L4_UDP (1ULL << 4) +#define BPF_F_ADJ_ROOM_ENCAP_L2(len) (((__u64)len & 0xff) << 56) /* Mode for BPF_FUNC_skb_adjust_room helper. */ enum bpf_adj_room_mode { diff --git a/net/core/filter.c b/net/core/filter.c index 22eb2ed..02ae8c0 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2969,14 +2969,16 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb) #define BPF_F_ADJ_ROOM_MASK (BPF_F_ADJ_ROOM_FIXED_GSO | \ BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \ BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \ - BPF_F_ADJ_ROOM_ENCAP_L4_UDP) + BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \ + BPF_F_ADJ_ROOM_ENCAP_L2(0xff)) static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, u64 flags) { + u16 mac_len = 0, inner_mac = 0, inner_net = 0, inner_trans = 0; bool encap = flags & BPF_F_ADJ_ROOM_ENCAP_L3_MASK; - u16 mac_len = 0, inner_net = 0, inner_trans = 0; unsigned int gso_type = SKB_GSO_DODGY; + u8 inner_mac_len = flags >> 56; int ret; if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) { @@ -3003,11 +3005,19 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP) return -EINVAL; + if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP && + flags & BPF_F_ADJ_ROOM_FIXED_GSO && + inner_mac_len > 0) + return -EINVAL; + if (skb->encapsulation) return -EALREADY; mac_len = skb->network_header - skb->mac_header; inner_net = skb->network_header; + if (inner_mac_len > len_diff) + return -EINVAL; + inner_mac = inner_net - inner_mac_len; inner_trans = skb->transport_header; } @@ -3016,8 +3026,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, return ret; if (encap) { - /* inner mac == inner_net on l3 encap */ - skb->inner_mac_header = inner_net; + skb->inner_mac_header = inner_mac; skb->inner_network_header = inner_net; skb->inner_transport_header = inner_trans; skb_set_inner_protocol(skb, skb->protocol); @@ -3031,7 +3040,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, gso_type |= SKB_GSO_GRE; else if (flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6) gso_type |= SKB_GSO_IPXIP6; - else + else if (flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4) gso_type |= SKB_GSO_IPXIP4; if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_GRE || From patchwork Mon Apr 1 15:32:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Maguire X-Patchwork-Id: 10880159 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7C003139A for ; Mon, 1 Apr 2019 15:33:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 66CF0285BE for ; Mon, 1 Apr 2019 15:33:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5AF1928779; Mon, 1 Apr 2019 15:33:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 09DBB285BE for ; Mon, 1 Apr 2019 15:33:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727042AbfDAPdY (ORCPT ); Mon, 1 Apr 2019 11:33:24 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:60102 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726514AbfDAPdY (ORCPT ); Mon, 1 Apr 2019 11:33:24 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x31FTDWI040074; Mon, 1 Apr 2019 15:33:00 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2018-07-02; bh=LV/VZ3XE0s/Oa+dHeYlxpbqvz0uWIzIQQtoD2TA66BY=; b=Iih+d0T1U0d0+yq2RJ7HB14HAgj6s+ESbybnsCEY/jfcidWIp9O+SOG5z6KzKaxY90Bs VWjXoXE3J2RvMalS/W7SVMcEANlWgTUB09g+zxPbKKuIxk8sydQnyyVv3U142o0A81/5 racelpu2ghKcz8HSMnhtqkGioq7tL4YfaKirnxNLIvpbtQYs8fv0W9P3RP15+3Lxt6bV XH7nHcJ/RuhLA2j3m9GZm3/IbqP4Xp/LoQfe/X38MdRaAU+GclB0AoWi1QI/p4irC04C uG/tXgNq/H/+uFQ7MVmoLJoUUoZRc1OPgTOcs5OLYAJewXv89YPlfQRcUK2HeIGbs35K dg== Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by userp2130.oracle.com with ESMTP id 2rhyvsyu2f-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 01 Apr 2019 15:33:00 +0000 Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id x31FWx3e016374 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 1 Apr 2019 15:32:59 GMT Received: from abhmp0022.oracle.com (abhmp0022.oracle.com [141.146.116.28]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id x31FWxfw006051; Mon, 1 Apr 2019 15:32:59 GMT Received: from dhcp-10-175-171-252.vpn.oracle.com (/10.175.171.252) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 01 Apr 2019 08:32:59 -0700 From: Alan Maguire To: willemb@google.com, ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net, shuah@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, quentin.monnet@netronome.com, john.fastabend@gmail.com, rdna@fb.com, linux-kselftest@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org Cc: Alan Maguire Subject: [PATCH bpf-next 3/4] bpf: sync bpf.h to tools/ for BPF_F_ADJ_ROOM_ENCAP_L2 Date: Mon, 1 Apr 2019 16:32:10 +0100 Message-Id: <1554132731-3095-4-git-send-email-alan.maguire@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1554132731-3095-1-git-send-email-alan.maguire@oracle.com> References: <1554132731-3095-1-git-send-email-alan.maguire@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9214 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=3 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=975 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904010104 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Sync include/uapi/linux/bpf.h with tools/ equivalent to add BPF_F_ADJ_ROOM_ENCAP_L2(len) macro. Signed-off-by: Alan Maguire --- tools/include/uapi/linux/bpf.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 8370245..6d8346a 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1500,6 +1500,10 @@ struct bpf_stack_build_id { * * **BPF_F_ADJ_ROOM_ENCAP_L4_UDP **: * Use with ENCAP_L3 flags to further specify the tunnel type. * + * * **BPF_F_ADJ_ROOM_ENCAP_L2(len) **: + * Use with ENCAP_L3/L4 flags to further specify the tunnel + * type; **len** is the length of the inner MAC header. + * * A call to this helper is susceptible to change the underlaying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be @@ -2645,6 +2649,7 @@ enum bpf_func_id { #define BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 (1ULL << 2) #define BPF_F_ADJ_ROOM_ENCAP_L4_GRE (1ULL << 3) #define BPF_F_ADJ_ROOM_ENCAP_L4_UDP (1ULL << 4) +#define BPF_F_ADJ_ROOM_ENCAP_L2(len) (((__u64)len & 0xff) << 56) /* Mode for BPF_FUNC_skb_adjust_room helper. */ enum bpf_adj_room_mode { From patchwork Mon Apr 1 15:32:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Maguire X-Patchwork-Id: 10880163 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BC6C714DE for ; Mon, 1 Apr 2019 15:33:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A6BF5286EE for ; Mon, 1 Apr 2019 15:33:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9AE0B28785; Mon, 1 Apr 2019 15:33:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 91CC5286EE for ; Mon, 1 Apr 2019 15:33:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728625AbfDAPdd (ORCPT ); Mon, 1 Apr 2019 11:33:33 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:60240 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726514AbfDAPdc (ORCPT ); Mon, 1 Apr 2019 11:33:32 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x31FTSPI040276; Mon, 1 Apr 2019 15:33:08 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2018-07-02; bh=rGtOH+rk1p2SXwaoz1Y4HYVLfkkrIH67TBB29bhyEmM=; b=cujpGiq7l1Q5gPy+w4qKNarss31qFKJs+tSAihadg7jQdLJx1+/DNgJZ0hyXLRrM6tk/ 0wj/R39GCb5nDbBuGKc8Yw46CeBOTg8eCx66KQMKlyyVWgp78+YfgaDWNRDVJgqomETJ 8Yg/W+xvESAt4sTkEcrlHTwnVycVseBeilp29475VCl8Xnfdf86ezWV/iCUYzXfSEWXf vgWMHhUiq2jUpdFuXmclLKjF0GexLfWN9mPojf5s8DVDzkoIKAbGqeOykmZMxWhxhqIt TYrTQ0mZWYNLkolycECHDvcMK62jnQRR1isbU1+I4BS+a/PcOGVLa+PbDT8poGQUD5t5 BQ== Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by userp2130.oracle.com with ESMTP id 2rhyvsyu3c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 01 Apr 2019 15:33:07 +0000 Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserv0021.oracle.com (8.14.4/8.14.4) with ESMTP id x31FX5An001636 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 1 Apr 2019 15:33:05 GMT Received: from abhmp0022.oracle.com (abhmp0022.oracle.com [141.146.116.28]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id x31FX4JQ013954; Mon, 1 Apr 2019 15:33:04 GMT Received: from dhcp-10-175-171-252.vpn.oracle.com (/10.175.171.252) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 01 Apr 2019 08:33:03 -0700 From: Alan Maguire To: willemb@google.com, ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net, shuah@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, quentin.monnet@netronome.com, john.fastabend@gmail.com, rdna@fb.com, linux-kselftest@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org Cc: Alan Maguire Subject: [PATCH bpf-next 4/4] selftests_bpf: extend test_tc_tunnel.sh test for L2 encap Date: Mon, 1 Apr 2019 16:32:11 +0100 Message-Id: <1554132731-3095-5-git-send-email-alan.maguire@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1554132731-3095-1-git-send-email-alan.maguire@oracle.com> References: <1554132731-3095-1-git-send-email-alan.maguire@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9214 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=4 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904010104 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Update test_tc_tunnel to verify adding inner L2 header encapsulation (an MPLS label) works. Signed-off-by: Alan Maguire --- tools/testing/selftests/bpf/progs/test_tc_tunnel.c | 172 +++++++++++++++++---- tools/testing/selftests/bpf/test_tc_tunnel.sh | 59 +++---- 2 files changed, 170 insertions(+), 61 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/test_tc_tunnel.c b/tools/testing/selftests/bpf/progs/test_tc_tunnel.c index cc88379..5127b1b 100644 --- a/tools/testing/selftests/bpf/progs/test_tc_tunnel.c +++ b/tools/testing/selftests/bpf/progs/test_tc_tunnel.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -23,7 +24,13 @@ static const int cfg_udp_src = 20000; static const int cfg_udp_dst = 5555; +/* MPLSoverUDP */ +#define MPLS_OVER_UDP_PORT 6635 +static const int cfg_mplsudp_dst = MPLS_OVER_UDP_PORT; +/* MPLS label 1000 with S bit (last label) set and ttl of 255. */ +static const __u32 mpls_label = __bpf_constant_htonl(1000 << 12 | + MPLS_LS_S_MASK | 0xff); struct gre_hdr { __be16 flags; __be16 protocol; @@ -37,6 +44,7 @@ struct gre_hdr { struct v4hdr { struct iphdr ip; union l4hdr l4hdr; + __u8 pad[16]; /* enough space for eth header after udp hdr */ } __attribute__((packed)); struct v6hdr { @@ -59,14 +67,17 @@ static __always_inline void set_ipv4_csum(struct iphdr *iph) iph->check = ~((csum & 0xffff) + (csum >> 16)); } -static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 encap_proto) +static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 encap_proto, + __u16 l2_proto) { struct iphdr iph_inner; struct v4hdr h_outer; struct udphdr *udph; struct tcphdr tcph; + struct ethhdr eth; + int olen, elen; __u64 flags; - int olen; + __u16 dst; if (bpf_skb_load_bytes(skb, ETH_HLEN, &iph_inner, sizeof(iph_inner)) < 0) @@ -84,23 +95,39 @@ static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 encap_proto) return TC_ACT_OK; olen = sizeof(h_outer.ip); + elen = 0; flags = BPF_F_ADJ_ROOM_ENCAP_L3_IPV4; + + if (l2_proto == ETH_P_MPLS_UC) { + elen = sizeof(mpls_label); + flags |= BPF_F_ADJ_ROOM_ENCAP_L2(elen); + } + switch (encap_proto) { case IPPROTO_GRE: flags |= BPF_F_ADJ_ROOM_ENCAP_L4_GRE | BPF_F_ADJ_ROOM_FIXED_GSO; olen += sizeof(h_outer.l4hdr.gre); - h_outer.l4hdr.gre.protocol = bpf_htons(ETH_P_IP); + h_outer.l4hdr.gre.protocol = bpf_htons(l2_proto); h_outer.l4hdr.gre.flags = 0; break; case IPPROTO_UDP: flags |= BPF_F_ADJ_ROOM_ENCAP_L4_UDP; olen += sizeof(h_outer.l4hdr.udp); - h_outer.l4hdr.udp.source = __bpf_constant_htons(cfg_udp_src); - h_outer.l4hdr.udp.dest = __bpf_constant_htons(cfg_udp_dst); h_outer.l4hdr.udp.check = 0; h_outer.l4hdr.udp.len = bpf_htons(bpf_ntohs(iph_inner.tot_len) + - sizeof(h_outer.l4hdr.udp)); + sizeof(h_outer.l4hdr.udp) + + elen); + h_outer.l4hdr.udp.source = __bpf_constant_htons(cfg_udp_src); + switch (l2_proto) { + case ETH_P_IP: + dst = cfg_udp_dst; + break; + case ETH_P_MPLS_UC: + dst = cfg_mplsudp_dst; + break; + } + h_outer.l4hdr.udp.dest = bpf_htons(dst); break; case IPPROTO_IPIP: break; @@ -108,6 +135,13 @@ static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 encap_proto) return TC_ACT_OK; } + /* add L2 encap (if specified) */ + if (l2_proto == ETH_P_MPLS_UC) + __builtin_memcpy((__u8 *)&h_outer + olen, &mpls_label, + sizeof(mpls_label)); + + olen += elen; + /* add room between mac and network header */ if (bpf_skb_adjust_room(skb, olen, BPF_ADJ_ROOM_MAC, flags)) return TC_ACT_SHOT; @@ -124,18 +158,19 @@ static __always_inline int encap_ipv4(struct __sk_buff *skb, __u8 encap_proto) if (bpf_skb_store_bytes(skb, ETH_HLEN, &h_outer, olen, BPF_F_INVALIDATE_HASH) < 0) return TC_ACT_SHOT; - return TC_ACT_OK; } -static __always_inline int encap_ipv6(struct __sk_buff *skb, __u8 encap_proto) +static __always_inline int encap_ipv6(struct __sk_buff *skb, __u8 encap_proto, + __u16 l2_proto) { struct ipv6hdr iph_inner; struct v6hdr h_outer; struct tcphdr tcph; + int olen, elen; __u16 tot_len; __u64 flags; - int olen; + __u16 dst; if (bpf_skb_load_bytes(skb, ETH_HLEN, &iph_inner, sizeof(iph_inner)) < 0) @@ -150,24 +185,39 @@ static __always_inline int encap_ipv6(struct __sk_buff *skb, __u8 encap_proto) return TC_ACT_OK; olen = sizeof(h_outer.ip); + elen = 0; flags = BPF_F_ADJ_ROOM_ENCAP_L3_IPV6; + + if (l2_proto == ETH_P_MPLS_UC) { + elen = sizeof(mpls_label); + flags |= BPF_F_ADJ_ROOM_ENCAP_L2(elen); + } + switch (encap_proto) { case IPPROTO_GRE: flags |= BPF_F_ADJ_ROOM_ENCAP_L4_GRE | BPF_F_ADJ_ROOM_FIXED_GSO; olen += sizeof(h_outer.l4hdr.gre); - h_outer.l4hdr.gre.protocol = bpf_htons(ETH_P_IPV6); + h_outer.l4hdr.gre.protocol = bpf_htons(l2_proto); h_outer.l4hdr.gre.flags = 0; break; case IPPROTO_UDP: flags |= BPF_F_ADJ_ROOM_ENCAP_L4_UDP; olen += sizeof(h_outer.l4hdr.udp); - h_outer.l4hdr.udp.source = __bpf_constant_htons(cfg_udp_src); - h_outer.l4hdr.udp.dest = __bpf_constant_htons(cfg_udp_dst); h_outer.l4hdr.udp.check = 0; tot_len = bpf_ntohs(iph_inner.payload_len) + sizeof(iph_inner); h_outer.l4hdr.udp.len = bpf_htons(tot_len + - sizeof(h_outer.l4hdr.udp)); + sizeof(h_outer.l4hdr.udp) + elen); + h_outer.l4hdr.udp.source = __bpf_constant_htons(cfg_udp_src); + switch (l2_proto) { + case ETH_P_IPV6: + dst = cfg_udp_dst; + break; + case ETH_P_MPLS_UC: + dst = cfg_mplsudp_dst; + break; + } + h_outer.l4hdr.udp.dest = bpf_htons(dst); break; case IPPROTO_IPV6: break; @@ -175,6 +225,13 @@ static __always_inline int encap_ipv6(struct __sk_buff *skb, __u8 encap_proto) return TC_ACT_OK; } + /* add L2 encap (if specified) */ + if (l2_proto == ETH_P_MPLS_UC) + __builtin_memcpy((__u8 *)&h_outer + olen, &mpls_label, + sizeof(mpls_label)); + + olen += elen; + /* add room between mac and network header */ if (bpf_skb_adjust_room(skb, olen, BPF_ADJ_ROOM_MAC, flags)) return TC_ACT_SHOT; @@ -194,63 +251,104 @@ static __always_inline int encap_ipv6(struct __sk_buff *skb, __u8 encap_proto) return TC_ACT_OK; } -SEC("encap_ipip") +SEC("encap_ipip_none") int __encap_ipip(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) - return encap_ipv4(skb, IPPROTO_IPIP); + return encap_ipv4(skb, IPPROTO_IPIP, ETH_P_IP); + else + return TC_ACT_OK; +} + +SEC("encap_gre_none") +int __encap_gre_none(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) + return encap_ipv4(skb, IPPROTO_GRE, ETH_P_IP); else return TC_ACT_OK; } -SEC("encap_gre") -int __encap_gre(struct __sk_buff *skb) +SEC("encap_gre_mpls") +int __encap_gre_mpls(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) - return encap_ipv4(skb, IPPROTO_GRE); + return encap_ipv4(skb, IPPROTO_GRE, ETH_P_MPLS_UC); else return TC_ACT_OK; } -SEC("encap_udp") + +SEC("encap_udp_none") int __encap_udp(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) - return encap_ipv4(skb, IPPROTO_UDP); + return encap_ipv4(skb, IPPROTO_UDP, ETH_P_IP); + else + return TC_ACT_OK; +} + +SEC("encap_udp_mpls") +int __encap_udp_mpls(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IP)) + return encap_ipv4(skb, IPPROTO_UDP, ETH_P_MPLS_UC); + else + return TC_ACT_OK; +} + + +SEC("encap_ip6tnl_none") +int __encap_ip6tnl_none(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) + return encap_ipv6(skb, IPPROTO_IPV6, ETH_P_IPV6); + else + return TC_ACT_OK; +} + +SEC("encap_ip6gre_none") +int __encap_ip6gre_none(struct __sk_buff *skb) +{ + if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) + return encap_ipv6(skb, IPPROTO_GRE, ETH_P_IPV6); else return TC_ACT_OK; } -SEC("encap_ip6tnl") -int __encap_ip6tnl(struct __sk_buff *skb) +SEC("encap_ip6gre_mpls") +int __encap_ip6gre_mpls(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) - return encap_ipv6(skb, IPPROTO_IPV6); + return encap_ipv6(skb, IPPROTO_GRE, ETH_P_MPLS_UC); else return TC_ACT_OK; } -SEC("encap_ip6gre") -int __encap_ip6gre(struct __sk_buff *skb) +SEC("encap_ip6udp_none") +int __encap_ip6udp_none(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) - return encap_ipv6(skb, IPPROTO_GRE); + return encap_ipv6(skb, IPPROTO_UDP, ETH_P_IPV6); else return TC_ACT_OK; } -SEC("encap_ip6udp") -int __encap_ip6udp(struct __sk_buff *skb) +SEC("encap_ip6udp_mpls") +int __encap_ip6udp_mpls(struct __sk_buff *skb) { if (skb->protocol == __bpf_constant_htons(ETH_P_IPV6)) - return encap_ipv6(skb, IPPROTO_UDP); + return encap_ipv6(skb, IPPROTO_UDP, ETH_P_MPLS_UC); else return TC_ACT_OK; } -static int decap_internal(struct __sk_buff *skb, int off, int len, char proto) +static __always_inline int decap_internal(struct __sk_buff *skb, int off, + int len, char proto) { char buf[sizeof(struct v6hdr)]; + struct gre_hdr greh; + struct udphdr udph; int olen = len; switch (proto) { @@ -259,9 +357,17 @@ static int decap_internal(struct __sk_buff *skb, int off, int len, char proto) break; case IPPROTO_GRE: olen += sizeof(struct gre_hdr); + if (bpf_skb_load_bytes(skb, off + len, &greh, sizeof(greh)) < 0) + return TC_ACT_OK; + if (bpf_ntohs(greh.protocol) == ETH_P_MPLS_UC) + olen += sizeof(mpls_label); break; case IPPROTO_UDP: olen += sizeof(struct udphdr); + if (bpf_skb_load_bytes(skb, off + len, &udph, sizeof(udph)) < 0) + return TC_ACT_OK; + if (bpf_ntohs(udph.dest) == MPLS_OVER_UDP_PORT) + olen += sizeof(mpls_label); break; default: return TC_ACT_OK; @@ -274,7 +380,7 @@ static int decap_internal(struct __sk_buff *skb, int off, int len, char proto) return TC_ACT_OK; } -static int decap_ipv4(struct __sk_buff *skb) +static __always_inline int decap_ipv4(struct __sk_buff *skb) { struct iphdr iph_outer; @@ -289,7 +395,7 @@ static int decap_ipv4(struct __sk_buff *skb) iph_outer.protocol); } -static int decap_ipv6(struct __sk_buff *skb) +static __always_inline int decap_ipv6(struct __sk_buff *skb) { struct ipv6hdr iph_outer; @@ -302,7 +408,7 @@ static int decap_ipv6(struct __sk_buff *skb) } SEC("decap") -int decap_f(struct __sk_buff *skb) +static int decap_f(struct __sk_buff *skb) { switch (skb->protocol) { case __bpf_constant_htons(ETH_P_IP): diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh index 3ae54f0..37c479e 100755 --- a/tools/testing/selftests/bpf/test_tc_tunnel.sh +++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh @@ -89,42 +89,44 @@ set -e # no arguments: automated test, run all if [[ "$#" -eq "0" ]]; then echo "ipip" - $0 ipv4 ipip 100 + $0 ipv4 ipip none 100 echo "ip6ip6" - $0 ipv6 ip6tnl 100 + $0 ipv6 ip6tnl none 100 - echo "ip gre" - $0 ipv4 gre 100 + for mac in none mpls ; do + echo "ip gre $mac" + $0 ipv4 gre $mac 100 - echo "ip6 gre" - $0 ipv6 ip6gre 100 + echo "ip6 gre $mac" + $0 ipv6 ip6gre $mac 100 - echo "ip gre gso" - $0 ipv4 gre 2000 + echo "ip gre $mac gso" + $0 ipv4 gre $mac 2000 - echo "ip6 gre gso" - $0 ipv6 ip6gre 2000 + echo "ip6 gre $mac gso" + $0 ipv6 ip6gre $mac 2000 - echo "ip udp" - $0 ipv4 udp 100 + echo "ip udp $mac" + $0 ipv4 udp $mac 100 - echo "ip6 udp" - $0 ipv6 ip6udp 100 + echo "ip6 udp $mac" + $0 ipv6 ip6udp $mac 100 - echo "ip udp gso" - $0 ipv4 udp 2000 + echo "ip udp $mac gso" + $0 ipv4 udp $mac 2000 - echo "ip6 udp gso" - $0 ipv6 ip6udp 2000 + echo "ip6 udp $mac gso" + $0 ipv6 ip6udp $mac 2000 + done echo "OK. All tests passed" exit 0 fi -if [[ "$#" -ne "3" ]]; then +if [[ "$#" -ne "4" ]]; then echo "Usage: $0" - echo " or: $0 " + echo " or: $0 " exit 1 fi @@ -148,9 +150,10 @@ case "$1" in esac readonly tuntype=$2 -readonly datalen=$3 +readonly mactype=$3 +readonly datalen=$4 -echo "encap ${addr1} to ${addr2}, type ${tuntype}, len ${datalen}" +echo "encap ${addr1} to ${addr2}, tun ${tuntype} mac ${mactype} len ${datalen}" trap cleanup EXIT @@ -167,7 +170,7 @@ verify_data ip netns exec "${ns1}" tc qdisc add dev veth1 clsact ip netns exec "${ns1}" tc filter add dev veth1 egress \ bpf direct-action object-file ./test_tc_tunnel.o \ - section "encap_${tuntype}" + section "encap_${tuntype}_${mactype}" echo "test bpf encap without decap (expect failure)" server_listen ! client_connect @@ -176,11 +179,11 @@ server_listen # server is still running # client can connect again -# Skip tunnel tests for ip6udp. For IPv6, a UDP checksum is required -# and there seems to be no way to tell a fou6 tunnel to allow 0 -# checksums. Accordingly for both these cases, we skip tests against -# tunnel peer, and test encap using BPF decap only. -if [[ "$tuntype" != "ip6udp" ]]; then +# Skip tunnel tests for L2 encap and ip6udp. For IPv6, a UDP checksum +# is required and there seems to be no way to tell a fou6 tunnel to +# allow 0 checksums. Accordingly for both these cases, we skip tests +# against tunnel peer and test using BPF decap only. +if [[ "$mactype" == "none" && "$tuntype" != "ip6udp" ]]; then if [[ "$tuntype" == "udp" ]]; then # Set up fou tunnel. ttype=ipip