From patchwork Tue Oct 20 10:51:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= X-Patchwork-Id: 11846273 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BA579C43467 for ; Tue, 20 Oct 2020 10:51:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4DEBE223C6 for ; Tue, 20 Oct 2020 10:51:17 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="e/7OM5Sz" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2393347AbgJTKvM (ORCPT ); Tue, 20 Oct 2020 06:51:12 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:53555 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2393329AbgJTKvL (ORCPT ); Tue, 20 Oct 2020 06:51:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1603191068; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RgnubOFEiETeR6F22XciIV+n1vb5oBSvduhmtnz04ts=; b=e/7OM5Sz8TrOwdruhBfulYo2WdDdyy8pM78a7T7oXQrTNSHm4qIuCrK/jmF1CCEn3cfHxT 5LxuV83wMe/SXyaLVtMGaYT7UcC4Z/mI6ArnqOoOMMeZ7kxe+wuMWJ8P1PrSiSPk4OniFW ITO5PifBCV+UIEvlOB5pCDmwcJsWoj0= Received: from mail-ej1-f69.google.com (mail-ej1-f69.google.com [209.85.218.69]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-431-TxK3QTvvM8q3N7ARAt1xhQ-1; Tue, 20 Oct 2020 06:51:06 -0400 X-MC-Unique: TxK3QTvvM8q3N7ARAt1xhQ-1 Received: by mail-ej1-f69.google.com with SMTP id mm21so656422ejb.18 for ; Tue, 20 Oct 2020 03:51:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=RgnubOFEiETeR6F22XciIV+n1vb5oBSvduhmtnz04ts=; b=HR2QMYzE2aoXXCpkVxbnRxE7U5usMlaCfsvr1v4gW648TToAsoDStLasV4fjuyMwnu FTod9eJErAlQwQnuvRzuqsKScbDzr4iTGFOZhbdxn4pEvZHe4j49wKfQYcmlODEfhD2R +B8P/VR1m4eCOjQQUsPDW5NLvWskkrJT7ZMfCcO24AMwmapo75T2i0JaJWzvlnzXXYlp VDFXf5D/zSbGrfOOB+mZQjmaNp/t4AiTvpLpbWVICVXkl9ASYG/xUq0cEiNhxb2J6n2W 70p3HH6dDSgCQ+/D5JR9Xs9Pl868yG1MVKA/X6j18m5Sl6TYUpLfxhxkwJL3CqFXBl3H lrMw== X-Gm-Message-State: AOAM531iRQpCHvHSLxaxAZ8fR6s0gtFP3EwrOP+jsQeax+506rfU97Yg jKlZO2d1+6rK8LPqyiqLS1z3t5+pvQMlzbrxLgmiBNbzvSegEdMuTeMMvxrZrzhVYAtcKBT/8EB ooEdqCCyIfxWU X-Received: by 2002:a50:9e87:: with SMTP id a7mr2110200edf.297.1603191065014; Tue, 20 Oct 2020 03:51:05 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwdcph31He4EHA8QeSBNBTEaU05yr5NVUZviGJbLbCdOzJ7we0OAfvRePum3A6h56svUxMHBw== X-Received: by 2002:a50:9e87:: with SMTP id a7mr2110177edf.297.1603191064584; Tue, 20 Oct 2020 03:51:04 -0700 (PDT) Received: from alrua-x1.borgediget.toke.dk ([45.145.92.2]) by smtp.gmail.com with ESMTPSA id qq10sm2083266ejb.31.2020.10.20.03.51.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Oct 2020 03:51:03 -0700 (PDT) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 4C8461838FB; Tue, 20 Oct 2020 12:51:02 +0200 (CEST) Subject: [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter From: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= To: Daniel Borkmann Cc: David Ahern , netdev@vger.kernel.org, bpf@vger.kernel.org Date: Tue, 20 Oct 2020 12:51:02 +0200 Message-ID: <160319106221.15822.2629789706666194966.stgit@toke.dk> In-Reply-To: <160319106111.15822.18417665895694986295.stgit@toke.dk> References: <160319106111.15822.18417665895694986295.stgit@toke.dk> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Toke Høiland-Jørgensen Based on the discussion in [0], update the bpf_redirect_neigh() helper to accept an optional parameter specifying the nexthop information. This makes it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without incurring a duplicate FIB lookup - since the FIB lookup helper will return the nexthop information even if no neighbour is present, this can simply be passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns BPF_FIB_LKUP_RET_NO_NEIGH. [0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/ Signed-off-by: Toke Høiland-Jørgensen --- include/linux/filter.h | 9 ++ include/uapi/linux/bpf.h | 24 +++++- net/core/filter.c | 163 +++++++++++++++++++++++++--------------- scripts/bpf_helpers_doc.py | 1 tools/include/uapi/linux/bpf.h | 24 +++++- 5 files changed, 153 insertions(+), 68 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index 20fc24c9779a..ba9de7188cd0 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -607,12 +607,21 @@ struct bpf_skb_data_end { void *data_end; }; +struct bpf_nh_params { + u8 nh_family; + union { + __u32 ipv4_nh; + struct in6_addr ipv6_nh; + }; +}; + struct bpf_redirect_info { u32 flags; u32 tgt_index; void *tgt_value; struct bpf_map *map; u32 kern_flags; + struct bpf_nh_params nh; }; DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index bf5a99d803e4..9668cde9d684 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -3677,15 +3677,19 @@ union bpf_attr { * Return * The id is returned or 0 in case the id could not be retrieved. * - * long bpf_redirect_neigh(u32 ifindex, u64 flags) + * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags) * Description * Redirect the packet to another net device of index *ifindex* * and fill in L2 addresses from neighboring subsystem. This helper * is somewhat similar to **bpf_redirect**\ (), except that it * populates L2 addresses as well, meaning, internally, the helper - * performs a FIB lookup based on the skb's networking header to - * get the address of the next hop and then relies on the neighbor - * lookup for the L2 address of the nexthop. + * relies on the neighbor lookup for the L2 address of the nexthop. + * + * The helper will perform a FIB lookup based on the skb's + * networking header to get the address of the next hop, unless + * this is supplied by the caller in the *params* argument. The + * *plen* argument indicates the len of *params* and should be set + * to 0 if *params* is NULL. * * The *flags* argument is reserved and must be 0. The helper is * currently only supported for tc BPF program types, and enabled @@ -4906,6 +4910,18 @@ struct bpf_fib_lookup { __u8 dmac[6]; /* ETH_ALEN */ }; +struct bpf_redir_neigh { + /* network family for lookup (AF_INET, AF_INET6) */ + __u8 nh_family; + /* avoid hole in struct - must be set to 0 */ + __u8 unused[3]; + /* network address of nexthop; skips fib lookup to find gateway */ + union { + __be32 ipv4_nh; + __u32 ipv6_nh[4]; /* in6_addr; network order */ + }; +}; + enum bpf_task_fd_type { BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */ BPF_FD_TYPE_TRACEPOINT, /* tp name */ diff --git a/net/core/filter.c b/net/core/filter.c index c5e2a1c5fd8d..fa09b4f141ae 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2165,12 +2165,12 @@ static int __bpf_redirect(struct sk_buff *skb, struct net_device *dev, } #if IS_ENABLED(CONFIG_IPV6) -static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb) +static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb, + struct net_device *dev, struct bpf_nh_params *nh) { - struct dst_entry *dst = skb_dst(skb); - struct net_device *dev = dst->dev; u32 hh_len = LL_RESERVED_SPACE(dev); const struct in6_addr *nexthop; + struct dst_entry *dst = NULL; struct neighbour *neigh; if (dev_xmit_recursion()) { @@ -2196,8 +2196,13 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb) } rcu_read_lock_bh(); - nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst), - &ipv6_hdr(skb)->daddr); + if (!nh) { + dst = skb_dst(skb); + nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst), + &ipv6_hdr(skb)->daddr); + } else { + nexthop = &nh->ipv6_nh; + } neigh = ip_neigh_gw6(dev, nexthop); if (likely(!IS_ERR(neigh))) { int ret; @@ -2210,36 +2215,43 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb) return ret; } rcu_read_unlock_bh(); - IP6_INC_STATS(dev_net(dst->dev), - ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES); + if (dst) + IP6_INC_STATS(dev_net(dst->dev), + ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES); out_drop: kfree_skb(skb); return -ENETDOWN; } -static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev) +static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev, + struct bpf_nh_params *nh) { const struct ipv6hdr *ip6h = ipv6_hdr(skb); struct net *net = dev_net(dev); int err, ret = NET_XMIT_DROP; - struct dst_entry *dst; - struct flowi6 fl6 = { - .flowi6_flags = FLOWI_FLAG_ANYSRC, - .flowi6_mark = skb->mark, - .flowlabel = ip6_flowinfo(ip6h), - .flowi6_oif = dev->ifindex, - .flowi6_proto = ip6h->nexthdr, - .daddr = ip6h->daddr, - .saddr = ip6h->saddr, - }; - dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL); - if (IS_ERR(dst)) - goto out_drop; + if (!nh) { + struct dst_entry *dst; + struct flowi6 fl6 = { + .flowi6_flags = FLOWI_FLAG_ANYSRC, + .flowi6_mark = skb->mark, + .flowlabel = ip6_flowinfo(ip6h), + .flowi6_oif = dev->ifindex, + .flowi6_proto = ip6h->nexthdr, + .daddr = ip6h->daddr, + .saddr = ip6h->saddr, + }; + + dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL); + if (IS_ERR(dst)) + goto out_drop; - skb_dst_set(skb, dst); + skb_dst_set(skb, dst); + } else if (nh->nh_family != AF_INET6) { + goto out_drop; + } - err = bpf_out_neigh_v6(net, skb); + err = bpf_out_neigh_v6(net, skb, dev, nh); if (unlikely(net_xmit_eval(err))) dev->stats.tx_errors++; else @@ -2252,7 +2264,8 @@ static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev) return ret; } #else -static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev) +static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev, + struct bpf_nh_params *nh) { kfree_skb(skb); return NET_XMIT_DROP; @@ -2260,11 +2273,9 @@ static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev) #endif /* CONFIG_IPV6 */ #if IS_ENABLED(CONFIG_INET) -static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb) +static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb, + struct net_device *dev, struct bpf_nh_params *nh) { - struct dst_entry *dst = skb_dst(skb); - struct rtable *rt = container_of(dst, struct rtable, dst); - struct net_device *dev = dst->dev; u32 hh_len = LL_RESERVED_SPACE(dev); struct neighbour *neigh; bool is_v6gw = false; @@ -2292,7 +2303,21 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb) } rcu_read_lock_bh(); - neigh = ip_neigh_for_gw(rt, skb, &is_v6gw); + if (!nh) { + struct dst_entry *dst = skb_dst(skb); + struct rtable *rt = container_of(dst, struct rtable, dst); + + neigh = ip_neigh_for_gw(rt, skb, &is_v6gw); + } else if (nh->nh_family == AF_INET6) { + neigh = ip_neigh_gw6(dev, &nh->ipv6_nh); + is_v6gw = true; + } else if (nh->nh_family == AF_INET) { + neigh = ip_neigh_gw4(dev, nh->ipv4_nh); + } else { + rcu_read_unlock_bh(); + goto out_drop; + } + if (likely(!IS_ERR(neigh))) { int ret; @@ -2309,33 +2334,37 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb) return -ENETDOWN; } -static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev) +static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev, + struct bpf_nh_params *nh) { const struct iphdr *ip4h = ip_hdr(skb); struct net *net = dev_net(dev); int err, ret = NET_XMIT_DROP; - struct rtable *rt; - struct flowi4 fl4 = { - .flowi4_flags = FLOWI_FLAG_ANYSRC, - .flowi4_mark = skb->mark, - .flowi4_tos = RT_TOS(ip4h->tos), - .flowi4_oif = dev->ifindex, - .flowi4_proto = ip4h->protocol, - .daddr = ip4h->daddr, - .saddr = ip4h->saddr, - }; - rt = ip_route_output_flow(net, &fl4, NULL); - if (IS_ERR(rt)) - goto out_drop; - if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) { - ip_rt_put(rt); - goto out_drop; - } + if (!nh) { + struct flowi4 fl4 = { + .flowi4_flags = FLOWI_FLAG_ANYSRC, + .flowi4_mark = skb->mark, + .flowi4_tos = RT_TOS(ip4h->tos), + .flowi4_oif = dev->ifindex, + .flowi4_proto = ip4h->protocol, + .daddr = ip4h->daddr, + .saddr = ip4h->saddr, + }; + struct rtable *rt; + + rt = ip_route_output_flow(net, &fl4, NULL); + if (IS_ERR(rt)) + goto out_drop; + if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) { + ip_rt_put(rt); + goto out_drop; + } - skb_dst_set(skb, &rt->dst); + skb_dst_set(skb, &rt->dst); + } - err = bpf_out_neigh_v4(net, skb); + err = bpf_out_neigh_v4(net, skb, dev, nh); if (unlikely(net_xmit_eval(err))) dev->stats.tx_errors++; else @@ -2348,14 +2377,16 @@ static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev) return ret; } #else -static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev) +static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev, + struct bpf_nh_params *nh) { kfree_skb(skb); return NET_XMIT_DROP; } #endif /* CONFIG_INET */ -static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev) +static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev, + struct bpf_nh_params *nh) { struct ethhdr *ethh = eth_hdr(skb); @@ -2370,9 +2401,9 @@ static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev) skb_reset_network_header(skb); if (skb->protocol == htons(ETH_P_IP)) - return __bpf_redirect_neigh_v4(skb, dev); + return __bpf_redirect_neigh_v4(skb, dev, nh); else if (skb->protocol == htons(ETH_P_IPV6)) - return __bpf_redirect_neigh_v6(skb, dev); + return __bpf_redirect_neigh_v6(skb, dev, nh); out: kfree_skb(skb); return -ENOTSUPP; @@ -2382,7 +2413,8 @@ static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev) enum { BPF_F_NEIGH = (1ULL << 1), BPF_F_PEER = (1ULL << 2), -#define BPF_F_REDIRECT_INTERNAL (BPF_F_NEIGH | BPF_F_PEER) + BPF_F_NEXTHOP = (1ULL << 3), +#define BPF_F_REDIRECT_INTERNAL (BPF_F_NEIGH | BPF_F_PEER | BPF_F_NEXTHOP) }; BPF_CALL_3(bpf_clone_redirect, struct sk_buff *, skb, u32, ifindex, u64, flags) @@ -2455,8 +2487,8 @@ int skb_do_redirect(struct sk_buff *skb) return -EAGAIN; } return flags & BPF_F_NEIGH ? - __bpf_redirect_neigh(skb, dev) : - __bpf_redirect(skb, dev, flags); + __bpf_redirect_neigh(skb, dev, flags & BPF_F_NEXTHOP ? &ri->nh : NULL) : + __bpf_redirect(skb, dev, flags); out_drop: kfree_skb(skb); return -EINVAL; @@ -2504,16 +2536,25 @@ static const struct bpf_func_proto bpf_redirect_peer_proto = { .arg2_type = ARG_ANYTHING, }; -BPF_CALL_2(bpf_redirect_neigh, u32, ifindex, u64, flags) +BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params, + int, plen, u64, flags) { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); - if (unlikely(flags)) + if (unlikely((plen && plen < sizeof(*params)) || flags)) + return TC_ACT_SHOT; + + if (unlikely(plen && (params->unused[0] || params->unused[1] || + params->unused[2]))) return TC_ACT_SHOT; - ri->flags = BPF_F_NEIGH; + ri->flags = BPF_F_NEIGH | (plen ? BPF_F_NEXTHOP : 0); ri->tgt_index = ifindex; + BUILD_BUG_ON(sizeof(struct bpf_redir_neigh) != sizeof(struct bpf_nh_params)); + if (plen) + memcpy(&ri->nh, params, sizeof(ri->nh)); + return TC_ACT_REDIRECT; } @@ -2522,7 +2563,9 @@ static const struct bpf_func_proto bpf_redirect_neigh_proto = { .gpl_only = false, .ret_type = RET_INTEGER, .arg1_type = ARG_ANYTHING, - .arg2_type = ARG_ANYTHING, + .arg2_type = ARG_PTR_TO_MEM_OR_NULL, + .arg3_type = ARG_CONST_SIZE_OR_ZERO, + .arg4_type = ARG_ANYTHING, }; BPF_CALL_2(bpf_msg_apply_bytes, struct sk_msg *, msg, u32, bytes) diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py index 7d86fdd190be..6769caae142f 100755 --- a/scripts/bpf_helpers_doc.py +++ b/scripts/bpf_helpers_doc.py @@ -453,6 +453,7 @@ class PrinterHelpers(Printer): 'struct bpf_perf_event_data', 'struct bpf_perf_event_value', 'struct bpf_pidns_info', + 'struct bpf_redir_neigh', 'struct bpf_sk_lookup', 'struct bpf_sock', 'struct bpf_sock_addr', diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index bf5a99d803e4..9668cde9d684 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -3677,15 +3677,19 @@ union bpf_attr { * Return * The id is returned or 0 in case the id could not be retrieved. * - * long bpf_redirect_neigh(u32 ifindex, u64 flags) + * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags) * Description * Redirect the packet to another net device of index *ifindex* * and fill in L2 addresses from neighboring subsystem. This helper * is somewhat similar to **bpf_redirect**\ (), except that it * populates L2 addresses as well, meaning, internally, the helper - * performs a FIB lookup based on the skb's networking header to - * get the address of the next hop and then relies on the neighbor - * lookup for the L2 address of the nexthop. + * relies on the neighbor lookup for the L2 address of the nexthop. + * + * The helper will perform a FIB lookup based on the skb's + * networking header to get the address of the next hop, unless + * this is supplied by the caller in the *params* argument. The + * *plen* argument indicates the len of *params* and should be set + * to 0 if *params* is NULL. * * The *flags* argument is reserved and must be 0. The helper is * currently only supported for tc BPF program types, and enabled @@ -4906,6 +4910,18 @@ struct bpf_fib_lookup { __u8 dmac[6]; /* ETH_ALEN */ }; +struct bpf_redir_neigh { + /* network family for lookup (AF_INET, AF_INET6) */ + __u8 nh_family; + /* avoid hole in struct - must be set to 0 */ + __u8 unused[3]; + /* network address of nexthop; skips fib lookup to find gateway */ + union { + __be32 ipv4_nh; + __u32 ipv6_nh[4]; /* in6_addr; network order */ + }; +}; + enum bpf_task_fd_type { BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */ BPF_FD_TYPE_TRACEPOINT, /* tp name */ From patchwork Tue Oct 20 10:51:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= X-Patchwork-Id: 11846269 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.9 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B80DC433E7 for ; Tue, 20 Oct 2020 10:51:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C9A0B22409 for ; Tue, 20 Oct 2020 10:51:17 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="eef/WBCV" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2393338AbgJTKvK (ORCPT ); Tue, 20 Oct 2020 06:51:10 -0400 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:58288 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2392246AbgJTKvJ (ORCPT ); Tue, 20 Oct 2020 06:51:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1603191067; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fIajhDaxv8mNezIzz6CRmGgwrL/Oe6HKl2u4NZJHNuc=; b=eef/WBCVuJwh+nASwzthf7MpVJrVcqmdFRhMima9/TIdI8GTXxz7wPTgoEs+nhTQRFIeos y0eb9AurBxN6DQlKH/zp8GTar+SzE0YRtCRMWNF+s/0LqKif2e1BcE5j06cjxNUfT32QFj 6WPYUZSDr+Mjyq+KQ6cakNkhatJFnD0= Received: from mail-ed1-f70.google.com (mail-ed1-f70.google.com [209.85.208.70]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-312-NPtLDTUdM-qsl2cuixMwRA-1; Tue, 20 Oct 2020 06:51:05 -0400 X-MC-Unique: NPtLDTUdM-qsl2cuixMwRA-1 Received: by mail-ed1-f70.google.com with SMTP id t4so455265edv.7 for ; Tue, 20 Oct 2020 03:51:05 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=fIajhDaxv8mNezIzz6CRmGgwrL/Oe6HKl2u4NZJHNuc=; b=l0ANFQDoM7UJKZV0EEyfW+9DVJZq8GBH0eoj2b6jqViVl90BCtUWzIIBkVqT3pWv29 V6QRl8qstiKcXwlSDiysNEkgp8hK9GVxykFPckZZZQX2yvqebbgQQLL+QT3m1f6dN2ff 1djVKVVf2nR4HL8BGaNqtnRXv5iXhh1wT9gMTz7pKJl8kfOYZemuFxATAyzNQ7oYrW7y 5Hua4+z3IvH7bKN/JXkWSCvyaCuXXRuQYqBc01FyVk9cSZ6k8s7QqqNJRQjoPMwPPS9F ZKlDRSxBPG2VQ7EHd8mFymYhlGytE9HaqkRtT9LvbSp0f1wkMZthP4+RSLw71phBwQ38 KGgw== X-Gm-Message-State: AOAM531C2ZsusB8MLqDKZQIhVqZQMIqbITMpnMoZ4+jNwQmf3aac00Xa FBKs0WaC6/+GEJ8poman0e6iK43qpWyCEg5IIgswgN2wOTR3ccsIl3+Z4vXZKqDFjpL3TDJDsLf zjRzBUSrXrvNg X-Received: by 2002:a05:6402:31ac:: with SMTP id dj12mr2133986edb.20.1603191064452; Tue, 20 Oct 2020 03:51:04 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxtPsV13iNano4mHMiAGGBm9pzeIHM7JjUY0Xk6r3cXIfaxfmxxzY/1AKGICIGQfqH8GvEQ7Q== X-Received: by 2002:a05:6402:31ac:: with SMTP id dj12mr2133972edb.20.1603191064190; Tue, 20 Oct 2020 03:51:04 -0700 (PDT) Received: from alrua-x1.borgediget.toke.dk ([45.145.92.2]) by smtp.gmail.com with ESMTPSA id bk13sm2091048ejb.58.2020.10.20.03.51.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Oct 2020 03:51:03 -0700 (PDT) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 614521838FF; Tue, 20 Oct 2020 12:51:03 +0200 (CEST) Subject: [PATCH bpf v2 2/3] bpf_fib_lookup: optionally skip neighbour lookup From: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= To: Daniel Borkmann Cc: David Ahern , netdev@vger.kernel.org, bpf@vger.kernel.org Date: Tue, 20 Oct 2020 12:51:03 +0200 Message-ID: <160319106331.15822.2945713836148003890.stgit@toke.dk> In-Reply-To: <160319106111.15822.18417665895694986295.stgit@toke.dk> References: <160319106111.15822.18417665895694986295.stgit@toke.dk> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Toke Høiland-Jørgensen The bpf_fib_lookup() helper performs a neighbour lookup for the destination IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation that the BPF program will deal with this condition, either by passing the packet up the stack, or by using bpf_redirect_neigh(). The neighbour lookup is done via a hash table (through ___neigh_lookup_noref()), which incurs some overhead. If the caller knows this is likely to fail anyway, it may want to skip that and go unconditionally to bpf_redirect_neigh(). For this use case, add a flag to bpf_fib_lookup() that will make it skip the neighbour lookup and instead always return BPF_FIB_LKUP_RET_NO_NEIGH (but still populate the gateway and target ifindex). Signed-off-by: Toke Høiland-Jørgensen --- include/uapi/linux/bpf.h | 10 ++++++---- net/core/filter.c | 16 ++++++++++++++-- tools/include/uapi/linux/bpf.h | 10 ++++++---- 3 files changed, 26 insertions(+), 10 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 9668cde9d684..4bfd3c72dae6 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -4841,12 +4841,14 @@ struct bpf_raw_tracepoint_args { __u64 args[0]; }; -/* DIRECT: Skip the FIB rules and go to FIB table associated with device - * OUTPUT: Do lookup from egress perspective; default is ingress +/* DIRECT: Skip the FIB rules and go to FIB table associated with device + * OUTPUT: Do lookup from egress perspective; default is ingress + * SKIP_NEIGH: Skip neighbour lookup and return BPF_FIB_LKUP_RET_NO_NEIGH on success */ enum { - BPF_FIB_LOOKUP_DIRECT = (1U << 0), - BPF_FIB_LOOKUP_OUTPUT = (1U << 1), + BPF_FIB_LOOKUP_DIRECT = (1U << 0), + BPF_FIB_LOOKUP_OUTPUT = (1U << 1), + BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), }; enum { diff --git a/net/core/filter.c b/net/core/filter.c index fa09b4f141ae..9791e6311afa 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5382,6 +5382,9 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params, if (nhc->nhc_gw_family) params->ipv4_dst = nhc->nhc_gw.ipv4; + if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH) + return BPF_FIB_LKUP_RET_NO_NEIGH; + neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)params->ipv4_dst); } else { @@ -5389,6 +5392,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params, params->family = AF_INET6; *dst = nhc->nhc_gw.ipv6; + + if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH) + return BPF_FIB_LKUP_RET_NO_NEIGH; + neigh = __ipv6_neigh_lookup_noref_stub(dev, dst); } @@ -5501,6 +5508,9 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params, params->rt_metric = res.f6i->fib6_metric; params->ifindex = dev->ifindex; + if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH) + return BPF_FIB_LKUP_RET_NO_NEIGH; + /* xdp and cls_bpf programs are run in RCU-bh so rcu_read_lock_bh is * not needed here. */ @@ -5518,7 +5528,8 @@ BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx, if (plen < sizeof(*params)) return -EINVAL; - if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT)) + if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | + BPF_FIB_LOOKUP_SKIP_NEIGH)) return -EINVAL; switch (params->family) { @@ -5555,7 +5566,8 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb, if (plen < sizeof(*params)) return -EINVAL; - if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT)) + if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | + BPF_FIB_LOOKUP_SKIP_NEIGH)) return -EINVAL; switch (params->family) { diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 9668cde9d684..4bfd3c72dae6 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -4841,12 +4841,14 @@ struct bpf_raw_tracepoint_args { __u64 args[0]; }; -/* DIRECT: Skip the FIB rules and go to FIB table associated with device - * OUTPUT: Do lookup from egress perspective; default is ingress +/* DIRECT: Skip the FIB rules and go to FIB table associated with device + * OUTPUT: Do lookup from egress perspective; default is ingress + * SKIP_NEIGH: Skip neighbour lookup and return BPF_FIB_LKUP_RET_NO_NEIGH on success */ enum { - BPF_FIB_LOOKUP_DIRECT = (1U << 0), - BPF_FIB_LOOKUP_OUTPUT = (1U << 1), + BPF_FIB_LOOKUP_DIRECT = (1U << 0), + BPF_FIB_LOOKUP_OUTPUT = (1U << 1), + BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), }; enum { From patchwork Tue Oct 20 10:51:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= X-Patchwork-Id: 11846271 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CF315C4363A for ; Tue, 20 Oct 2020 10:51:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 664C922404 for ; Tue, 20 Oct 2020 10:51:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="c7Kqv0hP" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2393337AbgJTKvR (ORCPT ); Tue, 20 Oct 2020 06:51:17 -0400 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:23447 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2390242AbgJTKvL (ORCPT ); Tue, 20 Oct 2020 06:51:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1603191069; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LHuaX/N0u2zUKlxZw67FyAdEirtwxjRNTIN1rjGKMKI=; b=c7Kqv0hP3ZZGFsYUchEkzTY+xHLGaZ5YMK2DbFRou7bybCuTYSRn6rKKp0izebD5w5mHpg 7661aXBWzD55lHQ2TaIYwYWesQ/74lNnA2A9D0M2gT+M4Eu8INUvPY+2KXP21djQjEP/o6 evVwqLA/p77Io+1sb64NERU6pTc1CQc= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-174-oOpYwacIPo6DtFyWECRvWQ-1; Tue, 20 Oct 2020 06:51:07 -0400 X-MC-Unique: oOpYwacIPo6DtFyWECRvWQ-1 Received: by mail-wm1-f69.google.com with SMTP id r19so311306wmh.9 for ; Tue, 20 Oct 2020 03:51:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=LHuaX/N0u2zUKlxZw67FyAdEirtwxjRNTIN1rjGKMKI=; b=UEtAGfmPzjDTxkDI6ukJki3dS9iAW2kbAVjkTJtNrLwd3UzjxcxbU7x5bbD2YKvcfI hQj8GZQVx9XNr05M8rN4vUT88BA2UPvG9h6AN5rCtr7g1wudGr3Nd1sDRR+sk3/sh+mx JCvtrezkp9o05yBYggRwOTWb1LTm9O9QG545v4W5IaH6ydpC1suqugFg89IHdMXFHzyJ kInKgGq6cJ3mZ6riMSzWPwxL/AByquOwc8BYhy8vyyKdCASqzNXgjZCyJbD6ZWCMCNXu 4MbBrphL5z6Hfxj4CHUv8/6EuWsyk8JzK44mzQMDDmY38u3skmunXdQCuWls+uh5WzEw Ffkw== X-Gm-Message-State: AOAM532MR9C1zI6w6baQaWCzIwN1lw+IDpbxlUvd5ZHqmflETa9x34T9 rF2HWmBDy3tsDLECErUsd3YB99hilVmqVWOpF3CG/W6tVB+xDegoQb9/uW/pPFR2Ik/UrRlIQT6 3+Ks3rsc+0AVr X-Received: by 2002:a05:6000:10c6:: with SMTP id b6mr2907555wrx.10.1603191065959; Tue, 20 Oct 2020 03:51:05 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxObTEdvq6XhspC0dW+u+JgTyvedQafrFhf3dBQ2IwQTA9bIJzH6RGvvlz8H10h2QcGfJLN6g== X-Received: by 2002:a05:6000:10c6:: with SMTP id b6mr2907519wrx.10.1603191065546; Tue, 20 Oct 2020 03:51:05 -0700 (PDT) Received: from alrua-x1.borgediget.toke.dk ([2a0c:4d80:42:443::2]) by smtp.gmail.com with ESMTPSA id j7sm2369191wrn.81.2020.10.20.03.51.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Oct 2020 03:51:05 -0700 (PDT) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 79F231838FA; Tue, 20 Oct 2020 12:51:04 +0200 (CEST) Subject: [PATCH bpf v2 3/3] selftests: Update test_tc_redirect.sh to use the modified bpf_redirect_neigh() From: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= To: Daniel Borkmann Cc: David Ahern , netdev@vger.kernel.org, bpf@vger.kernel.org Date: Tue, 20 Oct 2020 12:51:04 +0200 Message-ID: <160319106440.15822.14581567089579452818.stgit@toke.dk> In-Reply-To: <160319106111.15822.18417665895694986295.stgit@toke.dk> References: <160319106111.15822.18417665895694986295.stgit@toke.dk> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Toke Høiland-Jørgensen This updates the test_tc_neigh prog in selftests to use the new syntax of bpf_redirect_neigh(). To exercise the helper both with and without the optional parameter, add an additional test_tc_neigh_fib test program, which does a bpf_fib_lookup() followed by a call to bpf_redirect_neigh() instead of looking up the ifindex in a map. This second test uses the BPF_FIB_LOOKUP_SKIP_NEIGH flag in one forwarding direction, but not in the other, to test both ways of combining the two helpers. Update the test_tc_redirect.sh script to run both versions of the test, and while we're add it, fix it to work on systems that have a consolidated dual-stack 'ping' binary instead of separate ping/ping6 versions. Signed-off-by: Toke Høiland-Jørgensen --- tools/testing/selftests/bpf/progs/test_tc_neigh.c | 5 - .../selftests/bpf/progs/test_tc_neigh_fib.c | 153 ++++++++++++++++++++ tools/testing/selftests/bpf/test_tc_redirect.sh | 18 ++ 3 files changed, 171 insertions(+), 5 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c diff --git a/tools/testing/selftests/bpf/progs/test_tc_neigh.c b/tools/testing/selftests/bpf/progs/test_tc_neigh.c index fe182616b112..b985ac4e7a81 100644 --- a/tools/testing/selftests/bpf/progs/test_tc_neigh.c +++ b/tools/testing/selftests/bpf/progs/test_tc_neigh.c @@ -1,4 +1,5 @@ // SPDX-License-Identifier: GPL-2.0 +#include #include #include @@ -118,7 +119,7 @@ SEC("dst_ingress") int tc_dst(struct __sk_buff *skb) if (bpf_skb_store_bytes(skb, 0, &zero, sizeof(zero), 0) < 0) return TC_ACT_SHOT; - return bpf_redirect_neigh(get_dev_ifindex(dev_src), 0); + return bpf_redirect_neigh(get_dev_ifindex(dev_src), NULL, 0, 0); } SEC("src_ingress") int tc_src(struct __sk_buff *skb) @@ -142,7 +143,7 @@ SEC("src_ingress") int tc_src(struct __sk_buff *skb) if (bpf_skb_store_bytes(skb, 0, &zero, sizeof(zero), 0) < 0) return TC_ACT_SHOT; - return bpf_redirect_neigh(get_dev_ifindex(dev_dst), 0); + return bpf_redirect_neigh(get_dev_ifindex(dev_dst), NULL, 0, 0); } char __license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c b/tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c new file mode 100644 index 000000000000..14792ce3a85c --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c @@ -0,0 +1,153 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#ifndef ctx_ptr +# define ctx_ptr(field) (void *)(long)(field) +#endif + +#define AF_INET 2 +#define AF_INET6 10 + +static __always_inline int fill_fib_params_v4(struct __sk_buff *skb, + struct bpf_fib_lookup *fib_params) +{ + void *data_end = ctx_ptr(skb->data_end); + void *data = ctx_ptr(skb->data); + struct iphdr *ip4h; + + if (data + sizeof(struct ethhdr) > data_end) + return -1; + + ip4h = (struct iphdr *)(data + sizeof(struct ethhdr)); + if ((void *)(ip4h + 1) > data_end) + return -1; + + fib_params->family = AF_INET; + fib_params->tos = ip4h->tos; + fib_params->l4_protocol = ip4h->protocol; + fib_params->sport = 0; + fib_params->dport = 0; + fib_params->tot_len = bpf_ntohs(ip4h->tot_len); + fib_params->ipv4_src = ip4h->saddr; + fib_params->ipv4_dst = ip4h->daddr; + + return 0; +} + +static __always_inline int fill_fib_params_v6(struct __sk_buff *skb, + struct bpf_fib_lookup *fib_params) +{ + struct in6_addr *src = (struct in6_addr *)fib_params->ipv6_src; + struct in6_addr *dst = (struct in6_addr *)fib_params->ipv6_dst; + void *data_end = ctx_ptr(skb->data_end); + void *data = ctx_ptr(skb->data); + struct ipv6hdr *ip6h; + + if (data + sizeof(struct ethhdr) > data_end) + return -1; + + ip6h = (struct ipv6hdr *)(data + sizeof(struct ethhdr)); + if ((void *)(ip6h + 1) > data_end) + return -1; + + fib_params->family = AF_INET6; + fib_params->flowinfo = 0; + fib_params->l4_protocol = ip6h->nexthdr; + fib_params->sport = 0; + fib_params->dport = 0; + fib_params->tot_len = bpf_ntohs(ip6h->payload_len); + *src = ip6h->saddr; + *dst = ip6h->daddr; + + return 0; +} + +SEC("chk_egress") int tc_chk(struct __sk_buff *skb) +{ + void *data_end = ctx_ptr(skb->data_end); + void *data = ctx_ptr(skb->data); + __u32 *raw = data; + + if (data + sizeof(struct ethhdr) > data_end) + return TC_ACT_SHOT; + + return !raw[0] && !raw[1] && !raw[2] ? TC_ACT_SHOT : TC_ACT_OK; +} + +static __always_inline int tc_redir(struct __sk_buff *skb, int fib_lookup_flags) +{ + struct bpf_fib_lookup fib_params = { .ifindex = skb->ingress_ifindex }; + __u8 zero[ETH_ALEN * 2]; + int ret = -1; + + switch (skb->protocol) { + case __bpf_constant_htons(ETH_P_IP): + ret = fill_fib_params_v4(skb, &fib_params); + break; + case __bpf_constant_htons(ETH_P_IPV6): + ret = fill_fib_params_v6(skb, &fib_params); + break; + } + + if (ret) + return TC_ACT_OK; + + ret = bpf_fib_lookup(skb, &fib_params, sizeof(fib_params), + fib_lookup_flags); + if (ret == BPF_FIB_LKUP_RET_NOT_FWDED || ret < 0) + return TC_ACT_OK; + + __builtin_memset(&zero, 0, sizeof(zero)); + if (bpf_skb_store_bytes(skb, 0, &zero, sizeof(zero), 0) < 0) + return TC_ACT_SHOT; + + if (ret == BPF_FIB_LKUP_RET_NO_NEIGH) { + struct bpf_redir_neigh nh_params = {}; + + nh_params.nh_family = fib_params.family; + __builtin_memcpy(&nh_params.ipv6_nh, &fib_params.ipv6_dst, + sizeof(nh_params.ipv6_nh)); + + return bpf_redirect_neigh(fib_params.ifindex, &nh_params, + sizeof(nh_params), 0); + + } else if (!fib_lookup_flags && ret == BPF_FIB_LKUP_RET_SUCCESS) { + void *data_end = ctx_ptr(skb->data_end); + struct ethhdr *eth = ctx_ptr(skb->data); + + if (eth + 1 > data_end) + return TC_ACT_SHOT; + + __builtin_memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN); + __builtin_memcpy(eth->h_source, fib_params.smac, ETH_ALEN); + + return bpf_redirect(fib_params.ifindex, 0); + } + + return TC_ACT_SHOT; +} + +SEC("dst_ingress") int tc_dst(struct __sk_buff *skb) +{ + return tc_redir(skb, 0); +} + +SEC("src_ingress") int tc_src(struct __sk_buff *skb) +{ + return tc_redir(skb, BPF_FIB_LOOKUP_SKIP_NEIGH); +} + +char __license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/test_tc_redirect.sh b/tools/testing/selftests/bpf/test_tc_redirect.sh index 6d7482562140..8868aa1ca902 100755 --- a/tools/testing/selftests/bpf/test_tc_redirect.sh +++ b/tools/testing/selftests/bpf/test_tc_redirect.sh @@ -24,8 +24,7 @@ command -v timeout >/dev/null 2>&1 || \ { echo >&2 "timeout is not available"; exit 1; } command -v ping >/dev/null 2>&1 || \ { echo >&2 "ping is not available"; exit 1; } -command -v ping6 >/dev/null 2>&1 || \ - { echo >&2 "ping6 is not available"; exit 1; } +if command -v ping6 >/dev/null 2>&1; then PING6=ping6; else PING6=ping; fi command -v perl >/dev/null 2>&1 || \ { echo >&2 "perl is not available"; exit 1; } command -v jq >/dev/null 2>&1 || \ @@ -152,7 +151,7 @@ netns_test_connectivity() echo -e "${TEST}: ${GREEN}PASS${NC}" TEST="ICMPv6 connectivity test" - ip netns exec ${NS_SRC} ping6 $PING_ARG ${IP6_DST} + ip netns exec ${NS_SRC} $PING6 $PING_ARG ${IP6_DST} if [ $? -ne 0 ]; then echo -e "${TEST}: ${RED}FAIL${NC}" exit 1 @@ -170,6 +169,7 @@ hex_mem_str() netns_setup_bpf() { local obj=$1 + local use_forwarding=${2:-0} ip netns exec ${NS_FWD} tc qdisc add dev veth_src_fwd clsact ip netns exec ${NS_FWD} tc filter add dev veth_src_fwd ingress bpf da obj $obj sec src_ingress @@ -179,6 +179,14 @@ netns_setup_bpf() ip netns exec ${NS_FWD} tc filter add dev veth_dst_fwd ingress bpf da obj $obj sec dst_ingress ip netns exec ${NS_FWD} tc filter add dev veth_dst_fwd egress bpf da obj $obj sec chk_egress + if [ "$use_forwarding" -eq "1" ]; then + # bpf_fib_lookup() checks if forwarding is enabled + ip netns exec ${NS_FWD} sysctl -w net.ipv4.ip_forward=1 + ip netns exec ${NS_FWD} sysctl -w net.ipv6.conf.veth_dst_fwd.forwarding=1 + ip netns exec ${NS_FWD} sysctl -w net.ipv6.conf.veth_src_fwd.forwarding=1 + return 0 + fi + veth_src=$(ip netns exec ${NS_FWD} cat /sys/class/net/veth_src_fwd/ifindex) veth_dst=$(ip netns exec ${NS_FWD} cat /sys/class/net/veth_dst_fwd/ifindex) @@ -200,5 +208,9 @@ netns_setup_bpf test_tc_neigh.o netns_test_connectivity netns_cleanup netns_setup +netns_setup_bpf test_tc_neigh_fib.o 1 +netns_test_connectivity +netns_cleanup +netns_setup netns_setup_bpf test_tc_peer.o netns_test_connectivity