From patchwork Wed Dec 16 14:30:31 2020
From: Hangbin Liu
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, Toke Høiland-Jørgensen, Jiri Benc,
 Jesper Dangaard Brouer, Eelco Chaudron, ast@kernel.org,
 Daniel Borkmann, Lorenzo Bianconi, David Ahern, Andrii Nakryiko,
 Alexei Starovoitov, Hangbin Liu
Subject: [PATCHv12 bpf-next 1/6] bpf: run devmap xdp_prog on flush instead of bulk enqueue
Date: Wed, 16 Dec 2020 22:30:31 +0800
Message-Id: <20201216143036.2296568-2-liuhangbin@gmail.com>

From: Jesper Dangaard Brouer

This changes the devmap XDP program support to run the program when the
bulk queue is flushed instead of before the frame is enqueued. This has
a couple of benefits:

- It "sorts" the packets by destination devmap entry, and then runs the
  same BPF program on all the packets in sequence. This ensures that we
  keep the XDP program and destination device properties hot in I-cache.

- It makes the multicast implementation simpler because it can just
  enqueue packets using bq_enqueue() without having to deal with the
  devmap program at all.

The drawback is that if the devmap program drops the packet, the enqueue
step is redundant. However, arguably this is mostly visible in a
micro-benchmark, and with more mixed traffic the I-cache benefit should
win out.

The performance impact of just this patch is as follows:

Using xdp_redirect_map (with a 2nd xdp_prog patch [1]) in samples/bpf and
sending packets via the pktgen command:

./pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -t 10 -s 64

There is about +/- 0.1M deviation in native testing. Performance improved
for the base case, but dropped back somewhat with an xdp devmap prog
attached.
Version          | Test                        | Generic | Native | Native + 2nd xdp_prog
5.10 rc6         | xdp_redirect_map i40e->i40e |    2.0M |   9.1M |   8.0M
5.10 rc6         | xdp_redirect_map i40e->veth |    1.7M |  11.0M |   9.7M
5.10 rc6 + patch | xdp_redirect_map i40e->i40e |    2.0M |   9.5M |   7.5M
5.10 rc6 + patch | xdp_redirect_map i40e->veth |    1.7M |  11.6M |   9.1M

[1] https://patchwork.ozlabs.org/project/netdev/patch/20201208120159.2278277-1-liuhangbin@gmail.com/

Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Hangbin Liu
---
 kernel/bpf/devmap.c | 116 ++++++++++++++++++++++++++++----------------
 1 file changed, 73 insertions(+), 43 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index f6e9c68afdd4..2a83232cf63a 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -57,6 +57,7 @@ struct xdp_dev_bulk_queue {
 	struct list_head flush_node;
 	struct net_device *dev;
 	struct net_device *dev_rx;
+	struct bpf_prog *xdp_prog;
 	unsigned int count;
 };
 
@@ -327,40 +328,92 @@ bool dev_map_can_have_prog(struct bpf_map *map)
 	return false;
 }
 
+static int dev_map_bpf_prog_run(struct bpf_prog *xdp_prog,
+				struct xdp_frame **frames, int n,
+				struct net_device *dev)
+{
+	struct xdp_txq_info txq = { .dev = dev };
+	struct xdp_buff xdp;
+	int i, nframes = 0;
+
+	for (i = 0; i < n; i++) {
+		struct xdp_frame *xdpf = frames[i];
+		u32 act;
+		int err;
+
+		xdp_convert_frame_to_buff(xdpf, &xdp);
+		xdp.txq = &txq;
+
+		act = bpf_prog_run_xdp(xdp_prog, &xdp);
+		switch (act) {
+		case XDP_PASS:
+			err = xdp_update_frame_from_buff(&xdp, xdpf);
+			if (unlikely(err < 0))
+				xdp_return_frame_rx_napi(xdpf);
+			else
+				frames[nframes++] = xdpf;
+			break;
+		default:
+			bpf_warn_invalid_xdp_action(act);
+			fallthrough;
+		case XDP_ABORTED:
+			trace_xdp_exception(dev, xdp_prog, act);
+			fallthrough;
+		case XDP_DROP:
+			xdp_return_frame_rx_napi(xdpf);
+			break;
+		}
+	}
+	return n - nframes; /* dropped frames count */
+}
+
 static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 {
 	struct net_device *dev = bq->dev;
 	int sent = 0, drops = 0, err = 0;
+	unsigned int cnt = bq->count;
+	unsigned int xdp_drop = 0;
 	int i;
 
-	if (unlikely(!bq->count))
+	if (unlikely(!cnt))
 		return;
 
-	for (i = 0; i < bq->count; i++) {
+	for (i = 0; i < cnt; i++) {
 		struct xdp_frame *xdpf = bq->q[i];
 
 		prefetch(xdpf);
 	}
 
-	sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags);
+	if (unlikely(bq->xdp_prog)) {
+		xdp_drop = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, cnt, dev);
+		cnt -= xdp_drop;
+		if (!cnt) {
+			sent = 0;
+			drops = xdp_drop;
+			goto out;
+		}
+	}
+
+	sent = dev->netdev_ops->ndo_xdp_xmit(dev, cnt, bq->q, flags);
 	if (sent < 0) {
 		err = sent;
 		sent = 0;
 		goto error;
 	}
-	drops = bq->count - sent;
+	drops = (cnt - sent) + xdp_drop;
 out:
 	bq->count = 0;
 	trace_xdp_devmap_xmit(bq->dev_rx, dev, sent, drops, err);
 	bq->dev_rx = NULL;
+	bq->xdp_prog = NULL;
 	__list_del_clearprev(&bq->flush_node);
 	return;
 error:
 	/* If ndo_xdp_xmit fails with an errno, no frames have been
 	 * xmit'ed and it's our responsibility to them free all.
 	 */
-	for (i = 0; i < bq->count; i++) {
+	for (i = 0; i < cnt; i++) {
 		struct xdp_frame *xdpf = bq->q[i];
 
 		xdp_return_frame_rx_napi(xdpf);
@@ -408,7 +461,8 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
  * Thus, safe percpu variable access.
  */
 static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
-		       struct net_device *dev_rx)
+		       struct net_device *dev_rx,
+		       struct bpf_dtab_netdev *dst)
 {
 	struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
 	struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq);
@@ -423,6 +477,14 @@ static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 	if (!bq->dev_rx)
 		bq->dev_rx = dev_rx;
 
+	/* Store (potential) xdp_prog that run before egress to dev as
+	 * part of bulk_queue. This will be same xdp_prog for all
+	 * xdp_frame's in bulk_queue, because this per-CPU store must
+	 * be flushed from net_device drivers NAPI func end.
+	 */
+	if (dst && dst->xdp_prog && !bq->xdp_prog)
+		bq->xdp_prog = dst->xdp_prog;
+
 	bq->q[bq->count++] = xdpf;
 
 	if (!bq->flush_node.prev)
@@ -430,7 +492,8 @@ static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 }
 
 static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
-			       struct net_device *dev_rx)
+			       struct net_device *dev_rx,
+			       struct bpf_dtab_netdev *dst)
 {
 	struct xdp_frame *xdpf;
 	int err;
@@ -446,42 +509,14 @@ static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	bq_enqueue(dev, xdpf, dev_rx);
+	bq_enqueue(dev, xdpf, dev_rx, dst);
 	return 0;
 }
 
-static struct xdp_buff *dev_map_run_prog(struct net_device *dev,
-					 struct xdp_buff *xdp,
-					 struct bpf_prog *xdp_prog)
-{
-	struct xdp_txq_info txq = { .dev = dev };
-	u32 act;
-
-	xdp_set_data_meta_invalid(xdp);
-	xdp->txq = &txq;
-
-	act = bpf_prog_run_xdp(xdp_prog, xdp);
-	switch (act) {
-	case XDP_PASS:
-		return xdp;
-	case XDP_DROP:
-		break;
-	default:
-		bpf_warn_invalid_xdp_action(act);
-		fallthrough;
-	case XDP_ABORTED:
-		trace_xdp_exception(dev, xdp_prog, act);
-		break;
-	}
-
-	xdp_return_buff(xdp);
-	return NULL;
-}
-
 int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
 		    struct net_device *dev_rx)
 {
-	return __xdp_enqueue(dev, xdp, dev_rx);
+	return __xdp_enqueue(dev, xdp, dev_rx, NULL);
 }
 
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
@@ -489,12 +524,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 {
 	struct net_device *dev = dst->dev;
 
-	if (dst->xdp_prog) {
-		xdp = dev_map_run_prog(dev, xdp, dst->xdp_prog);
-		if (!xdp)
-			return 0;
-	}
-	return __xdp_enqueue(dev, xdp, dev_rx);
+	return __xdp_enqueue(dev, xdp, dev_rx, dst);
 }
 
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,

From patchwork Wed Dec 16 14:30:32 2020
From: Hangbin Liu
Subject: [PATCHv12 bpf-next 2/6] bpf: add a new bpf argument type ARG_CONST_MAP_PTR_OR_NULL
Date: Wed, 16 Dec 2020 22:30:32 +0800
Message-Id: <20201216143036.2296568-3-liuhangbin@gmail.com>

Add a new bpf argument type ARG_CONST_MAP_PTR_OR_NULL, which can be used
when we want to allow a NULL pointer for a map parameter. A bpf helper
that uses this type needs to check whether the map is NULL.

Signed-off-by: Hangbin Liu
---
v11-v12: rebase the patch to latest bpf-next
v10: remove useless CONST_PTR_TO_MAP_OR_NULL and Copy-paste comment.
v9: merge the patch from [1] into this series.
v1-v8: this patch did not exist

[1] https://lore.kernel.org/bpf/20200715070001.2048207-1-liuhangbin@gmail.com/
---
 include/linux/bpf.h   |  1 +
 kernel/bpf/verifier.c | 10 ++++++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 07cb5d15e743..7850c87456fc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -296,6 +296,7 @@ enum bpf_arg_type {
 	ARG_CONST_ALLOC_SIZE_OR_ZERO,	/* number of allocated bytes requested */
 	ARG_PTR_TO_BTF_ID_SOCK_COMMON,	/* pointer to in-kernel sock_common or bpf-mirrored bpf_sock */
 	ARG_PTR_TO_PERCPU_BTF_ID,	/* pointer to in-kernel percpu type */
+	ARG_CONST_MAP_PTR_OR_NULL,	/* const argument used as pointer to bpf_map or NULL */
 	__BPF_ARG_TYPE_MAX,
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 17270b8404f1..9f6633c9ea12 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -445,7 +445,8 @@ static bool arg_type_may_be_null(enum bpf_arg_type type)
 	       type == ARG_PTR_TO_MEM_OR_NULL ||
 	       type == ARG_PTR_TO_CTX_OR_NULL ||
 	       type == ARG_PTR_TO_SOCKET_OR_NULL ||
-	       type == ARG_PTR_TO_ALLOC_MEM_OR_NULL;
+	       type == ARG_PTR_TO_ALLOC_MEM_OR_NULL ||
+	       type == ARG_CONST_MAP_PTR_OR_NULL;
 }
 
 /* Determine whether the function releases some resources allocated by another
@@ -4065,6 +4066,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_CONST_SIZE_OR_ZERO]	= &scalar_types,
 	[ARG_CONST_ALLOC_SIZE_OR_ZERO]	= &scalar_types,
 	[ARG_CONST_MAP_PTR]		= &const_map_ptr_types,
+	[ARG_CONST_MAP_PTR_OR_NULL]	= &const_map_ptr_types,
 	[ARG_PTR_TO_CTX]		= &context_types,
 	[ARG_PTR_TO_CTX_OR_NULL]	= &context_types,
 	[ARG_PTR_TO_SOCK_COMMON]	= &sock_types,
@@ -4210,9 +4212,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		meta->ref_obj_id = reg->ref_obj_id;
 	}
 
-	if (arg_type == ARG_CONST_MAP_PTR) {
-		/* bpf_map_xxx(map_ptr) call: remember that map_ptr */
-		meta->map_ptr = reg->map_ptr;
+	if (arg_type == ARG_CONST_MAP_PTR ||
+	    arg_type == ARG_CONST_MAP_PTR_OR_NULL) {
+		meta->map_ptr = register_is_null(reg) ? NULL : reg->map_ptr;
 	} else if (arg_type == ARG_PTR_TO_MAP_KEY) {
 		/* bpf_map_xxx(..., map_ptr, ..., key) call:
 		 * check that [key, key + map->key_size) are within

From patchwork Wed Dec 16 14:30:33 2020
From: Hangbin Liu
Subject: [PATCHv12 bpf-next 3/6] xdp: add a new helper for dev map multicast support
Date: Wed, 16 Dec 2020 22:30:33 +0800
Message-Id: <20201216143036.2296568-4-liuhangbin@gmail.com>

This patch adds XDP multicast support, which has been discussed before
[0]. The goal is to be able to implement an OVS-like data plane in XDP,
i.e., a software switch that can forward XDP frames to multiple ports.

To achieve this, an application needs to specify a group of interfaces
to forward a packet to. It is also common to want to exclude one or more
physical interfaces from the forwarding operation - e.g., to forward a
packet to all interfaces in the multicast group except the interface it
arrived on. While this could be done simply by adding more groups, this
quickly leads to a combinatorial explosion in the number of groups an
application has to maintain.

To avoid the combinatorial explosion, we propose to include the ability
to specify an "exclude group" as part of the forwarding operation.
This needs to be a group (instead of just a single port index), because
a physical interface can be part of a logical grouping, such as a bond
device.

Thus, the logical forwarding operation becomes a "set difference"
operation, i.e. "forward to all ports in group A that are not also in
group B". This series implements such an operation using device maps to
represent the groups. This means that the XDP program specifies two
device maps, one containing the list of netdevs to redirect to, and the
other containing the exclude list.

To achieve this, I implement a new helper bpf_redirect_map_multi() that
accepts two maps: the forwarding map and the exclude map. The forwarding
map can be DEVMAP or DEVMAP_HASH, but the exclude map *must* be
DEVMAP_HASH to get better performance. If users don't want to use an
exclude map and simply want to stop redirecting back to the ingress
device, they can use the flag BPF_F_EXCLUDE_INGRESS.

As both bpf_xdp_redirect_map() and this new helper use struct
bpf_redirect_info, I add a new ex_map field and set tgt_value to NULL in
the new helper to distinguish it from bpf_xdp_redirect_map().

I also keep the generic data path in net/core/filter.c and the native
data path in kernel/bpf/devmap.c, so we can use direct calls to get
better performance.

[0] https://xdp-project.net/#Handling-multicast

Signed-off-by: Hangbin Liu
---
v12: rebase the code based on Jesper's devmap xdp_prog patch

v11:
Fix bpf_redirect_map_multi() helper description typo.
Add loop limit for devmap_get_next_obj() and dev_map_redirect_multi().

v10:
Update helper bpf_xdp_redirect_map_multi()
- No need to check map pointer as we will do the check in verifier.

v9:
Update helper bpf_xdp_redirect_map_multi()
- Use ARG_CONST_MAP_PTR_OR_NULL for helper arg2

v8:
Update function dev_in_exclude_map():
- remove the duplicate ex_map map_type check
- look up the element in dev map by obj dev index directly instead of
  looping over the whole map

v7:
a) Fix helper flag check
b) Limit the *ex_map* to use DEVMAP_HASH only and update function
   dev_in_exclude_map() to get better performance.

v6: converted helper return types from int to long

v5:
a) Check devmap_get_next_key() return value.
b) Pass through flags to __bpf_tx_xdp_map() instead of bool value.
c) In function dev_map_enqueue_multi(), consume xdpf for the last
   obj instead of the first one.
d) Update helper description and code comments to explain that we
   use NULL target value to distinguish multicast and unicast
   forwarding.
e) Update memory model, memory id and frame_sz in xdpf_clone().

v4: Fix bpf_xdp_redirect_map_multi_proto arg2_type typo

v3: Based on Toke's suggestion, do the following update
a) Update bpf_redirect_map_multi() description in bpf.h.
b) Fix exclude_ifindex checking order in dev_in_exclude_map().
c) Fix one more xdpf clone in dev_map_enqueue_multi().
d) Go find next one in dev_map_enqueue_multi() if the interface is not
   able to forward instead of abort the whole loop.
e) Remove READ_ONCE/WRITE_ONCE for ex_map.

v2: Add new syscall bpf_xdp_redirect_map_multi() which could accept
include/exclude maps directly.
---
 include/linux/bpf.h            |  20 +++++
 include/linux/filter.h         |   1 +
 include/net/xdp.h              |   1 +
 include/uapi/linux/bpf.h       |  27 +++++++
 kernel/bpf/devmap.c            | 132 +++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c          |   6 ++
 net/core/filter.c              | 118 +++++++++++++++++++++++++++--
 net/core/xdp.c                 |  29 ++++++++
 tools/include/uapi/linux/bpf.h |  27 +++++++
 9 files changed, 356 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7850c87456fc..2b4eb7b7f2e2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1419,6 +1419,11 @@ int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
+bool dev_in_exclude_map(struct bpf_dtab_netdev *obj, struct bpf_map *map,
+			int exclude_ifindex);
+int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
+			  struct bpf_map *map, struct bpf_map *ex_map,
+			  u32 flags);
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
 			     struct bpf_prog *xdp_prog);
 bool dev_map_can_have_prog(struct bpf_map *map);
@@ -1587,6 +1592,21 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 	return 0;
 }
 
+static inline
+bool dev_in_exclude_map(struct bpf_dtab_netdev *obj, struct bpf_map *map,
+			int exclude_ifindex)
+{
+	return false;
+}
+
+static inline
+int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
+			  struct bpf_map *map, struct bpf_map *ex_map,
+			  u32 flags)
+{
+	return 0;
+}
+
 struct sk_buff;
 
 static inline int dev_map_generic_redirect(struct bpf_dtab_netdev *dst,
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 29c27656165b..55095e4759b4 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -620,6 +620,7 @@ struct bpf_redirect_info {
 	u32 tgt_index;
 	void *tgt_value;
 	struct bpf_map *map;
+	struct bpf_map *ex_map;
 	u32 kern_flags;
 	struct bpf_nh_params nh;
 };
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 600acb307db6..2d90f641d9ac 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -145,6 +145,7 @@ void xdp_warn(const char *msg, const char *func, const int line);
 #define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
 
 struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
+struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf);
 
 static inline
 void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 77d7c1bb2923..b22e79220df5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3830,6 +3830,27 @@ union bpf_attr {
  * 	Return
  * 		A pointer to a struct socket on success or NULL if the file is
  * 		not a socket.
+ *
+ * long bpf_redirect_map_multi(struct bpf_map *map, struct bpf_map *ex_map, u64 flags)
+ * 	Description
+ * 		This is a multicast implementation for XDP redirect. It will
+ * 		redirect the packet to ALL the interfaces in *map*, but
+ * 		exclude the interfaces in *ex_map*.
+ *
+ * 		The forwarding *map* could be either BPF_MAP_TYPE_DEVMAP or
+ * 		BPF_MAP_TYPE_DEVMAP_HASH. But the *ex_map* must be
+ * 		BPF_MAP_TYPE_DEVMAP_HASH to get better performance.
+ *
+ * 		Currently the *flags* only supports *BPF_F_EXCLUDE_INGRESS*,
+ * 		which additionally excludes the current ingress device.
+ *
+ * 		See also bpf_redirect_map() as a unicast implementation,
+ * 		which supports redirecting packet to a specific ifindex
+ * 		in the map. As both helpers use struct bpf_redirect_info
+ * 		to store the redirect info, we will use a a NULL tgt_value
+ * 		to distinguish multicast and unicast redirecting.
+ * 	Return
+ * 		**XDP_REDIRECT** on success, or **XDP_ABORTED** on error.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3995,6 +4016,7 @@ union bpf_attr {
 	FN(ktime_get_coarse_ns),	\
 	FN(ima_inode_hash),		\
 	FN(sock_from_file),		\
+	FN(redirect_map_multi),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -4171,6 +4193,11 @@ enum {
 	BPF_F_BPRM_SECUREEXEC	= (1ULL << 0),
 };
 
+/* BPF_FUNC_redirect_map_multi flags. */
+enum {
+	BPF_F_EXCLUDE_INGRESS		= (1ULL << 0),
+};
+
 #define __bpf_md_ptr(type, name)	\
 union {					\
 	type name;			\
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 2a83232cf63a..c4ecfa6b2873 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -527,6 +527,138 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 	return __xdp_enqueue(dev, xdp, dev_rx, dst);
 }
 
+/* Use direct call in fast path instead of map->ops->map_get_next_key() */
+static int devmap_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+
+	switch (map->map_type) {
+	case BPF_MAP_TYPE_DEVMAP:
+		return dev_map_get_next_key(map, key, next_key);
+	case BPF_MAP_TYPE_DEVMAP_HASH:
+		return dev_map_hash_get_next_key(map, key, next_key);
+	default:
+		break;
+	}
+
+	return -ENOENT;
+}
+
+bool dev_in_exclude_map(struct bpf_dtab_netdev *obj, struct bpf_map *map,
+			int exclude_ifindex)
+{
+	if (obj->dev->ifindex == exclude_ifindex)
+		return true;
+
+	if (!map)
+		return false;
+
+	return __dev_map_hash_lookup_elem(map, obj->dev->ifindex) != NULL;
+}
+
+static struct bpf_dtab_netdev *devmap_get_next_obj(struct xdp_buff *xdp, struct bpf_map *map,
+						   struct bpf_map *ex_map, u32 *key,
+						   u32 *next_key, int ex_ifindex)
+{
+	struct bpf_dtab_netdev *obj;
+	struct net_device *dev;
+	u32 *tmp_key = key;
+	u32 index;
+	int err;
+
+	err = devmap_get_next_key(map, tmp_key, next_key);
+	if (err)
+		return NULL;
+
+	/* When using dev map hash, we could restart the hashtab traversal
+	 * in case the key has been updated/removed in the mean time.
+	 * So we may end up potentially looping due to traversal restarts
+	 * from first elem.
+	 *
+	 * Let's use map's max_entries to limit the loop number.
+	 */
+	for (index = 0; index < map->max_entries; index++) {
+		switch (map->map_type) {
+		case BPF_MAP_TYPE_DEVMAP:
+			obj = __dev_map_lookup_elem(map, *next_key);
+			break;
+		case BPF_MAP_TYPE_DEVMAP_HASH:
+			obj = __dev_map_hash_lookup_elem(map, *next_key);
+			break;
+		default:
+			break;
+		}
+
+		if (!obj || dev_in_exclude_map(obj, ex_map, ex_ifindex))
+			goto find_next;
+
+		dev = obj->dev;
+
+		if (!dev->netdev_ops->ndo_xdp_xmit)
+			goto find_next;
+
+		err = xdp_ok_fwd_dev(dev, xdp->data_end - xdp->data);
+		if (unlikely(err))
+			goto find_next;
+
+		return obj;
+
+find_next:
+		tmp_key = next_key;
+		err = devmap_get_next_key(map, tmp_key, next_key);
+		if (err)
+			break;
+	}
+
+	return NULL;
+}
+
+int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
+			  struct bpf_map *map, struct bpf_map *ex_map,
+			  u32 flags)
+{
+	struct bpf_dtab_netdev *obj = NULL, *next_obj = NULL;
+	struct xdp_frame *xdpf, *nxdpf;
+	bool last_one = false;
+	int ex_ifindex;
+	u32 key, next_key;
+
+	ex_ifindex = flags & BPF_F_EXCLUDE_INGRESS ? dev_rx->ifindex : 0;
+
+	/* Find first available obj */
+	obj = devmap_get_next_obj(xdp, map, ex_map, NULL, &key, ex_ifindex);
+	if (!obj)
+		return 0;
+
+	xdpf = xdp_convert_buff_to_frame(xdp);
+	if (unlikely(!xdpf))
+		return -EOVERFLOW;
+
+	for (;;) {
+		/* Check if we still have one more available obj */
+		next_obj = devmap_get_next_obj(xdp, map, ex_map, &key,
+					       &next_key, ex_ifindex);
+		if (!next_obj)
+			last_one = true;
+
+		if (last_one) {
+			bq_enqueue(obj->dev, xdpf, dev_rx, obj);
+			return 0;
+		}
+
+		nxdpf = xdpf_clone(xdpf);
+		if (unlikely(!nxdpf)) {
+			xdp_return_frame_rx_napi(xdpf);
+			return -ENOMEM;
+		}
+
+		bq_enqueue(obj->dev, nxdpf, dev_rx, obj);
+
+		/* Deal with next obj */
+		obj = next_obj;
+		key = next_key;
+	}
+}
+
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
 			     struct bpf_prog *xdp_prog)
 {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9f6633c9ea12..fd297b929edd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4420,6 +4420,7 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_MAP_TYPE_DEVMAP:
 	case BPF_MAP_TYPE_DEVMAP_HASH:
 		if (func_id != BPF_FUNC_redirect_map &&
+		    func_id != BPF_FUNC_redirect_map_multi &&
 		    func_id != BPF_FUNC_map_lookup_elem)
 			goto error;
 		break;
@@ -4524,6 +4525,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		    map->map_type != BPF_MAP_TYPE_XSKMAP)
 			goto error;
 		break;
+	case BPF_FUNC_redirect_map_multi:
+		if (map->map_type != BPF_MAP_TYPE_DEVMAP &&
+		    map->map_type != BPF_MAP_TYPE_DEVMAP_HASH)
+			goto error;
+		break;
 	case BPF_FUNC_sk_redirect_map:
 	case BPF_FUNC_msg_redirect_map:
 	case BPF_FUNC_sock_map_update:
diff --git a/net/core/filter.c b/net/core/filter.c
index 255aeee72402..a5d97cb1054b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3924,12 +3924,19 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = {
 };
 
 static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
-			    struct bpf_map *map, struct xdp_buff *xdp)
+			    struct bpf_map *map, struct xdp_buff *xdp,
+			    struct bpf_map *ex_map, u32 flags)
 {
 	switch (map->map_type) {
 	case BPF_MAP_TYPE_DEVMAP:
 	case BPF_MAP_TYPE_DEVMAP_HASH:
-		return dev_map_enqueue(fwd, xdp, dev_rx);
+		/* We use a NULL fwd value to distinguish multicast
+		 * and unicast forwarding
+		 */
+		if (fwd)
+			return dev_map_enqueue(fwd, xdp, dev_rx);
+		else
+			return dev_map_enqueue_multi(xdp, dev_rx, map, ex_map, flags);
 	case BPF_MAP_TYPE_CPUMAP:
 		return cpu_map_enqueue(fwd, xdp, dev_rx);
 	case BPF_MAP_TYPE_XSKMAP:
@@ -3986,12 +3993,14 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 {
 	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
 	struct bpf_map *map = READ_ONCE(ri->map);
+	struct bpf_map *ex_map = ri->ex_map;
 	u32 index = ri->tgt_index;
 	void *fwd = ri->tgt_value;
 	int err;
 
 	ri->tgt_index = 0;
 	ri->tgt_value = NULL;
+	ri->ex_map = NULL;
 	WRITE_ONCE(ri->map, NULL);
 
 	if (unlikely(!map)) {
@@ -4003,7 +4012,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 
 		err = dev_xdp_enqueue(fwd, xdp, dev);
 	} else {
-		err = __bpf_tx_xdp_map(dev, fwd, map, xdp);
+		err = __bpf_tx_xdp_map(dev, fwd, map, xdp, ex_map, ri->flags);
 	}
 
 	if (unlikely(err))
@@ -4017,6 +4026,62 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 }
 EXPORT_SYMBOL_GPL(xdp_do_redirect);
 
+static int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
+				  struct bpf_prog *xdp_prog,
+				  struct bpf_map *map, struct bpf_map *ex_map,
+				  u32 flags)
+
+{
+	struct bpf_dtab_netdev *dst;
+	struct sk_buff *nskb;
+	bool exclude_ingress;
+	u32 key, next_key, index;
+	void *fwd;
+	int err;
+
+	/* Get first key from forward map */
+	err = map->ops->map_get_next_key(map, NULL, &key);
+	if (err)
+		return err;
+
+	exclude_ingress = !!(flags & BPF_F_EXCLUDE_INGRESS);
+
+	/* When using dev map hash, we could restart the hashtab traversal
+	 * in case the key has been updated/removed in the mean time.
+ * So we may end up potentially looping due to traversal restarts + * from first elem. + * + * Let's use map's max_entries to limit the loop number. + */ + for (index = 0; index < map->max_entries; index++) { + fwd = __xdp_map_lookup_elem(map, key); + if (fwd) { + dst = (struct bpf_dtab_netdev *)fwd; + if (dev_in_exclude_map(dst, ex_map, + exclude_ingress ? dev->ifindex : 0)) + goto find_next; + + nskb = skb_clone(skb, GFP_ATOMIC); + if (!nskb) + return -ENOMEM; + + /* Try to forward to the next one no matter whether the + * current forward succeeds or not */ + dev_map_generic_redirect(dst, nskb, xdp_prog); + } + +find_next: + err = map->ops->map_get_next_key(map, &key, &next_key); + if (err) + break; + + key = next_key; + } + + consume_skb(skb); + return 0; +} + static int xdp_do_generic_redirect_map(struct net_device *dev, struct sk_buff *skb, struct xdp_buff *xdp, @@ -4024,19 +4089,30 @@ static int xdp_do_generic_redirect_map(struct net_device *dev, struct bpf_map *map) { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_map *ex_map = ri->ex_map; u32 index = ri->tgt_index; void *fwd = ri->tgt_value; int err = 0; ri->tgt_index = 0; ri->tgt_value = NULL; + ri->ex_map = NULL; WRITE_ONCE(ri->map, NULL); if (map->map_type == BPF_MAP_TYPE_DEVMAP || map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) { - struct bpf_dtab_netdev *dst = fwd; + /* We use a NULL fwd value to distinguish multicast + * and unicast forwarding + */ + if (fwd) { + struct bpf_dtab_netdev *dst = fwd; + + err = dev_map_generic_redirect(dst, skb, xdp_prog); + } else { + err = dev_map_redirect_multi(dev, skb, xdp_prog, map, + ex_map, ri->flags); + } - err = dev_map_generic_redirect(dst, skb, xdp_prog); if (unlikely(err)) goto err; } else if (map->map_type == BPF_MAP_TYPE_XSKMAP) { @@ -4150,6 +4226,36 @@ static const struct bpf_func_proto bpf_xdp_redirect_map_proto = { .arg3_type = ARG_ANYTHING, }; +BPF_CALL_3(bpf_xdp_redirect_map_multi, struct bpf_map *, map, + struct bpf_map *, ex_map, u64, flags)
+{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + + /* Limit ex_map type to DEVMAP_HASH to get better performance */ + if (unlikely((ex_map && ex_map->map_type != BPF_MAP_TYPE_DEVMAP_HASH) || + flags & ~BPF_F_EXCLUDE_INGRESS)) + return XDP_ABORTED; + + ri->tgt_index = 0; + /* Set the tgt_value to NULL to distinguish with bpf_xdp_redirect_map */ + ri->tgt_value = NULL; + ri->flags = flags; + ri->ex_map = ex_map; + + WRITE_ONCE(ri->map, map); + + return XDP_REDIRECT; +} + +static const struct bpf_func_proto bpf_xdp_redirect_map_multi_proto = { + .func = bpf_xdp_redirect_map_multi, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_CONST_MAP_PTR_OR_NULL, + .arg3_type = ARG_ANYTHING, +}; + static unsigned long bpf_skb_copy(void *dst_buff, const void *skb, unsigned long off, unsigned long len) { @@ -7227,6 +7333,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_xdp_redirect_proto; case BPF_FUNC_redirect_map: return &bpf_xdp_redirect_map_proto; + case BPF_FUNC_redirect_map_multi: + return &bpf_xdp_redirect_map_multi_proto; case BPF_FUNC_xdp_adjust_tail: return &bpf_xdp_adjust_tail_proto; case BPF_FUNC_fib_lookup: diff --git a/net/core/xdp.c b/net/core/xdp.c index 3a8c9ab4ecbe..6d86af029dc5 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -513,3 +513,32 @@ void xdp_warn(const char *msg, const char *func, const int line) WARN(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg); }; EXPORT_SYMBOL_GPL(xdp_warn); + +struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf) +{ + unsigned int headroom, totalsize; + struct xdp_frame *nxdpf; + struct page *page; + void *addr; + + headroom = xdpf->headroom + sizeof(*xdpf); + totalsize = headroom + xdpf->len; + + if (unlikely(totalsize > PAGE_SIZE)) + return NULL; + page = dev_alloc_page(); + if (!page) + return NULL; + addr = page_to_virt(page); + + memcpy(addr, xdpf, totalsize); + + nxdpf = addr; + nxdpf->data = addr + 
headroom; + nxdpf->frame_sz = PAGE_SIZE; + nxdpf->mem.type = MEM_TYPE_PAGE_ORDER0; + nxdpf->mem.id = 0; + + return nxdpf; +} +EXPORT_SYMBOL_GPL(xdpf_clone); diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 77d7c1bb2923..b22e79220df5 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -3830,6 +3830,27 @@ union bpf_attr { * Return * A pointer to a struct socket on success or NULL if the file is * not a socket. + * + * long bpf_redirect_map_multi(struct bpf_map *map, struct bpf_map *ex_map, u64 flags) + * Description + * This is a multicast implementation for XDP redirect. It will + * redirect the packet to ALL the interfaces in *map*, but + * exclude the interfaces in *ex_map*. + * + * The forwarding *map* could be either BPF_MAP_TYPE_DEVMAP or + * BPF_MAP_TYPE_DEVMAP_HASH. But the *ex_map* must be + * BPF_MAP_TYPE_DEVMAP_HASH to get better performance. + * + * Currently the *flags* only supports *BPF_F_EXCLUDE_INGRESS*, + * which additionally excludes the current ingress device. + * + * See also bpf_redirect_map() as a unicast implementation, + * which supports redirecting packet to a specific ifindex + * in the map. As both helpers use struct bpf_redirect_info + * to store the redirect info, we will use a a NULL tgt_value + * to distinguish multicast and unicast redirecting. + * Return + * **XDP_REDIRECT** on success, or **XDP_ABORTED** on error. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -3995,6 +4016,7 @@ union bpf_attr { FN(ktime_get_coarse_ns), \ FN(ima_inode_hash), \ FN(sock_from_file), \ + FN(redirect_map_multi), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper @@ -4171,6 +4193,11 @@ enum { BPF_F_BPRM_SECUREEXEC = (1ULL << 0), }; +/* BPF_FUNC_redirect_map_multi flags. 
*/ +enum { + BPF_F_EXCLUDE_INGRESS = (1ULL << 0), +}; + #define __bpf_md_ptr(type, name) \ union { \ type name; \ From patchwork Wed Dec 16 14:30:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hangbin Liu X-Patchwork-Id: 11977703 X-Patchwork-Delegate: bpf@iogearbox.net From: Hangbin Liu To: bpf@vger.kernel.org Cc: netdev@vger.kernel.org, Toke Høiland-Jørgensen , Jiri Benc , Jesper Dangaard Brouer , Eelco Chaudron , ast@kernel.org, Daniel Borkmann , Lorenzo Bianconi , David Ahern , Andrii Nakryiko , Alexei Starovoitov , Hangbin Liu Subject: [PATCHv12 bpf-next 4/6] sample/bpf: add xdp_redirect_map_multicast test
Date: Wed, 16 Dec 2020 22:30:34 +0800 Message-Id: <20201216143036.2296568-5-liuhangbin@gmail.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201216143036.2296568-1-liuhangbin@gmail.com> References: <20200907082724.1721685-1-liuhangbin@gmail.com> <20201216143036.2296568-1-liuhangbin@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This is a sample for XDP multicast: it forwards all packets between the given interfaces. The -X option additionally attaches a second XDP program to the egress interfaces through the devmap; that program rewrites each packet's source MAC address to the MAC address of the egress interface. Signed-off-by: Hangbin Liu --- v12: add support for a devmap xdp_prog on egress v10-v11: no update v9: use NULL directly for arg2 and redefine the maps in BTF format v5: add a null_map, as arg2 had been restricted to ARG_CONST_MAP_PTR. Move the testing part to the bpf selftests in the next patch. v4: no update. v3: add rxcnt map to show the packet transmit speed. v2: no update. --- samples/bpf/Makefile | 3 + samples/bpf/xdp_redirect_map_multi_kern.c | 96 +++++++ samples/bpf/xdp_redirect_map_multi_user.c | 301 ++++++++++++++++++++++ 3 files changed, 400 insertions(+) create mode 100644 samples/bpf/xdp_redirect_map_multi_kern.c create mode 100644 samples/bpf/xdp_redirect_map_multi_user.c diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index 26fc96ca619e..200029fcf53c 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -41,6 +41,7 @@ tprogs-y += test_map_in_map tprogs-y += per_socket_stats_example tprogs-y += xdp_redirect tprogs-y += xdp_redirect_map +tprogs-y += xdp_redirect_map_multi tprogs-y += xdp_redirect_cpu tprogs-y += xdp_monitor tprogs-y += xdp_rxq_info @@ -99,6 +100,7 @@ test_map_in_map-objs := test_map_in_map_user.o per_socket_stats_example-objs := cookie_uid_helper_example.o xdp_redirect-objs := xdp_redirect_user.o xdp_redirect_map-objs := xdp_redirect_map_user.o +xdp_redirect_map_multi-objs := xdp_redirect_map_multi_user.o xdp_redirect_cpu-objs :=
xdp_redirect_cpu_user.o xdp_monitor-objs := xdp_monitor_user.o xdp_rxq_info-objs := xdp_rxq_info_user.o @@ -160,6 +162,7 @@ always-y += tcp_tos_reflect_kern.o always-y += tcp_dumpstats_kern.o always-y += xdp_redirect_kern.o always-y += xdp_redirect_map_kern.o +always-y += xdp_redirect_map_multi_kern.o always-y += xdp_redirect_cpu_kern.o always-y += xdp_monitor_kern.o always-y += xdp_rxq_info_kern.o diff --git a/samples/bpf/xdp_redirect_map_multi_kern.c b/samples/bpf/xdp_redirect_map_multi_kern.c new file mode 100644 index 000000000000..0c63aace9bd2 --- /dev/null +++ b/samples/bpf/xdp_redirect_map_multi_kern.c @@ -0,0 +1,96 @@ +/* SPDX-License-Identifier: GPL-2.0 + * + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ +#define KBUILD_MODNAME "foo" +#include +#include +#include +#include +#include +#include + +struct { + __uint(type, BPF_MAP_TYPE_DEVMAP_HASH); + __uint(key_size, sizeof(int)); + __uint(value_size, sizeof(int)); + __uint(max_entries, 32); +} forward_map_general SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_DEVMAP_HASH); + __uint(key_size, sizeof(int)); + __uint(value_size, sizeof(struct bpf_devmap_val)); + __uint(max_entries, 32); +} forward_map_native SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY); + __type(key, u32); + __type(value, long); + __uint(max_entries, 1); +} rxcnt SEC(".maps"); + +/* map to store the egress interfaces' MAC addresses; set the + * max_entries to 1 and extend it in the user space prog.
+ */ +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __type(key, u32); + __type(value, __be64); + __uint(max_entries, 1); +} mac_map SEC(".maps"); + +static int xdp_redirect_map(struct xdp_md *ctx, void *forward_map) +{ + long *value; + u32 key = 0; + + /* count packet in global counter */ + value = bpf_map_lookup_elem(&rxcnt, &key); + if (value) + *value += 1; + + return bpf_redirect_map_multi(forward_map, NULL, BPF_F_EXCLUDE_INGRESS); +} + +SEC("xdp_redirect_general") +int xdp_redirect_map_general(struct xdp_md *ctx) +{ + return xdp_redirect_map(ctx, &forward_map_general); +} + +SEC("xdp_redirect_native") +int xdp_redirect_map_native(struct xdp_md *ctx) +{ + return xdp_redirect_map(ctx, &forward_map_native); +} + +SEC("xdp_devmap/map_prog") +int xdp_devmap_prog(struct xdp_md *ctx) +{ + void *data_end = (void *)(long)ctx->data_end; + void *data = (void *)(long)ctx->data; + u32 key = ctx->egress_ifindex; + struct ethhdr *eth = data; + __be64 *mac; + u64 nh_off; + + nh_off = sizeof(*eth); + if (data + nh_off > data_end) + return XDP_DROP; + + mac = bpf_map_lookup_elem(&mac_map, &key); + if (mac) + __builtin_memcpy(eth->h_source, mac, ETH_ALEN); + + return XDP_PASS; +} + +char _license[] SEC("license") = "GPL"; diff --git a/samples/bpf/xdp_redirect_map_multi_user.c b/samples/bpf/xdp_redirect_map_multi_user.c new file mode 100644 index 000000000000..67ffc294567c --- /dev/null +++ b/samples/bpf/xdp_redirect_map_multi_user.c @@ -0,0 +1,301 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "bpf_util.h" +#include +#include + +#define MAX_IFACE_NUM 32 + +static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST; +static int ifaces[MAX_IFACE_NUM] = {}; +static int rxcnt_map_fd; + +static void int_exit(int sig) +{ + __u32 prog_id = 0; + int i; + + for (i = 0; ifaces[i] > 0; i++) { + if (bpf_get_link_xdp_id(ifaces[i], 
&prog_id, xdp_flags)) { + printf("bpf_get_link_xdp_id failed\n"); + exit(1); + } + if (prog_id) + bpf_set_link_xdp_fd(ifaces[i], -1, xdp_flags); + } + + exit(0); +} + +static void poll_stats(int interval) +{ + unsigned int nr_cpus = bpf_num_possible_cpus(); + __u64 values[nr_cpus], prev[nr_cpus]; + + memset(prev, 0, sizeof(prev)); + + while (1) { + __u64 sum = 0; + __u32 key = 0; + int i; + + sleep(interval); + assert(bpf_map_lookup_elem(rxcnt_map_fd, &key, values) == 0); + for (i = 0; i < nr_cpus; i++) + sum += (values[i] - prev[i]); + if (sum) + printf("Forwarding %10llu pkt/s\n", sum / interval); + memcpy(prev, values, sizeof(values)); + } +} + +static int get_mac_addr(unsigned int ifindex, void *mac_addr) +{ + char ifname[IF_NAMESIZE]; + struct ifreq ifr; + int fd, ret = -1; + + fd = socket(AF_INET, SOCK_DGRAM, 0); + if (fd < 0) + return ret; + + if (!if_indextoname(ifindex, ifname)) + goto err_out; + + strcpy(ifr.ifr_name, ifname); + + if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0) + goto err_out; + + memcpy(mac_addr, ifr.ifr_hwaddr.sa_data, 6 * sizeof(char)); + ret = 0; + +err_out: + close(fd); + return ret; +} + +static int update_mac_map(struct bpf_object *obj) +{ + int i, ret = -1, mac_map_fd; + unsigned char mac_addr[8] = {}; /* sized to match the __be64 map value */ + unsigned int ifindex; + + mac_map_fd = bpf_object__find_map_fd_by_name(obj, "mac_map"); + if (mac_map_fd < 0) { + printf("find mac map fd failed\n"); + return ret; + } + + for (i = 0; ifaces[i] > 0; i++) { + ifindex = ifaces[i]; + + ret = get_mac_addr(ifindex, mac_addr); + if (ret < 0) { + printf("get interface %u mac failed\n", ifindex); + return ret; + } + + ret = bpf_map_update_elem(mac_map_fd, &ifindex, mac_addr, 0); + if (ret) { + perror("bpf_update_elem mac_map_fd"); + return ret; + } + } + + return 0; +} + +static void usage(const char *prog) +{ + fprintf(stderr, + "usage: %s [OPTS] ...\n" + "OPTS:\n" + " -S use skb-mode\n" + " -N enforce native mode\n" + " -F force loading prog\n" + " -X load xdp program on egress\n", + prog); +} +
+int main(int argc, char **argv) +{ + int i, ret, opt, forward_map_fd, max_ifindex = 0; + struct bpf_program *ingress_prog, *egress_prog; + int ingress_prog_fd, egress_prog_fd = -1; + struct bpf_devmap_val devmap_val; + bool attach_egress_prog = false; + char ifname[IF_NAMESIZE]; + struct bpf_map *mac_map; + struct bpf_object *obj; + unsigned int ifindex; + char filename[256]; + + while ((opt = getopt(argc, argv, "SNFX")) != -1) { + switch (opt) { + case 'S': + xdp_flags |= XDP_FLAGS_SKB_MODE; + break; + case 'N': + /* default, set below */ + break; + case 'F': + xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST; + break; + case 'X': + attach_egress_prog = true; + break; + default: + usage(basename(argv[0])); + return 1; + } + } + + if (!(xdp_flags & XDP_FLAGS_SKB_MODE)) { + xdp_flags |= XDP_FLAGS_DRV_MODE; + } else if (attach_egress_prog) { + printf("Load xdp program on egress with SKB mode not supported yet\n"); + return 1; + } + + if (optind == argc) { + printf("usage: %s ...\n", argv[0]); + return 1; + } + + printf("Get interfaces"); + for (i = 0; i < MAX_IFACE_NUM && argv[optind + i]; i++) { + ifaces[i] = if_nametoindex(argv[optind + i]); + if (!ifaces[i]) + ifaces[i] = strtoul(argv[optind + i], NULL, 0); + if (!if_indextoname(ifaces[i], ifname)) { + perror("Invalid interface name or index"); + return 1; + } + + /* Find the largest index number */ + if (ifaces[i] > max_ifindex) + max_ifindex = ifaces[i]; + + printf(" %d", ifaces[i]); + } + printf("\n"); + + snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); + + obj = bpf_object__open(filename); + if (libbpf_get_error(obj)) { + printf("ERROR: opening BPF object file failed\n"); + obj = NULL; + goto err_out; + } + + /* Reset the map size to max ifindex + 1 */ + if (attach_egress_prog) { + mac_map = bpf_object__find_map_by_name(obj, "mac_map"); + ret = bpf_map__resize(mac_map, max_ifindex + 1); + if (ret < 0) { + printf("ERROR: reset mac map size failed\n"); + goto err_out; + } + } + + /* load BPF program */ + if
(bpf_object__load(obj)) { + printf("ERROR: loading BPF object file failed\n"); + goto err_out; + } + + if (xdp_flags & XDP_FLAGS_SKB_MODE) { + ingress_prog = bpf_object__find_program_by_title(obj, "xdp_redirect_general"); + forward_map_fd = bpf_object__find_map_fd_by_name(obj, "forward_map_general"); + } else { + ingress_prog = bpf_object__find_program_by_title(obj, "xdp_redirect_native"); + forward_map_fd = bpf_object__find_map_fd_by_name(obj, "forward_map_native"); + } + if (!ingress_prog || forward_map_fd < 0) { + printf("finding ingress_prog/forward_map in obj file failed\n"); + goto err_out; + } + + ingress_prog_fd = bpf_program__fd(ingress_prog); + if (ingress_prog_fd < 0) { + printf("find ingress_prog fd failed\n"); + goto err_out; + } + + rxcnt_map_fd = bpf_object__find_map_fd_by_name(obj, "rxcnt"); + if (rxcnt_map_fd < 0) { + printf("bpf_object__find_map_fd_by_name failed\n"); + goto err_out; + } + + if (attach_egress_prog) { + /* Update mac_map with all egress interfaces' mac addr */ + if (update_mac_map(obj) < 0) { + printf("Error: update mac map failed\n"); + goto err_out; + } + + /* Find egress prog fd */ + egress_prog = bpf_object__find_program_by_title(obj, "xdp_devmap/map_prog"); + if (!egress_prog) { + printf("finding egress_prog in obj file failed\n"); + goto err_out; + } + egress_prog_fd = bpf_program__fd(egress_prog); + if (egress_prog_fd < 0) { + printf("find egress_prog fd failed\n"); + goto err_out; + } + } + + /* Remove attached program when program is interrupted or killed */ + signal(SIGINT, int_exit); + signal(SIGTERM, int_exit); + + /* Init forward multicast groups */ + for (i = 0; ifaces[i] > 0; i++) { + ifindex = ifaces[i]; + + /* bind prog_fd to each interface */ + ret = bpf_set_link_xdp_fd(ifindex, ingress_prog_fd, xdp_flags); + if (ret) { + printf("Set xdp fd failed on %d\n", ifindex); + goto err_out; + } + + /* Add all the interfaces to forward group and attach + * egress devmap program if it exists */ + devmap_val.ifindex = ifindex; +
devmap_val.bpf_prog.fd = egress_prog_fd; + ret = bpf_map_update_elem(forward_map_fd, &ifindex, &devmap_val, 0); + if (ret) { + perror("bpf_map_update_elem forward_map"); + goto err_out; + } + } + + poll_stats(2); + + return 0; + +err_out: + return 1; +} From patchwork Wed Dec 16 14:30:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hangbin Liu X-Patchwork-Id: 11977705 X-Patchwork-Delegate: bpf@iogearbox.net From: Hangbin Liu To: bpf@vger.kernel.org Cc: netdev@vger.kernel.org, Toke Høiland-Jørgensen , Jiri Benc , Jesper Dangaard Brouer , Eelco Chaudron , ast@kernel.org, Daniel Borkmann , Lorenzo Bianconi , David Ahern , Andrii Nakryiko ,
Alexei Starovoitov , Hangbin Liu Subject: [PATCHv12 bpf-next 5/6] selftests/bpf: Add verifier tests for bpf arg ARG_CONST_MAP_PTR_OR_NULL Date: Wed, 16 Dec 2020 22:30:35 +0800 Message-Id: <20201216143036.2296568-6-liuhangbin@gmail.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201216143036.2296568-1-liuhangbin@gmail.com> References: <20200907082724.1721685-1-liuhangbin@gmail.com> <20201216143036.2296568-1-liuhangbin@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Use the helpers bpf_redirect_map() and bpf_redirect_map_multi() to test the bpf arg types ARG_CONST_MAP_PTR and ARG_CONST_MAP_PTR_OR_NULL, making sure the map arg is verified correctly when it is NULL or a valid map pointer. Add devmap and devmap_hash fixups to struct bpf_test, since these helpers need a redirect-capable map (bpf_redirect_map_multi() accepts only DEVMAP and DEVMAP_HASH maps). Test result: ]# ./test_verifier 713 716 #713/p ARG_CONST_MAP_PTR: null pointer OK #714/p ARG_CONST_MAP_PTR: valid map pointer OK #715/p ARG_CONST_MAP_PTR_OR_NULL: null pointer for ex_map OK #716/p ARG_CONST_MAP_PTR_OR_NULL: valid map pointer for ex_map OK Summary: 4 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Hangbin Liu --- tools/testing/selftests/bpf/test_verifier.c | 22 +++++- .../testing/selftests/bpf/verifier/map_ptr.c | 70 +++++++++++++++++++ 2 files changed, 91 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 777a81404fdb..17eb3958ce6d 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -50,7 +50,7 @@ #define MAX_INSNS BPF_MAXINSNS #define MAX_TEST_INSNS 1000000 #define MAX_FIXUPS 8 -#define MAX_NR_MAPS 20 +#define MAX_NR_MAPS 22 #define MAX_TEST_RUNS 8 #define POINTER_VALUE 0xcafe4all #define TEST_DATA_LEN 64 @@ -87,6 +87,8 @@ struct bpf_test { int fixup_sk_storage_map[MAX_FIXUPS]; int fixup_map_event_output[MAX_FIXUPS]; int fixup_map_reuseport_array[MAX_FIXUPS]; + int
fixup_map_devmap[MAX_FIXUPS];
+	int fixup_map_devmap_hash[MAX_FIXUPS];
 	const char *errstr;
 	const char *errstr_unpriv;
 	uint32_t insn_processed;
@@ -640,6 +642,8 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 	int *fixup_sk_storage_map = test->fixup_sk_storage_map;
 	int *fixup_map_event_output = test->fixup_map_event_output;
 	int *fixup_map_reuseport_array = test->fixup_map_reuseport_array;
+	int *fixup_map_devmap = test->fixup_map_devmap;
+	int *fixup_map_devmap_hash = test->fixup_map_devmap_hash;
 
 	if (test->fill_helper) {
 		test->fill_insns = calloc(MAX_TEST_INSNS, sizeof(struct bpf_insn));
@@ -817,6 +821,22 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 			fixup_map_reuseport_array++;
 		} while (*fixup_map_reuseport_array);
 	}
+	if (*fixup_map_devmap) {
+		map_fds[20] = __create_map(BPF_MAP_TYPE_DEVMAP,
+					   sizeof(u32), sizeof(u32), 1, 0);
+		do {
+			prog[*fixup_map_devmap].imm = map_fds[20];
+			fixup_map_devmap++;
+		} while (*fixup_map_devmap);
+	}
+	if (*fixup_map_devmap_hash) {
+		map_fds[21] = __create_map(BPF_MAP_TYPE_DEVMAP_HASH,
+					   sizeof(u32), sizeof(u32), 1, 0);
+		do {
+			prog[*fixup_map_devmap_hash].imm = map_fds[21];
+			fixup_map_devmap_hash++;
+		} while (*fixup_map_devmap_hash);
+	}
 }
 
 struct libcap {
diff --git a/tools/testing/selftests/bpf/verifier/map_ptr.c b/tools/testing/selftests/bpf/verifier/map_ptr.c
index b117bdd3806d..1a532198c9c1 100644
--- a/tools/testing/selftests/bpf/verifier/map_ptr.c
+++ b/tools/testing/selftests/bpf/verifier/map_ptr.c
@@ -93,3 +93,73 @@
 	.fixup_map_hash_16b = { 4 },
 	.result = ACCEPT,
 },
+{
+	"ARG_CONST_MAP_PTR: null pointer",
+	.insns = {
+		/* bpf_redirect_map arg1 (map) */
+		BPF_MOV64_IMM(BPF_REG_1, 0),
+		/* bpf_redirect_map arg2 (ifindex) */
+		BPF_MOV64_IMM(BPF_REG_2, 0),
+		/* bpf_redirect_map arg3 (flags) */
+		BPF_MOV64_IMM(BPF_REG_3, 0),
+		BPF_EMIT_CALL(BPF_FUNC_redirect_map),
+		BPF_EXIT_INSN(),
+	},
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_XDP,
+	.errstr = "R1 type=inv expected=map_ptr",
+},
+{
+	"ARG_CONST_MAP_PTR: valid map pointer",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_1, 0),
+		/* bpf_redirect_map arg1 (map) */
+		BPF_LD_MAP_FD(BPF_REG_1, 0),
+		/* bpf_redirect_map arg2 (ifindex) */
+		BPF_MOV64_IMM(BPF_REG_2, 0),
+		/* bpf_redirect_map arg3 (flags) */
+		BPF_MOV64_IMM(BPF_REG_3, 0),
+		BPF_EMIT_CALL(BPF_FUNC_redirect_map),
+		BPF_EXIT_INSN(),
+	},
+	.fixup_map_devmap = { 1 },
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_XDP,
+},
+{
+	"ARG_CONST_MAP_PTR_OR_NULL: null pointer for ex_map",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_1, 0),
+		/* bpf_redirect_map_multi arg1 (in_map) */
+		BPF_LD_MAP_FD(BPF_REG_1, 0),
+		/* bpf_redirect_map_multi arg2 (ex_map) */
+		BPF_MOV64_IMM(BPF_REG_2, 0),
+		/* bpf_redirect_map_multi arg3 (flags) */
+		BPF_MOV64_IMM(BPF_REG_3, 0),
+		BPF_EMIT_CALL(BPF_FUNC_redirect_map_multi),
+		BPF_EXIT_INSN(),
+	},
+	.fixup_map_devmap = { 1 },
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_XDP,
+	.retval = 4,
+},
+{
+	"ARG_CONST_MAP_PTR_OR_NULL: valid map pointer for ex_map",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_1, 0),
+		/* bpf_redirect_map_multi arg1 (in_map) */
+		BPF_LD_MAP_FD(BPF_REG_1, 0),
+		/* bpf_redirect_map_multi arg2 (ex_map) */
+		BPF_LD_MAP_FD(BPF_REG_2, 1),
+		/* bpf_redirect_map_multi arg3 (flags) */
+		BPF_MOV64_IMM(BPF_REG_3, 0),
+		BPF_EMIT_CALL(BPF_FUNC_redirect_map_multi),
+		BPF_EXIT_INSN(),
+	},
+	.fixup_map_devmap = { 1 },
+	.fixup_map_devmap_hash = { 3 },
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_XDP,
+	.retval = 4,
+},

From patchwork Wed Dec 16 14:30:36 2020
X-Patchwork-Submitter: Hangbin Liu
X-Patchwork-Id: 11977707
X-Patchwork-Delegate: bpf@iogearbox.net
From: Hangbin Liu
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, Toke Høiland-Jørgensen, Jiri Benc, Jesper Dangaard Brouer, Eelco Chaudron, ast@kernel.org, Daniel Borkmann, Lorenzo Bianconi, David Ahern, Andrii Nakryiko, Alexei Starovoitov, Hangbin Liu
Subject: [PATCHv12 bpf-next 6/6] selftests/bpf: add xdp_redirect_multi test
Date: Wed, 16 Dec 2020 22:30:36 +0800
Message-Id: <20201216143036.2296568-7-liuhangbin@gmail.com>
In-Reply-To: <20201216143036.2296568-1-liuhangbin@gmail.com>
References: <20200907082724.1721685-1-liuhangbin@gmail.com> <20201216143036.2296568-1-liuhangbin@gmail.com>
X-Mailing-List: bpf@vger.kernel.org

Add a bpf selftest for the new helper bpf_redirect_map_multi(). The test
uses 3 forward groups and 1 exclude group.
The test redirects each interface's packets to all the interfaces in the
forward group, except the interfaces in the exclude map and (because
BPF_F_EXCLUDE_INGRESS is set) the ingress interface itself. Both DEVMAP
and DEVMAP_HASH are tested, in xdp generic and native (drv) mode. In
native mode, a devmap egress program that rewrites the source MAC to the
egress interface's MAC is also tested. See the test script for more
details.

Here is the test result.

]# ./test_xdp_redirect_multi.sh
Pass: xdpgeneric arp ns1-2
Pass: xdpgeneric arp ns1-3
Pass: xdpgeneric arp ns1-4
Pass: xdpgeneric ping ns1-2
Pass: xdpgeneric ping ns1-3
Pass: xdpgeneric ping ns1-4
Pass: xdpgeneric ping6 ns2-1
Pass: xdpgeneric ping6 ns2-3
Pass: xdpgeneric ping6 ns2-4
Pass: xdpdrv arp ns1-2
Pass: xdpdrv arp ns1-3
Pass: xdpdrv arp ns1-4
Pass: xdpdrv ping ns1-2
Pass: xdpdrv ping ns1-3
Pass: xdpdrv ping ns1-4
Pass: xdpdrv ping6 ns2-1
Pass: xdpdrv ping6 ns2-3
Pass: xdpdrv ping6 ns2-4
Pass: xdpegress mac ns1-2
Pass: xdpegress mac ns1-3
Pass: xdpegress mac ns1-4
Pass: xdpegress ping ns1-2
Pass: xdpegress ping ns1-3
Pass: xdpegress ping ns1-4
Summary: PASS 24, FAIL 0

Signed-off-by: Hangbin Liu
---
v12: add devmap prog test on egress
v9: use NULL directly for arg2 and redefine the maps with btf format
---
 tools/testing/selftests/bpf/Makefile          |   3 +-
 .../bpf/progs/xdp_redirect_multi_kern.c       | 120 ++++++++
 .../selftests/bpf/test_xdp_redirect_multi.sh  | 208 ++++++++++++++
 .../selftests/bpf/xdp_redirect_multi.c        | 258 ++++++++++++++++++
 4 files changed, 588 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c
 create mode 100755 tools/testing/selftests/bpf/test_xdp_redirect_multi.sh
 create mode 100644 tools/testing/selftests/bpf/xdp_redirect_multi.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 8c33e999319a..20b9481c679f 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -51,6 +51,7 @@ TEST_FILES = test_lwt_ip_encap.o \
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
 	test_xdp_redirect.sh \
+	test_xdp_redirect_multi.sh \
 	test_xdp_meta.sh \
 	test_xdp_veth.sh \
 	test_offload.py \
@@ -80,7 +81,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \
 TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \
 	flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
 	test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \
-	xdpxceiver
+	xdpxceiver xdp_redirect_multi
 
 TEST_CUSTOM_PROGS = urandom_read
 
diff --git a/tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c b/tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c
new file mode 100644
index 000000000000..a9785b28175a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c
@@ -0,0 +1,120 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#define KBUILD_MODNAME "foo"
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+struct {
+	__uint(type, BPF_MAP_TYPE_DEVMAP);
+	__uint(key_size, sizeof(int));
+	__uint(value_size, sizeof(int));
+	__uint(max_entries, 128);
+} forward_map_v4 SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
+	__uint(key_size, sizeof(int));
+	__uint(value_size, sizeof(int));
+	__uint(max_entries, 128);
+} forward_map_v6 SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
+	__uint(key_size, sizeof(int));
+	__uint(value_size, sizeof(int));
+	__uint(max_entries, 128);
+} forward_map_all SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
+	__uint(key_size, sizeof(int));
+	__uint(value_size, sizeof(struct bpf_devmap_val));
+	__uint(max_entries, 128);
+} forward_map_egress SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
+	__uint(key_size, sizeof(int));
+	__uint(value_size, sizeof(int));
+	__uint(max_entries, 128);
+} exclude_map SEC(".maps");
+
+/* map to store the egress interfaces' MAC addresses */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, __u32);
+	__type(value, __be64);
+	__uint(max_entries, 128);
+} mac_map SEC(".maps");
+
+SEC("xdp_redirect_map_multi")
+int xdp_redirect_map_multi_prog(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data = (void *)(long)ctx->data;
+	struct ethhdr *eth = data;
+	__u16 h_proto;
+	__u64 nh_off;
+
+	nh_off = sizeof(*eth);
+	if (data + nh_off > data_end)
+		return XDP_DROP;
+
+	h_proto = eth->h_proto;
+
+	if (h_proto == bpf_htons(ETH_P_IP))
+		return bpf_redirect_map_multi(&forward_map_v4, &exclude_map,
+					      BPF_F_EXCLUDE_INGRESS);
+	else if (h_proto == bpf_htons(ETH_P_IPV6))
+		return bpf_redirect_map_multi(&forward_map_v6, &exclude_map,
+					      BPF_F_EXCLUDE_INGRESS);
+	else
+		return bpf_redirect_map_multi(&forward_map_all, NULL,
+					      BPF_F_EXCLUDE_INGRESS);
+}
+
+/* The following 2 progs are for 2nd devmap prog testing */
+SEC("xdp_redirect_map_ingress")
+int xdp_redirect_map_all_prog(struct xdp_md *ctx)
+{
+	return bpf_redirect_map_multi(&forward_map_egress, NULL, BPF_F_EXCLUDE_INGRESS);
+}
+
+SEC("xdp_devmap/map_prog")
+int xdp_devmap_prog(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data = (void *)(long)ctx->data;
+	__u32 key = ctx->egress_ifindex;
+	struct ethhdr *eth = data;
+	__u64 nh_off;
+	__be64 *mac;
+
+	nh_off = sizeof(*eth);
+	if (data + nh_off > data_end)
+		return XDP_DROP;
+
+	mac = bpf_map_lookup_elem(&mac_map, &key);
+	if (mac)
+		__builtin_memcpy(eth->h_source, mac, ETH_ALEN);
+
+	return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_xdp_redirect_multi.sh b/tools/testing/selftests/bpf/test_xdp_redirect_multi.sh
new file mode 100755
index 000000000000..6503751fdca5
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_xdp_redirect_multi.sh
@@ -0,0 +1,208 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test topology:
+#     - - - - - - - - - - - - - - - - - - - - - - - - -
+#    | veth1         veth2         veth3         veth4 |  ... init net
+#     - -| - - - - - - | - - - - - - | - - - - - - | - -
+#    ---------     ---------     ---------     ---------
+#    | veth0 |     | veth0 |     | veth0 |     | veth0 |  ...
+#    ---------     ---------     ---------     ---------
+#       ns1           ns2           ns3           ns4
+#
+# Forward maps:
+#     Forward map_all has interfaces: veth1, veth2, veth3, veth4, ... (All traffic except IPv4, IPv6)
+#     Forward map_v4 has interfaces: veth1, veth3, veth4, ... (For IPv4 traffic only)
+#     Forward map_v6 has interfaces: veth2, veth3, veth4, ... (For IPv6 traffic only)
+#     Forward map_egress has all interfaces and redirects all packets
+# Exclude Groups:
+#     Exclude map: veth3 (assume ns3 is in black list)
+# Map type:
+#     map_v4 uses DEVMAP, the others use DEVMAP_HASH
+#
+# Test modules:
+# XDP modes: generic, native, native + egress_prog
+#
+# Test cases:
+#     ARP (ARP is not blocked for ns3):
+#         ns1 -> gw: ns2, ns3, ns4 should receive the arp request
+#     IPv4:
+#         ns1 -> ns2 (block), ns1 -> ns3 (block), ns1 -> ns4 (pass)
+#     IPv6:
+#         ns2 -> ns1 (block), ns2 -> ns3 (block), ns2 -> ns4 (pass)
+#     egress_prog:
+#         all ping tests should pass, and the src mac should be the egress interface's mac
+#
+
+
+# netns numbers
+NUM=4
+IFACES=""
+DRV_MODE="xdpgeneric xdpdrv xdpegress"
+PASS=0
+FAIL=0
+
+test_pass()
+{
+	echo "Pass: $@"
+	PASS=$((PASS + 1))
+}
+
+test_fail()
+{
+	echo "Fail: $@"
+	FAIL=$((FAIL + 1))
+}
+
+clean_up()
+{
+	for i in $(seq $NUM); do
+		ip link del veth$i 2> /dev/null
+		ip netns del ns$i 2> /dev/null
+	done
+}
+
+# Kselftest framework requirement - SKIP code is 4.
+check_env()
+{
+	ip link set dev lo xdpgeneric off &>/dev/null
+	if [ $? -ne 0 ];then
+		echo "selftests: [SKIP] Could not run test without the ip xdpgeneric support"
+		exit 4
+	fi
+
+	which tcpdump &>/dev/null
+	if [ $? -ne 0 ];then
+		echo "selftests: [SKIP] Could not run test without tcpdump"
+		exit 4
+	fi
+}
+
+setup_ns()
+{
+	local mode=$1
+	IFACES=""
+
+	if [ "$mode" = "xdpegress" ]; then
+		mode="xdpdrv"
+	fi
+
+	for i in $(seq $NUM); do
+		ip netns add ns$i
+		ip link add veth$i type veth peer name veth0 netns ns$i
+		ip link set veth$i up
+		ip -n ns$i link set veth0 up
+
+		ip -n ns$i addr add 192.0.2.$i/24 dev veth0
+		ip -n ns$i addr add 2001:db8::$i/64 dev veth0
+		ip -n ns$i link set veth0 $mode obj \
+			xdp_dummy.o sec xdp_dummy &> /dev/null || \
+			{ test_fail "Unable to load dummy xdp" && exit 1; }
+		IFACES="$IFACES veth$i"
+		veth_mac[$i]=$(ip link show veth$i | awk '/link\/ether/ {print $2}')
+	done
+}
+
+do_egress_tests()
+{
+	local mode=$1
+
+	# mac test
+	ip netns exec ns2 tcpdump -e -i veth0 -nn -l -e &> mac_ns1-2_${mode}.log &
+	ip netns exec ns3 tcpdump -e -i veth0 -nn -l -e &> mac_ns1-3_${mode}.log &
+	ip netns exec ns4 tcpdump -e -i veth0 -nn -l -e &> mac_ns1-4_${mode}.log &
+	ip netns exec ns1 ping 192.0.2.254 -c 4 &> /dev/null
+	sleep 2
+	pkill -9 tcpdump
+
+	# mac check
+	grep -q "${veth_mac[2]} > ff:ff:ff:ff:ff:ff" mac_ns1-2_${mode}.log && \
+		test_pass "$mode mac ns1-2" || test_fail "$mode mac ns1-2"
+	grep -q "${veth_mac[3]} > ff:ff:ff:ff:ff:ff" mac_ns1-3_${mode}.log && \
+		test_pass "$mode mac ns1-3" || test_fail "$mode mac ns1-3"
+	grep -q "${veth_mac[4]} > ff:ff:ff:ff:ff:ff" mac_ns1-4_${mode}.log && \
+		test_pass "$mode mac ns1-4" || test_fail "$mode mac ns1-4"
+
+	# ping test
+	ip netns exec ns1 ping 192.0.2.2 -c 4 &> /dev/null && \
+		test_pass "$mode ping ns1-2" || test_fail "$mode ping ns1-2"
+	ip netns exec ns1 ping 192.0.2.3 -c 4 &> /dev/null && \
+		test_pass "$mode ping ns1-3" || test_fail "$mode ping ns1-3"
+	ip netns exec ns1 ping 192.0.2.4 -c 4 &> /dev/null && \
+		test_pass "$mode ping ns1-4" || test_fail "$mode ping ns1-4"
+}
+
+do_ping_tests()
+{
+	local mode=$1
+
+	# arp test
+	ip netns exec ns2 tcpdump -i veth0 -nn -l -e &> arp_ns1-2_${mode}.log &
+	ip netns exec ns3 tcpdump -i veth0 -nn -l -e &> arp_ns1-3_${mode}.log &
+	ip netns exec ns4 tcpdump -i veth0 -nn -l -e &> arp_ns1-4_${mode}.log &
+	ip netns exec ns1 ping 192.0.2.254 -c 4 &> /dev/null
+	sleep 2
+	pkill -9 tcpdump
+	grep -q "Request who-has 192.0.2.254 tell 192.0.2.1" arp_ns1-2_${mode}.log && \
+		test_pass "$mode arp ns1-2" || test_fail "$mode arp ns1-2"
+	grep -q "Request who-has 192.0.2.254 tell 192.0.2.1" arp_ns1-3_${mode}.log && \
+		test_pass "$mode arp ns1-3" || test_fail "$mode arp ns1-3"
+	grep -q "Request who-has 192.0.2.254 tell 192.0.2.1" arp_ns1-4_${mode}.log && \
+		test_pass "$mode arp ns1-4" || test_fail "$mode arp ns1-4"
+
+	# ping test
+	ip netns exec ns1 ping 192.0.2.2 -c 4 &> /dev/null && \
+		test_fail "$mode ping ns1-2" || test_pass "$mode ping ns1-2"
+	ip netns exec ns1 ping 192.0.2.3 -c 4 &> /dev/null && \
+		test_fail "$mode ping ns1-3" || test_pass "$mode ping ns1-3"
+	ip netns exec ns1 ping 192.0.2.4 -c 4 &> /dev/null && \
+		test_pass "$mode ping ns1-4" || test_fail "$mode ping ns1-4"
+
+	# ping6 test
+	ip netns exec ns2 ping6 2001:db8::1 -c 4 &> /dev/null && \
+		test_fail "$mode ping6 ns2-1" || test_pass "$mode ping6 ns2-1"
+	ip netns exec ns2 ping6 2001:db8::3 -c 4 &> /dev/null && \
+		test_fail "$mode ping6 ns2-3" || test_pass "$mode ping6 ns2-3"
+	ip netns exec ns2 ping6 2001:db8::4 -c 4 &> /dev/null && \
+		test_pass "$mode ping6 ns2-4" || test_fail "$mode ping6 ns2-4"
+}
+
+do_tests()
+{
+	local mode=$1
+	local drv_p
+
+	case ${mode} in
+		xdpdrv) drv_p="-N";;
+		xdpegress) drv_p="-X";;
+		xdpgeneric) drv_p="-S";;
+	esac
+
+	./xdp_redirect_multi $drv_p $IFACES &> xdp_redirect_${mode}.log &
+	xdp_pid=$!
+	sleep 10
+
+	if [ "$mode" = "xdpegress" ]; then
+		do_egress_tests $mode
+	else
+		do_ping_tests $mode
+	fi
+
+	kill $xdp_pid
+}
+
+trap clean_up 0 2 3 6 9
+
+check_env
+rm -f xdp_redirect_*.log arp_ns*.log mac_ns*.log
+
+for mode in ${DRV_MODE}; do
+	setup_ns $mode
+	do_tests $mode
+	sleep 10
+	clean_up
+	sleep 5
+done
+
+echo "Summary: PASS $PASS, FAIL $FAIL"
+[ $FAIL -eq 0 ] && exit 0 || exit 1
diff --git a/tools/testing/selftests/bpf/xdp_redirect_multi.c b/tools/testing/selftests/bpf/xdp_redirect_multi.c
new file mode 100644
index 000000000000..0a39dfe3246c
--- /dev/null
+++ b/tools/testing/selftests/bpf/xdp_redirect_multi.c
@@ -0,0 +1,258 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "bpf_util.h"
+#include
+#include
+
+#define MAX_IFACE_NUM 32
+
+static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+static int ifaces[MAX_IFACE_NUM + 1] = {}; /* +1 keeps a 0 terminator */
+
+static void int_exit(int sig)
+{
+	__u32 prog_id = 0;
+	int i;
+
+	for (i = 0; ifaces[i] > 0; i++) {
+		if (bpf_get_link_xdp_id(ifaces[i], &prog_id, xdp_flags)) {
+			printf("bpf_get_link_xdp_id failed\n");
+			exit(1);
+		}
+		if (prog_id)
+			bpf_set_link_xdp_fd(ifaces[i], -1, xdp_flags);
+	}
+
+	exit(0);
+}
+
+static int get_mac_addr(unsigned int ifindex, void *mac_addr)
+{
+	char ifname[IF_NAMESIZE];
+	struct ifreq ifr;
+	int fd, ret = -1;
+
+	fd = socket(AF_INET, SOCK_DGRAM, 0);
+	if (fd < 0)
+		return ret;
+
+	if (!if_indextoname(ifindex, ifname))
+		goto err_out;
+
+	strcpy(ifr.ifr_name, ifname);
+
+	if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0)
+		goto err_out;
+
+	memcpy(mac_addr, ifr.ifr_hwaddr.sa_data, 6 * sizeof(char));
+	ret = 0;
+
+err_out:
+	close(fd);
+	return ret;
+}
+
+static void usage(const char *prog)
+{
+	fprintf(stderr,
+		"usage: %s [OPTS] ...\n"
+		"OPTS:\n"
+		" -S use skb-mode\n"
+		" -N enforce native mode\n"
+		" -F force loading prog\n"
+		" -X load xdp program on egress\n",
+		prog);
+}
+
+int main(int argc, char **argv)
+{
+	int prog_fd, group_all, group_v4, group_v6, exclude, mac_map;
+	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+	struct bpf_program *ingress_prog, *egress_prog;
+	struct bpf_prog_load_attr prog_load_attr = {
+		.prog_type = BPF_PROG_TYPE_UNSPEC,
+	};
+	int i, ret, opt, egress_prog_fd = -1;
+	struct bpf_devmap_val devmap_val;
+	bool attach_egress_prog = false;
+	unsigned char mac_addr[8] = {}; /* mac_map values are 8 bytes (__be64) */
+	char ifname[IF_NAMESIZE];
+	struct bpf_object *obj;
+	unsigned int ifindex;
+	char filename[256];
+
+	while ((opt = getopt(argc, argv, "SNFX")) != -1) {
+		switch (opt) {
+		case 'S':
+			xdp_flags |= XDP_FLAGS_SKB_MODE;
+			break;
+		case 'N':
+			/* default, set below */
+			break;
+		case 'F':
+			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+			break;
+		case 'X':
+			attach_egress_prog = true;
+			break;
+		default:
+			usage(basename(argv[0]));
+			return 1;
+		}
+	}
+
+	if (!(xdp_flags & XDP_FLAGS_SKB_MODE)) {
+		xdp_flags |= XDP_FLAGS_DRV_MODE;
+	} else if (attach_egress_prog) {
+		printf("Load xdp program on egress with SKB mode not supported yet\n");
+		goto err_out;
+	}
+
+	if (optind == argc) {
+		printf("usage: %s ...\n", argv[0]);
+		goto err_out;
+	}
+
+	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+		perror("setrlimit(RLIMIT_MEMLOCK)");
+		goto err_out;
+	}
+
+	printf("Get interfaces");
+	for (i = 0; i < MAX_IFACE_NUM && argv[optind + i]; i++) {
+		ifaces[i] = if_nametoindex(argv[optind + i]);
+		if (!ifaces[i])
+			ifaces[i] = strtoul(argv[optind + i], NULL, 0);
+		if (!if_indextoname(ifaces[i], ifname)) {
+			perror("Invalid interface name or index");
+			goto err_out;
+		}
+		printf(" %d", ifaces[i]);
+	}
+	printf("\n");
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	prog_load_attr.file = filename;
+
+	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+		goto err_out;
+
+	if (attach_egress_prog)
+		group_all = bpf_object__find_map_fd_by_name(obj, "forward_map_egress");
+	else
+		group_all = bpf_object__find_map_fd_by_name(obj, "forward_map_all");
+	group_v4 = bpf_object__find_map_fd_by_name(obj, "forward_map_v4");
+	group_v6 = bpf_object__find_map_fd_by_name(obj, "forward_map_v6");
+	exclude = bpf_object__find_map_fd_by_name(obj, "exclude_map");
+	mac_map = bpf_object__find_map_fd_by_name(obj, "mac_map");
+
+	if (group_all < 0 || group_v4 < 0 || group_v6 < 0 || exclude < 0 ||
+	    mac_map < 0) {
+		printf("bpf_object__find_map_fd_by_name failed\n");
+		goto err_out;
+	}
+
+	if (attach_egress_prog) {
+		/* Find ingress/egress prog for 2nd xdp prog */
+		ingress_prog = bpf_object__find_program_by_title(obj, "xdp_redirect_map_ingress");
+		egress_prog = bpf_object__find_program_by_title(obj, "xdp_devmap/map_prog");
+		if (!ingress_prog || !egress_prog) {
+			printf("finding ingress/egress_prog in obj file failed\n");
+			goto err_out;
+		}
+		prog_fd = bpf_program__fd(ingress_prog);
+		egress_prog_fd = bpf_program__fd(egress_prog);
+		if (prog_fd < 0 || egress_prog_fd < 0) {
+			printf("find egress_prog fd failed\n");
+			goto err_out;
+		}
+	}
+
+	signal(SIGINT, int_exit);
+	signal(SIGTERM, int_exit);
+
+	/* Init forward multicast groups and exclude group */
+	for (i = 0; ifaces[i] > 0; i++) {
+		ifindex = ifaces[i];
+
+		if (attach_egress_prog) {
+			ret = get_mac_addr(ifindex, mac_addr);
+			if (ret < 0) {
+				printf("get interface %u mac failed\n", ifindex);
+				goto err_out;
+			}
+			ret = bpf_map_update_elem(mac_map, &ifindex, mac_addr, 0);
+			if (ret) {
+				perror("bpf_map_update_elem mac_map");
+				goto err_out;
+			}
+		}
+
+		/* Add all the interfaces to group all */
+		devmap_val.ifindex = ifindex;
+		devmap_val.bpf_prog.fd = egress_prog_fd;
+		ret = bpf_map_update_elem(group_all, &ifindex, &devmap_val, 0);
+		if (ret) {
+			perror("bpf_map_update_elem");
+			goto err_out;
+		}
+
+		/* For testing: do not add the 1st interface to group v6 */
+		if (i != 0) {
+			ret = bpf_map_update_elem(group_v6, &ifindex, &ifindex, 0);
+			if (ret) {
+				perror("bpf_map_update_elem");
+				goto err_out;
+			}
+		}
+
+		/* For testing: do not add the 2nd interface to group v4 */
+		if (i != 1) {
+			ret = bpf_map_update_elem(group_v4, &ifindex, &ifindex, 0);
+			if (ret) {
+				perror("bpf_map_update_elem");
+				goto err_out;
+			}
+		}
+
+		/* For testing: add the 3rd interface to the exclude map */
+		if (i == 2) {
+			ret = bpf_map_update_elem(exclude, &ifindex, &ifindex, 0);
+			if (ret) {
+				perror("bpf_map_update_elem");
+				goto err_out;
+			}
+		}
+
+		/* bind prog_fd to each interface */
+		ret = bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags);
+		if (ret) {
+			printf("Set xdp fd failed on %d\n", ifindex);
+			goto err_out;
+		}
+	}
+
+	/* sleep some time for testing */
+	sleep(999);
+
+	return 0;
+
+err_out:
+	return 1;
+}
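The block/pass expectations in test_xdp_redirect_multi.sh follow from simple set arithmetic over the groups that xdp_redirect_multi.c populates for veth1..veth4. The sketch below is a hypothetical user-space model of that arithmetic, not the kernel's devmap code. It assumes bpf_redirect_map_multi() delivers a copy to every interface in the forward map that is not in the exclude map, and skips the ingress interface when BPF_F_EXCLUDE_INGRESS is set. All names in it (iface_set, IFACE, MODEL_EXCLUDE_INGRESS, redirect_targets, forward, ping_ok) are illustrative only, and a ping is modelled as passing only when both the request and the reply get forwarded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: an interface set is a bitmask, veth<n> -> bit n. */
typedef uint32_t iface_set;

#define IFACE(n) ((iface_set)1u << (n))

/* Stand-in for BPF_F_EXCLUDE_INGRESS, used only inside this model. */
#define MODEL_EXCLUDE_INGRESS 0x1u

enum pkt_kind { PKT_IPV4, PKT_IPV6, PKT_OTHER };

/* Groups as populated by the selftest's user-space loader. */
static const iface_set group_all = IFACE(1) | IFACE(2) | IFACE(3) | IFACE(4);
static const iface_set group_v4 = IFACE(1) | IFACE(3) | IFACE(4); /* veth2 left out */
static const iface_set group_v6 = IFACE(2) | IFACE(3) | IFACE(4); /* veth1 left out */
static const iface_set exclude_set = IFACE(3);                    /* ns3 blocked */

/* Targets of one redirect: in_map minus ex_map, minus ingress if flagged. */
static iface_set redirect_targets(iface_set in_map, iface_set ex_map,
				  unsigned int ingress, unsigned int flags)
{
	iface_set out;

	assert(ingress < 32);
	out = in_map & ~ex_map;
	if (flags & MODEL_EXCLUDE_INGRESS)
		out &= ~IFACE(ingress);
	return out;
}

/* Mirrors the protocol dispatch in xdp_redirect_map_multi_prog(). */
static iface_set forward(enum pkt_kind kind, unsigned int ingress)
{
	switch (kind) {
	case PKT_IPV4:
		return redirect_targets(group_v4, exclude_set, ingress,
					MODEL_EXCLUDE_INGRESS);
	case PKT_IPV6:
		return redirect_targets(group_v6, exclude_set, ingress,
					MODEL_EXCLUDE_INGRESS);
	default:
		/* ARP and other traffic: no exclude map (NULL ex_map). */
		return redirect_targets(group_all, 0, ingress,
					MODEL_EXCLUDE_INGRESS);
	}
}

/* A ping passes only if the request reaches dst and the reply reaches src. */
static int ping_ok(enum pkt_kind kind, unsigned int src, unsigned int dst)
{
	return (forward(kind, src) & IFACE(dst)) != 0 &&
	       (forward(kind, dst) & IFACE(src)) != 0;
}
```

For example, forward(PKT_IPV4, 1) leaves only veth4's bit set, which matches the "ns1 -> ns4 (pass)" case, while ARP from veth1 reaches veth2, veth3 and veth4 because the ARP path passes NULL as the exclude map.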