From patchwork Tue Mar 8 14:57:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= X-Patchwork-Id: 12773929 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E7A28C433F5 for ; Tue, 8 Mar 2022 14:58:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347710AbiCHO7P (ORCPT ); Tue, 8 Mar 2022 09:59:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53704 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347694AbiCHO7I (ORCPT ); Tue, 8 Mar 2022 09:59:08 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 93C721A816 for ; Tue, 8 Mar 2022 06:58:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646751489; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bxDXghPGmHYz3rKQT4OQVa0V3FISdxYHJcVrLjZbAdY=; b=TGtM20gc6bn819cdiBXfXewCDFDijGiCM+ERz8Xem5grEd0Yva0JbhJa8cNV76LUwL6Upj JIq+WV9GBA04CiqSfHjaajo5KvJKUQsCTrLPiK9/Cr0J7PMw0MTO2x0Y30NVj3DhcAoWDs pM+wMvaxDmWxDg+RaaOaX8lksGSKSKU= Received: from mail-ej1-f71.google.com (mail-ej1-f71.google.com [209.85.218.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-247-WDe9vRaRPyyE_aIh98aPOA-1; Tue, 08 Mar 2022 09:58:07 -0500 X-MC-Unique: WDe9vRaRPyyE_aIh98aPOA-1 Received: by mail-ej1-f71.google.com with SMTP id q22-20020a1709064cd600b006db14922f93so2901373ejt.7 for ; Tue, 08 Mar 2022 06:58:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=bxDXghPGmHYz3rKQT4OQVa0V3FISdxYHJcVrLjZbAdY=; b=13JIM2wOMHLoeV7joOCXLASOuj84VWmCorpBf6c90Yv44ef2gpNbxbMNZHM/dq/mUm HMar4wxV4RPC9HcUNHdaT+T5Bq/EgH1dKmYffBbVMq9ABqtXIlqqweSHS6CDbm4+DDGA ZZj2D3DkpGBgQ3PS3XdXZ356ajO0wGR1d9iUkP0zO8HxqaLXK02JlWa0V935DtZ4G2nF BQ3viYxX+6b353RcsBxHAJO8qLPrTDvNI+E9S+9uKSCHltcGX0aA+M/6xkkv3GznxzaJ MjgmT4l/kYKbR6/5AG8aI7jUex5FBYy2zHHUi9GUAKYpdY1ZnGUIO+ztj1U4Pbw0UPju bTPg== X-Gm-Message-State: AOAM533y8VK+0PPXLouX0C4hUls+A+3GFyWwH8Djbnh/dBm49nM7X52V 7tj1/5hHonb+QbwKyeyYwjxWnhZKfN1AVzXRlH81OfGvx7vx+N7/LL0svU9P1RKZDk2Xm2Eb/pi DFDgIkpc3BfbuDHOB X-Received: by 2002:a17:907:3f25:b0:6b0:5e9a:83 with SMTP id hq37-20020a1709073f2500b006b05e9a0083mr14320519ejc.659.1646751484945; Tue, 08 Mar 2022 06:58:04 -0800 (PST) X-Google-Smtp-Source: ABdhPJwmMULHXyCccy3HJ2bMAY5VL/vRW2QNsVqiHdT5QmozQN6rJqgK3Gs548O70r5pXiYVyjjR2A== X-Received: by 2002:a17:907:3f25:b0:6b0:5e9a:83 with SMTP id hq37-20020a1709073f2500b006b05e9a0083mr14320492ejc.659.1646751484464; Tue, 08 Mar 2022 06:58:04 -0800 (PST) Received: from alrua-x1.borgediget.toke.dk ([45.145.92.2]) by smtp.gmail.com with ESMTPSA id gl2-20020a170906e0c200b006a767d52373sm5952243ejb.182.2022.03.08.06.58.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Mar 2022 06:58:03 -0800 (PST) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 315A21928D2; Tue, 8 Mar 2022 15:58:03 +0100 (CET) From: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , "David S. Miller" , Jakub Kicinski , Jesper Dangaard Brouer Cc: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , netdev@vger.kernel.org, bpf@vger.kernel.org Subject: [PATCH bpf-next v10 1/5] bpf: Add "live packet" mode for XDP in BPF_PROG_RUN Date: Tue, 8 Mar 2022 15:57:57 +0100 Message-Id: <20220308145801.46256-2-toke@redhat.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220308145801.46256-1-toke@redhat.com> References: <20220308145801.46256-1-toke@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This adds support for running XDP programs through BPF_PROG_RUN in a mode that enables live packet processing of the resulting frames. Previous uses of BPF_PROG_RUN for XDP returned the XDP program return code and the modified packet data to userspace, which is useful for unit testing of XDP programs. The existing BPF_PROG_RUN for XDP allows userspace to set the ingress ifindex and RXQ number as part of the context object being passed to the kernel. This patch reuses that code, but adds a new mode with different semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES flag. When running BPF_PROG_RUN in this mode, the XDP program return codes will be honoured: returning XDP_PASS will result in the frame being injected into the networking stack as if it came from the selected networking interface, while returning XDP_TX and XDP_REDIRECT will result in the frame being transmitted out that interface. XDP_TX is translated into an XDP_REDIRECT operation to the same interface, since the real XDP_TX action is only possible from within the network drivers themselves, not from the process context where BPF_PROG_RUN is executed. Internally, this new mode of operation creates a page pool instance while setting up the test run, and feeds pages from that into the XDP program. The setup cost of this is amortised over the number of repetitions specified by userspace. To support the performance testing use case, we further optimise the setup step so that all pages in the pool are pre-initialised with the packet data, and pre-computed context and xdp_frame objects stored at the start of each page. This makes it possible to entirely avoid touching the page content on each XDP program invocation, and enables sending up to 9 Mpps/core on my test box. Because the data pages are recycled by the page pool, and the test runner doesn't re-initialise them for each run, subsequent invocations of the XDP program will see the packet data in the state it was after the last time it ran on that particular page. This means that an XDP program that modifies the packet before redirecting it has to be careful about which assumptions it makes about the packet content, but that is only an issue for the most naively written programs. Enabling the new flag is only allowed when not setting ctx_out and data_out in the test specification, since using it means frames will be redirected somewhere else, so they can't be returned. Signed-off-by: Toke Høiland-Jørgensen --- include/uapi/linux/bpf.h | 3 + kernel/bpf/Kconfig | 1 + kernel/bpf/syscall.c | 2 +- net/bpf/test_run.c | 331 +++++++++++++++++++++++++++++++-- tools/include/uapi/linux/bpf.h | 3 + 5 files changed, 325 insertions(+), 15 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 4eebea830613..bc23020b638d 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1232,6 +1232,8 @@ enum { /* If set, run the test on the cpu specified by bpf_attr.test.cpu */ #define BPF_F_TEST_RUN_ON_CPU (1U << 0) +/* If set, XDP frames will be transmitted after processing */ +#define BPF_F_TEST_XDP_LIVE_FRAMES (1U << 1) /* type for BPF_ENABLE_STATS */ enum bpf_stats_type { @@ -1393,6 +1395,7 @@ union bpf_attr { __aligned_u64 ctx_out; __u32 flags; __u32 cpu; + __u32 batch_size; } test; struct { /* anonymous struct used by BPF_*_GET_*_ID */ diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig index c3cf0b86eeb2..d56ee177d5f8 100644 --- a/kernel/bpf/Kconfig +++ b/kernel/bpf/Kconfig @@ -30,6 +30,7 @@ config BPF_SYSCALL select TASKS_TRACE_RCU select BINARY_PRINTF select NET_SOCK_MSG if NET + select PAGE_POOL if NET default n help Enable the bpf() system call that allows to manipulate BPF programs diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index db402ebc5570..9beb585be5a6 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3336,7 +3336,7 @@ static int bpf_prog_query(const union bpf_attr *attr, } } -#define BPF_PROG_TEST_RUN_LAST_FIELD test.cpu +#define BPF_PROG_TEST_RUN_LAST_FIELD test.batch_size static int bpf_prog_test_run(const union bpf_attr *attr, union bpf_attr __user *uattr) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index ba410b069824..c9772f9688a7 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -53,10 +54,11 @@ static void bpf_test_timer_leave(struct bpf_test_timer *t) rcu_read_unlock(); } -static bool bpf_test_timer_continue(struct bpf_test_timer *t, u32 repeat, int *err, u32 *duration) +static bool bpf_test_timer_continue(struct bpf_test_timer *t, int iterations, + u32 repeat, int *err, u32 *duration) __must_hold(rcu) { - t->i++; + t->i += iterations; if (t->i >= repeat) { /* We're done. */ t->time_spent += ktime_get_ns() - t->time_start; @@ -88,6 +90,283 @@ static bool bpf_test_timer_continue(struct bpf_test_timer *t, u32 repeat, int *e return false; } +/* We put this struct at the head of each page with a context and frame + * initialised when the page is allocated, so we don't have to do this on each + * repetition of the test run. + */ +struct xdp_page_head { + struct xdp_buff orig_ctx; + struct xdp_buff ctx; + struct xdp_frame frm; + u8 data[]; +}; + +struct xdp_test_data { + struct xdp_buff *orig_ctx; + struct xdp_rxq_info rxq; + struct net_device *dev; + struct page_pool *pp; + struct xdp_frame **frames; + struct sk_buff **skbs; + u32 batch_size; + u32 frame_cnt; +}; + +#define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head) \ + - sizeof(struct skb_shared_info)) +#define TEST_XDP_MAX_BATCH 256 + +static void xdp_test_run_init_page(struct page *page, void *arg) +{ + struct xdp_page_head *head = phys_to_virt(page_to_phys(page)); + struct xdp_buff *new_ctx, *orig_ctx; + u32 headroom = XDP_PACKET_HEADROOM; + struct xdp_test_data *xdp = arg; + size_t frm_len, meta_len; + struct xdp_frame *frm; + void *data; + + orig_ctx = xdp->orig_ctx; + frm_len = orig_ctx->data_end - orig_ctx->data_meta; + meta_len = orig_ctx->data - orig_ctx->data_meta; + headroom -= meta_len; + + new_ctx = &head->ctx; + frm = &head->frm; + data = &head->data; + memcpy(data + headroom, orig_ctx->data_meta, frm_len); + + xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &xdp->rxq); + xdp_prepare_buff(new_ctx, data, headroom, frm_len, true); + new_ctx->data = new_ctx->data_meta + meta_len; + + xdp_update_frame_from_buff(new_ctx, frm); + frm->mem = new_ctx->rxq->mem; + + memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx)); +} + +static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_ctx) +{ + struct xdp_mem_info mem = {}; + struct page_pool *pp; + int err = -ENOMEM; + struct page_pool_params pp_params = { + .order = 0, + .flags = 0, + .pool_size = xdp->batch_size, + .nid = NUMA_NO_NODE, + .max_len = TEST_XDP_FRAME_SIZE, + .init_callback = xdp_test_run_init_page, + .init_arg = xdp, + }; + + xdp->frames = kvmalloc_array(xdp->batch_size, sizeof(void *), GFP_KERNEL); + if (!xdp->frames) + return -ENOMEM; + + xdp->skbs = kvmalloc_array(xdp->batch_size, sizeof(void *), GFP_KERNEL); + if (!xdp->skbs) + goto err_skbs; + + pp = page_pool_create(&pp_params); + if (IS_ERR(pp)) { + err = PTR_ERR(pp); + goto err_pp; + } + + /* will copy 'mem.id' into pp->xdp_mem_id */ + err = xdp_reg_mem_model(&mem, MEM_TYPE_PAGE_POOL, pp); + if (err) + goto err_mmodel; + + xdp->pp = pp; + + /* We create a 'fake' RXQ referencing the original dev, but with an + * xdp_mem_info pointing to our page_pool + */ + xdp_rxq_info_reg(&xdp->rxq, orig_ctx->rxq->dev, 0, 0); + xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL; + xdp->rxq.mem.id = pp->xdp_mem_id; + xdp->dev = orig_ctx->rxq->dev; + xdp->orig_ctx = orig_ctx; + + return 0; + +err_mmodel: + page_pool_destroy(pp); +err_pp: + kfree(xdp->skbs); +err_skbs: + kfree(xdp->frames); + return err; +} + +static void xdp_test_run_teardown(struct xdp_test_data *xdp) +{ + page_pool_destroy(xdp->pp); + kfree(xdp->frames); + kfree(xdp->skbs); +} + +static bool ctx_was_changed(struct xdp_page_head *head) +{ + return head->orig_ctx.data != head->ctx.data || + head->orig_ctx.data_meta != head->ctx.data_meta || + head->orig_ctx.data_end != head->ctx.data_end; +} + +static void reset_ctx(struct xdp_page_head *head) +{ + if (likely(!ctx_was_changed(head))) + return; + + head->ctx.data = head->orig_ctx.data; + head->ctx.data_meta = head->orig_ctx.data_meta; + head->ctx.data_end = head->orig_ctx.data_end; + xdp_update_frame_from_buff(&head->ctx, &head->frm); +} + +static int xdp_recv_frames(struct xdp_frame **frames, int nframes, + struct sk_buff **skbs, + struct net_device *dev) +{ + gfp_t gfp = __GFP_ZERO | GFP_ATOMIC; + int i, n; + LIST_HEAD(list); + + n = kmem_cache_alloc_bulk(skbuff_head_cache, gfp, nframes, (void **)skbs); + if (unlikely(n == 0)) { + for (i = 0; i < nframes; i++) + xdp_return_frame(frames[i]); + return -ENOMEM; + } + + for (i = 0; i < nframes; i++) { + struct xdp_frame *xdpf = frames[i]; + struct sk_buff *skb = skbs[i]; + + skb = __xdp_build_skb_from_frame(xdpf, skb, dev); + if (!skb) { + xdp_return_frame(xdpf); + continue; + } + + list_add_tail(&skb->list, &list); + } + netif_receive_skb_list(&list); + + return 0; +} + +static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, + u32 repeat) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + int err = 0, act, ret, i, nframes = 0, batch_sz; + struct xdp_frame **frames = xdp->frames; + struct xdp_page_head *head; + struct xdp_frame *frm; + bool redirect = false; + struct xdp_buff *ctx; + struct page *page; + + batch_sz = min_t(u32, repeat, xdp->batch_size); + + local_bh_disable(); + xdp_set_return_frame_no_direct(); + + for (i = 0; i < batch_sz; i++) { + page = page_pool_dev_alloc_pages(xdp->pp); + if (!page) { + err = -ENOMEM; + goto out; + } + + head = phys_to_virt(page_to_phys(page)); + reset_ctx(head); + ctx = &head->ctx; + frm = &head->frm; + xdp->frame_cnt++; + + act = bpf_prog_run_xdp(prog, ctx); + + /* if program changed pkt bounds we need to update the xdp_frame */ + if (unlikely(ctx_was_changed(head))) { + ret = xdp_update_frame_from_buff(ctx, frm); + if (ret) { + xdp_return_buff(ctx); + continue; + } + } + + switch (act) { + case XDP_TX: + /* we can't do a real XDP_TX since we're not in the + * driver, so turn it into a REDIRECT back to the same + * index + */ + ri->tgt_index = xdp->dev->ifindex; + ri->map_id = INT_MAX; + ri->map_type = BPF_MAP_TYPE_UNSPEC; + fallthrough; + case XDP_REDIRECT: + redirect = true; + ret = xdp_do_redirect_frame(xdp->dev, ctx, frm, prog); + if (ret) + xdp_return_buff(ctx); + break; + case XDP_PASS: + frames[nframes++] = frm; + break; + default: + bpf_warn_invalid_xdp_action(NULL, prog, act); + fallthrough; + case XDP_DROP: + xdp_return_buff(ctx); + break; + } + } + +out: + if (redirect) + xdp_do_flush(); + if (nframes) + err = xdp_recv_frames(frames, nframes, xdp->skbs, xdp->dev); + + xdp_clear_return_frame_no_direct(); + local_bh_enable(); + return err; +} + +static int bpf_test_run_xdp_live(struct bpf_prog *prog, struct xdp_buff *ctx, + u32 repeat, u32 batch_size, u32 *time) + +{ + struct xdp_test_data xdp = { .batch_size = batch_size }; + struct bpf_test_timer t = { .mode = NO_MIGRATE }; + int ret; + + if (!repeat) + repeat = 1; + + ret = xdp_test_run_setup(&xdp, ctx); + if (ret) + return ret; + + bpf_test_timer_enter(&t); + do { + xdp.frame_cnt = 0; + ret = xdp_test_run_batch(&xdp, prog, repeat - t.i); + if (unlikely(ret < 0)) + break; + } while (bpf_test_timer_continue(&t, xdp.frame_cnt, repeat, &ret, time)); + bpf_test_timer_leave(&t); + + xdp_test_run_teardown(&xdp); + return ret; +} + static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *retval, u32 *time, bool xdp) { @@ -119,7 +398,7 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, *retval = bpf_prog_run_xdp(prog, ctx); else *retval = bpf_prog_run(prog, ctx); - } while (bpf_test_timer_continue(&t, repeat, &ret, time)); + } while (bpf_test_timer_continue(&t, 1, repeat, &ret, time)); bpf_reset_run_ctx(old_ctx); bpf_test_timer_leave(&t); @@ -446,7 +725,7 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog, int b = 2, err = -EFAULT; u32 retval = 0; - if (kattr->test.flags || kattr->test.cpu) + if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; switch (prog->expected_attach_type) { @@ -510,7 +789,7 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog, /* doesn't support data_in/out, ctx_out, duration, or repeat */ if (kattr->test.data_in || kattr->test.data_out || kattr->test.ctx_out || kattr->test.duration || - kattr->test.repeat) + kattr->test.repeat || kattr->test.batch_size) return -EINVAL; if (ctx_size_in < prog->aux->max_ctx_offset || @@ -741,7 +1020,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr, void *data; int ret; - if (kattr->test.flags || kattr->test.cpu) + if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; data = bpf_test_init(kattr, kattr->test.data_size_in, @@ -922,7 +1201,9 @@ static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md) int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { + bool do_live = (kattr->test.flags & BPF_F_TEST_XDP_LIVE_FRAMES); u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + u32 batch_size = kattr->test.batch_size; u32 size = kattr->test.data_size_in; u32 headroom = XDP_PACKET_HEADROOM; u32 retval, duration, max_data_sz; @@ -938,6 +1219,18 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, prog->expected_attach_type == BPF_XDP_CPUMAP) return -EINVAL; + if (kattr->test.flags & ~BPF_F_TEST_XDP_LIVE_FRAMES) + return -EINVAL; + + if (do_live) { + if (!batch_size) + batch_size = NAPI_POLL_WEIGHT; + else if (batch_size > TEST_XDP_MAX_BATCH) + return -E2BIG; + } else if (batch_size) { + return -EINVAL; + } + ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md)); if (IS_ERR(ctx)) return PTR_ERR(ctx); @@ -946,14 +1239,20 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, /* There can't be user provided data before the meta data */ if (ctx->data_meta || ctx->data_end != size || ctx->data > ctx->data_end || - unlikely(xdp_metalen_invalid(ctx->data))) + unlikely(xdp_metalen_invalid(ctx->data)) || + (do_live && (kattr->test.data_out || kattr->test.ctx_out))) goto free_ctx; /* Meta data is allocated from the headroom */ headroom -= ctx->data; } max_data_sz = 4096 - headroom - tailroom; - size = min_t(u32, size, max_data_sz); + if (size > max_data_sz) { + /* disallow live data mode for jumbo frames */ + if (do_live) + goto free_ctx; + size = max_data_sz; + } data = bpf_test_init(kattr, size, max_data_sz, headroom, tailroom); if (IS_ERR(data)) { @@ -1011,7 +1310,10 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, if (repeat > 1) bpf_prog_change_xdp(NULL, prog); - ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true); + if (do_live) + ret = bpf_test_run_xdp_live(prog, &xdp, repeat, batch_size, &duration); + else + ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true); /* We convert the xdp_buff back to an xdp_md before checking the return * code so the reference count of any held netdevice will be decremented * even if the test run failed. @@ -1073,7 +1375,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR) return -EINVAL; - if (kattr->test.flags || kattr->test.cpu) + if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; if (size < ETH_HLEN) @@ -1108,7 +1410,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, do { retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN, size, flags); - } while (bpf_test_timer_continue(&t, repeat, &ret, &duration)); + } while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration)); bpf_test_timer_leave(&t); if (ret < 0) @@ -1140,7 +1442,7 @@ int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kat if (prog->type != BPF_PROG_TYPE_SK_LOOKUP) return -EINVAL; - if (kattr->test.flags || kattr->test.cpu) + if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; if (kattr->test.data_in || kattr->test.data_size_in || kattr->test.data_out || @@ -1203,7 +1505,7 @@ int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kat do { ctx.selected_sk = NULL; retval = BPF_PROG_SK_LOOKUP_RUN_ARRAY(progs, ctx, bpf_prog_run); - } while (bpf_test_timer_continue(&t, repeat, &ret, &duration)); + } while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration)); bpf_test_timer_leave(&t); if (ret < 0) @@ -1242,7 +1544,8 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog, /* doesn't support data_in/out, ctx_out, duration, or repeat or flags */ if (kattr->test.data_in || kattr->test.data_out || kattr->test.ctx_out || kattr->test.duration || - kattr->test.repeat || kattr->test.flags) + kattr->test.repeat || kattr->test.flags || + kattr->test.batch_size) return -EINVAL; if (ctx_size_in < prog->aux->max_ctx_offset || diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 4eebea830613..bc23020b638d 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1232,6 +1232,8 @@ enum { /* If set, run the test on the cpu specified by bpf_attr.test.cpu */ #define BPF_F_TEST_RUN_ON_CPU (1U << 0) +/* If set, XDP frames will be transmitted after processing */ +#define BPF_F_TEST_XDP_LIVE_FRAMES (1U << 1) /* type for BPF_ENABLE_STATS */ enum bpf_stats_type { @@ -1393,6 +1395,7 @@ union bpf_attr { __aligned_u64 ctx_out; __u32 flags; __u32 cpu; + __u32 batch_size; } test; struct { /* anonymous struct used by BPF_*_GET_*_ID */ From patchwork Tue Mar 8 14:57:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= X-Patchwork-Id: 12773930 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE8E0C433EF for ; Tue, 8 Mar 2022 14:58:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347690AbiCHO7Q (ORCPT ); Tue, 8 Mar 2022 09:59:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347697AbiCHO7O (ORCPT ); Tue, 8 Mar 2022 09:59:14 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 9D129205E0 for ; Tue, 8 Mar 2022 06:58:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646751491; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=iFqilxT+lSZOOOZ98XwesoXOJ/SGZfcJgQSLbKjZtCg=; b=R1NE5ZEIPyFBFggJHkSKFdYCGuxR8ttQVTmRIWSR5OSNxirGiTjvO+2+pRYpcF8NPNXvNx +igGn5QDmVR+LwQayCnJ9qhbTr8HF9U3Q0NFDaZ2Op7hywaVxDHn0D//osig6rkKBX99Hw g3njikpVj4/pwx5aZ3h9h2RzVxY1SM0= Received: from mail-ej1-f69.google.com (mail-ej1-f69.google.com [209.85.218.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-664-KVdnHGsmM3CxiRq4Z7OjMQ-1; Tue, 08 Mar 2022 09:58:10 -0500 X-MC-Unique: KVdnHGsmM3CxiRq4Z7OjMQ-1 Received: by mail-ej1-f69.google.com with SMTP id le4-20020a170907170400b006dab546bc40so5558478ejc.15 for ; Tue, 08 Mar 2022 06:58:10 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=iFqilxT+lSZOOOZ98XwesoXOJ/SGZfcJgQSLbKjZtCg=; b=CiKMmNCIswtVrRFHnDkI6iQwKmwivwzNCKc04ioKZ4FABuJ3TNpFBRYtOYz1OxpQ19 u7+888z/E9qNFJUxejTquEJI1h97Ody/V5JqL4pSDGVWTK2Se42D/OMuscfrtv8jZRRo CNznQbJvcInz9WCs1Aft+sTO+QDyt8yyxtfNDVERcCZlCiTzg33pHUCm7Vmu9ji/hj0T gXsqjmDxbNFncqxCMcUoFocV5an9NYruD9CjtaPpLqIQrKeQN3mbq0rbyXT/er73q+vs Rdnc+cbEq1s2GtA081Jvl8yOpbQNfyjpgtBKn+HLXAhWirgxsdvxLNq2lw3TAwilwY5h Xobg== X-Gm-Message-State: AOAM533OU94QxaGiHNHOEs/yO46s+aI3TkHX+NEDSsZHCIPs+cduLTLm JxEZ92ou5Keyclh7pBotP1YU5kf0TewBh+xOWm2/8QIACzuW/+uc6akg7EEe4Jko+zNH99RKvro /39vlcEktvUZfvUCW X-Received: by 2002:a17:907:961b:b0:6d9:acb0:5403 with SMTP id gb27-20020a170907961b00b006d9acb05403mr13442061ejc.568.1646751488362; Tue, 08 Mar 2022 06:58:08 -0800 (PST) X-Google-Smtp-Source: ABdhPJwyT4bqTkb/KvZ/GIdppGkevRs1F3WAJIZxAs4m3ze4tcXMPUM+DQnZsFpkV+3EoTVc1IkbCw== X-Received: by 2002:a17:907:961b:b0:6d9:acb0:5403 with SMTP id gb27-20020a170907961b00b006d9acb05403mr13441891ejc.568.1646751485821; Tue, 08 Mar 2022 06:58:05 -0800 (PST) Received: from alrua-x1.borgediget.toke.dk ([2a0c:4d80:42:443::2]) by smtp.gmail.com with ESMTPSA id y6-20020a056402358600b004166413d27bsm1738953edc.97.2022.03.08.06.58.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Mar 2022 06:58:04 -0800 (PST) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 8AA101928D4; Tue, 8 Mar 2022 15:58:03 +0100 (CET) From: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , "David S. Miller" , Jakub Kicinski , Jesper Dangaard Brouer Cc: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , Jonathan Corbet , linux-doc@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org Subject: [PATCH bpf-next v10 2/5] Documentation/bpf: Add documentation for BPF_PROG_RUN Date: Tue, 8 Mar 2022 15:57:58 +0100 Message-Id: <20220308145801.46256-3-toke@redhat.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220308145801.46256-1-toke@redhat.com> References: <20220308145801.46256-1-toke@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This adds documentation for the BPF_PROG_RUN command; a short overview of the command itself, and a more verbose description of the "live packet" mode for XDP introduced in the previous commit. Signed-off-by: Toke Høiland-Jørgensen --- Documentation/bpf/bpf_prog_run.rst | 117 +++++++++++++++++++++++++++++ Documentation/bpf/index.rst | 1 + 2 files changed, 118 insertions(+) create mode 100644 Documentation/bpf/bpf_prog_run.rst diff --git a/Documentation/bpf/bpf_prog_run.rst b/Documentation/bpf/bpf_prog_run.rst new file mode 100644 index 000000000000..4868c909df5c --- /dev/null +++ b/Documentation/bpf/bpf_prog_run.rst @@ -0,0 +1,117 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================== +Running BPF programs from userspace +=================================== + +This document describes the ``BPF_PROG_RUN`` facility for running BPF programs +from userspace. + +.. contents:: + :local: + :depth: 2 + + +Overview +-------- + +The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to +execute a BPF program in the kernel and return the results to userspace. This +can be used to unit test BPF programs against user-supplied context objects, and +as way to explicitly execute programs in the kernel for their side effects. The +command was previously named ``BPF_PROG_TEST_RUN``, and both constants continue +to be defined in the UAPI header, aliased to the same value. + +The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the +following types: + +- ``BPF_PROG_TYPE_SOCKET_FILTER`` +- ``BPF_PROG_TYPE_SCHED_CLS`` +- ``BPF_PROG_TYPE_SCHED_ACT`` +- ``BPF_PROG_TYPE_XDP`` +- ``BPF_PROG_TYPE_SK_LOOKUP`` +- ``BPF_PROG_TYPE_CGROUP_SKB`` +- ``BPF_PROG_TYPE_LWT_IN`` +- ``BPF_PROG_TYPE_LWT_OUT`` +- ``BPF_PROG_TYPE_LWT_XMIT`` +- ``BPF_PROG_TYPE_LWT_SEG6LOCAL`` +- ``BPF_PROG_TYPE_FLOW_DISSECTOR`` +- ``BPF_PROG_TYPE_STRUCT_OPS`` +- ``BPF_PROG_TYPE_RAW_TRACEPOINT`` +- ``BPF_PROG_TYPE_SYSCALL`` + +When using the ``BPF_PROG_RUN`` command, userspace supplies an input context +object and (for program types operating on network packets) a buffer containing +the packet data that the BPF program will operate on. The kernel will then +execute the program and return the results to userspace. Note that programs will +not have any side effects while being run in this mode; in particular, packets +will not actually be redirected or dropped, the program return code will just be +returned to userspace. A separate mode for live execution of XDP programs is +provided, documented separately below. + +Running XDP programs in "live frame mode" +----------------------------------------- + +The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs, +which can be used to execute XDP programs in a way where packets will actually +be processed by the kernel after the execution of the XDP program as if they +arrived on a physical interface. This mode is activated by setting the +``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to +``BPF_PROG_RUN``. + +The live packet mode is optimised for high performance execution of the supplied +XDP program many times (suitable for, e.g., running as a traffic generator), +which means the semantics are not quite as straight-forward as the regular test +run mode. Specifically: + +- When executing an XDP program in live frame mode, the result of the execution + will not be returned to userspace; instead, the kernel will perform the + operation indicated by the program's return code (drop the packet, redirect + it, etc). For this reason, setting the ``data_out`` or ``ctx_out`` attributes + in the syscall parameters when running in this mode will be rejected. In + addition, not all failures will be reported back to userspace directly; + specifically, only fatal errors in setup or during execution (like memory + allocation errors) will halt execution and return an error. If an error occurs + in packet processing, like a failure to redirect to a given interface, + execution will continue with the next repetition; these errors can be detected + via the same trace points as for regular XDP programs. + +- Userspace can supply an ifindex as part of the context object, just like in + the regular (non-live) mode. The XDP program will be executed as though the + packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context + object will point to that interface. Furthermore, if the XDP program returns + ``XDP_PASS``, the packet will be injected into the kernel networking stack as + though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet + will be transmitted *out* of that same interface. Do note, though, that + because the program execution is not happening in driver context, an + ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to + that same interface (i.e., it will only work if the driver has support for the + ``ndo_xdp_xmit`` driver op). + +- When running the program with multiple repetitions, the execution will happen + in batches. The batch size defaults to 64 packets (which is same as the + maximum NAPI receive batch size), but can be specified by userspace through + the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch, + the kernel executes the XDP program repeatedly, each invocation getting a + separate copy of the packet data. For each repetition, if the program drops + the packet, the data page is immediately recycled (see below). Otherwise, the + packet is buffered until the end of the batch, at which point all packets + buffered this way during the batch are transmitted at once. + +- When setting up the test run, the kernel will initialise a pool of memory + pages of the same size as the batch size. Each memory page will be initialised + with the initial packet data supplied by userspace at ``BPF_PROG_RUN`` + invocation. When possible, the pages will be recycled on future program + invocations, to improve performance. Pages will generally be recycled a full + batch at a time, except when a packet is dropped (by return code or because + of, say, a redirection error), in which case that page will be recycled + immediately. If a packet ends up being passed to the regular networking stack + (because the XDP program returns ``XDP_PASS``, or because it ends up being + redirected to an interface that injects it into the stack), the page will be + released and a new one will be allocated when the pool is empty. + + When recycling, the page content is not rewritten; only the packet boundary + pointers (``data``, ``data_end`` and ``data_meta``) in the context object will + be reset to the original values. This means that if a program rewrites the + packet contents, it has to be prepared to see either the original content or + the modified version on subsequent invocations. diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst index ef5c996547ec..96056a7447c7 100644 --- a/Documentation/bpf/index.rst +++ b/Documentation/bpf/index.rst @@ -21,6 +21,7 @@ that goes into great technical depth about the BPF Architecture. helpers programs maps + bpf_prog_run classic_vs_extended.rst bpf_licensing test_debug From patchwork Tue Mar 8 14:57:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= X-Patchwork-Id: 12773928 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9EBDBC433EF for ; Tue, 8 Mar 2022 14:58:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347687AbiCHO7I (ORCPT ); Tue, 8 Mar 2022 09:59:08 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53608 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347688AbiCHO7H (ORCPT ); Tue, 8 Mar 2022 09:59:07 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id E37D71B785 for ; Tue, 8 Mar 2022 06:58:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646751490; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gvNVqarfNkzMY75zhkRvdB+Wpo5cjAId5AeXOVZkNno=; b=aISNWBe9aCFm+HiVctpbC0iEfI+Px0FAFunVzdlzPWWyCVl7zje+b5HCmQdmiLUMI+khHe swiF8nTThJFy9lbqR8tnnyQoHchgJiOpo5/nOyxEgqV2V1510zF3AJWLVZA+XBGpPSqfwx DkTqYj0vmNA/gbDzQGmDNcbzLOG53yA= Received: from mail-ed1-f69.google.com (mail-ed1-f69.google.com [209.85.208.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-458-dBlp_CMJNJSeKu6iHMYt_w-1; Tue, 08 Mar 2022 09:58:08 -0500 X-MC-Unique: dBlp_CMJNJSeKu6iHMYt_w-1 Received: by mail-ed1-f69.google.com with SMTP id l24-20020a056402231800b00410f19a3103so10744135eda.5 for ; Tue, 08 Mar 2022 06:58:08 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=gvNVqarfNkzMY75zhkRvdB+Wpo5cjAId5AeXOVZkNno=; b=GNIM4xudjInOS9X0iWAe4SZhqLsHzf0d9bqJR7rOW5WTWOXqssRFFrkOlYPuFmIvJt C7Fg6FYy/cn60v9aAS0E0RoksYaTLEe0kRANEgs4ETDaA1fb9LVrYPzkNJh+ZPq2linG HegfS61Xr1fje2Sz0AScLZ4JnHU+dzq3MzG6HCCCuFbWLI6OV6uOB4p/eFgDuGfk2rPL S90K618Yli/MpWn7OhxxJF0Bp8oM/Q5AvVsTySYSbJX0d1el7mIhkT2m2KH09RJ3fZH8 +AZrMbC26/Gxi4QDmmjppoQzR4YpoDSpYy6Ma0KedjXknsWSBwBDrtOhDcF5PBUTAknf iYQw== X-Gm-Message-State: AOAM532kvUlMnqoVht1TvAvdLlfdBimiMQsYWtTZwRQzp4NA9aQ9d4dV l1hqXlq4NVjf+xkEpfamyS2VkbRt/M5WaG3Zuu8Ryq71uQ1wuQFd5hPcIYPjbZhdmzREvX++SS6 Xvep+NNyZY/QVuV9r X-Received: by 2002:a17:906:fb1b:b0:6da:9e7d:1390 with SMTP id lz27-20020a170906fb1b00b006da9e7d1390mr13259479ejb.644.1646751486430; Tue, 08 Mar 2022 06:58:06 -0800 (PST) X-Google-Smtp-Source: ABdhPJxckfWnRhAYcHBmCxOgoj4VKqW6GOb5gPXGwumyS2wB47Ean+tc4hOZw1goN1vtGkzrIzN8ww== X-Received: by 2002:a17:906:fb1b:b0:6da:9e7d:1390 with SMTP id lz27-20020a170906fb1b00b006da9e7d1390mr13259392ejb.644.1646751485080; Tue, 08 Mar 2022 06:58:05 -0800 (PST) Received: from alrua-x1.borgediget.toke.dk ([2a0c:4d80:42:443::2]) by smtp.gmail.com with ESMTPSA id fy1-20020a1709069f0100b006d229ed7f30sm6117558ejc.39.2022.03.08.06.58.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Mar 2022 06:58:04 -0800 (PST) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id CBB531928D6; Tue, 8 Mar 2022 15:58:03 +0100 (CET) From: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh Cc: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , netdev@vger.kernel.org, bpf@vger.kernel.org Subject: [PATCH bpf-next v10 3/5] libbpf: Support batch_size option to bpf_prog_test_run Date: Tue, 8 Mar 2022 15:57:59 +0100 Message-Id: <20220308145801.46256-4-toke@redhat.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220308145801.46256-1-toke@redhat.com> References: <20220308145801.46256-1-toke@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Add support for setting the new batch_size parameter to BPF_PROG_TEST_RUN to libbpf; just add it as an option and pass it through to the kernel. Acked-by: Martin KaFai Lau Signed-off-by: Toke Høiland-Jørgensen --- tools/lib/bpf/bpf.c | 1 + tools/lib/bpf/bpf.h | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 3c7c180294fa..f69ce3a01385 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -995,6 +995,7 @@ int bpf_prog_test_run_opts(int prog_fd, struct bpf_test_run_opts *opts) memset(&attr, 0, sizeof(attr)); attr.test.prog_fd = prog_fd; + attr.test.batch_size = OPTS_GET(opts, batch_size, 0); attr.test.cpu = OPTS_GET(opts, cpu, 0); attr.test.flags = OPTS_GET(opts, flags, 0); attr.test.repeat = OPTS_GET(opts, repeat, 0); diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 16b21757b8bf..5253cb4a4c0a 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -512,8 +512,9 @@ struct bpf_test_run_opts { __u32 duration; /* out: average per repetition in ns */ __u32 flags; __u32 cpu; + __u32 batch_size; }; -#define bpf_test_run_opts__last_field cpu +#define bpf_test_run_opts__last_field batch_size LIBBPF_API int bpf_prog_test_run_opts(int prog_fd, struct bpf_test_run_opts *opts); From patchwork Tue Mar 8 14:58:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= X-Patchwork-Id: 12773931 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00AC4C4332F for ; Tue, 8 Mar 2022 14:58:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347703AbiCHO7U (ORCPT ); Tue, 8 Mar 2022 09:59:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54172 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347693AbiCHO7P (ORCPT ); Tue, 8 Mar 2022 09:59:15 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 284112CC9E for ; Tue, 8 Mar 2022 06:58:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646751494; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Y3p/D+LKC0CPwTQtFCwaC048P3XZEYMeJHbFVjQHPik=; b=bEdptXGs22qYVE+MULDLg1AaZ2JBLjZ0uQ7iUQ4IRR8Lc7xtkbCBJ3K1bAbQWKhdiFZES2 bzfdl0bLbfUmEkN2lJekqlKU4AYxM+5txYxqz5qzMrg2/1qal1jq5F+eBG4CP8NhpHIMNQ mIuxZedrrXl8blfJpmYGjUuCzw9jyas= Received: from mail-ed1-f72.google.com (mail-ed1-f72.google.com [209.85.208.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-230-PqRIwoMiO5KlEPQpNFa5sQ-1; Tue, 08 Mar 2022 09:58:13 -0500 X-MC-Unique: PqRIwoMiO5KlEPQpNFa5sQ-1 Received: by mail-ed1-f72.google.com with SMTP id da28-20020a056402177c00b00415ce4b20baso9837125edb.17 for ; Tue, 08 Mar 2022 06:58:13 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Y3p/D+LKC0CPwTQtFCwaC048P3XZEYMeJHbFVjQHPik=; b=qP47Pq5sZBMYcU8fRfvWZtpPCyixrrE1OPfv7t4pksOfckaNX3yjflnkOoJAKgR2vH Bk9Kts7qf1OqsllmHSeG4G8GpiAzaaSn8+hbpR3YG+dt+DT1sqzM7k4rSbxPeeYGssyd TSOom6NuPJZ8fl4QH0DPYtzxIFf0KJyIhSSUl1dM75semLBAaPWxizvRp0ashwzuuGDj YIy+6mQp7nx//T76YD2FI6Jmwnk5ln+Mv975x2WevlrB2ZnlbbVdhIeE0rq8suD1CYwg GuXyoGJtaqne4rjMsC2cskugne1lPZgyemIPCNuynVc8CFrhmC5mGrPLOuNLdnFU1fCD +yuw== X-Gm-Message-State: AOAM533DTdKA1ls67Q1RqhSlGDrp53D/+dNNuABJUysluVnXb1HId+xl mVuKNfFqX3Iql2rHGsswf3/RriqB7eHIIU/10jBSQoA6gPL/1dk7hNw21PMJjs74THW7ufCT/1+ 407e9V3hgHAVd3BPq X-Received: by 2002:a05:6402:2548:b0:416:4155:e12 with SMTP id l8-20020a056402254800b0041641550e12mr11276601edb.175.1646751489442; Tue, 08 Mar 2022 06:58:09 -0800 (PST) X-Google-Smtp-Source: ABdhPJxYFLAl8d95Ga+0g/easO+CIUBnQ+jRcIsWAh5LlijQ4M+kGaSdtwCxmmEd0GZo4+vPNaH3ag== X-Received: by 2002:a05:6402:2548:b0:416:4155:e12 with SMTP id l8-20020a056402254800b0041641550e12mr11276412edb.175.1646751486830; Tue, 08 Mar 2022 06:58:06 -0800 (PST) Received: from alrua-x1.borgediget.toke.dk ([2a0c:4d80:42:443::2]) by smtp.gmail.com with ESMTPSA id p24-20020a1709061b5800b006da6435cedcsm5939105ejg.132.2022.03.08.06.58.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Mar 2022 06:58:05 -0800 (PST) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 39D0D1928D8; Tue, 8 Mar 2022 15:58:04 +0100 (CET) From: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , "David S. Miller" , Jakub Kicinski , Jesper Dangaard Brouer Cc: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , Shuah Khan , netdev@vger.kernel.org, bpf@vger.kernel.org Subject: [PATCH bpf-next v10 4/5] selftests/bpf: Move open_netns() and close_netns() into network_helpers.c Date: Tue, 8 Mar 2022 15:58:00 +0100 Message-Id: <20220308145801.46256-5-toke@redhat.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220308145801.46256-1-toke@redhat.com> References: <20220308145801.46256-1-toke@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net These will also be used by the xdp_do_redirect test being added in the next commit. Acked-by: Martin KaFai Lau Signed-off-by: Toke Høiland-Jørgensen --- tools/testing/selftests/bpf/network_helpers.c | 86 ++++++++++++++++++ tools/testing/selftests/bpf/network_helpers.h | 9 ++ .../selftests/bpf/prog_tests/tc_redirect.c | 89 ------------------- 3 files changed, 95 insertions(+), 89 deletions(-) diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c index 6db1af8fdee7..2bb1f9b3841d 100644 --- a/tools/testing/selftests/bpf/network_helpers.c +++ b/tools/testing/selftests/bpf/network_helpers.c @@ -1,18 +1,25 @@ // SPDX-License-Identifier: GPL-2.0-only +#define _GNU_SOURCE + #include #include #include #include #include +#include #include +#include +#include #include #include #include +#include #include "bpf_util.h" #include "network_helpers.h" +#include "test_progs.h" #define clean_errno() (errno == 0 ? "None" : strerror(errno)) #define log_err(MSG, ...) ({ \ @@ -356,3 +363,82 @@ char *ping_command(int family) } return "ping"; } + +struct nstoken { + int orig_netns_fd; +}; + +static int setns_by_fd(int nsfd) +{ + int err; + + err = setns(nsfd, CLONE_NEWNET); + close(nsfd); + + if (!ASSERT_OK(err, "setns")) + return err; + + /* Switch /sys to the new namespace so that e.g. /sys/class/net + * reflects the devices in the new namespace. + */ + err = unshare(CLONE_NEWNS); + if (!ASSERT_OK(err, "unshare")) + return err; + + /* Make our /sys mount private, so the following umount won't + * trigger the global umount in case it's shared. + */ + err = mount("none", "/sys", NULL, MS_PRIVATE, NULL); + if (!ASSERT_OK(err, "remount private /sys")) + return err; + + err = umount2("/sys", MNT_DETACH); + if (!ASSERT_OK(err, "umount2 /sys")) + return err; + + err = mount("sysfs", "/sys", "sysfs", 0, NULL); + if (!ASSERT_OK(err, "mount /sys")) + return err; + + err = mount("bpffs", "/sys/fs/bpf", "bpf", 0, NULL); + if (!ASSERT_OK(err, "mount /sys/fs/bpf")) + return err; + + return 0; +} + +struct nstoken *open_netns(const char *name) +{ + int nsfd; + char nspath[PATH_MAX]; + int err; + struct nstoken *token; + + token = malloc(sizeof(struct nstoken)); + if (!ASSERT_OK_PTR(token, "malloc token")) + return NULL; + + token->orig_netns_fd = open("/proc/self/ns/net", O_RDONLY); + if (!ASSERT_GE(token->orig_netns_fd, 0, "open /proc/self/ns/net")) + goto fail; + + snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name); + nsfd = open(nspath, O_RDONLY | O_CLOEXEC); + if (!ASSERT_GE(nsfd, 0, "open netns fd")) + goto fail; + + err = setns_by_fd(nsfd); + if (!ASSERT_OK(err, "setns_by_fd")) + goto fail; + + return token; +fail: + free(token); + return NULL; +} + +void close_netns(struct nstoken *token) +{ + ASSERT_OK(setns_by_fd(token->orig_netns_fd), "setns_by_fd"); + free(token); +} diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h index d198181a5648..a4b3b2f9877b 100644 --- a/tools/testing/selftests/bpf/network_helpers.h +++ b/tools/testing/selftests/bpf/network_helpers.h @@ -55,4 +55,13 @@ int make_sockaddr(int family, const char *addr_str, __u16 port, struct sockaddr_storage *addr, socklen_t *len); char *ping_command(int family); +struct nstoken; +/** + * open_netns() - Switch to specified network namespace by name. + * + * Returns token with which to restore the original namespace + * using close_netns(). + */ +struct nstoken *open_netns(const char *name); +void close_netns(struct nstoken *token); #endif diff --git a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c index 2b255e28ed26..7ad66a247c02 100644 --- a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c +++ b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c @@ -10,8 +10,6 @@ * to drop unexpected traffic. */ -#define _GNU_SOURCE - #include #include #include @@ -19,10 +17,8 @@ #include #include #include -#include #include #include -#include #include #include @@ -92,91 +88,6 @@ static int write_file(const char *path, const char *newval) return 0; } -struct nstoken { - int orig_netns_fd; -}; - -static int setns_by_fd(int nsfd) -{ - int err; - - err = setns(nsfd, CLONE_NEWNET); - close(nsfd); - - if (!ASSERT_OK(err, "setns")) - return err; - - /* Switch /sys to the new namespace so that e.g. /sys/class/net - * reflects the devices in the new namespace. - */ - err = unshare(CLONE_NEWNS); - if (!ASSERT_OK(err, "unshare")) - return err; - - /* Make our /sys mount private, so the following umount won't - * trigger the global umount in case it's shared. - */ - err = mount("none", "/sys", NULL, MS_PRIVATE, NULL); - if (!ASSERT_OK(err, "remount private /sys")) - return err; - - err = umount2("/sys", MNT_DETACH); - if (!ASSERT_OK(err, "umount2 /sys")) - return err; - - err = mount("sysfs", "/sys", "sysfs", 0, NULL); - if (!ASSERT_OK(err, "mount /sys")) - return err; - - err = mount("bpffs", "/sys/fs/bpf", "bpf", 0, NULL); - if (!ASSERT_OK(err, "mount /sys/fs/bpf")) - return err; - - return 0; -} - -/** - * open_netns() - Switch to specified network namespace by name. - * - * Returns token with which to restore the original namespace - * using close_netns(). - */ -static struct nstoken *open_netns(const char *name) -{ - int nsfd; - char nspath[PATH_MAX]; - int err; - struct nstoken *token; - - token = calloc(1, sizeof(struct nstoken)); - if (!ASSERT_OK_PTR(token, "malloc token")) - return NULL; - - token->orig_netns_fd = open("/proc/self/ns/net", O_RDONLY); - if (!ASSERT_GE(token->orig_netns_fd, 0, "open /proc/self/ns/net")) - goto fail; - - snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name); - nsfd = open(nspath, O_RDONLY | O_CLOEXEC); - if (!ASSERT_GE(nsfd, 0, "open netns fd")) - goto fail; - - err = setns_by_fd(nsfd); - if (!ASSERT_OK(err, "setns_by_fd")) - goto fail; - - return token; -fail: - free(token); - return NULL; -} - -static void close_netns(struct nstoken *token) -{ - ASSERT_OK(setns_by_fd(token->orig_netns_fd), "setns_by_fd"); - free(token); -} - static int netns_setup_namespaces(const char *verb) { const char * const *ns = namespaces; From patchwork Tue Mar 8 14:58:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= X-Patchwork-Id: 12773932 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D5DD6C433FE for ; Tue, 8 Mar 2022 14:58:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347712AbiCHO7W (ORCPT ); Tue, 8 Mar 2022 09:59:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54174 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347702AbiCHO7O (ORCPT ); Tue, 8 Mar 2022 09:59:14 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id E15D82183F for ; Tue, 8 Mar 2022 06:58:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646751493; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yN6T+VF6KObwBoQLQ3U5mjr0SfIS/LHaWBWadcFK+X4=; b=a0PGXNuV5crG3ybRNgcHaGYrrkSqLA8hR2r8Nq1pQfbtBRebHwXDQcrRASYwhTo2aOjp5h Y/tRrCu5TColH6pbobNE7p+dHTkaNKyrAWry5MMs3b44eHUaqUldfdMx741+wxYeOZfd+g FMpB7zEhDwQnOwSjhAno5IZDn8tl4rI= Received: from mail-ej1-f71.google.com (mail-ej1-f71.google.com [209.85.218.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-6-mD91hKcIPS6-Rt96MSy5fg-1; Tue, 08 Mar 2022 09:58:12 -0500 X-MC-Unique: mD91hKcIPS6-Rt96MSy5fg-1 Received: by mail-ej1-f71.google.com with SMTP id hx13-20020a170906846d00b006db02e1a307so3739069ejc.2 for ; Tue, 08 Mar 2022 06:58:11 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=yN6T+VF6KObwBoQLQ3U5mjr0SfIS/LHaWBWadcFK+X4=; b=m1BVRybIBR2iiZtYYsK2Ilnv/Amo9cVhUzGqD1KrQyCYjyJXo9CLIi0yeu7sKjTC2K 6cOTYc4siAyB2mzGiYkp5OjbIRT82RwONCpfkucBrqft3VJdH2sYDZeUaVjmtSgiQ4in 1y3ZcEBTxxDzRvlNad/B3mjttCDBqgeAOmeN5nQglRIoqQYCJIVEnwFYe74ANo/5rpWY UrEt5rEiB5y2b+jGf+bj88mpIa5xLm8Tqa8LOe08KyhBfDKBcZE1YOD6yB6ewTgTx8Rs +lrWEo1gWb5w26+oiY7Meyoc7Sv/MkpTpTmEhGxjZ46Zpy+uC9hEDxNoMdAJlW7LCkFw 4AyA== X-Gm-Message-State: AOAM533/gG2ChdOabFoZbBT6hrTHr/bHXFFQZc4us+VoXb557wdYar/J k93PVKN+Y3JibYeCme2PZG7NXjxmTZJ3LV4v/ZPno6I6ie4UZidOQD/UsHX0rOJksgo9DSvLTF7 Pink0nNP9QMpCKI1E X-Received: by 2002:aa7:d711:0:b0:416:11c2:55fa with SMTP id t17-20020aa7d711000000b0041611c255famr16849870edq.242.1646751488722; Tue, 08 Mar 2022 06:58:08 -0800 (PST) X-Google-Smtp-Source: ABdhPJwZh9L5vZq6E8EdkeKcJnfilE2mVME+aaZd5G5A4EiHN4vr+xQ0qjyUpj5+1ug4SfcCJKg8eA== X-Received: by 2002:aa7:d711:0:b0:416:11c2:55fa with SMTP id t17-20020aa7d711000000b0041611c255famr16849756edq.242.1646751487135; Tue, 08 Mar 2022 06:58:07 -0800 (PST) Received: from alrua-x1.borgediget.toke.dk ([2a0c:4d80:42:443::2]) by smtp.gmail.com with ESMTPSA id bq23-20020a170906d0d700b006db0372d3a2sm3751157ejb.20.2022.03.08.06.58.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Mar 2022 06:58:05 -0800 (PST) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 9CAA11928DA; Tue, 8 Mar 2022 15:58:04 +0100 (CET) From: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= To: Alexei Starovoitov , Daniel Borkmann , "David S. Miller" , Jakub Kicinski , Jesper Dangaard Brouer , John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh Cc: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , Shuah Khan , netdev@vger.kernel.org, bpf@vger.kernel.org Subject: [PATCH bpf-next v10 5/5] selftests/bpf: Add selftest for XDP_REDIRECT in BPF_PROG_RUN Date: Tue, 8 Mar 2022 15:58:01 +0100 Message-Id: <20220308145801.46256-6-toke@redhat.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220308145801.46256-1-toke@redhat.com> References: <20220308145801.46256-1-toke@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This adds a selftest for the XDP_REDIRECT facility in BPF_PROG_RUN, that redirects packets into a veth and counts them using an XDP program on the other side of the veth pair and a TC program on the local side of the veth. Signed-off-by: Toke Høiland-Jørgensen --- .../bpf/prog_tests/xdp_do_redirect.c | 177 ++++++++++++++++++ .../bpf/progs/test_xdp_do_redirect.c | 100 ++++++++++ 2 files changed, 277 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c new file mode 100644 index 000000000000..9926b07e38c8 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c @@ -0,0 +1,177 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "test_xdp_do_redirect.skel.h" + +#define SYS(fmt, ...) \ + ({ \ + char cmd[1024]; \ + snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \ + if (!ASSERT_OK(system(cmd), cmd)) \ + goto out; \ + }) + +struct udp_packet { + struct ethhdr eth; + struct ipv6hdr iph; + struct udphdr udp; + __u8 payload[64 - sizeof(struct udphdr) + - sizeof(struct ethhdr) - sizeof(struct ipv6hdr)]; +} __packed; + +static struct udp_packet pkt_udp = { + .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6), + .eth.h_dest = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55}, + .eth.h_source = {0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb}, + .iph.version = 6, + .iph.nexthdr = IPPROTO_UDP, + .iph.payload_len = bpf_htons(sizeof(struct udp_packet) + - offsetof(struct udp_packet, udp)), + .iph.hop_limit = 2, + .iph.saddr.s6_addr16 = {bpf_htons(0xfc00), 0, 0, 0, 0, 0, 0, bpf_htons(1)}, + .iph.daddr.s6_addr16 = {bpf_htons(0xfc00), 0, 0, 0, 0, 0, 0, bpf_htons(2)}, + .udp.source = bpf_htons(1), + .udp.dest = bpf_htons(1), + .udp.len = bpf_htons(sizeof(struct udp_packet) + - offsetof(struct udp_packet, udp)), + .payload = {0x42}, /* receiver XDP program matches on this */ +}; + +static int attach_tc_prog(struct bpf_tc_hook *hook, int fd) +{ + DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1, .prog_fd = fd); + int ret; + + ret = bpf_tc_hook_create(hook); + if (!ASSERT_OK(ret, "create tc hook")) + return ret; + + ret = bpf_tc_attach(hook, &opts); + if (!ASSERT_OK(ret, "bpf_tc_attach")) { + bpf_tc_hook_destroy(hook); + return ret; + } + + return 0; +} + +#define NUM_PKTS 10000 +void test_xdp_do_redirect(void) +{ + int err, xdp_prog_fd, tc_prog_fd, ifindex_src, ifindex_dst; + char data[sizeof(pkt_udp) + sizeof(__u32)]; + struct test_xdp_do_redirect *skel = NULL; + struct nstoken *nstoken = NULL; + struct bpf_link *link; + + struct xdp_md ctx_in = { .data = sizeof(__u32), + .data_end = sizeof(data) }; + DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts, + .data_in = &data, + .data_size_in = sizeof(data), + .ctx_in = &ctx_in, + .ctx_size_in = sizeof(ctx_in), + .flags = BPF_F_TEST_XDP_LIVE_FRAMES, + .repeat = NUM_PKTS, + .batch_size = 64, + ); + DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook, + .attach_point = BPF_TC_INGRESS); + + memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp)); + *((__u32 *)data) = 0x42; /* metadata test value */ + + skel = test_xdp_do_redirect__open(); + if (!ASSERT_OK_PTR(skel, "skel")) + return; + + /* The XDP program we run with bpf_prog_run() will cycle through all + * three xmit (PASS/TX/REDIRECT) return codes starting from above, and + * ending up with PASS, so we should end up with two packets on the dst + * iface and NUM_PKTS-2 in the TC hook. We match the packets on the UDP + * payload. + */ + SYS("ip netns add testns"); + nstoken = open_netns("testns"); + if (!ASSERT_OK_PTR(nstoken, "setns")) + goto out; + + SYS("ip link add veth_src type veth peer name veth_dst"); + SYS("ip link set dev veth_src address 00:11:22:33:44:55"); + SYS("ip link set dev veth_dst address 66:77:88:99:aa:bb"); + SYS("ip link set dev veth_src up"); + SYS("ip link set dev veth_dst up"); + SYS("ip addr add dev veth_src fc00::1/64"); + SYS("ip addr add dev veth_dst fc00::2/64"); + SYS("ip neigh add fc00::2 dev veth_src lladdr 66:77:88:99:aa:bb"); + + /* We enable forwarding in the test namespace because that will cause + * the packets that go through the kernel stack (with XDP_PASS) to be + * forwarded back out the same interface (because of the packet dst + * combined with the interface addresses). When this happens, the + * regular forwarding path will end up going through the same + * veth_xdp_xmit() call as the XDP_REDIRECT code, which can cause a + * deadlock if it happens on the same CPU. There's a local_bh_disable() + * in the test_run code to prevent this, but an earlier version of the + * code didn't have this, so we keep the test behaviour to make sure the + * bug doesn't resurface. + */ + SYS("sysctl -qw net.ipv6.conf.all.forwarding=1"); + + ifindex_src = if_nametoindex("veth_src"); + ifindex_dst = if_nametoindex("veth_dst"); + if (!ASSERT_NEQ(ifindex_src, 0, "ifindex_src") || + !ASSERT_NEQ(ifindex_dst, 0, "ifindex_dst")) + goto out; + + memcpy(skel->rodata->expect_dst, &pkt_udp.eth.h_dest, ETH_ALEN); + skel->rodata->ifindex_out = ifindex_src; /* redirect back to the same iface */ + skel->rodata->ifindex_in = ifindex_src; + ctx_in.ingress_ifindex = ifindex_src; + tc_hook.ifindex = ifindex_src; + + if (!ASSERT_OK(test_xdp_do_redirect__load(skel), "load")) + goto out; + + link = bpf_program__attach_xdp(skel->progs.xdp_count_pkts, ifindex_dst); + if (!ASSERT_OK_PTR(link, "prog_attach")) + goto out; + skel->links.xdp_count_pkts = link; + + tc_prog_fd = bpf_program__fd(skel->progs.tc_count_pkts); + if (attach_tc_prog(&tc_hook, tc_prog_fd)) + goto out; + + xdp_prog_fd = bpf_program__fd(skel->progs.xdp_redirect); + err = bpf_prog_test_run_opts(xdp_prog_fd, &opts); + if (!ASSERT_OK(err, "prog_run")) + goto out_tc; + + /* wait for the packets to be flushed */ + kern_sync_rcu(); + + /* There will be one packet sent through XDP_REDIRECT and one through + * XDP_TX; these will show up on the XDP counting program, while the + * rest will be counted at the TC ingress hook (and the counting program + * resets the packet payload so they don't get counted twice even though + * they are re-xmited out the veth device + */ + ASSERT_EQ(skel->bss->pkts_seen_xdp, 2, "pkt_count_xdp"); + ASSERT_EQ(skel->bss->pkts_seen_zero, 2, "pkt_count_zero"); + ASSERT_EQ(skel->bss->pkts_seen_tc, NUM_PKTS - 2, "pkt_count_tc"); + +out_tc: + bpf_tc_hook_destroy(&tc_hook); +out: + if (nstoken) + close_netns(nstoken); + system("ip netns del testns"); + test_xdp_do_redirect__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c new file mode 100644 index 000000000000..77a123071940 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c @@ -0,0 +1,100 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include + +#define ETH_ALEN 6 +#define HDR_SZ (sizeof(struct ethhdr) + sizeof(struct ipv6hdr) + sizeof(struct udphdr)) +const volatile int ifindex_out; +const volatile int ifindex_in; +const volatile __u8 expect_dst[ETH_ALEN]; +volatile int pkts_seen_xdp = 0; +volatile int pkts_seen_zero = 0; +volatile int pkts_seen_tc = 0; +volatile int retcode = XDP_REDIRECT; + +SEC("xdp") +int xdp_redirect(struct xdp_md *xdp) +{ + __u32 *metadata = (void *)(long)xdp->data_meta; + void *data_end = (void *)(long)xdp->data_end; + void *data = (void *)(long)xdp->data; + + __u8 *payload = data + HDR_SZ; + int ret = retcode; + + if (payload + 1 > data_end) + return XDP_ABORTED; + + if (xdp->ingress_ifindex != ifindex_in) + return XDP_ABORTED; + + if (metadata + 1 > data) + return XDP_ABORTED; + + if (*metadata != 0x42) + return XDP_ABORTED; + + if (*payload == 0) { + *payload = 0x42; + pkts_seen_zero++; + } + + if (bpf_xdp_adjust_meta(xdp, 4)) + return XDP_ABORTED; + + if (retcode > XDP_PASS) + retcode--; + + if (ret == XDP_REDIRECT) + return bpf_redirect(ifindex_out, 0); + + return ret; +} + +static bool check_pkt(void *data, void *data_end) +{ + struct ipv6hdr *iph = data + sizeof(struct ethhdr); + __u8 *payload = data + HDR_SZ; + + if (payload + 1 > data_end) + return false; + + if (iph->nexthdr != IPPROTO_UDP || *payload != 0x42) + return false; + + /* reset the payload so the same packet doesn't get counted twice when + * it cycles back through the kernel path and out the dst veth + */ + *payload = 0; + return true; +} + +SEC("xdp") +int xdp_count_pkts(struct xdp_md *xdp) +{ + void *data = (void *)(long)xdp->data; + void *data_end = (void *)(long)xdp->data_end; + + if (check_pkt(data, data_end)) + pkts_seen_xdp++; + + /* Return XDP_DROP to make sure the data page is recycled, like when it + * exits a physical NIC. Recycled pages will be counted in the + * pkts_seen_zero counter above. + */ + return XDP_DROP; +} + +SEC("tc") +int tc_count_pkts(struct __sk_buff *skb) +{ + void *data = (void *)(long)skb->data; + void *data_end = (void *)(long)skb->data_end; + + if (check_pkt(data, data_end)) + pkts_seen_tc++; + + return 0; +} + +char _license[] SEC("license") = "GPL";