From patchwork Wed Dec 2 10:39:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mariusz Dudek X-Patchwork-Id: 11945617 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2FB15C83013 for ; Wed, 2 Dec 2020 10:40:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E4D7120872 for ; Wed, 2 Dec 2020 10:40:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729512AbgLBKkR (ORCPT ); Wed, 2 Dec 2020 05:40:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35370 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725971AbgLBKkQ (ORCPT ); Wed, 2 Dec 2020 05:40:16 -0500 Received: from mail-lf1-x141.google.com (mail-lf1-x141.google.com [IPv6:2a00:1450:4864:20::141]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2A06EC0617A6; Wed, 2 Dec 2020 02:39:30 -0800 (PST) Received: by mail-lf1-x141.google.com with SMTP id j205so3772904lfj.6; Wed, 02 Dec 2020 02:39:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=pMJwK7FbHxVAeE34wtoBUkMJgvokHlZJkT0LbM9e8/A=; b=LQ9KcY/MZorp11liNW9JbAdGSClzMJYkWHhZzqtnv0I8L1J4qpbBfw1JDf2srl7TJd KZGWVq+v06Bx9GUhtMn+vj8yQD7lRBlI4rKy5dRpg9hCsP7LFzrwaZCmKtIs9nXJjSJj qUvmnY1rFvCDNPYtQYkn4A9UYduwZ+d3Jko8DSN21MZJCEfOcDrm28Wded3Qou8BYyBa e5rPu+BQoJipiaf36iKmxhbXv9PWEEzbZ5Xb9SOcGNiXr1+eXe0/dArJbNEIhoXDiYIg ru3Gv0hBeKxqlYRBNTiIzNyfRn5NlD56676Ewo043h2U647iN0tRMhJLksLtDviOgBOr aDOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=pMJwK7FbHxVAeE34wtoBUkMJgvokHlZJkT0LbM9e8/A=; b=WnMu4RmZbszy8WFXwG4R47nc95sTsfSmOSVGpNTkxOspUY4aIDsn5tvrbJwy8zODfC Mi3lHYdJtj/jbZYvh2gzVJXWkMWpCqDRkDlcMarfabn4iFn5bE0s+e/HNldr/l8P5pHz gXwwPDsfLcGdD390auqalXgkR8iPD2i5J776Gu308DhU1idLA03VW9gwxwKIjCokvM+N EY/n7Pl/tf04y2QoN83SEbOMyX+SbdTJcrn4vBb1pdop9U0zDAGLckVpulwolOtVwemG uNFucIKceDkGkfPe9yqnFvhcvDY51hXS1wHO0MdcgLFkBRoCuSzF5k5Vxm+BsPo5sJ1n 3Wng== X-Gm-Message-State: AOAM530fUnr9RIYGpaLDX4dXN2wungxJsHjsvEm2ymB6zaBrtVw0vazb +bIxjOLfIjBoFj4ajKBGvzI= X-Google-Smtp-Source: ABdhPJzF/qJuRaqfxtSnNCjQkQKamFO2LnMiInBZoKNjIQHTb0iaXXDQXZf4zseZhKGVn5I4Fk+Y/A== X-Received: by 2002:a19:cbc6:: with SMTP id b189mr898272lfg.275.1606905568657; Wed, 02 Dec 2020 02:39:28 -0800 (PST) Received: from localhost.localdomain (host-89-229-233-64.dynamic.mm.pl. [89.229.233.64]) by smtp.gmail.com with ESMTPSA id q13sm354236lfp.2.2020.12.02.02.39.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 02 Dec 2020 02:39:28 -0800 (PST) From: mariusz.dudek@gmail.com X-Google-Original-From: mariuszx.dudek@intel.com To: andrii.nakryiko@gmail.com, magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com Cc: bpf@vger.kernel.org, Mariusz Dudek Subject: [PATCH v6 bpf-next 1/2] libbpf: separate XDP program load with xsk socket creation Date: Wed, 2 Dec 2020 11:39:22 +0100 Message-Id: <20201202103923.12447-2-mariuszx.dudek@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201202103923.12447-1-mariuszx.dudek@intel.com> References: <20201202103923.12447-1-mariuszx.dudek@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Mariusz Dudek Add support for separation of eBPF program load and xsk socket creation. This is needed for use-case when you want to privide as little privileges as possible to the data plane application that will handle xsk socket creation and incoming traffic. With this patch the data entity container can be run with only CAP_NET_RAW capability to fulfill its purpose of creating xsk socket and handling packages. In case your umem is larger or equal process limit for MEMLOCK you need either increase the limit or CAP_IPC_LOCK capability. To resolve privileges issue two APIs are introduced: - xsk_setup_xdp_prog - loads the built in XDP program. It can also return xsks_map_fd which is needed by unprivileged process to update xsks_map with AF_XDP socket "fd" - xsk_socket__update_xskmap - inserts an AF_XDP socket into an xskmap for a particular xsk_socket Signed-off-by: Mariusz Dudek --- tools/lib/bpf/libbpf.map | 2 + tools/lib/bpf/xsk.c | 92 ++++++++++++++++++++++++++++++++++++---- tools/lib/bpf/xsk.h | 5 +++ 3 files changed, 90 insertions(+), 9 deletions(-) diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index 29ff4807b909..d939d5ac092e 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -345,4 +345,6 @@ LIBBPF_0.3.0 { btf__parse_split; btf__new_empty_split; btf__new_split; + xsk_setup_xdp_prog; + xsk_socket__update_xskmap; } LIBBPF_0.2.0; diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c index 9bc537d0b92d..4b051ec7cfbb 100644 --- a/tools/lib/bpf/xsk.c +++ b/tools/lib/bpf/xsk.c @@ -566,8 +566,35 @@ static int xsk_set_bpf_maps(struct xsk_socket *xsk) &xsk->fd, 0); } -static int xsk_setup_xdp_prog(struct xsk_socket *xsk) +static int xsk_create_xsk_struct(int ifindex, struct xsk_socket *xsk) { + char ifname[IFNAMSIZ]; + struct xsk_ctx *ctx; + char *interface; + + ctx = calloc(1, sizeof(*ctx)); + if (!ctx) + return -ENOMEM; + + interface = if_indextoname(ifindex, &ifname[0]); + if (!interface) { + free(ctx); + return -errno; + } + + ctx->ifindex = ifindex; + strncpy(ctx->ifname, ifname, IFNAMSIZ - 1); + ctx->ifname[IFNAMSIZ - 1] = 0; + + xsk->ctx = ctx; + + return 0; +} + +static int __xsk_setup_xdp_prog(struct xsk_socket *_xdp, + int *xsks_map_fd) +{ + struct xsk_socket *xsk = _xdp; struct xsk_ctx *ctx = xsk->ctx; __u32 prog_id = 0; int err; @@ -584,8 +611,7 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk) err = xsk_load_xdp_prog(xsk); if (err) { - xsk_delete_bpf_maps(xsk); - return err; + goto err_load_xdp_prog; } } else { ctx->prog_fd = bpf_prog_get_fd_by_id(prog_id); @@ -598,15 +624,29 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk) } } - if (xsk->rx) + if (xsk->rx) { err = xsk_set_bpf_maps(xsk); - if (err) { - xsk_delete_bpf_maps(xsk); - close(ctx->prog_fd); - return err; + if (err) { + if (!prog_id) { + goto err_set_bpf_maps; + } else { + close(ctx->prog_fd); + return err; + } + } } + if (xsks_map_fd) + *xsks_map_fd = ctx->xsks_map_fd; return 0; + +err_set_bpf_maps: + close(ctx->prog_fd); + bpf_set_link_xdp_fd(ctx->ifindex, -1, 0); +err_load_xdp_prog: + xsk_delete_bpf_maps(xsk); + + return err; } static struct xsk_ctx *xsk_get_ctx(struct xsk_umem *umem, int ifindex, @@ -689,6 +729,40 @@ static struct xsk_ctx *xsk_create_ctx(struct xsk_socket *xsk, return ctx; } +static void xsk_destroy_xsk_struct(struct xsk_socket *xsk) +{ + free(xsk->ctx); + free(xsk); +} + +int xsk_socket__update_xskmap(struct xsk_socket *xsk, int fd) +{ + xsk->ctx->xsks_map_fd = fd; + return xsk_set_bpf_maps(xsk); +} + +int xsk_setup_xdp_prog(int ifindex, int *xsks_map_fd) +{ + struct xsk_socket *xsk; + int res; + + xsk = calloc(1, sizeof(*xsk)); + if (!xsk) + return -ENOMEM; + + res = xsk_create_xsk_struct(ifindex, xsk); + if (res) { + free(xsk); + return -EINVAL; + } + + res = __xsk_setup_xdp_prog(xsk, xsks_map_fd); + + xsk_destroy_xsk_struct(xsk); + + return res; +} + int xsk_socket__create_shared(struct xsk_socket **xsk_ptr, const char *ifname, __u32 queue_id, struct xsk_umem *umem, @@ -838,7 +912,7 @@ int xsk_socket__create_shared(struct xsk_socket **xsk_ptr, ctx->prog_fd = -1; if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) { - err = xsk_setup_xdp_prog(xsk); + err = __xsk_setup_xdp_prog(xsk, NULL); if (err) goto out_mmap_tx; } diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h index 5865e082ba0b..e9f121f5d129 100644 --- a/tools/lib/bpf/xsk.h +++ b/tools/lib/bpf/xsk.h @@ -204,6 +204,11 @@ struct xsk_umem_config { __u32 flags; }; +LIBBPF_API int xsk_setup_xdp_prog(int ifindex, + int *xsks_map_fd); +LIBBPF_API int xsk_socket__update_xskmap(struct xsk_socket *xsk, + int xsks_map_fd); + /* Flags for the libbpf_flags field. */ #define XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD (1 << 0) From patchwork Wed Dec 2 10:39:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mariusz Dudek X-Patchwork-Id: 11945615 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CC9BBC83012 for ; Wed, 2 Dec 2020 10:40:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7A9F520872 for ; Wed, 2 Dec 2020 10:40:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729493AbgLBKkN (ORCPT ); Wed, 2 Dec 2020 05:40:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35380 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725971AbgLBKkM (ORCPT ); Wed, 2 Dec 2020 05:40:12 -0500 Received: from mail-lf1-x130.google.com (mail-lf1-x130.google.com [IPv6:2a00:1450:4864:20::130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 21A2BC0617A7; Wed, 2 Dec 2020 02:39:32 -0800 (PST) Received: by mail-lf1-x130.google.com with SMTP id l11so3835984lfg.0; Wed, 02 Dec 2020 02:39:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=t0z8hFuN0CmWfy4MMYvH83rJGlqQzj1t5SRgUnGKKQM=; b=XRWwgfy43TBuRJrvsdSS9mrArYU2qd42M22SrCTGgNUQ0XNUXbmIGoCiAxKvuOlyYG b+mazWkQRpLHM2TlepqUOQaI+OeDCmw962CFHmv/BbXsdxMJgLWGi52fCxi/2AbdfnMl KsrkpS85R8okLLaTmSz/W1pJyW8swn+zik5cbMahGGnKb/Y2JFXej0ksn0n9bZBKptHk lDMW8QOqBSiu9hQt4OUJpGvUcE9EO7/Nj4I3+ZcU1ceGpOtLiP7muvYSkVzHlHlgXMdq AT0xciqXORAoqdqCxVuA9YhI/O6X8hrQoBxhpcrA5jt8jE8xdBH3Xsc97ZPsy+Totk8o 0j8w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=t0z8hFuN0CmWfy4MMYvH83rJGlqQzj1t5SRgUnGKKQM=; b=al8k6aHZ91GVBGbBRTaI2OboNZh7cvdnNpRaCpe4q84171jmoZphYzC6ZEWSk017ov ke1VhXV1qJMn9U0nQD50TxyQesrkuHlumSPnhfdlx8f0pXB2hTVCz0qd4SKGd5TxcNYm XAfrkdKaiE4h//XJEAlChkzaK8fq2aYg1ocK9qhUnMt8ZG/VpbQ89rgzgANXkM9RN9+n EfQppktuYl77bBqCFdCmDiXNeQ0KqoGqMvCMjdZv0Ws/SRxglfe9N4sNR1yE7S4eoI1M aAs29PYnvGP+HqF/p87sv6VH/bDPhtN5ruIttIXwMQGjNCHryYok+0eCPi0n41kqd/Ql xb0w== X-Gm-Message-State: AOAM5322uvXIgt56ZnqYpEaqSHK5hTkEUilMpOoZS7qV8QzmvmcIl8p3 ANnV5xywzykweshMZoGMgzYu+ZMOSf6MHIAi X-Google-Smtp-Source: ABdhPJyZeDmRtIoHrZer/zChXBCbwlxfGltwbTA1B6igtjh9W8AV/B9+Sho/wNwUpE74tuic5pg5sQ== X-Received: by 2002:ac2:5e8d:: with SMTP id b13mr929347lfq.246.1606905570565; Wed, 02 Dec 2020 02:39:30 -0800 (PST) Received: from localhost.localdomain (host-89-229-233-64.dynamic.mm.pl. [89.229.233.64]) by smtp.gmail.com with ESMTPSA id q13sm354236lfp.2.2020.12.02.02.39.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 02 Dec 2020 02:39:29 -0800 (PST) From: mariusz.dudek@gmail.com X-Google-Original-From: mariuszx.dudek@intel.com To: andrii.nakryiko@gmail.com, magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com Cc: bpf@vger.kernel.org, Mariusz Dudek Subject: [PATCH v6 bpf-next 2/2] samples/bpf: sample application for eBPF load and socket creation split Date: Wed, 2 Dec 2020 11:39:23 +0100 Message-Id: <20201202103923.12447-3-mariuszx.dudek@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201202103923.12447-1-mariuszx.dudek@intel.com> References: <20201202103923.12447-1-mariuszx.dudek@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Mariusz Dudek Introduce a sample program to demonstrate the control and data plane split. For the control plane part a new program called xdpsock_ctrl_proc is introduced. For the data plane part, some code was added to xdpsock_user.c to act as the data plane entity. Application xdpsock_ctrl_proc works as control entity with sudo privileges (CAP_SYS_ADMIN and CAP_NET_ADMIN are sufficient) and the extended xdpsock as data plane entity with CAP_NET_RAW capability only. Usage example: sudo ./samples/bpf/xdpsock_ctrl_proc -i sudo ./samples/bpf/xdpsock -i -q -n -N -l -R Signed-off-by: Mariusz Dudek --- samples/bpf/Makefile | 4 +- samples/bpf/xdpsock.h | 8 ++ samples/bpf/xdpsock_ctrl_proc.c | 187 ++++++++++++++++++++++++++++++++ samples/bpf/xdpsock_user.c | 146 +++++++++++++++++++++++-- 4 files changed, 335 insertions(+), 10 deletions(-) create mode 100644 samples/bpf/xdpsock_ctrl_proc.c diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index 05db041f8b18..26fc96ca619e 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -48,6 +48,7 @@ tprogs-y += syscall_tp tprogs-y += cpustat tprogs-y += xdp_adjust_tail tprogs-y += xdpsock +tprogs-y += xdpsock_ctrl_proc tprogs-y += xsk_fwd tprogs-y += xdp_fwd tprogs-y += task_fd_query @@ -105,6 +106,7 @@ syscall_tp-objs := syscall_tp_user.o cpustat-objs := cpustat_user.o xdp_adjust_tail-objs := xdp_adjust_tail_user.o xdpsock-objs := xdpsock_user.o +xdpsock_ctrl_proc-objs := xdpsock_ctrl_proc.o xsk_fwd-objs := xsk_fwd.o xdp_fwd-objs := xdp_fwd_user.o task_fd_query-objs := task_fd_query_user.o $(TRACE_HELPERS) @@ -202,7 +204,7 @@ TPROGLDLIBS_tracex4 += -lrt TPROGLDLIBS_trace_output += -lrt TPROGLDLIBS_map_perf_test += -lrt TPROGLDLIBS_test_overhead += -lrt -TPROGLDLIBS_xdpsock += -pthread +TPROGLDLIBS_xdpsock += -pthread -lcap TPROGLDLIBS_xsk_fwd += -pthread # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline: diff --git a/samples/bpf/xdpsock.h b/samples/bpf/xdpsock.h index b7eca15c78cc..fd70cce60712 100644 --- a/samples/bpf/xdpsock.h +++ b/samples/bpf/xdpsock.h @@ -8,4 +8,12 @@ #define MAX_SOCKS 4 +#define SOCKET_NAME "sock_cal_bpf_fd" +#define MAX_NUM_OF_CLIENTS 10 + +#define CLOSE_CONN 1 + +typedef __u64 u64; +typedef __u32 u32; + #endif /* XDPSOCK_H */ diff --git a/samples/bpf/xdpsock_ctrl_proc.c b/samples/bpf/xdpsock_ctrl_proc.c new file mode 100644 index 000000000000..384e62e3c6d6 --- /dev/null +++ b/samples/bpf/xdpsock_ctrl_proc.c @@ -0,0 +1,187 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2017 - 2018 Intel Corporation. */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include "xdpsock.h" + +static const char *opt_if = ""; + +static struct option long_options[] = { + {"interface", required_argument, 0, 'i'}, + {0, 0, 0, 0} +}; + +static void usage(const char *prog) +{ + const char *str = + " Usage: %s [OPTIONS]\n" + " Options:\n" + " -i, --interface=n Run on interface n\n" + "\n"; + fprintf(stderr, "%s\n", str); + + exit(0); +} + +static void parse_command_line(int argc, char **argv) +{ + int option_index, c; + + opterr = 0; + + for (;;) { + c = getopt_long(argc, argv, "i:", + long_options, &option_index); + if (c == -1) + break; + + switch (c) { + case 'i': + opt_if = optarg; + break; + default: + usage(basename(argv[0])); + } + } +} + +static int send_xsks_map_fd(int sock, int fd) +{ + char cmsgbuf[CMSG_SPACE(sizeof(int))]; + struct msghdr msg; + struct iovec iov; + int value = 0; + + if (fd == -1) { + fprintf(stderr, "Incorrect fd = %d\n", fd); + return -1; + } + iov.iov_base = &value; + iov.iov_len = sizeof(int); + + msg.msg_name = NULL; + msg.msg_namelen = 0; + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + msg.msg_flags = 0; + msg.msg_control = cmsgbuf; + msg.msg_controllen = CMSG_LEN(sizeof(int)); + + struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); + + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_RIGHTS; + cmsg->cmsg_len = CMSG_LEN(sizeof(int)); + + *(int *)CMSG_DATA(cmsg) = fd; + int ret = sendmsg(sock, &msg, 0); + + if (ret == -1) { + fprintf(stderr, "Sendmsg failed with %s", strerror(errno)); + return -errno; + } + + return ret; +} + +int +main(int argc, char **argv) +{ + struct sockaddr_un server; + int listening = 1; + int rval, msgsock; + int ifindex = 0; + int flag = 1; + int cmd = 0; + int sock; + int err; + int xsks_map_fd; + + parse_command_line(argc, argv); + + ifindex = if_nametoindex(opt_if); + if (ifindex == 0) { + fprintf(stderr, "Unable to get ifindex for Interface %s. Reason:%s", + opt_if, strerror(errno)); + return -errno; + } + + sock = socket(AF_UNIX, SOCK_STREAM, 0); + if (sock < 0) { + fprintf(stderr, "Opening socket stream failed: %s", strerror(errno)); + return -errno; + } + + server.sun_family = AF_UNIX; + strcpy(server.sun_path, SOCKET_NAME); + + setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &flag, sizeof(int)); + + if (bind(sock, (struct sockaddr *)&server, sizeof(struct sockaddr_un))) { + fprintf(stderr, "Binding to socket stream failed: %s", strerror(errno)); + return -errno; + } + + listen(sock, MAX_NUM_OF_CLIENTS); + + err = xsk_setup_xdp_prog(ifindex, &xsks_map_fd); + if (err) { + fprintf(stderr, "Setup of xdp program failed\n"); + goto close_sock; + } + + while (listening) { + msgsock = accept(sock, 0, 0); + if (msgsock == -1) { + fprintf(stderr, "Error accepting connection: %s", strerror(errno)); + err = -errno; + goto close_sock; + } + err = send_xsks_map_fd(msgsock, xsks_map_fd); + if (err <= 0) { + fprintf(stderr, "Error %d sending xsks_map_fd\n", err); + goto cleanup; + } + do { + rval = read(msgsock, &cmd, sizeof(int)); + if (rval < 0) { + fprintf(stderr, "Error reading stream message"); + } else { + if (cmd != CLOSE_CONN) + fprintf(stderr, "Recv unknown cmd = %d\n", cmd); + listening = 0; + break; + } + } while (rval > 0); + } + close(msgsock); + close(sock); + unlink(SOCKET_NAME); + + /* Unset fd for given ifindex */ + err = bpf_set_link_xdp_fd(ifindex, -1, 0); + if (err) { + fprintf(stderr, "Error when unsetting bpf prog_fd for ifindex(%d)\n", ifindex); + return err; + } + + return 0; + +cleanup: + close(msgsock); +close_sock: + close(sock); + unlink(SOCKET_NAME); + return err; +} diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c index 036bd019e400..0fee7f3aef3c 100644 --- a/samples/bpf/xdpsock_user.c +++ b/samples/bpf/xdpsock_user.c @@ -24,10 +24,12 @@ #include #include #include +#include #include #include #include #include +#include #include #include @@ -96,6 +98,7 @@ static bool opt_need_wakeup = true; static u32 opt_num_xsks = 1; static u32 prog_id; static bool opt_busy_poll; +static bool opt_reduced_cap; struct xsk_ring_stats { unsigned long rx_npkts; @@ -154,6 +157,7 @@ struct xsk_socket_info { static int num_socks; struct xsk_socket_info *xsks[MAX_SOCKS]; +int sock; static unsigned long get_nsecs(void) { @@ -461,6 +465,7 @@ static void *poller(void *arg) static void remove_xdp_program(void) { u32 curr_prog_id = 0; + int cmd = CLOSE_CONN; if (bpf_get_link_xdp_id(opt_ifindex, &curr_prog_id, opt_xdp_flags)) { printf("bpf_get_link_xdp_id failed\n"); @@ -472,6 +477,13 @@ static void remove_xdp_program(void) printf("couldn't find a prog id on a given interface\n"); else printf("program on interface changed, not removing\n"); + + if (opt_reduced_cap) { + if (write(sock, &cmd, sizeof(int)) < 0) { + fprintf(stderr, "Error writing into stream socket: %s", strerror(errno)); + exit(EXIT_FAILURE); + } + } } static void int_exit(int sig) @@ -854,7 +866,7 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem, xsk->umem = umem; cfg.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS; cfg.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS; - if (opt_num_xsks > 1) + if (opt_num_xsks > 1 || opt_reduced_cap) cfg.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD; else cfg.libbpf_flags = 0; @@ -913,6 +925,7 @@ static struct option long_options[] = { {"app-stats", no_argument, 0, 'a'}, {"irq-string", no_argument, 0, 'I'}, {"busy-poll", no_argument, 0, 'B'}, + {"reduce-cap", no_argument, 0, 'R'}, {0, 0, 0, 0} }; @@ -935,7 +948,7 @@ static void usage(const char *prog) " -m, --no-need-wakeup Turn off use of driver need wakeup flag.\n" " -f, --frame-size=n Set the frame size (must be a power of two in aligned mode, default is %d).\n" " -u, --unaligned Enable unaligned chunk placement\n" - " -M, --shared-umem Enable XDP_SHARED_UMEM\n" + " -M, --shared-umem Enable XDP_SHARED_UMEM (cannot be used with -R)\n" " -F, --force Force loading the XDP prog\n" " -d, --duration=n Duration in secs to run command.\n" " Default: forever.\n" @@ -952,6 +965,7 @@ static void usage(const char *prog) " -a, --app-stats Display application (syscall) statistics.\n" " -I, --irq-string Display driver interrupt statistics for interface associated with irq-string.\n" " -B, --busy-poll Busy poll.\n" + " -R, --reduce-cap Use reduced capabilities (cannot be used with -M)\n" "\n"; fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE, opt_batch_size, MIN_PKT_SIZE, MIN_PKT_SIZE, @@ -967,7 +981,7 @@ static void parse_command_line(int argc, char **argv) opterr = 0; for (;;) { - c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:B", + c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:BR", long_options, &option_index); if (c == -1) break; @@ -1069,6 +1083,9 @@ static void parse_command_line(int argc, char **argv) case 'B': opt_busy_poll = 1; break; + case 'R': + opt_reduced_cap = true; + break; default: usage(basename(argv[0])); } @@ -1090,6 +1107,11 @@ static void parse_command_line(int argc, char **argv) opt_xsk_frame_size); usage(basename(argv[0])); } + + if (opt_reduced_cap && opt_num_xsks > 1) { + fprintf(stderr, "ERROR: -M and -R cannot be used together\n"); + usage(basename(argv[0])); + } } static void kick_tx(struct xsk_socket_info *xsk) @@ -1487,26 +1509,117 @@ static void apply_setsockopt(struct xsk_socket_info *xsk) exit_with_error(errno); } +static int recv_xsks_map_fd_from_ctrl_node(int sock, int *_fd) +{ + char cms[CMSG_SPACE(sizeof(int))]; + struct cmsghdr *cmsg; + struct msghdr msg; + struct iovec iov; + int value; + int len; + + iov.iov_base = &value; + iov.iov_len = sizeof(int); + + msg.msg_name = 0; + msg.msg_namelen = 0; + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + msg.msg_flags = 0; + msg.msg_control = (caddr_t)cms; + msg.msg_controllen = sizeof(cms); + + len = recvmsg(sock, &msg, 0); + + if (len < 0) { + fprintf(stderr, "Recvmsg failed length incorrect.\n"); + return -EINVAL; + } + + if (len == 0) { + fprintf(stderr, "Recvmsg failed no data\n"); + return -EINVAL; + } + + cmsg = CMSG_FIRSTHDR(&msg); + *_fd = *(int *)CMSG_DATA(cmsg); + + return 0; +} + +static int +recv_xsks_map_fd(int *xsks_map_fd) +{ + struct sockaddr_un server; + int err; + + sock = socket(AF_UNIX, SOCK_STREAM, 0); + if (sock < 0) { + fprintf(stderr, "Error opening socket stream: %s", strerror(errno)); + return errno; + } + + server.sun_family = AF_UNIX; + strcpy(server.sun_path, SOCKET_NAME); + + if (connect(sock, (struct sockaddr *)&server, sizeof(struct sockaddr_un)) < 0) { + close(sock); + fprintf(stderr, "Error connecting stream socket: %s", strerror(errno)); + return errno; + } + + err = recv_xsks_map_fd_from_ctrl_node(sock, xsks_map_fd); + if (err) { + fprintf(stderr, "Error %d recieving fd\n", err); + return err; + } + return 0; +} + int main(int argc, char **argv) { + struct __user_cap_header_struct hdr = { _LINUX_CAPABILITY_VERSION_3, 0 }; + struct __user_cap_data_struct data[2] = { { 0 } }; struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; bool rx = false, tx = false; struct xsk_umem_info *umem; struct bpf_object *obj; + int xsks_map_fd = 0; pthread_t pt; int i, ret; void *bufs; parse_command_line(argc, argv); - if (setrlimit(RLIMIT_MEMLOCK, &r)) { - fprintf(stderr, "ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n", - strerror(errno)); - exit(EXIT_FAILURE); + if (opt_reduced_cap) { + if (capget(&hdr, data) < 0) + fprintf(stderr, "Error getting capabilities\n"); + + data->effective &= CAP_TO_MASK(CAP_NET_RAW); + data->permitted &= CAP_TO_MASK(CAP_NET_RAW); + + if (capset(&hdr, data) < 0) + fprintf(stderr, "Setting capabilities failed\n"); + + if (capget(&hdr, data) < 0) { + fprintf(stderr, "Error getting capabilities\n"); + } else { + fprintf(stderr, "Capabilities EFF %x Caps INH %x Caps Per %x\n", + data[0].effective, data[0].inheritable, data[0].permitted); + fprintf(stderr, "Capabilities EFF %x Caps INH %x Caps Per %x\n", + data[1].effective, data[1].inheritable, data[1].permitted); + } + } else { + if (setrlimit(RLIMIT_MEMLOCK, &r)) { + fprintf(stderr, "ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + if (opt_num_xsks > 1) + load_xdp_program(argv, &obj); } - if (opt_num_xsks > 1) - load_xdp_program(argv, &obj); /* Reserve memory for the umem. Use hugepages if unaligned chunk mode */ bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size, @@ -1541,6 +1654,21 @@ int main(int argc, char **argv) if (opt_num_xsks > 1 && opt_bench != BENCH_TXONLY) enter_xsks_into_map(obj); + if (opt_reduced_cap) { + ret = recv_xsks_map_fd(&xsks_map_fd); + if (ret) { + fprintf(stderr, "Error %d receiving xsks_map_fd\n", ret); + exit_with_error(ret); + } + if (xsks[0]->xsk) { + ret = xsk_socket__update_xskmap(xsks[0]->xsk, xsks_map_fd); + if (ret) { + fprintf(stderr, "Update of BPF map failed(%d)\n", ret); + exit_with_error(ret); + } + } + } + signal(SIGINT, int_exit); signal(SIGTERM, int_exit); signal(SIGABRT, int_exit);