[bpf-next,v5,2/3] libbpf: add low level TC-BPF API

This adds functions that wrap the netlink API used for adding,
manipulating, and removing traffic control filters.

An API summary:

A bpf_tc_hook represents a location where a TC-BPF filter can be
attached. Creating a hook creates the backing qdisc, while destroying
it either removes all filters attached to the hook, or destroys the
qdisc if requested explicitly (as discussed below).

The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
query, and detach tc filters.

All functions return 0 on success, and a negative error code on failure.
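
The return convention above can be handled in one place. The following is
a minimal sketch; tc_result_str is a hypothetical helper that is not part
of libbpf, and it relies only on the stated convention of 0 on success and
a negative error code on failure.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper (not part of libbpf): turn the return value of a
 * bpf_tc_* call into a human-readable string. It assumes only the
 * convention of 0 on success and a negative errno value on failure.
 */
static const char *tc_result_str(int r)
{
	if (r == 0)
		return "success";
	if (r > 0)
		return "unexpected positive return value";
	/* Negative values carry the errno code, so flip the sign. */
	return strerror(-r);
}
```

A caller could then write, for instance,
fprintf(stderr, "attach: %s\n", tc_result_str(r)) after any bpf_tc_* call.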

bpf_tc_hook_create - Create a hook
Parameters:
	@hook - Cannot be NULL, ifindex > 0, attach_point must be set to
		proper enum constant. Note that parent must be unset when
		attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note
		that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a
		valid value for attach_point.

		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.

	@flags - Currently only BPF_TC_F_REPLACE, which creates qdisc in
		 non-exclusive mode (i.e. an existing qdisc will be replaced
		 instead of this function failing with -EEXIST).

bpf_tc_hook_destroy - Destroy the hook
Parameters:
	@hook - Cannot be NULL. The behaviour depends on value of
		attach_point.

		If BPF_TC_INGRESS, all filters attached to the ingress
		hook will be detached.
		If BPF_TC_EGRESS, all filters attached to the egress hook
		will be detached.
		If BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be
		deleted, also detaching all filters.

		It is advised that if the qdisc is shared by multiple programs,
		each program at least checks that there are no other existing
		filters before deleting the clsact qdisc. An example is shown
		below:

		/* set opts as NULL, as we're not really interested in
		 * getting any info for a particular filter, but just
		 * detecting its presence.
		 */
		DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = if_nametoindex("lo"),
				    .attach_point = BPF_TC_INGRESS);
		r = bpf_tc_query(&hook, NULL);
		if (r == -ENOENT) {
			/* no filters */
			hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGRESS;
			return bpf_tc_hook_destroy(&hook);
		} else {
			/* failed or r == 0, the latter means filters do exist */
			return r;
		}

		Note that there is a small race between checking for no
		filters and deleting the qdisc. This is currently unavoidable.

		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
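
The attach_point dependent behaviour of bpf_tc_hook_destroy described above
can be summarised as a small decision function. This is only a sketch: the
SKETCH_* names and their numeric values are illustrative stand-ins rather
than the libbpf definitions, and rejecting any other value with -EINVAL is
an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins for the attach point constants. The real ones
 * come from libbpf.h; the values used here are an assumption.
 */
enum sketch_attach_point {
	SKETCH_TC_INGRESS = 1 << 0,
	SKETCH_TC_EGRESS  = 1 << 1,
	SKETCH_TC_CUSTOM  = 1 << 2,
};

enum sketch_destroy_action {
	SKETCH_FLUSH_INGRESS = 1,	/* detach all ingress filters */
	SKETCH_FLUSH_EGRESS,		/* detach all egress filters */
	SKETCH_DELETE_QDISC,		/* delete clsact and all its filters */
};

/* Returns the action described for the given attach point, or a
 * negative error code.
 */
static int sketch_hook_destroy_action(int attach_point)
{
	switch (attach_point) {
	case SKETCH_TC_INGRESS:
		return SKETCH_FLUSH_INGRESS;
	case SKETCH_TC_EGRESS:
		return SKETCH_FLUSH_EGRESS;
	case SKETCH_TC_INGRESS | SKETCH_TC_EGRESS:
		return SKETCH_DELETE_QDISC;
	case SKETCH_TC_CUSTOM:
		return -EOPNOTSUPP;
	default:
		return -EINVAL;	/* assumption: other values are rejected */
	}
}
```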

bpf_tc_attach - Attach a filter to a hook
Parameters:
	@hook - Cannot be NULL. Represents the hook the filter will be
		attached to. Requirements for ifindex and attach_point are
		same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM
		is also supported.  In that case, parent must be set to the
		handle where the filter will be attached (using TC_H_MAKE).

		E.g. To set parent to 1:16 like in tc command line,
		     the equivalent would be TC_H_MAKE(1 << 16, 16)

	@opts - Cannot be NULL.

		The following opts are optional:
			handle - The handle of the filter
			priority - The priority of the filter
				   Must be >= 0 and <= UINT16_MAX
		The following opts must be set:
			prog_fd - The fd of the loaded SCHED_CLS prog
		The following opts must be unset:
			prog_id - The ID of the BPF prog

		The following opts will be filled by bpf_tc_attach on a
		successful attach operation if they are unset:
			handle - The handle of the attached filter
			priority - The priority of the attached filter
			prog_id - The ID of the attached SCHED_CLS prog

		This way, the user can know what the auto allocated
		values for optional opts like handle and priority are
		for the newly attached filter, if they were unset.

		Note that some other attributes are set to some default
		values listed below (this holds for all bpf_tc_* APIs):
			protocol - ETH_P_ALL
			mode - direct action
			chain index - 0
			class ID - 0 (this can be set by writing to the
			skb->tc_classid field from the BPF program)

	@flags - Currently only BPF_TC_F_REPLACE, which creates filter
		 in non-exclusive mode (i.e. an existing filter with the
		 same attributes will be replaced instead of this
		 function failing with -EEXIST).
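
For the BPF_TC_CUSTOM case described under @hook, the parent handle packs
the major number into the upper 16 bits and the minor number into the lower
16 bits. A minimal sketch of that arithmetic follows; sketch_parent is a
hypothetical helper, and the macros are local copies so the example builds
without kernel headers (on Linux they come from <linux/pkt_sched.h>).

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the handle macros so the sketch builds without kernel
 * headers; on Linux they are provided by <linux/pkt_sched.h>.
 */
#ifndef TC_H_MAKE
#define TC_H_MAJ_MASK 0xFFFF0000U
#define TC_H_MIN_MASK 0x0000FFFFU
#define TC_H_MAJ(h) ((h) & TC_H_MAJ_MASK)
#define TC_H_MIN(h) ((h) & TC_H_MIN_MASK)
#define TC_H_MAKE(maj, min) (((maj) & TC_H_MAJ_MASK) | ((min) & TC_H_MIN_MASK))
#endif

/* Hypothetical helper: build a parent handle from a numeric major and
 * minor, shifting the major into the upper half first.
 */
static uint32_t sketch_parent(uint16_t major, uint16_t minor)
{
	return TC_H_MAKE((uint32_t)major << 16, (uint32_t)minor);
}
```

With numeric major 1 and minor 16 this yields 0x00010010, which is the
value that would be stored in the parent field of the hook.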

bpf_tc_detach - Detach a filter from a hook
Parameters:
	@hook: Cannot be NULL. Represents the hook the filter will be
		detached from. Requirements are the same as described above
		in bpf_tc_attach.

	@opts:	Cannot be NULL.

		The following opts must be set:
			handle
			priority
		The following opts must be unset:
			prog_fd
			prog_id
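
The opts requirements listed for bpf_tc_attach and bpf_tc_detach can be
sketched as two validation functions. Everything here is hypothetical:
struct sketch_tc_opts only mirrors the fields named in this description,
"unset" is modelled as zero, and returning -EINVAL for a violation is an
assumption rather than documented behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of the opts fields named in this description; the
 * real struct bpf_tc_opts lives in libbpf.h and may differ.
 */
struct sketch_tc_opts {
	int prog_fd;
	uint32_t prog_id;
	uint32_t handle;
	uint32_t priority;
};

/* attach: prog_fd must be set, prog_id must be unset, and priority
 * (optional) must fit in 16 bits. handle is optional.
 */
static int sketch_check_attach(const struct sketch_tc_opts *o)
{
	/* Assumption: a "set" fd is a positive number. */
	if (!o || o->prog_fd <= 0 || o->prog_id)
		return -EINVAL;
	if (o->priority > UINT16_MAX)
		return -EINVAL;
	return 0;
}

/* detach: handle and priority must be set, prog_fd and prog_id unset. */
static int sketch_check_detach(const struct sketch_tc_opts *o)
{
	if (!o || !o->handle || !o->priority)
		return -EINVAL;
	if (o->priority > UINT16_MAX)
		return -EINVAL;
	if (o->prog_fd || o->prog_id)
		return -EINVAL;
	return 0;
}
```

This also shows why the usage examples clear prog_fd and prog_id before
reusing the opts filled in by bpf_tc_attach for a later bpf_tc_detach.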

bpf_tc_query - Look up a filter attached to a hook
Parameters:
	@hook: Cannot be NULL. Represents the hook where the filter
	       lookup will be performed. Requirements are the same as
	       described above in bpf_tc_attach.

	@opts: Can be NULL.

	       The following opts are optional:
			handle
			priority
			prog_fd
			prog_id

	       However, at most one of prog_fd and prog_id may be
	       set. Setting both leads to an error. Setting neither is
	       allowed.

	       The following fields will be filled by bpf_tc_query on a
	       successful lookup if they are unset:
			handle
			priority
			prog_id

	       Based on the specified optional parameters, the matching
	       data for the first matching filter is filled in and 0 is
	       returned. When setting prog_fd, the prog_id will be
	       matched against prog_id of the loaded SCHED_CLS prog
	       represented by prog_fd.

	       To uniquely identify a filter, e.g. to detect its presence,
	       it is recommended to set both handle and priority fields.
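
The first-match semantics of bpf_tc_query can be modelled over an in-memory
array. This is only a sketch of the matching rule described above, not of
the netlink implementation: the struct and function names are hypothetical,
unset fields are modelled as zero and treated as wildcards, and the
resolution of prog_fd to a prog_id is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory model of an attached filter. */
struct sketch_filter {
	uint32_t handle;
	uint32_t priority;
	uint32_t prog_id;
};

/* Hypothetical query: a zero field means "unset" and matches anything. */
struct sketch_query {
	uint32_t handle;
	uint32_t priority;
	uint32_t prog_id;
};

/* Fill q from the first filter matching every set field and return 0,
 * or return -ENOENT when nothing matches.
 */
static int sketch_query_first(const struct sketch_filter *f, size_t n,
			      struct sketch_query *q)
{
	for (size_t i = 0; i < n; i++) {
		if (q->handle && q->handle != f[i].handle)
			continue;
		if (q->priority && q->priority != f[i].priority)
			continue;
		if (q->prog_id && q->prog_id != f[i].prog_id)
			continue;
		/* Set fields already equal the match, so copying all
		 * three only fills in the ones that were unset.
		 */
		q->handle = f[i].handle;
		q->priority = f[i].priority;
		q->prog_id = f[i].prog_id;
		return 0;
	}
	return -ENOENT;
}
```

When two filters share a prog_id, a query by prog_id alone returns whichever
comes first, which is why setting both handle and priority is the reliable
way to pin down one filter.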

Some usage examples (using bpf skeleton infrastructure):

BPF program (test_tc_bpf.c):

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("classifier")
	int cls(struct __sk_buff *skb)
	{
		return 0;
	}

Userspace loader:

	struct test_tc_bpf *skel = NULL;
	int fd, r;

	skel = test_tc_bpf__open_and_load();
	if (!skel)
		return -ENOMEM;

	fd = bpf_program__fd(skel->progs.cls);

	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex =
			    if_nametoindex("lo"), .attach_point =
			    BPF_TC_INGRESS);
	/* Create clsact qdisc */
	r = bpf_tc_hook_create(&hook, 0);
	if (r < 0)
		goto end;

	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
	r = bpf_tc_attach(&hook, &opts, 0);
	if (r < 0)
		goto end;
	/* Print the auto allocated handle and priority */
	printf("Handle=%"PRIu32"\n", opts.handle);
	printf("Priority=%"PRIu32"\n", opts.priority);

	opts.prog_fd = opts.prog_id = 0;
	bpf_tc_detach(&hook, &opts);
end:
	test_tc_bpf__destroy(skel);

This is equivalent to doing the following using tc command line:
  # tc qdisc add dev lo clsact
  # tc filter add dev lo ingress bpf obj foo.o sec classifier da

Another example replacing a filter (extending prior example):

	/* We can also specify both handle and priority (or just one).
	 * Let's try replacing an existing filter.
	 */
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, replace_opts, .handle =
			    opts.handle, .priority = opts.priority,
			    .prog_fd = fd);
	r = bpf_tc_attach(&hook, &replace_opts, 0);
	if (r == -EEXIST) {
		/* Expected, now use BPF_TC_F_REPLACE to replace it */
		return bpf_tc_attach(&hook, &replace_opts, BPF_TC_F_REPLACE);
	} else if (r == 0) {
		/* There must be no existing filter with these
		 * attributes, so cleanup and return an error.
		 */
		replace_opts.prog_fd = replace_opts.prog_id = 0;
		r = bpf_tc_detach(&hook, &replace_opts);
		if (r == 0)
			r = -1;
	}
	return r;

To obtain info of a particular filter:

	/* Find info for filter with handle 1 and priority 50 */
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1,
			    .priority = 50);
	r = bpf_tc_query(&hook, &info_opts);
	if (r == -ENOENT)
		printf("Filter not found\n");
	else if (r == 0)
		printf("Prog ID: %"PRIu32"\n", info_opts.prog_id);
	return r;

We can also match using prog_id to find the same filter:

	DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts2, .prog_id =
			    info_opts.prog_id);
	r = bpf_tc_query(&hook, &info_opts2);
	if (r == -ENOENT)
		printf("Filter not found\n");
	else if (r == 0) {
		/* If we know there's only one filter for this loaded prog,
		 * it is safe to assert that the handle and priority are
		 * as expected.
		 */
		assert(info_opts2.handle == 1);
		assert(info_opts2.priority == 50);
	}
	return r;

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/libbpf.h   |  41 ++++
 tools/lib/bpf/libbpf.map |   5 +
 tools/lib/bpf/netlink.c  | 463 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 508 insertions(+), 1 deletion(-)

Message ID	20210428162553.719588-3-memxor@gmail.com (mailing list archive)
State	Superseded
Delegated to:	BPF
Headers	From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
	To: bpf@vger.kernel.org
	Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>, Toke Høiland-Jørgensen <toke@redhat.com>, Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>, John Fastabend <john.fastabend@gmail.com>, KP Singh <kpsingh@kernel.org>, "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Jesper Dangaard Brouer <brouer@redhat.com>, Shaun Crampton <shaun@tigera.io>, netdev@vger.kernel.org
	Subject: [PATCH bpf-next v5 2/3] libbpf: add low level TC-BPF API
	Date: Wed, 28 Apr 2021 21:55:52 +0530
	Message-Id: <20210428162553.719588-3-memxor@gmail.com>
	In-Reply-To: <20210428162553.719588-1-memxor@gmail.com>
	References: <20210428162553.719588-1-memxor@gmail.com>
Series	Add TC-BPF API
	[bpf-next,v5,0/3] Add TC-BPF API
	[bpf-next,v5,1/3] libbpf: add netlink helpers
	[bpf-next,v5,2/3] libbpf: add low level TC-BPF API
	[bpf-next,v5,3/3] libbpf: add selftests for TC-BPF API

Context	Check	Description
netdev/cover_letter	success	Link
netdev/fixes_present	success	Link
netdev/patch_count	success	Link
netdev/tree_selection	success	Clearly marked for bpf-next
netdev/subject_prefix	success	Link
netdev/cc_maintainers	success	CCed 10 of 10 maintainers
netdev/source_inline	success	Was 0 now: 0
netdev/verify_signedoff	success	Link
netdev/module_param	success	Was 0 now: 0
netdev/build_32bit	success	Errors and warnings before: 0 this patch: 0
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/verify_fixes	success	Link
netdev/checkpatch	fail	CHECK: Alignment should match open parenthesis CHECK: Please don't use multiple blank lines CHECK: Unbalanced braces around else statement CHECK: braces {} should be used on all arms of this statement CHECK: spaces preferred around that '\|' (ctx:VxV) ERROR: code indent should use tabs where possible ERROR: space prohibited before that ':' (ctx:WxV) ERROR: switch and case should be at the same indent WARNING: Missing a blank line after declarations WARNING: Prefer 'long long' over 'long long int' as the int is unnecessary WARNING: line length of 104 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 99 exceeds 80 columns
netdev/build_allmodconfig_warn	success	Errors and warnings before: 0 this patch: 0
netdev/header_inline	success	Link
