[PATCHv6,bpf-next,03/28] bpf: Add multi uprobe link

Message ID	20230803073420.1558613-4-jolsa@kernel.org (mailing list archive)
State	Superseded
Delegated to:	BPF

Headers

From: Jiri Olsa <jolsa@kernel.org>
To: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>
Cc: Yafang Shao <laoar.shao@gmail.com>,
	bpf@vger.kernel.org,
	Martin KaFai Lau <kafai@fb.com>,
	Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@chromium.org>,
	Stanislav Fomichev <sdf@google.com>,
	Hao Luo <haoluo@google.com>
Subject: [PATCHv6 bpf-next 03/28] bpf: Add multi uprobe link
Date: Thu,  3 Aug 2023 09:33:55 +0200
Message-ID: <20230803073420.1558613-4-jolsa@kernel.org>
In-Reply-To: <20230803073420.1558613-1-jolsa@kernel.org>
References: <20230803073420.1558613-1-jolsa@kernel.org>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

bpf: Add multi uprobe link

Context	Check	Description
netdev/series_format	fail	Series longer than 15 patches (and no cover letter)
netdev/tree_selection	success	Clearly marked for bpf-next, async
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	fail	Errors and warnings before: 4279 this patch: 4280
netdev/cc_maintainers	warning	7 maintainers not CCed: linux-trace-kernel@vger.kernel.org kpsingh@kernel.org mhiramat@kernel.org martin.lau@linux.dev song@kernel.org yonghong.song@linux.dev rostedt@goodmis.org
netdev/build_clang	success	Errors and warnings before: 1719 this patch: 1719
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	fail	Errors and warnings before: 4450 this patch: 4451
netdev/checkpatch	warning	CHECK: Please use a blank line after function/struct/union/enum declarations WARNING: line length of 100 exceeds 80 columns WARNING: line length of 81 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns WARNING: line length of 95 exceeds 80 columns
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
bpf/vmtest-bpf-next-PR	success	PR summary
bpf/vmtest-bpf-next-VM_Test-1	success	Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-6	success	Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2	success	Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4	success	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5	success	Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-3	success	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9	success	Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10	success	Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19	success	Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20	success	Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21	success	Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22	success	Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23	success	Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24	success	Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25	success	Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-27	success	Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28	success	Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29	success	Logs for veristat
bpf/vmtest-bpf-next-VM_Test-7	success	Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-11	success	Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-13	success	Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14	success	Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15	success	Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17	success	Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18	success	Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-12	success	Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16	success	Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26	success	Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-8	fail	Logs for test_maps on s390x with gcc

Commit Message

Jiri Olsa Aug. 3, 2023, 7:33 a.m. UTC

Add a new multi uprobe link that allows attaching a bpf program
to multiple uprobes.

The uprobes to attach are specified via the new uprobe_multi struct
in the link_create union:

  struct {
    __aligned_u64   path;
    __aligned_u64   offsets;
    __aligned_u64   ref_ctr_offsets;
    __u32           cnt;
    __u32           flags;
  } uprobe_multi;

Uprobes are defined for a single binary specified in path, with multiple
call sites specified in the offsets array and optional reference
counters specified in the ref_ctr_offsets array. All specified arrays
have a length of 'cnt'.

The 'flags' field supports a single bit for now, which marks the uprobes
as return probes.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/trace_events.h   |   6 +
 include/uapi/linux/bpf.h       |  16 +++
 kernel/bpf/syscall.c           |  14 +-
 kernel/trace/bpf_trace.c       | 237 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  16 +++
 5 files changed, 286 insertions(+), 3 deletions(-)
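
For illustration only, here is a minimal userspace sketch of how a caller might fill in the uprobe_multi arguments described above. It is not part of the patch. The struct uprobe_multi_args type, the UPROBE_MULTI_RETURN macro, the helper names, the path and the offsets are all hypothetical stand-ins; real code would use union bpf_attr and BPF_F_UPROBE_MULTI_RETURN from the UAPI header. The check function mirrors the flags and mandatory-field validation at the top of bpf_uprobe_multi_link_attach().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirror of link_create.uprobe_multi from the commit message. */
struct uprobe_multi_args {
	uint64_t path;            /* user pointer to a NUL-terminated path */
	uint64_t offsets;         /* user pointer to an array of 'cnt' offsets */
	uint64_t ref_ctr_offsets; /* optional array of 'cnt' entries, or 0 */
	uint32_t cnt;
	uint32_t flags;
};

/* Hypothetical mirror of BPF_F_UPROBE_MULTI_RETURN. */
#define UPROBE_MULTI_RETURN (1U << 0)

static uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/* Pack pointers into the 64-bit fields, as the attr layout expects. */
static void uprobe_multi_args_fill(struct uprobe_multi_args *a, const char *path,
				   const unsigned long *offsets,
				   const unsigned long *ref_ctr_offsets,
				   uint32_t cnt, uint32_t flags)
{
	memset(a, 0, sizeof(*a));
	a->path = ptr_to_u64(path);
	a->offsets = ptr_to_u64(offsets);
	a->ref_ctr_offsets = ptr_to_u64(ref_ctr_offsets);
	a->cnt = cnt;
	a->flags = flags;
}

/* Same checks as the patch: unknown flags are rejected, and
 * path, offsets and cnt are mandatory while ref_ctr_offsets is optional. */
static int uprobe_multi_args_check(const struct uprobe_multi_args *a)
{
	if (a->flags & ~UPROBE_MULTI_RETURN)
		return -EINVAL;
	if (!a->path || !a->offsets || !a->cnt)
		return -EINVAL;
	return 0;
}

/* Example: three return probes in one binary (path and offsets are made up). */
void uprobe_multi_args_example(void)
{
	static const char path[] = "/usr/bin/example";
	static const unsigned long offsets[] = { 0x1130, 0x1180, 0x11f0 };
	struct uprobe_multi_args args;

	uprobe_multi_args_fill(&args, path, offsets, NULL, 3, UPROBE_MULTI_RETURN);
	assert(uprobe_multi_args_check(&args) == 0);
}
```

Passing NULL for ref_ctr_offsets leaves the field at 0, which the patch treats as "no reference counters".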

Comments

Yonghong Song Aug. 4, 2023, 9:55 p.m. UTC | #1

On 8/3/23 12:33 AM, Jiri Olsa wrote:
> Adding new multi uprobe link that allows to attach bpf program
> to multiple uprobes.
> 
> Uprobes to attach are specified via new link_create uprobe_multi
> union:
> 
>    struct {
>      __aligned_u64   path;
>      __aligned_u64   offsets;
>      __aligned_u64   ref_ctr_offsets;
>      __u32           cnt;
>      __u32           flags;
>    } uprobe_multi;
> 
> Uprobes are defined for single binary specified in path and multiple
> calling sites specified in offsets array with optional reference
> counters specified in ref_ctr_offsets array. All specified arrays
> have length of 'cnt'.
> 
> The 'flags' supports single bit for now that marks the uprobe as
> return probe.
> 
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> Acked-by: Yafang Shao <laoar.shao@gmail.com>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>   include/linux/trace_events.h   |   6 +
>   include/uapi/linux/bpf.h       |  16 +++
>   kernel/bpf/syscall.c           |  14 +-
>   kernel/trace/bpf_trace.c       | 237 +++++++++++++++++++++++++++++++++
>   tools/include/uapi/linux/bpf.h |  16 +++
>   5 files changed, 286 insertions(+), 3 deletions(-)
> 
[...]
> +
> +static int uprobe_prog_run(struct bpf_uprobe *uprobe,
> +			   unsigned long entry_ip,
> +			   struct pt_regs *regs)
> +{
> +	struct bpf_uprobe_multi_link *link = uprobe->link;
> +	struct bpf_uprobe_multi_run_ctx run_ctx = {
> +		.entry_ip = entry_ip,
> +	};
> +	struct bpf_prog *prog = link->link.prog;
> +	bool sleepable = prog->aux->sleepable;
> +	struct bpf_run_ctx *old_run_ctx;
> +	int err = 0;
> +
> +	might_fault();

Could you explain what you are trying to protect here
with might_fault()?

In my opinion, might_fault() is unnecessary here
since the calling context is process context and
there is no mmap_lock held, so might_fault()
won't capture anything.

might_fault() is used in iter.c and trampoline.c
since their calling context is more complex
than here and might_fault() may actually capture
issues.

> +
> +	migrate_disable();
> +
> +	if (sleepable)
> +		rcu_read_lock_trace();
> +	else
> +		rcu_read_lock();

Looking at trampoline.c and iter.c, typical
usage is
	rcu_read_lock_trace()/rcu_read_lock()
	migrate_disable()

Your above sequence could be correct too. But it
would be great if we can keep consistency here.

> +
> +	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> +	err = bpf_prog_run(link->link.prog, regs);
> +	bpf_reset_run_ctx(old_run_ctx);
> +
> +	if (sleepable)
> +		rcu_read_unlock_trace();
> +	else
> +		rcu_read_unlock();
> +
> +	migrate_enable();
> +	return err;
> +}
> +
> +static int
> +uprobe_multi_link_handler(struct uprobe_consumer *con, struct pt_regs *regs)
> +{
> +	struct bpf_uprobe *uprobe;
> +
> +	uprobe = container_of(con, struct bpf_uprobe, consumer);
> +	return uprobe_prog_run(uprobe, instruction_pointer(regs), regs);
> +}
> +
> +static int
> +uprobe_multi_link_ret_handler(struct uprobe_consumer *con, unsigned long func, struct pt_regs *regs)
> +{
> +	struct bpf_uprobe *uprobe;
> +
> +	uprobe = container_of(con, struct bpf_uprobe, consumer);
> +	return uprobe_prog_run(uprobe, func, regs);
> +}
> +
> +int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> +{
> +	struct bpf_uprobe_multi_link *link = NULL;
> +	unsigned long __user *uref_ctr_offsets;
> +	unsigned long *ref_ctr_offsets = NULL;
> +	struct bpf_link_primer link_primer;
> +	struct bpf_uprobe *uprobes = NULL;
> +	unsigned long __user *uoffsets;
> +	void __user *upath;
> +	u32 flags, cnt, i;
> +	struct path path;
> +	char *name;
> +	int err;
> +
> +	/* no support for 32bit archs yet */
> +	if (sizeof(u64) != sizeof(void *))
> +		return -EOPNOTSUPP;
> +
> +	if (prog->expected_attach_type != BPF_TRACE_UPROBE_MULTI)
> +		return -EINVAL;
> +
> +	flags = attr->link_create.uprobe_multi.flags;
> +	if (flags & ~BPF_F_UPROBE_MULTI_RETURN)
> +		return -EINVAL;
> +
> +	/*
> +	 * path, offsets and cnt are mandatory,
> +	 * ref_ctr_offsets is optional
> +	 */
> +	upath = u64_to_user_ptr(attr->link_create.uprobe_multi.path);
> +	uoffsets = u64_to_user_ptr(attr->link_create.uprobe_multi.offsets);
> +	cnt = attr->link_create.uprobe_multi.cnt;
> +
> +	if (!upath || !uoffsets || !cnt)
> +		return -EINVAL;
> +
> +	uref_ctr_offsets = u64_to_user_ptr(attr->link_create.uprobe_multi.ref_ctr_offsets);
> +
> +	name = strndup_user(upath, PATH_MAX);
> +	if (IS_ERR(name)) {
> +		err = PTR_ERR(name);
> +		return err;
> +	}
> +
> +	err = kern_path(name, LOOKUP_FOLLOW, &path);
> +	kfree(name);
> +	if (err)
> +		return err;
> +
> +	if (!d_is_reg(path.dentry)) {
> +		err = -EBADF;
> +		goto error_path_put;
> +	}
> +
> +	err = -ENOMEM;
> +
> +	link = kzalloc(sizeof(*link), GFP_KERNEL);
> +	uprobes = kvcalloc(cnt, sizeof(*uprobes), GFP_KERNEL);
> +
> +	if (!uprobes || !link)
> +		goto error_free;
> +
> +	if (uref_ctr_offsets) {
> +		ref_ctr_offsets = kvcalloc(cnt, sizeof(*ref_ctr_offsets), GFP_KERNEL);
> +		if (!ref_ctr_offsets)
> +			goto error_free;
> +	}
> +
> +	for (i = 0; i < cnt; i++) {
> +		if (uref_ctr_offsets && __get_user(ref_ctr_offsets[i], uref_ctr_offsets + i)) {
> +			err = -EFAULT;
> +			goto error_free;
> +		}
> +		if (__get_user(uprobes[i].offset, uoffsets + i)) {
> +			err = -EFAULT;
> +			goto error_free;
> +		}
> +
> +		uprobes[i].link = link;
> +
> +		if (flags & BPF_F_UPROBE_MULTI_RETURN)
> +			uprobes[i].consumer.ret_handler = uprobe_multi_link_ret_handler;
> +		else
> +			uprobes[i].consumer.handler = uprobe_multi_link_handler;
> +	}
> +
> +	link->cnt = cnt;
> +	link->uprobes = uprobes;
> +	link->path = path;
> +
> +	bpf_link_init(&link->link, BPF_LINK_TYPE_UPROBE_MULTI,
> +		      &bpf_uprobe_multi_link_lops, prog);
> +
> +	err = bpf_link_prime(&link->link, &link_primer);
> +	if (err)
> +		goto error_free;
> +
> +	for (i = 0; i < cnt; i++) {
> +		err = uprobe_register_refctr(d_real_inode(link->path.dentry),
> +					     uprobes[i].offset,
> +					     ref_ctr_offsets ? ref_ctr_offsets[i] : 0,
> +					     &uprobes[i].consumer);
> +		if (err) {
> +			bpf_uprobe_unregister(&path, uprobes, i);
> +			bpf_link_cleanup(&link_primer);
> +			kvfree(ref_ctr_offsets);

Is it possible we may miss some of the 'error_free' cleanups below?
In my opinion, we should replace
			kvfree(ref_ctr_offsets);
			return err;
with
			goto error_free;

Could you double check?

> +			return err;
> +		}
> +	}
> +
> +	kvfree(ref_ctr_offsets);
> +	return bpf_link_settle(&link_primer);
> +
> +error_free:
> +	kvfree(ref_ctr_offsets);
> +	kvfree(uprobes);
> +	kfree(link);
> +error_path_put:
> +	path_put(&path);
> +	return err;
> +}
> +#else /* !CONFIG_UPROBES */
> +int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> +{
> +	return -EOPNOTSUPP;
> +}
> +#endif /* CONFIG_UPROBES */
[...]
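
As a rough illustration of the ordering suggested in the review above (take the RCU read lock first, then disable migration), the following userspace sketch records the call order. It is not kernel code. Every *_stub function is a hypothetical stand-in that only logs its name, and the reverse-order release is an assumption made here, not something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Call-order log; the stubs below append to it. */
static char trace[128];

static void trace_reset(void)
{
	trace[0] = '\0';
}

static void record(const char *name)
{
	if (trace[0])
		strncat(trace, " ", sizeof(trace) - strlen(trace) - 1);
	strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
}

/* Hypothetical userspace stand-ins: they log, they do not lock anything. */
static void rcu_read_lock_stub(void)         { record("rcu_read_lock"); }
static void rcu_read_unlock_stub(void)       { record("rcu_read_unlock"); }
static void rcu_read_lock_trace_stub(void)   { record("rcu_read_lock_trace"); }
static void rcu_read_unlock_trace_stub(void) { record("rcu_read_unlock_trace"); }
static void migrate_disable_stub(void)       { record("migrate_disable"); }
static void migrate_enable_stub(void)        { record("migrate_enable"); }
static int  prog_run_stub(void)              { record("run"); return 0; }

/* Suggested ordering: RCU flavor first, then migrate_disable(),
 * released in reverse order (assumption) after the program ran. */
static int uprobe_prog_run_sketch(bool sleepable)
{
	int err;

	if (sleepable)
		rcu_read_lock_trace_stub();
	else
		rcu_read_lock_stub();

	migrate_disable_stub();

	err = prog_run_stub();

	migrate_enable_stub();

	if (sleepable)
		rcu_read_unlock_trace_stub();
	else
		rcu_read_unlock_stub();

	return err;
}
```

Calling trace_reset() and then uprobe_prog_run_sketch(false) leaves the recorded order in the trace buffer, which makes the nesting easy to compare against the version in the patch.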

Jiri Olsa Aug. 5, 2023, 9:11 p.m. UTC | #2

On Fri, Aug 04, 2023 at 02:55:29PM -0700, Yonghong Song wrote:

SNIP

> > +static int uprobe_prog_run(struct bpf_uprobe *uprobe,
> > +			   unsigned long entry_ip,
> > +			   struct pt_regs *regs)
> > +{
> > +	struct bpf_uprobe_multi_link *link = uprobe->link;
> > +	struct bpf_uprobe_multi_run_ctx run_ctx = {
> > +		.entry_ip = entry_ip,
> > +	};
> > +	struct bpf_prog *prog = link->link.prog;
> > +	bool sleepable = prog->aux->sleepable;
> > +	struct bpf_run_ctx *old_run_ctx;
> > +	int err = 0;
> > +
> > +	might_fault();
> 
> Could you explain what you are trying to protect here
> with might_fault()?
> 
> In my opinion, might_fault() is unnecessary here
> since the calling context is process context and
> there is no mmap_lock held, so might_fault()
> won't capture anything.
> 
> might_fault() is used in iter.c and trampoline.c
> since their calling context is more complex
> than here and might_fault() may actually capture
> issues.

hum, I followed bpf_prog_run_array_sleepable, which is called
the same way.. will check

> 
> > +
> > +	migrate_disable();
> > +
> > +	if (sleepable)
> > +		rcu_read_lock_trace();
> > +	else
> > +		rcu_read_lock();
> 
> Looking at trampoline.c and iter.c, typical
> usage is
> 	rcu_read_lock_trace()/rcu_read_lock()
> 	migrate_disable()
> 
> Your above sequence could be correct too. But it
> would be great if we can keep consistency here.

ok, will switch that

SNIP

> > +	link->cnt = cnt;
> > +	link->uprobes = uprobes;
> > +	link->path = path;
> > +
> > +	bpf_link_init(&link->link, BPF_LINK_TYPE_UPROBE_MULTI,
> > +		      &bpf_uprobe_multi_link_lops, prog);
> > +
> > +	err = bpf_link_prime(&link->link, &link_primer);
> > +	if (err)
> > +		goto error_free;
> > +
> > +	for (i = 0; i < cnt; i++) {
> > +		err = uprobe_register_refctr(d_real_inode(link->path.dentry),
> > +					     uprobes[i].offset,
> > +					     ref_ctr_offsets ? ref_ctr_offsets[i] : 0,
> > +					     &uprobes[i].consumer);
> > +		if (err) {
> > +			bpf_uprobe_unregister(&path, uprobes, i);
> > +			bpf_link_cleanup(&link_primer);
> > +			kvfree(ref_ctr_offsets);
> 
> Is it possible we may miss some of the 'error_free' cleanups below?
> In my opinion, we should replace
> 			kvfree(ref_ctr_offsets);
> 			return err;
> with
> 			goto error_free;
> 
> Could you double check?

the problem here is that bpf_link_cleanup calls the link's dealloc callback,
so it gets released in bpf_uprobe_multi_link_dealloc.. which is missing
task release :-\

I think we could init the link only after we create all the uprobes,
and have a single release function used by both the dealloc callback and
the error path in here

thanks,
jirka

> 
> > +			return err;
> > +		}
> > +	}
> > +
> > +	kvfree(ref_ctr_offsets);
> > +	return bpf_link_settle(&link_primer);
> > +
> > +error_free:
> > +	kvfree(ref_ctr_offsets);
> > +	kvfree(uprobes);
> > +	kfree(link);
> > +error_path_put:
> > +	path_put(&path);
> > +	return err;
> > +}
> > +#else /* !CONFIG_UPROBES */
> > +int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +#endif /* CONFIG_UPROBES */
> [...]
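
One possible reading of the reordering proposed in the reply above can be sketched in userspace as follows: register everything first, roll back the already registered entries on failure, and free the link state through one shared function. This is only a sketch under stated assumptions, not the follow-up patch. All fake_* names, the fail_at knob and the counters are hypothetical and exist only to make the unwinding observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical test knobs and counters, not kernel state. */
static int fail_at = -1; /* index at which fake_register() fails, -1 = never */
static int registered;   /* number of currently registered fake probes */
static int frees;        /* how many times the link state was freed */

struct fake_link {
	unsigned long *offsets;
	unsigned int cnt;
};

static int fake_register(unsigned int i)
{
	if ((int)i == fail_at)
		return -EINVAL;
	registered++;
	return 0;
}

/* Roll back the first 'cnt' registrations. */
static void fake_unregister(unsigned int cnt)
{
	registered -= (int)cnt;
}

/* Single teardown, meant to serve both the error path and a dealloc callback. */
static void fake_link_free(struct fake_link *link)
{
	if (!link)
		return;
	free(link->offsets);
	free(link);
	frees++;
}

static int fake_attach(const unsigned long *offsets, unsigned int cnt,
		       struct fake_link **out)
{
	struct fake_link *link;
	unsigned int i;
	int err = -ENOMEM;

	if (!offsets || !cnt)
		return -EINVAL;

	link = calloc(1, sizeof(*link));
	if (!link)
		return err;

	link->offsets = calloc(cnt, sizeof(*link->offsets));
	if (!link->offsets)
		goto error_free;

	for (i = 0; i < cnt; i++)
		link->offsets[i] = offsets[i];
	link->cnt = cnt;

	for (i = 0; i < cnt; i++) {
		err = fake_register(i);
		if (err) {
			/* undo only what was registered so far */
			fake_unregister(i);
			goto error_free;
		}
	}

	/* Only at this point would the link be initialized and primed. */
	*out = link;
	return 0;

error_free:
	fake_link_free(link);
	return err;
}
```

Because nothing link-related has been published before the registration loop finishes, the error path can free everything itself exactly once. The sketch does not model a failure after registration (for example in a later priming step), which would still need to unregister all probes.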

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index e66d04dbe56a..5b85cf18c350 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -752,6 +752,7 @@  int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
 			    u32 *fd_type, const char **buf,
 			    u64 *probe_offset, u64 *probe_addr);
 int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
+int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
 #else
 static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 {
@@ -798,6 +799,11 @@  bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
 	return -EOPNOTSUPP;
 }
+static inline int
+bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 enum {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7abb382dc6c1..f112a0b948f3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1039,6 +1039,7 @@  enum bpf_attach_type {
 	BPF_NETFILTER,
 	BPF_TCX_INGRESS,
 	BPF_TCX_EGRESS,
+	BPF_TRACE_UPROBE_MULTI,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1057,6 +1058,7 @@  enum bpf_link_type {
 	BPF_LINK_TYPE_STRUCT_OPS = 9,
 	BPF_LINK_TYPE_NETFILTER = 10,
 	BPF_LINK_TYPE_TCX = 11,
+	BPF_LINK_TYPE_UPROBE_MULTI = 12,
 	MAX_BPF_LINK_TYPE,
 };
 
@@ -1190,6 +1192,13 @@  enum {
 	BPF_F_KPROBE_MULTI_RETURN = (1U << 0)
 };
 
+/* link_create.uprobe_multi.flags used in LINK_CREATE command for
+ * BPF_TRACE_UPROBE_MULTI attach type to create return probe.
+ */
+enum {
+	BPF_F_UPROBE_MULTI_RETURN = (1U << 0)
+};
+
 /* link_create.netfilter.flags used in LINK_CREATE command for
  * BPF_PROG_TYPE_NETFILTER to enable IP packet defragmentation.
  */
@@ -1626,6 +1635,13 @@  union bpf_attr {
 				};
 				__u64		expected_revision;
 			} tcx;
+			struct {
+				__aligned_u64	path;
+				__aligned_u64	offsets;
+				__aligned_u64	ref_ctr_offsets;
+				__u32		cnt;
+				__u32		flags;
+			} uprobe_multi;
 		};
 	} link_create;
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7c01186d4078..75c83300339e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2815,10 +2815,12 @@  static void bpf_link_free_id(int id)
 
 /* Clean up bpf_link and corresponding anon_inode file and FD. After
  * anon_inode is created, bpf_link can't be just kfree()'d due to deferred
- * anon_inode's release() call. This helper marksbpf_link as
+ * anon_inode's release() call. This helper marks bpf_link as
  * defunct, releases anon_inode file and puts reserved FD. bpf_prog's refcnt
  * is not decremented, it's the responsibility of a calling code that failed
  * to complete bpf_link initialization.
+ * This helper eventually calls link's dealloc callback, but does not call
+ * link's release callback.
  */
 void bpf_link_cleanup(struct bpf_link_primer *primer)
 {
@@ -3757,8 +3759,12 @@  static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 		if (prog->expected_attach_type == BPF_TRACE_KPROBE_MULTI &&
 		    attach_type != BPF_TRACE_KPROBE_MULTI)
 			return -EINVAL;
+		if (prog->expected_attach_type == BPF_TRACE_UPROBE_MULTI &&
+		    attach_type != BPF_TRACE_UPROBE_MULTI)
+			return -EINVAL;
 		if (attach_type != BPF_PERF_EVENT &&
-		    attach_type != BPF_TRACE_KPROBE_MULTI)
+		    attach_type != BPF_TRACE_KPROBE_MULTI &&
+		    attach_type != BPF_TRACE_UPROBE_MULTI)
 			return -EINVAL;
 		return 0;
 	case BPF_PROG_TYPE_SCHED_CLS:
@@ -4954,8 +4960,10 @@  static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 	case BPF_PROG_TYPE_KPROBE:
 		if (attr->link_create.attach_type == BPF_PERF_EVENT)
 			ret = bpf_perf_link_attach(attr, prog);
-		else
+		else if (attr->link_create.attach_type == BPF_TRACE_KPROBE_MULTI)
 			ret = bpf_kprobe_multi_link_attach(attr, prog);
+		else if (attr->link_create.attach_type == BPF_TRACE_UPROBE_MULTI)
+			ret = bpf_uprobe_multi_link_attach(attr, prog);
 		break;
 	default:
 		ret = -EINVAL;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 83bde2475ae5..0c67644ccb2e 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -23,6 +23,7 @@ 
 #include <linux/sort.h>
 #include <linux/key.h>
 #include <linux/verification.h>
+#include <linux/namei.h>
 
 #include <net/bpf_sk_storage.h>
 
@@ -2954,3 +2955,239 @@  static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
 	return 0;
 }
 #endif
+
+#ifdef CONFIG_UPROBES
+struct bpf_uprobe_multi_link;
+
+struct bpf_uprobe {
+	struct bpf_uprobe_multi_link *link;
+	loff_t offset;
+	struct uprobe_consumer consumer;
+};
+
+struct bpf_uprobe_multi_link {
+	struct path path;
+	struct bpf_link link;
+	u32 cnt;
+	struct bpf_uprobe *uprobes;
+};
+
+struct bpf_uprobe_multi_run_ctx {
+	struct bpf_run_ctx run_ctx;
+	unsigned long entry_ip;
+};
+
+static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
+				  u32 cnt)
+{
+	u32 i;
+
+	for (i = 0; i < cnt; i++) {
+		uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
+				  &uprobes[i].consumer);
+	}
+}
+
+static void bpf_uprobe_multi_link_release(struct bpf_link *link)
+{
+	struct bpf_uprobe_multi_link *umulti_link;
+
+	umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
+	bpf_uprobe_unregister(&umulti_link->path, umulti_link->uprobes, umulti_link->cnt);
+}
+
+static void bpf_uprobe_multi_link_dealloc(struct bpf_link *link)
+{
+	struct bpf_uprobe_multi_link *umulti_link;
+
+	umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
+	path_put(&umulti_link->path);
+	kvfree(umulti_link->uprobes);
+	kfree(umulti_link);
+}
+
+static const struct bpf_link_ops bpf_uprobe_multi_link_lops = {
+	.release = bpf_uprobe_multi_link_release,
+	.dealloc = bpf_uprobe_multi_link_dealloc,
+};
+
+static int uprobe_prog_run(struct bpf_uprobe *uprobe,
+			   unsigned long entry_ip,
+			   struct pt_regs *regs)
+{
+	struct bpf_uprobe_multi_link *link = uprobe->link;
+	struct bpf_uprobe_multi_run_ctx run_ctx = {
+		.entry_ip = entry_ip,
+	};
+	struct bpf_prog *prog = link->link.prog;
+	bool sleepable = prog->aux->sleepable;
+	struct bpf_run_ctx *old_run_ctx;
+	int err = 0;
+
+	might_fault();
+
+	migrate_disable();
+
+	if (sleepable)
+		rcu_read_lock_trace();
+	else
+		rcu_read_lock();
+
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+	err = bpf_prog_run(link->link.prog, regs);
+	bpf_reset_run_ctx(old_run_ctx);
+
+	if (sleepable)
+		rcu_read_unlock_trace();
+	else
+		rcu_read_unlock();
+
+	migrate_enable();
+	return err;
+}
+
+static int
+uprobe_multi_link_handler(struct uprobe_consumer *con, struct pt_regs *regs)
+{
+	struct bpf_uprobe *uprobe;
+
+	uprobe = container_of(con, struct bpf_uprobe, consumer);
+	return uprobe_prog_run(uprobe, instruction_pointer(regs), regs);
+}
+
+static int
+uprobe_multi_link_ret_handler(struct uprobe_consumer *con, unsigned long func, struct pt_regs *regs)
+{
+	struct bpf_uprobe *uprobe;
+
+	uprobe = container_of(con, struct bpf_uprobe, consumer);
+	return uprobe_prog_run(uprobe, func, regs);
+}
+
+int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	struct bpf_uprobe_multi_link *link = NULL;
+	unsigned long __user *uref_ctr_offsets;
+	unsigned long *ref_ctr_offsets = NULL;
+	struct bpf_link_primer link_primer;
+	struct bpf_uprobe *uprobes = NULL;
+	unsigned long __user *uoffsets;
+	void __user *upath;
+	u32 flags, cnt, i;
+	struct path path;
+	char *name;
+	int err;
+
+	/* no support for 32bit archs yet */
+	if (sizeof(u64) != sizeof(void *))
+		return -EOPNOTSUPP;
+
+	if (prog->expected_attach_type != BPF_TRACE_UPROBE_MULTI)
+		return -EINVAL;
+
+	flags = attr->link_create.uprobe_multi.flags;
+	if (flags & ~BPF_F_UPROBE_MULTI_RETURN)
+		return -EINVAL;
+
+	/*
+	 * path, offsets and cnt are mandatory,
+	 * ref_ctr_offsets is optional
+	 */
+	upath = u64_to_user_ptr(attr->link_create.uprobe_multi.path);
+	uoffsets = u64_to_user_ptr(attr->link_create.uprobe_multi.offsets);
+	cnt = attr->link_create.uprobe_multi.cnt;
+
+	if (!upath || !uoffsets || !cnt)
+		return -EINVAL;
+
+	uref_ctr_offsets = u64_to_user_ptr(attr->link_create.uprobe_multi.ref_ctr_offsets);
+
+	name = strndup_user(upath, PATH_MAX);
+	if (IS_ERR(name)) {
+		err = PTR_ERR(name);
+		return err;
+	}
+
+	err = kern_path(name, LOOKUP_FOLLOW, &path);
+	kfree(name);
+	if (err)
+		return err;
+
+	if (!d_is_reg(path.dentry)) {
+		err = -EBADF;
+		goto error_path_put;
+	}
+
+	err = -ENOMEM;
+
+	link = kzalloc(sizeof(*link), GFP_KERNEL);
+	uprobes = kvcalloc(cnt, sizeof(*uprobes), GFP_KERNEL);
+
+	if (!uprobes || !link)
+		goto error_free;
+
+	if (uref_ctr_offsets) {
+		ref_ctr_offsets = kvcalloc(cnt, sizeof(*ref_ctr_offsets), GFP_KERNEL);
+		if (!ref_ctr_offsets)
+			goto error_free;
+	}
+
+	for (i = 0; i < cnt; i++) {
+		if (uref_ctr_offsets && __get_user(ref_ctr_offsets[i], uref_ctr_offsets + i)) {
+			err = -EFAULT;
+			goto error_free;
+		}
+		if (__get_user(uprobes[i].offset, uoffsets + i)) {
+			err = -EFAULT;
+			goto error_free;
+		}
+
+		uprobes[i].link = link;
+
+		if (flags & BPF_F_UPROBE_MULTI_RETURN)
+			uprobes[i].consumer.ret_handler = uprobe_multi_link_ret_handler;
+		else
+			uprobes[i].consumer.handler = uprobe_multi_link_handler;
+	}
+
+	link->cnt = cnt;
+	link->uprobes = uprobes;
+	link->path = path;
+
+	bpf_link_init(&link->link, BPF_LINK_TYPE_UPROBE_MULTI,
+		      &bpf_uprobe_multi_link_lops, prog);
+
+	err = bpf_link_prime(&link->link, &link_primer);
+	if (err)
+		goto error_free;
+
+	for (i = 0; i < cnt; i++) {
+		err = uprobe_register_refctr(d_real_inode(link->path.dentry),
+					     uprobes[i].offset,
+					     ref_ctr_offsets ? ref_ctr_offsets[i] : 0,
+					     &uprobes[i].consumer);
+		if (err) {
+			bpf_uprobe_unregister(&path, uprobes, i);
+			bpf_link_cleanup(&link_primer);
+			kvfree(ref_ctr_offsets);
+			return err;
+		}
+	}
+
+	kvfree(ref_ctr_offsets);
+	return bpf_link_settle(&link_primer);
+
+error_free:
+	kvfree(ref_ctr_offsets);
+	kvfree(uprobes);
+	kfree(link);
+error_path_put:
+	path_put(&path);
+	return err;
+}
+#else /* !CONFIG_UPROBES */
+int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	return -EOPNOTSUPP;
+}
+#endif /* CONFIG_UPROBES */
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7abb382dc6c1..f112a0b948f3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1039,6 +1039,7 @@  enum bpf_attach_type {
 	BPF_NETFILTER,
 	BPF_TCX_INGRESS,
 	BPF_TCX_EGRESS,
+	BPF_TRACE_UPROBE_MULTI,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1057,6 +1058,7 @@  enum bpf_link_type {
 	BPF_LINK_TYPE_STRUCT_OPS = 9,
 	BPF_LINK_TYPE_NETFILTER = 10,
 	BPF_LINK_TYPE_TCX = 11,
+	BPF_LINK_TYPE_UPROBE_MULTI = 12,
 	MAX_BPF_LINK_TYPE,
 };
 
@@ -1190,6 +1192,13 @@  enum {
 	BPF_F_KPROBE_MULTI_RETURN = (1U << 0)
 };
 
+/* link_create.uprobe_multi.flags used in LINK_CREATE command for
+ * BPF_TRACE_UPROBE_MULTI attach type to create return probe.
+ */
+enum {
+	BPF_F_UPROBE_MULTI_RETURN = (1U << 0)
+};
+
 /* link_create.netfilter.flags used in LINK_CREATE command for
  * BPF_PROG_TYPE_NETFILTER to enable IP packet defragmentation.
  */
@@ -1626,6 +1635,13 @@  union bpf_attr {
 				};
 				__u64		expected_revision;
 			} tcx;
+			struct {
+				__aligned_u64	path;
+				__aligned_u64	offsets;
+				__aligned_u64	ref_ctr_offsets;
+				__u32		cnt;
+				__u32		flags;
+			} uprobe_multi;
 		};
 	} link_create;

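For readers following the new uprobe_multi fields, here is a minimal userspace sketch of how the arguments might be packed and which of them the attach path rejects up front. It is not part of the patch. struct uprobe_multi_args, UPROBE_MULTI_RETURN, ptr_to_u64(), fill_uprobe_multi() and check_uprobe_multi() are hypothetical names made up for illustration; only the field layout, the flag value and the -EINVAL conditions are taken from the diff above. The 32-bit -EOPNOTSUPP check and the expected_attach_type check are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of link_create.uprobe_multi as laid out in the patch. */
struct uprobe_multi_args {
	uint64_t path;            /* pointer to NUL-terminated path of the binary */
	uint64_t offsets;         /* pointer to array of cnt probe offsets */
	uint64_t ref_ctr_offsets; /* optional: array of cnt offsets, or 0 */
	uint32_t cnt;
	uint32_t flags;
};

/* Three 64-bit fields plus two 32-bit fields, no padding expected. */
static_assert(sizeof(struct uprobe_multi_args) == 32, "unexpected layout");

/* Same value as BPF_F_UPROBE_MULTI_RETURN in the patch. */
#define UPROBE_MULTI_RETURN (1U << 0)

/* Pointers travel to the kernel as 64-bit integers. */
static uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/* Illustrative helper: pack the caller's arrays into the argument block. */
static void fill_uprobe_multi(struct uprobe_multi_args *args, const char *path,
			      const unsigned long *offsets,
			      const unsigned long *ref_ctr_offsets,
			      uint32_t cnt, int want_return)
{
	memset(args, 0, sizeof(*args));
	args->path = ptr_to_u64(path);
	args->offsets = ptr_to_u64(offsets);
	args->ref_ctr_offsets = ptr_to_u64(ref_ctr_offsets);
	args->cnt = cnt;
	args->flags = want_return ? UPROBE_MULTI_RETURN : 0;
}

/*
 * Userspace model of the early argument checks in
 * bpf_uprobe_multi_link_attach(): unknown flag bits are rejected, and
 * path, offsets and cnt are mandatory while ref_ctr_offsets is optional.
 */
static int check_uprobe_multi(const struct uprobe_multi_args *args)
{
	if (args->flags & ~UPROBE_MULTI_RETURN)
		return -EINVAL;
	if (!args->path || !args->offsets || !args->cnt)
		return -EINVAL;
	return 0;
}
```

In a real program the filled values would go into union bpf_attr and be passed to the BPF_LINK_CREATE command; that step is omitted here so the sketch stays independent of installed header versions.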