[PATCHv2,bpf-next,01/24] bpf: Add multi uprobe link

Message ID	20230620083550.690426-2-jolsa@kernel.org (mailing list archive)
State	Superseded
Delegated to:	BPF
Headers	show Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2210EAD48 for <bpf@vger.kernel.org>; Tue, 20 Jun 2023 08:36:04 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 33F55C433C8; Tue, 20 Jun 2023 08:36:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1687250164; bh=8o3UKWpW7XacEVg3k9OJXjCKgmWSwdm9wEffUSUaOls=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=HOJRCdskDkc5NE1vDw2FNOpyb/Zx3gtzwFUvbpBgizhBrvzo3OwKUdbgCIsQOwgR0 ZXwegQjURsyOaGzjC83oHgKwnlUM4WzIqFB6bL/KdoINQZhOzkcR7RZOLafL+wHrRm 9m+8FFHv5xGtNnT6yMp4BHdoLBK3p109QmxsuI7uQwh8gp5fD26LcKeZx/sPNgRnOJ mwCMMXHBQv3v7D2jb7B0/5fD5lD8uMCKcHmOLorCgCXewogMFsKkIJs+Cu1J0E0OE0 kXG/DIWTkngyCqrkyzQ0F/cxqklPe1/MyoMNMAy9mUsPCv9C04n5q36RGnuX5uKxno qtHtgb7W7JkNw== From: Jiri Olsa <jolsa@kernel.org> To: Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Andrii Nakryiko <andrii@kernel.org> Cc: bpf@vger.kernel.org, Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>, John Fastabend <john.fastabend@gmail.com>, KP Singh <kpsingh@chromium.org>, Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com> Subject: [PATCHv2 bpf-next 01/24] bpf: Add multi uprobe link Date: Tue, 20 Jun 2023 10:35:27 +0200 Message-ID: <20230620083550.690426-2-jolsa@kernel.org> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230620083550.690426-1-jolsa@kernel.org> References: <20230620083550.690426-1-jolsa@kernel.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: <bpf.vger.kernel.org> List-Subscribe: <mailto:bpf+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:bpf+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Delegate: bpf@iogearbox.net
Series	bpf: Add multi uprobe link \| expand [PATCHv2,bpf-next,00/24] bpf: Add multi uprobe link [PATCHv2,bpf-next,01/24] bpf: Add multi uprobe link [PATCHv2,bpf-next,02/24] bpf: Add cookies support for uprobe_multi link [PATCHv2,bpf-next,03/24] bpf: Add pid filter support for uprobe_multi link [PATCHv2,bpf-next,04/24] bpf: Add bpf_get_func_ip helper support for uprobe link [PATCHv2,bpf-next,05/24] libbpf: Add uprobe_multi attach type and link names [PATCHv2,bpf-next,06/24] libbpf: Add elf symbol iterator [PATCHv2,bpf-next,07/24] libbpf: Add open_elf/close_elf functions [PATCHv2,bpf-next,08/24] libbpf: Add elf_find_multi_func_offset function [PATCHv2,bpf-next,09/24] libbpf: Add elf_find_pattern_func_offset function [PATCHv2,bpf-next,10/24] libbpf: Add bpf_link_create support for multi uprobes [PATCHv2,bpf-next,11/24] libbpf: Add bpf_program__attach_uprobe_multi_opts function [PATCHv2,bpf-next,12/24] libbpf: Add support for u[ret]probe.multi[.s] program sections [PATCHv2,bpf-next,13/24] libbpf: Add uprobe multi link detection [PATCHv2,bpf-next,14/24] libbpf: Add uprobe multi link support to bpf_program__attach_usdt [PATCHv2,bpf-next,15/24] selftests/bpf: Add uprobe_multi skel test [PATCHv2,bpf-next,16/24] selftests/bpf: Add uprobe_multi api test [PATCHv2,bpf-next,17/24] selftests/bpf: Add uprobe_multi link test [PATCHv2,bpf-next,18/24] selftests/bpf: Add uprobe_multi test program [PATCHv2,bpf-next,19/24] selftests/bpf: Add uprobe_multi bench test [PATCHv2,bpf-next,20/24] selftests/bpf: Add usdt_multi test program [PATCHv2,bpf-next,21/24] selftests/bpf: Add usdt_multi bench test [PATCHv2,bpf-next,22/24] selftests/bpf: Add uprobe_multi cookie test [PATCHv2,bpf-next,23/24] selftests/bpf: Add uprobe_multi pid filter tests [PATCHv2,bpf-next,24/24] selftests/bpf: Add extra link to uprobe_multi tests

Context	Check	Description
bpf/vmtest-bpf-next-PR	fail	PR summary
bpf/vmtest-bpf-next-VM_Test-1	success	Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain_full }}
bpf/vmtest-bpf-next-VM_Test-2	success	Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3	success	Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4	success	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-5	success	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6	fail	Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7	success	Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-8	success	Logs for veristat
netdev/series_format	fail	Series longer than 15 patches (and no cover letter)
netdev/tree_selection	success	Clearly marked for bpf-next, async
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	fail	Errors and warnings before: 2999 this patch: 3004
netdev/cc_maintainers	warning	6 maintainers not CCed: linux-trace-kernel@vger.kernel.org kpsingh@kernel.org mhiramat@kernel.org martin.lau@linux.dev song@kernel.org rostedt@goodmis.org
netdev/build_clang	success	Errors and warnings before: 392 this patch: 392
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	fail	Errors and warnings before: 3141 this patch: 3146
netdev/checkpatch	warning	CHECK: Please use a blank line after function/struct/union/enum declarations CHECK: Prefer using the BIT macro WARNING: line length of 100 exceeds 80 columns WARNING: line length of 81 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0

Jiri Olsa June 20, 2023, 8:35 a.m. UTC

Adding new multi uprobe link that allows to attach bpf program
to multiple uprobes.

Uprobes to attach are specified via new link_create uprobe_multi
union:

  struct {
          __u32           flags;
          __u32           cnt;
          __aligned_u64   path;
          __aligned_u64   offsets;
          __aligned_u64   ref_ctr_offsets;
  } uprobe_multi;

Uprobes are defined for single binary specified in path and multiple
calling sites specified in offsets array with optional reference
counters specified in ref_ctr_offsets array. All specified arrays
have length of 'cnt'.

The 'flags' supports single bit for now that marks the uprobe as
return probe.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/trace_events.h   |   6 +
 include/uapi/linux/bpf.h       |  14 ++
 kernel/bpf/syscall.c           |  12 +-
 kernel/trace/bpf_trace.c       | 237 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  14 ++
 5 files changed, 281 insertions(+), 2 deletions(-)

Alexei Starovoitov June 20, 2023, 5:11 p.m. UTC | #1

On Tue, Jun 20, 2023 at 10:35:27AM +0200, Jiri Olsa wrote:
> +static int uprobe_prog_run(struct bpf_uprobe *uprobe,
> +			   unsigned long entry_ip,
> +			   struct pt_regs *regs)
> +{
> +	struct bpf_uprobe_multi_link *link = uprobe->link;
> +	struct bpf_uprobe_multi_run_ctx run_ctx = {
> +		.entry_ip = entry_ip,
> +	};
> +	struct bpf_prog *prog = link->link.prog;
> +	struct bpf_run_ctx *old_run_ctx;
> +	int err = 0;
> +
> +	might_fault();
> +
> +	rcu_read_lock_trace();
> +	migrate_disable();
> +
> +	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
> +		goto out;

bpf_prog_run_array_sleepable() doesn't do such things.
Such 'proteciton' will actively hurt.
The sleepable prog below will block all kprobes on this cpu.
please remove.

> +
> +	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> +
> +	if (!prog->aux->sleepable)
> +		rcu_read_lock();
> +
> +	err = bpf_prog_run(link->link.prog, regs);
> +
> +	if (!prog->aux->sleepable)
> +		rcu_read_unlock();
> +
> +	bpf_reset_run_ctx(old_run_ctx);
> +
> +out:
> +	__this_cpu_dec(bpf_prog_active);
> +	migrate_enable();
> +	rcu_read_unlock_trace();
> +	return err;

Jiri Olsa June 21, 2023, 8:32 a.m. UTC | #2

On Tue, Jun 20, 2023 at 10:11:15AM -0700, Alexei Starovoitov wrote:
> On Tue, Jun 20, 2023 at 10:35:27AM +0200, Jiri Olsa wrote:
> > +static int uprobe_prog_run(struct bpf_uprobe *uprobe,
> > +			   unsigned long entry_ip,
> > +			   struct pt_regs *regs)
> > +{
> > +	struct bpf_uprobe_multi_link *link = uprobe->link;
> > +	struct bpf_uprobe_multi_run_ctx run_ctx = {
> > +		.entry_ip = entry_ip,
> > +	};
> > +	struct bpf_prog *prog = link->link.prog;
> > +	struct bpf_run_ctx *old_run_ctx;
> > +	int err = 0;
> > +
> > +	might_fault();
> > +
> > +	rcu_read_lock_trace();
> > +	migrate_disable();
> > +
> > +	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
> > +		goto out;
> 
> bpf_prog_run_array_sleepable() doesn't do such things.
> Such 'proteciton' will actively hurt.
> The sleepable prog below will block all kprobes on this cpu.
> please remove.

ok makes sense, can't recall the reason why I added it

jirka

> 
> > +
> > +	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > +
> > +	if (!prog->aux->sleepable)
> > +		rcu_read_lock();
> > +
> > +	err = bpf_prog_run(link->link.prog, regs);
> > +
> > +	if (!prog->aux->sleepable)
> > +		rcu_read_unlock();
> > +
> > +	bpf_reset_run_ctx(old_run_ctx);
> > +
> > +out:
> > +	__this_cpu_dec(bpf_prog_active);
> > +	migrate_enable();
> > +	rcu_read_unlock_trace();
> > +	return err;

Andrii Nakryiko June 23, 2023, 12:18 a.m. UTC | #3

On Tue, Jun 20, 2023 at 1:36 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding new multi uprobe link that allows to attach bpf program
> to multiple uprobes.
>
> Uprobes to attach are specified via new link_create uprobe_multi
> union:
>
>   struct {
>           __u32           flags;
>           __u32           cnt;
>           __aligned_u64   path;
>           __aligned_u64   offsets;
>           __aligned_u64   ref_ctr_offsets;
>   } uprobe_multi;
>
> Uprobes are defined for single binary specified in path and multiple
> calling sites specified in offsets array with optional reference
> counters specified in ref_ctr_offsets array. All specified arrays
> have length of 'cnt'.
>
> The 'flags' supports single bit for now that marks the uprobe as
> return probe.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  include/linux/trace_events.h   |   6 +
>  include/uapi/linux/bpf.h       |  14 ++
>  kernel/bpf/syscall.c           |  12 +-
>  kernel/trace/bpf_trace.c       | 237 +++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  14 ++
>  5 files changed, 281 insertions(+), 2 deletions(-)
>

[...]

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index a75c54b6f8a3..a96e46cd407e 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3516,6 +3516,11 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
>                 return prog->enforce_expected_attach_type &&
>                         prog->expected_attach_type != attach_type ?
>                         -EINVAL : 0;
> +       case BPF_PROG_TYPE_KPROBE:
> +               if (prog->expected_attach_type == BPF_TRACE_KPROBE_MULTI &&
> +                   attach_type != BPF_TRACE_KPROBE_MULTI)

should this be UPROBE_MULTI? this looks like your recent bug fix,
which already landed

> +                       return -EINVAL;
> +               fallthrough;

and I replaced this with `return 0;` ;)
>         default:
>                 return 0;
>         }
> @@ -4681,7 +4686,8 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
>                 break;
>         case BPF_PROG_TYPE_KPROBE:
>                 if (attr->link_create.attach_type != BPF_PERF_EVENT &&
> -                   attr->link_create.attach_type != BPF_TRACE_KPROBE_MULTI) {
> +                   attr->link_create.attach_type != BPF_TRACE_KPROBE_MULTI &&
> +                   attr->link_create.attach_type != BPF_TRACE_UPROBE_MULTI) {
>                         ret = -EINVAL;
>                         goto out;
>                 }

should this be moved into bpf_prog_attach_check_attach_type() and
unify these checks?

> @@ -4748,8 +4754,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
>         case BPF_PROG_TYPE_KPROBE:
>                 if (attr->link_create.attach_type == BPF_PERF_EVENT)
>                         ret = bpf_perf_link_attach(attr, prog);
> -               else
> +               else if (attr->link_create.attach_type == BPF_TRACE_KPROBE_MULTI)
>                         ret = bpf_kprobe_multi_link_attach(attr, prog);
> +               else if (attr->link_create.attach_type == BPF_TRACE_UPROBE_MULTI)
> +                       ret = bpf_uprobe_multi_link_attach(attr, prog);
>                 break;
>         default:
>                 ret = -EINVAL;

[...]

> +static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
> +                                 u32 cnt)
> +{
> +       u32 i;
> +
> +       for (i = 0; i < cnt; i++) {
> +               uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> +                                 &uprobes[i].consumer);
> +       }
> +}
> +
> +static void bpf_uprobe_multi_link_release(struct bpf_link *link)
> +{
> +       struct bpf_uprobe_multi_link *umulti_link;
> +
> +       umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
> +       bpf_uprobe_unregister(&umulti_link->path, umulti_link->uprobes, umulti_link->cnt);
> +       path_put(&umulti_link->path);
> +}
> +
> +static void bpf_uprobe_multi_link_dealloc(struct bpf_link *link)
> +{
> +       struct bpf_uprobe_multi_link *umulti_link;
> +
> +       umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
> +       kvfree(umulti_link->uprobes);
> +       kfree(umulti_link);
> +}
> +
> +static const struct bpf_link_ops bpf_uprobe_multi_link_lops = {
> +       .release = bpf_uprobe_multi_link_release,
> +       .dealloc = bpf_uprobe_multi_link_dealloc,
> +};
> +
> +static int uprobe_prog_run(struct bpf_uprobe *uprobe,
> +                          unsigned long entry_ip,
> +                          struct pt_regs *regs)
> +{
> +       struct bpf_uprobe_multi_link *link = uprobe->link;
> +       struct bpf_uprobe_multi_run_ctx run_ctx = {
> +               .entry_ip = entry_ip,
> +       };
> +       struct bpf_prog *prog = link->link.prog;
> +       struct bpf_run_ctx *old_run_ctx;
> +       int err = 0;
> +
> +       might_fault();
> +
> +       rcu_read_lock_trace();

we don't need this if uprobe is not sleepable, right? why unconditional then?

> +       migrate_disable();
> +
> +       if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
> +               goto out;
> +
> +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> +
> +       if (!prog->aux->sleepable)
> +               rcu_read_lock();
> +
> +       err = bpf_prog_run(link->link.prog, regs);
> +
> +       if (!prog->aux->sleepable)
> +               rcu_read_unlock();
> +
> +       bpf_reset_run_ctx(old_run_ctx);
> +
> +out:
> +       __this_cpu_dec(bpf_prog_active);
> +       migrate_enable();
> +       rcu_read_unlock_trace();
> +       return err;
> +}
> +

[...]

> +
> +       err = kern_path(name, LOOKUP_FOLLOW, &path);
> +       kfree(name);
> +       if (err)
> +               return err;
> +
> +       if (!d_is_reg(path.dentry)) {
> +               err = -EINVAL;
> +               goto error_path_put;
> +       }
> +
> +       err = -ENOMEM;
> +
> +       link = kzalloc(sizeof(*link), GFP_KERNEL);
> +       uprobes = kvcalloc(cnt, sizeof(*uprobes), GFP_KERNEL);
> +       ref_ctr_offsets = kvcalloc(cnt, sizeof(*ref_ctr_offsets), GFP_KERNEL);

ref_ctr_offsets is optional, but we'll unconditionally allocate this array?

> +
> +       if (!uprobes || !ref_ctr_offsets || !link)
> +               goto error_free;
> +
> +       for (i = 0; i < cnt; i++) {
> +               if (uref_ctr_offsets && __get_user(ref_ctr_offset, uref_ctr_offsets + i)) {
> +                       err = -EFAULT;
> +                       goto error_free;
> +               }
> +               if (__get_user(offset, uoffsets + i)) {
> +                       err = -EFAULT;
> +                       goto error_free;
> +               }
> +
> +               uprobes[i].offset = offset;
> +               uprobes[i].link = link;
> +
> +               if (flags & BPF_F_UPROBE_MULTI_RETURN)
> +                       uprobes[i].consumer.ret_handler = uprobe_multi_link_ret_handler;
> +               else
> +                       uprobes[i].consumer.handler = uprobe_multi_link_handler;
> +
> +               ref_ctr_offsets[i] = ref_ctr_offset;
> +       }
> +
> +       link->cnt = cnt;
> +       link->uprobes = uprobes;
> +       link->path = path;
> +
> +       bpf_link_init(&link->link, BPF_LINK_TYPE_UPROBE_MULTI,
> +                     &bpf_uprobe_multi_link_lops, prog);
> +
> +       err = bpf_link_prime(&link->link, &link_primer);
> +       if (err)
> +               goto error_free;
> +
> +       for (i = 0; i < cnt; i++) {
> +               err = uprobe_register_refctr(d_real_inode(link->path.dentry),
> +                                            uprobes[i].offset, ref_ctr_offsets[i],
> +                                            &uprobes[i].consumer);
> +               if (err) {
> +                       bpf_uprobe_unregister(&path, uprobes, i);

bpf_link_cleanup() will do this through
bpf_uprobe_multi_link_release(), no? So you are double unregistering?
Either drop cnt to zero, or just don't do this here? Latter is better,
IMO.

> +                       bpf_link_cleanup(&link_primer);
> +                       kvfree(ref_ctr_offsets);
> +                       return err;
> +               }
> +       }
> +
> +       kvfree(ref_ctr_offsets);
> +       return bpf_link_settle(&link_primer);
> +
> +error_free:
> +       kvfree(ref_ctr_offsets);
> +       kvfree(uprobes);
> +       kfree(link);
> +error_path_put:
> +       path_put(&path);
> +       return err;
> +}
> +#else /* !CONFIG_UPROBES */
> +int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> +{
> +       return -EOPNOTSUPP;
> +}

[...]

Jiri Olsa June 23, 2023, 8:19 a.m. UTC | #4

On Thu, Jun 22, 2023 at 05:18:05PM -0700, Andrii Nakryiko wrote:
> On Tue, Jun 20, 2023 at 1:36 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Adding new multi uprobe link that allows to attach bpf program
> > to multiple uprobes.
> >
> > Uprobes to attach are specified via new link_create uprobe_multi
> > union:
> >
> >   struct {
> >           __u32           flags;
> >           __u32           cnt;
> >           __aligned_u64   path;
> >           __aligned_u64   offsets;
> >           __aligned_u64   ref_ctr_offsets;
> >   } uprobe_multi;
> >
> > Uprobes are defined for single binary specified in path and multiple
> > calling sites specified in offsets array with optional reference
> > counters specified in ref_ctr_offsets array. All specified arrays
> > have length of 'cnt'.
> >
> > The 'flags' supports single bit for now that marks the uprobe as
> > return probe.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  include/linux/trace_events.h   |   6 +
> >  include/uapi/linux/bpf.h       |  14 ++
> >  kernel/bpf/syscall.c           |  12 +-
> >  kernel/trace/bpf_trace.c       | 237 +++++++++++++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h |  14 ++
> >  5 files changed, 281 insertions(+), 2 deletions(-)
> >
> 
> [...]
> 
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index a75c54b6f8a3..a96e46cd407e 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -3516,6 +3516,11 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
> >                 return prog->enforce_expected_attach_type &&
> >                         prog->expected_attach_type != attach_type ?
> >                         -EINVAL : 0;
> > +       case BPF_PROG_TYPE_KPROBE:
> > +               if (prog->expected_attach_type == BPF_TRACE_KPROBE_MULTI &&
> > +                   attach_type != BPF_TRACE_KPROBE_MULTI)
> 
> should this be UPROBE_MULTI? this looks like your recent bug fix,
> which already landed
> 
> > +                       return -EINVAL;
> > +               fallthrough;
> 
> and I replaced this with `return 0;` ;)

ugh, yes, will fix

> >         default:
> >                 return 0;
> >         }
> > @@ -4681,7 +4686,8 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
> >                 break;
> >         case BPF_PROG_TYPE_KPROBE:
> >                 if (attr->link_create.attach_type != BPF_PERF_EVENT &&
> > -                   attr->link_create.attach_type != BPF_TRACE_KPROBE_MULTI) {
> > +                   attr->link_create.attach_type != BPF_TRACE_KPROBE_MULTI &&
> > +                   attr->link_create.attach_type != BPF_TRACE_UPROBE_MULTI) {
> >                         ret = -EINVAL;
> >                         goto out;
> >                 }
> 
> should this be moved into bpf_prog_attach_check_attach_type() and
> unify these checks?

ok, perhaps we could move there the whole switch, will check

> 
> > @@ -4748,8 +4754,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
> >         case BPF_PROG_TYPE_KPROBE:
> >                 if (attr->link_create.attach_type == BPF_PERF_EVENT)
> >                         ret = bpf_perf_link_attach(attr, prog);
> > -               else
> > +               else if (attr->link_create.attach_type == BPF_TRACE_KPROBE_MULTI)
> >                         ret = bpf_kprobe_multi_link_attach(attr, prog);
> > +               else if (attr->link_create.attach_type == BPF_TRACE_UPROBE_MULTI)
> > +                       ret = bpf_uprobe_multi_link_attach(attr, prog);
> >                 break;
> >         default:
> >                 ret = -EINVAL;
> 
> [...]
> 
> > +static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
> > +                                 u32 cnt)
> > +{
> > +       u32 i;
> > +
> > +       for (i = 0; i < cnt; i++) {
> > +               uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> > +                                 &uprobes[i].consumer);
> > +       }
> > +}
> > +
> > +static void bpf_uprobe_multi_link_release(struct bpf_link *link)
> > +{
> > +       struct bpf_uprobe_multi_link *umulti_link;
> > +
> > +       umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
> > +       bpf_uprobe_unregister(&umulti_link->path, umulti_link->uprobes, umulti_link->cnt);
> > +       path_put(&umulti_link->path);
> > +}
> > +
> > +static void bpf_uprobe_multi_link_dealloc(struct bpf_link *link)
> > +{
> > +       struct bpf_uprobe_multi_link *umulti_link;
> > +
> > +       umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
> > +       kvfree(umulti_link->uprobes);
> > +       kfree(umulti_link);
> > +}
> > +
> > +static const struct bpf_link_ops bpf_uprobe_multi_link_lops = {
> > +       .release = bpf_uprobe_multi_link_release,
> > +       .dealloc = bpf_uprobe_multi_link_dealloc,
> > +};
> > +
> > +static int uprobe_prog_run(struct bpf_uprobe *uprobe,
> > +                          unsigned long entry_ip,
> > +                          struct pt_regs *regs)
> > +{
> > +       struct bpf_uprobe_multi_link *link = uprobe->link;
> > +       struct bpf_uprobe_multi_run_ctx run_ctx = {
> > +               .entry_ip = entry_ip,
> > +       };
> > +       struct bpf_prog *prog = link->link.prog;
> > +       struct bpf_run_ctx *old_run_ctx;
> > +       int err = 0;
> > +
> > +       might_fault();
> > +
> > +       rcu_read_lock_trace();
> 
> we don't need this if uprobe is not sleepable, right? why unconditional then?

I won't pretend I understand what rcu_read_lock_trace does ;-)

I tried to follow bpf_prog_run_array_sleepable where it's called
unconditionally for both sleepable and non-sleepable progs

there are conditional rcu_read_un/lock calls later on

I will check

> 
> > +       migrate_disable();
> > +
> > +       if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
> > +               goto out;
> > +
> > +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > +
> > +       if (!prog->aux->sleepable)
> > +               rcu_read_lock();
> > +
> > +       err = bpf_prog_run(link->link.prog, regs);
> > +
> > +       if (!prog->aux->sleepable)
> > +               rcu_read_unlock();
> > +
> > +       bpf_reset_run_ctx(old_run_ctx);
> > +
> > +out:
> > +       __this_cpu_dec(bpf_prog_active);
> > +       migrate_enable();
> > +       rcu_read_unlock_trace();
> > +       return err;
> > +}
> > +
> 
> [...]
> 
> > +
> > +       err = kern_path(name, LOOKUP_FOLLOW, &path);
> > +       kfree(name);
> > +       if (err)
> > +               return err;
> > +
> > +       if (!d_is_reg(path.dentry)) {
> > +               err = -EINVAL;
> > +               goto error_path_put;
> > +       }
> > +
> > +       err = -ENOMEM;
> > +
> > +       link = kzalloc(sizeof(*link), GFP_KERNEL);
> > +       uprobes = kvcalloc(cnt, sizeof(*uprobes), GFP_KERNEL);
> > +       ref_ctr_offsets = kvcalloc(cnt, sizeof(*ref_ctr_offsets), GFP_KERNEL);
> 
> ref_ctr_offsets is optional, but we'll unconditionally allocate this array?

true :-\ will add the uref_ctr_offsets check

> 
> > +
> > +       if (!uprobes || !ref_ctr_offsets || !link)
> > +               goto error_free;
> > +
> > +       for (i = 0; i < cnt; i++) {
> > +               if (uref_ctr_offsets && __get_user(ref_ctr_offset, uref_ctr_offsets + i)) {
> > +                       err = -EFAULT;
> > +                       goto error_free;
> > +               }
> > +               if (__get_user(offset, uoffsets + i)) {
> > +                       err = -EFAULT;
> > +                       goto error_free;
> > +               }
> > +
> > +               uprobes[i].offset = offset;
> > +               uprobes[i].link = link;
> > +
> > +               if (flags & BPF_F_UPROBE_MULTI_RETURN)
> > +                       uprobes[i].consumer.ret_handler = uprobe_multi_link_ret_handler;
> > +               else
> > +                       uprobes[i].consumer.handler = uprobe_multi_link_handler;
> > +
> > +               ref_ctr_offsets[i] = ref_ctr_offset;
> > +       }
> > +
> > +       link->cnt = cnt;
> > +       link->uprobes = uprobes;
> > +       link->path = path;
> > +
> > +       bpf_link_init(&link->link, BPF_LINK_TYPE_UPROBE_MULTI,
> > +                     &bpf_uprobe_multi_link_lops, prog);
> > +
> > +       err = bpf_link_prime(&link->link, &link_primer);
> > +       if (err)
> > +               goto error_free;
> > +
> > +       for (i = 0; i < cnt; i++) {
> > +               err = uprobe_register_refctr(d_real_inode(link->path.dentry),
> > +                                            uprobes[i].offset, ref_ctr_offsets[i],
> > +                                            &uprobes[i].consumer);
> > +               if (err) {
> > +                       bpf_uprobe_unregister(&path, uprobes, i);
> 
> bpf_link_cleanup() will do this through
> bpf_uprobe_multi_link_release(), no? So you are double unregistering?
> Either drop cnt to zero, or just don't do this here? Latter is better,
> IMO.

bpf_link_cleanup path won't call release callback so we have to do that

I think I can add simple selftest to have this path covered

thanks,
jirka

> 
> > +                       bpf_link_cleanup(&link_primer);
> > +                       kvfree(ref_ctr_offsets);
> > +                       return err;
> > +               }
> > +       }
> > +
> > +       kvfree(ref_ctr_offsets);
> > +       return bpf_link_settle(&link_primer);
> > +
> > +error_free:
> > +       kvfree(ref_ctr_offsets);
> > +       kvfree(uprobes);
> > +       kfree(link);
> > +error_path_put:
> > +       path_put(&path);
> > +       return err;
> > +}
> > +#else /* !CONFIG_UPROBES */
> > +int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> > +{
> > +       return -EOPNOTSUPP;
> > +}
> 
> [...]

Andrii Nakryiko June 23, 2023, 4:24 p.m. UTC | #5

On Fri, Jun 23, 2023 at 1:19 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Jun 22, 2023 at 05:18:05PM -0700, Andrii Nakryiko wrote:
> > On Tue, Jun 20, 2023 at 1:36 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > >
> > > Adding new multi uprobe link that allows to attach bpf program
> > > to multiple uprobes.
> > >
> > > Uprobes to attach are specified via new link_create uprobe_multi
> > > union:
> > >
> > >   struct {
> > >           __u32           flags;
> > >           __u32           cnt;
> > >           __aligned_u64   path;
> > >           __aligned_u64   offsets;
> > >           __aligned_u64   ref_ctr_offsets;
> > >   } uprobe_multi;
> > >
> > > Uprobes are defined for single binary specified in path and multiple
> > > calling sites specified in offsets array with optional reference
> > > counters specified in ref_ctr_offsets array. All specified arrays
> > > have length of 'cnt'.
> > >
> > > The 'flags' supports single bit for now that marks the uprobe as
> > > return probe.
> > >
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> > >  include/linux/trace_events.h   |   6 +
> > >  include/uapi/linux/bpf.h       |  14 ++
> > >  kernel/bpf/syscall.c           |  12 +-
> > >  kernel/trace/bpf_trace.c       | 237 +++++++++++++++++++++++++++++++++
> > >  tools/include/uapi/linux/bpf.h |  14 ++
> > >  5 files changed, 281 insertions(+), 2 deletions(-)
> > >
> >
> > [...]
> >
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index a75c54b6f8a3..a96e46cd407e 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -3516,6 +3516,11 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
> > >                 return prog->enforce_expected_attach_type &&
> > >                         prog->expected_attach_type != attach_type ?
> > >                         -EINVAL : 0;
> > > +       case BPF_PROG_TYPE_KPROBE:
> > > +               if (prog->expected_attach_type == BPF_TRACE_KPROBE_MULTI &&
> > > +                   attach_type != BPF_TRACE_KPROBE_MULTI)
> >
> > should this be UPROBE_MULTI? this looks like your recent bug fix,
> > which already landed
> >
> > > +                       return -EINVAL;
> > > +               fallthrough;
> >
> > and I replaced this with `return 0;` ;)
>
> ugh, yes, will fix
>
> > >         default:
> > >                 return 0;
> > >         }
> > > @@ -4681,7 +4686,8 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
> > >                 break;
> > >         case BPF_PROG_TYPE_KPROBE:
> > >                 if (attr->link_create.attach_type != BPF_PERF_EVENT &&
> > > -                   attr->link_create.attach_type != BPF_TRACE_KPROBE_MULTI) {
> > > +                   attr->link_create.attach_type != BPF_TRACE_KPROBE_MULTI &&
> > > +                   attr->link_create.attach_type != BPF_TRACE_UPROBE_MULTI) {
> > >                         ret = -EINVAL;
> > >                         goto out;
> > >                 }
> >
> > should this be moved into bpf_prog_attach_check_attach_type() and
> > unify these checks?
>
> ok, perhaps we could move there the whole switch, will check

+1

>
> >
> > > @@ -4748,8 +4754,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
> > >         case BPF_PROG_TYPE_KPROBE:
> > >                 if (attr->link_create.attach_type == BPF_PERF_EVENT)
> > >                         ret = bpf_perf_link_attach(attr, prog);
> > > -               else
> > > +               else if (attr->link_create.attach_type == BPF_TRACE_KPROBE_MULTI)
> > >                         ret = bpf_kprobe_multi_link_attach(attr, prog);
> > > +               else if (attr->link_create.attach_type == BPF_TRACE_UPROBE_MULTI)
> > > +                       ret = bpf_uprobe_multi_link_attach(attr, prog);
> > >                 break;
> > >         default:
> > >                 ret = -EINVAL;
> >
> > [...]
> >
> > > +static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
> > > +                                 u32 cnt)
> > > +{
> > > +       u32 i;
> > > +
> > > +       for (i = 0; i < cnt; i++) {
> > > +               uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> > > +                                 &uprobes[i].consumer);
> > > +       }
> > > +}
> > > +
> > > +static void bpf_uprobe_multi_link_release(struct bpf_link *link)
> > > +{
> > > +       struct bpf_uprobe_multi_link *umulti_link;
> > > +
> > > +       umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
> > > +       bpf_uprobe_unregister(&umulti_link->path, umulti_link->uprobes, umulti_link->cnt);
> > > +       path_put(&umulti_link->path);
> > > +}
> > > +
> > > +static void bpf_uprobe_multi_link_dealloc(struct bpf_link *link)
> > > +{
> > > +       struct bpf_uprobe_multi_link *umulti_link;
> > > +
> > > +       umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
> > > +       kvfree(umulti_link->uprobes);
> > > +       kfree(umulti_link);
> > > +}
> > > +
> > > +static const struct bpf_link_ops bpf_uprobe_multi_link_lops = {
> > > +       .release = bpf_uprobe_multi_link_release,
> > > +       .dealloc = bpf_uprobe_multi_link_dealloc,
> > > +};
> > > +
> > > +static int uprobe_prog_run(struct bpf_uprobe *uprobe,
> > > +                          unsigned long entry_ip,
> > > +                          struct pt_regs *regs)
> > > +{
> > > +       struct bpf_uprobe_multi_link *link = uprobe->link;
> > > +       struct bpf_uprobe_multi_run_ctx run_ctx = {
> > > +               .entry_ip = entry_ip,
> > > +       };
> > > +       struct bpf_prog *prog = link->link.prog;
> > > +       struct bpf_run_ctx *old_run_ctx;
> > > +       int err = 0;
> > > +
> > > +       might_fault();
> > > +
> > > +       rcu_read_lock_trace();
> >
> > we don't need this if uprobe is not sleepable, right? why unconditional then?
>
> I won't pretend I understand what rcu_read_lock_trace does ;-)
>
> I tried to follow bpf_prog_run_array_sleepable where it's called
> unconditionally for both sleepable and non-sleepable progs
>
> there are conditional rcu_read_un/lock calls later on
>
> I will check

hm... Alexei can chime in here, but given here we actually are trying
to run one BPF program (not entire array of them), we do know whether
it's going to be sleepable or not. So we can avoid unnecessary
rcu_read_{lock,unlock}_trace() calls. rcu_read_lock_trace() is used
when there is going to be sleepable BPF program executed to protect
BPF maps and other resources from being freed too soon. But if we know
that we don't need sleepable, we can avoid that.

>
> >
> > > +       migrate_disable();
> > > +
> > > +       if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
> > > +               goto out;
> > > +
> > > +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > > +
> > > +       if (!prog->aux->sleepable)
> > > +               rcu_read_lock();
> > > +
> > > +       err = bpf_prog_run(link->link.prog, regs);
> > > +
> > > +       if (!prog->aux->sleepable)
> > > +               rcu_read_unlock();
> > > +
> > > +       bpf_reset_run_ctx(old_run_ctx);
> > > +
> > > +out:
> > > +       __this_cpu_dec(bpf_prog_active);
> > > +       migrate_enable();
> > > +       rcu_read_unlock_trace();
> > > +       return err;
> > > +}
> > > +
> >
> > [...]
> >
> > > +
> > > +       err = kern_path(name, LOOKUP_FOLLOW, &path);
> > > +       kfree(name);
> > > +       if (err)
> > > +               return err;
> > > +
> > > +       if (!d_is_reg(path.dentry)) {
> > > +               err = -EINVAL;
> > > +               goto error_path_put;
> > > +       }
> > > +
> > > +       err = -ENOMEM;
> > > +
> > > +       link = kzalloc(sizeof(*link), GFP_KERNEL);
> > > +       uprobes = kvcalloc(cnt, sizeof(*uprobes), GFP_KERNEL);
> > > +       ref_ctr_offsets = kvcalloc(cnt, sizeof(*ref_ctr_offsets), GFP_KERNEL);
> >
> > ref_ctr_offsets is optional, but we'll unconditionally allocate this array?
>
> true :-\ will add the uref_ctr_offsets check
>
> >
> > > +
> > > +       if (!uprobes || !ref_ctr_offsets || !link)
> > > +               goto error_free;
> > > +
> > > +       for (i = 0; i < cnt; i++) {
> > > +               if (uref_ctr_offsets && __get_user(ref_ctr_offset, uref_ctr_offsets + i)) {
> > > +                       err = -EFAULT;
> > > +                       goto error_free;
> > > +               }
> > > +               if (__get_user(offset, uoffsets + i)) {
> > > +                       err = -EFAULT;
> > > +                       goto error_free;
> > > +               }
> > > +
> > > +               uprobes[i].offset = offset;
> > > +               uprobes[i].link = link;
> > > +
> > > +               if (flags & BPF_F_UPROBE_MULTI_RETURN)
> > > +                       uprobes[i].consumer.ret_handler = uprobe_multi_link_ret_handler;
> > > +               else
> > > +                       uprobes[i].consumer.handler = uprobe_multi_link_handler;
> > > +
> > > +               ref_ctr_offsets[i] = ref_ctr_offset;
> > > +       }
> > > +
> > > +       link->cnt = cnt;
> > > +       link->uprobes = uprobes;
> > > +       link->path = path;
> > > +
> > > +       bpf_link_init(&link->link, BPF_LINK_TYPE_UPROBE_MULTI,
> > > +                     &bpf_uprobe_multi_link_lops, prog);
> > > +
> > > +       err = bpf_link_prime(&link->link, &link_primer);
> > > +       if (err)
> > > +               goto error_free;
> > > +
> > > +       for (i = 0; i < cnt; i++) {
> > > +               err = uprobe_register_refctr(d_real_inode(link->path.dentry),
> > > +                                            uprobes[i].offset, ref_ctr_offsets[i],
> > > +                                            &uprobes[i].consumer);
> > > +               if (err) {
> > > +                       bpf_uprobe_unregister(&path, uprobes, i);
> >
> > bpf_link_cleanup() will do this through
> > bpf_uprobe_multi_link_release(), no? So you are double unregistering?
> > Either drop cnt to zero, or just don't do this here? Latter is better,
> > IMO.
>
> bpf_link_cleanup path won't call release callback so we have to do that

bpf_link_cleanup() does fput(primer->file); which eventually calls
release callback, no? I'd add printk and simulate failure just to be
sure

>
> I think I can add simple selftest to have this path covered
>
> thanks,
> jirka
>
> >
> > > +                       bpf_link_cleanup(&link_primer);
> > > +                       kvfree(ref_ctr_offsets);
> > > +                       return err;
> > > +               }
> > > +       }
> > > +
> > > +       kvfree(ref_ctr_offsets);
> > > +       return bpf_link_settle(&link_primer);
> > > +
> > > +error_free:
> > > +       kvfree(ref_ctr_offsets);
> > > +       kvfree(uprobes);
> > > +       kfree(link);
> > > +error_path_put:
> > > +       path_put(&path);
> > > +       return err;
> > > +}
> > > +#else /* !CONFIG_UPROBES */
> > > +int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> > > +{
> > > +       return -EOPNOTSUPP;
> > > +}
> >
> > [...]

Alexei Starovoitov June 23, 2023, 4:39 p.m. UTC | #6

On Fri, Jun 23, 2023 at 9:24 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> > > > +
> > > > +static int uprobe_prog_run(struct bpf_uprobe *uprobe,
> > > > +                          unsigned long entry_ip,
> > > > +                          struct pt_regs *regs)
> > > > +{
> > > > +       struct bpf_uprobe_multi_link *link = uprobe->link;
> > > > +       struct bpf_uprobe_multi_run_ctx run_ctx = {
> > > > +               .entry_ip = entry_ip,
> > > > +       };
> > > > +       struct bpf_prog *prog = link->link.prog;
> > > > +       struct bpf_run_ctx *old_run_ctx;
> > > > +       int err = 0;
> > > > +
> > > > +       might_fault();
> > > > +
> > > > +       rcu_read_lock_trace();
> > >
> > > we don't need this if uprobe is not sleepable, right? why unconditional then?
> >
> > I won't pretend I understand what rcu_read_lock_trace does ;-)
> >
> > I tried to follow bpf_prog_run_array_sleepable where it's called
> > unconditionally for both sleepable and non-sleepable progs
> >
> > there are conditional rcu_read_un/lock calls later on
> >
> > I will check
>
> hm... Alexei can chime in here, but given here we actually are trying
> to run one BPF program (not entire array of them), we do know whether
> it's going to be sleepable or not. So we can avoid unnecessary
> rcu_read_{lock,unlock}_trace() calls. rcu_read_lock_trace() is used
> when there is going to be sleepable BPF program executed to protect
> BPF maps and other resources from being freed too soon. But if we know
> that we don't need sleepable, we can avoid that.

We can add more checks and bool flags to avoid rcu_read_{lock,unlock}_trace(),
but it will likely be slower. These calls are very fast.
Simpler and faster to do it unconditionally even when the array doesn't
have sleepable progs.
rcu_read_lock() we have to do conditionally, because it won't be ok
if sleepable progs are in the array.

Andrii Nakryiko June 23, 2023, 5:11 p.m. UTC | #7

On Fri, Jun 23, 2023 at 9:39 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jun 23, 2023 at 9:24 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > > > > +
> > > > > +static int uprobe_prog_run(struct bpf_uprobe *uprobe,
> > > > > +                          unsigned long entry_ip,
> > > > > +                          struct pt_regs *regs)
> > > > > +{
> > > > > +       struct bpf_uprobe_multi_link *link = uprobe->link;
> > > > > +       struct bpf_uprobe_multi_run_ctx run_ctx = {
> > > > > +               .entry_ip = entry_ip,
> > > > > +       };
> > > > > +       struct bpf_prog *prog = link->link.prog;
> > > > > +       struct bpf_run_ctx *old_run_ctx;
> > > > > +       int err = 0;
> > > > > +
> > > > > +       might_fault();
> > > > > +
> > > > > +       rcu_read_lock_trace();
> > > >
> > > > we don't need this if uprobe is not sleepable, right? why unconditional then?
> > >
> > > I won't pretend I understand what rcu_read_lock_trace does ;-)
> > >
> > > I tried to follow bpf_prog_run_array_sleepable where it's called
> > > unconditionally for both sleepable and non-sleepable progs
> > >
> > > there are conditional rcu_read_un/lock calls later on
> > >
> > > I will check
> >
> > hm... Alexei can chime in here, but given here we actually are trying
> > to run one BPF program (not entire array of them), we do know whether
> > it's going to be sleepable or not. So we can avoid unnecessary
> > rcu_read_{lock,unlock}_trace() calls. rcu_read_lock_trace() is used
> > when there is going to be sleepable BPF program executed to protect
> > BPF maps and other resources from being freed too soon. But if we know
> > that we don't need sleepable, we can avoid that.
>
> We can add more checks and bool flags to avoid rcu_read_{lock,unlock}_trace(),
> but it will likely be slower. These calls are very fast.

that's ok then. But seeing how we do

rcu_read_lock_trace();
if (!sleepable)
    rcu_read_lock();

it felt like we might as well just do

if (sleepable)
    rcu_read_lock_trace();
else
    rcu_read_lock();


As I mentioned, in this case we have a single bpf_prog, not a
bpf_prog_array, so that changes things a bit.

But ultimately, the context switch required for uprobe dwarfs overhead
of any of this, presumably, so it's a minor concern.

> Simpler and faster to do it unconditionally even when the array doesn't
> have sleepable progs.
> rcu_read_lock() we have to do conditionally, because it won't be ok
> if sleepable progs are in the array.

Alexei Starovoitov June 23, 2023, 5:20 p.m. UTC | #8

On Fri, Jun 23, 2023 at 10:11 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Jun 23, 2023 at 9:39 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Jun 23, 2023 at 9:24 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > > > > +
> > > > > > +static int uprobe_prog_run(struct bpf_uprobe *uprobe,
> > > > > > +                          unsigned long entry_ip,
> > > > > > +                          struct pt_regs *regs)
> > > > > > +{
> > > > > > +       struct bpf_uprobe_multi_link *link = uprobe->link;
> > > > > > +       struct bpf_uprobe_multi_run_ctx run_ctx = {
> > > > > > +               .entry_ip = entry_ip,
> > > > > > +       };
> > > > > > +       struct bpf_prog *prog = link->link.prog;
> > > > > > +       struct bpf_run_ctx *old_run_ctx;
> > > > > > +       int err = 0;
> > > > > > +
> > > > > > +       might_fault();
> > > > > > +
> > > > > > +       rcu_read_lock_trace();
> > > > >
> > > > > we don't need this if uprobe is not sleepable, right? why unconditional then?
> > > >
> > > > I won't pretend I understand what rcu_read_lock_trace does ;-)
> > > >
> > > > I tried to follow bpf_prog_run_array_sleepable where it's called
> > > > unconditionally for both sleepable and non-sleepable progs
> > > >
> > > > there are conditional rcu_read_un/lock calls later on
> > > >
> > > > I will check
> > >
> > > hm... Alexei can chime in here, but given here we actually are trying
> > > to run one BPF program (not entire array of them), we do know whether
> > > it's going to be sleepable or not. So we can avoid unnecessary
> > > rcu_read_{lock,unlock}_trace() calls. rcu_read_lock_trace() is used
> > > when there is going to be sleepable BPF program executed to protect
> > > BPF maps and other resources from being freed too soon. But if we know
> > > that we don't need sleepable, we can avoid that.
> >
> > We can add more checks and bool flags to avoid rcu_read_{lock,unlock}_trace(),
> > but it will likely be slower. These calls are very fast.
>
> that's ok then. But seeing how we do
>
> rcu_read_lock_trace();
> if (!sleepable)
>     rcu_read_lock();
>
> it felt like we might as well just do
>
> if (sleepable)
>     rcu_read_lock_trace();
> else
>     rcu_read_lock();
>
>
> As I mentioned, in this case we have a single bpf_prog, not a
> bpf_prog_array, so that changes things a bit.

Ahh. It's only one prog. I missed that. Above makes sense then.
But why is it not an array? We can attach multiple uprobes to the same
location. Anyway that can be dealt with later.

Jiri Olsa June 25, 2023, 1:18 a.m. UTC | #9

On Fri, Jun 23, 2023 at 09:24:22AM -0700, Andrii Nakryiko wrote:

SNIP

> > > > +
> > > > +       if (!uprobes || !ref_ctr_offsets || !link)
> > > > +               goto error_free;
> > > > +
> > > > +       for (i = 0; i < cnt; i++) {
> > > > +               if (uref_ctr_offsets && __get_user(ref_ctr_offset, uref_ctr_offsets + i)) {
> > > > +                       err = -EFAULT;
> > > > +                       goto error_free;
> > > > +               }
> > > > +               if (__get_user(offset, uoffsets + i)) {
> > > > +                       err = -EFAULT;
> > > > +                       goto error_free;
> > > > +               }
> > > > +
> > > > +               uprobes[i].offset = offset;
> > > > +               uprobes[i].link = link;
> > > > +
> > > > +               if (flags & BPF_F_UPROBE_MULTI_RETURN)
> > > > +                       uprobes[i].consumer.ret_handler = uprobe_multi_link_ret_handler;
> > > > +               else
> > > > +                       uprobes[i].consumer.handler = uprobe_multi_link_handler;
> > > > +
> > > > +               ref_ctr_offsets[i] = ref_ctr_offset;
> > > > +       }
> > > > +
> > > > +       link->cnt = cnt;
> > > > +       link->uprobes = uprobes;
> > > > +       link->path = path;
> > > > +
> > > > +       bpf_link_init(&link->link, BPF_LINK_TYPE_UPROBE_MULTI,
> > > > +                     &bpf_uprobe_multi_link_lops, prog);
> > > > +
> > > > +       err = bpf_link_prime(&link->link, &link_primer);
> > > > +       if (err)
> > > > +               goto error_free;
> > > > +
> > > > +       for (i = 0; i < cnt; i++) {
> > > > +               err = uprobe_register_refctr(d_real_inode(link->path.dentry),
> > > > +                                            uprobes[i].offset, ref_ctr_offsets[i],
> > > > +                                            &uprobes[i].consumer);
> > > > +               if (err) {
> > > > +                       bpf_uprobe_unregister(&path, uprobes, i);
> > >
> > > bpf_link_cleanup() will do this through
> > > bpf_uprobe_multi_link_release(), no? So you are double unregistering?
> > > Either drop cnt to zero, or just don't do this here? Latter is better,
> > > IMO.
> >
> > bpf_link_cleanup path won't call release callback so we have to do that
> 
> bpf_link_cleanup() does fput(primer->file); which eventually calls
> release callback, no? I'd add printk and simulate failure just to be
> sure

I recall we had similar discussion for kprobe_multi link ;-)

I'll double check that but I think bpf_link_cleanup calls just
dealloc callback not release

jirka

> 
> >
> > I think I can add simple selftest to have this path covered
> >
> > thanks,
> > jirka

SNIP

Jiri Olsa June 25, 2023, 1:19 a.m. UTC | #10

On Fri, Jun 23, 2023 at 10:20:26AM -0700, Alexei Starovoitov wrote:
> On Fri, Jun 23, 2023 at 10:11 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Jun 23, 2023 at 9:39 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Jun 23, 2023 at 9:24 AM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > > > > +
> > > > > > > +static int uprobe_prog_run(struct bpf_uprobe *uprobe,
> > > > > > > +                          unsigned long entry_ip,
> > > > > > > +                          struct pt_regs *regs)
> > > > > > > +{
> > > > > > > +       struct bpf_uprobe_multi_link *link = uprobe->link;
> > > > > > > +       struct bpf_uprobe_multi_run_ctx run_ctx = {
> > > > > > > +               .entry_ip = entry_ip,
> > > > > > > +       };
> > > > > > > +       struct bpf_prog *prog = link->link.prog;
> > > > > > > +       struct bpf_run_ctx *old_run_ctx;
> > > > > > > +       int err = 0;
> > > > > > > +
> > > > > > > +       might_fault();
> > > > > > > +
> > > > > > > +       rcu_read_lock_trace();
> > > > > >
> > > > > > we don't need this if uprobe is not sleepable, right? why unconditional then?
> > > > >
> > > > > I won't pretend I understand what rcu_read_lock_trace does ;-)
> > > > >
> > > > > I tried to follow bpf_prog_run_array_sleepable where it's called
> > > > > unconditionally for both sleepable and non-sleepable progs
> > > > >
> > > > > there are conditional rcu_read_un/lock calls later on
> > > > >
> > > > > I will check
> > > >
> > > > hm... Alexei can chime in here, but given here we actually are trying
> > > > to run one BPF program (not entire array of them), we do know whether
> > > > it's going to be sleepable or not. So we can avoid unnecessary
> > > > rcu_read_{lock,unlock}_trace() calls. rcu_read_lock_trace() is used
> > > > when there is going to be sleepable BPF program executed to protect
> > > > BPF maps and other resources from being freed too soon. But if we know
> > > > that we don't need sleepable, we can avoid that.
> > >
> > > We can add more checks and bool flags to avoid rcu_read_{lock,unlock}_trace(),
> > > but it will likely be slower. These calls are very fast.
> >
> > that's ok then. But seeing how we do
> >
> > rcu_read_lock_trace();
> > if (!sleepable)
> >     rcu_read_lock();
> >
> > it felt like we might as well just do
> >
> > if (sleepable)
> >     rcu_read_lock_trace();
> > else
> >     rcu_read_lock();

ok

> >
> >
> > As I mentioned, in this case we have a single bpf_prog, not a
> > bpf_prog_array, so that changes things a bit.
> 
> Ahh. It's only one prog. I missed that. Above makes sense then.
> But why is it not an array? We can attach multiple uprobes to the same
> location. Anyway that can be dealt with later.

I think we could add support for this later if it's needed

jirka

Andrii Nakryiko June 26, 2023, 6:27 p.m. UTC | #11

On Sat, Jun 24, 2023 at 6:19 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Fri, Jun 23, 2023 at 09:24:22AM -0700, Andrii Nakryiko wrote:
>
> SNIP
>
> > > > > +
> > > > > +       if (!uprobes || !ref_ctr_offsets || !link)
> > > > > +               goto error_free;
> > > > > +
> > > > > +       for (i = 0; i < cnt; i++) {
> > > > > +               if (uref_ctr_offsets && __get_user(ref_ctr_offset, uref_ctr_offsets + i)) {
> > > > > +                       err = -EFAULT;
> > > > > +                       goto error_free;
> > > > > +               }
> > > > > +               if (__get_user(offset, uoffsets + i)) {
> > > > > +                       err = -EFAULT;
> > > > > +                       goto error_free;
> > > > > +               }
> > > > > +
> > > > > +               uprobes[i].offset = offset;
> > > > > +               uprobes[i].link = link;
> > > > > +
> > > > > +               if (flags & BPF_F_UPROBE_MULTI_RETURN)
> > > > > +                       uprobes[i].consumer.ret_handler = uprobe_multi_link_ret_handler;
> > > > > +               else
> > > > > +                       uprobes[i].consumer.handler = uprobe_multi_link_handler;
> > > > > +
> > > > > +               ref_ctr_offsets[i] = ref_ctr_offset;
> > > > > +       }
> > > > > +
> > > > > +       link->cnt = cnt;
> > > > > +       link->uprobes = uprobes;
> > > > > +       link->path = path;
> > > > > +
> > > > > +       bpf_link_init(&link->link, BPF_LINK_TYPE_UPROBE_MULTI,
> > > > > +                     &bpf_uprobe_multi_link_lops, prog);
> > > > > +
> > > > > +       err = bpf_link_prime(&link->link, &link_primer);
> > > > > +       if (err)
> > > > > +               goto error_free;
> > > > > +
> > > > > +       for (i = 0; i < cnt; i++) {
> > > > > +               err = uprobe_register_refctr(d_real_inode(link->path.dentry),
> > > > > +                                            uprobes[i].offset, ref_ctr_offsets[i],
> > > > > +                                            &uprobes[i].consumer);
> > > > > +               if (err) {
> > > > > +                       bpf_uprobe_unregister(&path, uprobes, i);
> > > >
> > > > bpf_link_cleanup() will do this through
> > > > bpf_uprobe_multi_link_release(), no? So you are double unregistering?
> > > > Either drop cnt to zero, or just don't do this here? Latter is better,
> > > > IMO.
> > >
> > > bpf_link_cleanup path won't call release callback so we have to do that
> >
> > bpf_link_cleanup() does fput(primer->file); which eventually calls
> > release callback, no? I'd add printk and simulate failure just to be
> > sure
>
> I recall we had similar discussion for kprobe_multi link ;-)
>
> I'll double check that but I think bpf_link_cleanup calls just
> dealloc callback not release

Let's document this in comments for bpf_link_cleanup() so we don't
have to discuss this again :)

I think you are right, btw. I see that bpf_link_cleanup() sets
link->prog to NULL, and bpf_link_free() won't call
link->ops->release() if link->prog is NULL.

Tricky, I keep forgetting this. Let's explicitly explain this in a comment.

>
> jirka
>
> >
> > >
> > > I think I can add simple selftest to have this path covered
> > >
> > > thanks,
> > > jirka
>
> SNIP

Jiri Olsa June 26, 2023, 7:23 p.m. UTC | #12

On Mon, Jun 26, 2023 at 11:27:25AM -0700, Andrii Nakryiko wrote:

SNIP

> > > > > bpf_link_cleanup() will do this through
> > > > > bpf_uprobe_multi_link_release(), no? So you are double unregistering?
> > > > > Either drop cnt to zero, or just don't do this here? Latter is better,
> > > > > IMO.
> > > >
> > > > bpf_link_cleanup path won't call release callback so we have to do that
> > >
> > > bpf_link_cleanup() does fput(primer->file); which eventually calls
> > > release callback, no? I'd add printk and simulate failure just to be
> > > sure
> >
> > I recall we had similar discussion for kprobe_multi link ;-)
> >
> > I'll double check that but I think bpf_link_cleanup calls just
> > dealloc callback not release
> 
> Let's document this in comments for bpf_link_cleanup() so we don't
> have to discuss this again :)
> 
> I think you are right, btw. I see that bpf_link_cleanup() sets
> link->prog to NULL, and bpf_link_free() won't call
> link->ops->release() if link->prog is NULL.
> 
> Tricky, I keep forgetting this. Let's explicitly explain this in a comment.

ok, will add the comment

jirka

> 
> >
> > jirka
> >
> > >
> > > >
> > > > I think I can add simple selftest to have this path covered
> > > >
> > > > thanks,
> > > > jirka
> >
> > SNIP

[PATCHv2,bpf-next,01/24] bpf: Add multi uprobe link

Checks

Commit Message

Comments

Patch