diff mbox series

[v3,bpf-next,2/4] libbpf: add auto-attach for uprobes based on section name

Message ID 1643645554-28723-3-git-send-email-alan.maguire@oracle.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series libbpf: name-based u[ret]probe attach | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR fail PR summary
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 10 of 10 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: Prefer kstrto<type> to single variable sscanf WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next fail VM_Test

Commit Message

Alan Maguire Jan. 31, 2022, 4:12 p.m. UTC
Now that u[ret]probes can use name-based specification, it makes
sense to add support for auto-attach based on SEC() definition.
The format proposed is

	SEC("u[ret]probe//path/to/prog:[raw_offset|[function_name[+offset]]")

For example, to trace malloc() in libc:

        SEC("uprobe//usr/lib64/libc.so.6:malloc")

Auto-attach is done for all tasks (pid -1).

Note that there is a backwards-compatibility issue here.  Consider a BPF
object consisting of a set of BPF programs, including a uprobe program.
Because uprobes did not previously support auto-attach, it's possible that
because the uprobe section name is not in auto-attachable form, overall
BPF skeleton attach would now fail due to the failure of the uprobe program
to auto-attach.  So we need to handle the case of auto-attach failure where
the form of the section name is not suitable for auto-attach without
a complete attach failure.  On surveying the code, bpf_program__attach()
already returns -ESRCH in cases where no auto-attach function is
supplied, so for consistency with that - and because that return value
is less likely to collide with actual attach failures than -EOPNOTSUPP -
it is used as the attach function return value signalling auto-attach
is not possible.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/lib/bpf/libbpf.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 74 insertions(+), 3 deletions(-)

Comments

Andrii Nakryiko Feb. 4, 2022, 7:22 p.m. UTC | #1
On Mon, Jan 31, 2022 at 8:13 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Now that u[ret]probes can use name-based specification, it makes
> sense to add support for auto-attach based on SEC() definition.
> The format proposed is
>
>         SEC("u[ret]probe//path/to/prog:[raw_offset|[function_name[+offset]]")
>
> For example, to trace malloc() in libc:
>
>         SEC("uprobe//usr/lib64/libc.so.6:malloc")

I assume that path to library can be relative path as well, right?

Also, should be look at trying to locate library in the system if it's
specified as "libc"? Or if the binary is "bash", for example. Just
bringing this up, because I think it came up before in the context of
one of libbpf-tools.

>
> Auto-attach is done for all tasks (pid -1).
>
> Note that there is a backwards-compatibility issue here.  Consider a BPF
> object consisting of a set of BPF programs, including a uprobe program.
> Because uprobes did not previously support auto-attach, it's possible that
> because the uprobe section name is not in auto-attachable form, overall
> BPF skeleton attach would now fail due to the failure of the uprobe program
> to auto-attach.  So we need to handle the case of auto-attach failure where
> the form of the section name is not suitable for auto-attach without
> a complete attach failure.  On surveying the code, bpf_program__attach()
> already returns -ESRCH in cases where no auto-attach function is
> supplied, so for consistency with that - and because that return value
> is less likely to collide with actual attach failures than -EOPNOTSUPP -
> it is used as the attach function return value signalling auto-attach
> is not possible.

I'm actually working on generalizing and extending this part of
libbpf's SEC() handling, I should post code today or early next week.
So we can base your code on those changes and we won't need to worry
about error code collisions.

>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  tools/lib/bpf/libbpf.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 74 insertions(+), 3 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index eb95629..e2b4415 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -8581,6 +8581,7 @@ int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log
>  }
>

[...]
Alan Maguire Feb. 23, 2022, 9:32 a.m. UTC | #2
On Fri, 4 Feb 2022, Andrii Nakryiko wrote:

> On Mon, Jan 31, 2022 at 8:13 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >
> > Now that u[ret]probes can use name-based specification, it makes
> > sense to add support for auto-attach based on SEC() definition.
> > The format proposed is
> >
> >         SEC("u[ret]probe//path/to/prog:[raw_offset|[function_name[+offset]]")
> >
> > For example, to trace malloc() in libc:
> >
> >         SEC("uprobe//usr/lib64/libc.so.6:malloc")
> 
> I assume that path to library can be relative path as well, right?
> 
> Also, should be look at trying to locate library in the system if it's
> specified as "libc"? Or if the binary is "bash", for example. Just
> bringing this up, because I think it came up before in the context of
> one of libbpf-tools.
>

This is a great suggestion for usability, but I'm trying to puzzle
out how to carry out the location search for cases where the path 
specified is not a relative or absolute path.

A few things we can can do - use search paths from PATH and
LD_LIBRARY_PATH, with an appended set of standard locations
such as /usr/bin, /usr/sbin for cases where those environment
variables are missing or incomplete.

However, when it comes to libraries, do we search in /usr/lib64 or 
/usr/lib? We could use whether the version of libbpf is 64-bit or not I 
suppose, but it's at least conceivable that the user might want to 
instrument a 32-bit library from a 64-bit libbpf.  Do you think that
approach is sufficient, or are there other things we should do? Thanks!

Alan
Andrii Nakryiko Feb. 24, 2022, 1:49 a.m. UTC | #3
On Wed, Feb 23, 2022 at 1:33 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Fri, 4 Feb 2022, Andrii Nakryiko wrote:
>
> > On Mon, Jan 31, 2022 at 8:13 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > >
> > > Now that u[ret]probes can use name-based specification, it makes
> > > sense to add support for auto-attach based on SEC() definition.
> > > The format proposed is
> > >
> > >         SEC("u[ret]probe//path/to/prog:[raw_offset|[function_name[+offset]]")
> > >
> > > For example, to trace malloc() in libc:
> > >
> > >         SEC("uprobe//usr/lib64/libc.so.6:malloc")
> >
> > I assume that path to library can be relative path as well, right?
> >
> > Also, should be look at trying to locate library in the system if it's
> > specified as "libc"? Or if the binary is "bash", for example. Just
> > bringing this up, because I think it came up before in the context of
> > one of libbpf-tools.
> >
>
> This is a great suggestion for usability, but I'm trying to puzzle
> out how to carry out the location search for cases where the path
> specified is not a relative or absolute path.
>
> A few things we can can do - use search paths from PATH and
> LD_LIBRARY_PATH, with an appended set of standard locations
> such as /usr/bin, /usr/sbin for cases where those environment
> variables are missing or incomplete.
>
> However, when it comes to libraries, do we search in /usr/lib64 or
> /usr/lib? We could use whether the version of libbpf is 64-bit or not I
> suppose, but it's at least conceivable that the user might want to
> instrument a 32-bit library from a 64-bit libbpf.  Do you think that
> approach is sufficient, or are there other things we should do? Thanks!

How does dynamic linker do this? When I specify "libbpf.so", is there
some documented algorithm for finding the library? If it's more or
less codified, we could implement something like that. If not, well,
too bad, we can do some useful heuristic, but ultimately there will be
cases that won't be supported. Worst case user will have to specify an
absolute path.

>
> Alan
Alan Maguire Feb. 24, 2022, 3:39 p.m. UTC | #4
On Thu, 24 Feb 2022, Andrii Nakryiko wrote:

> On Wed, Feb 23, 2022 at 1:33 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >
> > On Fri, 4 Feb 2022, Andrii Nakryiko wrote:
> >
> > > On Mon, Jan 31, 2022 at 8:13 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > > >
> > > > Now that u[ret]probes can use name-based specification, it makes
> > > > sense to add support for auto-attach based on SEC() definition.
> > > > The format proposed is
> > > >
> > > >         SEC("u[ret]probe//path/to/prog:[raw_offset|[function_name[+offset]]")
> > > >
> > > > For example, to trace malloc() in libc:
> > > >
> > > >         SEC("uprobe//usr/lib64/libc.so.6:malloc")
> > >
> > > I assume that path to library can be relative path as well, right?
> > >
> > > Also, should be look at trying to locate library in the system if it's
> > > specified as "libc"? Or if the binary is "bash", for example. Just
> > > bringing this up, because I think it came up before in the context of
> > > one of libbpf-tools.
> > >
> >
> > This is a great suggestion for usability, but I'm trying to puzzle
> > out how to carry out the location search for cases where the path
> > specified is not a relative or absolute path.
> >
> > A few things we can can do - use search paths from PATH and
> > LD_LIBRARY_PATH, with an appended set of standard locations
> > such as /usr/bin, /usr/sbin for cases where those environment
> > variables are missing or incomplete.
> >
> > However, when it comes to libraries, do we search in /usr/lib64 or
> > /usr/lib? We could use whether the version of libbpf is 64-bit or not I
> > suppose, but it's at least conceivable that the user might want to
> > instrument a 32-bit library from a 64-bit libbpf.  Do you think that
> > approach is sufficient, or are there other things we should do? Thanks!
> 
> How does dynamic linker do this? When I specify "libbpf.so", is there
> some documented algorithm for finding the library? If it's more or
> less codified, we could implement something like that. If not, well,
> too bad, we can do some useful heuristic, but ultimately there will be
> cases that won't be supported. Worst case user will have to specify an
> absolute path.
> 

There's a nice description in [1]:

       If filename is NULL, then the returned handle is for the main
       program.  If filename contains a slash ("/"), then it is
       interpreted as a (relative or absolute) pathname.  Otherwise, the
       dynamic linker searches for the object as follows (see ld.so(8)
       for further details):

       o   (ELF only) If the calling object (i.e., the shared library or
           executable from which dlopen() is called) contains a DT_RPATH
           tag, and does not contain a DT_RUNPATH tag, then the
           directories listed in the DT_RPATH tag are searched.

       o   If, at the time that the program was started, the environment
           variable LD_LIBRARY_PATH was defined to contain a colon-
           separated list of directories, then these are searched.  (As
           a security measure, this variable is ignored for set-user-ID
           and set-group-ID programs.)

       o   (ELF only) If the calling object contains a DT_RUNPATH tag,
           then the directories listed in that tag are searched.

       o   The cache file /etc/ld.so.cache (maintained by ldconfig(8))
           is checked to see whether it contains an entry for filename.

       o   The directories /lib and /usr/lib are searched (in that
           order).

Rather than re-inventing all of that however, we could use it
by dlopen()ing the file when it is a library (contains .so) and
is not a relative/absolute path, and then use dlinfo()'s
RTLD_DI_ORIGIN command to extract the path discovered, and then
dlclose() it. It would require linking libbpf with -ldl however.
What do you think?

Alan
 
[1] https://man7.org/linux/man-pages/man3/dlopen.3.html

> >
> > Alan
>
Andrii Nakryiko March 1, 2022, 1:45 a.m. UTC | #5
On Thu, Feb 24, 2022 at 7:40 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Thu, 24 Feb 2022, Andrii Nakryiko wrote:
>
> > On Wed, Feb 23, 2022 at 1:33 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > >
> > > On Fri, 4 Feb 2022, Andrii Nakryiko wrote:
> > >
> > > > On Mon, Jan 31, 2022 at 8:13 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > > > >
> > > > > Now that u[ret]probes can use name-based specification, it makes
> > > > > sense to add support for auto-attach based on SEC() definition.
> > > > > The format proposed is
> > > > >
> > > > >         SEC("u[ret]probe//path/to/prog:[raw_offset|[function_name[+offset]]")
> > > > >
> > > > > For example, to trace malloc() in libc:
> > > > >
> > > > >         SEC("uprobe//usr/lib64/libc.so.6:malloc")
> > > >
> > > > I assume that path to library can be relative path as well, right?
> > > >
> > > > Also, should be look at trying to locate library in the system if it's
> > > > specified as "libc"? Or if the binary is "bash", for example. Just
> > > > bringing this up, because I think it came up before in the context of
> > > > one of libbpf-tools.
> > > >
> > >
> > > This is a great suggestion for usability, but I'm trying to puzzle
> > > out how to carry out the location search for cases where the path
> > > specified is not a relative or absolute path.
> > >
> > > A few things we can can do - use search paths from PATH and
> > > LD_LIBRARY_PATH, with an appended set of standard locations
> > > such as /usr/bin, /usr/sbin for cases where those environment
> > > variables are missing or incomplete.
> > >
> > > However, when it comes to libraries, do we search in /usr/lib64 or
> > > /usr/lib? We could use whether the version of libbpf is 64-bit or not I
> > > suppose, but it's at least conceivable that the user might want to
> > > instrument a 32-bit library from a 64-bit libbpf.  Do you think that
> > > approach is sufficient, or are there other things we should do? Thanks!
> >
> > How does dynamic linker do this? When I specify "libbpf.so", is there
> > some documented algorithm for finding the library? If it's more or
> > less codified, we could implement something like that. If not, well,
> > too bad, we can do some useful heuristic, but ultimately there will be
> > cases that won't be supported. Worst case user will have to specify an
> > absolute path.
> >
>
> There's a nice description in [1]:
>
>        If filename is NULL, then the returned handle is for the main
>        program.  If filename contains a slash ("/"), then it is
>        interpreted as a (relative or absolute) pathname.  Otherwise, the
>        dynamic linker searches for the object as follows (see ld.so(8)
>        for further details):
>
>        o   (ELF only) If the calling object (i.e., the shared library or
>            executable from which dlopen() is called) contains a DT_RPATH
>            tag, and does not contain a DT_RUNPATH tag, then the
>            directories listed in the DT_RPATH tag are searched.
>
>        o   If, at the time that the program was started, the environment
>            variable LD_LIBRARY_PATH was defined to contain a colon-
>            separated list of directories, then these are searched.  (As
>            a security measure, this variable is ignored for set-user-ID
>            and set-group-ID programs.)
>
>        o   (ELF only) If the calling object contains a DT_RUNPATH tag,
>            then the directories listed in that tag are searched.
>
>        o   The cache file /etc/ld.so.cache (maintained by ldconfig(8))
>            is checked to see whether it contains an entry for filename.
>
>        o   The directories /lib and /usr/lib are searched (in that
>            order).
>
> Rather than re-inventing all of that however, we could use it
> by dlopen()ing the file when it is a library (contains .so) and
> is not a relative/absolute path, and then use dlinfo()'s
> RTLD_DI_ORIGIN command to extract the path discovered, and then
> dlclose() it. It would require linking libbpf with -ldl however.
> What do you think?

What do I think about dlopen()'ing some random library under root by
libbpf into the host process?.. I'd say that's a bad idea.

I'd probably start with just checking /lib, /usr/lib (and maybe those
32-bit and 64-bit specific ones, depending on host architecture; not
sure about all the details there, tbh). Or just say that the path to
the shared library has to be specified.

There is a similar problem with doing something like
SEC("uprobe/bash:readline"). Do we want to "search" for bash? I think
bpftrace is supporting that, but I haven't checked what it is doing.


>
> Alan
>
> [1] https://man7.org/linux/man-pages/man3/dlopen.3.html
>
> > >
> > > Alan
> >
diff mbox series

Patch

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index eb95629..e2b4415 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8581,6 +8581,7 @@  int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log
 }
 
 static struct bpf_link *attach_kprobe(const struct bpf_program *prog, long cookie);
+static struct bpf_link *attach_uprobe(const struct bpf_program *prog, long cookie);
 static struct bpf_link *attach_tp(const struct bpf_program *prog, long cookie);
 static struct bpf_link *attach_raw_tp(const struct bpf_program *prog, long cookie);
 static struct bpf_link *attach_trace(const struct bpf_program *prog, long cookie);
@@ -8592,9 +8593,9 @@  int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log
 	SEC_DEF("sk_reuseport/migrate",	SK_REUSEPORT, BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, SEC_ATTACHABLE | SEC_SLOPPY_PFX),
 	SEC_DEF("sk_reuseport",		SK_REUSEPORT, BPF_SK_REUSEPORT_SELECT, SEC_ATTACHABLE | SEC_SLOPPY_PFX),
 	SEC_DEF("kprobe/",		KPROBE,	0, SEC_NONE, attach_kprobe),
-	SEC_DEF("uprobe/",		KPROBE,	0, SEC_NONE),
+	SEC_DEF("uprobe/",		KPROBE, 0, SEC_NONE, attach_uprobe),
 	SEC_DEF("kretprobe/",		KPROBE, 0, SEC_NONE, attach_kprobe),
-	SEC_DEF("uretprobe/",		KPROBE, 0, SEC_NONE),
+	SEC_DEF("uretprobe/",		KPROBE, 0, SEC_NONE, attach_uprobe),
 	SEC_DEF("tc",			SCHED_CLS, 0, SEC_NONE),
 	SEC_DEF("classifier",		SCHED_CLS, 0, SEC_NONE | SEC_SLOPPY_PFX),
 	SEC_DEF("action",		SCHED_ACT, 0, SEC_NONE | SEC_SLOPPY_PFX),
@@ -10525,6 +10526,64 @@  static long elf_find_func_offset(const char *binary_path, const char *name)
 
 }
 
+/* Format of u[ret]probe section definition supporting auto-attach:
+ * u[ret]probe//path/to/prog:function[+offset]
+ *
+ * Many uprobe programs do not avail of auto-attach, so we need to handle the
+ * case where the format is uprobe/myfunc by returning NULL rather than an
+ * error.
+ */
+static struct bpf_link *attach_uprobe(const struct bpf_program *prog, long cookie)
+{
+	DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, opts);
+	char *func_name, binary_path[512];
+	unsigned int raw_offset;
+	char *func, *probe_name;
+	struct bpf_link *link;
+	size_t offset = 0;
+	int n, err;
+
+	opts.retprobe = str_has_pfx(prog->sec_name, "uretprobe/");
+	if (opts.retprobe)
+		probe_name = prog->sec_name + sizeof("uretprobe/") - 1;
+	else
+		probe_name = prog->sec_name + sizeof("uprobe/") - 1;
+
+	snprintf(binary_path, sizeof(binary_path), "%s", probe_name);
+	/* ':' should be prior to function+offset */
+	func_name = strrchr(binary_path, ':');
+	if (!func_name) {
+		pr_debug("section '%s' is old-style u[ret]probe/function, cannot auto-attach\n",
+			 prog->sec_name);
+		return libbpf_err_ptr(-ESRCH);
+	}
+	func_name[0] = '\0';
+	func_name++;
+	n = sscanf(func_name, "%m[a-zA-Z0-9_.]+%li", &func, &offset);
+	if (n < 1) {
+		err = -EINVAL;
+		pr_warn("uprobe name is invalid: %s\n", func_name);
+		return libbpf_err_ptr(err);
+	}
+	/* Is func a raw address? */
+	if (n == 1 && sscanf(func, "%x", &raw_offset) == 1) {
+		free(func);
+		func = NULL;
+		offset = (size_t)raw_offset;
+	}
+	if (opts.retprobe && offset != 0) {
+		free(func);
+		err = -EINVAL;
+		pr_warn("uretprobes do not support offset specification\n");
+		return libbpf_err_ptr(err);
+	}
+	opts.func_name = func;
+
+	link = bpf_program__attach_uprobe_opts(prog, -1, binary_path, offset, &opts);
+	free(func);
+	return link;
+}
+
 struct bpf_link *bpf_program__attach_uprobe(const struct bpf_program *prog,
 					    bool retprobe, pid_t pid,
 					    const char *binary_path,
@@ -12041,7 +12100,19 @@  int bpf_object__attach_skeleton(struct bpf_object_skeleton *s)
 
 		*link = bpf_program__attach(prog);
 		err = libbpf_get_error(*link);
-		if (err) {
+		switch (err) {
+		case 0:
+			break;
+		case -ESRCH:
+			/* -ESRCH is used as it is less likely to collide with other error
+			 * cases in program attach while being consistent with the value
+			 * returned by bpf_program__attach() where no auto-attach function
+			 * is provided.
+			 */
+			pr_warn("auto-attach not supported for program '%s'\n",
+				bpf_program__name(prog));
+			break;
+		default:
 			pr_warn("failed to auto-attach program '%s': %d\n",
 				bpf_program__name(prog), err);
 			return libbpf_err(err);