
[bpf-next,1/2] bpf: Add bpf_task_pt_regs() helper

Message ID: 6d269f13f2ff742e319a8c19112ef40f0b4c2f46.1629329560.git.dxu@dxuuu.xyz
State: Changes Requested
Delegated to: BPF
Series: Add bpf_task_pt_regs() helper

Checks

Context                         Check    Description
netdev/cover_letter             success
netdev/fixes_present            success
netdev/patch_count              success
netdev/tree_selection           success  Clearly marked for bpf-next
netdev/subject_prefix           success
netdev/cc_maintainers           warning  14 maintainers not CCed: haoluo@google.com jackmanb@google.com songliubraving@fb.com andrii@kernel.org daniel@iogearbox.net kafai@fb.com mingo@redhat.com joe@cilium.io quentin@isovalent.com netdev@vger.kernel.org ast@kernel.org john.fastabend@gmail.com rostedt@goodmis.org kpsingh@kernel.org
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              success  Errors and warnings before: 11909 this patch: 11909
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes             success
netdev/checkpatch               warning  CHECK: No space is necessary after a cast
netdev/build_allmodconfig_warn  success  Errors and warnings before: 11431 this patch: 11431
netdev/header_inline            success

Commit Message

Daniel Xu Aug. 18, 2021, 11:41 p.m. UTC
The motivation behind this helper is to access userspace pt_regs in a
kprobe handler.

uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace
pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a
kprobe handler. The final case (kernelspace pt_regs in uprobe) is
pretty rare (usermode helper) so I think that can be solved later if
necessary.
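One way to see the difference between the two register sets: on x86-64, a saved register frame records the privilege level of the interrupted code in the low two bits of CS, which is what the kernel's user_mode() tests. The sketch below is a userspace model, not kernel or BPF code; struct regs_sketch is a hypothetical stand-in for struct pt_regs, and 0x33/0x10 are assumed to be the usual x86-64 user and kernel CS selectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of x86-64 struct pt_regs (the real struct has 21 fields). */
struct regs_sketch {
	uint64_t ip;  /* instruction pointer of the interrupted code */
	uint64_t cs;  /* saved code segment selector */
	uint64_t sp;  /* stack pointer of the interrupted code */
};

/* Assumed x86-64 selectors: GDT entry 6 with RPL 3, GDT entry 2 with RPL 0. */
#define USER_CS_SKETCH   0x33
#define KERNEL_CS_SKETCH 0x10

/* Same test as x86-64 user_mode(): a nonzero RPL in CS means user mode.
 * A kprobe's ctx (kernel registers) fails it; the frame saved when a
 * task entered the kernel from userspace passes it. */
static bool regs_from_user_mode(const struct regs_sketch *regs)
{
	return (regs->cs & 3) != 0;
}
```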

More concretely, this helper is useful in doing BPF-based DWARF stack
unwinding. Currently the kernel can only do frame-pointer-based stack
unwinds for userspace code. This is because the DWARF state machines are
too fragile to be computed in kernelspace [0]. The idea behind
DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace
stack (while in prog context) and send it up to userspace for unwinding
(probably with libunwind) [1]. This would effectively enable profiling
applications with -fomit-frame-pointer using kprobes and uprobes.

[0]: https://lkml.org/lkml/2012/2/10/356
[1]: https://github.com/danobi/bpf-dwarf-walk
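The userspace half of that idea can be sketched as follows. The sample layout, STACK_CHUNK and all function names are hypothetical and not taken from [1]; the BPF side is assumed to fill ip/sp/bp from the registers returned by bpf_task_pt_regs() and stack[] from the user stack. The walk shown is a plain frame-pointer unwind standing in for the DWARF/CFI step that libunwind would perform: it shows why a register snapshot plus a memory accessor over the copied chunk is enough input, and why the frame-pointer method stops working for -fomit-frame-pointer code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sample: user registers plus a raw copy of the user stack. */
#define STACK_CHUNK 256

struct stack_sample {
	uint64_t ip;                /* user instruction pointer */
	uint64_t sp;                /* user stack pointer; stack[0] is the byte at this address */
	uint64_t bp;                /* user frame pointer register */
	uint32_t len;               /* number of valid bytes in stack[] */
	uint8_t stack[STACK_CHUNK]; /* copy of user memory [sp, sp + len) */
};

/* Read the 8-byte word at user address addr from the copied chunk.
 * Returns 0 on success, -1 if the word is not fully inside the copy. */
static int sample_read_u64(const struct stack_sample *s, uint64_t addr, uint64_t *out)
{
	assert(s->len <= STACK_CHUNK);
	if (addr < s->sp)
		return -1;
	uint64_t off = addr - s->sp;
	if (off > s->len || s->len - off < sizeof(uint64_t))
		return -1;
	memcpy(out, s->stack + off, sizeof(*out));
	return 0;
}

/* Frame-pointer walk over the snapshot, assuming the x86-64 layout where
 * [bp] holds the caller's saved bp and [bp + 8] the return address. With
 * -fomit-frame-pointer, bp is a general-purpose register and this breaks;
 * a DWARF unwinder would instead compute each frame's CFA from .eh_frame.
 * Returns the number of addresses written to ips (ips[0] is s->ip). */
static size_t fp_unwind(const struct stack_sample *s, uint64_t *ips, size_t max)
{
	size_t n = 0;
	uint64_t bp = s->bp;

	if (max == 0)
		return 0;
	ips[n++] = s->ip;
	while (n < max) {
		uint64_t next_bp, ret;

		if (sample_read_u64(s, bp, &next_bp) || sample_read_u64(s, bp + 8, &ret))
			break;  /* frame lies outside the copied chunk */
		if (ret == 0)
			break;  /* conventional end of the chain */
		ips[n++] = ret;
		if (next_bp <= bp)
			break;  /* stack grows down, so saved bp must increase */
		bp = next_bp;
	}
	return n;
}
```

In a real unwinder, something like sample_read_u64() would likely be plugged into the access_mem callback of libunwind's remote-unwinding accessors, with the sampled registers seeding the first frame.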

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 include/uapi/linux/bpf.h       |  7 +++++++
 kernel/trace/bpf_trace.c       | 20 ++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  7 +++++++
 3 files changed, 34 insertions(+)

Comments

Andrii Nakryiko Aug. 19, 2021, 8:27 p.m. UTC | #1
On Wed, Aug 18, 2021 at 4:42 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> [...]

Seems like a really useful thing. Few notes:

1. Given this is user pt_regs, should we call it bpf_get_user_pt_regs()?
2. Would it be safe to enable it for all types of programs, not just
kprobe/tp/raw_tp/perf? Why limit the list?
3. It seems like it's the sixth declaration of BTF_ID for task_struct,
maybe it's time to consolidate them?

[...]

Daniel Xu Aug. 24, 2021, 1:38 a.m. UTC | #2
On Thu, Aug 19, 2021 at 01:27:16PM -0700, Andrii Nakryiko wrote:
> On Wed, Aug 18, 2021 at 4:42 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> > [...]
> 
> Seems like a really useful thing. Few notes:
> 
> 1. Given this is user pt_regs, should we call it bpf_get_user_pt_regs()?

I'm not 100% sure, but it seems to me that task_pt_regs() works for
kernel threads too. I see in arch/x86/kernel/smpboot.c that
task_pt_regs() is being used on the idle thread (which I think is a
kernel thread).
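Daniel's reading can be sanity-checked with a simplified model of how x86 computes task_pt_regs(): as we understand arch/x86/include/asm/processor.h, it is plain pointer arithmetic from the base of the task's kernel stack. The constants and the struct/task stand-ins below are assumptions made for a userspace sketch, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace model of the x86-64 task_pt_regs() arithmetic.
 * Assumptions: 16 KiB kernel stacks (THREAD_SIZE without KASAN) and no
 * top-of-stack padding on x86-64. */
#define THREAD_SIZE_SKETCH (16 * 1024)
#define TOP_PADDING_SKETCH 0

/* Stand-in with the 168-byte size of x86-64 struct pt_regs (r15 ... ss). */
struct pt_regs_sketch {
	uint64_t slots[21];
};

/* Hypothetical task: only the base of its kernel stack matters here. */
struct task_sketch {
	void *stack;
};

/* The register frame is the last pt_regs-sized slot below the top of the
 * kernel stack. Nothing here depends on whether the task is a kernel
 * thread, so a pointer is produced for every task; whether its contents
 * describe user state is a separate question. */
static struct pt_regs_sketch *task_pt_regs_sketch(const struct task_sketch *task)
{
	uintptr_t top = (uintptr_t)task->stack + THREAD_SIZE_SKETCH - TOP_PADDING_SKETCH;

	return (struct pt_regs_sketch *)top - 1;
}
```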

> 2. Would it be safe to enable it for all types of programs, not just
> kprobe/tp/raw_tp/perf? Why limit the list?

Oh I didn't realize I put a limit on it. I'll look closer.

> 3. It seems like it's the sixth declaration of BTF_ID for task_struct,
> maybe it's time to consolidate them?

Ok, will consolidate.

[...]

Thanks,
Daniel

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c4f7892edb2b..47427493206a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4871,6 +4871,12 @@  union bpf_attr {
  * 	Return
  *		Value specified by user at BPF link creation/attachment time
  *		or 0, if it was not specified.
+ *
+ * long bpf_task_pt_regs(struct task_struct *task)
+ *	Description
+ *		Get the struct pt_regs associated with **task**.
+ *	Return
+ *		A pointer to struct pt_regs.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5048,6 +5054,7 @@  union bpf_attr {
 	FN(timer_cancel),		\
 	FN(get_func_ip),		\
 	FN(get_attach_cookie),		\
+	FN(task_pt_regs),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index cbc73c08c4a4..5924bb5a1462 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -723,6 +723,24 @@  static const struct bpf_func_proto bpf_get_current_task_btf_proto = {
 	.ret_btf_id	= &bpf_get_current_btf_ids[0],
 };
 
+BPF_CALL_1(bpf_task_pt_regs, struct task_struct *, task)
+{
+	return (unsigned long) task_pt_regs(task);
+}
+
+BTF_ID_LIST(bpf_task_pt_regs_ids)
+BTF_ID(struct, task_struct)
+BTF_ID(struct, pt_regs)
+
+static const struct bpf_func_proto bpf_task_pt_regs_proto = {
+	.func		= bpf_task_pt_regs,
+	.gpl_only	= true,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id	= &bpf_task_pt_regs_ids[0],
+	.ret_type	= RET_PTR_TO_BTF_ID,
+	.ret_btf_id	= &bpf_task_pt_regs_ids[1],
+};
+
 BPF_CALL_2(bpf_current_task_under_cgroup, struct bpf_map *, map, u32, idx)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
@@ -1032,6 +1050,8 @@  bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_current_task_proto;
 	case BPF_FUNC_get_current_task_btf:
 		return &bpf_get_current_task_btf_proto;
+	case BPF_FUNC_task_pt_regs:
+		return &bpf_task_pt_regs_proto;
 	case BPF_FUNC_get_current_uid_gid:
 		return &bpf_get_current_uid_gid_proto;
 	case BPF_FUNC_get_current_comm:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c4f7892edb2b..47427493206a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4871,6 +4871,12 @@  union bpf_attr {
  * 	Return
  *		Value specified by user at BPF link creation/attachment time
  *		or 0, if it was not specified.
+ *
+ * long bpf_task_pt_regs(struct task_struct *task)
+ *	Description
+ *		Get the struct pt_regs associated with **task**.
+ *	Return
+ *		A pointer to struct pt_regs.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5048,6 +5054,7 @@  union bpf_attr {
 	FN(timer_cancel),		\
 	FN(get_func_ip),		\
 	FN(get_attach_cookie),		\
+	FN(task_pt_regs),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper