[bpf-next,1/2] bpf: Add bpf_task_pt_regs() helper

Message ID	6d269f13f2ff742e319a8c19112ef40f0b4c2f46.1629329560.git.dxu@dxuuu.xyz (mailing list archive)
State	Changes Requested
Delegated to:	BPF
Headers	Return-Path: <bpf-owner@kernel.org>
	From: Daniel Xu <dxu@dxuuu.xyz>
	To: bpf@vger.kernel.org, yhs@fb.com, andriin@fb.com
	Cc: Daniel Xu <dxu@dxuuu.xyz>, kernel-team@fb.com, linux-kernel@vger.kernel.org
	Subject: [PATCH bpf-next 1/2] bpf: Add bpf_task_pt_regs() helper
	Date: Wed, 18 Aug 2021 16:41:41 -0700
	Message-Id: <6d269f13f2ff742e319a8c19112ef40f0b4c2f46.1629329560.git.dxu@dxuuu.xyz>
	In-Reply-To: <cover.1629329560.git.dxu@dxuuu.xyz>
	References: <cover.1629329560.git.dxu@dxuuu.xyz>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Precedence: bulk
Series	Add bpf_task_pt_regs() helper
	[bpf-next,0/2] Add bpf_task_pt_regs() helper
	[bpf-next,1/2] bpf: Add bpf_task_pt_regs() helper
	[bpf-next,2/2] bpf: selftests: Add bpf_task_pt_regs() selftest

Context	Check	Description
netdev/cover_letter	success	Link
netdev/fixes_present	success	Link
netdev/patch_count	success	Link
netdev/tree_selection	success	Clearly marked for bpf-next
netdev/subject_prefix	success	Link
netdev/cc_maintainers	warning	14 maintainers not CCed: haoluo@google.com jackmanb@google.com songliubraving@fb.com andrii@kernel.org daniel@iogearbox.net kafai@fb.com mingo@redhat.com joe@cilium.io quentin@isovalent.com netdev@vger.kernel.org ast@kernel.org john.fastabend@gmail.com rostedt@goodmis.org kpsingh@kernel.org
netdev/source_inline	success	Was 0 now: 0
netdev/verify_signedoff	success	Link
netdev/module_param	success	Was 0 now: 0
netdev/build_32bit	success	Errors and warnings before: 11909 this patch: 11909
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/verify_fixes	success	Link
netdev/checkpatch	warning	CHECK: No space is necessary after a cast
netdev/build_allmodconfig_warn	success	Errors and warnings before: 11431 this patch: 11431
netdev/header_inline	success	Link

Commit Message

Daniel Xu Aug. 18, 2021, 11:41 p.m. UTC

The motivation behind this helper is to access userspace pt_regs in a
kprobe handler.

uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace
pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a
kprobe handler. The final case (kernelspace pt_regs in uprobe) is
pretty rare (usermode helper) so I think that can be solved later if
necessary.

More concretely, this helper is useful in doing BPF-based DWARF stack
unwinding. Currently the kernel can only do framepointer based stack
unwinds for userspace code. This is because the DWARF state machines are
too fragile to be computed in kernelspace [0]. The idea behind
DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace
stack (while in prog context) and send it up to userspace for unwinding
(probably with libunwind) [1]. This would effectively enable profiling
applications with -fomit-frame-pointer using kprobes and uprobes.

[0]: https://lkml.org/lkml/2012/2/10/356
[1]: https://github.com/danobi/bpf-dwarf-walk

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 include/uapi/linux/bpf.h       |  7 +++++++
 kernel/trace/bpf_trace.c       | 20 ++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  7 +++++++
 3 files changed, 34 insertions(+)
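
The data flow described in the commit message (fetch the current task, ask for its pt_regs, then copy a chunk of the user stack starting at the saved stack pointer) can be sketched roughly as follows. This is a userspace model, not a loadable BPF program: the two struct definitions, `user_mem`, `fake_current`, `STACK_CHUNK`, `struct stack_sample` and `handle_kprobe()` are all hypothetical, and the three `bpf_*` functions are local stubs that only imitate the shape of the real helpers. In an actual BPF program the types would come from vmlinux.h and the helpers from libbpf's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for kernel types (a real BPF program gets these from vmlinux.h). */
struct pt_regs { uint64_t ip; uint64_t sp; };
struct task_struct { struct pt_regs user_regs; };

/* Model-only state: fake user memory (addresses are offsets into it)
 * and a fake "current" task. */
static uint8_t user_mem[256];
static struct task_struct fake_current;

/* Userspace stubs that imitate the shape of the BPF helpers. */
static struct task_struct *bpf_get_current_task_btf(void)
{
	return &fake_current;
}

static long bpf_task_pt_regs(struct task_struct *task)
{
	return (long)(intptr_t)&task->user_regs;
}

static long bpf_probe_read_user(void *dst, uint32_t size, const void *unsafe_ptr)
{
	uintptr_t addr = (uintptr_t)unsafe_ptr;

	if (addr > sizeof(user_mem) || size > sizeof(user_mem) - addr) {
		memset(dst, 0, size);	/* zero the destination on failure */
		return -14;		/* -EFAULT */
	}
	memcpy(dst, user_mem + addr, size);
	return 0;
}

#define STACK_CHUNK 64

struct stack_sample {
	uint64_t sp;
	uint8_t stack[STACK_CHUNK];
};

/* Shape of a kprobe handler: ctx holds kernel-space registers, so the
 * user-space ones are fetched through the task instead. */
static int handle_kprobe(struct pt_regs *ctx, struct stack_sample *out)
{
	struct task_struct *task = bpf_get_current_task_btf();
	struct pt_regs *uregs = (struct pt_regs *)bpf_task_pt_regs(task);

	(void)ctx;	/* kernel registers are not needed for this sketch */
	assert(out != NULL);
	out->sp = uregs->sp;
	if (bpf_probe_read_user(out->stack, sizeof(out->stack),
				(const void *)(uintptr_t)uregs->sp))
		return -1;
	return 0;
}
```

In this model, a caller fills `user_mem`, sets `fake_current.user_regs.sp`, and calls `handle_kprobe()`; the sample then holds the saved stack pointer and the bytes that start there. A real program would presumably hand the sample to userspace for unwinding, which this sketch leaves out.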

Comments

Andrii Nakryiko Aug. 19, 2021, 8:27 p.m. UTC | #1

On Wed, Aug 18, 2021 at 4:42 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
>
> The motivation behind this helper is to access userspace pt_regs in a
> kprobe handler.
>
> uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace
> pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a
> kprobe handler. The final case (kernelspace pt_regs in uprobe) is
> pretty rare (usermode helper) so I think that can be solved later if
> necessary.
>
> More concretely, this helper is useful in doing BPF-based DWARF stack
> unwinding. Currently the kernel can only do framepointer based stack
> unwinds for userspace code. This is because the DWARF state machines are
> too fragile to be computed in kernelspace [0]. The idea behind
> DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace
> stack (while in prog context) and send it up to userspace for unwinding
> (probably with libunwind) [1]. This would effectively enable profiling
> applications with -fomit-frame-pointer using kprobes and uprobes.
>
> [0]: https://lkml.org/lkml/2012/2/10/356
> [1]: https://github.com/danobi/bpf-dwarf-walk
>
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---

Seems like a really useful thing. Few notes:

1. Given this is user pt_regs, should we call it bpf_get_user_pt_regs()?
2. Would it be safe to enable it for all types of programs, not just
kprobe/tp/raw_tp/perf? Why limit the list?
3. It seems like it's the sixth declaration of BTF_ID for task_struct,
maybe it's time to consolidate them?

>  include/uapi/linux/bpf.h       |  7 +++++++
>  kernel/trace/bpf_trace.c       | 20 ++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  7 +++++++
>  3 files changed, 34 insertions(+)

[...]

Daniel Xu Aug. 24, 2021, 1:38 a.m. UTC | #2

On Thu, Aug 19, 2021 at 01:27:16PM -0700, Andrii Nakryiko wrote:
> On Wed, Aug 18, 2021 at 4:42 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> >
> > The motivation behind this helper is to access userspace pt_regs in a
> > kprobe handler.
> >
> > uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace
> > pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a
> > kprobe handler. The final case (kernelspace pt_regs in uprobe) is
> > pretty rare (usermode helper) so I think that can be solved later if
> > necessary.
> >
> > More concretely, this helper is useful in doing BPF-based DWARF stack
> > unwinding. Currently the kernel can only do framepointer based stack
> > unwinds for userspace code. This is because the DWARF state machines are
> > too fragile to be computed in kernelspace [0]. The idea behind
> > DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace
> > stack (while in prog context) and send it up to userspace for unwinding
> > (probably with libunwind) [1]. This would effectively enable profiling
> > applications with -fomit-frame-pointer using kprobes and uprobes.
> >
> > [0]: https://lkml.org/lkml/2012/2/10/356
> > [1]: https://github.com/danobi/bpf-dwarf-walk
> >
> > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> > ---
> 
> Seems like a really useful thing. Few notes:
> 
> 1. Given this is user pt_regs, should we call it bpf_get_user_pt_regs()?

I'm not 100% sure, but it seems to me that task_pt_regs() works for
kernel threads too. I see in arch/x86/kernel/smpboot.c that
task_pt_regs() is being used on the idle thread (which I think is a
kernel thread).

> 2. Would it be safe to enable it for all types of programs, not just
> kprobe/tp/raw_tp/perf? Why limit the list?

Oh I didn't realize I put a limit on it. I'll look closer.

> 3. It seems like it's the sixth declaration of BTF_ID for task_struct,
> maybe it's time to consolidate them?

Ok, will consolidate.

[...]

Thanks,
Daniel
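
For readers following the naming question in this reply: the sketch below models how task_pt_regs() is computed, assuming the x86-64 layout in which the saved pt_regs sits at the very top of the task's kernel stack. Every name prefixed with `model_` or `MODEL_` is hypothetical, the 16 KiB stack size and zero padding are assumptions, and the two-field register struct is a stand-in for the real one. Other architectures may compute this differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for the model: 16 KiB kernel stack, no padding at the top. */
#define MODEL_THREAD_SIZE 16384u
#define MODEL_TOP_PADDING 0u

struct model_pt_regs { uint64_t ip; uint64_t sp; };	/* tiny stand-in */
struct model_task { void *stack; };			/* base of the kernel stack */

/* Pure address arithmetic: step to the top of the stack, then back by
 * one register frame. Nothing here inspects what kind of task it is. */
static struct model_pt_regs *model_task_pt_regs(const struct model_task *task)
{
	uintptr_t top = (uintptr_t)task->stack + MODEL_THREAD_SIZE - MODEL_TOP_PADDING;

	return (struct model_pt_regs *)top - 1;
}
```

Because the computation is only address arithmetic on the task's stack, it yields a pointer for any task that has a kernel stack, kernel threads included. Whether the bytes at that address hold meaningful user-space registers is a separate question that the arithmetic does not answer.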

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c4f7892edb2b..47427493206a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4871,6 +4871,12 @@  union bpf_attr {
  * 	Return
  *		Value specified by user at BPF link creation/attachment time
  *		or 0, if it was not specified.
+ *
+ * long bpf_task_pt_regs(struct task_struct *task)
+ *	Description
+ *		Get the struct pt_regs associated with **task**.
+ *	Return
+ *		A pointer to struct pt_regs.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5048,6 +5054,7 @@  union bpf_attr {
 	FN(timer_cancel),		\
 	FN(get_func_ip),		\
 	FN(get_attach_cookie),		\
+	FN(task_pt_regs),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index cbc73c08c4a4..5924bb5a1462 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -723,6 +723,24 @@  static const struct bpf_func_proto bpf_get_current_task_btf_proto = {
 	.ret_btf_id	= &bpf_get_current_btf_ids[0],
 };
 
+BPF_CALL_1(bpf_task_pt_regs, struct task_struct *, task)
+{
+	return (unsigned long) task_pt_regs(task);
+}
+
+BTF_ID_LIST(bpf_task_pt_regs_ids)
+BTF_ID(struct, task_struct)
+BTF_ID(struct, pt_regs)
+
+static const struct bpf_func_proto bpf_task_pt_regs_proto = {
+	.func		= bpf_task_pt_regs,
+	.gpl_only	= true,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id	= &bpf_task_pt_regs_ids[0],
+	.ret_type	= RET_PTR_TO_BTF_ID,
+	.ret_btf_id	= &bpf_task_pt_regs_ids[1],
+};
+
 BPF_CALL_2(bpf_current_task_under_cgroup, struct bpf_map *, map, u32, idx)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
@@ -1032,6 +1050,8 @@  bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_current_task_proto;
 	case BPF_FUNC_get_current_task_btf:
 		return &bpf_get_current_task_btf_proto;
+	case BPF_FUNC_task_pt_regs:
+		return &bpf_task_pt_regs_proto;
 	case BPF_FUNC_get_current_uid_gid:
 		return &bpf_get_current_uid_gid_proto;
 	case BPF_FUNC_get_current_comm:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c4f7892edb2b..47427493206a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4871,6 +4871,12 @@  union bpf_attr {
  * 	Return
  *		Value specified by user at BPF link creation/attachment time
  *		or 0, if it was not specified.
+ *
+ * long bpf_task_pt_regs(struct task_struct *task)
+ *	Description
+ *		Get the struct pt_regs associated with **task**.
+ *	Return
+ *		A pointer to struct pt_regs.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5048,6 +5054,7 @@  union bpf_attr {
 	FN(timer_cancel),		\
 	FN(get_func_ip),		\
 	FN(get_attach_cookie),		\
+	FN(task_pt_regs),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
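
The one-line `FN(task_pt_regs)` additions above rely on an X-macro: the mapper list is expanded once per use to generate the helper ID enum. Below is a minimal model of that pattern, with a shortened list and hypothetical `MODEL_`/`model_` names, so the numeric values do not match the kernel's real helper IDs. It illustrates why a new entry goes at the end of the list, just before the `/* */` terminator: existing IDs keep their values.

```c
#include <assert.h>
#include <string.h>

/* Shortened stand-in for the mapper list; new entries are appended last. */
#define MODEL_FUNC_MAPPER(FN)		\
	FN(unspec),			\
	FN(get_func_ip),		\
	FN(get_attach_cookie),		\
	FN(task_pt_regs),		\
	/* */

/* First expansion: one enum constant per list entry. */
#define MODEL_ENUM_FN(x) MODEL_FUNC_##x
enum model_func_id {
	MODEL_FUNC_MAPPER(MODEL_ENUM_FN)
	MODEL_FUNC_MAX_ID,
};

/* Second expansion of the same list: a matching table of names. */
#define MODEL_NAME_FN(x) #x
static const char *const model_func_names[] = {
	MODEL_FUNC_MAPPER(MODEL_NAME_FN)
};

/* Look up a helper name by ID; out-of-range IDs map to "unknown". */
static const char *model_func_name(int id)
{
	if (id < 0 || id >= MODEL_FUNC_MAX_ID)
		return "unknown";
	return model_func_names[id];
}
```

Appending at the end means `MODEL_FUNC_task_pt_regs` takes the next free value, while the entries before it are unaffected.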
