Message ID | 6d269f13f2ff742e319a8c19112ef40f0b4c2f46.1629329560.git.dxu@dxuuu.xyz (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | Add bpf_task_pt_regs() helper |
On Wed, Aug 18, 2021 at 4:42 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
>
> The motivation behind this helper is to access userspace pt_regs in a
> kprobe handler.
>
> uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace
> pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a
> kprobe handler. The final case (kernelspace pt_regs in uprobe) is
> pretty rare (usermode helper) so I think that can be solved later if
> necessary.
>
> More concretely, this helper is useful in doing BPF-based DWARF stack
> unwinding. Currently the kernel can only do framepointer based stack
> unwinds for userspace code. This is because the DWARF state machines are
> too fragile to be computed in kernelspace [0]. The idea behind
> DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace
> stack (while in prog context) and send it up to userspace for unwinding
> (probably with libunwind) [1]. This would effectively enable profiling
> applications with -fomit-frame-pointer using kprobes and uprobes.
>
> [0]: https://lkml.org/lkml/2012/2/10/356
> [1]: https://github.com/danobi/bpf-dwarf-walk
>
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---

Seems like a really useful thing. Few notes:

1. Given this is user pt_regs, should we call it bpf_get_user_pt_regs()?
2. Would it be safe to enable it for all types of programs, not just
kprobe/tp/raw_tp/perf? Why limit the list?
3. It seems like it's the sixth declaration of BTF_ID for task_struct,
maybe it's time to consolidate them?

>  include/uapi/linux/bpf.h       |  7 +++++++
>  kernel/trace/bpf_trace.c       | 20 ++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  7 +++++++
>  3 files changed, 34 insertions(+)

[...]
On Thu, Aug 19, 2021 at 01:27:16PM -0700, Andrii Nakryiko wrote:
> On Wed, Aug 18, 2021 at 4:42 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> >
> > The motivation behind this helper is to access userspace pt_regs in a
> > kprobe handler.
> >
> > uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace
> > pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a
> > kprobe handler. The final case (kernelspace pt_regs in uprobe) is
> > pretty rare (usermode helper) so I think that can be solved later if
> > necessary.
> >
> > More concretely, this helper is useful in doing BPF-based DWARF stack
> > unwinding. Currently the kernel can only do framepointer based stack
> > unwinds for userspace code. This is because the DWARF state machines are
> > too fragile to be computed in kernelspace [0]. The idea behind
> > DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace
> > stack (while in prog context) and send it up to userspace for unwinding
> > (probably with libunwind) [1]. This would effectively enable profiling
> > applications with -fomit-frame-pointer using kprobes and uprobes.
> >
> > [0]: https://lkml.org/lkml/2012/2/10/356
> > [1]: https://github.com/danobi/bpf-dwarf-walk
> >
> > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> > ---
>
> Seems like a really useful thing. Few notes:
>
> 1. Given this is user pt_regs, should we call it bpf_get_user_pt_regs()?

I'm not 100% sure, but it seems to me that task_pt_regs() works for
kernel threads too. I see in arch/x86/kernel/smpboot.c that
task_pt_regs() is being used on the idle thread (which I think is a
kernel thread).

> 2. Would it be safe to enable it for all types of programs, not just
> kprobe/tp/raw_tp/perf? Why limit the list?

Oh I didn't realize I put a limit on it. I'll look closer.

> 3. It seems like it's the sixth declaration of BTF_ID for task_struct,
> maybe it's time to consolidate them?

Ok, will consolidate.

[...]

Thanks,
Daniel
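For readers wondering why task_pt_regs() would "work" on a kernel thread at all: on x86-64 the macro is, as far as I know, pure pointer arithmetic on the task's kernel stack, so it yields a pointer for any task that has a stack. The plain-C model below is only a sketch of that arithmetic. The struct names, the 16 KiB stack size and the zero padding are assumptions made for illustration, not the kernel's definitions, and a valid pointer says nothing about whether the slot holds meaningful userspace registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the real ones depend on arch and config. */
#define MODEL_THREAD_SIZE (16 * 1024) /* assumed kernel stack size */
#define MODEL_TOP_PADDING 0           /* assumed padding at stack top */

/* Hypothetical, trimmed-down stand-in for struct pt_regs. */
struct model_pt_regs {
	uint64_t ip;
	uint64_t sp;
};

/* Hypothetical stand-in for struct task_struct: only the stack base. */
struct model_task {
	void *stack; /* lowest address of the task's kernel stack */
};

/*
 * Model of the x86-64 style task_pt_regs(): the saved register frame
 * occupies the last sizeof(struct pt_regs) bytes below the stack top.
 * Nothing here checks whether the task ever entered from userspace.
 */
static struct model_pt_regs *model_task_pt_regs(const struct model_task *task)
{
	uintptr_t top;

	assert(task != NULL && task->stack != NULL);
	top = (uintptr_t)task->stack + MODEL_THREAD_SIZE - MODEL_TOP_PADDING;
	return (struct model_pt_regs *)top - 1;
}
```

If that reading is right, the naming question in note 1 is less about whether the helper returns something for kernel threads and more about what the returned slot means for them.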
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c4f7892edb2b..47427493206a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4871,6 +4871,12 @@ union bpf_attr {
  *	Return
  *		Value specified by user at BPF link creation/attachment time
  *		or 0, if it was not specified.
+ *
+ * long bpf_task_pt_regs(struct task_struct *task)
+ *	Description
+ *		Get the struct pt_regs associated with **task**.
+ *	Return
+ *		A pointer to struct pt_regs.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5048,6 +5054,7 @@ union bpf_attr {
 	FN(timer_cancel),		\
 	FN(get_func_ip),		\
 	FN(get_attach_cookie),		\
+	FN(task_pt_regs),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index cbc73c08c4a4..5924bb5a1462 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -723,6 +723,24 @@ static const struct bpf_func_proto bpf_get_current_task_btf_proto = {
 	.ret_btf_id	= &bpf_get_current_btf_ids[0],
 };
 
+BPF_CALL_1(bpf_task_pt_regs, struct task_struct *, task)
+{
+	return (unsigned long) task_pt_regs(task);
+}
+
+BTF_ID_LIST(bpf_task_pt_regs_ids)
+BTF_ID(struct, task_struct)
+BTF_ID(struct, pt_regs)
+
+static const struct bpf_func_proto bpf_task_pt_regs_proto = {
+	.func		= bpf_task_pt_regs,
+	.gpl_only	= true,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id	= &bpf_task_pt_regs_ids[0],
+	.ret_type	= RET_PTR_TO_BTF_ID,
+	.ret_btf_id	= &bpf_task_pt_regs_ids[1],
+};
+
 BPF_CALL_2(bpf_current_task_under_cgroup, struct bpf_map *, map, u32, idx)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
@@ -1032,6 +1050,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_current_task_proto;
 	case BPF_FUNC_get_current_task_btf:
 		return &bpf_get_current_task_btf_proto;
+	case BPF_FUNC_task_pt_regs:
+		return &bpf_task_pt_regs_proto;
 	case BPF_FUNC_get_current_uid_gid:
 		return &bpf_get_current_uid_gid_proto;
 	case BPF_FUNC_get_current_comm:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c4f7892edb2b..47427493206a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4871,6 +4871,12 @@ union bpf_attr {
  *	Return
  *		Value specified by user at BPF link creation/attachment time
  *		or 0, if it was not specified.
+ *
+ * long bpf_task_pt_regs(struct task_struct *task)
+ *	Description
+ *		Get the struct pt_regs associated with **task**.
+ *	Return
+ *		A pointer to struct pt_regs.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5048,6 +5054,7 @@ union bpf_attr {
 	FN(timer_cancel),		\
 	FN(get_func_ip),		\
 	FN(get_attach_cookie),		\
+	FN(task_pt_regs),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
The motivation behind this helper is to access userspace pt_regs in a
kprobe handler.

uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace
pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a
kprobe handler. The final case (kernelspace pt_regs in uprobe) is
pretty rare (usermode helper) so I think that can be solved later if
necessary.

More concretely, this helper is useful in doing BPF-based DWARF stack
unwinding. Currently the kernel can only do framepointer based stack
unwinds for userspace code. This is because the DWARF state machines are
too fragile to be computed in kernelspace [0]. The idea behind
DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace
stack (while in prog context) and send it up to userspace for unwinding
(probably with libunwind) [1]. This would effectively enable profiling
applications with -fomit-frame-pointer using kprobes and uprobes.

[0]: https://lkml.org/lkml/2012/2/10/356
[1]: https://github.com/danobi/bpf-dwarf-walk

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 include/uapi/linux/bpf.h       |  7 +++++++
 kernel/trace/bpf_trace.c       | 20 ++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  7 +++++++
 3 files changed, 34 insertions(+)
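To make the "copy a chunk of the userspace stack and send it up" step in the commit message concrete, here is a small plain-C model of the bounded copy. It is not BPF code and is not taken from the patch or from [1]: the struct names, the 256-byte cap and the flat stack mapping are all hypothetical. In a real BPF program the ip/sp values would presumably come from the pt_regs returned by bpf_task_pt_regs(), and the copy would go through a helper such as bpf_probe_read_user() rather than memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SNAPSHOT_MAX 256 /* hypothetical cap on bytes copied per sample */

/* Hypothetical record handed to the user-space unwinder. */
struct stack_sample {
	uint64_t ip;                /* instruction pointer at sample time */
	uint64_t sp;                /* stack pointer at sample time */
	uint32_t len;               /* number of valid bytes in data[] */
	uint8_t data[SNAPSHOT_MAX]; /* raw stack bytes starting at sp */
};

/* Hypothetical model of the user stack mapping, covering [lo, hi). */
struct user_stack {
	uint64_t lo;
	uint64_t hi;
	const uint8_t *bytes; /* backing memory; bytes[0] is address lo */
};

/*
 * Copy up to SNAPSHOT_MAX bytes starting at sp. The stack grows down,
 * so the frames in use lie between sp and the top of the mapping.
 * Returns 0 on success, -1 if sp does not point into the mapping.
 */
static int snapshot_user_stack(const struct user_stack *us, uint64_t ip,
			       uint64_t sp, struct stack_sample *out)
{
	uint64_t avail;

	assert(us != NULL && out != NULL);
	if (sp < us->lo || sp >= us->hi)
		return -1;

	avail = us->hi - sp;
	out->ip = ip;
	out->sp = sp;
	out->len = (uint32_t)(avail < SNAPSHOT_MAX ? avail : SNAPSHOT_MAX);
	memcpy(out->data, us->bytes + (sp - us->lo), out->len);
	return 0;
}
```

The fixed cap mirrors the constraint that a BPF program needs a bound on the copy size known at verification time; the user-space side would then feed ip, sp and data[] to an unwinder such as libunwind.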