From patchwork Tue Jan 11 15:00:30 2022
X-Patchwork-Submitter: "Masami Hiramatsu (Google)"
X-Patchwork-Id: 12709922
From: Masami Hiramatsu
To: Jiri Olsa
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Masami Hiramatsu,
 netdev@vger.kernel.org, bpf@vger.kernel.org, lkml, Martin KaFai Lau,
 Song Liu, Yonghong Song, John Fastabend, KP Singh, Steven Rostedt,
 "Naveen N. Rao", Anil S Keshavamurthy, "David S. Miller"
Subject: [RFC PATCH 1/6] fprobe: Add ftrace based probe APIs
Date: Wed, 12 Jan 2022 00:00:30 +0900
Message-Id: <164191322984.806991.3666707512798363619.stgit@devnote2>
In-Reply-To: <164191321766.806991.7930388561276940676.stgit@devnote2>
References: <164191321766.806991.7930388561276940676.stgit@devnote2>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-State: RFC

The fprobe is a wrapper API for the ftrace function tracer. Unlike kprobes,
this probe only supports function entry, but a single fprobe can probe
multiple functions. The usage is almost the same as with kprobes: the user
specifies the target function names via fprobe::entries[].sym, the number of
entries via fprobe::nentry, and the user handler via fprobe::entry_handler.

	struct fprobe fp = { 0 };
	struct fprobe_entry targets[] = {
		{.sym = "func1"},
		{.sym = "func2"},
		{.sym = "func3"},
	};

	fp.entry_handler = user_handler;
	fp.nentry = ARRAY_SIZE(targets);
	fp.entries = targets;

	ret = register_fprobe(&fp);

Note that fp::entries will be sorted by the converted function address.
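For reference, the handler side of this API follows the fprobe::entry_handler
prototype declared in include/linux/fprobes.h in this patch; a minimal
illustrative handler (the handler body and its pr_info output are made up
here, not part of the patch) can recover the per-entry data with
fprobe_find_entry():

	static void user_handler(struct fprobe *fp, unsigned long ip,
				 struct pt_regs *regs)
	{
		/* 'ip' is the probed function address; look up its entry to
		 * reach the symbol name and optional per-entry data. */
		struct fprobe_entry *ent = fprobe_find_entry(fp, ip);

		if (ent)
			pr_info("hit %s (data=%p)\n", ent->sym, ent->data);
	}

Since the entries are sorted at registration time, fprobe_find_entry() can
locate the hit entry with a binary search, so the per-hit lookup stays cheap
even when one fprobe covers many functions.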
Signed-off-by: Masami Hiramatsu --- include/linux/fprobes.h | 71 +++++++++++++++++++++++++ kernel/trace/Kconfig | 10 ++++ kernel/trace/Makefile | 1 kernel/trace/fprobes.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 214 insertions(+) create mode 100644 include/linux/fprobes.h create mode 100644 kernel/trace/fprobes.c diff --git a/include/linux/fprobes.h b/include/linux/fprobes.h new file mode 100644 index 000000000000..fa85a2fc3ad1 --- /dev/null +++ b/include/linux/fprobes.h @@ -0,0 +1,71 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Simple ftrace probe wrapper */ +#ifndef _LINUX_FPROBES_H +#define _LINUX_FPROBES_H + +#include +#include + +/* + * fprobe_entry - function entry for fprobe + * @sym: The symbol name of the function. + * @addr: The address of @sym. + * @data: per-entry data + * + * User must specify either @sym or @addr (not both). @data is optional. + */ +struct fprobe_entry { + const char *sym; + unsigned long addr; + void *data; +}; + +struct fprobe { + struct fprobe_entry *entries; + unsigned int nentry; + + struct ftrace_ops ftrace; + unsigned long nmissed; + unsigned int flags; + void (*entry_handler) (struct fprobe *, unsigned long, struct pt_regs *); +}; + +#define FPROBE_FL_DISABLED 1 + +static inline bool fprobe_disabled(struct fprobe *fp) +{ + return (fp) ? fp->flags & FPROBE_FL_DISABLED : false; +} + +#ifdef CONFIG_FPROBES +int register_fprobe(struct fprobe *fp); +int unregister_fprobe(struct fprobe *fp); +struct fprobe_entry *fprobe_find_entry(struct fprobe *fp, unsigned long addr); +#else +static inline int register_fprobe(struct fprobe *fp) +{ + return -ENOTSUPP; +} +static inline int unregister_fprobe(struct fprobe *fp) +{ + return -ENOTSUPP; +} +struct fprobe_entry *fprobe_find_entry(struct fprobe *fp, unsigned long addr) +{ + return NULL; +} +#endif + +static inline void disable_fprobe(struct fprobe *fp) +{ + if (fp) + fp->flags |= FPROBE_FL_DISABLED; +} + +static inline void enable_fprobe(struct fprobe *fp) +{ + if (fp) + fp->flags &= ~FPROBE_FL_DISABLED; +} + +#endif diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 420ff4bc67fd..45a3618a20a7 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -223,6 +223,16 @@ config DYNAMIC_FTRACE_WITH_ARGS depends on DYNAMIC_FTRACE depends on HAVE_DYNAMIC_FTRACE_WITH_ARGS +config FPROBES + bool "Kernel Function Probe (fprobe)" + depends on FUNCTION_TRACER + depends on DYNAMIC_FTRACE_WITH_REGS + default n + help + This option enables kernel function probe feature, which is + similar to kprobes, but probes only for kernel function entries + and it can probe multiple functions by one fprobe. 
+ config FUNCTION_PROFILER bool "Kernel function profiler" depends on FUNCTION_TRACER diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile index bedc5caceec7..47a37a3bb974 100644 --- a/kernel/trace/Makefile +++ b/kernel/trace/Makefile @@ -97,6 +97,7 @@ obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o +obj-$(CONFIG_FPROBES) += fprobes.o obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o diff --git a/kernel/trace/fprobes.c b/kernel/trace/fprobes.c new file mode 100644 index 000000000000..0a609093d48c --- /dev/null +++ b/kernel/trace/fprobes.c @@ -0,0 +1,132 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define pr_fmt(fmt) "fprobes: " fmt + +#include +#include +#include +#include +#include + +static void fprobe_handler(unsigned long ip, unsigned long parent_ip, + struct ftrace_ops *ops, struct ftrace_regs *fregs) +{ + struct fprobe *fp; + int bit; + + fp = container_of(ops, struct fprobe, ftrace); + if (fprobe_disabled(fp)) + return; + + bit = ftrace_test_recursion_trylock(ip, parent_ip); + if (bit < 0) { + fp->nmissed++; + return; + } + + if (fp->entry_handler) + fp->entry_handler(fp, ip, ftrace_get_regs(fregs)); + + ftrace_test_recursion_unlock(bit); +} +NOKPROBE_SYMBOL(fprobe_handler); + +static int convert_func_addresses(struct fprobe *fp) +{ + unsigned int i; + struct fprobe_entry *ent = fp->entries; + + for (i = 0; i < fp->nentry; i++) { + if ((ent[i].sym && ent[i].addr) || + (!ent[i].sym && !ent[i].addr)) + return -EINVAL; + + if (ent[i].addr) + continue; + + ent[i].addr = kallsyms_lookup_name(ent[i].sym); + if (!ent[i].addr) + return -ENOENT; + } + + return 0; +} + +/* Since the entry list is sorted, we can search it by bisect */ +struct fprobe_entry *fprobe_find_entry(struct fprobe *fp, unsigned long addr) +{ + int d, n; + + d = n = fp->nentry / 2; + + while (fp->entries[n].addr != addr) { + d /= 2; + if (d == 0) + return NULL; + if (fp->entries[n].addr < addr) + n += d; + else + n -= d; + } + + return fp->entries + n; +} +EXPORT_SYMBOL_GPL(fprobe_find_entry); + +static int fprobe_comp_func(const void *a, const void *b) +{ + return ((struct fprobe_entry *)a)->addr - ((struct fprobe_entry *)b)->addr; +} + +/** + * register_fprobe - Register fprobe to ftrace + * @fp: A fprobe data structure to be registered. + * + * This expects the user set @fp::entry_handler, @fp::entries and @fp::nentry. + * For each entry of @fp::entries[], user must set 'addr' or 'sym'. + * Note that you do not set both of 'addr' and 'sym' of the entry. + */ +int register_fprobe(struct fprobe *fp) +{ + unsigned int i; + int ret; + + if (!fp || !fp->nentry || !fp->entries) + return -EINVAL; + + ret = convert_func_addresses(fp); + if (ret < 0) + return ret; + /* + * Sort the addresses so that the handler can find corresponding user data + * immediately. + */ + sort(fp->entries, fp->nentry, sizeof(*fp->entries), + fprobe_comp_func, NULL); + + fp->nmissed = 0; + fp->ftrace.func = fprobe_handler; + fp->ftrace.flags = FTRACE_OPS_FL_SAVE_REGS; + + for (i = 0; i < fp->nentry; i++) { + ret = ftrace_set_filter_ip(&fp->ftrace, fp->entries[i].addr, 0, 0); + if (ret < 0) + return ret; + } + + return register_ftrace_function(&fp->ftrace); +} +EXPORT_SYMBOL_GPL(register_fprobe); + +/** + * unregister_fprobe - Unregister fprobe from ftrace + * @fp: A fprobe data structure to be unregistered. 
+ */
+int unregister_fprobe(struct fprobe *fp)
+{
+	if (!fp || !fp->nentry || !fp->entries)
+		return -EINVAL;
+
+	return unregister_ftrace_function(&fp->ftrace);
+}
+EXPORT_SYMBOL_GPL(unregister_fprobe);

From patchwork Tue Jan 11 15:00:41 2022
X-Patchwork-Submitter: "Masami Hiramatsu (Google)"
X-Patchwork-Id: 12709923
From: Masami Hiramatsu
To: Jiri Olsa
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Masami Hiramatsu,
 netdev@vger.kernel.org, bpf@vger.kernel.org, lkml, Martin KaFai Lau,
 Song Liu, Yonghong Song, John Fastabend, KP Singh, Steven Rostedt,
 "Naveen N. Rao", Anil S Keshavamurthy, "David S. Miller"
Subject: [RFC PATCH 2/6] rethook: Add a generic return hook
Date: Wed, 12 Jan 2022 00:00:41 +0900
Message-Id: <164191324119.806991.11671123002722404207.stgit@devnote2>
In-Reply-To: <164191321766.806991.7930388561276940676.stgit@devnote2>
References: <164191321766.806991.7930388561276940676.stgit@devnote2>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-State: RFC

Add a return hook framework which hooks a function's return. Most of the
ideas came from kretprobe, but this is independent of kretprobe.

Note that this is expected to be used together with a function-entry hooking
feature such as ftrace, fprobe, or kprobes. Eventually this will replace
kretprobe (i.e. kprobe + rethook = kretprobe), but at the moment it is just
an additional hook.
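As a rough sketch of the intended call flow (only the rethook_* calls are the
APIs added by this patch; everything else, including the names my_ret_handler,
my_setup, my_entry_hook, my_data and pool_size, is hypothetical caller-side
code): an entry-hooking user allocates a rethook, pre-fills its node pool, and
on each hit pulls a node and diverts the current return address.

	static void my_ret_handler(struct rethook_node *node, void *data,
				   struct pt_regs *regs)
	{
		/* Called when the hooked function returns, from the arch
		 * trampoline via rethook_trampoline_handler(). */
	}

	static struct rethook *rh;

	static int my_setup(void *my_data, int pool_size)
	{
		struct rethook_node *node;
		int i;

		rh = rethook_alloc(my_data, my_ret_handler);
		if (!rh)
			return -ENOMEM;

		/* Pre-fill the pool; nodes are usually embedded in a larger
		 * per-instance struct. */
		for (i = 0; i < pool_size; i++) {
			node = kzalloc(sizeof(*node), GFP_KERNEL);
			if (node)
				rethook_add_node(rh, node);
		}
		return 0;
	}

	/* in the function-entry hook (ftrace/fprobe/kprobe handler) */
	static void my_entry_hook(struct pt_regs *regs)
	{
		struct rethook_node *node = rethook_try_get(rh);

		if (node)	/* NULL if the pool is empty or rh is dying */
			rethook_hook_current(node, regs);
	}

	/* teardown: in-flight nodes are recycled or freed via RCU */
	static void my_teardown(void)
	{
		rethook_free(rh);
	}

In this series, fprobe (patch 4/6) is the first user of exactly this pattern,
embedding struct rethook_node in its own struct fprobe_rethook_node.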
Signed-off-by: Masami Hiramatsu --- include/linux/rethook.h | 74 +++++++++++++++ include/linux/sched.h | 3 + kernel/exit.c | 2 kernel/fork.c | 3 + kernel/trace/Kconfig | 11 ++ kernel/trace/Makefile | 1 kernel/trace/rethook.c | 226 +++++++++++++++++++++++++++++++++++++++++++++++ 7 files changed, 320 insertions(+) create mode 100644 include/linux/rethook.h create mode 100644 kernel/trace/rethook.c diff --git a/include/linux/rethook.h b/include/linux/rethook.h new file mode 100644 index 000000000000..2622bcd5213a --- /dev/null +++ b/include/linux/rethook.h @@ -0,0 +1,74 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Return hooking with list-based shadow stack. + */ +#ifndef _LINUX_RETHOOK_H +#define _LINUX_RETHOOK_H + +#include +#include +#include +#include +#include + +struct rethook_node; + +typedef void (*rethook_handler_t) (struct rethook_node *, void *, struct pt_regs *); + +struct rethook { + void *data; + rethook_handler_t handler; + struct freelist_head pool; + refcount_t ref; + struct rcu_head rcu; +}; + +struct rethook_node { + union { + struct freelist_node freelist; + struct rcu_head rcu; + }; + struct llist_node llist; + struct rethook *rethook; + unsigned long ret_addr; + unsigned long frame; +}; + +int rethook_node_init(struct rethook_node *node); + +struct rethook *rethook_alloc(void *data, rethook_handler_t handler); +void rethook_free(struct rethook *rh); +void rethook_add_node(struct rethook *rh, struct rethook_node *node); + +struct rethook_node *rethook_try_get(struct rethook *rh); +void rethook_node_recycle(struct rethook_node *node); +void rethook_hook_current(struct rethook_node *node, struct pt_regs *regs); + +unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame, + struct llist_node **cur); + +/* Arch dependent code must implement this and trampoline code */ +void arch_rethook_prepare(struct rethook_node *node, struct pt_regs *regs); +void arch_rethook_trampoline(void); + +static inline bool is_rethook_trampoline(unsigned long addr) +{ + return addr == (unsigned long)arch_rethook_trampoline; +} + +/* If the architecture needs a fixup the return address, implement it. 
*/ +void arch_rethook_fixup_return(struct pt_regs *regs, + unsigned long correct_ret_addr); + +/* Generic trampoline handler, arch code must prepare asm stub */ +unsigned long rethook_trampoline_handler(struct pt_regs *regs, + unsigned long frame); + +#ifdef CONFIG_RETHOOK +void rethook_flush_task(struct task_struct *tk); +#else +#define rethook_flush_task(tsk) do { } while (0) +#endif + +#endif + diff --git a/include/linux/sched.h b/include/linux/sched.h index 78c351e35fec..2bfabf5355b7 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1473,6 +1473,9 @@ struct task_struct { #ifdef CONFIG_KRETPROBES struct llist_head kretprobe_instances; #endif +#ifdef CONFIG_RETHOOK + struct llist_head rethooks; +#endif #ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH /* diff --git a/kernel/exit.c b/kernel/exit.c index f702a6a63686..a39a321c1f37 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -64,6 +64,7 @@ #include #include #include +#include #include #include @@ -169,6 +170,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp) struct task_struct *tsk = container_of(rhp, struct task_struct, rcu); kprobe_flush_task(tsk); + rethook_flush_task(tsk); perf_event_delayed_put(tsk); trace_sched_process_free(tsk); put_task_struct(tsk); diff --git a/kernel/fork.c b/kernel/fork.c index 3244cc56b697..ffae38be64c4 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2282,6 +2282,9 @@ static __latent_entropy struct task_struct *copy_process( #ifdef CONFIG_KRETPROBES p->kretprobe_instances.first = NULL; #endif +#ifdef CONFIG_RETHOOK + p->rethooks.first = NULL; +#endif /* * Ensure that the cgroup subsystem policies allow the new process to be diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 45a3618a20a7..9328724258dc 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -10,6 +10,17 @@ config USER_STACKTRACE_SUPPORT config NOP_TRACER bool +config HAVE_RETHOOK + bool + +config RETHOOK + bool + depends on HAVE_RETHOOK + help + Enable generic return hooking feature. This is an internal + API, which will be used by other function-entry hooking + feature like fprobe and kprobes. 
+ config HAVE_FUNCTION_TRACER bool help diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile index 47a37a3bb974..c68fdacbf9ef 100644 --- a/kernel/trace/Makefile +++ b/kernel/trace/Makefile @@ -98,6 +98,7 @@ obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o obj-$(CONFIG_FPROBES) += fprobes.o +obj-$(CONFIG_RETHOOK) += rethook.o obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c new file mode 100644 index 000000000000..80c0584e8497 --- /dev/null +++ b/kernel/trace/rethook.c @@ -0,0 +1,226 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define pr_fmt(fmt) "rethook: " fmt + +#include +#include +#include +#include +#include +#include +#include + +/* Return hook list (shadow stack by list) */ + +void rethook_flush_task(struct task_struct *tk) +{ + struct rethook_node *rhn; + struct llist_node *node; + + preempt_disable(); + + node = __llist_del_all(&tk->rethooks); + while (node) { + rhn = container_of(node, struct rethook_node, llist); + node = node->next; + rethook_node_recycle(rhn); + } + + preempt_enable(); +} + +static void rethook_free_rcu(struct rcu_head *head) +{ + struct rethook *rh = container_of(head, struct rethook, rcu); + struct rethook_node *rhn; + struct freelist_node *node; + int count = 1; + + node = rh->pool.head; + while (node) { + rhn = container_of(node, struct rethook_node, freelist); + node = node->next; + kfree(rhn); + count++; + } + + /* The rh->ref is the number of pooled node + 1 */ + if (refcount_sub_and_test(count, &rh->ref)) + kfree(rh); +} + +void rethook_free(struct rethook *rh) +{ + rh->handler = NULL; + rh->data = NULL; + + call_rcu(&rh->rcu, rethook_free_rcu); +} + +/* + * @handler must not NULL. @handler == NULL means this rethook is + * going to be freed. + */ +struct rethook *rethook_alloc(void *data, rethook_handler_t handler) +{ + struct rethook *rh = kzalloc(sizeof(struct rethook), GFP_KERNEL); + + if (!rh || !handler) + return NULL; + + rh->data = data; + rh->handler = handler; + rh->pool.head = NULL; + refcount_set(&rh->ref, 1); + + return rh; +} + +void rethook_add_node(struct rethook *rh, struct rethook_node *node) +{ + node->rethook = rh; + freelist_add(&node->freelist, &rh->pool); + refcount_inc(&rh->ref); +} + +static void free_rethook_node_rcu(struct rcu_head *head) +{ + struct rethook_node *node = container_of(head, struct rethook_node, rcu); + + if (refcount_dec_and_test(&node->rethook->ref)) + kfree(node->rethook); + kfree(node); +} + +void rethook_node_recycle(struct rethook_node *node) +{ + if (likely(READ_ONCE(node->rethook->handler))) + freelist_add(&node->freelist, &node->rethook->pool); + else + call_rcu(&node->rcu, free_rethook_node_rcu); +} + +struct rethook_node *rethook_try_get(struct rethook *rh) +{ + struct freelist_node *fn; + + /* Check whether @rh is going to be freed. */ + if (unlikely(!READ_ONCE(rh->handler))) + return NULL; + + fn = freelist_try_get(&rh->pool); + if (!fn) + return NULL; + + return container_of(fn, struct rethook_node, freelist); +} + +void rethook_hook_current(struct rethook_node *node, struct pt_regs *regs) +{ + arch_rethook_prepare(node, regs); + __llist_add(&node->llist, ¤t->rethooks); +} + +/* This assumes the 'tsk' is the current task or the is not running. 
*/ +static unsigned long __rethook_find_ret_addr(struct task_struct *tsk, + struct llist_node **cur) +{ + struct rethook_node *rh = NULL; + struct llist_node *node = *cur; + + if (!node) + node = tsk->rethooks.first; + else + node = node->next; + + while (node) { + rh = container_of(node, struct rethook_node, llist); + if (rh->ret_addr != (unsigned long)arch_rethook_trampoline) { + *cur = node; + return rh->ret_addr; + } + node = node->next; + } + return 0; +} +NOKPROBE_SYMBOL(__rethook_find_ret_addr); + +/** + * rethook_find_ret_addr -- Find correct return address modified by rethook + * @tsk: Target task + * @frame: A frame pointer + * @cur: a storage of the loop cursor llist_node pointer for next call + * + * Find the correct return address modified by a rethook on @tsk in unsigned + * long type. If it finds the return address, this returns that address value, + * or this returns 0. + * The @tsk must be 'current' or a task which is not running. @frame is a hint + * to get the currect return address - which is compared with the + * rethook::frame field. The @cur is a loop cursor for searching the + * kretprobe return addresses on the @tsk. The '*@cur' should be NULL at the + * first call, but '@cur' itself must NOT NULL. + */ +unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame, + struct llist_node **cur) +{ + struct rethook_node *rhn = NULL; + unsigned long ret; + + if (WARN_ON_ONCE(!cur)) + return 0; + + do { + ret = __rethook_find_ret_addr(tsk, cur); + if (!ret) + break; + rhn = container_of(*cur, struct rethook_node, llist); + } while (rhn->frame != frame); + + return ret; +} +NOKPROBE_SYMBOL(rethook_find_ret_addr); + +void __weak arch_rethook_fixup_return(struct pt_regs *regs, + unsigned long correct_ret_addr) +{ + /* + * Do nothing by default. If the architecture which uses a + * frame pointer to record real return address on the stack, + * it should fill this function to fixup the return address + * so that stacktrace works from the rethook handler. + */ +} + +unsigned long rethook_trampoline_handler(struct pt_regs *regs, + unsigned long frame) +{ + struct rethook_node *rhn; + struct llist_node *first, *node = NULL; + unsigned long correct_ret_addr = __rethook_find_ret_addr(current, &node); + + if (!correct_ret_addr) { + pr_err("rethook: Return address not found! 
Maybe there is a bug in the kernel\n");
+		BUG_ON(1);
+	}
+
+	instruction_pointer_set(regs, correct_ret_addr);
+	arch_rethook_fixup_return(regs, correct_ret_addr);
+
+	first = current->rethooks.first;
+	current->rethooks.first = node->next;
+	node->next = NULL;
+
+	while (first) {
+		rhn = container_of(first, struct rethook_node, llist);
+		if (WARN_ON_ONCE(rhn->frame != frame))
+			break;
+		if (rhn->rethook->handler)
+			rhn->rethook->handler(rhn, rhn->rethook->data, regs);
+
+		first = first->next;
+		rethook_node_recycle(rhn);
+	}
+
+	return correct_ret_addr;
+}
+

From patchwork Tue Jan 11 15:00:52 2022
X-Patchwork-Submitter: "Masami Hiramatsu (Google)"
X-Patchwork-Id: 12709924
From: Masami Hiramatsu
To: Jiri Olsa
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Masami Hiramatsu,
 netdev@vger.kernel.org, bpf@vger.kernel.org, lkml, Martin KaFai Lau,
 Song Liu, Yonghong Song, John Fastabend, KP Singh, Steven Rostedt,
 "Naveen N. Rao", Anil S Keshavamurthy, "David S. Miller"
Subject: [RFC PATCH 3/6] rethook: x86: Add rethook x86 implementation
Date: Wed, 12 Jan 2022 00:00:52 +0900
Message-Id: <164191325139.806991.12863256211590357778.stgit@devnote2>
In-Reply-To: <164191321766.806991.7930388561276940676.stgit@devnote2>
References: <164191321766.806991.7930388561276940676.stgit@devnote2>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-State: RFC

Add an x86 implementation of rethook. Most of the code has been copied from
kretprobes on x86.
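The mechanism, in brief: arch_rethook_prepare() rewrites the on-stack return
address so that the probed function "returns" into arch_rethook_trampoline(),
which saves registers, calls rethook_trampoline_handler() to run the handlers
and recover the real return address, and then returns to the real caller. The
core of that swap, as a simplified, annotated excerpt of the
arch_rethook_prepare() added below:

	/* at function entry, regs->sp points at the return address */
	unsigned long *stack = (unsigned long *)regs->sp;

	rh->ret_addr = stack[0];	/* remember the real caller */
	rh->frame    = regs->sp;	/* key used to match this node against
					   the right stack frame on return */

	/* make 'ret' land in the trampoline instead of the caller */
	stack[0] = (unsigned long)arch_rethook_trampoline;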
Signed-off-by: Masami Hiramatsu --- arch/x86/Kconfig | 1 arch/x86/kernel/Makefile | 1 arch/x86/kernel/rethook.c | 115 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 117 insertions(+) create mode 100644 arch/x86/kernel/rethook.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7399327d1eff..939c4c897e63 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -219,6 +219,7 @@ config X86 select HAVE_KPROBES_ON_FTRACE select HAVE_FUNCTION_ERROR_INJECTION select HAVE_KRETPROBES + select HAVE_RETHOOK select HAVE_KVM select HAVE_LIVEPATCH if X86_64 select HAVE_MIXED_BREAKPOINTS_REGS diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 2ff3e600f426..66593d8c4d74 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -106,6 +106,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o obj-$(CONFIG_FTRACE_SYSCALLS) += ftrace.o obj-$(CONFIG_X86_TSC) += trace_clock.o obj-$(CONFIG_TRACING) += trace.o +obj-$(CONFIG_RETHOOK) += rethook.o obj-$(CONFIG_CRASH_CORE) += crash_core_$(BITS).o obj-$(CONFIG_KEXEC_CORE) += machine_kexec_$(BITS).o obj-$(CONFIG_KEXEC_CORE) += relocate_kernel_$(BITS).o crash.o diff --git a/arch/x86/kernel/rethook.c b/arch/x86/kernel/rethook.c new file mode 100644 index 000000000000..f2f3b9526e43 --- /dev/null +++ b/arch/x86/kernel/rethook.c @@ -0,0 +1,115 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * x86 implementation of rethook. Mostly copied from arch/x86/kernel/kprobes/core.c. + */ +#include +#include +#include + +#include "kprobes/common.h" + +/* + * Called from arch_rethook_trampoline + */ +__used __visible void arch_rethook_trampoline_callback(struct pt_regs *regs) +{ + unsigned long *frame_pointer; + + /* fixup registers */ + regs->cs = __KERNEL_CS; +#ifdef CONFIG_X86_32 + regs->gs = 0; +#endif + regs->ip = (unsigned long)&arch_rethook_trampoline; + regs->orig_ax = ~0UL; + regs->sp += sizeof(long); + frame_pointer = ®s->sp + 1; + + /* + * The return address at 'frame_pointer' is recovered by the + * arch_rethook_fixup_return() which called from this + * rethook_trampoline_handler(). + */ + rethook_trampoline_handler(regs, (unsigned long)frame_pointer); + + /* + * Copy FLAGS to 'pt_regs::sp' so that arch_rethook_trapmoline() + * can do RET right after POPF. + */ + regs->sp = regs->flags; +} +NOKPROBE_SYMBOL(arch_rethook_trampoline_callback); + +/* + * When a target function returns, this code saves registers and calls + * arch_rethook_trampoline_callback(), which calls the rethook handler. + */ +asm( + ".text\n" + ".global arch_rethook_trampoline\n" + ".type arch_rethook_trampoline, @function\n" + "arch_rethook_trampoline:\n" +#ifdef CONFIG_X86_64 + /* Push a fake return address to tell the unwinder it's a kretprobe. */ + " pushq $arch_rethook_trampoline\n" + UNWIND_HINT_FUNC + /* Save the 'sp - 8', this will be fixed later. */ + " pushq %rsp\n" + " pushfq\n" + SAVE_REGS_STRING + " movq %rsp, %rdi\n" + " call arch_rethook_trampoline_callback\n" + RESTORE_REGS_STRING + /* In the callback function, 'regs->flags' is copied to 'regs->sp'. */ + " addq $8, %rsp\n" + " popfq\n" +#else + /* Push a fake return address to tell the unwinder it's a kretprobe. */ + " pushl $arch_rethook_trampoline\n" + UNWIND_HINT_FUNC + /* Save the 'sp - 4', this will be fixed later. */ + " pushl %esp\n" + " pushfl\n" + SAVE_REGS_STRING + " movl %esp, %eax\n" + " call arch_rethook_trampoline_callback\n" + RESTORE_REGS_STRING + /* In the callback function, 'regs->flags' is copied to 'regs->sp'. 
*/ + " addl $4, %esp\n" + " popfl\n" +#endif + " ret\n" + ".size arch_rethook_trampoline, .-arch_rethook_trampoline\n" +); +NOKPROBE_SYMBOL(arch_rethook_trampoline); +/* + * arch_rethook_trampoline() skips updating frame pointer. The frame pointer + * saved in arch_rethook_trampoline_callback() points to the real caller + * function's frame pointer. Thus the arch_rethook_trampoline() doesn't have + * a standard stack frame with CONFIG_FRAME_POINTER=y. + * Let's mark it non-standard function. Anyway, FP unwinder can correctly + * unwind without the hint. + */ +STACK_FRAME_NON_STANDARD_FP(arch_rethook_trampoline); + +/* This is called from rethook_trampoline_handler(). */ +void arch_rethook_fixup_return(struct pt_regs *regs, + unsigned long correct_ret_addr) +{ + unsigned long *frame_pointer = ®s->sp + 1; + + /* Replace fake return address with real one. */ + *frame_pointer = correct_ret_addr; +} + +void arch_rethook_prepare(struct rethook_node *rh, struct pt_regs *regs) +{ + unsigned long *stack = (unsigned long *)regs->sp; + + rh->ret_addr = stack[0]; + rh->frame = regs->sp; + + /* Replace the return addr with trampoline addr */ + stack[0] = (unsigned long) arch_rethook_trampoline; +} +NOKPROBE_SYMBOL(arch_rethook_prepare); From patchwork Tue Jan 11 15:01:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Masami Hiramatsu (Google)" X-Patchwork-Id: 12709925 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 797B7C433F5 for ; Tue, 11 Jan 2022 15:01:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243510AbiAKPBP (ORCPT ); Tue, 11 Jan 2022 10:01:15 -0500 Received: from ams.source.kernel.org ([145.40.68.75]:57810 "EHLO ams.source.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243491AbiAKPBO (ORCPT ); Tue, 11 Jan 2022 10:01:14 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 01DC9B81B34; Tue, 11 Jan 2022 15:01:08 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EE168C36AE3; Tue, 11 Jan 2022 15:01:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1641913266; bh=0SFUgokXnTa6zxw26ArkFug0LnrLNaFy5O6fChALoI8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=q5cZHTrM5KGqt/PwGZXCmXWs2jWfIc+T/kQjckmrwkOMxx9XJznYX5W+pjrXuW4Dn mqa6tB1gnx/+EO5A4l0JCNFsfyk2TVfJ8rtT6EaQ+v2tlFr5b4PLyflq+SHs9eFVRr rxMNKUr5oF2n4/ZA6zpJyEvrydx+SKHtn5SOEg60wgTP2RaUIs3AqBizuXELzWkYEG A7mvFpJr7krjhxMMgjGgJr/0V6XhvAVPY79wnk5Yh+v7TCJjz9LMrHtWzT0nXPmR6E 8e0FixvwoUy3oaKARSKPe92Ir9X1OqoXbbuRjGeBi7dKgfuc/LKDfZgtedVQRiU88/ 2CeQAI3PIGx5A== From: Masami Hiramatsu To: Jiri Olsa Cc: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Masami Hiramatsu , netdev@vger.kernel.org, bpf@vger.kernel.org, lkml , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Steven Rostedt , "Naveen N . Rao" , Anil S Keshavamurthy , "David S . 
Miller" Subject: [RFC PATCH 4/6] fprobe: Add exit_handler support Date: Wed, 12 Jan 2022 00:01:02 +0900 Message-Id: <164191326189.806991.3684466615191467367.stgit@devnote2> X-Mailer: git-send-email 2.25.1 In-Reply-To: <164191321766.806991.7930388561276940676.stgit@devnote2> References: <164191321766.806991.7930388561276940676.stgit@devnote2> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC Add exit_handler to fprobe. fprobe + rethook allows us to hook the kernel function return without fgraph tracer. Eventually, the fgraph tracer will be generic array based return hooking and fprobe may use it if user requests. Since both array-based approach and list-based approach have Pros and Cons, (e.g. memory consumption v.s. less missing events) it is better to keep both but fprobe will provide the same exit-handler interface. Signed-off-by: Masami Hiramatsu --- include/linux/fprobes.h | 4 +++ kernel/trace/Kconfig | 1 + kernel/trace/fprobes.c | 59 +++++++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 62 insertions(+), 2 deletions(-) diff --git a/include/linux/fprobes.h b/include/linux/fprobes.h index fa85a2fc3ad1..d2eb064c5b79 100644 --- a/include/linux/fprobes.h +++ b/include/linux/fprobes.h @@ -5,6 +5,7 @@ #include #include +#include /* * fprobe_entry - function entry for fprobe @@ -27,7 +28,10 @@ struct fprobe { struct ftrace_ops ftrace; unsigned long nmissed; unsigned int flags; + struct rethook *rethook; + void (*entry_handler) (struct fprobe *, unsigned long, struct pt_regs *); + void (*exit_handler) (struct fprobe *, unsigned long, struct pt_regs *); }; #define FPROBE_FL_DISABLED 1 diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 9328724258dc..59e227ade0b7 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -238,6 +238,7 @@ config FPROBES bool "Kernel Function Probe (fprobe)" depends on FUNCTION_TRACER depends on DYNAMIC_FTRACE_WITH_REGS + select RETHOOK default n help This option enables kernel function probe feature, which is diff --git a/kernel/trace/fprobes.c b/kernel/trace/fprobes.c index 0a609093d48c..1e8202a19e3d 100644 --- a/kernel/trace/fprobes.c +++ b/kernel/trace/fprobes.c @@ -5,12 +5,20 @@ #include #include #include +#include #include #include +struct fprobe_rethook_node { + struct rethook_node node; + unsigned long entry_ip; +}; + static void fprobe_handler(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *ops, struct ftrace_regs *fregs) { + struct fprobe_rethook_node *fpr; + struct rethook_node *rh; struct fprobe *fp; int bit; @@ -27,10 +35,34 @@ static void fprobe_handler(unsigned long ip, unsigned long parent_ip, if (fp->entry_handler) fp->entry_handler(fp, ip, ftrace_get_regs(fregs)); + if (fp->exit_handler) { + rh = rethook_try_get(fp->rethook); + if (!rh) { + fp->nmissed++; + goto out; + } + fpr = container_of(rh, struct fprobe_rethook_node, node); + fpr->entry_ip = ip; + rethook_hook_current(rh, ftrace_get_regs(fregs)); + } + +out: ftrace_test_recursion_unlock(bit); } NOKPROBE_SYMBOL(fprobe_handler); +static void fprobe_exit_handler(struct rethook_node *rh, void *data, + struct pt_regs *regs) +{ + struct fprobe *fp = (struct fprobe *)data; + struct fprobe_rethook_node *fpr; + + fpr = container_of(rh, struct fprobe_rethook_node, node); + + fp->exit_handler(fp, fpr->entry_ip, regs); +} +NOKPROBE_SYMBOL(fprobe_exit_handler); + static int convert_func_addresses(struct fprobe *fp) { unsigned int i; @@ -88,7 +120,7 @@ static int fprobe_comp_func(const 
void *a, const void *b) */ int register_fprobe(struct fprobe *fp) { - unsigned int i; + unsigned int i, size; int ret; if (!fp || !fp->nentry || !fp->entries) @@ -114,6 +146,23 @@ int register_fprobe(struct fprobe *fp) return ret; } + /* Initialize rethook if needed */ + if (fp->exit_handler) { + size = fp->nentry * num_possible_cpus() * 2; + fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler); + for (i = 0; i < size; i++) { + struct rethook_node *node; + + node = kzalloc(sizeof(struct fprobe_rethook_node), GFP_KERNEL); + if (!node) { + rethook_free(fp->rethook); + return -ENOMEM; + } + rethook_add_node(fp->rethook, node); + } + } else + fp->rethook = NULL; + return register_ftrace_function(&fp->ftrace); } EXPORT_SYMBOL_GPL(register_fprobe); @@ -124,9 +173,15 @@ EXPORT_SYMBOL_GPL(register_fprobe); */ int unregister_fprobe(struct fprobe *fp) { + int ret; + if (!fp || !fp->nentry || !fp->entries) return -EINVAL; - return unregister_ftrace_function(&fp->ftrace); + ret = unregister_ftrace_function(&fp->ftrace); + if (!ret) + rethook_free(fp->rethook); + + return ret; } EXPORT_SYMBOL_GPL(unregister_fprobe); From patchwork Tue Jan 11 15:01:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Masami Hiramatsu (Google)" X-Patchwork-Id: 12709926 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49DFDC433FE for ; Tue, 11 Jan 2022 15:01:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243853AbiAKPB1 (ORCPT ); Tue, 11 Jan 2022 10:01:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35650 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243600AbiAKPBS (ORCPT ); Tue, 11 Jan 2022 10:01:18 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 40FF2C06173F; Tue, 11 Jan 2022 07:01:18 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id D3880616A9; Tue, 11 Jan 2022 15:01:17 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5437DC36AEB; Tue, 11 Jan 2022 15:01:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1641913277; bh=1j0gb5NnOtQmmflNztpId+8+T/swnj4qKPrK5FKZCfY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=HsLhBP1RT/X3aJ5bZ2VFbSBMWPbl21dw0S8GPr8UhAz0jcQ6N32cKRP96FIOAtLGD 7d5dBL1FEBIW3uraf9HGu6gJdG95sf4CoZi/Aq/bZ40wAl0UldmJ1AmAHhVFpWSOx4 N9eVREpZ38JYs2Dco+4ald6TSK83z7p+xkmPBQd0fsNm7r2rOcjYzpdlTqeNgX8Q+H LkI+p7Q1xNVBVS1NSOjAjMIv1G92E3ore0gj4M2lpqjexLWHHkreq3t+LLsKg+jN6r ZvErG2xmE7ZldMhemIustK8g00WJKPVW7OUOzkhNwPh0217mblxLAk811uVctDW/EC oYdyUU11T8UDA== From: Masami Hiramatsu To: Jiri Olsa Cc: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Masami Hiramatsu , netdev@vger.kernel.org, bpf@vger.kernel.org, lkml , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Steven Rostedt , "Naveen N . Rao" , Anil S Keshavamurthy , "David S . 
Miller" Subject: [RFC PATCH 5/6] fprobe: Add sample program for fprobe Date: Wed, 12 Jan 2022 00:01:12 +0900 Message-Id: <164191327207.806991.15842602939159094192.stgit@devnote2> X-Mailer: git-send-email 2.25.1 In-Reply-To: <164191321766.806991.7930388561276940676.stgit@devnote2> References: <164191321766.806991.7930388561276940676.stgit@devnote2> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-State: RFC Add a sample program for the fprobe. Signed-off-by: Masami Hiramatsu --- samples/Kconfig | 6 ++ samples/Makefile | 1 samples/fprobe/Makefile | 3 + samples/fprobe/fprobe_example.c | 103 +++++++++++++++++++++++++++++++++++++++ 4 files changed, 113 insertions(+) create mode 100644 samples/fprobe/Makefile create mode 100644 samples/fprobe/fprobe_example.c diff --git a/samples/Kconfig b/samples/Kconfig index 43d2e9aa557f..487b5d17f722 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -73,6 +73,12 @@ config SAMPLE_HW_BREAKPOINT help This builds kernel hardware breakpoint example modules. +config SAMPLE_FPROBE + tristate "Build fprobe examples -- loadable modules only" + depends on FPROBES && m + help + This build several fprobe example modules. + config SAMPLE_KFIFO tristate "Build kfifo examples -- loadable modules only" depends on m diff --git a/samples/Makefile b/samples/Makefile index 4bcd6b93bffa..4f73fe7aa473 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -32,3 +32,4 @@ obj-$(CONFIG_SAMPLE_INTEL_MEI) += mei/ subdir-$(CONFIG_SAMPLE_WATCHDOG) += watchdog subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak/ +obj-$(CONFIG_SAMPLE_FPROBE) += fprobe/ diff --git a/samples/fprobe/Makefile b/samples/fprobe/Makefile new file mode 100644 index 000000000000..ecccbfa6e99b --- /dev/null +++ b/samples/fprobe/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0-only + +obj-$(CONFIG_SAMPLE_FPROBE) += fprobe_example.o diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c new file mode 100644 index 000000000000..8ea335cfe916 --- /dev/null +++ b/samples/fprobe/fprobe_example.c @@ -0,0 +1,103 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Here's a sample kernel module showing the use of fprobe to dump a + * stack trace and selected registers when kernel_clone() is called. + * + * For more information on theory of operation of kprobes, see + * Documentation/trace/kprobes.rst + * + * You will see the trace data in /var/log/messages and on the console + * whenever kernel_clone() is invoked to create a new process. 
+ */ + +#define pr_fmt(fmt) "%s: " fmt, __func__ + +#include +#include +#include +#include + +#define MAX_SYMBOL_LEN 4096 +struct fprobe sample_probe; +static char symbol[MAX_SYMBOL_LEN] = "kernel_clone"; +module_param_string(symbol, symbol, sizeof(symbol), 0644); + +static void sample_entry_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs) +{ + const char *sym = ""; + struct fprobe_entry *ent; + + ent = fprobe_find_entry(fp, ip); + if (ent) + sym = ent->sym; + + pr_info("Enter <%s> ip = 0x%p (%pS)\n", sym, (void *)ip, (void *)ip); +} + +static void sample_exit_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs) +{ + unsigned long rip = instruction_pointer(regs); + const char *sym = ""; + struct fprobe_entry *ent; + + ent = fprobe_find_entry(fp, ip); + if (ent) + sym = ent->sym; + + pr_info("Return from <%s> ip = 0x%p to rip = 0x%p (%pS)\n", sym, (void *)ip, + (void *)rip, (void *)rip); +} + +static int __init fprobe_init(void) +{ + struct fprobe_entry *ents; + char *tmp, *p; + int ret, count, i; + + sample_probe.entry_handler = sample_entry_handler; + sample_probe.exit_handler = sample_exit_handler; + + if (strchr(symbol, ',')) { + tmp = kstrdup(symbol, GFP_KERNEL); + if (!tmp) + return -ENOMEM; + p = tmp; + count = 1; + while ((p = strchr(p, ',')) != NULL) + count++; + } else { + count = 1; + tmp = symbol; + } + + ents = kzalloc(count * sizeof(*ents), GFP_KERNEL); + if (!ents) { + if (tmp != symbol) + kfree(tmp); + return -ENOMEM; + } + + for (i = 0; i < count; i++) + ents[i].sym = strsep(&tmp, ","); + + sample_probe.entries = ents; + sample_probe.nentry = count; + + ret = register_fprobe(&sample_probe); + if (ret < 0) { + pr_err("register_fprobe failed, returned %d\n", ret); + return ret; + } + pr_info("Planted fprobe at %s\n", symbol); + return 0; +} + +static void __exit fprobe_exit(void) +{ + unregister_fprobe(&sample_probe); + pr_info("fprobe at %s unregistered\n", symbol); +} + +module_init(fprobe_init) +module_exit(fprobe_exit) +MODULE_LICENSE("GPL"); From patchwork Tue Jan 11 15:01:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Masami Hiramatsu (Google)" X-Patchwork-Id: 12709927 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B02BC433EF for ; Tue, 11 Jan 2022 15:02:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243869AbiAKPCC (ORCPT ); Tue, 11 Jan 2022 10:02:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243955AbiAKPBa (ORCPT ); Tue, 11 Jan 2022 10:01:30 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0085BC061748; Tue, 11 Jan 2022 07:01:29 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 849EE616AB; Tue, 11 Jan 2022 15:01:29 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6B242C36AE3; Tue, 11 Jan 2022 15:01:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; 
s=k20201202; t=1641913289; bh=+dZrj08j6BDJLaq7xQoh5Q656UT654qcGRPJDdWM6Y0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=PbK22lCWDHigxdMGPohxkmB9Q/aqv3UjJdWFUAYHDr8gb2EU7T4Xf3Xluv1dYcYVc q40YJlJY1wjQwf61Ql1idMo8ZVgbhQ6UQTnHl2RXNRPvwceFuRRnNde89IF9xkJuxu Ypfi4IkJC5qMZrl9j+mldsnpuM56tpsLP51PnusdnFBU5749IQlmPggDKs4Iz1UaQO HsvsjMyxMhn4hZEDQUsbbaffnkAi56qLopJxLbBkRXaG+H2jppWjw0Tj+rp95/wYBB l7eKnytRmSe13aPNNrenr6wAocixGpJY6tlLIIxrZWnhouC/548xKqbKHk88wbpo2r CGoNyTuZJTWjA== From: Masami Hiramatsu To: Jiri Olsa Cc: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Masami Hiramatsu , netdev@vger.kernel.org, bpf@vger.kernel.org, lkml , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Steven Rostedt , "Naveen N . Rao" , Anil S Keshavamurthy , "David S . Miller" Subject: [RFC PATCH 6/6] bpf: Add kprobe link for attaching raw kprobes Date: Wed, 12 Jan 2022 00:01:22 +0900 Message-Id: <164191328259.806991.14418649843650864871.stgit@devnote2> X-Mailer: git-send-email 2.25.1 In-Reply-To: <164191321766.806991.7930388561276940676.stgit@devnote2> References: <164191321766.806991.7930388561276940676.stgit@devnote2> User-Agent: StGit/0.19 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC From: Jiri Olsa Adding new link type BPF_LINK_TYPE_KPROBE to attach so called "kprobes" directly through fprobe API. Note that since the using kprobes with multiple same handler is not efficient, this uses the fprobe which natively support multiple probe points for one same handler, but limited on function entry and exit. Adding new attach type BPF_TRACE_RAW_KPROBE that enables such link for kprobe program. The new link allows to create multiple kprobes link by using new link_create interface: struct { __aligned_u64 addrs; __u32 cnt; __u64 bpf_cookie; } kprobe; Plus new flag BPF_F_KPROBE_RETURN for link_create.flags to create return probe. Signed-off-by: Jiri Olsa Signed-off-by: Masami Hiramatsu --- include/linux/bpf_types.h | 1 include/uapi/linux/bpf.h | 12 ++ kernel/bpf/syscall.c | 199 +++++++++++++++++++++++++++++++++++++++- tools/include/uapi/linux/bpf.h | 12 ++ 4 files changed, 219 insertions(+), 5 deletions(-) diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index 48a91c51c015..a9000feab34e 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -140,3 +140,4 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp) #ifdef CONFIG_PERF_EVENTS BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf) #endif +BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE, kprobe) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index ba5af15e25f5..10e9b56a074e 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -995,6 +995,7 @@ enum bpf_attach_type { BPF_SK_REUSEPORT_SELECT, BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, BPF_PERF_EVENT, + BPF_TRACE_RAW_KPROBE, __MAX_BPF_ATTACH_TYPE }; @@ -1009,6 +1010,7 @@ enum bpf_link_type { BPF_LINK_TYPE_NETNS = 5, BPF_LINK_TYPE_XDP = 6, BPF_LINK_TYPE_PERF_EVENT = 7, + BPF_LINK_TYPE_KPROBE = 8, MAX_BPF_LINK_TYPE, }; @@ -1111,6 +1113,11 @@ enum bpf_link_type { */ #define BPF_F_SLEEPABLE (1U << 4) +/* link_create flags used in LINK_CREATE command for BPF_TRACE_RAW_KPROBE + * attach type. 
+ */ +#define BPF_F_KPROBE_RETURN (1U << 0) + /* When BPF ldimm64's insn[0].src_reg != 0 then this can have * the following extensions: * @@ -1463,6 +1470,11 @@ union bpf_attr { */ __u64 bpf_cookie; } perf_event; + struct { + __aligned_u64 addrs; + __u32 cnt; + __u64 bpf_cookie; + } kprobe; }; } link_create; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 1033ee8c0caf..d237ba7762ec 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -31,6 +31,7 @@ #include #include #include +#include #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \ (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \ @@ -3013,8 +3014,186 @@ static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *pro fput(perf_file); return err; } +#else +static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) +{ + return -ENOTSUPP; +} #endif /* CONFIG_PERF_EVENTS */ +#ifdef CONFIG_FPROBES + +/* Note that this is called 'kprobe_link' but using fprobe inside */ +struct bpf_kprobe_link { + struct bpf_link link; + struct fprobe fp; + bool is_return; + unsigned long *addrs; + u32 cnt; + u64 bpf_cookie; +}; + +static void bpf_kprobe_link_release(struct bpf_link *link) +{ + struct bpf_kprobe_link *kprobe_link; + + kprobe_link = container_of(link, struct bpf_kprobe_link, link); + + unregister_fprobe(&kprobe_link->fp); +} + +static void bpf_kprobe_link_dealloc(struct bpf_link *link) +{ + struct bpf_kprobe_link *kprobe_link; + + kprobe_link = container_of(link, struct bpf_kprobe_link, link); + kfree(kprobe_link->fp.entries); + kfree(kprobe_link->addrs); + kfree(kprobe_link); +} + +static const struct bpf_link_ops bpf_kprobe_link_lops = { + .release = bpf_kprobe_link_release, + .dealloc = bpf_kprobe_link_dealloc, +}; + +static int kprobe_link_prog_run(struct bpf_kprobe_link *kprobe_link, + struct pt_regs *regs) +{ + struct bpf_trace_run_ctx run_ctx; + struct bpf_run_ctx *old_run_ctx; + int err; + + if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) { + err = 0; + goto out; + } + + old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); + run_ctx.bpf_cookie = kprobe_link->bpf_cookie; + + rcu_read_lock(); + migrate_disable(); + err = bpf_prog_run(kprobe_link->link.prog, regs); + migrate_enable(); + rcu_read_unlock(); + + bpf_reset_run_ctx(old_run_ctx); + + out: + __this_cpu_dec(bpf_prog_active); + return err; +} + +static void kprobe_link_entry_handler(struct fprobe *fp, unsigned long entry_ip, + struct pt_regs *regs) +{ + struct bpf_kprobe_link *kprobe_link; + + kprobe_link = container_of(fp, struct bpf_kprobe_link, fp); + kprobe_link_prog_run(kprobe_link, regs); +} + +static void kprobe_link_exit_handler(struct fprobe *fp, unsigned long entry_ip, + struct pt_regs *regs) +{ + struct bpf_kprobe_link *kprobe_link; + + kprobe_link = container_of(fp, struct bpf_kprobe_link, fp); + kprobe_link_prog_run(kprobe_link, regs); +} + +static int bpf_kprobe_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) +{ + struct bpf_link_primer link_primer; + struct bpf_kprobe_link *link = NULL; + struct fprobe_entry *ents = NULL; + unsigned long *addrs; + u32 flags, cnt, size, i; + void __user *uaddrs; + u64 **tmp; + int err; + + flags = attr->link_create.flags; + if (flags & ~BPF_F_KPROBE_RETURN) + return -EINVAL; + + uaddrs = u64_to_user_ptr(attr->link_create.kprobe.addrs); + cnt = attr->link_create.kprobe.cnt; + size = cnt * sizeof(*tmp); + + tmp = kzalloc(size, GFP_KERNEL); + if (!tmp) + return -ENOMEM; + + if (copy_from_user(tmp, uaddrs, size)) { + err = 
-EFAULT; + goto error; + } + + /* TODO add extra copy for 32bit archs */ + if (sizeof(u64) != sizeof(void *)) { + err = -EINVAL; + goto error; + } + + addrs = (unsigned long *) tmp; + + link = kzalloc(sizeof(*link), GFP_KERNEL); + if (!link) { + err = -ENOMEM; + goto error; + } + + ents = kzalloc(sizeof(*ents) * cnt, GFP_KERNEL); + if (!ents) { + err = -ENOMEM; + goto error; + } + for (i = 0; i < cnt; i++) + ents[i].addr = addrs[i]; + + bpf_link_init(&link->link, BPF_LINK_TYPE_KPROBE, &bpf_kprobe_link_lops, prog); + + err = bpf_link_prime(&link->link, &link_primer); + if (err) + goto error; + + link->is_return = flags & BPF_F_KPROBE_RETURN; + link->addrs = addrs; + link->cnt = cnt; + link->bpf_cookie = attr->link_create.kprobe.bpf_cookie; + + link->fp.entries = ents; + link->fp.nentry = cnt; + + if (link->is_return) + link->fp.exit_handler = kprobe_link_exit_handler; + else + link->fp.entry_handler = kprobe_link_entry_handler; + + err = register_fprobe(&link->fp); + if (err) { + bpf_link_cleanup(&link_primer); + goto error; + } + + return bpf_link_settle(&link_primer); + +error: + kfree(ents); + kfree(link); + kfree(tmp); + + return err; +} +#else /* !CONFIG_FPROBES */ +static int bpf_kprobe_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) +{ + return -ENOTSUPP; +} +#endif + #define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd static int bpf_raw_tracepoint_open(const union bpf_attr *attr) @@ -4241,7 +4420,7 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, bpfptr_t uattr, return -EINVAL; } -#define BPF_LINK_CREATE_LAST_FIELD link_create.iter_info_len +#define BPF_LINK_CREATE_LAST_FIELD link_create.kprobe.bpf_cookie static int link_create(union bpf_attr *attr, bpfptr_t uattr) { enum bpf_prog_type ptype; @@ -4265,7 +4444,6 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) ret = tracing_bpf_link_attach(attr, uattr, prog); goto out; case BPF_PROG_TYPE_PERF_EVENT: - case BPF_PROG_TYPE_KPROBE: case BPF_PROG_TYPE_TRACEPOINT: if (attr->link_create.attach_type != BPF_PERF_EVENT) { ret = -EINVAL; @@ -4273,6 +4451,14 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) } ptype = prog->type; break; + case BPF_PROG_TYPE_KPROBE: + if (attr->link_create.attach_type != BPF_PERF_EVENT && + attr->link_create.attach_type != BPF_TRACE_RAW_KPROBE) { + ret = -EINVAL; + goto out; + } + ptype = prog->type; + break; default: ptype = attach_type_to_prog_type(attr->link_create.attach_type); if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) { @@ -4304,13 +4490,16 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) ret = bpf_xdp_link_attach(attr, prog); break; #endif -#ifdef CONFIG_PERF_EVENTS case BPF_PROG_TYPE_PERF_EVENT: case BPF_PROG_TYPE_TRACEPOINT: - case BPF_PROG_TYPE_KPROBE: ret = bpf_perf_link_attach(attr, prog); break; -#endif + case BPF_PROG_TYPE_KPROBE: + if (attr->link_create.attach_type == BPF_PERF_EVENT) + ret = bpf_perf_link_attach(attr, prog); + else + ret = bpf_kprobe_link_attach(attr, prog); + break; default: ret = -EINVAL; } diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index ba5af15e25f5..10e9b56a074e 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -995,6 +995,7 @@ enum bpf_attach_type { BPF_SK_REUSEPORT_SELECT, BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, BPF_PERF_EVENT, + BPF_TRACE_RAW_KPROBE, __MAX_BPF_ATTACH_TYPE }; @@ -1009,6 +1010,7 @@ enum bpf_link_type { BPF_LINK_TYPE_NETNS = 5, BPF_LINK_TYPE_XDP = 6, BPF_LINK_TYPE_PERF_EVENT = 7, + 
BPF_LINK_TYPE_KPROBE = 8, MAX_BPF_LINK_TYPE, }; @@ -1111,6 +1113,11 @@ enum bpf_link_type { */ #define BPF_F_SLEEPABLE (1U << 4) +/* link_create flags used in LINK_CREATE command for BPF_TRACE_RAW_KPROBE + * attach type. + */ +#define BPF_F_KPROBE_RETURN (1U << 0) + /* When BPF ldimm64's insn[0].src_reg != 0 then this can have * the following extensions: * @@ -1463,6 +1470,11 @@ union bpf_attr { */ __u64 bpf_cookie; } perf_event; + struct { + __aligned_u64 addrs; + __u32 cnt; + __u64 bpf_cookie; + } kprobe; }; } link_create;
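A minimal user-space sketch of how the new attach type from this last patch
might be exercised, assuming only the UAPI added here (the series adds no
libbpf helper, so this goes through the raw bpf(2) syscall, and the
kprobe_link_create() wrapper name is made up); the caller is expected to
resolve kernel function addresses itself, e.g. from /proc/kallsyms, into the
addrs array:

	#include <string.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/bpf.h>

	/* Create a multi-kprobe link for an already-loaded
	 * BPF_PROG_TYPE_KPROBE program. Pass is_return != 0 to request the
	 * exit (kretprobe-like) hook via BPF_F_KPROBE_RETURN. Returns a link
	 * fd on success, -1 with errno set on failure. */
	static int kprobe_link_create(int prog_fd, const __u64 *addrs,
				      __u32 cnt, int is_return)
	{
		union bpf_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.link_create.prog_fd = prog_fd;
		attr.link_create.attach_type = BPF_TRACE_RAW_KPROBE;
		attr.link_create.flags = is_return ? BPF_F_KPROBE_RETURN : 0;
		attr.link_create.kprobe.addrs = (__u64)(unsigned long)addrs;
		attr.link_create.kprobe.cnt = cnt;
		attr.link_create.kprobe.bpf_cookie = 0;

		return syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
	}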