From patchwork Fri Oct 23 19:53:47 2020 X-Patchwork-Submitter: Michael Jeanson X-Patchwork-Id: 11854479 From: Michael Jeanson To: linux-kernel@vger.kernel.org Cc: mathieu.desnoyers@efficios.com, Michael Jeanson , Steven Rostedt , Peter Zijlstra , Alexei Starovoitov , Yonghong Song , "Paul E . 
McKenney" , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Joel Fernandes , bpf@vger.kernel.org Subject: [RFC PATCH 1/6] tracing: introduce sleepable tracepoints Date: Fri, 23 Oct 2020 15:53:47 -0400 Message-Id: <20201023195352.26269-2-mjeanson@efficios.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201023195352.26269-1-mjeanson@efficios.com> References: <20201023195352.26269-1-mjeanson@efficios.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-State: RFC When invoked from system call enter/exit instrumentation, accessing user-space data is a common use-case for tracers. However, tracepoints currently disable preemption around iteration on the registered tracepoint probes and invocation of the probe callbacks, which prevents tracers from handling page faults. Extend the tracepoint and trace event APIs to allow defining a sleepable tracepoint which invokes its callback with preemption enabled. Also extend the tracepoint API to allow tracers to request specific probes to be connected to those sleepable tracepoints. When the TRACEPOINT_MAYSLEEP flag is provided on registration, the probe callback will be called with preemption enabled, and is allowed to take page faults. Sleepable probes can only be registered on sleepable tracepoints and non-sleepable probes on non-sleepable tracepoints. The tasks trace rcu mechanism is used to synchronize read-side marshalling of the registered probes with respect to sleepable probes unregistration and teardown. Co-developed-by: Mathieu Desnoyers Signed-off-by: Mathieu Desnoyers Signed-off-by: Michael Jeanson Cc: Steven Rostedt (VMware) Cc: Peter Zijlstra Cc: Alexei Starovoitov Cc: Yonghong Song Cc: Paul E. McKenney Cc: Ingo Molnar Cc: Arnaldo Carvalho de Melo Cc: Mark Rutland Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes (Google) Cc: bpf@vger.kernel.org --- include/linux/tracepoint-defs.h | 11 ++++ include/linux/tracepoint.h | 85 +++++++++++++++++++++----- include/trace/define_trace.h | 7 +++ include/trace/trace_events.h | 6 ++ init/Kconfig | 1 + kernel/tracepoint.c | 103 ++++++++++++++++++++++++++------ 6 files changed, 181 insertions(+), 32 deletions(-) diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h index b29950a19205..87ff40cf343f 100644 --- a/include/linux/tracepoint-defs.h +++ b/include/linux/tracepoint-defs.h @@ -27,12 +27,23 @@ struct tracepoint_func { int prio; }; +/** + * enum tracepoint_flags - Tracepoint flags + * @TRACEPOINT_MAYSLEEP: The tracepoint probe callback will be called with + * preemption enabled, and is allowed to take page + * faults. 
+ */ +enum tracepoint_flags { + TRACEPOINT_MAYSLEEP = (1 << 0), +}; + struct tracepoint { const char *name; /* Tracepoint name */ struct static_key key; int (*regfunc)(void); void (*unregfunc)(void); struct tracepoint_func __rcu *funcs; + unsigned int flags; }; #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 598fec9f9dbf..0386b54cbcbb 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -18,6 +18,7 @@ #include #include #include +#include #include struct module; @@ -37,9 +38,14 @@ extern struct srcu_struct tracepoint_srcu; extern int tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data); extern int +tracepoint_probe_register_maysleep(struct tracepoint *tp, void *probe, void *data); +extern int tracepoint_probe_register_prio(struct tracepoint *tp, void *probe, void *data, int prio); extern int +tracepoint_probe_register_prio_maysleep(struct tracepoint *tp, void *probe, void *data, + int prio); +extern int tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data); extern void for_each_kernel_tracepoint(void (*fct)(struct tracepoint *tp, void *priv), @@ -79,6 +85,7 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb) #ifdef CONFIG_TRACEPOINTS static inline void tracepoint_synchronize_unregister(void) { + synchronize_rcu_tasks_trace(); synchronize_srcu(&tracepoint_srcu); synchronize_rcu(); } @@ -157,12 +164,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) * has a "void" prototype, then it is invalid to declare a function * as "(void *, void)". */ -#define __DO_TRACE(tp, proto, args, cond, rcuidle) \ +#define __DO_TRACE(tp, proto, args, cond, rcuidle, tp_flags) \ do { \ struct tracepoint_func *it_func_ptr; \ void *it_func; \ void *__data; \ int __maybe_unused __idx = 0; \ + bool maysleep = (tp_flags) & TRACEPOINT_MAYSLEEP; \ \ if (!(cond)) \ return; \ @@ -170,8 +178,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) /* srcu can't be used from NMI */ \ WARN_ON_ONCE(rcuidle && in_nmi()); \ \ - /* keep srcu and sched-rcu usage consistent */ \ - preempt_disable_notrace(); \ + if (maysleep) { \ + might_sleep(); \ + rcu_read_lock_trace(); \ + } else { \ + /* keep srcu and sched-rcu usage consistent */ \ + preempt_disable_notrace(); \ + } \ \ /* \ * For rcuidle callers, use srcu since sched-rcu \ @@ -197,21 +210,24 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\ } \ \ - preempt_enable_notrace(); \ + if (maysleep) \ + rcu_read_unlock_trace(); \ + else \ + preempt_enable_notrace(); \ } while (0) #ifndef MODULE -#define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \ +#define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args, tp_flags) \ static inline void trace_##name##_rcuidle(proto) \ { \ if (static_key_false(&__tracepoint_##name.key)) \ __DO_TRACE(&__tracepoint_##name, \ TP_PROTO(data_proto), \ TP_ARGS(data_args), \ - TP_CONDITION(cond), 1); \ + TP_CONDITION(cond), 1, tp_flags); \ } #else -#define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) +#define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args, tp_flags) #endif /* @@ -226,7 +242,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) * even when this tracepoint is off. This code has no purpose other than * poking RCU a bit. 
*/ -#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \ +#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args, tp_flags) \ extern struct tracepoint __tracepoint_##name; \ static inline void trace_##name(proto) \ { \ @@ -234,7 +250,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) __DO_TRACE(&__tracepoint_##name, \ TP_PROTO(data_proto), \ TP_ARGS(data_args), \ - TP_CONDITION(cond), 0); \ + TP_CONDITION(cond), 0, tp_flags); \ if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \ rcu_read_lock_sched_notrace(); \ rcu_dereference_sched(__tracepoint_##name.funcs);\ @@ -242,7 +258,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) } \ } \ __DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args), \ - PARAMS(cond), PARAMS(data_proto), PARAMS(data_args)) \ + PARAMS(cond), PARAMS(data_proto), PARAMS(data_args), tp_flags) \ static inline int \ register_trace_##name(void (*probe)(data_proto), void *data) \ { \ @@ -250,6 +266,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) (void *)probe, data); \ } \ static inline int \ + register_trace_maysleep_##name(void (*probe)(data_proto), void *data) \ + { \ + return tracepoint_probe_register_maysleep(&__tracepoint_##name, \ + (void *)probe, data); \ + } \ + static inline int \ register_trace_prio_##name(void (*probe)(data_proto), void *data,\ int prio) \ { \ @@ -257,6 +279,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) (void *)probe, data, prio); \ } \ static inline int \ + register_trace_prio_maysleep_##name(void (*probe)(data_proto), \ + void *data, int prio) \ + { \ + return tracepoint_probe_register_prio_maysleep(&__tracepoint_##name, \ + (void *)probe, data, prio); \ + } \ + static inline int \ unregister_trace_##name(void (*probe)(data_proto), void *data) \ { \ return tracepoint_probe_unregister(&__tracepoint_##name,\ @@ -277,14 +306,17 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) * structures, so we create an array of pointers that will be used for iteration * on the tracepoints. 
*/ -#define DEFINE_TRACE_FN(name, reg, unreg) \ +#define DEFINE_TRACE_FN_FLAGS(name, reg, unreg, tp_flags) \ static const char __tpstrtab_##name[] \ __section(__tracepoints_strings) = #name; \ struct tracepoint __tracepoint_##name __used \ __section(__tracepoints) = \ - { __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL };\ + { __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL, tp_flags };\ __TRACEPOINT_ENTRY(name); +#define DEFINE_TRACE_FN(name, reg, unreg) \ + DEFINE_TRACE_FN_FLAGS(name, reg, unreg, 0) + #define DEFINE_TRACE(name) \ DEFINE_TRACE_FN(name, NULL, NULL); @@ -294,7 +326,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) EXPORT_SYMBOL(__tracepoint_##name) #else /* !TRACEPOINTS_ENABLED */ -#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \ +#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args, tp_flags) \ static inline void trace_##name(proto) \ { } \ static inline void trace_##name##_rcuidle(proto) \ @@ -306,6 +338,18 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) return -ENOSYS; \ } \ static inline int \ + register_trace_maysleep_##name(void (*probe)(data_proto), \ + void *data) \ + { \ + return -ENOSYS; \ + } \ + static inline int \ + register_trace_prio_maysleep_##name(void (*probe)(data_proto), \ + void *data, int prio) \ + { \ + return -ENOSYS; \ + } \ + static inline int \ unregister_trace_##name(void (*probe)(data_proto), \ void *data) \ { \ @@ -320,6 +364,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) return false; \ } +#define DEFINE_TRACE_FN_FLAGS(name, reg, unreg, tp_flags) #define DEFINE_TRACE_FN(name, reg, unreg) #define DEFINE_TRACE(name) #define EXPORT_TRACEPOINT_SYMBOL_GPL(name) @@ -375,13 +420,20 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), \ cpu_online(raw_smp_processor_id()), \ PARAMS(void *__data, proto), \ - PARAMS(__data, args)) + PARAMS(__data, args), 0) + +#define DECLARE_TRACE_MAYSLEEP(name, proto, args) \ + __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), \ + cpu_online(raw_smp_processor_id()), \ + PARAMS(void *__data, proto), \ + PARAMS(__data, args), \ + TRACEPOINT_MAYSLEEP) #define DECLARE_TRACE_CONDITION(name, proto, args, cond) \ __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), \ cpu_online(raw_smp_processor_id()) && (PARAMS(cond)), \ PARAMS(void *__data, proto), \ - PARAMS(__data, args)) + PARAMS(__data, args), 0) #define TRACE_EVENT_FLAGS(event, flag) @@ -512,6 +564,9 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) #define TRACE_EVENT_FN(name, proto, args, struct, \ assign, print, reg, unreg) \ DECLARE_TRACE(name, PARAMS(proto), PARAMS(args)) +#define TRACE_EVENT_FN_MAYSLEEP(name, proto, args, struct, \ + assign, print, reg, unreg) \ + DECLARE_TRACE_MAYSLEEP(name, PARAMS(proto), PARAMS(args)) #define TRACE_EVENT_FN_COND(name, proto, args, cond, struct, \ assign, print, reg, unreg) \ DECLARE_TRACE_CONDITION(name, PARAMS(proto), \ diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h index bd75f97867b9..2b6ae7c978b3 100644 --- a/include/trace/define_trace.h +++ b/include/trace/define_trace.h @@ -41,6 +41,12 @@ assign, print, reg, unreg) \ DEFINE_TRACE_FN(name, reg, unreg) +/* Define a trace event with the MAYSLEEP flag set */ +#undef TRACE_EVENT_FN_MAYSLEEP +#define TRACE_EVENT_FN_MAYSLEEP(name, proto, args, tstruct, \ + assign, print, reg, unreg) \ + 
DEFINE_TRACE_FN_FLAGS(name, reg, unreg, TRACEPOINT_MAYSLEEP) + #undef TRACE_EVENT_FN_COND #define TRACE_EVENT_FN_COND(name, proto, args, cond, tstruct, \ assign, print, reg, unreg) \ @@ -106,6 +112,7 @@ #undef TRACE_EVENT #undef TRACE_EVENT_FN +#undef TRACE_EVENT_FN_MAYSLEEP #undef TRACE_EVENT_FN_COND #undef TRACE_EVENT_CONDITION #undef TRACE_EVENT_NOP diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h index 1bc3e7bba9a4..8b3f4068a702 100644 --- a/include/trace/trace_events.h +++ b/include/trace/trace_events.h @@ -138,6 +138,12 @@ TRACE_MAKE_SYSTEM_STR(); TRACE_EVENT(name, PARAMS(proto), PARAMS(args), \ PARAMS(tstruct), PARAMS(assign), PARAMS(print)) \ +#undef TRACE_EVENT_FN_MAYSLEEP +#define TRACE_EVENT_FN_MAYSLEEP(name, proto, args, tstruct, \ + assign, print, reg, unreg) \ + TRACE_EVENT(name, PARAMS(proto), PARAMS(args), \ + PARAMS(tstruct), PARAMS(assign), PARAMS(print)) \ + #undef TRACE_EVENT_FN_COND #define TRACE_EVENT_FN_COND(name, proto, args, cond, tstruct, \ assign, print, reg, unreg) \ diff --git a/init/Kconfig b/init/Kconfig index d6a0b31b13dc..857f57562490 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -2018,6 +2018,7 @@ config PROFILING # config TRACEPOINTS bool + select TASKS_TRACE_RCU endmenu # General setup diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index 73956eaff8a9..8d8e41c5d8a5 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -60,11 +60,16 @@ static inline void *allocate_probes(int count) return p == NULL ? NULL : p->probes; } -static void srcu_free_old_probes(struct rcu_head *head) +static void rcu_tasks_trace_free_old_probes(struct rcu_head *head) { kfree(container_of(head, struct tp_probes, rcu)); } +static void srcu_free_old_probes(struct rcu_head *head) +{ + call_rcu_tasks_trace(head, rcu_tasks_trace_free_old_probes); +} + static void rcu_free_old_probes(struct rcu_head *head) { call_srcu(&tracepoint_srcu, head, srcu_free_old_probes); @@ -85,7 +90,7 @@ static __init int release_early_probes(void) return 0; } -/* SRCU is initialized at core_initcall */ +/* SRCU and Tasks Trace RCU are initialized at core_initcall */ postcore_initcall(release_early_probes); static inline void release_probes(struct tracepoint_func *old) @@ -95,8 +100,9 @@ static inline void release_probes(struct tracepoint_func *old) struct tp_probes, probes[0]); /* - * We can't free probes if SRCU is not initialized yet. - * Postpone the freeing till after SRCU is initialized. + * We can't free probes if SRCU and Tasks Trace RCU are not + * initialized yet. Postpone the freeing till after both are + * initialized. */ if (unlikely(!ok_to_free_tracepoints)) { tp_probes->rcu.next = early_probes; @@ -105,10 +111,9 @@ static inline void release_probes(struct tracepoint_func *old) } /* - * Tracepoint probes are protected by both sched RCU and SRCU, - * by calling the SRCU callback in the sched RCU callback we - * cover both cases. So let us chain the SRCU and sched RCU - * callbacks to wait for both grace periods. + * Tracepoint probes are protected by sched RCU, SRCU and + * Tasks Trace RCU by chaining the callbacks we cover all three + * cases and wait for all three grace periods. 
*/ call_rcu(&tp_probes->rcu, rcu_free_old_probes); } @@ -289,6 +294,21 @@ static int tracepoint_remove_func(struct tracepoint *tp, return 0; } +static int __tracepoint_probe_register_prio(struct tracepoint *tp, void *probe, + void *data, int prio) +{ + struct tracepoint_func tp_func; + int ret; + + mutex_lock(&tracepoints_mutex); + tp_func.func = probe; + tp_func.data = data; + tp_func.prio = prio; + ret = tracepoint_add_func(tp, &tp_func, prio); + mutex_unlock(&tracepoints_mutex); + return ret; +} + /** * tracepoint_probe_register_prio - Connect a probe to a tracepoint with priority * @tp: tracepoint @@ -296,6 +316,8 @@ static int tracepoint_remove_func(struct tracepoint *tp, * @data: tracepoint data * @prio: priority of this function over other registered functions * + * Non-sleepable probes can only be registered on non-sleepable tracepoints. + * * Returns 0 if ok, error value on error. * Note: if @tp is within a module, the caller is responsible for * unregistering the probe before the module is gone. This can be @@ -305,25 +327,49 @@ static int tracepoint_remove_func(struct tracepoint *tp, int tracepoint_probe_register_prio(struct tracepoint *tp, void *probe, void *data, int prio) { - struct tracepoint_func tp_func; - int ret; + if (tp->flags & TRACEPOINT_MAYSLEEP) + return -EINVAL; - mutex_lock(&tracepoints_mutex); - tp_func.func = probe; - tp_func.data = data; - tp_func.prio = prio; - ret = tracepoint_add_func(tp, &tp_func, prio); - mutex_unlock(&tracepoints_mutex); - return ret; + return __tracepoint_probe_register_prio(tp, probe, data, prio); } EXPORT_SYMBOL_GPL(tracepoint_probe_register_prio); +/** + * tracepoint_probe_register_prio_maysleep - Connect a sleepable probe to a tracepoint with priority + * @tp: tracepoint + * @probe: probe handler + * @data: tracepoint data + * @prio: priority of this function over other registered functions + * + * When the TRACEPOINT_MAYSLEEP flag is provided on registration, the probe + * callback will be called with preemption enabled, and is allowed to take + * page faults. Sleepable probes can only be registered on sleepable + * tracepoints. + * + * Returns 0 if ok, error value on error. + * Note: if @tp is within a module, the caller is responsible for + * unregistering the probe before the module is gone. This can be + * performed either with a tracepoint module going notifier, or from + * within module exit functions. + */ +int tracepoint_probe_register_prio_maysleep(struct tracepoint *tp, void *probe, + void *data, int prio) +{ + if (!(tp->flags & TRACEPOINT_MAYSLEEP)) + return -EINVAL; + + return __tracepoint_probe_register_prio(tp, probe, data, prio); +} +EXPORT_SYMBOL_GPL(tracepoint_probe_register_prio_maysleep); + /** * tracepoint_probe_register - Connect a probe to a tracepoint * @tp: tracepoint * @probe: probe handler * @data: tracepoint data * + * Non-sleepable probes can only be registered on non-sleepable tracepoints. + * * Returns 0 if ok, error value on error. * Note: if @tp is within a module, the caller is responsible for * unregistering the probe before the module is gone. 
This can be @@ -336,6 +382,29 @@ int tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data) } EXPORT_SYMBOL_GPL(tracepoint_probe_register); +/** + * tracepoint_probe_register_maysleep - Connect a sleepable probe to a tracepoint + * @tp: tracepoint + * @probe: probe handler + * @data: tracepoint data + * + * When the TRACEPOINT_MAYSLEEP flag is provided on registration, the probe + * callback will be called with preemption enabled, and is allowed to take + * page faults. Sleepable probes can only be registered on sleepable + * tracepoints. + * + * Returns 0 if ok, error value on error. + * Note: if @tp is within a module, the caller is responsible for + * unregistering the probe before the module is gone. This can be + * performed either with a tracepoint module going notifier, or from + * within module exit functions. + */ +int tracepoint_probe_register_maysleep(struct tracepoint *tp, void *probe, void *data) +{ + return tracepoint_probe_register_prio_maysleep(tp, probe, data, TRACEPOINT_DEFAULT_PRIO); +} +EXPORT_SYMBOL_GPL(tracepoint_probe_register_maysleep); + /** * tracepoint_probe_unregister - Disconnect a probe from a tracepoint * @tp: tracepoint
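As a usage sketch (illustration only, not part of this patch): a probe attached to the sys_enter tracepoint once it is declared sleepable later in this series. The probe_sys_enter(), example_register() and example_unregister() names, the openat() filter and the fixed-size buffer are hypothetical; register_trace_maysleep_sys_enter() and unregister_trace_sys_enter() are the helpers generated by the macros in this patch, assuming the caller can see the sys_enter declaration the way in-tree code does.

/* Hypothetical usage sketch -- not part of this patch. */

#include <linux/sched.h>
#include <linux/printk.h>
#include <linux/uaccess.h>
#include <linux/unistd.h>
#include <asm/syscall.h>
#include <trace/events/syscalls.h>

/* Probe body; may fault because it runs on a sleepable tracepoint. */
static void probe_sys_enter(void *data, struct pt_regs *regs, long id)
{
	unsigned long args[6];
	char filename[64];
	long len;

	if (id != __NR_openat)
		return;

	syscall_get_arguments(current, regs, args);

	/*
	 * args[1] is the user-space filename pointer of openat(2).
	 * The copy below may take a page fault; this is only allowed
	 * because the probe was registered on a TRACEPOINT_MAYSLEEP
	 * tracepoint and therefore runs with preemption enabled.
	 */
	len = strncpy_from_user(filename, (const char __user *)args[1],
				sizeof(filename) - 1);
	if (len < 0)
		return;
	filename[len] = '\0';
	pr_info("openat: %s\n", filename);
}

static int example_register(void)
{
	/*
	 * Generated by the sleepable declaration; returns -EINVAL if
	 * the tracepoint does not carry the TRACEPOINT_MAYSLEEP flag.
	 */
	return register_trace_maysleep_sys_enter(probe_sys_enter, NULL);
}

static void example_unregister(void)
{
	unregister_trace_sys_enter(probe_sys_enter, NULL);
	/* Now also waits for a Tasks Trace RCU grace period. */
	tracepoint_synchronize_unregister();
}

Patch 5 of this series uses this registration form from kernel/trace/trace_syscalls.c (register_trace_maysleep_sys_enter()/register_trace_maysleep_sys_exit()), with the existing probes still disabling preemption internally until they can handle page faults themselves.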
From patchwork Fri Oct 23 19:53:48 2020 X-Patchwork-Submitter: Michael Jeanson X-Patchwork-Id: 11854485 From: Michael Jeanson To: linux-kernel@vger.kernel.org Cc: mathieu.desnoyers@efficios.com, Michael Jeanson , Steven Rostedt , Peter Zijlstra , Alexei Starovoitov , Yonghong Song , "Paul E . McKenney" , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Joel Fernandes , bpf@vger.kernel.org Subject: [RFC PATCH 2/6] tracing: ftrace: add support for sleepable tracepoints Date: Fri, 23 Oct 2020 15:53:48 -0400 Message-Id: <20201023195352.26269-3-mjeanson@efficios.com> In-Reply-To: <20201023195352.26269-1-mjeanson@efficios.com> References: <20201023195352.26269-1-mjeanson@efficios.com> X-Patchwork-State: RFC In preparation for converting system call enter/exit instrumentation into sleepable tracepoints, make sure that ftrace can handle registering to such tracepoints by explicitly disabling preemption within the ftrace tracepoint probes to respect the current expectations within ftrace ring buffer code. This change does not yet allow ftrace to take page faults per se within its probe, but allows its existing probes to connect to sleepable tracepoints. Co-developed-by: Mathieu Desnoyers Signed-off-by: Mathieu Desnoyers Signed-off-by: Michael Jeanson Cc: Steven Rostedt (VMware) Cc: Peter Zijlstra Cc: Alexei Starovoitov Cc: Yonghong Song Cc: Paul E. 
McKenney Cc: Ingo Molnar Cc: Arnaldo Carvalho de Melo Cc: Mark Rutland Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes (Google) Cc: bpf@vger.kernel.org --- include/trace/trace_events.h | 75 +++++++++++++++++++++++++++++++++--- kernel/trace/trace_events.c | 15 +++++++- 2 files changed, 83 insertions(+), 7 deletions(-) diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h index 8b3f4068a702..b95a9c3d9405 100644 --- a/include/trace/trace_events.h +++ b/include/trace/trace_events.h @@ -80,6 +80,16 @@ TRACE_MAKE_SYSTEM_STR(); PARAMS(print)); \ DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args)); +#undef TRACE_EVENT_MAYSLEEP +#define TRACE_EVENT_MAYSLEEP(name, proto, args, tstruct, assign, print) \ + DECLARE_EVENT_CLASS_MAYSLEEP(name, \ + PARAMS(proto), \ + PARAMS(args), \ + PARAMS(tstruct), \ + PARAMS(assign), \ + PARAMS(print)); \ + DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args)); + #undef __field #define __field(type, item) type item; @@ -118,6 +128,12 @@ TRACE_MAKE_SYSTEM_STR(); \ static struct trace_event_class event_class_##name; +#undef DECLARE_EVENT_CLASS_MAYSLEEP +#define DECLARE_EVENT_CLASS_MAYSLEEP(name, proto, args, \ + tstruct, assign, print) \ + DECLARE_EVENT_CLASS(name, PARAMS(proto), PARAMS(args), \ + PARAMS(tstruct), PARAMS(assign), PARAMS(print)) + #undef DEFINE_EVENT #define DEFINE_EVENT(template, name, proto, args) \ static struct trace_event_call __used \ @@ -141,7 +157,7 @@ TRACE_MAKE_SYSTEM_STR(); #undef TRACE_EVENT_FN_MAYSLEEP #define TRACE_EVENT_FN_MAYSLEEP(name, proto, args, tstruct, \ assign, print, reg, unreg) \ - TRACE_EVENT(name, PARAMS(proto), PARAMS(args), \ + TRACE_EVENT_MAYSLEEP(name, PARAMS(proto), PARAMS(args), \ PARAMS(tstruct), PARAMS(assign), PARAMS(print)) \ #undef TRACE_EVENT_FN_COND @@ -212,6 +228,12 @@ TRACE_MAKE_SYSTEM_STR(); tstruct; \ }; +#undef DECLARE_EVENT_CLASS_MAYSLEEP +#define DECLARE_EVENT_CLASS_MAYSLEEP(call, proto, args, \ + tstruct, assign, print) \ + DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), \ + PARAMS(tstruct), PARAMS(assign), PARAMS(print)) + #undef DEFINE_EVENT #define DEFINE_EVENT(template, name, proto, args) @@ -378,6 +400,12 @@ static struct trace_event_functions trace_event_type_funcs_##call = { \ .trace = trace_raw_output_##call, \ }; +#undef DECLARE_EVENT_CLASS_MAYSLEEP +#define DECLARE_EVENT_CLASS_MAYSLEEP(call, proto, args, \ + tstruct, assign, print) \ + DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), \ + PARAMS(tstruct), PARAMS(assign), PARAMS(print)) + #undef DEFINE_EVENT_PRINT #define DEFINE_EVENT_PRINT(template, call, proto, args, print) \ static notrace enum print_line_t \ @@ -448,6 +476,12 @@ static struct trace_event_fields trace_event_fields_##call[] = { \ tstruct \ {} }; +#undef DECLARE_EVENT_CLASS_MAYSLEEP +#define DECLARE_EVENT_CLASS_MAYSLEEP(call, proto, args, \ + tstruct, func, print) \ + DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), \ + PARAMS(tstruct), PARAMS(func), PARAMS(print)) + #undef DEFINE_EVENT_PRINT #define DEFINE_EVENT_PRINT(template, name, proto, args, print) @@ -524,6 +558,12 @@ static inline notrace int trace_event_get_offsets_##call( \ return __data_size; \ } +#undef DECLARE_EVENT_CLASS_MAYSLEEP +#define DECLARE_EVENT_CLASS_MAYSLEEP(call, proto, args, \ + tstruct, assign, print) \ + DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), \ + PARAMS(tstruct), PARAMS(assign), PARAMS(print)) + #include TRACE_INCLUDE(TRACE_INCLUDE_FILE) /* @@ -673,8 +713,8 @@ static inline notrace int trace_event_get_offsets_##call( \ #undef 
__perf_task #define __perf_task(t) (t) -#undef DECLARE_EVENT_CLASS -#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \ +#undef _DECLARE_EVENT_CLASS +#define _DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print, tp_flags) \ \ static notrace void \ trace_event_raw_event_##call(void *__data, proto) \ @@ -685,8 +725,11 @@ trace_event_raw_event_##call(void *__data, proto) \ struct trace_event_raw_##call *entry; \ int __data_size; \ \ + if ((tp_flags) & TRACEPOINT_MAYSLEEP) \ + preempt_disable_notrace(); \ + \ if (trace_trigger_soft_disabled(trace_file)) \ - return; \ + goto end; \ \ __data_size = trace_event_get_offsets_##call(&__data_offsets, args); \ \ @@ -694,14 +737,30 @@ trace_event_raw_event_##call(void *__data, proto) \ sizeof(*entry) + __data_size); \ \ if (!entry) \ - return; \ + goto end; \ \ tstruct \ \ { assign; } \ \ trace_event_buffer_commit(&fbuffer); \ +end: \ + if ((tp_flags) & TRACEPOINT_MAYSLEEP) \ + preempt_enable_notrace(); \ } + +#undef DECLARE_EVENT_CLASS +#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \ + _DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), \ + PARAMS(tstruct), PARAMS(assign), \ + PARAMS(print), 0) + +#undef DECLARE_EVENT_CLASS_MAYSLEEP +#define DECLARE_EVENT_CLASS_MAYSLEEP(call, proto, args, tstruct, assign, print) \ + _DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), \ + PARAMS(tstruct), PARAMS(assign), \ + PARAMS(print), TRACEPOINT_MAYSLEEP) + /* * The ftrace_test_probe is compiled out, it is only here as a build time check * to make sure that if the tracepoint handling changes, the ftrace probe will @@ -748,6 +807,12 @@ static struct trace_event_class __used __refdata event_class_##call = { \ _TRACE_PERF_INIT(call) \ }; +#undef DECLARE_EVENT_CLASS_MAYSLEEP +#define DECLARE_EVENT_CLASS_MAYSLEEP(call, proto, args, \ + tstruct, assign, print) \ + DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), \ + PARAMS(tstruct), PARAMS(assign), PARAMS(print)) + #undef DEFINE_EVENT #define DEFINE_EVENT(template, call, proto, args) \ \ diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index a85effb2373b..058fe2834f14 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -290,9 +290,15 @@ int trace_event_reg(struct trace_event_call *call, WARN_ON(!(call->flags & TRACE_EVENT_FL_TRACEPOINT)); switch (type) { case TRACE_REG_REGISTER: - return tracepoint_probe_register(call->tp, + if (call->tp->flags & TRACEPOINT_MAYSLEEP) + return tracepoint_probe_register_maysleep(call->tp, call->class->probe, file); + else + return tracepoint_probe_register(call->tp, + call->class->probe, + file); + case TRACE_REG_UNREGISTER: tracepoint_probe_unregister(call->tp, call->class->probe, @@ -301,7 +307,12 @@ int trace_event_reg(struct trace_event_call *call, #ifdef CONFIG_PERF_EVENTS case TRACE_REG_PERF_REGISTER: - return tracepoint_probe_register(call->tp, + if (call->tp->flags & TRACEPOINT_MAYSLEEP) + return tracepoint_probe_register_maysleep(call->tp, + call->class->perf_probe, + call); + else + return tracepoint_probe_register(call->tp, call->class->perf_probe, call); case TRACE_REG_PERF_UNREGISTER: From patchwork Fri Oct 23 19:53:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Jeanson X-Patchwork-Id: 11854481 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: 
From: Michael Jeanson To: linux-kernel@vger.kernel.org Cc: mathieu.desnoyers@efficios.com, Michael Jeanson , Steven Rostedt , Peter Zijlstra , Alexei Starovoitov , Yonghong Song , "Paul E . 
McKenney" , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Joel Fernandes , bpf@vger.kernel.org Subject: [RFC PATCH 3/6] tracing: bpf-trace: add support for sleepable tracepoints Date: Fri, 23 Oct 2020 15:53:49 -0400 Message-Id: <20201023195352.26269-4-mjeanson@efficios.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201023195352.26269-1-mjeanson@efficios.com> References: <20201023195352.26269-1-mjeanson@efficios.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC In preparation for converting system call enter/exit instrumentation into sleepable tracepoints, make sure that bpf can handle registering to such tracepoints by explicitly disabling preemption within the bpf tracepoint probes to respect the current expectations within bpf tracing code. This change does not yet allow bpf to take page faults per se within its probe, but allows its existing probes to connect to sleepable tracepoints. Co-developed-by: Mathieu Desnoyers Signed-off-by: Mathieu Desnoyers Signed-off-by: Michael Jeanson Cc: Steven Rostedt (VMware) Cc: Peter Zijlstra Cc: Alexei Starovoitov Cc: Yonghong Song Cc: Paul E. McKenney Cc: Ingo Molnar Cc: Arnaldo Carvalho de Melo Cc: Mark Rutland Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes (Google) Cc: bpf@vger.kernel.org --- include/trace/bpf_probe.h | 23 +++++++++++++++++++++-- kernel/trace/bpf_trace.c | 5 ++++- 2 files changed, 25 insertions(+), 3 deletions(-) diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h index 1ce3be63add1..d688cb9b32fe 100644 --- a/include/trace/bpf_probe.h +++ b/include/trace/bpf_probe.h @@ -55,15 +55,34 @@ /* tracepoints with more than 12 arguments will hit build error */ #define CAST_TO_U64(...) 
CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__) -#undef DECLARE_EVENT_CLASS -#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \ +#undef _DECLARE_EVENT_CLASS +#define _DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print, tp_flags) \ static notrace void \ __bpf_trace_##call(void *__data, proto) \ { \ struct bpf_prog *prog = __data; \ + \ + if ((tp_flags) & TRACEPOINT_MAYSLEEP) \ + preempt_disable_notrace(); \ + \ CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \ + \ + if ((tp_flags) & TRACEPOINT_MAYSLEEP) \ + preempt_enable_notrace(); \ } +#undef DECLARE_EVENT_CLASS +#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \ + _DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), \ + PARAMS(tstruct), PARAMS(assign), PARAMS(print), 0) + +#undef DECLARE_EVENT_CLASS_MAYSLEEP +#define DECLARE_EVENT_CLASS_MAYSLEEP(call, proto, args, tstruct, \ + assign, print) \ + _DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), \ + PARAMS(tstruct), PARAMS(assign), PARAMS(print), \ + TRACEPOINT_MAYSLEEP) + /* * This part is compiled out, it is only here as a build time check * to make sure that if the tracepoint handling changes, the diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index a8d4f253ed77..54f8b320fe2f 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1947,7 +1947,10 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog * if (prog->aux->max_tp_access > btp->writable_size) return -EINVAL; - return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog); + if (tp->flags & TRACEPOINT_MAYSLEEP) + return tracepoint_probe_register_maysleep(tp, (void *)btp->bpf_func, prog); + else + return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog); } int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog) From patchwork Fri Oct 23 19:53:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Jeanson X-Patchwork-Id: 11854487 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30781C55178 for ; Fri, 23 Oct 2020 19:54:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D12A522201 for ; Fri, 23 Oct 2020 19:54:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b="ihNsa3Sb" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755133AbgJWTyi (ORCPT ); Fri, 23 Oct 2020 15:54:38 -0400 Received: from mail.efficios.com ([167.114.26.124]:45706 "EHLO mail.efficios.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755017AbgJWTyU (ORCPT ); Fri, 23 Oct 2020 15:54:20 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.efficios.com (Postfix) with ESMTP id 586B9279698; Fri, 23 Oct 2020 15:54:19 -0400 (EDT) Received: from mail.efficios.com ([127.0.0.1]) by localhost (mail03.efficios.com [127.0.0.1]) (amavisd-new, port 
From: Michael Jeanson To: linux-kernel@vger.kernel.org Cc: mathieu.desnoyers@efficios.com, Michael Jeanson , Steven Rostedt , Peter Zijlstra , Alexei Starovoitov , Yonghong Song , "Paul E . McKenney" , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Joel Fernandes , bpf@vger.kernel.org Subject: [RFC PATCH 4/6] tracing: perf: add support for sleepable tracepoints Date: Fri, 23 Oct 2020 15:53:50 -0400 Message-Id: <20201023195352.26269-5-mjeanson@efficios.com> In-Reply-To: <20201023195352.26269-1-mjeanson@efficios.com> References: <20201023195352.26269-1-mjeanson@efficios.com> X-Patchwork-State: RFC In preparation for converting system call enter/exit instrumentation into sleepable tracepoints, make sure that perf can handle registering to such tracepoints by explicitly disabling preemption within the perf tracepoint probes to respect the current expectations within perf ring buffer code. This change does not yet allow perf to take page faults per se within its probe, but allows its existing probes to connect to sleepable tracepoints. Co-developed-by: Mathieu Desnoyers Signed-off-by: Mathieu Desnoyers Signed-off-by: Michael Jeanson Cc: Steven Rostedt (VMware) Cc: Peter Zijlstra Cc: Alexei Starovoitov Cc: Yonghong Song Cc: Paul E. 
McKenney Cc: Ingo Molnar Cc: Arnaldo Carvalho de Melo Cc: Mark Rutland Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes (Google) Cc: bpf@vger.kernel.org --- include/trace/perf.h | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-) diff --git a/include/trace/perf.h b/include/trace/perf.h index dbc6c74defc3..e1d866c3a076 100644 --- a/include/trace/perf.h +++ b/include/trace/perf.h @@ -27,8 +27,8 @@ #undef __perf_task #define __perf_task(t) (__task = (t)) -#undef DECLARE_EVENT_CLASS -#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \ +#undef _DECLARE_EVENT_CLASS +#define _DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print, tp_flags) \ static notrace void \ perf_trace_##call(void *__data, proto) \ { \ @@ -43,13 +43,16 @@ perf_trace_##call(void *__data, proto) \ int __data_size; \ int rctx; \ \ + if ((tp_flags) & TRACEPOINT_MAYSLEEP) \ + preempt_disable_notrace(); \ + \ __data_size = trace_event_get_offsets_##call(&__data_offsets, args); \ \ head = this_cpu_ptr(event_call->perf_events); \ if (!bpf_prog_array_valid(event_call) && \ __builtin_constant_p(!__task) && !__task && \ hlist_empty(head)) \ - return; \ + goto end; \ \ __entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\ sizeof(u64)); \ @@ -57,7 +60,7 @@ perf_trace_##call(void *__data, proto) \ \ entry = perf_trace_buf_alloc(__entry_size, &__regs, &rctx); \ if (!entry) \ - return; \ + goto end; \ \ perf_fetch_caller_regs(__regs); \ \ @@ -68,8 +71,23 @@ perf_trace_##call(void *__data, proto) \ perf_trace_run_bpf_submit(entry, __entry_size, rctx, \ event_call, __count, __regs, \ head, __task); \ +end: \ + if ((tp_flags) & TRACEPOINT_MAYSLEEP) \ + preempt_enable_notrace(); \ } +#undef DECLARE_EVENT_CLASS +#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \ + _DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), \ + PARAMS(tstruct), PARAMS(assign), PARAMS(print), 0) + +#undef DECLARE_EVENT_CLASS_MAYSLEEP +#define DECLARE_EVENT_CLASS_MAYSLEEP(call, proto, args, tstruct, \ + assign, print) \ + _DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), \ + PARAMS(tstruct), PARAMS(assign), PARAMS(print), \ + TRACEPOINT_MAYSLEEP) + /* * This part is compiled out, it is only here as a build time check * to make sure that if the tracepoint handling changes, the From patchwork Fri Oct 23 19:53:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Jeanson X-Patchwork-Id: 11854483 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1945BC56201 for ; Fri, 23 Oct 2020 19:54:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id ADE3524641 for ; Fri, 23 Oct 2020 19:54:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b="l+ceucGZ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755056AbgJWTyX (ORCPT ); Fri, 23 Oct 2020 
From: Michael Jeanson To: linux-kernel@vger.kernel.org Cc: mathieu.desnoyers@efficios.com, Michael Jeanson , Steven Rostedt , Peter Zijlstra , Alexei Starovoitov , Yonghong Song , "Paul E . McKenney" , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Joel Fernandes , bpf@vger.kernel.org Subject: [RFC PATCH 5/6] tracing: convert sys_enter/exit to sleepable tracepoints Date: Fri, 23 Oct 2020 15:53:51 -0400 Message-Id: <20201023195352.26269-6-mjeanson@efficios.com> In-Reply-To: <20201023195352.26269-1-mjeanson@efficios.com> References: <20201023195352.26269-1-mjeanson@efficios.com> X-Patchwork-State: RFC Convert the definition of the system call enter/exit tracepoints to sleepable tracepoints now that all upstream tracers handle it. Co-developed-by: Mathieu Desnoyers Signed-off-by: Mathieu Desnoyers Signed-off-by: Michael Jeanson Cc: Steven Rostedt (VMware) Cc: Peter Zijlstra Cc: Alexei Starovoitov Cc: Yonghong Song Cc: Paul E. 
McKenney Cc: Ingo Molnar Cc: Arnaldo Carvalho de Melo Cc: Mark Rutland Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes (Google) Cc: bpf@vger.kernel.org --- include/trace/events/syscalls.h | 4 +- kernel/trace/trace_syscalls.c | 68 ++++++++++++++++++++------------- 2 files changed, 44 insertions(+), 28 deletions(-) diff --git a/include/trace/events/syscalls.h b/include/trace/events/syscalls.h index b6e0cbc2c71f..fbb8d8b48f81 100644 --- a/include/trace/events/syscalls.h +++ b/include/trace/events/syscalls.h @@ -15,7 +15,7 @@ #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS -TRACE_EVENT_FN(sys_enter, +TRACE_EVENT_FN_MAYSLEEP(sys_enter, TP_PROTO(struct pt_regs *regs, long id), @@ -41,7 +41,7 @@ TRACE_EVENT_FN(sys_enter, TRACE_EVENT_FLAGS(sys_enter, TRACE_EVENT_FL_CAP_ANY) -TRACE_EVENT_FN(sys_exit, +TRACE_EVENT_FN_MAYSLEEP(sys_exit, TP_PROTO(struct pt_regs *regs, long ret), diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c index d85a2f0f316b..48d92d59fb92 100644 --- a/kernel/trace/trace_syscalls.c +++ b/kernel/trace/trace_syscalls.c @@ -304,21 +304,23 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id) int syscall_nr; int size; + preempt_disable_notrace(); + syscall_nr = trace_get_syscall_nr(current, regs); if (syscall_nr < 0 || syscall_nr >= NR_syscalls) - return; + goto end; /* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE) */ trace_file = rcu_dereference_sched(tr->enter_syscall_files[syscall_nr]); if (!trace_file) - return; + goto end; if (trace_trigger_soft_disabled(trace_file)) - return; + goto end; sys_data = syscall_nr_to_meta(syscall_nr); if (!sys_data) - return; + goto end; size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args; @@ -329,7 +331,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id) event = trace_buffer_lock_reserve(buffer, sys_data->enter_event->event.type, size, irq_flags, pc); if (!event) - return; + goto end; entry = ring_buffer_event_data(event); entry->nr = syscall_nr; @@ -338,6 +340,8 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id) event_trigger_unlock_commit(trace_file, buffer, event, entry, irq_flags, pc); +end: + preempt_enable_notrace(); } static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret) @@ -352,21 +356,23 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret) int pc; int syscall_nr; + preempt_disable_notrace(); + syscall_nr = trace_get_syscall_nr(current, regs); if (syscall_nr < 0 || syscall_nr >= NR_syscalls) - return; + goto end; /* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE()) */ trace_file = rcu_dereference_sched(tr->exit_syscall_files[syscall_nr]); if (!trace_file) - return; + goto end; if (trace_trigger_soft_disabled(trace_file)) - return; + goto end; sys_data = syscall_nr_to_meta(syscall_nr); if (!sys_data) - return; + goto end; local_save_flags(irq_flags); pc = preempt_count(); @@ -376,7 +382,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret) sys_data->exit_event->event.type, sizeof(*entry), irq_flags, pc); if (!event) - return; + goto end; entry = ring_buffer_event_data(event); entry->nr = syscall_nr; @@ -384,6 +390,8 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret) event_trigger_unlock_commit(trace_file, buffer, event, entry, irq_flags, pc); +end: + preempt_enable_notrace(); } static int reg_event_syscall_enter(struct trace_event_file *file, @@ -398,7 +406,7 @@ static int 
reg_event_syscall_enter(struct trace_event_file *file, return -ENOSYS; mutex_lock(&syscall_trace_lock); if (!tr->sys_refcount_enter) - ret = register_trace_sys_enter(ftrace_syscall_enter, tr); + ret = register_trace_maysleep_sys_enter(ftrace_syscall_enter, tr); if (!ret) { rcu_assign_pointer(tr->enter_syscall_files[num], file); tr->sys_refcount_enter++; @@ -436,7 +444,7 @@ static int reg_event_syscall_exit(struct trace_event_file *file, return -ENOSYS; mutex_lock(&syscall_trace_lock); if (!tr->sys_refcount_exit) - ret = register_trace_sys_exit(ftrace_syscall_exit, tr); + ret = register_trace_maysleep_sys_exit(ftrace_syscall_exit, tr); if (!ret) { rcu_assign_pointer(tr->exit_syscall_files[num], file); tr->sys_refcount_exit++; @@ -600,20 +608,22 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id) int rctx; int size; + preempt_disable_notrace(); + syscall_nr = trace_get_syscall_nr(current, regs); if (syscall_nr < 0 || syscall_nr >= NR_syscalls) - return; + goto end; if (!test_bit(syscall_nr, enabled_perf_enter_syscalls)) - return; + goto end; sys_data = syscall_nr_to_meta(syscall_nr); if (!sys_data) - return; + goto end; head = this_cpu_ptr(sys_data->enter_event->perf_events); valid_prog_array = bpf_prog_array_valid(sys_data->enter_event); if (!valid_prog_array && hlist_empty(head)) - return; + goto end; /* get the size after alignment with the u32 buffer size field */ size = sizeof(unsigned long) * sys_data->nb_args + sizeof(*rec); @@ -622,7 +632,7 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id) rec = perf_trace_buf_alloc(size, NULL, &rctx); if (!rec) - return; + goto end; rec->nr = syscall_nr; syscall_get_arguments(current, regs, args); @@ -632,12 +642,14 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id) !perf_call_bpf_enter(sys_data->enter_event, regs, sys_data, rec)) || hlist_empty(head)) { perf_swevent_put_recursion_context(rctx); - return; + goto end; } perf_trace_buf_submit(rec, size, rctx, sys_data->enter_event->event.type, 1, regs, head, NULL); +end: + preempt_enable_notrace(); } static int perf_sysenter_enable(struct trace_event_call *call) @@ -649,7 +661,7 @@ static int perf_sysenter_enable(struct trace_event_call *call) mutex_lock(&syscall_trace_lock); if (!sys_perf_refcount_enter) - ret = register_trace_sys_enter(perf_syscall_enter, NULL); + ret = register_trace_maysleep_sys_enter(perf_syscall_enter, NULL); if (ret) { pr_info("event trace: Could not activate syscall entry trace point"); } else { @@ -699,20 +711,22 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret) int rctx; int size; + preempt_disable_notrace(); + syscall_nr = trace_get_syscall_nr(current, regs); if (syscall_nr < 0 || syscall_nr >= NR_syscalls) - return; + goto end; if (!test_bit(syscall_nr, enabled_perf_exit_syscalls)) - return; + goto end; sys_data = syscall_nr_to_meta(syscall_nr); if (!sys_data) - return; + goto end; head = this_cpu_ptr(sys_data->exit_event->perf_events); valid_prog_array = bpf_prog_array_valid(sys_data->exit_event); if (!valid_prog_array && hlist_empty(head)) - return; + goto end; /* We can probably do that at build time */ size = ALIGN(sizeof(*rec) + sizeof(u32), sizeof(u64)); @@ -720,7 +734,7 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret) rec = perf_trace_buf_alloc(size, NULL, &rctx); if (!rec) - return; + goto end; rec->nr = syscall_nr; rec->ret = syscall_get_return_value(current, regs); @@ -729,11 +743,13 @@ static void perf_syscall_exit(void 
From patchwork Fri Oct 23 19:53:52 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Jeanson
X-Patchwork-Id: 11854477
From: Michael Jeanson
To: linux-kernel@vger.kernel.org
Cc: mathieu.desnoyers@efficios.com, Michael Jeanson, Steven Rostedt,
 Peter Zijlstra, Alexei Starovoitov, Yonghong Song, "Paul E. McKenney",
 Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
 Jiri Olsa, Namhyung Kim, Joel Fernandes, bpf@vger.kernel.org
Subject: [RFC PATCH 6/6] tracing: use sched-RCU instead of SRCU for rcuidle tracepoints
Date: Fri, 23 Oct 2020 15:53:52 -0400
Message-Id: <20201023195352.26269-7-mjeanson@efficios.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20201023195352.26269-1-mjeanson@efficios.com>
References: <20201023195352.26269-1-mjeanson@efficios.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-State: RFC

From: Mathieu Desnoyers

Considering that tracer callbacks expect RCU to be watching (for instance,
perf uses rcu_read_lock), we need rcuidle tracepoints to issue
rcu_irq_{enter,exit}_irqson around calls to the callbacks. So there is no
point in using SRCU anymore given that rcuidle tracepoints need to ensure
RCU is watching. Therefore, simply use sched-RCU like normal tracepoints
for rcuidle tracepoints.

Signed-off-by: Mathieu Desnoyers
Cc: Michael Jeanson
Cc: Steven Rostedt (VMware)
Cc: Peter Zijlstra
Cc: Alexei Starovoitov
Cc: Yonghong Song
Cc: Paul E. McKenney
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Joel Fernandes (Google)
Cc: bpf@vger.kernel.org
---
 include/linux/tracepoint.h | 33 +++++++--------------------------
 kernel/tracepoint.c        | 25 +++++++++----------------
 2 files changed, 16 insertions(+), 42 deletions(-)
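
[Editor's note, not part of the patch: a compilable userspace sketch of the guard ordering that __DO_TRACE() is left with after the hunks below. The *_stub() functions are hypothetical stand-ins for rcu_irq_enter_irqson()/rcu_irq_exit_irqson(), rcu_read_lock_trace()/rcu_read_unlock_trace() and preempt_disable_notrace()/preempt_enable_notrace(); only the ordering and pairing matter, with the SRCU read-side section gone.]

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel guards used by __DO_TRACE(). */
static void rcu_irq_enter_stub(void)         { puts("rcu_irq_enter_irqson()"); }
static void rcu_irq_exit_stub(void)          { puts("rcu_irq_exit_irqson()"); }
static void rcu_read_lock_trace_stub(void)   { puts("rcu_read_lock_trace()"); }
static void rcu_read_unlock_trace_stub(void) { puts("rcu_read_unlock_trace()"); }
static void preempt_disable_stub(void)       { puts("preempt_disable_notrace()"); }
static void preempt_enable_stub(void)        { puts("preempt_enable_notrace()"); }

static void call_probes(void) { puts("  -> iterate and call probes"); }

/*
 * Guard ordering after this patch: rcuidle only adds the
 * rcu_irq_{enter,exit}_irqson() pair; read-side protection is Tasks Trace
 * RCU for sleepable (maysleep) tracepoints and preemption disabling
 * otherwise.  There is no SRCU read-side section anymore.
 */
static void do_trace_sketch(bool rcuidle, bool maysleep)
{
        if (rcuidle)
                rcu_irq_enter_stub();
        if (maysleep)
                rcu_read_lock_trace_stub();
        else
                preempt_disable_stub();

        call_probes();

        if (maysleep)
                rcu_read_unlock_trace_stub();
        else
                preempt_enable_stub();
        if (rcuidle)
                rcu_irq_exit_stub();
}

int main(void)
{
        do_trace_sketch(false, true);   /* sleepable tracepoint (e.g. sys_enter) */
        do_trace_sketch(true, false);   /* rcuidle tracepoint */
        return 0;
}
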
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 0386b54cbcbb..1414b11f864b 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -13,7 +13,6 @@
  */
 
 #include <linux/smp.h>
-#include <linux/srcu.h>
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/cpumask.h>
@@ -33,8 +32,6 @@ struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO 10
 
-extern struct srcu_struct tracepoint_srcu;
-
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
@@ -86,7 +83,6 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
 static inline void tracepoint_synchronize_unregister(void)
 {
         synchronize_rcu_tasks_trace();
-        synchronize_srcu(&tracepoint_srcu);
         synchronize_rcu();
 }
 #else
@@ -175,25 +171,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
                 if (!(cond)) \
                         return; \
 \
-                /* srcu can't be used from NMI */ \
-                WARN_ON_ONCE(rcuidle && in_nmi()); \
-\
-                if (maysleep) { \
-                        might_sleep(); \
+                might_sleep_if(maysleep); \
+                if (rcuidle) \
+                        rcu_irq_enter_irqson(); \
+                if (maysleep) \
                         rcu_read_lock_trace(); \
-                } else { \
-                        /* keep srcu and sched-rcu usage consistent */ \
+                else \
                         preempt_disable_notrace(); \
-                } \
-\
-                /* \
-                 * For rcuidle callers, use srcu since sched-rcu \
-                 * doesn't work from the idle path. \
-                 */ \
-                if (rcuidle) { \
-                        __idx = srcu_read_lock_notrace(&tracepoint_srcu);\
-                        rcu_irq_enter_irqson(); \
-                } \
 \
                 it_func_ptr = rcu_dereference_raw((tp)->funcs); \
 \
@@ -205,15 +189,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
                         } while ((++it_func_ptr)->func); \
                 } \
 \
-                if (rcuidle) { \
-                        rcu_irq_exit_irqson(); \
-                        srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
-                } \
-\
                 if (maysleep) \
                         rcu_read_unlock_trace(); \
                 else \
                         preempt_enable_notrace(); \
+                if (rcuidle) \
+                        rcu_irq_exit_irqson(); \
         } while (0)
 
 #ifndef MODULE
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 8d8e41c5d8a5..68b4e50798b1 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -18,9 +18,6 @@ extern tracepoint_ptr_t __start___tracepoints_ptrs[];
 extern tracepoint_ptr_t __stop___tracepoints_ptrs[];
 
-DEFINE_SRCU(tracepoint_srcu);
-EXPORT_SYMBOL_GPL(tracepoint_srcu);
-
 /* Set to 1 to enable tracepoint debug output */
 static const int tracepoint_debug;
@@ -65,14 +62,9 @@ static void rcu_tasks_trace_free_old_probes(struct rcu_head *head)
         kfree(container_of(head, struct tp_probes, rcu));
 }
 
-static void srcu_free_old_probes(struct rcu_head *head)
-{
-        call_rcu_tasks_trace(head, rcu_tasks_trace_free_old_probes);
-}
-
 static void rcu_free_old_probes(struct rcu_head *head)
 {
-        call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+        call_rcu_tasks_trace(head, rcu_tasks_trace_free_old_probes);
 }
 
 static __init int release_early_probes(void)
@@ -90,7 +82,7 @@ static __init int release_early_probes(void)
         return 0;
 }
 
-/* SRCU and Tasks Trace RCU are initialized at core_initcall */
+/* Tasks Trace RCU is initialized at core_initcall */
 postcore_initcall(release_early_probes);
 
 static inline void release_probes(struct tracepoint_func *old)
@@ -100,9 +92,8 @@ static inline void release_probes(struct tracepoint_func *old)
                                                  struct tp_probes, probes[0]);
 
                 /*
-                 * We can't free probes if SRCU and Tasks Trace RCU are not
-                 * initialized yet. Postpone the freeing till after both are
-                 * initialized.
+                 * We can't free probes if Tasks Trace RCU is not initialized yet.
+                 * Postpone the freeing till after Tasks Trace RCU is initialized.
                  */
                 if (unlikely(!ok_to_free_tracepoints)) {
                         tp_probes->rcu.next = early_probes;
@@ -111,9 +102,11 @@ static inline void release_probes(struct tracepoint_func *old)
 
                 /*
-                 * Tracepoint probes are protected by sched RCU, SRCU and
-                 * Tasks Trace RCU by chaining the callbacks we cover all three
-                 * cases and wait for all three grace periods.
+                 * Tracepoint probes are protected by both sched RCU and
+                 * Tasks Trace RCU, by calling the Tasks Trace RCU callback in
+                 * the sched RCU callback we cover both cases. So let us chain
+                 * the Tasks Trace RCU and sched RCU callbacks to wait for both
+                 * grace periods.
                  */
                 call_rcu(&tp_probes->rcu, rcu_free_old_probes);
         }
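
[Editor's note, not part of the patch: release_probes() above waits for both grace periods without blocking by chaining the callbacks -- the sched RCU callback re-queues the freeing behind a Tasks Trace RCU grace period. The userspace model below illustrates that chaining; call_rcu_stub()/call_rcu_tasks_trace_stub() are hypothetical stand-ins that invoke their callback immediately instead of after a real grace period.]

#include <stdio.h>
#include <stdlib.h>

struct rcu_head_stub {
        void (*func)(struct rcu_head_stub *head);
};

struct tp_probes_stub {
        struct rcu_head_stub rcu;       /* must stay first for the cast below */
        int nr_probes;                  /* the old probe array would live here */
};

/*
 * Hypothetical stand-ins: the real call_rcu()/call_rcu_tasks_trace() defer the
 * callback until the corresponding grace period has elapsed; these run it
 * immediately so the chaining order is visible.
 */
static void call_rcu_stub(struct rcu_head_stub *head,
                          void (*func)(struct rcu_head_stub *head))
{
        puts("sched RCU grace period elapsed");
        func(head);
}

static void call_rcu_tasks_trace_stub(struct rcu_head_stub *head,
                                      void (*func)(struct rcu_head_stub *head))
{
        puts("Tasks Trace RCU grace period elapsed");
        func(head);
}

static void free_old_probes(struct rcu_head_stub *head)
{
        /* rcu is the first member, so the cast recovers the enclosing object */
        free((struct tp_probes_stub *)head);
        puts("old probe array freed");
}

/*
 * Mirrors rcu_free_old_probes() after this patch: the sched RCU callback
 * re-queues the freeing behind a Tasks Trace RCU grace period, so both grace
 * periods have elapsed before the probes are freed.
 */
static void rcu_free_old_probes_stub(struct rcu_head_stub *head)
{
        call_rcu_tasks_trace_stub(head, free_old_probes);
}

int main(void)
{
        struct tp_probes_stub *old = malloc(sizeof(*old));

        if (!old)
                return 1;
        call_rcu_stub(&old->rcu, rcu_free_old_probes_stub);
        return 0;
}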