
[RFC,v3,09/13] uprobes: SRCU-protect uretprobe lifetime (with timeout)

Message ID 20240813042917.506057-10-andrii@kernel.org (mailing list archive)
State Handled Elsewhere
Series uprobes: RCU-protected hot path optimizations

Checks

Context Check Description
netdev/tree_selection success Not a local patch
bpf/vmtest-bpf-PR fail merge-conflict
bpf/vmtest-bpf-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-VM_Test-13 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-14 pending Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-15 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-16 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-VM_Test-17 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-18 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-19 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-VM_Test-20 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-21 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-22 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-25 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-27 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-28 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-VM_Test-29 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-33 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-VM_Test-34 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-35 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-VM_Test-36 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-41 success Logs for x86_64-llvm-18 / veristat

Commit Message

Andrii Nakryiko Aug. 13, 2024, 4:29 a.m. UTC
Avoid taking refcount on uprobe in prepare_uretprobe(), instead take
uretprobe-specific SRCU lock and keep it active as kernel transfers
control back to user space.

Given that we can't rely on user space returning from the traced
function within a reasonable time period, we need to make sure not to
keep the SRCU lock active for too long. To that end, we employ a timer
callback which terminates the SRCU lock region after a predefined
timeout (currently set to 100ms) and transfers the underlying struct
uprobe's lifetime protection to refcounting.

Falling back to less scalable refcounting after 100ms is a fine
tradeoff from uretprobe's scalability and performance perspective,
because uretprobing long-running user functions inherently doesn't run
into scalability issues (such functions simply don't fire frequently
enough to cause noticeable performance or scalability problems).

The overall trick is in ensuring synchronization between the current
thread and the timer callback, which can fire concurrently in another
context. To keep the logic complications to a minimum, we add an
hprobe wrapper which hides all the racy synchronization details behind
a small number of basic helpers: hprobe_expire() and a matching pair
of hprobe_consume() and hprobe_finalize(). Other than that, the current
thread's logic stays the same, as the timer callback cannot modify
return_instance state (or add new/remove old return_instances). It only
takes care of SRCU unlocking and uprobe refcounting, which is hidden
from the higher-level uretprobe handling logic.

We use an atomic xchg() in hprobe_consume(), which is called from the
performance-critical handle_uretprobe_chain() function running in the
current context. When uncontended, this xchg() doesn't seem to hurt
performance, as there are no other competing CPUs fighting for the same
cache line. We also mark struct return_instance as
____cacheline_aligned to ensure no false sharing can happen.

One more technical note: we need to make sure that the list of return
instances can be safely traversed under RCU from the timer callback, so
we delay return_instance freeing with kfree_rcu() and make sure that
all list modifications use RCU-aware operations.

Also, given that the SRCU lock survives the transition from kernel to
user space and back, we need to use the lower-level __srcu_read_lock()
and __srcu_read_unlock() to avoid lockdep complaining.
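
Schematically, the lifetime handling added here looks roughly like this
(a simplified sketch of the helpers introduced below, eliding error
handling, lost-race cleanup, and the __UPROBE_DEAD marker):

	/* prepare_uretprobe(): current task, before returning to user space */
	srcu_idx = __srcu_read_lock(&uretprobes_srcu);
	hprobe_init_leased(&ri->hprobe, uprobe, srcu_idx);   /* leased == uprobe */
	rcu_assign_pointer(utask->return_instances, ri);
	if (!timer_pending(&utask->ri_timer))
		mod_timer(&utask->ri_timer, jiffies + RI_TIMER_PERIOD);

	/* ri_timer() -> hprobe_expire(): softirq, ~100ms later, if still leased */
	if (try_get_uprobe(uprobe) &&
	    cmpxchg(&hprobe->leased, uprobe, NULL) == uprobe)
		__srcu_read_unlock(&uretprobes_srcu, hprobe->srcu_idx);

	/* uprobe_handle_trampoline(): current task, when the function returns */
	uprobe = hprobe_consume(&ri->hprobe, &under_rcu);    /* xchg(&leased, NULL) */
	handle_uretprobe_chain(ri, uprobe, regs);
	hprobe_finalize(&ri->hprobe, uprobe, under_rcu);     /* SRCU unlock or put_uprobe() */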

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/uprobes.h |  49 ++++++-
 kernel/events/uprobes.c | 294 ++++++++++++++++++++++++++++++++++------
 2 files changed, 301 insertions(+), 42 deletions(-)

Comments

Oleg Nesterov Aug. 19, 2024, 1:41 p.m. UTC | #1
On 08/12, Andrii Nakryiko wrote:
>
> Avoid taking refcount on uprobe in prepare_uretprobe(), instead take
> uretprobe-specific SRCU lock and keep it active as kernel transfers
> control back to user space.
...
>  include/linux/uprobes.h |  49 ++++++-
>  kernel/events/uprobes.c | 294 ++++++++++++++++++++++++++++++++++------
>  2 files changed, 301 insertions(+), 42 deletions(-)

Oh. To be honest I don't like this patch.

I would like to know what other reviewers think, but to me it adds too many
complications that I can't even fully understand...

And how much does it help performance-wise?

I'll try to take another look, and I'll try to think about other approaches,
not that I have something better in mind...


But let's forget this patch for the moment. The next one adds even more
complications, and I think it doesn't make sense.

As I have already mentioned in the previous discussions, we can simply kill
utask->active_uprobe. And utask->auprobe.

So can't we start with the patch below? On top of your 08/13. It doesn't kill
utask->auprobe yet, this needs a bit more trivial changes.

What do you think?

Oleg.

-------------------------------------------------------------------------------
From d7cb674eb6f7bb891408b2b6a5fb872a6c2f0f6c Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Mon, 19 Aug 2024 15:34:55 +0200
Subject: [RFC PATCH] uprobe: kill uprobe_task->active_uprobe

Untested, not for inclusion yet, and I need to split it into 2 changes.
It does 2 simple things:

	1. active_uprobe != NULL is possible if and only if utask->state != 0,
	   so it turns the active_uprobe checks into the utask->state checks.

	2. handle_singlestep() doesn't really need ->active_uprobe, it only
	   needs uprobe->arch which is "const" after prepare_uprobe().

	   So this patch adds the new "arch_uprobe uarch" member into utask
	   and changes pre_ssout() to do memcpy(&utask->uarch, &uprobe->arch).
---
 include/linux/uprobes.h |  2 +-
 kernel/events/uprobes.c | 37 +++++++++++--------------------------
 2 files changed, 12 insertions(+), 27 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 3a3154b74fe0..df6f3dab032c 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -56,6 +56,7 @@ struct uprobe_task {
 
 	union {
 		struct {
+			struct arch_uprobe	uarch;
 			struct arch_uprobe_task	autask;
 			unsigned long		vaddr;
 		};
@@ -66,7 +67,6 @@ struct uprobe_task {
 		};
 	};
 
-	struct uprobe			*active_uprobe;
 	unsigned long			xol_vaddr;
 
 	struct arch_uprobe              *auprobe;
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index acc73c1bc54c..9689b557a5cf 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1721,7 +1721,7 @@ unsigned long uprobe_get_trap_addr(struct pt_regs *regs)
 {
 	struct uprobe_task *utask = current->utask;
 
-	if (unlikely(utask && utask->active_uprobe))
+	if (unlikely(utask && utask->state))
 		return utask->vaddr;
 
 	return instruction_pointer(regs);
@@ -1747,9 +1747,6 @@ void uprobe_free_utask(struct task_struct *t)
 	if (!utask)
 		return;
 
-	if (utask->active_uprobe)
-		put_uprobe(utask->active_uprobe);
-
 	ri = utask->return_instances;
 	while (ri)
 		ri = free_ret_instance(ri);
@@ -1965,14 +1962,9 @@ pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, unsigned long bp_vaddr)
 	if (!utask)
 		return -ENOMEM;
 
-	if (!try_get_uprobe(uprobe))
-		return -EINVAL;
-
 	xol_vaddr = xol_get_insn_slot(uprobe);
-	if (!xol_vaddr) {
-		err = -ENOMEM;
-		goto err_out;
-	}
+	if (!xol_vaddr)
+		return -ENOMEM;
 
 	utask->xol_vaddr = xol_vaddr;
 	utask->vaddr = bp_vaddr;
@@ -1980,15 +1972,12 @@ pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, unsigned long bp_vaddr)
 	err = arch_uprobe_pre_xol(&uprobe->arch, regs);
 	if (unlikely(err)) {
 		xol_free_insn_slot(current);
-		goto err_out;
+		return err;
 	}
 
-	utask->active_uprobe = uprobe;
+	memcpy(&utask->uarch, &uprobe->arch, sizeof(utask->uarch));
 	utask->state = UTASK_SSTEP;
 	return 0;
-err_out:
-	put_uprobe(uprobe);
-	return err;
 }
 
 /*
@@ -2005,7 +1994,7 @@ bool uprobe_deny_signal(void)
 	struct task_struct *t = current;
 	struct uprobe_task *utask = t->utask;
 
-	if (likely(!utask || !utask->active_uprobe))
+	if (likely(!utask || !utask->state))
 		return false;
 
 	WARN_ON_ONCE(utask->state != UTASK_SSTEP);
@@ -2313,19 +2302,15 @@ static void handle_swbp(struct pt_regs *regs)
  */
 static void handle_singlestep(struct uprobe_task *utask, struct pt_regs *regs)
 {
-	struct uprobe *uprobe;
 	int err = 0;
 
-	uprobe = utask->active_uprobe;
 	if (utask->state == UTASK_SSTEP_ACK)
-		err = arch_uprobe_post_xol(&uprobe->arch, regs);
+		err = arch_uprobe_post_xol(&utask->uarch, regs);
 	else if (utask->state == UTASK_SSTEP_TRAPPED)
-		arch_uprobe_abort_xol(&uprobe->arch, regs);
+		arch_uprobe_abort_xol(&utask->uarch, regs);
 	else
 		WARN_ON_ONCE(1);
 
-	put_uprobe(uprobe);
-	utask->active_uprobe = NULL;
 	utask->state = UTASK_RUNNING;
 	xol_free_insn_slot(current);
 
@@ -2342,7 +2327,7 @@ static void handle_singlestep(struct uprobe_task *utask, struct pt_regs *regs)
 /*
  * On breakpoint hit, breakpoint notifier sets the TIF_UPROBE flag and
  * allows the thread to return from interrupt. After that handle_swbp()
- * sets utask->active_uprobe.
+ * sets utask->state != 0.
  *
  * On singlestep exception, singlestep notifier sets the TIF_UPROBE flag
  * and allows the thread to return from interrupt.
@@ -2357,7 +2342,7 @@ void uprobe_notify_resume(struct pt_regs *regs)
 	clear_thread_flag(TIF_UPROBE);
 
 	utask = current->utask;
-	if (utask && utask->active_uprobe)
+	if (utask && utask->state)
 		handle_singlestep(utask, regs);
 	else
 		handle_swbp(regs);
@@ -2388,7 +2373,7 @@ int uprobe_post_sstep_notifier(struct pt_regs *regs)
 {
 	struct uprobe_task *utask = current->utask;
 
-	if (!current->mm || !utask || !utask->active_uprobe)
+	if (!current->mm || !utask || !utask->state)
 		/* task is currently not uprobed */
 		return 0;
Andrii Nakryiko Aug. 19, 2024, 8:34 p.m. UTC | #2
On Mon, Aug 19, 2024 at 6:41 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 08/12, Andrii Nakryiko wrote:
> >
> > Avoid taking refcount on uprobe in prepare_uretprobe(), instead take
> > uretprobe-specific SRCU lock and keep it active as kernel transfers
> > control back to user space.
> ...
> >  include/linux/uprobes.h |  49 ++++++-
> >  kernel/events/uprobes.c | 294 ++++++++++++++++++++++++++++++++++------
> >  2 files changed, 301 insertions(+), 42 deletions(-)
>
> Oh. To be honest I don't like this patch.
>
> I would like to know what other reviewers think, but to me it adds too many
> complications that I can't even fully understand...

Which parts? The atomic xchg() and cmpxchg() parts? What exactly do
you feel like you don't fully understand?

>
> And how much does it help performance-wise?

A lot, as we increase uprobe parallelism. Here's a subset of
benchmarks for 1-4, 8, 16, 32, 64, and 80 threads firing uretprobes,
with and without this SRCU change (both configurations include all the
other changes, including the lockless VMA lookup). The difference is
noticeable already with just two competing CPUs/threads, and it just
gets much worse from there.

Of course in production you shouldn't come close to such rates of
uprobe/uretprobe firing, so this is definitely a microbenchmark
emphasizing the sharing between CPUs, but it still adds up. And we do
have production use cases that would like to fire uprobes at 100K+ per
second rates.

WITH SRCU for uretprobes
========================
uretprobe-nop         ( 1 cpus):    1.968 ± 0.001M/s  (  1.968M/s/cpu)
uretprobe-nop         ( 2 cpus):    3.739 ± 0.003M/s  (  1.869M/s/cpu)
uretprobe-nop         ( 3 cpus):    5.616 ± 0.003M/s  (  1.872M/s/cpu)
uretprobe-nop         ( 4 cpus):    7.286 ± 0.002M/s  (  1.822M/s/cpu)
uretprobe-nop         ( 8 cpus):   13.657 ± 0.007M/s  (  1.707M/s/cpu)
uretprobe-nop         (32 cpus):   45.305 ± 0.066M/s  (  1.416M/s/cpu)
uretprobe-nop         (64 cpus):   42.390 ± 0.922M/s  (  0.662M/s/cpu)
uretprobe-nop         (80 cpus):   47.554 ± 2.411M/s  (  0.594M/s/cpu)

WITHOUT SRCU for uretprobes
===========================
uretprobe-nop         ( 1 cpus):    2.197 ± 0.002M/s  (  2.197M/s/cpu)
uretprobe-nop         ( 2 cpus):    3.325 ± 0.001M/s  (  1.662M/s/cpu)
uretprobe-nop         ( 3 cpus):    4.129 ± 0.002M/s  (  1.376M/s/cpu)
uretprobe-nop         ( 4 cpus):    6.180 ± 0.003M/s  (  1.545M/s/cpu)
uretprobe-nop         ( 8 cpus):    7.323 ± 0.005M/s  (  0.915M/s/cpu)
uretprobe-nop         (16 cpus):    6.943 ± 0.005M/s  (  0.434M/s/cpu)
uretprobe-nop         (32 cpus):    5.931 ± 0.014M/s  (  0.185M/s/cpu)
uretprobe-nop         (64 cpus):    5.145 ± 0.003M/s  (  0.080M/s/cpu)
uretprobe-nop         (80 cpus):    4.925 ± 0.005M/s  (  0.062M/s/cpu)

>
> I'll try to take another look, and I'll try to think about other approaches,
> not that I have something better in mind...

Ok.

>
>
> But let's forget this patch for the moment. The next one adds even more
> complications, and I think it doesn't make sense.
>

"Even more complications" is a bit of an overstatement. It just
applies everything we do for uretprobes in this patch to a very
straightforward single-stepped case.

> As I have already mentioned in the previous discussions, we can simply kill
> utask->active_uprobe. And utask->auprobe.

I don't have anything against that, in principle, but let's benchmark
and test that thoroughly. I'm a bit uneasy about the possibility that
some arch-specific code will do container_of() on this arch_uprobe in
order to get to uprobe; we'd need to audit all the code to make sure
that can't happen. Also, it's a bit unfortunate that we have to assume
that struct arch_uprobe is small on all architectures, and that there
is no code that assumes it can't be moved, etc., etc. (I also don't get
why you need memcpy.)

>
> So can't we start with the patch below? On top of your 08/13. It doesn't kill
> utask->auprobe yet, this needs a bit more trivial changes.
>
> What do you think?

I think that the single-stepped case isn't the main use case (typically
a uprobe/uretprobe will be installed on a nop or push %rbp, both of
which are emulated). uretprobes, though, are the main use case (along
with optimized entry uprobes). So what we do about the single-stepped
case is a bit secondary (for me, looking at production use cases).

But we do need to do something with uretprobes first and foremost.

>
> Oleg.
>
> -------------------------------------------------------------------------------
> From d7cb674eb6f7bb891408b2b6a5fb872a6c2f0f6c Mon Sep 17 00:00:00 2001
> From: Oleg Nesterov <oleg@redhat.com>
> Date: Mon, 19 Aug 2024 15:34:55 +0200
> Subject: [RFC PATCH] uprobe: kill uprobe_task->active_uprobe
>
> Untested, not for inclusion yet, and I need to split it into 2 changes.
> It does 2 simple things:
>
>         1. active_uprobe != NULL is possible if and only if utask->state != 0,
>            so it turns the active_uprobe checks into the utask->state checks.
>
>         2. handle_singlestep() doesn't really need ->active_uprobe, it only
>            needs uprobe->arch which is "const" after prepare_uprobe().
>
>            So this patch adds the new "arch_uprobe uarch" member into utask
>            and changes pre_ssout() to do memcpy(&utask->uarch, &uprobe->arch).
> ---
>  include/linux/uprobes.h |  2 +-
>  kernel/events/uprobes.c | 37 +++++++++++--------------------------
>  2 files changed, 12 insertions(+), 27 deletions(-)

[...]
Oleg Nesterov Aug. 20, 2024, 3:05 p.m. UTC | #3
On 08/19, Andrii Nakryiko wrote:
>
> On Mon, Aug 19, 2024 at 6:41 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 08/12, Andrii Nakryiko wrote:
> > >
> > > Avoid taking refcount on uprobe in prepare_uretprobe(), instead take
> > > uretprobe-specific SRCU lock and keep it active as kernel transfers
> > > control back to user space.
> > ...
> > >  include/linux/uprobes.h |  49 ++++++-
> > >  kernel/events/uprobes.c | 294 ++++++++++++++++++++++++++++++++++------
> > >  2 files changed, 301 insertions(+), 42 deletions(-)
> >
> > Oh. To be honest I don't like this patch.
> >
> > I would like to know what other reviewers think, but to me it adds too many
> > complications that I can't even fully understand...
>
> Which parts? The atomic xchg() and cmpxchg() parts? What exactly do
> you feel like you don't fully understand?

Heh, everything looks too complex for me ;)

Say, hprobe_expire(). It is also called by ri_timer() in softirq context,
right? And it does

	/* We lost the race, undo our refcount bump. It can drop to zero. */
	put_uprobe(uprobe);

How so? If the refcount goes to zero, put_uprobe() does mutex_lock(),
but it must not sleep in softirq context.


Or, prepare_uretprobe() which does

	rcu_assign_pointer(utask->return_instances, ri);

	if (!timer_pending(&utask->ri_timer))
		mod_timer(&utask->ri_timer, ...);

Suppose that the timer was pending and it was fired right before
rcu_assign_pointer(). What guarantees that prepare_uretprobe() will see
timer_pending() == false?

rcu_assign_pointer()->smp_store_release() is a one-way barrier. This
timer_pending() check may appear to happen before rcu_assign_pointer()
completes.

In this (yes, theoretical) case ri_timer() can miss the new return_instance,
while prepare_uretprobe() can miss the necessary mod_timer(). I think this
needs another mb() in between.
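
IOW, roughly this interleaving (the timer_pending() load on the
prepare_uretprobe() side is allowed to be satisfied before the release
store publishes the new entry):

	CPU 0 (prepare_uretprobe)              CPU 1 (ri_timer)
	-------------------------              ----------------
	timer_pending() load returns true
	  (reordered before the store below)
	                                       timer fires, becomes !pending
	                                       walks return_instances, does
	                                         not see the new ri
	rcu_assign_pointer(return_instances, ri)
	skips mod_timer()
	/* the new ri is never visited by ri_timer() */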


And I can't convince myself hprobe_expire() is correct... OK, I don't
fully understand the logic, but why data_race(READ_ONCE(hprobe->leased)) ?
READ_ONCE() should be enough in this case?


> > As I have already mentioned in the previous discussions, we can simply kill
> > utask->active_uprobe. And utask->auprobe.
>
> I don't have anything against that, in principle, but let's benchmark
> and test that thoroughly. I'm a bit uneasy about the possibility that
> some arch-specific code will do container_of() on this arch_uprobe in
> order to get to uprobe,

Well, struct uprobe is not "exported", the arch-specific code can't do this.

Oleg.
Andrii Nakryiko Aug. 20, 2024, 6:01 p.m. UTC | #4
On Tue, Aug 20, 2024 at 8:06 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 08/19, Andrii Nakryiko wrote:
> >
> > On Mon, Aug 19, 2024 at 6:41 AM Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > On 08/12, Andrii Nakryiko wrote:
> > > >
> > > > Avoid taking refcount on uprobe in prepare_uretprobe(), instead take
> > > > uretprobe-specific SRCU lock and keep it active as kernel transfers
> > > > control back to user space.
> > > ...
> > > >  include/linux/uprobes.h |  49 ++++++-
> > > >  kernel/events/uprobes.c | 294 ++++++++++++++++++++++++++++++++++------
> > > >  2 files changed, 301 insertions(+), 42 deletions(-)
> > >
> > > Oh. To be honest I don't like this patch.
> > >
> > > I would like to know what other reviewers think, but to me it adds too many
> > > complications that I can't even fully understand...
> >
> > Which parts? The atomic xchg() and cmpxchg() parts? What exactly do
> > you feel like you don't fully understand?
>
> Heh, everything looks too complex for me ;)

Well, the best code is no code. But I'm not doing this just for fun,
so I'm happy with the simplest solution *that works*.

>
> Say, hprobe_expire(). It is also called by ri_timer() in softirq context,
> right? And it does
>
>         /* We lost the race, undo our refcount bump. It can drop to zero. */
>         put_uprobe(uprobe);
>
> How so? If the refcount goes to zero, put_uprobe() does mutex_lock(),
> but it must not sleep in softirq context.
>

Now we are talking about specific issues, thank you! It's hard to
discuss "I don't like".

Yes, I think you are right and it is certainly a problem with
put_uprobe() that it can't be called from softirq (just having to
remember that is error-prone, as is evidenced by me forgetting about
this issue).

It's easy enough to solve. We can either schedule work from the timer
callback (and that will solve this particular issue only), or we can
teach put_uprobe() to schedule a work item if it drops the refcount to
zero from softirq or another restricted context.

I vote for making put_uprobe() flexible in this regard: add a
work_struct to struct uprobe and schedule all of this to be done from
a workqueue callback. WDYT?
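
Roughly along these lines (an untested sketch; uprobe_free() here is a
hypothetical helper holding the current tail of put_uprobe(), and
struct uprobe grows a work_struct member):

	static void uprobe_free_deferred(struct work_struct *work)
	{
		struct uprobe *uprobe = container_of(work, struct uprobe, work);

		/* process context, so delayed_uprobe_lock etc. are fine here */
		uprobe_free(uprobe);
	}

	static void put_uprobe(struct uprobe *uprobe)
	{
		if (!refcount_dec_and_test(&uprobe->ref))
			return;

		if (!in_task()) {
			/* softirq (e.g. ri_timer()) or other atomic context */
			INIT_WORK(&uprobe->work, uprobe_free_deferred);
			schedule_work(&uprobe->work);
			return;
		}

		uprobe_free(uprobe);
	}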

>
> Or, prepare_uretprobe() which does
>
>         rcu_assign_pointer(utask->return_instances, ri);
>
>         if (!timer_pending(&utask->ri_timer))
>                 mod_timer(&utask->ri_timer, ...);
>
> Suppose that the timer was pending and it was fired right before
> rcu_assign_pointer(). What guarantees that prepare_uretprobe() will see
> timer_pending() == false?
>
> rcu_assign_pointer()->smp_store_release() is a one-way barrier. This
> timer_pending() check may appear to happen before rcu_assign_pointer()
> completes.
>
> In this (yes, theoretical) case ri_timer() can miss the new return_instance,
> while prepare_uretprobe() can miss the necessary mod_timer(). I think this
> needs another mb() in between.
>

Ok, that's fair. I felt like this pattern might be a bit problematic,
but I also felt like it's good to have it, to ensure that we do
occasionally get a race between the timer callback and the uretprobe,
even if the uretprobe returns very quickly. (And I did confirm we get
those races and they seem to be handled fine, i.e., I saw uprobes being
"expired" into refcounted ones from ri_timer.)

But the really simple way to solve this is to do unconditional
mod_timer(), so I can do just that to keep this less tricky. Would you
be ok with that?
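
I.e. something like this at the end of prepare_uretprobe() (sketch):

	hprobe_init_leased(&ri->hprobe, uprobe, srcu_idx);
	ri->next = utask->return_instances;
	rcu_assign_pointer(utask->return_instances, ri);

	/*
	 * Re-arm unconditionally: mod_timer() on an already-pending timer
	 * just updates its expiry, and we no longer depend on the ordering
	 * between the rcu_assign_pointer() above and a timer_pending() read.
	 */
	mod_timer(&utask->ri_timer, jiffies + RI_TIMER_PERIOD);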

>
> And I can't convince myself hprobe_expire() is correct... OK, I don't
> fully understand the logic, but why data_race(READ_ONCE(hprobe->leased)) ?
> READ_ONCE() should be enough in this case?

You mean why the data_race() annotation? To appease KCSAN, given that
we modify hprobe->leased with xchg()/cmpxchg() but read it here with
READ_ONCE(). Maybe I'm overthinking it, not sure. There is a reason
why this is an RFC ;)

>
>
> > > As I have already mentioned in the previous discussions, we can simply kill
> > > utask->active_uprobe. And utask->auprobe.
> >
> > I don't have anything against that, in principle, but let's benchmark
> > and test that thoroughly. I'm a bit uneasy about the possibility that
> > some arch-specific code will do container_of() on this arch_uprobe in
> > order to get to uprobe,
>
> Well, struct uprobe is not "exported", the arch-specific code can't do this.
>

Ah, good point, that's great. But as I said, uretprobes are actually
*the common* use case, not single-stepped uprobes. I'm still not very
happy about that memcpy() and the assumption that it's cheap, but
that's minor.

But no matter what we do for the single-stepped case, uretprobes need
some solution. (And if that solution works for uretprobes, why wouldn't
it work for single-step?...)

> Oleg.
>

Patch

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index e41cdae5597b..9a0aa0b2a5fe 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -15,6 +15,7 @@ 
 #include <linux/rbtree.h>
 #include <linux/types.h>
 #include <linux/wait.h>
+#include <linux/timer.h>
 
 struct uprobe;
 struct vm_area_struct;
@@ -48,6 +49,45 @@  enum uprobe_task_state {
 	UTASK_SSTEP_TRAPPED,
 };
 
+/*
+ * Hybrid lifetime uprobe. Represents a uprobe instance that could be either
+ * SRCU protected (with SRCU protection eventually potentially timing out),
+ * refcounted using uprobe->ref, or there could be no valid uprobe (NULL).
+ *
+ * hprobe's internal state is set up such that background timer thread can
+ * atomically "downgrade" temporarily RCU-protected uprobe into refcounted one
+ * (or no uprobe, if refcounting failed).
+ *
+ * *stable* pointer always points to the uprobe (or could be NULL if there
+ * was no valid underlying uprobe to begin with).
+ *
+ * *leased* pointer is the key to achieving race-free atomic lifetime state
+ * transition and can have three possible states:
+ *   - either the same non-NULL value as *stable*, in which case uprobe is
+ *     SRCU-protected;
+ *   - NULL, in which case uprobe (if there is any) is refcounted;
+ *   - special __UPROBE_DEAD value, which represents an uprobe that was SRCU
+ *     protected initially, but SRCU period timed out and we attempted to
+ *     convert it to refcounted, but refcount_inc_not_zero() failed, because
+ *     uprobe effectively went away (the last consumer unsubscribed). In this
+ *     case it's important to know that *stable* pointer (which still has
+ *     non-NULL uprobe pointer) shouldn't be used, because lifetime of
+ *     underlying uprobe is not guaranteed anymore. __UPROBE_DEAD is just an
+ *     internal marker and is handled transparently by hprobe_consume() helper.
+ *
+ * When uprobe is SRCU-protected, we also record srcu_idx value, necessary for
+ * SRCU unlocking.
+ *
+ * See hprobe_expire() and hprobe_consume() for details of race-free uprobe
+ * state transitioning. It all hinges on atomic xchg() over the *leased*
+ * pointer. *stable* pointer, once initially set, is not modified concurrently.
+ */
+struct hprobe {
+	struct uprobe *stable;
+	struct uprobe *leased;
+	int srcu_idx;
+};
+
 /*
  * uprobe_task: Metadata of a task while it singlesteps.
  */
@@ -68,6 +108,7 @@  struct uprobe_task {
 	};
 
 	struct uprobe			*active_uprobe;
+	struct timer_list		ri_timer;
 	unsigned long			xol_vaddr;
 
 	struct arch_uprobe              *auprobe;
@@ -77,14 +118,18 @@  struct uprobe_task {
 };
 
 struct return_instance {
-	struct uprobe		*uprobe;
 	unsigned long		func;
 	unsigned long		stack;		/* stack pointer */
 	unsigned long		orig_ret_vaddr; /* original return address */
 	bool			chained;	/* true, if instance is nested */
 
 	struct return_instance	*next;		/* keep as stack */
-};
+
+	union {
+		struct hprobe		hprobe;
+		struct rcu_head		rcu;
+	};
+} ____cacheline_aligned;
 
 enum rp_check {
 	RP_CHECK_CALL,
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 0480ad841942..26acd06871e6 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -26,6 +26,7 @@ 
 #include <linux/task_work.h>
 #include <linux/shmem_fs.h>
 #include <linux/khugepaged.h>
+#include <linux/srcu.h>
 
 #include <linux/uprobes.h>
 
@@ -49,6 +50,9 @@  static struct mutex uprobes_mmap_mutex[UPROBES_HASH_SZ];
 
 DEFINE_STATIC_PERCPU_RWSEM(dup_mmap_sem);
 
+/* Covers return_instance's uprobe lifetime. */
+DEFINE_STATIC_SRCU(uretprobes_srcu);
+
 /* Have a copy of original instruction */
 #define UPROBE_COPY_INSN	0
 
@@ -594,13 +598,6 @@  set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long v
 			*(uprobe_opcode_t *)&auprobe->insn);
 }
 
-/* uprobe should have guaranteed positive refcount */
-static struct uprobe *get_uprobe(struct uprobe *uprobe)
-{
-	refcount_inc(&uprobe->ref);
-	return uprobe;
-}
-
 /*
  * uprobe should have guaranteed lifetime, which can be either of:
  *   - caller already has refcount taken (and wants an extra one);
@@ -619,13 +616,20 @@  static inline bool uprobe_is_active(struct uprobe *uprobe)
 	return !RB_EMPTY_NODE(&uprobe->rb_node);
 }
 
-static void uprobe_free_rcu(struct rcu_head *rcu)
+static void uprobe_free_rcu_tasks_trace(struct rcu_head *rcu)
 {
 	struct uprobe *uprobe = container_of(rcu, struct uprobe, rcu);
 
 	kfree(uprobe);
 }
 
+static void uprobe_free_srcu(struct rcu_head *rcu)
+{
+	struct uprobe *uprobe = container_of(rcu, struct uprobe, rcu);
+
+	call_rcu_tasks_trace(&uprobe->rcu, uprobe_free_rcu_tasks_trace);
+}
+
 static void put_uprobe(struct uprobe *uprobe)
 {
 	if (!refcount_dec_and_test(&uprobe->ref))
@@ -650,7 +654,146 @@  static void put_uprobe(struct uprobe *uprobe)
 	delayed_uprobe_remove(uprobe, NULL);
 	mutex_unlock(&delayed_uprobe_lock);
 
-	call_rcu_tasks_trace(&uprobe->rcu, uprobe_free_rcu);
+	/* start srcu -> rcu_tasks_trace -> kfree chain */
+	call_srcu(&uretprobes_srcu, &uprobe->rcu, uprobe_free_srcu);
+}
+
+/*
+ * Special marker pointer for when ri_timer() expired, unlocking RCU, but
+ * failed to acquire refcount on uprobe (because it doesn't have any
+ * associated consumer anymore, for example). In such case it's important for
+ * hprobe_consume() to return NULL uprobe, instead of "stable" uprobe pointer,
+ * as that one isn't protected by either refcount or RCU region now.
+ */
+#define __UPROBE_DEAD ((struct uprobe *)(-0xdead))
+
+#define RI_TIMER_PERIOD (HZ/10) /* 100 ms */
+
+/* Initialize hprobe as SRCU-protected "leased" uprobe */
+static void hprobe_init_leased(struct hprobe *hprobe, struct uprobe *uprobe, int srcu_idx)
+{
+	hprobe->srcu_idx = srcu_idx;
+	hprobe->stable = uprobe;
+	hprobe->leased = uprobe;
+}
+
+/* Initialize hprobe as refcounted ("stable") uprobe (uprobe can be NULL). */
+static void hprobe_init_stable(struct hprobe *hprobe, struct uprobe *uprobe)
+{
+	hprobe->srcu_idx = -1;
+	hprobe->stable = uprobe;
+	hprobe->leased = NULL;
+}
+
+/*
+ * hprobe_consume() fetches hprobe's underlying uprobe and detects whether
+ * uprobe is still SRCU protected, or is refcounted. hprobe_consume() can be
+ * used only once for a given hprobe.
+ *
+ * Caller has to perform SRCU unlock if under_rcu is set to true;
+ * otherwise, either properly refcounted uprobe is returned or NULL.
+ */
+static inline struct uprobe *hprobe_consume(struct hprobe *hprobe, bool *under_rcu)
+{
+	struct uprobe *uprobe;
+
+	uprobe = xchg(&hprobe->leased, NULL);
+	if (uprobe) {
+		if (unlikely(uprobe == __UPROBE_DEAD)) {
+			*under_rcu = false;
+			return NULL;
+		}
+
+		*under_rcu = true;
+		return uprobe;
+	}
+
+	*under_rcu = false;
+	return hprobe->stable;
+}
+
+/*
+ * Reset hprobe state and, if under_rcu is true, release SRCU lock.
+ * hprobe_finalize() can only be used from current context after
+ * hprobe_consume() call (which determines uprobe and under_rcu value).
+ */
+static void hprobe_finalize(struct hprobe *hprobe, struct uprobe *uprobe, bool under_rcu)
+{
+	if (under_rcu)
+		__srcu_read_unlock(&uretprobes_srcu, hprobe->srcu_idx);
+	else if (uprobe)
+		put_uprobe(uprobe);
+	/* prevent free_ret_instance() from double-putting uprobe */
+	hprobe->stable = NULL;
+}
+
+/*
+ * Attempt to switch (atomically) uprobe from being RCU protected ("leased")
+ * to refcounted ("stable") state. Competes with hprobe_consume(); only one of
+ * them can win the race to perform SRCU unlocking. Whoever wins must perform
+ * SRCU unlock.
+ *
+ * Returns underlying valid uprobe or NULL, if there was no underlying uprobe
+ * to begin with or we failed to bump its refcount and it's going away.
+ *
+ * Returned non-NULL uprobe can still be safely used within an ongoing SRCU
+ * locked region. It's not guaranteed that returned uprobe has a positive
+ * refcount, so caller has to attempt try_get_uprobe(), if it needs to use
+ * returned uprobe instance beyond ongoing SRCU lock region. See dup_utask().
+ */
+static struct uprobe *hprobe_expire(struct hprobe *hprobe)
+{
+	struct uprobe *uprobe;
+
+	/*
+	 * return_instance's hprobe is protected by RCU.
+	 * Underlying uprobe is itself protected from reuse by SRCU.
+	 */
+	lockdep_assert(rcu_read_lock_held() && srcu_read_lock_held(&uretprobes_srcu));
+
+	/*
+	 * Leased pointer can only be NULL, __UPROBE_DEAD, or some valid uprobe
+	 * pointer. This pointer can only be updated to NULL or __UPROBE_DEAD,
+	 * not any other valid uprobe pointer. So it's safe to fetch it with
+	 * READ_ONCE() and try to refcount it, if it's not NULL or __UPROBE_DEAD.
+	 */
+	uprobe = data_race(READ_ONCE(hprobe->leased));
+	if (!uprobe || uprobe == __UPROBE_DEAD)
+		return NULL;
+
+	if (!try_get_uprobe(uprobe)) {
+		/*
+		 * hprobe_consume() might have xchg()'ed to NULL already,
+		 * in which case we shouldn't set __UPROBE_DEAD.
+		 */
+		cmpxchg(&hprobe->leased, uprobe, __UPROBE_DEAD);
+		return NULL;
+	}
+
+	/*
+	 * Even if hprobe_consume() won and unlocked SRCU, we still have
+	 * a guarantee that uprobe won't be freed (and thus won't be reused)
+	 * because our caller maintains its own SRCU locked region.
+	 * So cmpxchg() below is well-formed.
+	 */
+	if (cmpxchg(&hprobe->leased, uprobe, NULL)) {
+		/*
+		 * At this point uprobe is properly refcounted, so it's safe
+		 * to end its original SRCU locked region.
+		 */
+		__srcu_read_unlock(&uretprobes_srcu, hprobe->srcu_idx);
+		return uprobe;
+	}
+
+	/* We lost the race, undo our refcount bump. It can drop to zero. */
+	put_uprobe(uprobe);
+
+	/*
+	 * We return underlying uprobe nevertheless because it's still valid
+	 * until the end of current SRCU locked region, and can be used to
+	 * try_get_uprobe(). This is used in dup_utask().
+	 */
+	return uprobe;
 }
 
 static __always_inline
@@ -1727,11 +1870,18 @@  unsigned long uprobe_get_trap_addr(struct pt_regs *regs)
 	return instruction_pointer(regs);
 }
 
-static struct return_instance *free_ret_instance(struct return_instance *ri)
+static struct return_instance *free_ret_instance(struct return_instance *ri, bool cleanup_hprobe)
 {
 	struct return_instance *next = ri->next;
-	put_uprobe(ri->uprobe);
-	kfree(ri);
+	struct uprobe *uprobe;
+	bool under_rcu;
+
+	if (cleanup_hprobe) {
+		uprobe = hprobe_consume(&ri->hprobe, &under_rcu);
+		hprobe_finalize(&ri->hprobe, uprobe, under_rcu);
+	}
+
+	kfree_rcu(ri, rcu);
 	return next;
 }
 
@@ -1747,18 +1897,51 @@  void uprobe_free_utask(struct task_struct *t)
 	if (!utask)
 		return;
 
+	timer_delete_sync(&utask->ri_timer);
+
 	if (utask->active_uprobe)
 		put_uprobe(utask->active_uprobe);
 
 	ri = utask->return_instances;
 	while (ri)
-		ri = free_ret_instance(ri);
+		ri = free_ret_instance(ri, true /* cleanup_hprobe */);
 
 	xol_free_insn_slot(t);
 	kfree(utask);
 	t->utask = NULL;
 }
 
+#define for_each_ret_instance_rcu(pos, head) \
+	for (pos = rcu_dereference_raw(head); pos; pos = rcu_dereference_raw(pos->next))
+
+static void ri_timer(struct timer_list *timer)
+{
+	struct uprobe_task *utask = container_of(timer, struct uprobe_task, ri_timer);
+	struct return_instance *ri;
+
+	/* SRCU protects uprobe from reuse for the cmpxchg() inside hprobe_expire(). */
+	guard(srcu)(&uretprobes_srcu);
+	/* RCU protects return_instance from freeing. */
+	guard(rcu)();
+
+	for_each_ret_instance_rcu(ri, utask->return_instances) {
+		hprobe_expire(&ri->hprobe);
+	}
+}
+
+static struct uprobe_task *alloc_utask(void)
+{
+	struct uprobe_task *utask;
+
+	utask = kzalloc(sizeof(*utask), GFP_KERNEL);
+	if (!utask)
+		return NULL;
+
+	timer_setup(&utask->ri_timer, ri_timer, 0);
+
+	return utask;
+}
+
 /*
  * Allocate a uprobe_task object for the task if necessary.
  * Called when the thread hits a breakpoint.
@@ -1770,7 +1953,7 @@  void uprobe_free_utask(struct task_struct *t)
 static struct uprobe_task *get_utask(void)
 {
 	if (!current->utask)
-		current->utask = kzalloc(sizeof(struct uprobe_task), GFP_KERNEL);
+		current->utask = alloc_utask();
 	return current->utask;
 }
 
@@ -1778,12 +1961,16 @@  static int dup_utask(struct task_struct *t, struct uprobe_task *o_utask)
 {
 	struct uprobe_task *n_utask;
 	struct return_instance **p, *o, *n;
+	struct uprobe *uprobe;
 
-	n_utask = kzalloc(sizeof(struct uprobe_task), GFP_KERNEL);
+	n_utask = alloc_utask();
 	if (!n_utask)
 		return -ENOMEM;
 	t->utask = n_utask;
 
+	/* protect uprobes from freeing; we'll need to try_get_uprobe() them */
+	guard(srcu)(&uretprobes_srcu);
+
 	p = &n_utask->return_instances;
 	for (o = o_utask->return_instances; o; o = o->next) {
 		n = kmalloc(sizeof(struct return_instance), GFP_KERNEL);
@@ -1791,17 +1978,24 @@  static int dup_utask(struct task_struct *t, struct uprobe_task *o_utask)
 			return -ENOMEM;
 
 		*n = *o;
+
+		/* see hprobe_expire() comments */
+		uprobe = hprobe_expire(&o->hprobe);
+		if (uprobe) /* refcount bump for new utask */
+			uprobe = try_get_uprobe(uprobe);
+
 		/*
-		 * uprobe's refcnt has to be positive at this point, kept by
-		 * utask->return_instances items; return_instances can't be
-		 * removed right now, as task is blocked due to duping; so
-		 * get_uprobe() is safe to use here.
+		 * New utask will have stable properly refcounted uprobe or NULL.
+		 * Even if we failed to get refcounted uprobe, we still need
+		 * to preserve full set of return_instances for proper
+		 * uretprobe handling and nesting in forked task.
 		 */
-		get_uprobe(n->uprobe);
-		n->next = NULL;
+		hprobe_init_stable(&n->hprobe, uprobe);
 
-		*p = n;
+		n->next = NULL;
+		rcu_assign_pointer(*p, n);
 		p = &n->next;
+
 		n_utask->depth++;
 	}
 
@@ -1877,10 +2071,10 @@  static void cleanup_return_instances(struct uprobe_task *utask, bool chained,
 	enum rp_check ctx = chained ? RP_CHECK_CHAIN_CALL : RP_CHECK_CALL;
 
 	while (ri && !arch_uretprobe_is_alive(ri, ctx, regs)) {
-		ri = free_ret_instance(ri);
+		ri = free_ret_instance(ri, true /* cleanup_hprobe */);
 		utask->depth--;
 	}
-	utask->return_instances = ri;
+	rcu_assign_pointer(utask->return_instances, ri);
 }
 
 static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
@@ -1889,6 +2083,7 @@  static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
 	struct uprobe_task *utask;
 	unsigned long orig_ret_vaddr, trampoline_vaddr;
 	bool chained;
+	int srcu_idx;
 
 	if (!get_xol_area())
 		return;
@@ -1904,10 +2099,6 @@  static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
 		return;
 	}
 
-	/* we need to bump refcount to store uprobe in utask */
-	if (!try_get_uprobe(uprobe))
-		return;
-
 	ri = kmalloc(sizeof(struct return_instance), GFP_KERNEL);
 	if (!ri)
 		goto fail;
@@ -1937,20 +2128,36 @@  static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
 		}
 		orig_ret_vaddr = utask->return_instances->orig_ret_vaddr;
 	}
-	ri->uprobe = uprobe;
+
+	/* __srcu_read_lock() because SRCU lock survives switch to user space */
+	srcu_idx = __srcu_read_lock(&uretprobes_srcu);
+
 	ri->func = instruction_pointer(regs);
 	ri->stack = user_stack_pointer(regs);
 	ri->orig_ret_vaddr = orig_ret_vaddr;
 	ri->chained = chained;
 
 	utask->depth++;
+
+	hprobe_init_leased(&ri->hprobe, uprobe, srcu_idx);
 	ri->next = utask->return_instances;
-	utask->return_instances = ri;
+	rcu_assign_pointer(utask->return_instances, ri);
+
+	/*
+	 * Don't reschedule if timer is already active. This way we have
+	 * a guaranteed cap on maximum timer period (SRCU expiration duration)
+	 * regardless of how long and well-timed a uretprobe chain user space
+	 * might cause. At worst we'll just have a few extra inconsequential
+	 * refcount bumps even if we could, technically, get away with just an
+	 * SRCU lock. On the other hand, we get timer expiration logic
+	 * triggered and tested regularly even for very short-running uretprobes.
+	 */
+	if (!timer_pending(&utask->ri_timer))
+		mod_timer(&utask->ri_timer, jiffies + RI_TIMER_PERIOD);
 
 	return;
 fail:
 	kfree(ri);
-	put_uprobe(uprobe);
 }
 
 /* Prepare to single-step probed instruction out of line. */
@@ -2144,11 +2351,14 @@  static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 }
 
 static void
-handle_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs)
+handle_uretprobe_chain(struct return_instance *ri, struct uprobe *uprobe, struct pt_regs *regs)
 {
-	struct uprobe *uprobe = ri->uprobe;
 	struct uprobe_consumer *uc;
 
+	/* all consumers unsubscribed meanwhile */
+	if (unlikely(!uprobe))
+		return;
+
 	rcu_read_lock_trace();
 	list_for_each_entry_rcu(uc, &uprobe->consumers, cons_node, rcu_read_lock_trace_held()) {
 		if (uc->ret_handler)
@@ -2173,7 +2383,8 @@  void uprobe_handle_trampoline(struct pt_regs *regs)
 {
 	struct uprobe_task *utask;
 	struct return_instance *ri, *next;
-	bool valid;
+	struct uprobe *uprobe;
+	bool valid, under_rcu;
 
 	utask = current->utask;
 	if (!utask)
@@ -2203,21 +2414,24 @@  void uprobe_handle_trampoline(struct pt_regs *regs)
 			 * trampoline addresses on the stack are replaced with correct
 			 * original return addresses
 			 */
-			utask->return_instances = ri->next;
+			rcu_assign_pointer(utask->return_instances, ri->next);
+
+			uprobe = hprobe_consume(&ri->hprobe, &under_rcu);
 			if (valid)
-				handle_uretprobe_chain(ri, regs);
-			ri = free_ret_instance(ri);
+				handle_uretprobe_chain(ri, uprobe, regs);
+			hprobe_finalize(&ri->hprobe, uprobe, under_rcu);
+
+			/* We already took care of hprobe, no need to waste more time on that. */
+			ri = free_ret_instance(ri, false /* !cleanup_hprobe */);
 			utask->depth--;
 		} while (ri != next);
 	} while (!valid);
 
-	utask->return_instances = ri;
 	return;
 
- sigill:
+sigill:
 	uprobe_warn(current, "handle uretprobe, sending SIGILL.");
 	force_sig(SIGILL);
-
 }
 
 bool __weak arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs)