From patchwork Thu Aug 29 18:37:34 2024
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13783562
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com
Cc: rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, surenb@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, Andrii Nakryiko
Subject: [PATCH v4 1/8] uprobes: revamp uprobe refcounting and lifetime management
Date: Thu, 29 Aug 2024 11:37:34 -0700
Message-ID: <20240829183741.3331213-2-andrii@kernel.org>
In-Reply-To: <20240829183741.3331213-1-andrii@kernel.org>
References: <20240829183741.3331213-1-andrii@kernel.org>
Revamp how struct uprobe is refcounted, and thus how its lifetime is managed.

Right now, there are a few possible "owners" of uprobe refcount:
  - uprobes_tree RB tree assumes one refcount when uprobe is registered and
    added to the lookup tree;
  - while uprobe is triggered and kernel is handling it in the breakpoint
    handler code, temporary refcount bump is done to keep uprobe from being
    freed;
  - if we have uretprobe requested on a given struct uprobe instance, we take
    another refcount to keep uprobe alive until user space code returns from
    the function and triggers return handler.

The uprobe_tree's extra refcount of 1 is confusing and problematic. No matter
how many actual consumers are attached, they all share the same refcount, and
we have an extra logic to drop the "last" (which might not really be last)
refcount once uprobe's consumer list becomes empty. This is unconventional
and has to be kept in mind as a special case all the time. Further, because
of this design we have the situations where find_uprobe() will find uprobe,
bump refcount, return it to the caller, but that uprobe will still need
uprobe_is_active() check, after which the caller is required to drop refcount
and try again. This is just too many details leaking to the higher level
logic.

This patch changes refcounting scheme in such a way as to not have
uprobes_tree keeping extra refcount for struct uprobe. Instead, each
uprobe_consumer is assuming its own refcount, which will be dropped when
consumer is unregistered. Other than that, all the active users of uprobe
(entry and return uprobe handling code) keeps exactly the same refcounting
approach.

With the above setup, once uprobe's refcount drops to zero, we need to make
sure that uprobe's "destructor" removes uprobe from uprobes_tree, of course.
This, though, races with uprobe entry handling code in handle_swbp(), which,
through find_active_uprobe()->find_uprobe() lookup, can race with uprobe
being destroyed after refcount drops to zero (e.g., due to uprobe_consumer
unregistering). So we add try_get_uprobe(), which will attempt to bump
refcount, unless it already is zero. Caller needs to guarantee that uprobe
instance won't be freed in parallel, which is the case while we keep
uprobes_treelock (for read or write, doesn't matter).

Note also, we now don't leak the race between registration and unregistration,
so we remove the retry logic completely. If find_uprobe() returns valid
uprobe, it's guaranteed to remain in uprobes_tree with properly incremented
refcount. The race is handled inside __insert_uprobe() and put_uprobe()
working together: __insert_uprobe() will remove uprobe from RB-tree, if it
can't bump refcount and will retry to insert the new uprobe instance.
put_uprobe() won't attempt to remove uprobe from RB-tree, if it's already not
there. All that is protected by uprobes_treelock, which keeps things simple.
Signed-off-by: Andrii Nakryiko
---
 kernel/events/uprobes.c | 179 +++++++++++++++++++++++-----------------
 1 file changed, 101 insertions(+), 78 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 33349cc8de0c..147561c19d57 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -109,6 +109,11 @@ struct xol_area {
 	unsigned long 		vaddr;		/* Page(s) of instruction slots */
 };
 
+static void uprobe_warn(struct task_struct *t, const char *msg)
+{
+	pr_warn("uprobe: %s:%d failed to %s\n", current->comm, current->pid, msg);
+}
+
 /*
  * valid_vma: Verify if the specified vma is an executable vma
  * Relax restrictions while unregistering: vm_flags might have
@@ -587,25 +592,53 @@ set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long v
 			*(uprobe_opcode_t *)&auprobe->insn);
 }
 
+/* uprobe should have guaranteed positive refcount */
 static struct uprobe *get_uprobe(struct uprobe *uprobe)
 {
 	refcount_inc(&uprobe->ref);
 	return uprobe;
 }
 
+/*
+ * uprobe should have guaranteed lifetime, which can be either of:
+ * - caller already has refcount taken (and wants an extra one);
+ * - uprobe is RCU protected and won't be freed until after grace period;
+ * - we are holding uprobes_treelock (for read or write, doesn't matter).
+ */
+static struct uprobe *try_get_uprobe(struct uprobe *uprobe)
+{
+	if (refcount_inc_not_zero(&uprobe->ref))
+		return uprobe;
+	return NULL;
+}
+
+static inline bool uprobe_is_active(struct uprobe *uprobe)
+{
+	return !RB_EMPTY_NODE(&uprobe->rb_node);
+}
+
 static void put_uprobe(struct uprobe *uprobe)
 {
-	if (refcount_dec_and_test(&uprobe->ref)) {
-		/*
-		 * If application munmap(exec_vma) before uprobe_unregister()
-		 * gets called, we don't get a chance to remove uprobe from
-		 * delayed_uprobe_list from remove_breakpoint(). Do it here.
-		 */
-		mutex_lock(&delayed_uprobe_lock);
-		delayed_uprobe_remove(uprobe, NULL);
-		mutex_unlock(&delayed_uprobe_lock);
-		kfree(uprobe);
-	}
+	if (!refcount_dec_and_test(&uprobe->ref))
+		return;
+
+	write_lock(&uprobes_treelock);
+
+	if (uprobe_is_active(uprobe))
+		rb_erase(&uprobe->rb_node, &uprobes_tree);
+
+	write_unlock(&uprobes_treelock);
+
+	/*
+	 * If application munmap(exec_vma) before uprobe_unregister()
+	 * gets called, we don't get a chance to remove uprobe from
+	 * delayed_uprobe_list from remove_breakpoint(). Do it here.
+	 */
+	mutex_lock(&delayed_uprobe_lock);
+	delayed_uprobe_remove(uprobe, NULL);
+	mutex_unlock(&delayed_uprobe_lock);
+
+	kfree(uprobe);
 }
 
 static __always_inline
@@ -656,7 +689,7 @@ static struct uprobe *__find_uprobe(struct inode *inode, loff_t offset)
 	struct rb_node *node = rb_find(&key, &uprobes_tree, __uprobe_cmp_key);
 
 	if (node)
-		return get_uprobe(__node_2_uprobe(node));
+		return try_get_uprobe(__node_2_uprobe(node));
 
 	return NULL;
 }
@@ -676,26 +709,44 @@ static struct uprobe *find_uprobe(struct inode *inode, loff_t offset)
 	return uprobe;
 }
 
+/*
+ * Attempt to insert a new uprobe into uprobes_tree.
+ *
+ * If uprobe already exists (for given inode+offset), we just increment
+ * refcount of previously existing uprobe.
+ *
+ * If not, a provided new instance of uprobe is inserted into the tree (with
+ * assumed initial refcount == 1).
+ *
+ * In any case, we return a uprobe instance that ends up being in uprobes_tree.
+ * Caller has to clean up new uprobe instance, if it ended up not being
+ * inserted into the tree.
+ *
+ * We assume that uprobes_treelock is held for writing.
+ */
 static struct uprobe *__insert_uprobe(struct uprobe *uprobe)
 {
 	struct rb_node *node;
-
+again:
 	node = rb_find_add(&uprobe->rb_node, &uprobes_tree, __uprobe_cmp);
-	if (node)
-		return get_uprobe(__node_2_uprobe(node));
+	if (node) {
+		struct uprobe *u = __node_2_uprobe(node);
 
-	/* get access + creation ref */
-	refcount_set(&uprobe->ref, 2);
-	return NULL;
+		if (!try_get_uprobe(u)) {
+			rb_erase(node, &uprobes_tree);
+			RB_CLEAR_NODE(&u->rb_node);
+			goto again;
+		}
+
+		return u;
+	}
+
+	return uprobe;
 }
 
 /*
- * Acquire uprobes_treelock.
- * Matching uprobe already exists in rbtree;
- *	increment (access refcount) and return the matching uprobe.
- *
- * No matching uprobe; insert the uprobe in rb_tree;
- *	get a double refcount (access + creation) and return NULL.
+ * Acquire uprobes_treelock and insert uprobe into uprobes_tree
+ * (or reuse existing one, see __insert_uprobe() comments above).
  */
 static struct uprobe *insert_uprobe(struct uprobe *uprobe)
 {
@@ -732,11 +783,13 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 	uprobe->ref_ctr_offset = ref_ctr_offset;
 	init_rwsem(&uprobe->register_rwsem);
 	init_rwsem(&uprobe->consumer_rwsem);
+	RB_CLEAR_NODE(&uprobe->rb_node);
+	refcount_set(&uprobe->ref, 1);
 
 	/* add to uprobes_tree, sorted on inode:offset */
 	cur_uprobe = insert_uprobe(uprobe);
 	/* a uprobe exists for this inode:offset combination */
-	if (cur_uprobe) {
+	if (cur_uprobe != uprobe) {
 		if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
 			ref_ctr_mismatch_warn(cur_uprobe, uprobe);
 			put_uprobe(cur_uprobe);
@@ -921,26 +974,6 @@ remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vad
 	return set_orig_insn(&uprobe->arch, mm, vaddr);
 }
 
-static inline bool uprobe_is_active(struct uprobe *uprobe)
-{
-	return !RB_EMPTY_NODE(&uprobe->rb_node);
-}
-/*
- * There could be threads that have already hit the breakpoint. They
- * will recheck the current insn and restart if find_uprobe() fails.
- * See find_active_uprobe().
- */
-static void delete_uprobe(struct uprobe *uprobe)
-{
-	if (WARN_ON(!uprobe_is_active(uprobe)))
-		return;
-
-	write_lock(&uprobes_treelock);
-	rb_erase(&uprobe->rb_node, &uprobes_tree);
-	write_unlock(&uprobes_treelock);
-	RB_CLEAR_NODE(&uprobe->rb_node); /* for uprobe_is_active() */
-}
-
 struct map_info {
 	struct map_info *next;
 	struct mm_struct *mm;
@@ -1094,17 +1127,13 @@ void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 	int err;
 
 	down_write(&uprobe->register_rwsem);
-	if (WARN_ON(!consumer_del(uprobe, uc)))
+	if (WARN_ON(!consumer_del(uprobe, uc))) {
 		err = -ENOENT;
-	else
+	} else {
 		err = register_for_each_vma(uprobe, NULL);
-
-	/* TODO : cant unregister? schedule a worker thread */
-	if (!err) {
-		if (!uprobe->consumers)
-			delete_uprobe(uprobe);
-		else
-			err = -EBUSY;
+		/* TODO : cant unregister? schedule a worker thread */
+		if (unlikely(err))
+			uprobe_warn(current, "unregister, leaking uprobe");
 	}
 	up_write(&uprobe->register_rwsem);
@@ -1159,27 +1188,16 @@ struct uprobe *uprobe_register(struct inode *inode,
 	if (!IS_ALIGNED(ref_ctr_offset, sizeof(short)))
 		return ERR_PTR(-EINVAL);
 
- retry:
 	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (IS_ERR(uprobe))
 		return uprobe;
 
-	/*
-	 * We can race with uprobe_unregister()->delete_uprobe().
-	 * Check uprobe_is_active() and retry if it is false.
-	 */
 	down_write(&uprobe->register_rwsem);
-	ret = -EAGAIN;
-	if (likely(uprobe_is_active(uprobe))) {
-		consumer_add(uprobe, uc);
-		ret = register_for_each_vma(uprobe, uc);
-	}
+	consumer_add(uprobe, uc);
+	ret = register_for_each_vma(uprobe, uc);
 	up_write(&uprobe->register_rwsem);
-	put_uprobe(uprobe);
 
 	if (ret) {
-		if (unlikely(ret == -EAGAIN))
-			goto retry;
 		uprobe_unregister(uprobe, uc);
 		return ERR_PTR(ret);
 	}
@@ -1286,15 +1304,17 @@ static void build_probe_list(struct inode *inode,
 			u = rb_entry(t, struct uprobe, rb_node);
 			if (u->inode != inode || u->offset < min)
 				break;
-			list_add(&u->pending_list, head);
-			get_uprobe(u);
+			/* if uprobe went away, it's safe to ignore it */
+			if (try_get_uprobe(u))
+				list_add(&u->pending_list, head);
 		}
 		for (t = n; (t = rb_next(t)); ) {
 			u = rb_entry(t, struct uprobe, rb_node);
 			if (u->inode != inode || u->offset > max)
 				break;
-			list_add(&u->pending_list, head);
-			get_uprobe(u);
+			/* if uprobe went away, it's safe to ignore it */
+			if (try_get_uprobe(u))
+				list_add(&u->pending_list, head);
 		}
 	}
 	read_unlock(&uprobes_treelock);
@@ -1752,6 +1772,12 @@ static int dup_utask(struct task_struct *t, struct uprobe_task *o_utask)
 			return -ENOMEM;
 
 		*n = *o;
+		/*
+		 * uprobe's refcnt has to be positive at this point, kept by
+		 * utask->return_instances items; return_instances can't be
+		 * removed right now, as task is blocked due to duping; so
+		 * get_uprobe() is safe to use here.
+		 */
 		get_uprobe(n->uprobe);
 		n->next = NULL;
 
@@ -1763,12 +1789,6 @@ static int dup_utask(struct task_struct *t, struct uprobe_task *o_utask)
 	return 0;
 }
 
-static void uprobe_warn(struct task_struct *t, const char *msg)
-{
-	pr_warn("uprobe: %s:%d failed to %s\n",
-		current->comm, current->pid, msg);
-}
-
 static void dup_xol_work(struct callback_head *work)
 {
 	if (current->flags & PF_EXITING)
@@ -1894,7 +1914,10 @@ static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
 		}
 		orig_ret_vaddr = utask->return_instances->orig_ret_vaddr;
 	}
-
+	/*
+	 * uprobe's refcnt is positive, held by caller, so it's safe to
+	 * unconditionally bump it one more time here
+	 */
 	ri->uprobe = get_uprobe(uprobe);
 	ri->func = instruction_pointer(regs);
 	ri->stack = user_stack_pointer(regs);

From patchwork Thu Aug 29 18:37:35 2024
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13783563
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com
Cc: rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, surenb@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, Andrii Nakryiko
Subject: [PATCH v4 2/8] uprobes: protected uprobe lifetime with SRCU
Date: Thu, 29 Aug 2024 11:37:35 -0700
Message-ID: <20240829183741.3331213-3-andrii@kernel.org>
In-Reply-To: <20240829183741.3331213-1-andrii@kernel.org>
References: <20240829183741.3331213-1-andrii@kernel.org>
To avoid unnecessarily taking a (brief) refcount on uprobe during breakpoint
handling in handle_swbp for entry uprobes, make find_uprobe() not take
refcount, but protect the lifetime of a uprobe instance with RCU. This
improves scalability, as refcount gets quite expensive due to cache line
bouncing between multiple CPUs.

Specifically, we utilize our own uprobe-specific SRCU instance for this RCU
protection. put_uprobe() will delay actual kfree() using call_srcu(). For
now, uretprobe and single-stepping handling will still acquire refcount as
necessary.
We'll address these issues in follow up patches by making them use SRCU with
timeout.

Signed-off-by: Andrii Nakryiko
---
 kernel/events/uprobes.c | 94 +++++++++++++++++++++++------------------
 1 file changed, 54 insertions(+), 40 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 147561c19d57..3e3595753e2c 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -41,6 +41,8 @@ static struct rb_root uprobes_tree = RB_ROOT;
 
 static DEFINE_RWLOCK(uprobes_treelock);	/* serialize rbtree access */
 
+DEFINE_STATIC_SRCU(uprobes_srcu);
+
 #define UPROBES_HASH_SZ	13
 /* serialize uprobe->pending_list */
 static struct mutex uprobes_mmap_mutex[UPROBES_HASH_SZ];
@@ -59,6 +61,7 @@ struct uprobe {
 	struct list_head	pending_list;
 	struct uprobe_consumer	*consumers;
 	struct inode		*inode;		/* Also hold a ref to inode */
+	struct rcu_head		rcu;
 	loff_t			offset;
 	loff_t			ref_ctr_offset;
 	unsigned long		flags;
@@ -617,6 +620,13 @@ static inline bool uprobe_is_active(struct uprobe *uprobe)
 	return !RB_EMPTY_NODE(&uprobe->rb_node);
 }
 
+static void uprobe_free_rcu(struct rcu_head *rcu)
+{
+	struct uprobe *uprobe = container_of(rcu, struct uprobe, rcu);
+
+	kfree(uprobe);
+}
+
 static void put_uprobe(struct uprobe *uprobe)
 {
 	if (!refcount_dec_and_test(&uprobe->ref))
@@ -638,7 +648,7 @@ static void put_uprobe(struct uprobe *uprobe)
 	delayed_uprobe_remove(uprobe, NULL);
 	mutex_unlock(&delayed_uprobe_lock);
 
-	kfree(uprobe);
+	call_srcu(&uprobes_srcu, &uprobe->rcu, uprobe_free_rcu);
 }
 
 static __always_inline
@@ -680,33 +690,25 @@ static inline int __uprobe_cmp(struct rb_node *a, const struct rb_node *b)
 	return uprobe_cmp(u->inode, u->offset, __node_2_uprobe(b));
 }
 
-static struct uprobe *__find_uprobe(struct inode *inode, loff_t offset)
+/*
+ * Assumes being inside RCU protected region.
+ * No refcount is taken on returned uprobe.
+ */
+static struct uprobe *find_uprobe_rcu(struct inode *inode, loff_t offset)
 {
 	struct __uprobe_key key = {
 		.inode = inode,
 		.offset = offset,
 	};
-	struct rb_node *node = rb_find(&key, &uprobes_tree, __uprobe_cmp_key);
-
-	if (node)
-		return try_get_uprobe(__node_2_uprobe(node));
-
-	return NULL;
-}
+	struct rb_node *node;
 
-/*
- * Find a uprobe corresponding to a given inode:offset
- * Acquires uprobes_treelock
- */
-static struct uprobe *find_uprobe(struct inode *inode, loff_t offset)
-{
-	struct uprobe *uprobe;
+	lockdep_assert(srcu_read_lock_held(&uprobes_srcu));
 
 	read_lock(&uprobes_treelock);
-	uprobe = __find_uprobe(inode, offset);
+	node = rb_find(&key, &uprobes_tree, __uprobe_cmp_key);
 	read_unlock(&uprobes_treelock);
 
-	return uprobe;
+	return node ? __node_2_uprobe(node) : NULL;
 }
 
 /*
@@ -1080,10 +1082,10 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
 			goto free;
 		/*
 		 * We take mmap_lock for writing to avoid the race with
-		 * find_active_uprobe() which takes mmap_lock for reading.
+		 * find_active_uprobe_rcu() which takes mmap_lock for reading.
 		 * Thus this install_breakpoint() can not make
-		 * is_trap_at_addr() true right after find_uprobe()
-		 * returns NULL in find_active_uprobe().
+		 * is_trap_at_addr() true right after find_uprobe_rcu()
+		 * returns NULL in find_active_uprobe_rcu().
 		 */
 		mmap_write_lock(mm);
 		vma = find_vma(mm, info->vaddr);
@@ -1885,9 +1887,13 @@ static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
 		return;
 	}
 
+	/* we need to bump refcount to store uprobe in utask */
+	if (!try_get_uprobe(uprobe))
+		return;
+
 	ri = kmalloc(sizeof(struct return_instance), GFP_KERNEL);
 	if (!ri)
-		return;
+		goto fail;
 
 	trampoline_vaddr = uprobe_get_trampoline_vaddr();
 	orig_ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, regs);
@@ -1914,11 +1920,7 @@ static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
 		}
 		orig_ret_vaddr = utask->return_instances->orig_ret_vaddr;
 	}
-	/*
-	 * uprobe's refcnt is positive, held by caller, so it's safe to
-	 * unconditionally bump it one more time here
-	 */
-	ri->uprobe = get_uprobe(uprobe);
+	ri->uprobe = uprobe;
 	ri->func = instruction_pointer(regs);
 	ri->stack = user_stack_pointer(regs);
 	ri->orig_ret_vaddr = orig_ret_vaddr;
@@ -1929,8 +1931,9 @@ static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
 	utask->return_instances = ri;
 
 	return;
- fail:
+fail:
 	kfree(ri);
+	put_uprobe(uprobe);
 }
 
 /* Prepare to single-step probed instruction out of line. */
@@ -1945,9 +1948,14 @@ pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, unsigned long bp_vaddr)
 	if (!utask)
 		return -ENOMEM;
 
+	if (!try_get_uprobe(uprobe))
+		return -EINVAL;
+
 	xol_vaddr = xol_get_insn_slot(uprobe);
-	if (!xol_vaddr)
-		return -ENOMEM;
+	if (!xol_vaddr) {
+		err = -ENOMEM;
+		goto err_out;
+	}
 
 	utask->xol_vaddr = xol_vaddr;
 	utask->vaddr = bp_vaddr;
@@ -1955,12 +1963,15 @@ pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, unsigned long bp_vaddr)
 	err = arch_uprobe_pre_xol(&uprobe->arch, regs);
 	if (unlikely(err)) {
 		xol_free_insn_slot(current);
-		return err;
+		goto err_out;
 	}
 
 	utask->active_uprobe = uprobe;
 	utask->state = UTASK_SSTEP;
 	return 0;
+err_out:
+	put_uprobe(uprobe);
+	return err;
 }
 
 /*
@@ -2043,7 +2054,8 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
 	return is_trap_insn(&opcode);
 }
 
-static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
+/* assumes being inside RCU protected region */
+static struct uprobe *find_active_uprobe_rcu(unsigned long bp_vaddr, int *is_swbp)
 {
 	struct mm_struct *mm = current->mm;
 	struct uprobe *uprobe = NULL;
@@ -2056,7 +2068,7 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
 			struct inode *inode = file_inode(vma->vm_file);
 			loff_t offset = vaddr_to_offset(vma, bp_vaddr);
 
-			uprobe = find_uprobe(inode, offset);
+			uprobe = find_uprobe_rcu(inode, offset);
 		}
 
 	if (!uprobe)
@@ -2202,13 +2214,15 @@ static void handle_swbp(struct pt_regs *regs)
 {
 	struct uprobe *uprobe;
 	unsigned long bp_vaddr;
-	int is_swbp;
+	int is_swbp, srcu_idx;
 
 	bp_vaddr = uprobe_get_swbp_addr(regs);
 	if (bp_vaddr == uprobe_get_trampoline_vaddr())
 		return uprobe_handle_trampoline(regs);
 
-	uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
+	srcu_idx = srcu_read_lock(&uprobes_srcu);
+
+	uprobe = find_active_uprobe_rcu(bp_vaddr, &is_swbp);
 	if (!uprobe) {
 		if (is_swbp > 0) {
 			/* No matching uprobe; signal SIGTRAP. */
@@ -2224,7 +2238,7 @@ static void handle_swbp(struct pt_regs *regs)
 			 */
 			instruction_pointer_set(regs, bp_vaddr);
 		}
-		return;
+		goto out;
 	}
 
 	/* change it in advance for ->handler() and restart */
@@ -2259,12 +2273,12 @@ static void handle_swbp(struct pt_regs *regs)
 	if (arch_uprobe_skip_sstep(&uprobe->arch, regs))
 		goto out;
 
-	if (!pre_ssout(uprobe, regs, bp_vaddr))
-		return;
+	if (pre_ssout(uprobe, regs, bp_vaddr))
+		goto out;
 
-	/* arch_uprobe_skip_sstep() succeeded, or restart if can't singlestep */
 out:
-	put_uprobe(uprobe);
+	/* arch_uprobe_skip_sstep() succeeded, or restart if can't singlestep */
+	srcu_read_unlock(&uprobes_srcu, srcu_idx);
 }
 
 /*

From patchwork Thu Aug 29 18:37:36 2024
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13783564
[145.40.73.55]) by imf15.hostedemail.com (Postfix) with ESMTP id 4685BA002A for ; Thu, 29 Aug 2024 18:37:59 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b="R0C32/Hx"; spf=pass (imf15.hostedemail.com: domain of andrii@kernel.org designates 145.40.73.55 as permitted sender) smtp.mailfrom=andrii@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1724956562; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=tdv90doT2nJa6DVyXFTavFfUQdeTgDeASjyWU3h4dEE=; b=0MaYrNSz5olDjwXnbAI6xltNP19cqUDrDn+JzCwKxKIKIjAuA2lI7b4vnQMSCdPUvFR6JX 8e2VWPsyFjG4ghnnnt20Sqt2KlIJzr93xed+UjHYuCg4BQi/KpBND5I9meDOBXAS80rT3S ohEZ4v94alLpL+9Ird2hDAMCzInF/TQ= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b="R0C32/Hx"; spf=pass (imf15.hostedemail.com: domain of andrii@kernel.org designates 145.40.73.55 as permitted sender) smtp.mailfrom=andrii@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1724956562; a=rsa-sha256; cv=none; b=j8W3e3fQkhJPwKqAOGZUMjdRvv3uJwd+BFY9dUt8xVB4b67NFUlubGLUani/z/ltf7m0AG 8sheFVvq+QWXpkLidzcDS+AEpBYp/Xm2Rk3PR9ZMVv8Mz1lTgBKOqwi2nGRITEAguY33VX BE8PWi4W0VgYoXYthAVB9y6lg6SulEI= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by sin.source.kernel.org (Postfix) with ESMTP id E9367CE1C58; Thu, 29 Aug 2024 18:37:56 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1C858C4CEC1; Thu, 29 Aug 2024 18:37:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1724956676; 
bh=uB9+IvJHW7QHyKw3tlvcdrmTqmlaEh8Wb6qWvDMkeqI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=R0C32/HxcnullYmYODfLBOAm95ENZ77LpyOSqhrTR5282TU19iHVfGCtTERRKl4Xd gEoceEOvrXWiPR5WvCbPIecsj2plew3ivwlj3WwOiAqxtwwECBThdxn+Mml3qBA7j7 RLikOuen3wU/dRuBorVMIGts13xMDZhNtC+ScbQHhOTSUgZgSpZENOzdXljKuGXaFz YeSw7eef693DWz+loTFI1HdJQofxGf/eI6YuLHudlkEPyXtX49LpuPxfuZb0xUEQ2N 3fNV9bCTWMO+3XFKlfnJzuS7l4KTglLvSyd9817qOMzE0nGaBC3Lsy+GfsR3F2RXDv ZFYHpe9IhIO1w== From: Andrii Nakryiko To: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com Cc: rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, surenb@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, Andrii Nakryiko Subject: [PATCH v4 3/8] uprobes: get rid of enum uprobe_filter_ctx in uprobe filter callbacks Date: Thu, 29 Aug 2024 11:37:36 -0700 Message-ID: <20240829183741.3331213-4-andrii@kernel.org> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20240829183741.3331213-1-andrii@kernel.org> References: <20240829183741.3331213-1-andrii@kernel.org> MIME-Version: 1.0 X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 4685BA002A X-Stat-Signature: gxdrr3aa4e5npkwhpxogpbbx56gj96ds X-Rspam-User: X-HE-Tag: 1724956679-417680 X-HE-Meta: 
U2FsdGVkX1+ZEo7zp0KaHJLBWAfEYPcU30yePXw8y2ri3NH0qw5fHDEhzWOH6voEAWLOBly58bMm9Cn+BLn9FMQuqHDjJ10tCIEq/O3bLMSHVuSvynKC05J50KnzUGGoN/Y+c2IWoonPRrdz8tHSFAW/hSKXEim1jazVKgW4r56HJx6HIK1yQ4E8aHJQryQBDlqklBQKg7khJix75A1Mf2O6eYHv0qNqPVMiSu7EWHTNvi3rcYoT1iv6BJ3MfdFWPxIE6cTblvOhKtGyDnTkSoL2c4FJrDCND8BeqF8ayCU5wZ1PqUInMcLaQ9nAo3Zyae7Kohh+eDr46H4ltQArR9ZVhEpPddKuljmPMK4jUkQVHjN3rhRwX+bnr5ZMeExr7j6NRyPs9uR2lTaMVpuK2ybsWVXpKb1PEXWxckYauqTVmOThwYmkxY30vjgLVzoKQVxOW5Qc+bsw77a+E+atoXw2/+bsFqGKVsQwzEDq+iTwOaFIAouHA6wP/Rhl89FnDzw9yz9Yy9k5BJ1HWSET9ZEyN87uVWxyjSjLwvXZzF3grnT6AzG1sDNxUEflO81txn+8I20ymlnwvJXCgYhls57mMg6w5cBYwMsRT/+Pgb/VZ9c6bT+J6sj4f+eOBO19ZWweUQCMUQlhk/Mv38lpS+VTsRuJCW/A3EIaoiigPJ31MLJbKnWitNd4HxKD/d5ZcWOYFpmgxVxC382iuTYfo9NGji3kAm67AxnszEyUXqXJhOz23eQT1ytHNNKgjM0X2P791tcCKx7miH4FacldhJkMc5pMByMfjvzduHdxsCYaB8iSYl7JUWF6zuQM1FCo3I+RAEziKd0xKIsDBvdtnE0f4DmGPsNqiVGxZk1DX2+9pULOJpF/JJT3cGspN4VuaQMXFet8Y/JsQbEtyAF1ujt1/bEBOK8ictJU/mcy9n4XVziUJZVbVyciudCw9WI7Jl2Usnu/PN3Tmgz2Q4q 0+99Xccw R7WQcxdcynR8vDnh5VTsVvw3duTePQ20/ZMMBTUdMl+TTaCKzA9uc+DMc+OON3/SGZPryk76Uj46KCRDm4zU7Ai0V+Xv8SjmvGXCp9gefWiFoeQA4kQXBK7lfSZKZL4dR41Ozb+SBnJVZNUSWojsS7XgQW+6E014wCJBmIxvrHcWz/pq8E6a0NpaYL8lvQAHAy9mKNkzsPrA0t40gYMCvb0Mn+E0Ux2c+vy35gF2oZYhocitCdpqqgo08FUneDG8xVm4TJXJoz+L3W1L011dAgMwVzw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: It serves no purpose beyond adding unnecessray argument passed to the filter callback. Just get rid of it, no one is actually using it. 
Signed-off-by: Andrii Nakryiko --- include/linux/uprobes.h | 10 +--------- kernel/events/uprobes.c | 18 +++++++----------- kernel/trace/bpf_trace.c | 3 +-- kernel/trace/trace_uprobe.c | 9 +++------ 4 files changed, 12 insertions(+), 28 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 6332c111036e..9cf0dce62e4c 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -28,20 +28,12 @@ struct page; #define MAX_URETPROBE_DEPTH 64 -enum uprobe_filter_ctx { - UPROBE_FILTER_REGISTER, - UPROBE_FILTER_UNREGISTER, - UPROBE_FILTER_MMAP, -}; - struct uprobe_consumer { int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs); int (*ret_handler)(struct uprobe_consumer *self, unsigned long func, struct pt_regs *regs); - bool (*filter)(struct uprobe_consumer *self, - enum uprobe_filter_ctx ctx, - struct mm_struct *mm); + bool (*filter)(struct uprobe_consumer *self, struct mm_struct *mm); struct uprobe_consumer *next; }; diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 3e3595753e2c..8bdcdc6901b2 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -918,21 +918,19 @@ static int prepare_uprobe(struct uprobe *uprobe, struct file *file, return ret; } -static inline bool consumer_filter(struct uprobe_consumer *uc, - enum uprobe_filter_ctx ctx, struct mm_struct *mm) +static inline bool consumer_filter(struct uprobe_consumer *uc, struct mm_struct *mm) { - return !uc->filter || uc->filter(uc, ctx, mm); + return !uc->filter || uc->filter(uc, mm); } -static bool filter_chain(struct uprobe *uprobe, - enum uprobe_filter_ctx ctx, struct mm_struct *mm) +static bool filter_chain(struct uprobe *uprobe, struct mm_struct *mm) { struct uprobe_consumer *uc; bool ret = false; down_read(&uprobe->consumer_rwsem); for (uc = uprobe->consumers; uc; uc = uc->next) { - ret = consumer_filter(uc, ctx, mm); + ret = consumer_filter(uc, mm); if (ret) break; } @@ -1099,12 +1097,10 @@ register_for_each_vma(struct uprobe 
*uprobe, struct uprobe_consumer *new) if (is_register) { /* consult only the "caller", new consumer. */ - if (consumer_filter(new, - UPROBE_FILTER_REGISTER, mm)) + if (consumer_filter(new, mm)) err = install_breakpoint(uprobe, mm, vma, info->vaddr); } else if (test_bit(MMF_HAS_UPROBES, &mm->flags)) { - if (!filter_chain(uprobe, - UPROBE_FILTER_UNREGISTER, mm)) + if (!filter_chain(uprobe, mm)) err |= remove_breakpoint(uprobe, mm, info->vaddr); } @@ -1387,7 +1383,7 @@ int uprobe_mmap(struct vm_area_struct *vma) */ list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) { if (!fatal_signal_pending(current) && - filter_chain(uprobe, UPROBE_FILTER_MMAP, vma->vm_mm)) { + filter_chain(uprobe, vma->vm_mm)) { unsigned long vaddr = offset_to_vaddr(vma, uprobe->offset); install_breakpoint(uprobe, vma->vm_mm, vma, vaddr); } diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 4e391daafa64..73c570b5988b 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -3320,8 +3320,7 @@ static int uprobe_prog_run(struct bpf_uprobe *uprobe, } static bool -uprobe_multi_link_filter(struct uprobe_consumer *con, enum uprobe_filter_ctx ctx, - struct mm_struct *mm) +uprobe_multi_link_filter(struct uprobe_consumer *con, struct mm_struct *mm) { struct bpf_uprobe *uprobe; diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index 52e76a73fa7c..7eb79e0a5352 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -1078,9 +1078,7 @@ print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *e return trace_handle_return(s); } -typedef bool (*filter_func_t)(struct uprobe_consumer *self, - enum uprobe_filter_ctx ctx, - struct mm_struct *mm); +typedef bool (*filter_func_t)(struct uprobe_consumer *self, struct mm_struct *mm); static int trace_uprobe_enable(struct trace_uprobe *tu, filter_func_t filter) { @@ -1339,8 +1337,7 @@ static int uprobe_perf_open(struct trace_event_call *call, return err; } -static 
bool uprobe_perf_filter(struct uprobe_consumer *uc,
-			enum uprobe_filter_ctx ctx, struct mm_struct *mm)
+static bool uprobe_perf_filter(struct uprobe_consumer *uc, struct mm_struct *mm)
 {
 	struct trace_uprobe_filter *filter;
 	struct trace_uprobe *tu;
@@ -1426,7 +1423,7 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
 static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs,
 			    struct uprobe_cpu_buffer **ucbp)
 {
-	if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
+	if (!uprobe_perf_filter(&tu->consumer, current->mm))
 		return UPROBE_HANDLER_REMOVE;
 
 	if (!is_ret_probe(tu))

From patchwork Thu Aug 29 18:37:37 2024
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com
Cc: rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, surenb@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, Andrii Nakryiko
Subject: [PATCH v4 4/8] uprobes: traverse uprobe's consumer list locklessly under SRCU protection
Date: Thu, 29 Aug 2024 11:37:37 -0700
Message-ID: <20240829183741.3331213-5-andrii@kernel.org>
In-Reply-To: <20240829183741.3331213-1-andrii@kernel.org>
References: <20240829183741.3331213-1-andrii@kernel.org>

uprobe->register_rwsem is one of a few big bottlenecks to scalability of uprobes, so we need to get rid of it to improve uprobe performance and multi-CPU scalability.

First, we turn uprobe's consumer list into a typical doubly-linked list and utilize existing RCU-aware helpers for traversing such lists, as well as adding and removing elements from it.

For entry uprobes we already have SRCU protection active since before uprobe lookup.
For uretprobe we keep refcount, guaranteeing that uprobe won't go away from under us, but we add SRCU protection around consumer list traversal. Lastly, to keep handler_chain()'s UPROBE_HANDLER_REMOVE handling simple, we remember whether any removal was requested during handler calls, but then we double-check the decision under a proper register_rwsem using consumers' filter callbacks. Handler removal is very rare, so this extra lock won't hurt performance, overall, but we also avoid the need for any extra protection (e.g., seqcount locks). Signed-off-by: Andrii Nakryiko --- include/linux/uprobes.h | 2 +- kernel/events/uprobes.c | 104 +++++++++++++++++++++++----------------- 2 files changed, 62 insertions(+), 44 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 9cf0dce62e4c..29c935b0d504 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -35,7 +35,7 @@ struct uprobe_consumer { struct pt_regs *regs); bool (*filter)(struct uprobe_consumer *self, struct mm_struct *mm); - struct uprobe_consumer *next; + struct list_head cons_node; }; #ifdef CONFIG_UPROBES diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 8bdcdc6901b2..97e58d160647 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -59,7 +59,7 @@ struct uprobe { struct rw_semaphore register_rwsem; struct rw_semaphore consumer_rwsem; struct list_head pending_list; - struct uprobe_consumer *consumers; + struct list_head consumers; struct inode *inode; /* Also hold a ref to inode */ struct rcu_head rcu; loff_t offset; @@ -783,6 +783,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset, uprobe->inode = inode; uprobe->offset = offset; uprobe->ref_ctr_offset = ref_ctr_offset; + INIT_LIST_HEAD(&uprobe->consumers); init_rwsem(&uprobe->register_rwsem); init_rwsem(&uprobe->consumer_rwsem); RB_CLEAR_NODE(&uprobe->rb_node); @@ -808,32 +809,19 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset, static 
void consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc) { down_write(&uprobe->consumer_rwsem); - uc->next = uprobe->consumers; - uprobe->consumers = uc; + list_add_rcu(&uc->cons_node, &uprobe->consumers); up_write(&uprobe->consumer_rwsem); } /* * For uprobe @uprobe, delete the consumer @uc. - * Return true if the @uc is deleted successfully - * or return false. + * Should never be called with consumer that's not part of @uprobe->consumers. */ -static bool consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc) +static void consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc) { - struct uprobe_consumer **con; - bool ret = false; - down_write(&uprobe->consumer_rwsem); - for (con = &uprobe->consumers; *con; con = &(*con)->next) { - if (*con == uc) { - *con = uc->next; - ret = true; - break; - } - } + list_del_rcu(&uc->cons_node); up_write(&uprobe->consumer_rwsem); - - return ret; } static int __copy_insn(struct address_space *mapping, struct file *filp, @@ -929,7 +917,8 @@ static bool filter_chain(struct uprobe *uprobe, struct mm_struct *mm) bool ret = false; down_read(&uprobe->consumer_rwsem); - for (uc = uprobe->consumers; uc; uc = uc->next) { + list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node, + srcu_read_lock_held(&uprobes_srcu)) { ret = consumer_filter(uc, mm); if (ret) break; @@ -1125,18 +1114,29 @@ void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) int err; down_write(&uprobe->register_rwsem); - if (WARN_ON(!consumer_del(uprobe, uc))) { - err = -ENOENT; - } else { - err = register_for_each_vma(uprobe, NULL); - /* TODO : cant unregister? schedule a worker thread */ - if (unlikely(err)) - uprobe_warn(current, "unregister, leaking uprobe"); - } + consumer_del(uprobe, uc); + err = register_for_each_vma(uprobe, NULL); up_write(&uprobe->register_rwsem); - if (!err) - put_uprobe(uprobe); + /* TODO : cant unregister? 
schedule a worker thread */ + if (unlikely(err)) { + uprobe_warn(current, "unregister, leaking uprobe"); + goto out_sync; + } + + put_uprobe(uprobe); + +out_sync: + /* + * Now that handler_chain() and handle_uretprobe_chain() iterate over + * uprobe->consumers list under RCU protection without holding + * uprobe->register_rwsem, we need to wait for RCU grace period to + * make sure that we can't call into just unregistered + * uprobe_consumer's callbacks anymore. If we don't do that, fast and + * unlucky enough caller can free consumer's memory and cause + * handler_chain() or handle_uretprobe_chain() to do an use-after-free. + */ + synchronize_srcu(&uprobes_srcu); } EXPORT_SYMBOL_GPL(uprobe_unregister); @@ -1214,13 +1214,20 @@ EXPORT_SYMBOL_GPL(uprobe_register); int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool add) { struct uprobe_consumer *con; - int ret = -ENOENT; + int ret = -ENOENT, srcu_idx; down_write(&uprobe->register_rwsem); - for (con = uprobe->consumers; con && con != uc ; con = con->next) - ; - if (con) - ret = register_for_each_vma(uprobe, add ? uc : NULL); + + srcu_idx = srcu_read_lock(&uprobes_srcu); + list_for_each_entry_srcu(con, &uprobe->consumers, cons_node, + srcu_read_lock_held(&uprobes_srcu)) { + if (con == uc) { + ret = register_for_each_vma(uprobe, add ? 
uc : NULL); + break; + } + } + srcu_read_unlock(&uprobes_srcu, srcu_idx); + up_write(&uprobe->register_rwsem); return ret; @@ -2085,10 +2092,12 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs) struct uprobe_consumer *uc; int remove = UPROBE_HANDLER_REMOVE; bool need_prep = false; /* prepare return uprobe, when needed */ + bool has_consumers = false; - down_read(&uprobe->register_rwsem); current->utask->auprobe = &uprobe->arch; - for (uc = uprobe->consumers; uc; uc = uc->next) { + + list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node, + srcu_read_lock_held(&uprobes_srcu)) { int rc = 0; if (uc->handler) { @@ -2101,17 +2110,24 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs) need_prep = true; remove &= rc; + has_consumers = true; } current->utask->auprobe = NULL; if (need_prep && !remove) prepare_uretprobe(uprobe, regs); /* put bp at return */ - if (remove && uprobe->consumers) { - WARN_ON(!uprobe_is_active(uprobe)); - unapply_uprobe(uprobe, current->mm); + if (remove && has_consumers) { + down_read(&uprobe->register_rwsem); + + /* re-check that removal is still required, this time under lock */ + if (!filter_chain(uprobe, current->mm)) { + WARN_ON(!uprobe_is_active(uprobe)); + unapply_uprobe(uprobe, current->mm); + } + + up_read(&uprobe->register_rwsem); } - up_read(&uprobe->register_rwsem); } static void @@ -2119,13 +2135,15 @@ handle_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs) { struct uprobe *uprobe = ri->uprobe; struct uprobe_consumer *uc; + int srcu_idx; - down_read(&uprobe->register_rwsem); - for (uc = uprobe->consumers; uc; uc = uc->next) { + srcu_idx = srcu_read_lock(&uprobes_srcu); + list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node, + srcu_read_lock_held(&uprobes_srcu)) { if (uc->ret_handler) uc->ret_handler(uc, ri->func, regs); } - up_read(&uprobe->register_rwsem); + srcu_read_unlock(&uprobes_srcu, srcu_idx); } static struct return_instance *find_next_ret_chain(struct 
return_instance *ri)

From patchwork Thu Aug 29 18:37:38 2024
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com
Cc: rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, surenb@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, Andrii Nakryiko
Subject: [PATCH v4 5/8] perf/uprobe: split uprobe_unregister()
Date: Thu, 29 Aug 2024 11:37:38 -0700
Message-ID: <20240829183741.3331213-6-andrii@kernel.org>
In-Reply-To: <20240829183741.3331213-1-andrii@kernel.org>
References: <20240829183741.3331213-1-andrii@kernel.org>
From: Peter Zijlstra

With uprobe_unregister() having grown a synchronize_srcu(), it becomes fairly slow to call, especially since both users of this API call it in a loop.

Peel off the sync_srcu() and do it once, after the loop.

We also need to add uprobe_unregister_sync() into uprobe_register()'s error handling path, as we need to be careful about returning to the caller before we have a guarantee that a partially attached consumer won't be called anymore. This is an unlikely slow path, and it should be totally fine to be slow in the case of a failed attach.

Signed-off-by: Peter Zijlstra (Intel)
Co-developed-by: Andrii Nakryiko
Signed-off-by: Andrii Nakryiko
---
 include/linux/uprobes.h                              |  8 +++++--
 kernel/events/uprobes.c                              | 21 +++++++++++++------
 kernel/trace/bpf_trace.c                             |  5 ++++-
 kernel/trace/trace_uprobe.c                          |  6 +++++-
 .../selftests/bpf/bpf_testmod/bpf_testmod.c          |  3 ++-
 5 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 29c935b0d504..e41cdae5597b 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -108,7 +108,8 @@ extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
 extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
-extern void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc);
+extern void uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc);
+extern void uprobe_unregister_sync(void);
 extern int uprobe_mmap(struct vm_area_struct *vma);
 extern void
uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end); extern void uprobe_start_dup_mmap(void); @@ -157,7 +158,10 @@ uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add) return -ENOSYS; } static inline void -uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) +uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc) +{ +} +static inline void uprobe_unregister_sync(void) { } static inline int uprobe_mmap(struct vm_area_struct *vma) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 97e58d160647..e9b755ddf960 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -1105,11 +1105,11 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) } /** - * uprobe_unregister - unregister an already registered probe. + * uprobe_unregister_nosync - unregister an already registered probe. * @uprobe: uprobe to remove * @uc: identify which probe if multiple probes are colocated. */ -void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) +void uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc) { int err; @@ -1121,12 +1121,15 @@ void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) /* TODO : cant unregister? 
schedule a worker thread */ if (unlikely(err)) { uprobe_warn(current, "unregister, leaking uprobe"); - goto out_sync; + return; } put_uprobe(uprobe); +} +EXPORT_SYMBOL_GPL(uprobe_unregister_nosync); -out_sync: +void uprobe_unregister_sync(void) +{ /* * Now that handler_chain() and handle_uretprobe_chain() iterate over * uprobe->consumers list under RCU protection without holding @@ -1138,7 +1141,7 @@ void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) */ synchronize_srcu(&uprobes_srcu); } -EXPORT_SYMBOL_GPL(uprobe_unregister); +EXPORT_SYMBOL_GPL(uprobe_unregister_sync); /** * uprobe_register - register a probe @@ -1196,7 +1199,13 @@ struct uprobe *uprobe_register(struct inode *inode, up_write(&uprobe->register_rwsem); if (ret) { - uprobe_unregister(uprobe, uc); + uprobe_unregister_nosync(uprobe, uc); + /* + * Registration might have partially succeeded, so we can have + * this consumer being called right at this time. We need to + * sync here. It's ok, it's unlikely slow path. 
+ */ + uprobe_unregister_sync(); return ERR_PTR(ret); } diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 73c570b5988b..6b632710c98e 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -3184,7 +3184,10 @@ static void bpf_uprobe_unregister(struct bpf_uprobe *uprobes, u32 cnt) u32 i; for (i = 0; i < cnt; i++) - uprobe_unregister(uprobes[i].uprobe, &uprobes[i].consumer); + uprobe_unregister_nosync(uprobes[i].uprobe, &uprobes[i].consumer); + + if (cnt) + uprobe_unregister_sync(); } static void bpf_uprobe_multi_link_release(struct bpf_link *link) diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index 7eb79e0a5352..f7443e996b1b 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -1097,6 +1097,7 @@ static int trace_uprobe_enable(struct trace_uprobe *tu, filter_func_t filter) static void __probe_event_disable(struct trace_probe *tp) { struct trace_uprobe *tu; + bool sync = false; tu = container_of(tp, struct trace_uprobe, tp); WARN_ON(!uprobe_filter_is_empty(tu->tp.event->filter)); @@ -1105,9 +1106,12 @@ static void __probe_event_disable(struct trace_probe *tp) if (!tu->uprobe) continue; - uprobe_unregister(tu->uprobe, &tu->consumer); + uprobe_unregister_nosync(tu->uprobe, &tu->consumer); + sync = true; tu->uprobe = NULL; } + if (sync) + uprobe_unregister_sync(); } static int probe_event_enable(struct trace_event_call *call, diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 3c0515a27842..1fc16657cf42 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -475,7 +475,8 @@ static void testmod_unregister_uprobe(void) mutex_lock(&testmod_uprobe_mutex); if (uprobe.uprobe) { - uprobe_unregister(uprobe.uprobe, &uprobe.consumer); + uprobe_unregister_nosync(uprobe.uprobe, &uprobe.consumer); + uprobe_unregister_sync(); path_put(&uprobe.path); 
uprobe.uprobe = NULL; }

From patchwork Thu Aug 29 18:37:39 2024
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13783567
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com
Cc: rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, surenb@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, Andrii Nakryiko
Subject: [PATCH v4 6/8] rbtree: provide rb_find_rcu() / rb_find_add_rcu()
Date: Thu, 29 Aug 2024 11:37:39 -0700
Message-ID: <20240829183741.3331213-7-andrii@kernel.org>
In-Reply-To: <20240829183741.3331213-1-andrii@kernel.org>
References: <20240829183741.3331213-1-andrii@kernel.org>
From: Peter Zijlstra

Much like latch_tree, add two RCU methods for the regular RB-tree, which can be used in conjunction with a seqcount to provide lockless lookups.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Masami Hiramatsu (Google)
---
 include/linux/rbtree.h | 67 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index f7edca369eda..7c173aa64e1e 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -244,6 +244,42 @@ rb_find_add(struct rb_node *node, struct rb_root *tree,
 	return NULL;
 }
 
+/**
+ * rb_find_add_rcu() - find equivalent @node in @tree, or add @node
+ * @node: node to look-for / insert
+ * @tree: tree to search / modify
+ * @cmp: operator defining the node order
+ *
+ * Adds a Store-Release for link_node.
+ *
+ * Returns the rb_node matching @node, or NULL when no match is found and @node
+ * is inserted.
+ */
+static __always_inline struct rb_node *
+rb_find_add_rcu(struct rb_node *node, struct rb_root *tree,
+		int (*cmp)(struct rb_node *, const struct rb_node *))
+{
+	struct rb_node **link = &tree->rb_node;
+	struct rb_node *parent = NULL;
+	int c;
+
+	while (*link) {
+		parent = *link;
+		c = cmp(node, parent);
+
+		if (c < 0)
+			link = &parent->rb_left;
+		else if (c > 0)
+			link = &parent->rb_right;
+		else
+			return parent;
+	}
+
+	rb_link_node_rcu(node, parent, link);
+	rb_insert_color(node, tree);
+	return NULL;
+}
+
 /**
  * rb_find() - find @key in tree @tree
  * @key: key to match
@@ -272,6 +308,37 @@ rb_find(const void *key, const struct rb_root *tree,
 	return NULL;
 }
 
+/**
+ * rb_find_rcu() - find @key in tree @tree
+ * @key: key to match
+ * @tree: tree to search
+ * @cmp: operator defining the node order
+ *
+ * Notably, tree descent vs concurrent tree rotations is unsound and can result
+ * in false-negatives.
+ *
+ * Returns the rb_node matching @key or NULL.
+ */
+static __always_inline struct rb_node *
+rb_find_rcu(const void *key, const struct rb_root *tree,
+	    int (*cmp)(const void *key, const struct rb_node *))
+{
+	struct rb_node *node = tree->rb_node;
+
+	while (node) {
+		int c = cmp(key, node);
+
+		if (c < 0)
+			node = rcu_dereference_raw(node->rb_left);
+		else if (c > 0)
+			node = rcu_dereference_raw(node->rb_right);
+		else
+			return node;
+	}
+
+	return NULL;
+}
+
 /**
  * rb_find_first() - find the first @key in @tree
  * @key: key to match

From patchwork Thu Aug 29 18:37:40 2024
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13783568
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com
Cc: rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, surenb@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, Andrii Nakryiko
Subject: [PATCH v4 7/8] uprobes: perform lockless SRCU-protected uprobes_tree lookup
Date: Thu, 29 Aug 2024 11:37:40 -0700
Message-ID: <20240829183741.3331213-8-andrii@kernel.org>
In-Reply-To: <20240829183741.3331213-1-andrii@kernel.org>
References: <20240829183741.3331213-1-andrii@kernel.org>
Another big bottleneck to scalability is uprobes_treelock, which is taken in a very hot path in handle_swbp(). Now that uprobes are SRCU-protected, take advantage of that and make the uprobes_tree RB-tree lookup lockless.
To make the RCU-protected lockless RB-tree lookup correct, we need to take into account that such a lookup can return false negatives if there are parallel RB-tree modifications (rotations) going on. We use a seqcount lock to detect whether the RB-tree changed: if we find nothing while the RB-tree got modified in between, we just retry. If the uprobe was found, it is guaranteed to be a correct lookup.

With all the lock-avoiding changes done, we get a pretty decent improvement in performance and scalability of uprobes with the number of CPUs, even though we are still nowhere near linear scalability. This is partly due to SRCU not scaling very well with the number of CPUs on the particular hardware used for testing (80-core Intel Xeon Gold 6138 CPU @ 2.00GHz), but also due to the remaining mmap_lock, which is currently taken to resolve the interrupt address to inode+offset and then to a uprobe instance. And, of course, uretprobes still need similar RCU treatment to avoid refcounting in the hot path, which will be addressed in follow-up patches.

Nevertheless, the improvement is good. We used BPF selftest-based uprobe-nop and uretprobe-nop benchmarks to get the numbers below, varying the number of CPUs on which uprobes and uretprobes are triggered.
BASELINE
========
uprobe-nop      ( 1 cpus):    3.032 ± 0.023M/s  (  3.032M/s/cpu)
uprobe-nop      ( 2 cpus):    3.452 ± 0.005M/s  (  1.726M/s/cpu)
uprobe-nop      ( 4 cpus):    3.663 ± 0.005M/s  (  0.916M/s/cpu)
uprobe-nop      ( 8 cpus):    3.718 ± 0.038M/s  (  0.465M/s/cpu)
uprobe-nop      (16 cpus):    3.344 ± 0.008M/s  (  0.209M/s/cpu)
uprobe-nop      (32 cpus):    2.288 ± 0.021M/s  (  0.071M/s/cpu)
uprobe-nop      (64 cpus):    3.205 ± 0.004M/s  (  0.050M/s/cpu)
uretprobe-nop   ( 1 cpus):    1.979 ± 0.005M/s  (  1.979M/s/cpu)
uretprobe-nop   ( 2 cpus):    2.361 ± 0.005M/s  (  1.180M/s/cpu)
uretprobe-nop   ( 4 cpus):    2.309 ± 0.002M/s  (  0.577M/s/cpu)
uretprobe-nop   ( 8 cpus):    2.253 ± 0.001M/s  (  0.282M/s/cpu)
uretprobe-nop   (16 cpus):    2.007 ± 0.000M/s  (  0.125M/s/cpu)
uretprobe-nop   (32 cpus):    1.624 ± 0.003M/s  (  0.051M/s/cpu)
uretprobe-nop   (64 cpus):    2.149 ± 0.001M/s  (  0.034M/s/cpu)

SRCU CHANGES
============
uprobe-nop      ( 1 cpus):    3.276 ± 0.005M/s  (  3.276M/s/cpu)
uprobe-nop      ( 2 cpus):    4.125 ± 0.002M/s  (  2.063M/s/cpu)
uprobe-nop      ( 4 cpus):    7.713 ± 0.002M/s  (  1.928M/s/cpu)
uprobe-nop      ( 8 cpus):    8.097 ± 0.006M/s  (  1.012M/s/cpu)
uprobe-nop      (16 cpus):    6.501 ± 0.056M/s  (  0.406M/s/cpu)
uprobe-nop      (32 cpus):    4.398 ± 0.084M/s  (  0.137M/s/cpu)
uprobe-nop      (64 cpus):    6.452 ± 0.000M/s  (  0.101M/s/cpu)
uretprobe-nop   ( 1 cpus):    2.055 ± 0.001M/s  (  2.055M/s/cpu)
uretprobe-nop   ( 2 cpus):    2.677 ± 0.000M/s  (  1.339M/s/cpu)
uretprobe-nop   ( 4 cpus):    4.561 ± 0.003M/s  (  1.140M/s/cpu)
uretprobe-nop   ( 8 cpus):    5.291 ± 0.002M/s  (  0.661M/s/cpu)
uretprobe-nop   (16 cpus):    5.065 ± 0.019M/s  (  0.317M/s/cpu)
uretprobe-nop   (32 cpus):    3.622 ± 0.003M/s  (  0.113M/s/cpu)
uretprobe-nop   (64 cpus):    3.723 ± 0.002M/s  (  0.058M/s/cpu)

Peak throughput increased from 3.7 mln/s (uprobe triggerings) up to about 8 mln/s. For uretprobes it's a bit more modest, with a bump from 2.4 mln/s to 5 mln/s.
Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Andrii Nakryiko --- kernel/events/uprobes.c | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index e9b755ddf960..8a464cf38127 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -40,6 +40,7 @@ static struct rb_root uprobes_tree = RB_ROOT; #define no_uprobe_events() RB_EMPTY_ROOT(&uprobes_tree) static DEFINE_RWLOCK(uprobes_treelock); /* serialize rbtree access */ +static seqcount_rwlock_t uprobes_seqcount = SEQCNT_RWLOCK_ZERO(uprobes_seqcount, &uprobes_treelock); DEFINE_STATIC_SRCU(uprobes_srcu); @@ -634,8 +635,11 @@ static void put_uprobe(struct uprobe *uprobe) write_lock(&uprobes_treelock); - if (uprobe_is_active(uprobe)) + if (uprobe_is_active(uprobe)) { + write_seqcount_begin(&uprobes_seqcount); rb_erase(&uprobe->rb_node, &uprobes_tree); + write_seqcount_end(&uprobes_seqcount); + } write_unlock(&uprobes_treelock); @@ -701,14 +705,26 @@ static struct uprobe *find_uprobe_rcu(struct inode *inode, loff_t offset) .offset = offset, }; struct rb_node *node; + unsigned int seq; lockdep_assert(srcu_read_lock_held(&uprobes_srcu)); - read_lock(&uprobes_treelock); - node = rb_find(&key, &uprobes_tree, __uprobe_cmp_key); - read_unlock(&uprobes_treelock); + do { + seq = read_seqcount_begin(&uprobes_seqcount); + node = rb_find_rcu(&key, &uprobes_tree, __uprobe_cmp_key); + /* + * Lockless RB-tree lookups can result only in false negatives. + * If the element is found, it is correct and can be returned + * under RCU protection. If we find nothing, we need to + * validate that seqcount didn't change. If it did, we have to + * try again as we might have missed the element (false + * negative). If seqcount is unchanged, search truly failed. + */ + if (node) + return __node_2_uprobe(node); + } while (read_seqcount_retry(&uprobes_seqcount, seq)); - return node ? 
__node_2_uprobe(node) : NULL; + return NULL; } /* @@ -730,7 +746,7 @@ static struct uprobe *__insert_uprobe(struct uprobe *uprobe) { struct rb_node *node; again: - node = rb_find_add(&uprobe->rb_node, &uprobes_tree, __uprobe_cmp); + node = rb_find_add_rcu(&uprobe->rb_node, &uprobes_tree, __uprobe_cmp); if (node) { struct uprobe *u = __node_2_uprobe(node); @@ -755,7 +771,9 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe) struct uprobe *u; write_lock(&uprobes_treelock); + write_seqcount_begin(&uprobes_seqcount); u = __insert_uprobe(uprobe); + write_seqcount_end(&uprobes_seqcount); write_unlock(&uprobes_treelock); return u;

From patchwork Thu Aug 29 18:37:41 2024
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13783569
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com
Cc: rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, surenb@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, Andrii Nakryiko
Subject: [PATCH v4 8/8] uprobes: switch to RCU Tasks Trace flavor for better performance
Date: Thu, 29 Aug 2024 11:37:41 -0700
Message-ID: <20240829183741.3331213-9-andrii@kernel.org>
In-Reply-To: <20240829183741.3331213-1-andrii@kernel.org>
References: <20240829183741.3331213-1-andrii@kernel.org>
This patch switches uprobes' SRCU usage to the RCU Tasks Trace flavor, which is optimized for lightweight and quick readers (at the expense of slower writers, which is a fine tradeoff for uprobes) and has better performance and scalability with the number of CPUs.

Similarly to baseline vs SRCU, we've benchmarked the SRCU-based implementation vs the RCU Tasks Trace implementation.
SRCU
====
uprobe-nop      ( 1 cpus):    3.276 ± 0.005M/s  (  3.276M/s/cpu)
uprobe-nop      ( 2 cpus):    4.125 ± 0.002M/s  (  2.063M/s/cpu)
uprobe-nop      ( 4 cpus):    7.713 ± 0.002M/s  (  1.928M/s/cpu)
uprobe-nop      ( 8 cpus):    8.097 ± 0.006M/s  (  1.012M/s/cpu)
uprobe-nop      (16 cpus):    6.501 ± 0.056M/s  (  0.406M/s/cpu)
uprobe-nop      (32 cpus):    4.398 ± 0.084M/s  (  0.137M/s/cpu)
uprobe-nop      (64 cpus):    6.452 ± 0.000M/s  (  0.101M/s/cpu)
uretprobe-nop   ( 1 cpus):    2.055 ± 0.001M/s  (  2.055M/s/cpu)
uretprobe-nop   ( 2 cpus):    2.677 ± 0.000M/s  (  1.339M/s/cpu)
uretprobe-nop   ( 4 cpus):    4.561 ± 0.003M/s  (  1.140M/s/cpu)
uretprobe-nop   ( 8 cpus):    5.291 ± 0.002M/s  (  0.661M/s/cpu)
uretprobe-nop   (16 cpus):    5.065 ± 0.019M/s  (  0.317M/s/cpu)
uretprobe-nop   (32 cpus):    3.622 ± 0.003M/s  (  0.113M/s/cpu)
uretprobe-nop   (64 cpus):    3.723 ± 0.002M/s  (  0.058M/s/cpu)

RCU Tasks Trace
===============
uprobe-nop      ( 1 cpus):    3.396 ± 0.002M/s  (  3.396M/s/cpu)
uprobe-nop      ( 2 cpus):    4.271 ± 0.006M/s  (  2.135M/s/cpu)
uprobe-nop      ( 4 cpus):    8.499 ± 0.015M/s  (  2.125M/s/cpu)
uprobe-nop      ( 8 cpus):   10.355 ± 0.028M/s  (  1.294M/s/cpu)
uprobe-nop      (16 cpus):    7.615 ± 0.099M/s  (  0.476M/s/cpu)
uprobe-nop      (32 cpus):    4.430 ± 0.007M/s  (  0.138M/s/cpu)
uprobe-nop      (64 cpus):    6.887 ± 0.020M/s  (  0.108M/s/cpu)
uretprobe-nop   ( 1 cpus):    2.174 ± 0.001M/s  (  2.174M/s/cpu)
uretprobe-nop   ( 2 cpus):    2.853 ± 0.001M/s  (  1.426M/s/cpu)
uretprobe-nop   ( 4 cpus):    4.913 ± 0.002M/s  (  1.228M/s/cpu)
uretprobe-nop   ( 8 cpus):    5.883 ± 0.002M/s  (  0.735M/s/cpu)
uretprobe-nop   (16 cpus):    5.147 ± 0.001M/s  (  0.322M/s/cpu)
uretprobe-nop   (32 cpus):    3.738 ± 0.008M/s  (  0.117M/s/cpu)
uretprobe-nop   (64 cpus):    4.397 ± 0.002M/s  (  0.069M/s/cpu)

Peak throughput for uprobes increases from 8 mln/s to 10.3 mln/s (+28%!), and for uretprobes from 5.3 mln/s to 5.8 mln/s (+11%), as we have more work to do on the uretprobe side. Even single-thread (no contention) performance is slightly better: 3.276 mln/s to 3.396 mln/s (+3.5%) for uprobes, and 2.055 mln/s to 2.174 mln/s (+5.8%) for uretprobes.
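In kernel terms, the read side this patch switches to looks like the sketch below. It paraphrases the handler_chain() change in the diff as a kernel-style illustration, not a buildable unit: the APIs rcu_read_lock_trace(), rcu_read_unlock_trace(), and list_for_each_entry_rcu() are real kernel interfaces, but the surrounding function name is hypothetical.

```c
/* Kernel-style sketch: readers delimit the section with the RCU Tasks
 * Trace primitives, while the unregister path waits for all such readers
 * with synchronize_rcu_tasks_trace() before freeing a consumer.
 */
static void consumers_walk_sketch(struct uprobe *uprobe, struct pt_regs *regs)
{
	struct uprobe_consumer *uc;

	rcu_read_lock_trace();
	list_for_each_entry_rcu(uc, &uprobe->consumers, cons_node,
				rcu_read_lock_trace_held()) {
		if (uc->handler)
			uc->handler(uc, regs);
	}
	rcu_read_unlock_trace();
}
```

The design point is that, unlike SRCU, the read-side markers here are nearly free, which is what moves the peak throughput numbers above; the cost is pushed to the (rare, already slow) unregister path.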
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/events/uprobes.c | 37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8a464cf38127..a5d39cec53d5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -42,8 +42,6 @@ static struct rb_root uprobes_tree = RB_ROOT;
 static DEFINE_RWLOCK(uprobes_treelock);	/* serialize rbtree access */
 static seqcount_rwlock_t uprobes_seqcount = SEQCNT_RWLOCK_ZERO(uprobes_seqcount, &uprobes_treelock);
 
-DEFINE_STATIC_SRCU(uprobes_srcu);
-
 #define UPROBES_HASH_SZ	13
 /* serialize uprobe->pending_list */
 static struct mutex uprobes_mmap_mutex[UPROBES_HASH_SZ];
@@ -652,7 +650,7 @@ static void put_uprobe(struct uprobe *uprobe)
 	delayed_uprobe_remove(uprobe, NULL);
 	mutex_unlock(&delayed_uprobe_lock);
 
-	call_srcu(&uprobes_srcu, &uprobe->rcu, uprobe_free_rcu);
+	call_rcu_tasks_trace(&uprobe->rcu, uprobe_free_rcu);
 }
 
 static __always_inline
@@ -707,7 +705,7 @@ static struct uprobe *find_uprobe_rcu(struct inode *inode, loff_t offset)
 	struct rb_node *node;
 	unsigned int seq;
 
-	lockdep_assert(srcu_read_lock_held(&uprobes_srcu));
+	lockdep_assert(rcu_read_lock_trace_held());
 
 	do {
 		seq = read_seqcount_begin(&uprobes_seqcount);
@@ -935,8 +933,7 @@ static bool filter_chain(struct uprobe *uprobe, struct mm_struct *mm)
 	bool ret = false;
 
 	down_read(&uprobe->consumer_rwsem);
-	list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
-				 srcu_read_lock_held(&uprobes_srcu)) {
+	list_for_each_entry_rcu(uc, &uprobe->consumers, cons_node, rcu_read_lock_trace_held()) {
 		ret = consumer_filter(uc, mm);
 		if (ret)
 			break;
@@ -1157,7 +1154,7 @@ void uprobe_unregister_sync(void)
 	 * unlucky enough caller can free consumer's memory and cause
 	 * handler_chain() or handle_uretprobe_chain() to do an use-after-free.
 	 */
-	synchronize_srcu(&uprobes_srcu);
+	synchronize_rcu_tasks_trace();
 }
 EXPORT_SYMBOL_GPL(uprobe_unregister_sync);
 
@@ -1241,19 +1238,18 @@ EXPORT_SYMBOL_GPL(uprobe_register);
 int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool add)
 {
 	struct uprobe_consumer *con;
-	int ret = -ENOENT, srcu_idx;
+	int ret = -ENOENT;
 
 	down_write(&uprobe->register_rwsem);
-	srcu_idx = srcu_read_lock(&uprobes_srcu);
-	list_for_each_entry_srcu(con, &uprobe->consumers, cons_node,
-				 srcu_read_lock_held(&uprobes_srcu)) {
+	rcu_read_lock_trace();
+	list_for_each_entry_rcu(con, &uprobe->consumers, cons_node, rcu_read_lock_trace_held()) {
 		if (con == uc) {
 			ret = register_for_each_vma(uprobe, add ? uc : NULL);
 			break;
 		}
 	}
-	srcu_read_unlock(&uprobes_srcu, srcu_idx);
+	rcu_read_unlock_trace();
 
 	up_write(&uprobe->register_rwsem);
 
@@ -2123,8 +2119,7 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 
 	current->utask->auprobe = &uprobe->arch;
-	list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
-				 srcu_read_lock_held(&uprobes_srcu)) {
+	list_for_each_entry_rcu(uc, &uprobe->consumers, cons_node, rcu_read_lock_trace_held()) {
 		int rc = 0;
 
 		if (uc->handler) {
@@ -2162,15 +2157,13 @@ handle_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs)
 {
 	struct uprobe *uprobe = ri->uprobe;
 	struct uprobe_consumer *uc;
-	int srcu_idx;
 
-	srcu_idx = srcu_read_lock(&uprobes_srcu);
-	list_for_each_entry_srcu(uc, &uprobe->consumers, cons_node,
-				 srcu_read_lock_held(&uprobes_srcu)) {
+	rcu_read_lock_trace();
+	list_for_each_entry_rcu(uc, &uprobe->consumers, cons_node, rcu_read_lock_trace_held()) {
 		if (uc->ret_handler)
 			uc->ret_handler(uc, ri->func, regs);
 	}
-	srcu_read_unlock(&uprobes_srcu, srcu_idx);
+	rcu_read_unlock_trace();
 }
 
 static struct return_instance *find_next_ret_chain(struct return_instance *ri)
@@ -2255,13 +2248,13 @@ static void handle_swbp(struct pt_regs *regs)
 {
 	struct uprobe *uprobe;
 	unsigned long bp_vaddr;
-	int is_swbp, srcu_idx;
+	int is_swbp;
 
 	bp_vaddr = uprobe_get_swbp_addr(regs);
 	if (bp_vaddr == uprobe_get_trampoline_vaddr())
 		return uprobe_handle_trampoline(regs);
 
-	srcu_idx = srcu_read_lock(&uprobes_srcu);
+	rcu_read_lock_trace();
 	uprobe = find_active_uprobe_rcu(bp_vaddr, &is_swbp);
 	if (!uprobe) {
@@ -2319,7 +2312,7 @@ static void handle_swbp(struct pt_regs *regs)
 out:
 	/* arch_uprobe_skip_sstep() succeeded, or restart if can't singlestep */
-	srcu_read_unlock(&uprobes_srcu, srcu_idx);
+	rcu_read_unlock_trace();
 }
 
 /*