From patchwork Tue Sep 3 17:46:02 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13789159
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com
Cc: rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org,
 linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org,
 willy@infradead.org, surenb@google.com, akpm@linux-foundation.org,
 linux-mm@kvack.org, Andrii Nakryiko
Subject: [PATCH v5 7/8] uprobes: perform lockless SRCU-protected uprobes_tree lookup
Date: Tue, 3 Sep 2024 10:46:02 -0700
Message-ID: <20240903174603.3554182-8-andrii@kernel.org>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20240903174603.3554182-1-andrii@kernel.org>
References: <20240903174603.3554182-1-andrii@kernel.org>

Another big bottleneck to scalability is uprobe_treelock that's taken in
a very hot path in handle_swbp(). Now that uprobes are SRCU-protected,
take advantage of that and make uprobes_tree RB-tree lookup lockless.

To make RB-tree RCU-protected lockless lookup correct, we need to take
into account that such RB-tree lookup can return false negatives if there
are parallel RB-tree modifications (rotations) going on. We use a seqcount
to detect whether the RB-tree changed, and if we find nothing while the
RB-tree got modified in between, we just retry. If a uprobe was found,
then it's guaranteed to be a correct lookup.

With all the lock-avoiding changes done, we get a pretty decent
improvement in performance and scalability of uprobes with number of
CPUs, even though we are still nowhere near linear scalability. This is
due to SRCU not really scaling very well with number of CPUs on the
particular hardware that was used for testing (80-core Intel Xeon Gold
6138 CPU @ 2.00GHz), but also due to the remaining mmap_lock, which is
currently taken to resolve interrupt address to inode+offset and then
uprobe instance. And, of course, uretprobes still need similar RCU to
avoid refcount in the hot path, which will be addressed in the follow-up
patches.

Nevertheless, the improvement is good.
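
The seqcount retry rule described earlier can be sketched in user space as
follows. This is only an illustration under stated assumptions, not the
kernel implementation: a flat array of C11 atomics stands in for
uprobes_tree, a plain atomic counter stands in for seqcount_rwlock_t,
writers are assumed to be serialized by an external lock (the role
uprobes_treelock plays in the patch), and the names slots, seq,
insert_key(), move_key() and lookup() are made up for the example. All
atomics use the default sequentially consistent ordering to keep the
reasoning simple.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 8
#define EMPTY  0L			/* keys must be non-zero */

static _Atomic long slots[NSLOTS];	/* hypothetical stand-in for the tree */
static atomic_uint seq;			/* even: stable, odd: writer active */

/* Writer side. Callers must serialize writers with an external lock. */
static void write_begin(void)
{
	unsigned int s = atomic_fetch_add(&seq, 1);

	assert((s & 1) == 0);		/* no nested or concurrent writer */
	(void)s;
}

static void write_end(void)
{
	atomic_fetch_add(&seq, 1);	/* back to an even value */
}

static void insert_key(size_t slot, long key)
{
	write_begin();
	atomic_store(&slots[slot], key);
	write_end();
}

/* Analog of a rotation: the key is briefly unreachable for readers. */
static void move_key(size_t from, size_t to)
{
	long key;

	write_begin();
	key = atomic_load(&slots[from]);
	atomic_store(&slots[from], EMPTY);	/* key invisible from here... */
	atomic_store(&slots[to], key);		/* ...until it lands here */
	write_end();
}

/* Reader side: wait for an even count, then remember it. */
static unsigned int read_begin(void)
{
	unsigned int s;

	while ((s = atomic_load(&seq)) & 1)
		;			/* writer in progress, spin */
	return s;
}

static bool read_retry(unsigned int start)
{
	return atomic_load(&seq) != start;
}

/* Lockless lookup: a hit is trusted, a miss only if seq did not move. */
static bool lookup(long key)
{
	unsigned int s;

	do {
		s = read_begin();
		for (size_t i = 0; i < NSLOTS; i++)
			if (atomic_load(&slots[i]) == key)
				return true;	/* key was present when read */
	} while (read_retry(s));	/* possible false negative, retry */

	return false;			/* nothing changed: a real miss */
}
```

A reader that scans the array while move_key() is halfway through can miss
the key, but the counter has changed by then, so lookup() repeats the scan
instead of reporting a miss. In the kernel, SRCU additionally keeps a found
uprobe alive for the reader, which this sketch does not model.
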
We used BPF selftest-based uprobe-nop and uretprobe-nop benchmarks to get
the below numbers, varying the number of CPUs on which uprobes and
uretprobes are triggered.

BASELINE
========
uprobe-nop      ( 1 cpus):    3.032 ± 0.023M/s  (  3.032M/s/cpu)
uprobe-nop      ( 2 cpus):    3.452 ± 0.005M/s  (  1.726M/s/cpu)
uprobe-nop      ( 4 cpus):    3.663 ± 0.005M/s  (  0.916M/s/cpu)
uprobe-nop      ( 8 cpus):    3.718 ± 0.038M/s  (  0.465M/s/cpu)
uprobe-nop      (16 cpus):    3.344 ± 0.008M/s  (  0.209M/s/cpu)
uprobe-nop      (32 cpus):    2.288 ± 0.021M/s  (  0.071M/s/cpu)
uprobe-nop      (64 cpus):    3.205 ± 0.004M/s  (  0.050M/s/cpu)

uretprobe-nop   ( 1 cpus):    1.979 ± 0.005M/s  (  1.979M/s/cpu)
uretprobe-nop   ( 2 cpus):    2.361 ± 0.005M/s  (  1.180M/s/cpu)
uretprobe-nop   ( 4 cpus):    2.309 ± 0.002M/s  (  0.577M/s/cpu)
uretprobe-nop   ( 8 cpus):    2.253 ± 0.001M/s  (  0.282M/s/cpu)
uretprobe-nop   (16 cpus):    2.007 ± 0.000M/s  (  0.125M/s/cpu)
uretprobe-nop   (32 cpus):    1.624 ± 0.003M/s  (  0.051M/s/cpu)
uretprobe-nop   (64 cpus):    2.149 ± 0.001M/s  (  0.034M/s/cpu)

SRCU CHANGES
============
uprobe-nop      ( 1 cpus):    3.276 ± 0.005M/s  (  3.276M/s/cpu)
uprobe-nop      ( 2 cpus):    4.125 ± 0.002M/s  (  2.063M/s/cpu)
uprobe-nop      ( 4 cpus):    7.713 ± 0.002M/s  (  1.928M/s/cpu)
uprobe-nop      ( 8 cpus):    8.097 ± 0.006M/s  (  1.012M/s/cpu)
uprobe-nop      (16 cpus):    6.501 ± 0.056M/s  (  0.406M/s/cpu)
uprobe-nop      (32 cpus):    4.398 ± 0.084M/s  (  0.137M/s/cpu)
uprobe-nop      (64 cpus):    6.452 ± 0.000M/s  (  0.101M/s/cpu)

uretprobe-nop   ( 1 cpus):    2.055 ± 0.001M/s  (  2.055M/s/cpu)
uretprobe-nop   ( 2 cpus):    2.677 ± 0.000M/s  (  1.339M/s/cpu)
uretprobe-nop   ( 4 cpus):    4.561 ± 0.003M/s  (  1.140M/s/cpu)
uretprobe-nop   ( 8 cpus):    5.291 ± 0.002M/s  (  0.661M/s/cpu)
uretprobe-nop   (16 cpus):    5.065 ± 0.019M/s  (  0.317M/s/cpu)
uretprobe-nop   (32 cpus):    3.622 ± 0.003M/s  (  0.113M/s/cpu)
uretprobe-nop   (64 cpus):    3.723 ± 0.002M/s  (  0.058M/s/cpu)

Peak throughput increased from 3.7 mln/s (uprobe triggerings) up to about
8 mln/s. For uretprobes it's a bit more modest, with a bump from 2.4 mln/s
to 5 mln/s.

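
As a quick check of the peak-to-peak claim, the ratios can be computed
directly from the peak rows of the tables. The helper names speedup() and
print_peak_speedups() are made up for this sketch; the four throughput
values are the highest M/s figures in each table (8 CPUs for uprobe-nop in
both runs, 2 CPUs for baseline uretprobe-nop, 8 CPUs for SRCU
uretprobe-nop).

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of improved to baseline throughput; both arguments in M/s. */
static double speedup(double baseline, double improved)
{
	assert(baseline > 0.0);
	return improved / baseline;
}

/* Prints the peak-to-peak ratio for each benchmark. */
static void print_peak_speedups(void)
{
	printf("uprobe-nop:    %.2fx\n", speedup(3.718, 8.097));
	printf("uretprobe-nop: %.2fx\n", speedup(2.361, 5.291));
}
```

Both ratios come out a little above 2x, which matches the rounded figures
quoted in the text.
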
Reviewed-by: Oleg Nesterov
Suggested-by: Peter Zijlstra (Intel)
Signed-off-by: Andrii Nakryiko
---
 kernel/events/uprobes.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index e9b755ddf960..8a464cf38127 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -40,6 +40,7 @@ static struct rb_root uprobes_tree = RB_ROOT;
 #define no_uprobe_events()	RB_EMPTY_ROOT(&uprobes_tree)

 static DEFINE_RWLOCK(uprobes_treelock);	/* serialize rbtree access */
+static seqcount_rwlock_t uprobes_seqcount = SEQCNT_RWLOCK_ZERO(uprobes_seqcount, &uprobes_treelock);

 DEFINE_STATIC_SRCU(uprobes_srcu);

@@ -634,8 +635,11 @@ static void put_uprobe(struct uprobe *uprobe)

 	write_lock(&uprobes_treelock);

-	if (uprobe_is_active(uprobe))
+	if (uprobe_is_active(uprobe)) {
+		write_seqcount_begin(&uprobes_seqcount);
 		rb_erase(&uprobe->rb_node, &uprobes_tree);
+		write_seqcount_end(&uprobes_seqcount);
+	}

 	write_unlock(&uprobes_treelock);

@@ -701,14 +705,26 @@ static struct uprobe *find_uprobe_rcu(struct inode *inode, loff_t offset)
 		.offset = offset,
 	};
 	struct rb_node *node;
+	unsigned int seq;

 	lockdep_assert(srcu_read_lock_held(&uprobes_srcu));

-	read_lock(&uprobes_treelock);
-	node = rb_find(&key, &uprobes_tree, __uprobe_cmp_key);
-	read_unlock(&uprobes_treelock);
+	do {
+		seq = read_seqcount_begin(&uprobes_seqcount);
+		node = rb_find_rcu(&key, &uprobes_tree, __uprobe_cmp_key);
+		/*
+		 * Lockless RB-tree lookups can result only in false negatives.
+		 * If the element is found, it is correct and can be returned
+		 * under RCU protection. If we find nothing, we need to
+		 * validate that seqcount didn't change. If it did, we have to
+		 * try again as we might have missed the element (false
+		 * negative). If seqcount is unchanged, search truly failed.
+		 */
+		if (node)
+			return __node_2_uprobe(node);
+	} while (read_seqcount_retry(&uprobes_seqcount, seq));

-	return node ? __node_2_uprobe(node) : NULL;
+	return NULL;
 }

 /*
@@ -730,7 +746,7 @@ static struct uprobe *__insert_uprobe(struct uprobe *uprobe)
 {
 	struct rb_node *node;
 again:
-	node = rb_find_add(&uprobe->rb_node, &uprobes_tree, __uprobe_cmp);
+	node = rb_find_add_rcu(&uprobe->rb_node, &uprobes_tree, __uprobe_cmp);
 	if (node) {
 		struct uprobe *u = __node_2_uprobe(node);

@@ -755,7 +771,9 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
 	struct uprobe *u;

 	write_lock(&uprobes_treelock);
+	write_seqcount_begin(&uprobes_seqcount);
 	u = __insert_uprobe(uprobe);
+	write_seqcount_end(&uprobes_seqcount);
 	write_unlock(&uprobes_treelock);

 	return u;