From patchwork Tue Aug 13 04:29:11 2024
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 13761248
From: Andrii Nakryiko
To: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com
Cc: rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org,
 linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org,
 willy@infradead.org, surenb@google.com, akpm@linux-foundation.org,
 linux-mm@kvack.org, Andrii Nakryiko
Subject: [PATCH v3 07/13] uprobes: perform lockless SRCU-protected uprobes_tree lookup
Date: Mon, 12 Aug 2024 21:29:11 -0700
Message-ID: <20240813042917.506057-8-andrii@kernel.org>
In-Reply-To: <20240813042917.506057-1-andrii@kernel.org>
References: <20240813042917.506057-1-andrii@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

Another big bottleneck to scalability is uprobes_treelock, which is taken
in a very hot path in handle_swbp(). Now that uprobes are SRCU-protected,
take advantage of that and make the uprobes_tree RB-tree lookup lockless.

To make the RCU-protected lockless RB-tree lookup correct, we need to take
into account that such a lookup can return false negatives if there are
parallel RB-tree modifications (rotations) going on. We use a seqcount to
detect whether the RB-tree changed, and if we find nothing while the
RB-tree got modified in between, we just retry. If a uprobe was found,
then it's guaranteed to be a correct lookup.

With all the lock-avoiding changes done, we get a pretty decent
improvement in performance and scalability of uprobes with the number of
CPUs, even though we are still nowhere near linear scalability. This is
due to SRCU not really scaling very well with the number of CPUs on the
particular hardware that was used for testing (80-core Intel Xeon Gold
6138 CPU @ 2.00GHz), but also due to the remaining mmap_lock, which is
currently taken to resolve the interrupt address to inode+offset and then
to the uprobe instance. And, of course, uretprobes still need similar RCU
treatment to avoid refcounting in the hot path, which will be addressed in
follow-up patches.

Nevertheless, the improvement is good.
We used BPF selftest-based uprobe-nop and uretprobe-nop benchmarks to get
the numbers below, varying the number of CPUs on which uprobes and
uretprobes are triggered.

BASELINE
========
uprobe-nop      ( 1 cpus):    3.032 ± 0.023M/s  (  3.032M/s/cpu)
uprobe-nop      ( 2 cpus):    3.452 ± 0.005M/s  (  1.726M/s/cpu)
uprobe-nop      ( 4 cpus):    3.663 ± 0.005M/s  (  0.916M/s/cpu)
uprobe-nop      ( 8 cpus):    3.718 ± 0.038M/s  (  0.465M/s/cpu)
uprobe-nop      (16 cpus):    3.344 ± 0.008M/s  (  0.209M/s/cpu)
uprobe-nop      (32 cpus):    2.288 ± 0.021M/s  (  0.071M/s/cpu)
uprobe-nop      (64 cpus):    3.205 ± 0.004M/s  (  0.050M/s/cpu)

uretprobe-nop   ( 1 cpus):    1.979 ± 0.005M/s  (  1.979M/s/cpu)
uretprobe-nop   ( 2 cpus):    2.361 ± 0.005M/s  (  1.180M/s/cpu)
uretprobe-nop   ( 4 cpus):    2.309 ± 0.002M/s  (  0.577M/s/cpu)
uretprobe-nop   ( 8 cpus):    2.253 ± 0.001M/s  (  0.282M/s/cpu)
uretprobe-nop   (16 cpus):    2.007 ± 0.000M/s  (  0.125M/s/cpu)
uretprobe-nop   (32 cpus):    1.624 ± 0.003M/s  (  0.051M/s/cpu)
uretprobe-nop   (64 cpus):    2.149 ± 0.001M/s  (  0.034M/s/cpu)

SRCU CHANGES
============
uprobe-nop      ( 1 cpus):    3.276 ± 0.005M/s  (  3.276M/s/cpu)
uprobe-nop      ( 2 cpus):    4.125 ± 0.002M/s  (  2.063M/s/cpu)
uprobe-nop      ( 4 cpus):    7.713 ± 0.002M/s  (  1.928M/s/cpu)
uprobe-nop      ( 8 cpus):    8.097 ± 0.006M/s  (  1.012M/s/cpu)
uprobe-nop      (16 cpus):    6.501 ± 0.056M/s  (  0.406M/s/cpu)
uprobe-nop      (32 cpus):    4.398 ± 0.084M/s  (  0.137M/s/cpu)
uprobe-nop      (64 cpus):    6.452 ± 0.000M/s  (  0.101M/s/cpu)

uretprobe-nop   ( 1 cpus):    2.055 ± 0.001M/s  (  2.055M/s/cpu)
uretprobe-nop   ( 2 cpus):    2.677 ± 0.000M/s  (  1.339M/s/cpu)
uretprobe-nop   ( 4 cpus):    4.561 ± 0.003M/s  (  1.140M/s/cpu)
uretprobe-nop   ( 8 cpus):    5.291 ± 0.002M/s  (  0.661M/s/cpu)
uretprobe-nop   (16 cpus):    5.065 ± 0.019M/s  (  0.317M/s/cpu)
uretprobe-nop   (32 cpus):    3.622 ± 0.003M/s  (  0.113M/s/cpu)
uretprobe-nop   (64 cpus):    3.723 ± 0.002M/s  (  0.058M/s/cpu)

Peak throughput increased from 3.7 mln/s (uprobe triggerings) up to about
8 mln/s. For uretprobes it's a bit more modest, with a bump from 2.4 mln/s
to 5 mln/s.

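The per-CPU column and the relative gain can be recomputed from the
totals. The following is a small C sketch, assuming the uprobe-nop totals
are copied by hand from the two listings above; the array and function
names (cpus, baseline_mps, srcu_mps, speedup(), per_cpu()) are made up
for illustration and are not part of the benchmark tooling.

```c
#include <assert.h>
#include <stddef.h>

/* CPU counts and uprobe-nop totals (M/s), copied from the listings above. */
static const int cpus[] = { 1, 2, 4, 8, 16, 32, 64 };
static const double baseline_mps[] = { 3.032, 3.452, 3.663, 3.718, 3.344, 2.288, 3.205 };
static const double srcu_mps[] = { 3.276, 4.125, 7.713, 8.097, 6.501, 4.398, 6.452 };

/* Ratio of throughput after the change to throughput before it. */
static double speedup(double before_mps, double after_mps)
{
	assert(before_mps > 0.0);
	return after_mps / before_mps;
}

/* Total throughput divided evenly across the CPUs that triggered probes. */
static double per_cpu(double total_mps, int ncpus)
{
	assert(ncpus > 0);
	return total_mps / ncpus;
}
```

For example, speedup(baseline_mps[3], srcu_mps[3]) compares the two
8-CPU rows, and per_cpu(srcu_mps[6], cpus[6]) recomputes the last
per-CPU figure of the SRCU uprobe-nop listing.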
Suggested-by: Peter Zijlstra (Intel)
Signed-off-by: Andrii Nakryiko
---
 kernel/events/uprobes.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 0b6d4c0a0088..8559ca365679 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -40,6 +40,7 @@ static struct rb_root uprobes_tree = RB_ROOT;
 #define no_uprobe_events()	RB_EMPTY_ROOT(&uprobes_tree)

 static DEFINE_RWLOCK(uprobes_treelock);	/* serialize rbtree access */
+static seqcount_rwlock_t uprobes_seqcount = SEQCNT_RWLOCK_ZERO(uprobes_seqcount, &uprobes_treelock);

 DEFINE_STATIC_SRCU(uprobes_srcu);

@@ -634,8 +635,11 @@ static void put_uprobe(struct uprobe *uprobe)

 	write_lock(&uprobes_treelock);

-	if (uprobe_is_active(uprobe))
+	if (uprobe_is_active(uprobe)) {
+		write_seqcount_begin(&uprobes_seqcount);
 		rb_erase(&uprobe->rb_node, &uprobes_tree);
+		write_seqcount_end(&uprobes_seqcount);
+	}

 	write_unlock(&uprobes_treelock);

@@ -701,14 +705,26 @@ static struct uprobe *find_uprobe_rcu(struct inode *inode, loff_t offset)
 		.offset = offset,
 	};
 	struct rb_node *node;
+	unsigned int seq;

 	lockdep_assert(srcu_read_lock_held(&uprobes_srcu));

-	read_lock(&uprobes_treelock);
-	node = rb_find(&key, &uprobes_tree, __uprobe_cmp_key);
-	read_unlock(&uprobes_treelock);
+	do {
+		seq = read_seqcount_begin(&uprobes_seqcount);
+		node = rb_find_rcu(&key, &uprobes_tree, __uprobe_cmp_key);
+		/*
+		 * Lockless RB-tree lookups can result only in false negatives.
+		 * If the element is found, it is correct and can be returned
+		 * under RCU protection. If we find nothing, we need to
+		 * validate that seqcount didn't change. If it did, we have to
+		 * try again as we might have missed the element (false
+		 * negative). If seqcount is unchanged, search truly failed.
+		 */
+		if (node)
+			return __node_2_uprobe(node);
+	} while (read_seqcount_retry(&uprobes_seqcount, seq));

-	return node ? __node_2_uprobe(node) : NULL;
+	return NULL;
 }

 /*
@@ -730,7 +746,7 @@ static struct uprobe *__insert_uprobe(struct uprobe *uprobe)
 {
 	struct rb_node *node;
 again:
-	node = rb_find_add(&uprobe->rb_node, &uprobes_tree, __uprobe_cmp);
+	node = rb_find_add_rcu(&uprobe->rb_node, &uprobes_tree, __uprobe_cmp);
 	if (node) {
 		struct uprobe *u = __node_2_uprobe(node);

@@ -755,7 +771,9 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
 	struct uprobe *u;

 	write_lock(&uprobes_treelock);
+	write_seqcount_begin(&uprobes_seqcount);
 	u = __insert_uprobe(uprobe);
+	write_seqcount_end(&uprobes_seqcount);
 	write_unlock(&uprobes_treelock);

 	return u;
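
The retry scheme used by find_uprobe_rcu() can be tried out in userspace.
What follows is a minimal C11 sketch under stated assumptions, not kernel
code: struct seqcount, the seq_*() helpers, struct table and the table_*()
helpers are hypothetical names invented here, a small slot array stands in
for the RB-tree, and writers are assumed to be serialized by the caller
(the patch relies on uprobes_treelock for that). Moving a key between
slots plays the role of a rotation: a concurrent scan can miss the key,
but it cannot find a key that was never stored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 8

/* Hypothetical userspace stand-in for a kernel seqcount. */
struct seqcount {
	atomic_uint sequence;		/* odd while a writer is mid-update */
};

struct table {
	struct seqcount seq;
	atomic_long slots[NSLOTS];	/* 0 means "empty slot" */
};

static unsigned int seq_read_begin(struct seqcount *s)
{
	unsigned int seq;

	/* Spin while a write is in progress (odd value). */
	while ((seq = atomic_load_explicit(&s->sequence, memory_order_acquire)) & 1)
		;
	return seq;
}

static bool seq_read_retry(struct seqcount *s, unsigned int start)
{
	/* Order the preceding data loads before re-reading the counter. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->sequence, memory_order_relaxed) != start;
}

static void seq_write_begin(struct seqcount *s)
{
	unsigned int old = atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed);

	assert(!(old & 1));		/* writers must not nest or race */
	(void)old;
	atomic_thread_fence(memory_order_release);
}

static void seq_write_end(struct seqcount *s)
{
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release);
}

/* Writer: put a key into a slot. Caller serializes writers. */
static void table_store(struct table *t, size_t slot, long key)
{
	seq_write_begin(&t->seq);
	atomic_store_explicit(&t->slots[slot], key, memory_order_relaxed);
	seq_write_end(&t->seq);
}

/* Writer: move a key between slots; a concurrent scan may miss it. */
static void table_move(struct table *t, size_t from, size_t to)
{
	long key;

	seq_write_begin(&t->seq);
	key = atomic_load_explicit(&t->slots[from], memory_order_relaxed);
	atomic_store_explicit(&t->slots[to], key, memory_order_relaxed);
	atomic_store_explicit(&t->slots[from], 0, memory_order_relaxed);
	seq_write_end(&t->seq);
}

/* One lockless pass over the slots; may report a false negative. */
static bool table_scan(struct table *t, long key)
{
	for (size_t i = 0; i < NSLOTS; i++)
		if (atomic_load_explicit(&t->slots[i], memory_order_relaxed) == key)
			return true;
	return false;
}

/* Reader: a hit is returned at once, a miss is trusted only if no write ran. */
static bool table_contains(struct table *t, long key)
{
	unsigned int seq;

	assert(key != 0);		/* 0 is reserved for empty slots */
	do {
		seq = seq_read_begin(&t->seq);
		if (table_scan(t, key))
			return true;
	} while (seq_read_retry(&t->seq, seq));

	return false;
}
```

As in the patch, the reader only pays for a retry when it found nothing
and a writer ran during the scan; a successful scan never consults the
counter again. A zero-initialized struct table (for example one with
static storage duration) is a valid empty table in this sketch.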