From patchwork Mon Jul 11 13:52:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 12913784 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2F1EEC43334 for ; Mon, 11 Jul 2022 13:52:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230395AbiGKNwq (ORCPT ); Mon, 11 Jul 2022 09:52:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48716 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230317AbiGKNwn (ORCPT ); Mon, 11 Jul 2022 09:52:43 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 6C3345C97B for ; Mon, 11 Jul 2022 06:52:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1657547561; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2mSjA/SPr5DLrqocTMOuQhkZ+1jjuTVoS1exRVSQKa4=; b=Jfpy/cHyG2Z6GejulSWPcNcJ3G6NeyrMO97gM/emM9ye8578CCUsslh25xNnD/O+IDrYKC WlRvSL26deipi9j9RFcRZRO8vw8rfwuMWL8T9950tvMpLOZC5hwW8/IjYbKQyip2RirmDJ 32yTEjlrLq2OzArz6QtVP/rdwfEm77Q= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-661-xZrsl3tlPeSeCLwvcwrYmA-1; Mon, 11 Jul 2022 09:52:38 -0400 X-MC-Unique: xZrsl3tlPeSeCLwvcwrYmA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using 
TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1627685A584; Mon, 11 Jul 2022 13:52:38 +0000 (UTC) Received: from bfoster.redhat.com (unknown [10.22.32.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id D5EB418EA8; Mon, 11 Jul 2022 13:52:37 +0000 (UTC) From: Brian Foster To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Cc: ikent@redhat.com, onestero@redhat.com, willy@infradead.org Subject: [PATCH v2 1/4] radix-tree: propagate all tags in idr tree Date: Mon, 11 Jul 2022 09:52:34 -0400 Message-Id: <20220711135237.173667-2-bfoster@redhat.com> In-Reply-To: <20220711135237.173667-1-bfoster@redhat.com> References: <20220711135237.173667-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org The IDR tree has hardcoded tag propagation logic that handles the internal IDR_FREE tag and ignores all others. Generalize that logic so that additional tags are propagated and cleared correctly. This is specifically to support a new internal IDR_TAG radix tree tag used to improve search efficiency of pids with associated PIDTYPE_TGID tasks within a pid namespace. 
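As a rough userspace illustration of the delete-path behaviour described above (not the kernel code itself), the sketch below models a single node with per-tag slot bitmaps. All toy_* names, TOY_MAX_TAGS and TOY_TAG_FREE are hypothetical stand-ins invented for this example; TOY_TAG_FREE plays the role of IDR_FREE. On delete from an IDR-style tree, the free tag is set for the slot and every other tag is cleared.

```c
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
#define TOY_MAX_TAGS 3
#define TOY_TAG_FREE 0 /* stands in for IDR_FREE */

struct toy_node {
	unsigned int tags[TOY_MAX_TAGS]; /* one bit per slot, per tag */
};

static void toy_tag_set(struct toy_node *n, int tag, unsigned int slot)
{
	n->tags[tag] |= 1u << slot;
}

static void toy_tag_clear(struct toy_node *n, int tag, unsigned int slot)
{
	n->tags[tag] &= ~(1u << slot);
}

static bool toy_tag_get(const struct toy_node *n, int tag, unsigned int slot)
{
	return (n->tags[tag] >> slot) & 1u;
}

/*
 * Delete a slot: in an IDR-style tree the slot becomes free again, so the
 * free tag is set; every other tag is cleared regardless of tree type.
 */
static void toy_delete_slot(struct toy_node *n, bool is_idr, unsigned int slot)
{
	for (int tag = 0; tag < TOY_MAX_TAGS; tag++) {
		if (is_idr && tag == TOY_TAG_FREE) {
			toy_tag_set(n, tag, slot);
			continue;
		}
		toy_tag_clear(n, tag, slot);
	}
}
```

For example, if tag 1 is set on slot 3 and the slot is then deleted with is_idr true, toy_tag_get() reports the free tag as set and tag 1 as clear for that slot. With the previous hardcoded logic, an IDR tree would have left tag 1 behind on the now-empty slot.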
Signed-off-by: Brian Foster --- lib/radix-tree.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/lib/radix-tree.c b/lib/radix-tree.c index b3afafe46fff..08eef33e7820 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -431,12 +431,14 @@ static int radix_tree_extend(struct radix_tree_root *root, gfp_t gfp, tag_clear(node, IDR_FREE, 0); root_tag_set(root, IDR_FREE); } - } else { - /* Propagate the aggregated tag info to the new child */ - for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) { - if (root_tag_get(root, tag)) - tag_set(node, tag, 0); - } + } + + /* Propagate the aggregated tag info to the new child */ + for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) { + if (is_idr(root) && tag == IDR_FREE) + continue; + if (root_tag_get(root, tag)) + tag_set(node, tag, 0); } BUG_ON(shift > BITS_PER_LONG); @@ -1368,11 +1370,13 @@ static bool __radix_tree_delete(struct radix_tree_root *root, unsigned offset = get_slot_offset(node, slot); int tag; - if (is_idr(root)) - node_tag_set(root, node, IDR_FREE, offset); - else - for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) - node_tag_clear(root, node, tag, offset); + for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) { + if (is_idr(root) && tag == IDR_FREE) { + node_tag_set(root, node, tag, offset); + continue; + } + node_tag_clear(root, node, tag, offset); + } replace_slot(slot, NULL, node, -1, values); return node && delete_node(root, node); From patchwork Mon Jul 11 13:52:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 12913783 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54EC0C433EF for ; Mon, 11 Jul 2022 13:52:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand 
id S230369AbiGKNwo (ORCPT ); Mon, 11 Jul 2022 09:52:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48718 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230344AbiGKNwn (ORCPT ); Mon, 11 Jul 2022 09:52:43 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 9B85361B17 for ; Mon, 11 Jul 2022 06:52:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1657547561; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SXxw8023X5Ng7EOWnmeS1q1VSSsbgfI4xN4bjScnx08=; b=CrE1fMtIRRXxtY4ZodSgT65A0bLr4PiYBkhZMr8xT6CIW07DPp3tyUXyBXRQWi5hoy/x7h 4foQbUeKgvmsuwH7WWpl5oiTRe87roCLTJpetCfe8gljvShUaaJR2cc2tqONybwWrvSQBv LeRPjDLhehtSNZmqyoMnAY9fZstd3wQ= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-632-TJYriBkXMFySrijOEM6Z1w-1; Mon, 11 Jul 2022 09:52:38 -0400 X-MC-Unique: TJYriBkXMFySrijOEM6Z1w-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 555A938005C4; Mon, 11 Jul 2022 13:52:38 +0000 (UTC) Received: from bfoster.redhat.com (unknown [10.22.32.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 22B8F18EA8; Mon, 11 Jul 2022 13:52:38 +0000 (UTC) From: Brian Foster To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Cc: ikent@redhat.com, onestero@redhat.com, willy@infradead.org Subject: [PATCH v2 2/4] idr: support optional id tagging 
Date: Mon, 11 Jul 2022 09:52:35 -0400 Message-Id: <20220711135237.173667-3-bfoster@redhat.com> In-Reply-To: <20220711135237.173667-1-bfoster@redhat.com> References: <20220711135237.173667-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Certain idr users can benefit from generic tagging support of the underlying radix-tree (or xarray) data structure. For example, a readdir of the /proc root dir performs an inefficient walk of the pid namespace idr tree. This involves checking the entry of every allocated id for a group leader task association. Expose a simple, single tag interface for idr users to facilitate more efficient scans in situations like this. Signed-off-by: Brian Foster --- include/linux/idr.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/include/linux/idr.h b/include/linux/idr.h index a0dce14090a9..44e8bb287d0e 100644 --- a/include/linux/idr.h +++ b/include/linux/idr.h @@ -27,6 +27,7 @@ struct idr { * to users. Use tag 0 to track whether a node has free space below it. */ #define IDR_FREE 0 +#define IDR_TAG 1 /* Set the IDR flag and the IDR_FREE tag */ #define IDR_RT_MARKER (ROOT_IS_IDR | (__force gfp_t) \ @@ -174,6 +175,31 @@ static inline void idr_preload_end(void) local_unlock(&radix_tree_preloads.lock); } +static inline void idr_set_tag(struct idr *idr, unsigned long id) +{ + radix_tree_tag_set(&idr->idr_rt, id - idr->idr_base, IDR_TAG); +} + +static inline bool idr_get_tag(struct idr *idr, unsigned long id) +{ + return radix_tree_tag_get(&idr->idr_rt, id - idr->idr_base, IDR_TAG); +} + +/* + * Find the next id with the internal tag set. 
+ */ +static inline void *idr_get_next_tag(struct idr *idr, unsigned long id) +{ + unsigned int ret; + void *entry; + + ret = radix_tree_gang_lookup_tag(&idr->idr_rt, &entry, + id - idr->idr_base, 1, IDR_TAG); + if (ret != 1) + return NULL; + return entry; +} + /** * idr_for_each_entry() - Iterate over an IDR's elements of a given type. * @idr: IDR handle. From patchwork Mon Jul 11 13:52:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 12913787 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 40192C43334 for ; Mon, 11 Jul 2022 13:53:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230495AbiGKNxL (ORCPT ); Mon, 11 Jul 2022 09:53:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49048 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230445AbiGKNwu (ORCPT ); Mon, 11 Jul 2022 09:52:50 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 102C961DBF for ; Mon, 11 Jul 2022 06:52:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1657547567; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rop7OK4IumhZoT3E3GNMDnS+35IynqSezhWluPU0kXM=; b=L3wGLXEcAT4ZNqu12K0wCQmcnuaWlf5TY1mqdRNw7pixoTDp6EMg9BzZ564LqXnzMc0OUq 8An0K1saj2Fc2WN4Da7uLq3UlCRz/MO/oubQqSv7Eqvj0zt6O8z/tpG0SDvGF15XUE/ad2 xS5CvyX8mv26u12v/zem3krMtPYh/kg= Received: from mimecast-mx02.redhat.com 
(mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-166-fVzmwPV6NGmL3XJbDS8qWQ-1; Mon, 11 Jul 2022 09:52:38 -0400 X-MC-Unique: fVzmwPV6NGmL3XJbDS8qWQ-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 938C0857A87; Mon, 11 Jul 2022 13:52:38 +0000 (UTC) Received: from bfoster.redhat.com (unknown [10.22.32.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6154518ECC; Mon, 11 Jul 2022 13:52:38 +0000 (UTC) From: Brian Foster To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Cc: ikent@redhat.com, onestero@redhat.com, willy@infradead.org Subject: [PATCH v2 3/4] pid: tag pids associated with group leader tasks Date: Mon, 11 Jul 2022 09:52:36 -0400 Message-Id: <20220711135237.173667-4-bfoster@redhat.com> In-Reply-To: <20220711135237.173667-1-bfoster@redhat.com> References: <20220711135237.173667-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Searching the pid_namespace for group leader tasks is a fairly inefficient operation. Listing the root directory of a procfs mount performs a linear scan of allocated pids, checking each entry for an associated PIDTYPE_TGID task to determine whether to populate a directory entry. This can cause a significant increase in readdir() syscall latency when run in namespaces that might have one or more processes with significant thread counts. To facilitate improved TGID pid searches, tag the ids of pid entries that are likely to have an associated PIDTYPE_TGID task. 
To keep the code simple and avoid having to maintain synchronization between tag state and post-fork pid-task association changes, the tag is applied to all pids allocated for tasks cloned without CLONE_THREAD. This means that it is possible for a pid to remain tagged in the idr tree after being disassociated from the group leader task. For example, a process that does a setsid() followed by fork() and exit() (to daemonize) will remain associated with the original pid for the session, but link with the child pid as the group leader. OTOH, the only place other than fork() where a tgid association occurs is in the exec() path, which kills all other tasks in the group and associates the current task with the preexisting leader pid.

Therefore, the semantics of the tag are that false positives (tagged pids without PIDTYPE_TGID tasks) are possible, but false negatives (untagged pids with PIDTYPE_TGID tasks) should never occur. This is an effective optimization because false positives are fairly uncommon and don't add overhead (i.e. we already have to check pid_task() for tagged entries), while the tag still filters out thread pids that are guaranteed not to have a TGID task association.

Tag entries in the pid allocation path when the caller specifies that the pid associates with a new thread group. Since false negatives are not allowed, warn in the event that a PIDTYPE_TGID task is ever attached to an untagged pid. Finally, create a helper to implement the task search based on the tag semantics defined above (based on search logic currently implemented by next_tgid() in procfs). 
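The tag semantics can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct toy_pid and toy_next_tgid() are hypothetical names, a flat array stands in for the idr tree, and the has_tgid_task flag stands in for a non-NULL pid_task(pid, PIDTYPE_TGID). A tagged entry without a leader task (a false positive) is simply skipped. An untagged entry is never inspected, which is why an untagged pid with a leader task must never exist.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a pid table; not kernel code. */
struct toy_pid {
	bool allocated;
	bool tagged;        /* set at allocation when cloned without CLONE_THREAD */
	bool has_tgid_task; /* stands in for pid_task(pid, PIDTYPE_TGID) != NULL */
};

/*
 * Return the first id >= start that is tagged and still has a group leader
 * task, or -1 if there is none. A real tagged tree lookup would jump over
 * untagged ids without visiting them; the linear loop here only models the
 * filtering, not the cost.
 */
static int toy_next_tgid(const struct toy_pid *tbl, size_t n, size_t start)
{
	for (size_t id = start; id < n; id++) {
		if (!tbl[id].allocated || !tbl[id].tagged)
			continue; /* thread pids are filtered out by the tag */
		if (tbl[id].has_tgid_task)
			return (int)id;
		/* false positive: tagged, but no leader task; keep searching */
	}
	return -1;
}
```

With a table where id 1 is a leader, id 2 is a thread, id 3 is a stale tagged pid and id 4 is a leader, a search from 0 yields 1 and a search from 2 yields 4. The stale entry at id 3 costs one extra check and nothing more.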
Signed-off-by: Brian Foster --- include/linux/pid.h | 3 ++- kernel/fork.c | 2 +- kernel/pid.c | 40 +++++++++++++++++++++++++++++++++++++++- 3 files changed, 42 insertions(+), 3 deletions(-) diff --git a/include/linux/pid.h b/include/linux/pid.h index 343abf22092e..64caf21be256 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -132,9 +132,10 @@ extern struct pid *find_vpid(int nr); */ extern struct pid *find_get_pid(int nr); extern struct pid *find_ge_pid(int nr, struct pid_namespace *); +struct task_struct *find_get_tgid_task(int *id, struct pid_namespace *); extern struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid, - size_t set_tid_size); + size_t set_tid_size, bool group_leader); extern void free_pid(struct pid *pid); extern void disable_pid_allocation(struct pid_namespace *ns); diff --git a/kernel/fork.c b/kernel/fork.c index 9d44f2d46c69..3c52f45ec93e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2254,7 +2254,7 @@ static __latent_entropy struct task_struct *copy_process( if (pid != &init_struct_pid) { pid = alloc_pid(p->nsproxy->pid_ns_for_children, args->set_tid, - args->set_tid_size); + args->set_tid_size, !(clone_flags & CLONE_THREAD)); if (IS_ERR(pid)) { retval = PTR_ERR(pid); goto bad_fork_cleanup_thread; diff --git a/kernel/pid.c b/kernel/pid.c index 2fc0a16ec77b..bd72d1dbff95 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -157,7 +157,7 @@ void free_pid(struct pid *pid) } struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid, - size_t set_tid_size) + size_t set_tid_size, bool group_leader) { struct pid *pid; enum pid_type type; @@ -272,6 +272,8 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid, for ( ; upid >= pid->numbers; --upid) { /* Make the PID visible to find_pid_ns. 
*/ idr_replace(&upid->ns->idr, pid, upid->nr); + if (group_leader) + idr_set_tag(&upid->ns->idr, upid->nr); upid->ns->pid_allocated++; } spin_unlock_irq(&pidmap_lock); @@ -331,6 +333,10 @@ static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type) void attach_pid(struct task_struct *task, enum pid_type type) { struct pid *pid = *task_pid_ptr(task, type); + struct pid_namespace *pid_ns = ns_of_pid(pid); + pid_t pid_nr = pid_nr_ns(pid, pid_ns); + + WARN_ON(type == PIDTYPE_TGID && !idr_get_tag(&pid_ns->idr, pid_nr)); hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]); } @@ -520,6 +526,38 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns) return idr_get_next(&ns->idr, &nr); } +/* + * Used by proc to find the first thread group leader task with an id greater + * than or equal to *id. + * + * Use the idr tag hint to find the next best pid. The tag does not guarantee a + * linked task exists, so retry until a suitable entry is found. + */ +struct task_struct *find_get_tgid_task(int *id, struct pid_namespace *ns) +{ + struct pid *pid; + struct task_struct *t; + unsigned int nr = *id; + + rcu_read_lock(); + + do { + pid = idr_get_next_tag(&ns->idr, nr); + if (!pid) { + rcu_read_unlock(); + return NULL; + } + t = pid_task(pid, PIDTYPE_TGID); + nr++; + } while (!t); + + *id = pid_nr_ns(pid, ns); + get_task_struct(t); + rcu_read_unlock(); + + return t; +} + struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags) { struct fd f; From patchwork Mon Jul 11 13:52:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 12913785 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4360ECCA47B for ; Mon, 11 Jul 2022 13:52:49 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230317AbiGKNws (ORCPT ); Mon, 11 Jul 2022 09:52:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48748 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230371AbiGKNwo (ORCPT ); Mon, 11 Jul 2022 09:52:44 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 30D6F61D5D for ; Mon, 11 Jul 2022 06:52:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1657547562; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZZi7tQd5jc7PjCR++DRTDi/vBjqHkl8WrqcY6+kDf54=; b=dfLiIbaj07wvHLmrCrcMUYmkRWxveVy+zGIQRSrlrFvoNhBNGVYbILI1sP2CDDq43N5SKM X7VDvlqnmuwKqUMpKS3gvyv6D/iWB5ENvlTXLMiU44KwLCiyzBgrQfA65qhBE+bOMiBEHe WzStoXcF2kG0FThLuJkPQsF6vPahTFE= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-622-2FbtIBGBPJiSAqOprJ_oBg-1; Mon, 11 Jul 2022 09:52:39 -0400 X-MC-Unique: 2FbtIBGBPJiSAqOprJ_oBg-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D31431C01B23; Mon, 11 Jul 2022 13:52:38 +0000 (UTC) Received: from bfoster.redhat.com (unknown [10.22.32.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9EA4618EA8; Mon, 11 Jul 2022 13:52:38 +0000 (UTC) From: Brian Foster To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Cc: ikent@redhat.com, onestero@redhat.com, 
willy@infradead.org Subject: [PATCH v2 4/4] procfs: use efficient tgid pid search on root readdir Date: Mon, 11 Jul 2022 09:52:37 -0400 Message-Id: <20220711135237.173667-5-bfoster@redhat.com> In-Reply-To: <20220711135237.173667-1-bfoster@redhat.com> References: <20220711135237.173667-1-bfoster@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

The find_ge_pid() based search in next_tgid() walks every allocated id and checks each associated pid in the namespace for a link to a PIDTYPE_TGID task. If the pid namespace contains processes with large numbers of threads, this search doesn't scale and can notably increase getdents() syscall latency. For example, on a mostly idle 2.4GHz Intel Xeon running Fedora on 5.19.0-rc2, 'strace -T xfs_io -c readdir /proc' shows the following:

  getdents64(... /* 814 entries */, 32768) = 20624 <0.000568>

With a dummy (i.e. idle) process running that has created an additional 100k threads, that latency increases to:

  getdents64(... /* 815 entries */, 32768) = 20656 <0.011315>

While this may not be noticeable to users in one-off /proc scans or simple usage of ps or top, we have users that report problems caused by this latency increase in these sorts of scaled environments with custom tooling that makes heavier use of task monitoring.

Optimize the tgid task scanning in proc_pid_readdir() by using the more efficient find_get_tgid_task() helper. This significantly improves readdir() latency when the pid namespace is populated with processes with very large thread counts. For example, the above 100k idle task test against a patched kernel now results in the following:

  Idle: getdents64(... /* 861 entries */, 32768) = 21048 <0.000670>
  Idle + 100k threads: getdents64(... /* 862 entries */, 32768) = 21096 <0.000959>

... which is a much smaller latency hit after the high thread count task is started. 
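To put the strace timings above side by side, the small helper below computes the relative slowdown from the reported getdents64() latencies. The function name toy_slowdown() is hypothetical and exists only for this illustration; the inputs are the four timings quoted in this commit message.

```c
/*
 * Ratio of loaded latency to baseline latency, both in seconds.
 * Hypothetical helper for illustration only.
 */
static double toy_slowdown(double base_s, double loaded_s)
{
	return loaded_s / base_s;
}

/*
 * Unpatched: toy_slowdown(0.000568, 0.011315) is roughly 20x.
 * Patched:   toy_slowdown(0.000670, 0.000959) is roughly 1.4x.
 */
```

The idle baselines differ slightly between the two runs (814 vs. 861 entries), so the ratios are only a rough comparison rather than a controlled benchmark.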
Signed-off-by: Brian Foster --- fs/proc/base.c | 17 +---------------- 1 file changed, 1 insertion(+), 16 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 8dfa36a99c74..b3bff6d26dcc 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3429,24 +3429,9 @@ struct tgid_iter { }; static struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter iter) { - struct pid *pid; - if (iter.task) put_task_struct(iter.task); - rcu_read_lock(); -retry: - iter.task = NULL; - pid = find_ge_pid(iter.tgid, ns); - if (pid) { - iter.tgid = pid_nr_ns(pid, ns); - iter.task = pid_task(pid, PIDTYPE_TGID); - if (!iter.task) { - iter.tgid += 1; - goto retry; - } - get_task_struct(iter.task); - } - rcu_read_unlock(); + iter.task = find_get_tgid_task(&iter.tgid, ns); return iter; }