From patchwork Fri Jul 15 11:33:47 2022
From: Brian Foster <bfoster@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matthew Wilcox, ikent@redhat.com, oleg@redhat.com
Subject: [PATCH 1/3] pid: replace pidmap_lock with xarray lock
Date: Fri, 15 Jul 2022 07:33:47 -0400
Message-Id: <20220715113349.831370-2-bfoster@redhat.com>
In-Reply-To: <20220715113349.831370-1-bfoster@redhat.com>
References: <20220715113349.831370-1-bfoster@redhat.com>

As a first step to changing the struct pid tracking code from the idr over
to the xarray, replace the custom pidmap_lock spinlock with the internal
lock associated with the underlying xarray. This is effectively equivalent
to using idr_lock() and friends, but since the goal is to disentangle from
the idr, move directly to the underlying xarray api.

Signed-off-by: Matthew Wilcox
Signed-off-by: Brian Foster
---
 kernel/pid.c | 79 ++++++++++++++++++++++++++--------------------------
 1 file changed, 40 insertions(+), 39 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index 2fc0a16ec77b..72a6e9d0db81 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -86,22 +86,6 @@ struct pid_namespace init_pid_ns = {
 };
 EXPORT_SYMBOL_GPL(init_pid_ns);
 
-/*
- * Note: disable interrupts while the pidmap_lock is held as an
- * interrupt might come in and do read_lock(&tasklist_lock).
- *
- * If we don't disable interrupts there is a nasty deadlock between
- * detach_pid()->free_pid() and another cpu that does
- * spin_lock(&pidmap_lock) followed by an interrupt routine that does
- * read_lock(&tasklist_lock);
- *
- * After we clean up the tasklist_lock and know there are no
- * irq handlers that take it we can leave the interrupts enabled.
- * For now it is easier to be safe than to prove it can't happen.
- */
-
-static __cacheline_aligned_in_smp DEFINE_SPINLOCK(pidmap_lock);
-
 void put_pid(struct pid *pid)
 {
         struct pid_namespace *ns;
@@ -129,10 +113,11 @@ void free_pid(struct pid *pid)
         int i;
         unsigned long flags;
 
-        spin_lock_irqsave(&pidmap_lock, flags);
         for (i = 0; i <= pid->level; i++) {
                 struct upid *upid = pid->numbers + i;
                 struct pid_namespace *ns = upid->ns;
+
+                xa_lock_irqsave(&ns->idr.idr_rt, flags);
                 switch (--ns->pid_allocated) {
                 case 2:
                 case 1:
@@ -150,8 +135,8 @@ void free_pid(struct pid *pid)
                 }
 
                 idr_remove(&ns->idr, upid->nr);
+                xa_unlock_irqrestore(&ns->idr.idr_rt, flags);
         }
-        spin_unlock_irqrestore(&pidmap_lock, flags);
 
         call_rcu(&pid->rcu, delayed_put_pid);
 }
@@ -206,7 +191,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
                 }
 
                 idr_preload(GFP_KERNEL);
-                spin_lock_irq(&pidmap_lock);
+                xa_lock_irq(&tmp->idr.idr_rt);
 
                 if (tid) {
                         nr = idr_alloc(&tmp->idr, NULL, tid,
@@ -233,7 +218,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
                         nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min,
                                               pid_max, GFP_ATOMIC);
                 }
-                spin_unlock_irq(&pidmap_lock);
+                xa_unlock_irq(&tmp->idr.idr_rt);
                 idr_preload_end();
 
                 if (nr < 0) {
@@ -266,34 +251,38 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
         INIT_HLIST_HEAD(&pid->inodes);
 
         upid = pid->numbers + ns->level;
-        spin_lock_irq(&pidmap_lock);
-        if (!(ns->pid_allocated & PIDNS_ADDING))
-                goto out_unlock;
         for ( ; upid >= pid->numbers; --upid) {
+                tmp = upid->ns;
+
+                xa_lock_irq(&tmp->idr.idr_rt);
+                if (tmp == ns && !(tmp->pid_allocated & PIDNS_ADDING)) {
+                        xa_unlock_irq(&tmp->idr.idr_rt);
+                        put_pid_ns(ns);
+                        goto out_free;
+                }
+
                 /* Make the PID visible to find_pid_ns. */
-                idr_replace(&upid->ns->idr, pid, upid->nr);
-                upid->ns->pid_allocated++;
+                idr_replace(&tmp->idr, pid, upid->nr);
+                tmp->pid_allocated++;
+                xa_unlock_irq(&tmp->idr.idr_rt);
         }
-        spin_unlock_irq(&pidmap_lock);
 
         return pid;
 
-out_unlock:
-        spin_unlock_irq(&pidmap_lock);
-        put_pid_ns(ns);
-
 out_free:
-        spin_lock_irq(&pidmap_lock);
         while (++i <= ns->level) {
                 upid = pid->numbers + i;
-                idr_remove(&upid->ns->idr, upid->nr);
-        }
+                tmp = upid->ns;
 
-        /* On failure to allocate the first pid, reset the state */
-        if (ns->pid_allocated == PIDNS_ADDING)
-                idr_set_cursor(&ns->idr, 0);
+                xa_lock_irq(&tmp->idr.idr_rt);
 
-        spin_unlock_irq(&pidmap_lock);
+                /* On failure to allocate the first pid, reset the state */
+                if (tmp == ns && tmp->pid_allocated == PIDNS_ADDING)
+                        idr_set_cursor(&ns->idr, 0);
+
+                idr_remove(&tmp->idr, upid->nr);
+                xa_unlock_irq(&tmp->idr.idr_rt);
+        }
 
         kmem_cache_free(ns->pid_cachep, pid);
         return ERR_PTR(retval);
@@ -301,9 +290,9 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 
 void disable_pid_allocation(struct pid_namespace *ns)
 {
-        spin_lock_irq(&pidmap_lock);
+        xa_lock_irq(&ns->idr.idr_rt);
         ns->pid_allocated &= ~PIDNS_ADDING;
-        spin_unlock_irq(&pidmap_lock);
+        xa_unlock_irq(&ns->idr.idr_rt);
 }
 
 struct pid *find_pid_ns(int nr, struct pid_namespace *ns)
@@ -646,6 +635,18 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
         return fd;
 }
 
+/*
+ * Note: disable interrupts while the xarray lock is held as an interrupt might
+ * come in and do read_lock(&tasklist_lock).
+ *
+ * If we don't disable interrupts there is a nasty deadlock between
+ * detach_pid()->free_pid() and another cpu that does xa_lock() followed by an
+ * interrupt routine that does read_lock(&tasklist_lock);
+ *
+ * After we clean up the tasklist_lock and know there are no irq handlers that
+ * take it we can leave the interrupts enabled. For now it is easier to be safe
+ * than to prove it can't happen.
+ */
 void __init pid_idr_init(void)
 {
         /* Verify no one has done anything silly: */
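
The locking change above, reduced to a stand-alone sketch: the xarray's
built-in spinlock takes the place of a separate, hand-maintained spinlock
around removal. The array name (obj_xa) and the helper below are
hypothetical illustrations, not code from the patch; only the lock/erase
pairing mirrors what free_pid() looks like by the end of the series.

#include <linux/xarray.h>

/* Hypothetical xarray; XA_FLAGS_LOCK_IRQ documents that its lock is taken
 * with interrupts disabled, matching the tasklist_lock deadlock concern
 * described in the comment the patch moves. */
static DEFINE_XARRAY_FLAGS(obj_xa, XA_FLAGS_LOCK_IRQ);

static void obj_free(unsigned long id)
{
        unsigned long flags;

        xa_lock_irqsave(&obj_xa, flags);   /* the xarray's own lock */
        __xa_erase(&obj_xa, id);           /* __xa_* helpers expect the lock to be held */
        xa_unlock_irqrestore(&obj_xa, flags);
}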
From patchwork Fri Jul 15 11:33:48 2022
From: Brian Foster <bfoster@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matthew Wilcox, ikent@redhat.com, oleg@redhat.com
Subject: [PATCH 2/3] pid: split cyclic id allocation cursor from idr
Date: Fri, 15 Jul 2022 07:33:48 -0400
Message-Id: <20220715113349.831370-3-bfoster@redhat.com>
In-Reply-To: <20220715113349.831370-1-bfoster@redhat.com>
References: <20220715113349.831370-1-bfoster@redhat.com>

As a next step in separating pid allocation from the idr, split off the
cyclic pid allocation cursor from the idr. Lift the cursor value into
struct pid_namespace. Note that this involves temporarily open-coding the
cursor increment on allocation, but this is cleaned up in the subsequent
patch.
Signed-off-by: Matthew Wilcox
Signed-off-by: Brian Foster
---
 arch/powerpc/platforms/cell/spufs/sched.c | 2 +-
 fs/proc/loadavg.c                         | 2 +-
 include/linux/pid_namespace.h             | 1 +
 kernel/pid.c                              | 6 ++++--
 kernel/pid_namespace.c                    | 4 ++--
 5 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/sched.c b/arch/powerpc/platforms/cell/spufs/sched.c
index 99bd027a7f7c..a2ed928d7658 100644
--- a/arch/powerpc/platforms/cell/spufs/sched.c
+++ b/arch/powerpc/platforms/cell/spufs/sched.c
@@ -1072,7 +1072,7 @@ static int show_spu_loadavg(struct seq_file *s, void *private)
                 LOAD_INT(c), LOAD_FRAC(c),
                 count_active_contexts(),
                 atomic_read(&nr_spu_contexts),
-                idr_get_cursor(&task_active_pid_ns(current)->idr) - 1);
+                READ_ONCE(task_active_pid_ns(current)->pid_next) - 1);
         return 0;
 }
 #endif
diff --git a/fs/proc/loadavg.c b/fs/proc/loadavg.c
index f32878d9a39f..62f89d549582 100644
--- a/fs/proc/loadavg.c
+++ b/fs/proc/loadavg.c
@@ -21,7 +21,7 @@ static int loadavg_proc_show(struct seq_file *m, void *v)
                 LOAD_INT(avnrun[1]), LOAD_FRAC(avnrun[1]),
                 LOAD_INT(avnrun[2]), LOAD_FRAC(avnrun[2]),
                 nr_running(), nr_threads,
-                idr_get_cursor(&task_active_pid_ns(current)->idr) - 1);
+                READ_ONCE(task_active_pid_ns(current)->pid_next) - 1);
         return 0;
 }
 
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 07481bb87d4e..82c72482019d 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -18,6 +18,7 @@ struct fs_pin;
 
 struct pid_namespace {
         struct idr idr;
+        unsigned int pid_next;
         struct rcu_head rcu;
         unsigned int pid_allocated;
         struct task_struct *child_reaper;
diff --git a/kernel/pid.c b/kernel/pid.c
index 72a6e9d0db81..409303ada383 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -75,6 +75,7 @@ int pid_max_max = PID_MAX_LIMIT;
 struct pid_namespace init_pid_ns = {
         .ns.count = REFCOUNT_INIT(2),
         .idr = IDR_INIT(init_pid_ns.idr),
+        .pid_next = 0,
         .pid_allocated = PIDNS_ADDING,
         .level = 0,
         .child_reaper = &init_task,
@@ -208,7 +209,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
                          * init really needs pid 1, but after reaching the
                          * maximum wrap back to RESERVED_PIDS
                          */
-                        if (idr_get_cursor(&tmp->idr) > RESERVED_PIDS)
+                        if (tmp->pid_next > RESERVED_PIDS)
                                 pid_min = RESERVED_PIDS;
 
                         /*
@@ -217,6 +218,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
                          */
                         nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min,
                                               pid_max, GFP_ATOMIC);
+                        tmp->pid_next = nr + 1;
                 }
                 xa_unlock_irq(&tmp->idr.idr_rt);
                 idr_preload_end();
@@ -278,7 +280,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 
                 /* On failure to allocate the first pid, reset the state */
                 if (tmp == ns && tmp->pid_allocated == PIDNS_ADDING)
-                        idr_set_cursor(&ns->idr, 0);
+                        ns->pid_next = 0;
 
                 idr_remove(&tmp->idr, upid->nr);
                 xa_unlock_irq(&tmp->idr.idr_rt);
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index f4f8cb0435b4..a53d20c5c85e 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -272,12 +272,12 @@ static int pid_ns_ctl_handler(struct ctl_table *table, int write,
          * it should synchronize its usage with external means.
          */
 
-        next = idr_get_cursor(&pid_ns->idr) - 1;
+        next = READ_ONCE(pid_ns->pid_next) - 1;
         tmp.data = &next;
         ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
         if (!ret && write)
-                idr_set_cursor(&pid_ns->idr, next + 1);
+                WRITE_ONCE(pid_ns->pid_next, next + 1);
 
         return ret;
 }
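
The "open-coded cursor increment" mentioned in the commit message boils
down to the shape sketched below. The wrapper function is hypothetical and
only illustrates how the new pid_next field takes over the bookkeeping that
idr_alloc_cyclic() previously kept in the idr's internal cursor; error
handling is trimmed.

/* Hypothetical helper, not part of the patch. */
static int alloc_pid_nr_sketch(struct pid_namespace *ns)
{
        int pid_min = 1;
        int nr;

        /* Once the cursor has passed RESERVED_PIDS, wrap back to
         * RESERVED_PIDS rather than 1 so low pids stay reserved. */
        if (ns->pid_next > RESERVED_PIDS)
                pid_min = RESERVED_PIDS;

        nr = idr_alloc_cyclic(&ns->idr, NULL, pid_min, pid_max, GFP_ATOMIC);
        if (nr >= 0)
                ns->pid_next = nr + 1;  /* the open-coded cursor update */

        return nr;
}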
From patchwork Fri Jul 15 11:33:49 2022
From: Brian Foster <bfoster@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Matthew Wilcox, ikent@redhat.com, oleg@redhat.com
Subject: [PATCH 3/3] pid: switch pid_namespace from idr to xarray
Date: Fri, 15 Jul 2022 07:33:49 -0400
Message-Id: <20220715113349.831370-4-bfoster@redhat.com>
In-Reply-To: <20220715113349.831370-1-bfoster@redhat.com>
References: <20220715113349.831370-1-bfoster@redhat.com>

Switch struct pid[_namespace] management over to use the xarray api
directly instead of the idr. The underlying data structures used by both
interfaces are the same. The difference is that the idr api relies on the
old, idr-custom radix-tree implementation for things like efficient
tracking/allocation of free ids. The xarray already supports this, so most
of this is a direct switchover from the old api to the new.
Signed-off-by: Matthew Wilcox
Signed-off-by: Brian Foster
---
 include/linux/pid_namespace.h |  8 ++--
 include/linux/threads.h       |  2 +-
 init/main.c                   |  3 +-
 kernel/pid.c                  | 78 ++++++++++++++++-------------------
 kernel/pid_namespace.c        | 19 ++++-----
 5 files changed, 51 insertions(+), 59 deletions(-)

diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 82c72482019d..e4f5979b482b 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -9,7 +9,7 @@
 #include
 #include
 #include
-#include <linux/idr.h>
+#include <linux/xarray.h>
 
 /* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
 #define MAX_PID_NS_LEVEL 32
@@ -17,7 +17,7 @@ struct fs_pin;
 
 struct pid_namespace {
-        struct idr idr;
+        struct xarray xa;
         unsigned int pid_next;
         struct rcu_head rcu;
         unsigned int pid_allocated;
@@ -38,6 +38,8 @@ extern struct pid_namespace init_pid_ns;
 
 #define PIDNS_ADDING (1U << 31)
 
+#define PID_XA_FLAGS (XA_FLAGS_TRACK_FREE | XA_FLAGS_LOCK_IRQ)
+
 #ifdef CONFIG_PID_NS
 static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns)
 {
@@ -85,7 +87,7 @@ static inline int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
 
 extern struct pid_namespace *task_active_pid_ns(struct task_struct *tsk);
 void pidhash_init(void);
-void pid_idr_init(void);
+void pid_init(void);
 
 static inline bool task_is_in_init_pid_ns(struct task_struct *tsk)
 {
diff --git a/include/linux/threads.h b/include/linux/threads.h
index c34173e6c5f1..37e4391ee89f 100644
--- a/include/linux/threads.h
+++ b/include/linux/threads.h
@@ -38,7 +38,7 @@
  * Define a minimum number of pids per cpu. Heuristically based
  * on original pid max of 32k for 32 cpus. Also, increase the
  * minimum settable value for pid_max on the running system based
- * on similar defaults. See kernel/pid.c:pid_idr_init() for details.
+ * on similar defaults. See kernel/pid.c:pid_init() for details.
  */
 #define PIDS_PER_CPU_DEFAULT 1024
 #define PIDS_PER_CPU_MIN 8
diff --git a/init/main.c b/init/main.c
index 0ee39cdcfcac..3944dcd10c09 100644
--- a/init/main.c
+++ b/init/main.c
@@ -73,7 +73,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -1100,7 +1099,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
         late_time_init();
         sched_clock_init();
         calibrate_delay();
-        pid_idr_init();
+        pid_init();
         anon_vma_init();
 #ifdef CONFIG_X86
         if (efi_enabled(EFI_RUNTIME_SERVICES))
diff --git a/kernel/pid.c b/kernel/pid.c
index 409303ada383..b86a97dbdcc1 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -41,7 +41,7 @@
 #include
 #include
 #include
-#include <linux/idr.h>
+#include <linux/xarray.h>
 #include
 #include
 
@@ -66,15 +66,9 @@ int pid_max = PID_MAX_DEFAULT;
 int pid_max_min = RESERVED_PIDS + 1;
 int pid_max_max = PID_MAX_LIMIT;
 
-/*
- * PID-map pages start out as NULL, they get allocated upon
- * first use and are never deallocated. This way a low pid_max
- * value does not cause lots of bitmaps to be allocated, but
- * the scheme scales to up to 4 million PIDs, runtime.
- */
 struct pid_namespace init_pid_ns = {
         .ns.count = REFCOUNT_INIT(2),
-        .idr = IDR_INIT(init_pid_ns.idr),
+        .xa = XARRAY_INIT(init_pid_ns.xa, PID_XA_FLAGS),
         .pid_next = 0,
         .pid_allocated = PIDNS_ADDING,
         .level = 0,
@@ -118,7 +112,7 @@ void free_pid(struct pid *pid)
                 struct upid *upid = pid->numbers + i;
                 struct pid_namespace *ns = upid->ns;
 
-                xa_lock_irqsave(&ns->idr.idr_rt, flags);
+                xa_lock_irqsave(&ns->xa, flags);
                 switch (--ns->pid_allocated) {
                 case 2:
                 case 1:
@@ -135,8 +129,8 @@ void free_pid(struct pid *pid)
                         break;
                 }
 
-                idr_remove(&ns->idr, upid->nr);
-                xa_unlock_irqrestore(&ns->idr.idr_rt, flags);
+                __xa_erase(&ns->xa, upid->nr);
+                xa_unlock_irqrestore(&ns->xa, flags);
         }
 
         call_rcu(&pid->rcu, delayed_put_pid);
@@ -147,7 +141,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 {
         struct pid *pid;
         enum pid_type type;
-        int i, nr;
+        int i;
         struct pid_namespace *tmp;
         struct upid *upid;
         int retval = -ENOMEM;
@@ -191,18 +185,17 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
                         set_tid_size--;
                 }
 
-                idr_preload(GFP_KERNEL);
-                xa_lock_irq(&tmp->idr.idr_rt);
+                xa_lock_irq(&tmp->xa);
 
                 if (tid) {
-                        nr = idr_alloc(&tmp->idr, NULL, tid,
-                                       tid + 1, GFP_ATOMIC);
+                        retval = __xa_insert(&tmp->xa, tid, NULL, GFP_KERNEL);
+
                         /*
-                         * If ENOSPC is returned it means that the PID is
-                         * alreay in use. Return EEXIST in that case.
+                         * If EBUSY is returned it means that the PID is already
+                         * in use. Return EEXIST in that case.
                          */
-                        if (nr == -ENOSPC)
-                                nr = -EEXIST;
+                        if (retval == -EBUSY)
+                                retval = -EEXIST;
                 } else {
                         int pid_min = 1;
                         /*
@@ -216,19 +209,18 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
                          * Store a null pointer so find_pid_ns does not find
                          * a partially initialized PID (see below).
                          */
-                        nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min,
-                                              pid_max, GFP_ATOMIC);
-                        tmp->pid_next = nr + 1;
+                        retval = __xa_alloc_cyclic(&tmp->xa, &tid, NULL,
+                                                   XA_LIMIT(pid_min, pid_max),
+                                                   &tmp->pid_next, GFP_KERNEL);
+                        if (retval == -EBUSY)
+                                retval = -EAGAIN;
                 }
-                xa_unlock_irq(&tmp->idr.idr_rt);
-                idr_preload_end();
+                xa_unlock_irq(&tmp->xa);
 
-                if (nr < 0) {
-                        retval = (nr == -ENOSPC) ? -EAGAIN : nr;
+                if (retval < 0)
                         goto out_free;
-                }
 
-                pid->numbers[i].nr = nr;
+                pid->numbers[i].nr = tid;
                 pid->numbers[i].ns = tmp;
                 tmp = tmp->parent;
         }
@@ -256,17 +248,17 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
         for ( ; upid >= pid->numbers; --upid) {
                 tmp = upid->ns;
 
-                xa_lock_irq(&tmp->idr.idr_rt);
+                xa_lock_irq(&tmp->xa);
                 if (tmp == ns && !(tmp->pid_allocated & PIDNS_ADDING)) {
-                        xa_unlock_irq(&tmp->idr.idr_rt);
+                        xa_unlock_irq(&tmp->xa);
                         put_pid_ns(ns);
                         goto out_free;
                 }
 
                 /* Make the PID visible to find_pid_ns. */
-                idr_replace(&tmp->idr, pid, upid->nr);
+                __xa_store(&tmp->xa, upid->nr, pid, 0);
                 tmp->pid_allocated++;
-                xa_unlock_irq(&tmp->idr.idr_rt);
+                xa_unlock_irq(&tmp->xa);
         }
 
         return pid;
@@ -276,14 +268,14 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
                 upid = pid->numbers + i;
                 tmp = upid->ns;
 
-                xa_lock_irq(&tmp->idr.idr_rt);
+                xa_lock_irq(&tmp->xa);
 
                 /* On failure to allocate the first pid, reset the state */
                 if (tmp == ns && tmp->pid_allocated == PIDNS_ADDING)
                         ns->pid_next = 0;
 
-                idr_remove(&tmp->idr, upid->nr);
-                xa_unlock_irq(&tmp->idr.idr_rt);
+                __xa_erase(&tmp->xa, upid->nr);
+                xa_unlock_irq(&tmp->xa);
         }
 
         kmem_cache_free(ns->pid_cachep, pid);
@@ -292,14 +284,14 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 
 void disable_pid_allocation(struct pid_namespace *ns)
 {
-        xa_lock_irq(&ns->idr.idr_rt);
+        xa_lock_irq(&ns->xa);
         ns->pid_allocated &= ~PIDNS_ADDING;
-        xa_unlock_irq(&ns->idr.idr_rt);
+        xa_unlock_irq(&ns->xa);
 }
 
 struct pid *find_pid_ns(int nr, struct pid_namespace *ns)
 {
-        return idr_find(&ns->idr, nr);
+        return xa_load(&ns->xa, nr);
 }
 EXPORT_SYMBOL_GPL(find_pid_ns);
@@ -508,7 +500,9 @@ EXPORT_SYMBOL_GPL(task_active_pid_ns);
  */
 struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
 {
-        return idr_get_next(&ns->idr, &nr);
+        unsigned long index = nr;
+
+        return xa_find(&ns->xa, &index, ULONG_MAX, XA_PRESENT);
 }
 
 struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
@@ -649,7 +643,7 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
  * take it we can leave the interrupts enabled. For now it is easier to be safe
  * than to prove it can't happen.
  */
-void __init pid_idr_init(void)
+void __init pid_init(void)
 {
         /* Verify no one has done anything silly: */
         BUILD_BUG_ON(PID_MAX_LIMIT >= PIDNS_ADDING);
@@ -661,8 +655,6 @@ void __init pid_idr_init(void)
                                 PIDS_PER_CPU_MIN * num_possible_cpus());
         pr_info("pid_max: default: %u minimum: %u\n", pid_max, pid_max_min);
 
-        idr_init(&init_pid_ns.idr);
-
         init_pid_ns.pid_cachep = KMEM_CACHE(pid,
                         SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
 }
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index a53d20c5c85e..8561e01e2d01 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -22,7 +22,7 @@
 #include
 #include
 #include
-#include <linux/idr.h>
+#include <linux/xarray.h>
 
 static DEFINE_MUTEX(pid_caches_mutex);
 static struct kmem_cache *pid_ns_cachep;
@@ -92,15 +92,15 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
         if (ns == NULL)
                 goto out_dec;
 
-        idr_init(&ns->idr);
+        xa_init_flags(&ns->xa, PID_XA_FLAGS);
 
         ns->pid_cachep = create_pid_cachep(level);
         if (ns->pid_cachep == NULL)
-                goto out_free_idr;
+                goto out_free_xa;
 
         err = ns_alloc_inum(&ns->ns);
         if (err)
-                goto out_free_idr;
+                goto out_free_xa;
         ns->ns.ops = &pidns_operations;
 
         refcount_set(&ns->ns.count, 1);
@@ -112,8 +112,8 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 
         return ns;
 
-out_free_idr:
-        idr_destroy(&ns->idr);
+out_free_xa:
+        xa_destroy(&ns->xa);
         kmem_cache_free(pid_ns_cachep, ns);
 out_dec:
         dec_pid_namespaces(ucounts);
@@ -135,7 +135,7 @@ static void destroy_pid_namespace(struct pid_namespace *ns)
 {
         ns_free_inum(&ns->ns);
 
-        idr_destroy(&ns->idr);
+        xa_destroy(&ns->xa);
 
         call_rcu(&ns->rcu, delayed_free_pidns);
 }
@@ -165,7 +165,7 @@ EXPORT_SYMBOL_GPL(put_pid_ns);
 
 void zap_pid_ns_processes(struct pid_namespace *pid_ns)
 {
-        int nr;
+        long nr;
         int rc;
         struct task_struct *task, *me = current;
         int init_pids = thread_group_leader(me) ? 1 : 2;
@@ -198,8 +198,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
          */
         rcu_read_lock();
         read_lock(&tasklist_lock);
-        nr = 2;
-        idr_for_each_entry_continue(&pid_ns->idr, pid, nr) {
+        xa_for_each_range(&pid_ns->xa, nr, pid, 2, ULONG_MAX) {
                 task = pid_task(pid, PIDTYPE_PID);
                 if (task && !__fatal_signal_pending(task))
                         group_send_sig_info(SIGKILL, SEND_SIG_PRIV, task, PIDTYPE_MAX);
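
For reference, the small set of xarray primitives the series ends up
depending on, exercised on a stand-alone array. The names (demo_xa,
demo_next, demo_insert, demo_lookup) and the upper limit are hypothetical;
only the flag combination mirrors the PID_XA_FLAGS definition the series
introduces.

#include <linux/xarray.h>

/* Track free slots for allocation and take the internal lock irq-safely. */
static DEFINE_XARRAY_FLAGS(demo_xa, XA_FLAGS_TRACK_FREE | XA_FLAGS_LOCK_IRQ);
static u32 demo_next;   /* cyclic cursor, analogous to pid_next */

static int demo_insert(void *entry, u32 *id)
{
        int err;

        xa_lock_irq(&demo_xa);
        /* Find a free index in [1, 0x7fffffff], starting the search at
         * demo_next and advancing it; the call may drop and re-take the
         * lock internally to allocate memory since GFP_KERNEL can sleep. */
        err = __xa_alloc_cyclic(&demo_xa, id, entry,
                                XA_LIMIT(1, 0x7fffffff), &demo_next,
                                GFP_KERNEL);
        xa_unlock_irq(&demo_xa);

        return err < 0 ? err : 0;       /* a return of 1 only signals wrap-around */
}

static void *demo_lookup(u32 id)
{
        return xa_load(&demo_xa, id);   /* lock-free, RCU-protected lookup */
}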