From patchwork Wed Feb 5 20:09:28 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mateusz Guzik <mjguzik@gmail.com>
X-Patchwork-Id: 13961859
From: Mateusz Guzik <mjguzik@gmail.com>
To: ebiederm@xmission.com, oleg@redhat.com
Cc: brauner@kernel.org, akpm@linux-foundation.org, Liam.Howlett@oracle.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Mateusz Guzik <mjguzik@gmail.com>
Subject: [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock
Date: Wed, 5 Feb 2025 21:09:28 +0100
Message-ID: <20250205200929.406568-5-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250205200929.406568-1-mjguzik@gmail.com>
References: <20250205200929.406568-1-mjguzik@gmail.com>

As the clone side already executes pid allocation with only pidmap_lock
held, issuing free_pid() while still holding tasklist_lock exacerbates
the total hold time of the latter.

The pid array is smuggled through the newly added release_task_post
struct so that any extra things that want to get moved out have the
means to do it.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 include/linux/pid.h |  7 ++++---
 kernel/exit.c       | 27 +++++++++++++++++++--------
 kernel/pid.c        | 44 ++++++++++++++++++++++----------------------
 kernel/sys.c        | 14 +++++++++-----
 4 files changed, 54 insertions(+), 38 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 98837a1ff0f3..311ecebd7d56 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -101,9 +101,9 @@ extern struct pid *get_task_pid(struct task_struct *task, enum pid_type type);
  * these helpers must be called with the tasklist_lock write-held.
  */
 extern void attach_pid(struct task_struct *task, enum pid_type);
-extern void detach_pid(struct task_struct *task, enum pid_type);
-extern void change_pid(struct task_struct *task, enum pid_type,
-			struct pid *pid);
+void detach_pid(struct pid **pids, struct task_struct *task, enum pid_type);
+void change_pid(struct pid **pids, struct task_struct *task, enum pid_type,
+		struct pid *pid);
 extern void exchange_tids(struct task_struct *task, struct task_struct *old);
 extern void transfer_pid(struct task_struct *old, struct task_struct *new,
 			 enum pid_type);
@@ -129,6 +129,7 @@ extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
 extern struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 			     size_t set_tid_size);
 extern void free_pid(struct pid *pid);
+void free_pids(struct pid **pids);
 extern void disable_pid_allocation(struct pid_namespace *ns);
 
 /*
diff --git a/kernel/exit.c b/kernel/exit.c
index b5c0cbc6bdfb..0d6df671c8a8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -122,14 +122,22 @@ static __init int kernel_exit_sysfs_init(void)
 late_initcall(kernel_exit_sysfs_init);
 #endif
 
-static void __unhash_process(struct task_struct *p, bool group_dead)
+/*
+ * For things release_task() would like to do *after* tasklist_lock is released.
+ */
+struct release_task_post {
+	struct pid *pids[PIDTYPE_MAX];
+};
+
+static void __unhash_process(struct release_task_post *post, struct task_struct *p,
+			     bool group_dead)
 {
 	nr_threads--;
-	detach_pid(p, PIDTYPE_PID);
+	detach_pid(post->pids, p, PIDTYPE_PID);
 	if (group_dead) {
-		detach_pid(p, PIDTYPE_TGID);
-		detach_pid(p, PIDTYPE_PGID);
-		detach_pid(p, PIDTYPE_SID);
+		detach_pid(post->pids, p, PIDTYPE_TGID);
+		detach_pid(post->pids, p, PIDTYPE_PGID);
+		detach_pid(post->pids, p, PIDTYPE_SID);
 
 		list_del_rcu(&p->tasks);
 		list_del_init(&p->sibling);
@@ -141,7 +149,7 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
 /*
  * This function expects the tasklist_lock write-locked.
  */
-static void __exit_signal(struct task_struct *tsk)
+static void __exit_signal(struct release_task_post *post, struct task_struct *tsk)
 {
 	struct signal_struct *sig = tsk->signal;
 	bool group_dead = thread_group_leader(tsk);
@@ -194,7 +202,7 @@ static void __exit_signal(struct task_struct *tsk)
 	task_io_accounting_add(&sig->ioac, &tsk->ioac);
 	sig->sum_sched_runtime += tsk->se.sum_exec_runtime;
 	sig->nr_threads--;
-	__unhash_process(tsk, group_dead);
+	__unhash_process(post, tsk, group_dead);
 	write_sequnlock(&sig->stats_lock);
 
 	/*
@@ -236,10 +244,13 @@ void __weak release_thread(struct task_struct *dead_task)
 
 void release_task(struct task_struct *p)
 {
+	struct release_task_post post;
 	struct task_struct *leader;
 	struct pid *thread_pid;
 	int zap_leader;
 repeat:
+	memset(&post, 0, sizeof(post));
+
 	/* don't need to get the RCU readlock here - the process is dead and
 	 * can't be modifying its own credentials. But shut RCU-lockdep up */
 	rcu_read_lock();
@@ -252,7 +263,7 @@ void release_task(struct task_struct *p)
 
 	write_lock_irq(&tasklist_lock);
 	ptrace_release_task(p);
-	__exit_signal(p);
+	__exit_signal(&post, p);
 
 	/*
 	 * If we are the last non-leader member of the thread
diff --git a/kernel/pid.c b/kernel/pid.c
index 2ae872f689a7..73625f28c166 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -88,20 +88,6 @@ struct pid_namespace init_pid_ns = {
 };
 EXPORT_SYMBOL_GPL(init_pid_ns);
 
-/*
- * Note: disable interrupts while the pidmap_lock is held as an
- * interrupt might come in and do read_lock(&tasklist_lock).
- *
- * If we don't disable interrupts there is a nasty deadlock between
- * detach_pid()->free_pid() and another cpu that does
- * spin_lock(&pidmap_lock) followed by an interrupt routine that does
- * read_lock(&tasklist_lock);
- *
- * After we clean up the tasklist_lock and know there are no
- * irq handlers that take it we can leave the interrupts enabled.
- * For now it is easier to be safe than to prove it can't happen.
- */
-
 static __cacheline_aligned_in_smp DEFINE_SPINLOCK(pidmap_lock);
 seqcount_spinlock_t pidmap_lock_seq = SEQCNT_SPINLOCK_ZERO(pidmap_lock_seq, &pidmap_lock);
 
@@ -128,10 +114,11 @@ static void delayed_put_pid(struct rcu_head *rhp)
 
 void free_pid(struct pid *pid)
 {
-	/* We can be called with write_lock_irq(&tasklist_lock) held */
 	int i;
 	unsigned long flags;
 
+	lockdep_assert_not_held(&tasklist_lock);
+
 	spin_lock_irqsave(&pidmap_lock, flags);
 	for (i = 0; i <= pid->level; i++) {
 		struct upid *upid = pid->numbers + i;
@@ -160,6 +147,18 @@ void free_pid(struct pid *pid)
 	call_rcu(&pid->rcu, delayed_put_pid);
 }
 
+void free_pids(struct pid **pids)
+{
+	int tmp;
+
+	/*
+	 * This can batch pidmap_lock.
+	 */
+	for (tmp = PIDTYPE_MAX; --tmp >= 0; )
+		if (pids[tmp])
+			free_pid(pids[tmp]);
+}
+
 struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 		      size_t set_tid_size)
 {
@@ -347,8 +346,8 @@ void attach_pid(struct task_struct *task, enum pid_type type)
 	hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]);
 }
 
-static void __change_pid(struct task_struct *task, enum pid_type type,
-			struct pid *new)
+static void __change_pid(struct pid **pids, struct task_struct *task,
+			 enum pid_type type, struct pid *new)
 {
 	struct pid **pid_ptr, *pid;
 	int tmp;
@@ -370,18 +369,19 @@ static void __change_pid(struct task_struct *task, enum pid_type type,
 		if (pid_has_task(pid, tmp))
 			return;
 
-	free_pid(pid);
+	WARN_ON(pids[type]);
+	pids[type] = pid;
 }
 
-void detach_pid(struct task_struct *task, enum pid_type type)
+void detach_pid(struct pid **pids, struct task_struct *task, enum pid_type type)
 {
-	__change_pid(task, type, NULL);
+	__change_pid(pids, task, type, NULL);
 }
 
-void change_pid(struct task_struct *task, enum pid_type type,
+void change_pid(struct pid **pids, struct task_struct *task, enum pid_type type,
 		struct pid *pid)
 {
-	__change_pid(task, type, pid);
+	__change_pid(pids, task, type, pid);
 	attach_pid(task, type);
 }
 
diff --git a/kernel/sys.c b/kernel/sys.c
index cb366ff8703a..4efca8a97d62 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1085,6 +1085,7 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
 {
 	struct task_struct *p;
 	struct task_struct *group_leader = current->group_leader;
+	struct pid *pids[PIDTYPE_MAX] = { 0 };
 	struct pid *pgrp;
 	int err;
 
@@ -1142,13 +1143,14 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
 		goto out;
 
 	if (task_pgrp(p) != pgrp)
-		change_pid(p, PIDTYPE_PGID, pgrp);
+		change_pid(pids, p, PIDTYPE_PGID, pgrp);
 
 	err = 0;
 out:
 	/* All paths lead to here, thus we are safe. -DaveM */
 	write_unlock_irq(&tasklist_lock);
 	rcu_read_unlock();
+	free_pids(pids);
 	return err;
 }
 
@@ -1222,21 +1224,22 @@ SYSCALL_DEFINE1(getsid, pid_t, pid)
 	return retval;
 }
 
-static void set_special_pids(struct pid *pid)
+static void set_special_pids(struct pid **pids, struct pid *pid)
 {
 	struct task_struct *curr = current->group_leader;
 
 	if (task_session(curr) != pid)
-		change_pid(curr, PIDTYPE_SID, pid);
+		change_pid(pids, curr, PIDTYPE_SID, pid);
 
 	if (task_pgrp(curr) != pid)
-		change_pid(curr, PIDTYPE_PGID, pid);
+		change_pid(pids, curr, PIDTYPE_PGID, pid);
 }
 
 int ksys_setsid(void)
 {
 	struct task_struct *group_leader = current->group_leader;
 	struct pid *sid = task_pid(group_leader);
+	struct pid *pids[PIDTYPE_MAX] = { 0 };
 	pid_t session = pid_vnr(sid);
 	int err = -EPERM;
 
@@ -1252,13 +1255,14 @@ int ksys_setsid(void)
 		goto out;
 
 	group_leader->signal->leader = 1;
-	set_special_pids(sid);
+	set_special_pids(pids, sid);
 
 	proc_clear_tty(group_leader);
 
 	err = session;
 out:
 	write_unlock_irq(&tasklist_lock);
+	free_pids(pids);
 	if (err > 0) {
 		proc_sid_connector(group_leader);
 		sched_autogroup_create_attach(group_leader);
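
For readers who want the shape of the change without the kernel context, here is a minimal userspace sketch of the same pattern: code running under the big lock only stashes pointers into a caller-provided array, and the actual freeing happens once the lock has been dropped. Everything in it is an assumption made for illustration (big_lock, struct item, detach_item(), free_items() and demo_release() are hypothetical names, and the lock is a toy C11 atomic flag rather than tasklist_lock), so treat it as a rough analogy to detach_pid()/free_pids(), not as the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical stand-ins, not kernel APIs. */
enum slot_type { SLOT_A, SLOT_B, SLOT_MAX };

struct item {
	int users;	/* how many slots still reference this item */
};

static atomic_bool big_lock;	/* plays the role of tasklist_lock */
static int freed_count;		/* demo bookkeeping only */

static void big_lock_acquire(void)
{
	while (atomic_exchange(&big_lock, true))
		;	/* spin until the previous holder releases */
}

static void big_lock_release(void)
{
	atomic_store(&big_lock, false);
}

/*
 * Teardown that should not add to big_lock hold time. The assert is a
 * single-threaded approximation of lockdep_assert_not_held().
 */
static void free_item(struct item *it)
{
	assert(!atomic_load(&big_lock));
	free(it);
	freed_count++;
}

/* Runs under big_lock: drop one user, stash the item once nobody uses it. */
static void detach_item(struct item **post, struct item *it, enum slot_type type)
{
	assert(atomic_load(&big_lock));
	if (--it->users > 0)
		return;
	assert(post[type] == NULL);	/* same idea as WARN_ON(pids[type]) */
	post[type] = it;
}

/* Runs after big_lock is dropped: free everything that was stashed. */
static void free_items(struct item **post)
{
	for (int t = SLOT_MAX; --t >= 0; )
		if (post[t])
			free_item(post[t]);
}

/* Returns how many items were freed, or -1 if allocation failed. */
static int demo_release(void)
{
	struct item *post[SLOT_MAX] = { 0 };
	struct item *a = calloc(1, sizeof(*a));
	struct item *b = calloc(1, sizeof(*b));
	int before = freed_count;

	if (!a || !b) {
		free(a);
		free(b);
		return -1;
	}
	a->users = 1;
	b->users = 2;

	big_lock_acquire();
	detach_item(post, a, SLOT_A);	/* last user: stashed in post[] */
	detach_item(post, b, SLOT_B);	/* still has a user: left alone */
	big_lock_release();

	free_items(post);		/* the deferred work, lock not held */
	free(b);			/* demo cleanup of the survivor */
	return freed_count - before;
}
```

Calling demo_release() once frees only the item whose last user went away; the other item is never stashed. The array is zero-initialized by the caller for the same reason the patch uses memset() and "= { 0 }": free_items() walks every slot and relies on unused ones being NULL.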