From patchwork Thu Feb 6 16:44:13 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mateusz Guzik
X-Patchwork-Id: 13963326
From: Mateusz Guzik
To: ebiederm@xmission.com, oleg@redhat.com
Cc: brauner@kernel.org, akpm@linux-foundation.org, Liam.Howlett@oracle.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Mateusz Guzik
Subject: [PATCH v6 4/5] pid: perform free_pid() calls outside of tasklist_lock
Date: Thu, 6 Feb 2025 17:44:13 +0100
Message-ID: <20250206164415.450051-5-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250206164415.450051-1-mjguzik@gmail.com>
References: <20250206164415.450051-1-mjguzik@gmail.com>

As the clone side already executes pid allocation with only pidmap_lock
held, issuing free_pid() while still holding tasklist_lock exacerbates
total hold time of the latter.

More things may show up later which require initial clean up with the
lock held and allow finishing without it. For that reason a struct to
collect such work is added instead of merely passing the pid array.

Reviewed-by: Oleg Nesterov
Signed-off-by: Mateusz Guzik
---
 include/linux/pid.h |  7 ++++---
 kernel/exit.c       | 28 ++++++++++++++++++++--------
 kernel/pid.c        | 44 ++++++++++++++++++++++----------------------
 kernel/sys.c        | 14 +++++++++-----
 4 files changed, 55 insertions(+), 38 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 98837a1ff0f3..311ecebd7d56 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -101,9 +101,9 @@ extern struct pid *get_task_pid(struct task_struct *task, enum pid_type type);
  * these helpers must be called with the tasklist_lock write-held.
  */
 extern void attach_pid(struct task_struct *task, enum pid_type);
-extern void detach_pid(struct task_struct *task, enum pid_type);
-extern void change_pid(struct task_struct *task, enum pid_type,
-			struct pid *pid);
+void detach_pid(struct pid **pids, struct task_struct *task, enum pid_type);
+void change_pid(struct pid **pids, struct task_struct *task, enum pid_type,
+		struct pid *pid);
 extern void exchange_tids(struct task_struct *task, struct task_struct *old);
 extern void transfer_pid(struct task_struct *old, struct task_struct *new,
 			 enum pid_type);
@@ -129,6 +129,7 @@ extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
 extern struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 			     size_t set_tid_size);
 extern void free_pid(struct pid *pid);
+void free_pids(struct pid **pids);
 extern void disable_pid_allocation(struct pid_namespace *ns);
 
 /*
diff --git a/kernel/exit.c b/kernel/exit.c
index 8c39c84582b7..0d6df671c8a8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -122,14 +122,22 @@ static __init int kernel_exit_sysfs_init(void)
 late_initcall(kernel_exit_sysfs_init);
 #endif
 
-static void __unhash_process(struct task_struct *p, bool group_dead)
+/*
+ * For things release_task() would like to do *after* tasklist_lock is released.
+ */
+struct release_task_post {
+	struct pid *pids[PIDTYPE_MAX];
+};
+
+static void __unhash_process(struct release_task_post *post, struct task_struct *p,
+			     bool group_dead)
 {
 	nr_threads--;
-	detach_pid(p, PIDTYPE_PID);
+	detach_pid(post->pids, p, PIDTYPE_PID);
 	if (group_dead) {
-		detach_pid(p, PIDTYPE_TGID);
-		detach_pid(p, PIDTYPE_PGID);
-		detach_pid(p, PIDTYPE_SID);
+		detach_pid(post->pids, p, PIDTYPE_TGID);
+		detach_pid(post->pids, p, PIDTYPE_PGID);
+		detach_pid(post->pids, p, PIDTYPE_SID);
 
 		list_del_rcu(&p->tasks);
 		list_del_init(&p->sibling);
@@ -141,7 +149,7 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
 /*
  * This function expects the tasklist_lock write-locked.
  */
-static void __exit_signal(struct task_struct *tsk)
+static void __exit_signal(struct release_task_post *post, struct task_struct *tsk)
 {
 	struct signal_struct *sig = tsk->signal;
 	bool group_dead = thread_group_leader(tsk);
@@ -194,7 +202,7 @@ static void __exit_signal(struct task_struct *tsk)
 	task_io_accounting_add(&sig->ioac, &tsk->ioac);
 	sig->sum_sched_runtime += tsk->se.sum_exec_runtime;
 	sig->nr_threads--;
-	__unhash_process(tsk, group_dead);
+	__unhash_process(post, tsk, group_dead);
 	write_sequnlock(&sig->stats_lock);
 
 	/*
@@ -236,10 +244,13 @@ void __weak release_thread(struct task_struct *dead_task)
 
 void release_task(struct task_struct *p)
 {
+	struct release_task_post post;
 	struct task_struct *leader;
 	struct pid *thread_pid;
 	int zap_leader;
 repeat:
+	memset(&post, 0, sizeof(post));
+
 	/* don't need to get the RCU readlock here - the process is dead and
 	 * can't be modifying its own credentials. But shut RCU-lockdep up */
 	rcu_read_lock();
@@ -252,7 +263,7 @@ void release_task(struct task_struct *p)
 
 	write_lock_irq(&tasklist_lock);
 	ptrace_release_task(p);
-	__exit_signal(p);
+	__exit_signal(&post, p);
 
 	/*
 	 * If we are the last non-leader member of the thread
@@ -278,6 +289,7 @@ void release_task(struct task_struct *p)
 	put_pid(thread_pid);
 	add_device_randomness(&p->se.sum_exec_runtime,
 			      sizeof(p->se.sum_exec_runtime));
+	free_pids(post.pids);
 	release_thread(p);
 	put_task_struct_rcu_user(p);
 
diff --git a/kernel/pid.c b/kernel/pid.c
index 2ae872f689a7..73625f28c166 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -88,20 +88,6 @@ struct pid_namespace init_pid_ns = {
 };
 EXPORT_SYMBOL_GPL(init_pid_ns);
 
-/*
- * Note: disable interrupts while the pidmap_lock is held as an
- * interrupt might come in and do read_lock(&tasklist_lock).
- *
- * If we don't disable interrupts there is a nasty deadlock between
- * detach_pid()->free_pid() and another cpu that does
- * spin_lock(&pidmap_lock) followed by an interrupt routine that does
- * read_lock(&tasklist_lock);
- *
- * After we clean up the tasklist_lock and know there are no
- * irq handlers that take it we can leave the interrupts enabled.
- * For now it is easier to be safe than to prove it can't happen.
- */
-
 static __cacheline_aligned_in_smp DEFINE_SPINLOCK(pidmap_lock);
 seqcount_spinlock_t pidmap_lock_seq = SEQCNT_SPINLOCK_ZERO(pidmap_lock_seq, &pidmap_lock);
 
@@ -128,10 +114,11 @@ static void delayed_put_pid(struct rcu_head *rhp)
 
 void free_pid(struct pid *pid)
 {
-	/* We can be called with write_lock_irq(&tasklist_lock) held */
 	int i;
 	unsigned long flags;
 
+	lockdep_assert_not_held(&tasklist_lock);
+
 	spin_lock_irqsave(&pidmap_lock, flags);
 	for (i = 0; i <= pid->level; i++) {
 		struct upid *upid = pid->numbers + i;
@@ -160,6 +147,18 @@ void free_pid(struct pid *pid)
 	call_rcu(&pid->rcu, delayed_put_pid);
 }
 
+void free_pids(struct pid **pids)
+{
+	int tmp;
+
+	/*
+	 * This can batch pidmap_lock.
+	 */
+	for (tmp = PIDTYPE_MAX; --tmp >= 0; )
+		if (pids[tmp])
+			free_pid(pids[tmp]);
+}
+
 struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 		      size_t set_tid_size)
 {
@@ -347,8 +346,8 @@ void attach_pid(struct task_struct *task, enum pid_type type)
 	hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]);
 }
 
-static void __change_pid(struct task_struct *task, enum pid_type type,
-			struct pid *new)
+static void __change_pid(struct pid **pids, struct task_struct *task,
+			 enum pid_type type, struct pid *new)
 {
 	struct pid **pid_ptr, *pid;
 	int tmp;
@@ -370,18 +369,19 @@ static void __change_pid(struct task_struct *task, enum pid_type type,
 		if (pid_has_task(pid, tmp))
 			return;
 
-	free_pid(pid);
+	WARN_ON(pids[type]);
+	pids[type] = pid;
 }
 
-void detach_pid(struct task_struct *task, enum pid_type type)
+void detach_pid(struct pid **pids, struct task_struct *task, enum pid_type type)
 {
-	__change_pid(task, type, NULL);
+	__change_pid(pids, task, type, NULL);
 }
 
-void change_pid(struct task_struct *task, enum pid_type type,
+void change_pid(struct pid **pids, struct task_struct *task, enum pid_type type,
 		struct pid *pid)
 {
-	__change_pid(task, type, pid);
+	__change_pid(pids, task, type, pid);
 	attach_pid(task, type);
 }
 
diff --git a/kernel/sys.c b/kernel/sys.c
index cb366ff8703a..4efca8a97d62 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1085,6 +1085,7 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
 {
 	struct task_struct *p;
 	struct task_struct *group_leader = current->group_leader;
+	struct pid *pids[PIDTYPE_MAX] = { 0 };
 	struct pid *pgrp;
 	int err;
 
@@ -1142,13 +1143,14 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
 		goto out;
 
 	if (task_pgrp(p) != pgrp)
-		change_pid(p, PIDTYPE_PGID, pgrp);
+		change_pid(pids, p, PIDTYPE_PGID, pgrp);
 
 	err = 0;
 out:
 	/* All paths lead to here, thus we are safe. -DaveM */
 	write_unlock_irq(&tasklist_lock);
 	rcu_read_unlock();
+	free_pids(pids);
 	return err;
 }
 
@@ -1222,21 +1224,22 @@ SYSCALL_DEFINE1(getsid, pid_t, pid)
 	return retval;
 }
 
-static void set_special_pids(struct pid *pid)
+static void set_special_pids(struct pid **pids, struct pid *pid)
 {
 	struct task_struct *curr = current->group_leader;
 
 	if (task_session(curr) != pid)
-		change_pid(curr, PIDTYPE_SID, pid);
+		change_pid(pids, curr, PIDTYPE_SID, pid);
 
 	if (task_pgrp(curr) != pid)
-		change_pid(curr, PIDTYPE_PGID, pid);
+		change_pid(pids, curr, PIDTYPE_PGID, pid);
 }
 
 int ksys_setsid(void)
 {
 	struct task_struct *group_leader = current->group_leader;
 	struct pid *sid = task_pid(group_leader);
+	struct pid *pids[PIDTYPE_MAX] = { 0 };
 	pid_t session = pid_vnr(sid);
 	int err = -EPERM;
 
@@ -1252,13 +1255,14 @@ int ksys_setsid(void)
 		goto out;
 
 	group_leader->signal->leader = 1;
-	set_special_pids(sid);
+	set_special_pids(pids, sid);
 
 	proc_clear_tty(group_leader);
 
 	err = session;
 out:
 	write_unlock_irq(&tasklist_lock);
+	free_pids(pids);
 	if (err > 0) {
 		proc_sid_connector(group_leader);
 		sched_autogroup_create_attach(group_leader);
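
Illustration only, not part of the patch: a minimal userspace sketch of the "collect under the lock, free after dropping it" shape that this change gives release_task(), setpgid() and ksys_setsid(). Everything in it is a hypothetical stand-in chosen for the example (struct obj, struct post_work, big_lock, detach_locked(), free_deferred(), release_two()). A pthread mutex plays the role of tasklist_lock and plain free() plays the role of free_pid(), so it does not model pidmap_lock, RCU or interrupt disabling.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
enum { SLOT_MAX = 4 };			/* plays the role of PIDTYPE_MAX */

struct obj {
	int users;			/* attachments still referencing the object */
};

struct post_work {
	struct obj *to_free[SLOT_MAX];	/* filled under the lock, drained after it */
};

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller holds big_lock: drop one user and queue the object if it was the last. */
static void detach_locked(struct post_work *post, struct obj *o, int slot)
{
	if (--o->users > 0)
		return;
	assert(post->to_free[slot] == NULL);	/* same idea as WARN_ON(pids[type]) */
	post->to_free[slot] = o;
}

/* Caller must not hold big_lock. Returns how many objects were freed. */
static int free_deferred(struct post_work *post)
{
	int freed = 0;

	for (int i = SLOT_MAX; --i >= 0; ) {
		if (post->to_free[i]) {
			free(post->to_free[i]);
			freed++;
		}
	}
	return freed;
}

/* Detach two objects under the lock, then free whatever was queued. */
int release_two(void)
{
	struct post_work post;
	struct obj *a = calloc(1, sizeof(*a));
	struct obj *b = calloc(1, sizeof(*b));
	int freed;

	if (!a || !b) {
		free(a);
		free(b);
		return -1;
	}
	a->users = 1;	/* last user goes away below: gets queued */
	b->users = 2;	/* one user remains: stays alive */

	memset(&post, 0, sizeof(post));
	pthread_mutex_lock(&big_lock);
	detach_locked(&post, a, 0);
	detach_locked(&post, b, 1);
	pthread_mutex_unlock(&big_lock);

	freed = free_deferred(&post);	/* the freeing work runs unlocked */
	free(b);			/* cleanup for the sketch only */
	return freed;
}
```

In this sketch only `a` loses its last user, so one object is queued while the lock is held and freed after it is dropped. The critical section is reduced to bookkeeping, which is the hold-time argument the commit message makes for tasklist_lock.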