From patchwork Tue Jan 28 16:07:43 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mateusz Guzik <mjguzik@gmail.com>
X-Patchwork-Id: 13952704
From: Mateusz Guzik <mjguzik@gmail.com>
To: brauner@kernel.org, oleg@redhat.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Mateusz Guzik <mjguzik@gmail.com>
Subject: [PATCH v2] exit: perform randomness and pid work without tasklist_lock
Date: Tue, 28 Jan 2025 17:07:43 +0100
Message-ID: <20250128160743.3142544-1-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
MIME-Version: 1.0

Both add_device_randomness() and attach_pid()/detach_pid() have their
own synchronisation mechanisms.

The clone side calls them *without* the tasklist_lock held, meaning
parallel calls can contend on their locks.

The exit side calls them *with* the tasklist_lock held, which means the
hold time is avoidably extended by waiting on either of the two locks,
in turn exacerbating contention on tasklist_lock itself.

Postponing the work until after the lock is dropped bumps the thread
creation/destruction rate by 15% on a 24-core VM.

Bench (plop into will-it-scale):
$ cat tests/threadspawn1.c
char *testcase_description = "Thread creation and teardown";

static void *worker(void *arg)
{
	return (NULL);
}

void testcase(unsigned long long *iterations, unsigned long nr)
{
	pthread_t thread;
	int error;

	while (1) {
		error = pthread_create(&thread, NULL, worker, NULL);
		assert(error == 0);
		error = pthread_join(thread, NULL);
		assert(error == 0);
		(*iterations)++;
	}
}

Run:
$ ./threadspawn1_processes -t 24

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
v2:
- introduce a struct to collect work
- move out pids as well

There is more which can be pulled out.

This may look suspicious:
+	proc_flush_pid(p->thread_pid);

AFAICS this is constant for the duration of the task's lifetime, so I
don't think there is a problem.

 include/linux/pid.h |  1 +
 kernel/exit.c       | 53 +++++++++++++++++++++++++++++++++------------
 kernel/pid.c        | 23 +++++++++++++++-----
 3 files changed, 58 insertions(+), 19 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 98837a1ff0f3..6e9fcacd02cd 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -101,6 +101,7 @@ extern struct pid *get_task_pid(struct task_struct *task, enum pid_type type);
  * these helpers must be called with the tasklist_lock write-held.
  */
 extern void attach_pid(struct task_struct *task, enum pid_type);
+extern struct pid *detach_pid_return(struct task_struct *task, enum pid_type);
 extern void detach_pid(struct task_struct *task, enum pid_type);
 extern void change_pid(struct task_struct *task, enum pid_type,
 			struct pid *pid);
diff --git a/kernel/exit.c b/kernel/exit.c
index 1dcddfe537ee..4e452d3e3a89 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -122,14 +122,23 @@ static __init int kernel_exit_sysfs_init(void)
 late_initcall(kernel_exit_sysfs_init);
 #endif
 
-static void __unhash_process(struct task_struct *p, bool group_dead)
+/*
+ * For things release_task would like to do *after* tasklist_lock is released.
+ */
+struct release_task_post {
+	unsigned long long randomness;
+	struct pid *pids[PIDTYPE_MAX];
+};
+
+static void __unhash_process(struct release_task_post *post, struct task_struct *p,
+			     bool group_dead)
 {
 	nr_threads--;
-	detach_pid(p, PIDTYPE_PID);
+	post->pids[PIDTYPE_PID] = detach_pid_return(p, PIDTYPE_PID);
 	if (group_dead) {
-		detach_pid(p, PIDTYPE_TGID);
-		detach_pid(p, PIDTYPE_PGID);
-		detach_pid(p, PIDTYPE_SID);
+		post->pids[PIDTYPE_TGID] = detach_pid_return(p, PIDTYPE_TGID);
+		post->pids[PIDTYPE_PGID] = detach_pid_return(p, PIDTYPE_PGID);
+		post->pids[PIDTYPE_SID] = detach_pid_return(p, PIDTYPE_SID);
 
 		list_del_rcu(&p->tasks);
 		list_del_init(&p->sibling);
@@ -141,7 +150,8 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
 /*
  * This function expects the tasklist_lock write-locked.
  */
-static void __exit_signal(struct task_struct *tsk)
+static void __exit_signal(struct release_task_post *post,
+			  struct task_struct *tsk)
 {
 	struct signal_struct *sig = tsk->signal;
 	bool group_dead = thread_group_leader(tsk);
@@ -174,8 +184,7 @@ static void __exit_signal(struct task_struct *tsk)
 			sig->curr_target = next_thread(tsk);
 	}
 
-	add_device_randomness((const void*) &tsk->se.sum_exec_runtime,
-			      sizeof(unsigned long long));
+	post->randomness = tsk->se.sum_exec_runtime;
 
 	/*
 	 * Accumulate here the counters for all threads as they die. We could
@@ -197,7 +206,7 @@ static void __exit_signal(struct task_struct *tsk)
 	task_io_accounting_add(&sig->ioac, &tsk->ioac);
 	sig->sum_sched_runtime += tsk->se.sum_exec_runtime;
 	sig->nr_threads--;
-	__unhash_process(tsk, group_dead);
+	__unhash_process(post, tsk, group_dead);
 	write_sequnlock(&sig->stats_lock);
 
 	/*
@@ -240,9 +249,13 @@ void __weak release_thread(struct task_struct *dead_task)
 void release_task(struct task_struct *p)
 {
 	struct task_struct *leader;
-	struct pid *thread_pid;
 	int zap_leader;
+	struct release_task_post post;
 repeat:
+	memset(&post, 0, sizeof(post));
+
+	proc_flush_pid(p->thread_pid);
+
 	/* don't need to get the RCU readlock here - the process is dead and
 	 * can't be modifying its own credentials. But shut RCU-lockdep up */
 	rcu_read_lock();
@@ -253,8 +266,7 @@ void release_task(struct task_struct *p)
 
 	write_lock_irq(&tasklist_lock);
 	ptrace_release_task(p);
-	thread_pid = get_pid(p->thread_pid);
-	__exit_signal(p);
+	__exit_signal(&post, p);
 
 	/*
 	 * If we are the last non-leader member of the thread
@@ -276,8 +288,21 @@ void release_task(struct task_struct *p)
 	}
 
 	write_unlock_irq(&tasklist_lock);
-	proc_flush_pid(thread_pid);
-	put_pid(thread_pid);
+
+	/*
+	 * Process clean up deferred to after we drop the tasklist lock.
+	 */
+	add_device_randomness((const void*) &post.randomness,
+			      sizeof(unsigned long long));
+	if (post.pids[PIDTYPE_PID])
+		free_pid(post.pids[PIDTYPE_PID]);
+	if (post.pids[PIDTYPE_TGID])
+		free_pid(post.pids[PIDTYPE_TGID]);
+	if (post.pids[PIDTYPE_PGID])
+		free_pid(post.pids[PIDTYPE_PGID]);
+	if (post.pids[PIDTYPE_SID])
+		free_pid(post.pids[PIDTYPE_SID]);
+
 	release_thread(p);
 	put_task_struct_rcu_user(p);
 
diff --git a/kernel/pid.c b/kernel/pid.c
index 3a10a7b6fcf8..047cdbcef5cf 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -343,7 +343,7 @@ void attach_pid(struct task_struct *task, enum pid_type type)
 	hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]);
 }
 
-static void __change_pid(struct task_struct *task, enum pid_type type,
+static struct pid *__change_pid(struct task_struct *task, enum pid_type type,
 			struct pid *new)
 {
 	struct pid **pid_ptr = task_pid_ptr(task, type);
@@ -362,20 +362,33 @@ static void __change_pid(struct task_struct *task, enum pid_type type,
 
 	for (tmp = PIDTYPE_MAX; --tmp >= 0; )
 		if (pid_has_task(pid, tmp))
-			return;
+			return NULL;
 
-	free_pid(pid);
+	return pid;
+}
+
+struct pid *detach_pid_return(struct task_struct *task, enum pid_type type)
+{
+	return __change_pid(task, type, NULL);
 }
 
 void detach_pid(struct task_struct *task, enum pid_type type)
 {
-	__change_pid(task, type, NULL);
+	struct pid *pid;
+
+	pid = detach_pid_return(task, type);
+	if (pid)
+		free_pid(pid);
 }
 
 void change_pid(struct task_struct *task, enum pid_type type,
 		struct pid *pid)
 {
-	__change_pid(task, type, pid);
+	struct pid *opid;
+
+	opid = __change_pid(task, type, pid);
+	if (opid)
+		free_pid(opid);
 	attach_pid(task, type);
 }
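
[Editor's illustration, not part of the patch] The sketch below is a minimal
user-space rendering of the pattern the patch applies: while the contended
lock is held, only record the work whose locking is independent of it into a
stack-allocated struct, then perform that work after the unlock so the hot
lock's hold time is not extended. All names here (hot_lock, slow_lock,
struct deferred_work, collect_under_lock, process_deferred) are hypothetical
and stand in for tasklist_lock, the randomness/pidmap locks, and
struct release_task_post respectively.

/* deferred_work_sketch.c - assumed, illustrative code only; not kernel code */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t hot_lock = PTHREAD_MUTEX_INITIALIZER;  /* stands in for tasklist_lock */
static pthread_mutex_t slow_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for the rng/pid locks */

struct deferred_work {
	int pending;
	long payload;
};

/* Under the hot lock: only cheap bookkeeping, no acquisition of slow_lock. */
static void collect_under_lock(struct deferred_work *post, long value)
{
	pthread_mutex_lock(&hot_lock);
	post->pending = 1;
	post->payload = value;
	pthread_mutex_unlock(&hot_lock);
}

/* After the hot lock is dropped: take the other lock without lengthening
 * the hot lock's hold time. */
static void process_deferred(struct deferred_work *post)
{
	if (!post->pending)
		return;
	pthread_mutex_lock(&slow_lock);
	printf("deferred work, payload %ld\n", post->payload);
	pthread_mutex_unlock(&slow_lock);
}

int main(void)
{
	struct deferred_work post = { 0 };

	collect_under_lock(&post, 42);
	process_deferred(&post);
	return 0;
}

In the patch itself, struct release_task_post plays the role of
deferred_work: __exit_signal() stashes the randomness sample and the
detached struct pid pointers while tasklist_lock is write-held, and
release_task() calls add_device_randomness() and free_pid() only after
write_unlock_irq(&tasklist_lock).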