mbox series

[v6,0/5] reduce tasklist_lock hold time on exit and do some pid cleanup

Message ID 20250206164415.450051-1-mjguzik@gmail.com (mailing list archive)
Headers show
Series reduce tasklist_lock hold time on exit and do some pid cleanup | expand

Message

Mateusz Guzik Feb. 6, 2025, 4:44 p.m. UTC
I fixed fat-fingering and touchedup some commit messages. git diff
between the old and new branch does not show any code changes.

hopefully we will be done here after this iteration 8->

old cover letter:

The clone side contends against exit side in a way which avoidably
exacerbates the problem by the latter waiting on locks held by the
former while holding the tasklist_lock.

Whacking this for both add_device_randomness and pids allocation gives
me a 15% speed up for thread creation/destruction in a 24-core vm.

The random patch is worth about 4%.

The new bottleneck is pidmap_lock itself, with the biggest problem being
the allocation itself taking the lock *twice*.

Bench (plop into will-it-scale):
$ cat tests/threadspawn1.c

char *testcase_description = "Thread creation and teardown";

static void *worker(void *arg)
{
	return (NULL);
}

void testcase(unsigned long long *iterations, unsigned long nr)
{
	pthread_t thread;
	int error;

	while (1) {
		error = pthread_create(&thread, NULL, worker, NULL);
		assert(error == 0);
		error = pthread_join(thread, NULL);
		assert(error == 0);
		(*iterations)++;
	}
}

v6:
- expand on the commit message in 3/5 pid: sprinkle tasklist_lock asserts
- move fat-fingered free_pids call to the right patch

v5:
- whack scripts/selinux/genheaders/genheaders which accidentally got in
- rebased on next-20250205

v4:
- justify moving get_pid in the commit message with a one-liner
- drop the tty unref patch -- it is completely optional and Oleg has his
  own variant
- add the ACK by Oleg

v3:
- keep procfs flush where it was, instead hoist get_pid outside of the
  lock
- make detach_pid et al accept an array argument of pids to populate
- sprinkle asserts
- drop irq trips around pidmap_lock
- move tty unref outside of tasklist_lock



Mateusz Guzik (5):
  exit: perform add_device_randomness() without tasklist_lock
  exit: hoist get_pid() in release_task() outside of tasklist_lock
  pid: sprinkle tasklist_lock asserts
  pid: perform free_pid() calls outside of tasklist_lock
  pid: drop irq disablement around pidmap_lock

 include/linux/pid.h |  7 ++--
 kernel/exit.c       | 36 +++++++++++++-------
 kernel/pid.c        | 82 +++++++++++++++++++++++++--------------------
 kernel/sys.c        | 14 +++++---
 4 files changed, 82 insertions(+), 57 deletions(-)

Comments

Liam R. Howlett Feb. 6, 2025, 5:14 p.m. UTC | #1
* Mateusz Guzik <mjguzik@gmail.com> [250206 11:44]:
> I fixed fat-fingering and touchedup some commit messages. git diff
> between the old and new branch does not show any code changes.


Thanks for doing these changes.

Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

> 
> hopefully we will be done here after this iteration 8->
> 
> old cover letter:
> 
> The clone side contends against exit side in a way which avoidably
> exacerbates the problem by the latter waiting on locks held by the
> former while holding the tasklist_lock.
> 
> Whacking this for both add_device_randomness and pids allocation gives
> me a 15% speed up for thread creation/destruction in a 24-core vm.
> 
> The random patch is worth about 4%.
> 
> The new bottleneck is pidmap_lock itself, with the biggest problem being
> the allocation itself taking the lock *twice*.
> 
> Bench (plop into will-it-scale):
> $ cat tests/threadspawn1.c
> 
> char *testcase_description = "Thread creation and teardown";
> 
> static void *worker(void *arg)
> {
> 	return (NULL);
> }
> 
> void testcase(unsigned long long *iterations, unsigned long nr)
> {
> 	pthread_t thread;
> 	int error;
> 
> 	while (1) {
> 		error = pthread_create(&thread, NULL, worker, NULL);
> 		assert(error == 0);
> 		error = pthread_join(thread, NULL);
> 		assert(error == 0);
> 		(*iterations)++;
> 	}
> }
> 
> v6:
> - expand on the commit message in 3/5 pid: sprinkle tasklist_lock asserts
> - move fat-fingered free_pids call to the right patch
> 
> v5:
> - whack scripts/selinux/genheaders/genheaders which accidentally got in
> - rebased on next-20250205
> 
> v4:
> - justify moving get_pid in the commit message with a one-liner
> - drop the tty unref patch -- it is completely optional and Oleg has his
>   own variant
> - add the ACK by Oleg
> 
> v3:
> - keep procfs flush where it was, instead hoist get_pid outside of the
>   lock
> - make detach_pid et al accept an array argument of pids to populate
> - sprinkle asserts
> - drop irq trips around pidmap_lock
> - move tty unref outside of tasklist_lock
> 
> 
> 
> Mateusz Guzik (5):
>   exit: perform add_device_randomness() without tasklist_lock
>   exit: hoist get_pid() in release_task() outside of tasklist_lock
>   pid: sprinkle tasklist_lock asserts
>   pid: perform free_pid() calls outside of tasklist_lock
>   pid: drop irq disablement around pidmap_lock
> 
>  include/linux/pid.h |  7 ++--
>  kernel/exit.c       | 36 +++++++++++++-------
>  kernel/pid.c        | 82 +++++++++++++++++++++++++--------------------
>  kernel/sys.c        | 14 +++++---
>  4 files changed, 82 insertions(+), 57 deletions(-)
> 
> -- 
> 2.43.0
>
Christian Brauner Feb. 7, 2025, 9:18 a.m. UTC | #2
On Thu, 06 Feb 2025 17:44:09 +0100, Mateusz Guzik wrote:
> I fixed fat-fingering and touchedup some commit messages. git diff
> between the old and new branch does not show any code changes.
> 
> hopefully we will be done here after this iteration 8->
> 
> old cover letter:
> 
> [...]

This is great!

---

Applied to the kernel-6.15.tasklist_lock branch of the vfs/vfs.git tree.
Patches in the kernel-6.15.tasklist_lock branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: kernel-6.15.tasklist_lock

[1/5] exit: perform add_device_randomness() without tasklist_lock
      https://git.kernel.org/vfs/vfs/c/c8be3764feff
[2/5] exit: hoist get_pid() in release_task() outside of tasklist_lock
      https://git.kernel.org/vfs/vfs/c/9ee2997cbf14
[3/5] pid: sprinkle tasklist_lock asserts
      https://git.kernel.org/vfs/vfs/c/7d0030e1660c
[4/5] pid: perform free_pid() calls outside of tasklist_lock
      https://git.kernel.org/vfs/vfs/c/7b15589a575f
[5/5] pid: drop irq disablement around pidmap_lock
      https://git.kernel.org/vfs/vfs/c/07e518c464e8
Oleg Nesterov Feb. 7, 2025, 9:42 a.m. UTC | #3
On 02/07, Christian Brauner wrote:
>
> Applied to the kernel-6.15.tasklist_lock branch of the vfs/vfs.git tree.

Great.

Then perhaps you can also take the related

	[PATCH v2 0/2] exit: change the release_task() paths to call flush_sigqueue() lockless
	https://lore.kernel.org/all/20250206152244.GA14609@redhat.com/

changes?

Oleg.