
[v3,3/6] exit: postpone tty_kref_put() until after tasklist_lock is dropped

Message ID 20250201163106.28912-4-mjguzik@gmail.com (mailing list archive)
State New
Series reduce tasklist_lock hold time on exit and do some

Commit Message

Mateusz Guzik Feb. 1, 2025, 4:31 p.m. UTC
Instead of smuggling the tty pointer directly, use a struct so that more
things can be added later.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 kernel/exit.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

Comments

Oleg Nesterov Feb. 3, 2025, 6:06 p.m. UTC | #1
On 02/01, Mateusz Guzik wrote:
>
> Instead of smuggling the tty pointer directly, use a struct so that more
> things can be added later.

I am not sure this particular change is worth the effort, but I won't argue.
I'd like to know what Eric thinks.

OTOH, if we do this, then perhaps we can do more "call tty_kref_put()
lockless" changes later. And perhaps even add the new

	void tty_kref_put_sync(struct tty_struct *tty)
	{
		if (tty)
			kref_put(&tty->kref, release_one_tty);
	}

helper. With this change release_task() doesn't need to abuse
schedule_work(), and this helper can have more users.
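
For illustration, assuming such a helper existed, release_task() could
then drop the reference synchronously once tasklist_lock is released
(sketch only, reusing the post.tty field added by this patch):

	write_unlock_irq(&tasklist_lock);
	...
	/* no schedule_work() bounce for the final put */
	tty_kref_put_sync(post.tty);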

Nevermind, this is almost off-topic.

Oleg.
Mateusz Guzik Feb. 3, 2025, 7:33 p.m. UTC | #2
On Mon, Feb 3, 2025 at 7:07 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 02/01, Mateusz Guzik wrote:
> >
> > Instead of smuggling the tty pointer directly, use a struct so that more
> > things can be added later.
>
> I am not sure this particular change is worth the effort, but I won't argue.
> I'd like to know what Eric thinks.
>

it trivially whacks an atomic from an area protected by a global lock

> OTOH, if we do this, then perhaps we can do more "call tty_kref_put()
> lockless" changes later. And perhaps even add the new
>
>         void tty_kref_put_sync(struct tty_struct *tty)
>         {
>                 if (tty)
>                         kref_put(&tty->kref, release_one_tty);
>         }
>
> helper. With this change release_task() doesn't need to abuse
> schedule_work(), and this helper can have more users.
>
> Nevermind, this is almost off-topic.
>

I have no interest in messing with ttys; regardless, I agree this is
definitely something to consider *after* this patch is sorted out.
Oleg Nesterov Feb. 4, 2025, 11:22 a.m. UTC | #3
On 02/03, Mateusz Guzik wrote:
>
> On Mon, Feb 3, 2025 at 7:07 PM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 02/01, Mateusz Guzik wrote:
> > >
> > > Instead of smuggling the tty pointer directly, use a struct so that more
> > > things can be added later.
> >
> > I am not sure this particular change is worth the effort, but I won't argue.
> > I'd like to know what Eric thinks.
> >
>
> it trivially whacks an atomic from an area protected by a global lock

Yes, but I am not sure this can make a noticeable difference... Let
me repeat, I am not really arguing, I leave this to you and Eric.
The patch looks correct and I have already acked it. Just it looks
"less obvious" to me than other changes in this series.

And even if we do this, I am not sure we need the new
"struct release_task_post", it seems we can simply move
tty_kref_put() from __exit_signal() to release_task(), see the
patch at the end.

Nobody can touch signal->tty after the group leader passes __exit_signal(),
if nothing else lock_task_sighand() can't succeed.

But again, feel free to ignore.

Oleg.

--- x/kernel/exit.c
+++ x/kernel/exit.c
@@ -146,7 +146,6 @@ static void __exit_signal(struct task_st
 	struct signal_struct *sig = tsk->signal;
 	bool group_dead = thread_group_leader(tsk);
 	struct sighand_struct *sighand;
-	struct tty_struct *tty;
 	u64 utime, stime;
 
 	sighand = rcu_dereference_check(tsk->sighand,
@@ -159,10 +158,7 @@ static void __exit_signal(struct task_st
 		posix_cpu_timers_exit_group(tsk);
 #endif
 
-	if (group_dead) {
-		tty = sig->tty;
-		sig->tty = NULL;
-	} else {
+	if (!group_dead) {
 		/*
 		 * If there is any task waiting for the group exit
 		 * then notify it:
@@ -210,10 +206,8 @@ static void __exit_signal(struct task_st
 
 	__cleanup_sighand(sighand);
 	clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
-	if (group_dead) {
+	if (group_dead)
 		flush_sigqueue(&sig->shared_pending);
-		tty_kref_put(tty);
-	}
 }
 
 static void delayed_put_task_struct(struct rcu_head *rhp)
@@ -279,6 +273,10 @@ repeat:
 	proc_flush_pid(thread_pid);
 	put_pid(thread_pid);
 	release_thread(p);
+
+	if (p == leader)
+		tty_kref_put(p->signal->tty);
+
 	put_task_struct_rcu_user(p);
 
 	p = leader;
Mateusz Guzik Feb. 4, 2025, 12:12 p.m. UTC | #4
On Tue, Feb 4, 2025 at 12:23 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 02/03, Mateusz Guzik wrote:
> >
> > On Mon, Feb 3, 2025 at 7:07 PM Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > On 02/01, Mateusz Guzik wrote:
> > > >
> > > > Instead of smuggling the tty pointer directly, use a struct so that more
> > > > things can be added later.
> > >
> > > I am not sure this particular change is worth the effort, but I won't argue.
> > > I'd like to know what Eric thinks.
> > >
> >
> > it trivially whacks an atomic from an area protected by a global lock
>
> Yes, but I am not sure this can make a noticeable difference... Let
> me repeat, I am not really arguing, I leave this to you and Eric.
> The patch looks correct and I have already acked it. Just it looks
> "less obvious" to me than other changes in this series.

Reducing lock hold time is basic multicore hygiene, doubly so for global locks.

Atomics are notoriously slow even if cache-hot.

I agree this change in isolation is not a big deal, but there are
several other "not a big deal"s in the area which will add up if
sorted out.

Given the triviality of the change as far as complexity goes, I would
argue that's a routine patch not warranting much of a discussion
(modulo the new struct, for that see below).

>
> And even if we do this, I am not sure we need the new
> "struct release_task_post", it seems we can simply move
> tty_kref_put() from __exit_signal() to release_task(), see the
> patch at the end.
>
> Nobody can touch signal->tty after the group leader passes __exit_signal(),
> if nothing else lock_task_sighand() can't succeed.
>

I wanted to keep the "whack the tty" logic intact in order to reduce any discussion. ;)

This brings me to the release_task_post struct.

Suppose the tty thing does not get patched at all or gets patched the
way you are proposing here.

I still need a way to handle the pidmap stuff. Suppose to that end a
pid array gets passed directly.

Some time later another thing might pop up which gets determined
within the lock and wants to hand state back to be cleaned up after the
unlock. Then someone will need to add a mechanism of the same sort I'm
adding here. In other words, I'm future-proofing this and tty just
happens to be the first (even if not necessary) consumer; preferably
the struct (or an equivalent) would still be there even without the tty
stuff.
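
As a purely hypothetical sketch of that future-proofing (the pid array
field and the free_pids() helper below are made-up placeholders, not
something proposed in this series), a second consumer would follow the
same record-under-the-lock, act-after-the-unlock pattern:

	struct release_task_post {
		struct tty_struct *tty;
		/* hypothetical later addition, collected under tasklist_lock */
		struct pid *pids_to_free[PIDTYPE_MAX];
	};

	...
	write_unlock_irq(&tasklist_lock);
	...
	tty_kref_put(post.tty);
	/* hypothetical: release whatever was recorded while the lock was held */
	free_pids(post.pids_to_free);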

Given the nature of the change there is a lot of opinionated
handwaving possible, and "happy to change" vs "feel free to ignore" is
not a productive discussion, thus I would like to clarify: my primary
interest is to whack the pidmap thing out of the tasklist_lock-protected
area, and that bit I'm going to argue about. Everything else is minor
touch ups which I can drop without much resistance (I'll note though
there is more I can submit in the area later :P), but if you genuinely
want something changed, it would be best to say so explicitly.

As is, given the prior ack, I intend to submit v4 with minor touch ups.


Patch

diff --git a/kernel/exit.c b/kernel/exit.c
index 257dd8ed45ea..d2c74f93f7d2 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -122,6 +122,13 @@  static __init int kernel_exit_sysfs_init(void)
 late_initcall(kernel_exit_sysfs_init);
 #endif
 
+/*
+ * For things release_task() would like to do *after* tasklist_lock is released.
+ */
+struct release_task_post {
+	struct tty_struct *tty;
+};
+
 static void __unhash_process(struct task_struct *p, bool group_dead)
 {
 	nr_threads--;
@@ -141,12 +148,11 @@  static void __unhash_process(struct task_struct *p, bool group_dead)
 /*
  * This function expects the tasklist_lock write-locked.
  */
-static void __exit_signal(struct task_struct *tsk)
+static void __exit_signal(struct release_task_post *post, struct task_struct *tsk)
 {
 	struct signal_struct *sig = tsk->signal;
 	bool group_dead = thread_group_leader(tsk);
 	struct sighand_struct *sighand;
-	struct tty_struct *tty;
 	u64 utime, stime;
 
 	sighand = rcu_dereference_check(tsk->sighand,
@@ -160,7 +166,7 @@  static void __exit_signal(struct task_struct *tsk)
 #endif
 
 	if (group_dead) {
-		tty = sig->tty;
+		post->tty = sig->tty;
 		sig->tty = NULL;
 	} else {
 		/*
@@ -207,10 +213,8 @@  static void __exit_signal(struct task_struct *tsk)
 
 	__cleanup_sighand(sighand);
 	clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
-	if (group_dead) {
+	if (group_dead)
 		flush_sigqueue(&sig->shared_pending);
-		tty_kref_put(tty);
-	}
 }
 
 static void delayed_put_task_struct(struct rcu_head *rhp)
@@ -236,10 +240,13 @@  void __weak release_thread(struct task_struct *dead_task)
 
 void release_task(struct task_struct *p)
 {
+	struct release_task_post post;
 	struct task_struct *leader;
 	struct pid *thread_pid;
 	int zap_leader;
 repeat:
+	memset(&post, 0, sizeof(post));
+
 	/* don't need to get the RCU readlock here - the process is dead and
 	 * can't be modifying its own credentials. But shut RCU-lockdep up */
 	rcu_read_lock();
@@ -252,7 +259,7 @@  void release_task(struct task_struct *p)
 
 	write_lock_irq(&tasklist_lock);
 	ptrace_release_task(p);
-	__exit_signal(p);
+	__exit_signal(&post, p);
 
 	/*
 	 * If we are the last non-leader member of the thread
@@ -280,6 +287,7 @@  void release_task(struct task_struct *p)
 			      sizeof(unsigned long long));
 	release_thread(p);
 	put_task_struct_rcu_user(p);
+	tty_kref_put(post.tty);
 
 	p = leader;
 	if (unlikely(zap_leader))