diff mbox series

[4/3] pid: Improve the comment about waiting in zap_pid_ns_processes

Message ID 878skmpcib.fsf_-_@x220.int.ebiederm.org (mailing list archive)
State New, archived
Series [1/3] uml: Don't consult current to find the proc_mnt in mconsole_proc

Commit Message

Eric W. Biederman Feb. 28, 2020, 10:34 p.m. UTC
Oleg wrote a very informative comment, but with the removal of
proc_cleanup_work it is no longer accurate.

Rewrite the comment so that it only talks about the details
that are still relevant, and hopefully is a little clearer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/pid_namespace.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

Comments

Christian Brauner Feb. 29, 2020, 2:59 a.m. UTC | #1
On Fri, Feb 28, 2020 at 04:34:20PM -0600, Eric W. Biederman wrote:
> 
> Oleg wrote a very informative comment, but with the removal of
> proc_cleanup_work it is no longer accurate.
> 
> Rewrite the comment so that it only talks about the details
> that are still relevant, and hopefully is a little clearer.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/pid_namespace.c | 31 +++++++++++++++++++------------
>  1 file changed, 19 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 318fcc6ba301..01f8ba32cc0c 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -224,20 +224,27 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>  	} while (rc != -ECHILD);
>  
>  	/*
> -	 * kernel_wait4() above can't reap the EXIT_DEAD children but we do not
> -	 * really care, we could reparent them to the global init. We could
> -	 * exit and reap ->child_reaper even if it is not the last thread in
> -	 * this pid_ns, free_pid(pid_allocated == 0) calls proc_cleanup_work(),
> -	 * pid_ns can not go away until proc_kill_sb() drops the reference.
> +	 * kernel_wait4() misses EXIT_DEAD children, and EXIT_ZOMBIE
> +	 * process whose parents processes are outside of the pid
> +	 * namespace.  Such processes are created with setns()+fork().
>  	 *
> -	 * But this ns can also have other tasks injected by setns()+fork().
> -	 * Again, ignoring the user visible semantics we do not really need
> -	 * to wait until they are all reaped, but they can be reparented to
> -	 * us and thus we need to ensure that pid->child_reaper stays valid
> -	 * until they all go away. See free_pid()->wake_up_process().
> +	 * If those EXIT_ZOMBIE processes are not reaped by their
> +	 * parents before their parents exit, they will be reparented
> +	 * to pid_ns->child_reaper.  Thus pidns->child_reaper needs to
> +	 * stay valid until they all go away.
>  	 *
> -	 * We rely on ignored SIGCHLD, an injected zombie must be autoreaped
> -	 * if reparented.
> +	 * The code relies on the the pid_ns->child_reaper ignoring

s/the the/the/

Hm, can we maybe reformulate this to:

"The code relies on having made pid_ns->child_reaper ignore SIGCHLD above
causing EXIT_ZOMBIE processes to be autoreaped if reparented."

Which imho makes it clearer that it was us ensuring that SIGCHLD is
ignored. Someone not too familiar with the exit codepaths might be
looking at zap_pid_ns_processes() not knowing that it is only called
when namespace init is exiting.

Otherwise

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

Patch

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 318fcc6ba301..01f8ba32cc0c 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -224,20 +224,27 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
 	} while (rc != -ECHILD);
 
 	/*
-	 * kernel_wait4() above can't reap the EXIT_DEAD children but we do not
-	 * really care, we could reparent them to the global init. We could
-	 * exit and reap ->child_reaper even if it is not the last thread in
-	 * this pid_ns, free_pid(pid_allocated == 0) calls proc_cleanup_work(),
-	 * pid_ns can not go away until proc_kill_sb() drops the reference.
+	 * kernel_wait4() misses EXIT_DEAD children, and EXIT_ZOMBIE
+	 * processes whose parent processes are outside of the pid
+	 * namespace.  Such processes are created with setns()+fork().
 	 *
-	 * But this ns can also have other tasks injected by setns()+fork().
-	 * Again, ignoring the user visible semantics we do not really need
-	 * to wait until they are all reaped, but they can be reparented to
-	 * us and thus we need to ensure that pid->child_reaper stays valid
-	 * until they all go away. See free_pid()->wake_up_process().
+	 * If those EXIT_ZOMBIE processes are not reaped by their
+	 * parents before their parents exit, they will be reparented
+	 * to pid_ns->child_reaper.  Thus pid_ns->child_reaper needs to
+	 * stay valid until they all go away.
 	 *
-	 * We rely on ignored SIGCHLD, an injected zombie must be autoreaped
-	 * if reparented.
+	 * The code relies on the pid_ns->child_reaper ignoring
+	 * SIGCHLD to cause those EXIT_ZOMBIE processes to be
+	 * autoreaped if reparented.
+	 *
+	 * Semantically it is also desirable to wait for EXIT_ZOMBIE
+	 * processes before allowing the child_reaper to be reaped, as
+	 * that gives the invariant that when the init process of a
+	 * pid namespace is reaped all of the processes in the pid
+	 * namespace are gone.
+	 *
+	 * Once all of the other tasks are gone from the pid_namespace
+	 * free_pid() will awaken this task.
 	 */
 	for (;;) {
 		set_current_state(TASK_INTERRUPTIBLE);