From patchwork Thu Mar 20 09:29:06 2025
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 14023595
From: Christian Brauner
Date: Thu, 20 Mar 2025 10:29:06 +0100
Subject: [PATCH v3 1/4] pidfs: improve multi-threaded exec and premature thread-group leader exit polling
X-Mailing-List: linux-fsdevel@vger.kernel.org
Message-Id: <20250320-work-pidfs-thread_group-v3-1-b7e5f7e2c3b1@kernel.org>
References: <20250320-work-pidfs-thread_group-v3-0-b7e5f7e2c3b1@kernel.org>
In-Reply-To: <20250320-work-pidfs-thread_group-v3-0-b7e5f7e2c3b1@kernel.org>
To: Oleg Nesterov
Cc: linux-fsdevel@vger.kernel.org, Jeff Layton, Lennart Poettering,
 Daan De Meyer, Mike Yuan, Christian Brauner
X-Mailer: b4 0.15-dev-42535

This is another attempt at making pidfd polling for multi-threaded exec
and premature thread-group leader exit consistent.

A quick recap of these two cases:

(1) During a multi-threaded exec by a subthread, i.e., a
    non-thread-group leader thread, all other threads in the
    thread-group including the thread-group leader are killed and the
    struct pid of the thread-group leader is taken over by the subthread
    that called exec. IOW, two tasks change their TIDs.

(2) A premature thread-group leader exit means that the thread-group
    leader exited before all of the other subthreads in the thread-group
    have exited.

Both cases lead to inconsistencies for pidfd polling with PIDFD_THREAD.
Any caller that holds a PIDFD_THREAD pidfd to the current thread-group
leader may or may not see an exit notification on the file descriptor,
depending on when poll is performed. If the poll is performed before the
exec of the subthread has concluded, an exit notification is generated
for the old thread-group leader. If the poll is performed after the exec
of the subthread has concluded, no exit notification is generated for
the old thread-group leader.

The correct behavior would be to simply not generate an exit
notification on the struct pid of the old thread-group leader during a
subthread exec, because that struct pid is taken over by the subthread
and thus remains alive.

But this is difficult to handle because a thread-group leader may exit
prematurely as mentioned in (2). In that case an exit notification is
reliably generated, but the subthreads may continue to run for an
indeterminate amount of time and thus may also exec at some point.

So far there was no way to distinguish between (1) and (2) internally.
This tiny series tries to address this problem by not generating the
PIDFD_THREAD notification at the time of a premature thread-group
leader exit.

If that works correctly, then no exit notifications are generated for a
PIDFD_THREAD pidfd for a thread-group leader until all subthreads have
been reaped. If a subthread execs afterwards, no exit notification is
generated until that task exits, or until it creates subthreads and the
cycle repeats.
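The pidfd_poll() change in this patch replaces the readiness condition "thread || thread_group_empty(task)" with "!delay_group_leader(task)". The following is a minimal userspace sketch of the two rules, assuming a simplified task model: struct model_task, model_group_empty(), model_delay_group_leader(), readable_before() and readable_after() are hypothetical names, not kernel APIs, and the !task (already reaped, EPOLLHUP) case is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few task_struct facts the check reads. */
struct model_task {
	bool is_leader;  /* models thread_group_leader(task) */
	int nr_threads;  /* threads still in the group, including this one */
	bool exited;     /* models task->exit_state != 0 */
};

/* Models thread_group_empty(): a leader that is the only thread left. */
bool model_group_empty(const struct model_task *t)
{
	assert(t->nr_threads >= 1);
	return t->is_leader && t->nr_threads == 1;
}

/* Models delay_group_leader(): a leader that still has subthreads. */
bool model_delay_group_leader(const struct model_task *t)
{
	return t->is_leader && !model_group_empty(t);
}

/* Pre-patch rule: a PIDFD_THREAD pidfd became readable on any exit. */
bool readable_before(const struct model_task *t, bool pidfd_thread)
{
	return t->exited && (pidfd_thread || model_group_empty(t));
}

/* Post-patch rule: a leader with live subthreads never reports readable. */
bool readable_after(const struct model_task *t)
{
	return t->exited && !model_delay_group_leader(t);
}
```

Under this model the only behavioural change is for a PIDFD_THREAD pidfd on a thread-group leader that still has subthreads: it used to report readable as soon as the leader exited and now stays silent until the group is empty. Exited subthreads and leaders of empty groups report readable under both rules.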
Co-Developed-by: Oleg Nesterov
Signed-off-by: Oleg Nesterov
Signed-off-by: Christian Brauner
---
 fs/pidfs.c      | 22 +++++++++++++++++++++-
 kernel/exit.c   | 12 +++++++++---
 kernel/signal.c |  6 ++++--
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index a48cc44ced6f..f1c49a7540f3 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -218,12 +218,32 @@ static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
 	/*
 	 * Depending on PIDFD_THREAD, inform pollers when the thread
 	 * or the whole thread-group exits.
+	 *
+	 * There are two corner cases to consider:
+	 *
+	 * (1) If a thread-group leader of a thread-group with
+	 *     subthreads exits prematurely, i.e., before all of the
+	 *     subthreads of the thread-group have exited then no
+	 *     notification will be generated for PIDFD_THREAD pidfds
+	 *     referring to the thread-group leader.
+	 *
+	 *     The exit notification for the thread-group leader will be
+	 *     delayed until the last subthread of the thread-group
+	 *     exits.
+	 *
+	 * (2) If a subthread of a thread-group execs then the
+	 *     current thread-group leader will be SIGKILLed and the
+	 *     subthread will assume the struct pid of the now defunct
+	 *     old thread-group leader. No exit notification will be
+	 *     generated for PIDFD_THREAD pidfds referring to the old
+	 *     thread-group leader as they continue referring to the new
+	 *     thread-group leader.
 	 */
 	guard(rcu)();
 	task = pid_task(pid, PIDTYPE_PID);
 	if (!task)
 		poll_flags = EPOLLIN | EPOLLRDNORM | EPOLLHUP;
-	else if (task->exit_state && (thread || thread_group_empty(task)))
+	else if (task->exit_state && !delay_group_leader(task))
 		poll_flags = EPOLLIN | EPOLLRDNORM;
 
 	return poll_flags;
diff --git a/kernel/exit.c b/kernel/exit.c
index 9916305e34d3..ce5cdad5ba9c 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -271,6 +271,9 @@ void release_task(struct task_struct *p)
 		 * If we were the last child thread and the leader has
 		 * exited already, and the leader's parent ignores SIGCHLD,
 		 * then we are the one who should release the leader.
+		 *
+		 * This will also wake PIDFD_THREAD pidfds for the
+		 * thread-group leader that already exited.
 		 */
 		zap_leader = do_notify_parent(leader, leader->exit_signal);
 		if (zap_leader)
@@ -743,10 +746,13 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 
 	tsk->exit_state = EXIT_ZOMBIE;
 	/*
-	 * sub-thread or delay_group_leader(), wake up the
-	 * PIDFD_THREAD waiters.
+	 * Wake up PIDFD_THREAD waiters if this is a proper subthread
+	 * exit. If this is a premature thread-group leader exit delay
+	 * the notification until the last subthread exits. If a
+	 * subthread should exec before then no notification will be
+	 * generated.
 	 */
-	if (!thread_group_empty(tsk))
+	if (!delay_group_leader(tsk))
 		do_notify_pidfd(tsk);
 
 	if (unlikely(tsk->ptrace)) {
diff --git a/kernel/signal.c b/kernel/signal.c
index 081f19a24506..0ccef8783dff 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2180,8 +2180,10 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	WARN_ON_ONCE(!tsk->ptrace &&
 	       (tsk->group_leader != tsk || !thread_group_empty(tsk)));
 	/*
-	 * tsk is a group leader and has no threads, wake up the
-	 * non-PIDFD_THREAD waiters.
+	 * This is a thread-group leader without subthreads so wake up
+	 * the non-PIDFD_THREAD waiters. This also wakes the
+	 * PIDFD_THREAD waiters for the thread-group leader in case it
+	 * exited prematurely from release_task().
 	 */
 	if (thread_group_empty(tsk))
 		do_notify_pidfd(tsk);

From patchwork Thu Mar 20 09:29:07 2025
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 14023596
From: Christian Brauner
Date: Thu, 20 Mar 2025 10:29:07 +0100
Subject: [PATCH v3 2/4] selftests/pidfd: first test for multi-threaded exec polling
X-Mailing-List: linux-fsdevel@vger.kernel.org
Message-Id: <20250320-work-pidfs-thread_group-v3-2-b7e5f7e2c3b1@kernel.org>
References: <20250320-work-pidfs-thread_group-v3-0-b7e5f7e2c3b1@kernel.org>
In-Reply-To: <20250320-work-pidfs-thread_group-v3-0-b7e5f7e2c3b1@kernel.org>
To: Oleg Nesterov
Cc: linux-fsdevel@vger.kernel.org, Jeff Layton, Lennart Poettering,
 Daan De Meyer, Mike Yuan, Christian Brauner
X-Mailer: b4 0.15-dev-42535

Add a first test for premature thread-group leader exit.
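The poller child added by this test treats "no event before the timeout expires" as success and reports its verdict to the parent through its exit status. Below is a minimal sketch of that pattern. It assumes a plain pipe as a stand-in for the PIDFD_THREAD pidfd, so it runs without pidfd support, and expect_no_event() is a hypothetical helper, not part of the selftest harness. Note that poll(2) takes its timeout in milliseconds.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that polls fd for POLLIN and exits with EXIT_SUCCESS only
 * if the timeout expires without any event. Returns 0 if the child saw
 * no event, -1 otherwise. fd is any pollable descriptor; the selftest
 * uses a PIDFD_THREAD pidfd here.
 */
int expect_no_event(int fd, int timeout_ms)
{
	/* A negative timeout would make poll() block forever. */
	assert(timeout_ms >= 0);

	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int n = poll(&pfd, 1, timeout_ms); /* timeout in milliseconds */

		/* Any event, or a poll() error, counts as a failure. */
		if (n != 0 || (pfd.revents & (POLLIN | POLLHUP)))
			_exit(EXIT_FAILURE);
		_exit(EXIT_SUCCESS);
	}

	int status;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS) ? 0 : -1;
}
```

Running the check in a separate process lets the parent keep driving the scenario (here: letting the subthread proceed) while the poll is pending, and the parent only has to collect one exit status.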
Signed-off-by: Christian Brauner
---
 tools/testing/selftests/pidfd/pidfd_info_test.c | 38 ++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/pidfd/pidfd_info_test.c b/tools/testing/selftests/pidfd/pidfd_info_test.c
index 09bc4ae7aed5..28a28ae4686a 100644
--- a/tools/testing/selftests/pidfd/pidfd_info_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_info_test.c
@@ -236,7 +236,7 @@ static void *pidfd_info_pause_thread(void *arg)
 
 TEST_F(pidfd_info, thread_group)
 {
-	pid_t pid_leader, pid_thread;
+	pid_t pid_leader, pid_poller, pid_thread;
 	pthread_t thread;
 	int nevents, pidfd_leader, pidfd_thread, pidfd_leader_thread, ret;
 	int ipc_sockets[2];
@@ -262,6 +262,35 @@ TEST_F(pidfd_info, thread_group)
 		syscall(__NR_exit, EXIT_SUCCESS);
 	}
 
+	/*
+	 * Opening a PIDFD_THREAD aka thread-specific pidfd based on a
+	 * thread-group leader must succeed.
+	 */
+	pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD);
+	ASSERT_GE(pidfd_leader_thread, 0);
+
+	pid_poller = fork();
+	ASSERT_GE(pid_poller, 0);
+	if (pid_poller == 0) {
+		/*
+		 * We can't poll and wait for the old thread-group
+		 * leader to exit using a thread-specific pidfd. The
+		 * thread-group leader exited prematurely and
+		 * notification is delayed until all subthreads have
+		 * exited.
+		 */
+		fds.events = POLLIN;
+		fds.fd = pidfd_leader_thread;
+		nevents = poll(&fds, 1, 10000 /* wait 10 seconds */);
+		if (nevents != 0)
+			_exit(EXIT_FAILURE);
+		if (fds.revents & POLLIN)
+			_exit(EXIT_FAILURE);
+		if (fds.revents & POLLHUP)
+			_exit(EXIT_FAILURE);
+		_exit(EXIT_SUCCESS);
+	}
+
 	/* Retrieve the tid of the thread. */
 	EXPECT_EQ(close(ipc_sockets[1]), 0);
 	ASSERT_EQ(read_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread));
@@ -275,12 +304,7 @@ TEST_F(pidfd_info, thread_group)
 	pidfd_thread = sys_pidfd_open(pid_thread, PIDFD_THREAD);
 	ASSERT_GE(pidfd_thread, 0);
 
-	/*
-	 * Opening a PIDFD_THREAD aka thread-specific pidfd based on a
-	 * thread-group leader must succeed.
-	 */
-	pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD);
-	ASSERT_GE(pidfd_leader_thread, 0);
+	ASSERT_EQ(wait_for_pid(pid_poller), 0);
 
 	/*
 	 * Note that pidfd_leader is a thread-group pidfd, so polling on it

From patchwork Thu Mar 20 09:29:08 2025
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 14023597
From: Christian Brauner
Date: Thu, 20 Mar 2025 10:29:08 +0100
Subject: [PATCH v3 3/4] selftests/pidfd: second test for multi-threaded exec polling
X-Mailing-List: linux-fsdevel@vger.kernel.org
Message-Id: <20250320-work-pidfs-thread_group-v3-3-b7e5f7e2c3b1@kernel.org>
References: <20250320-work-pidfs-thread_group-v3-0-b7e5f7e2c3b1@kernel.org>
In-Reply-To: <20250320-work-pidfs-thread_group-v3-0-b7e5f7e2c3b1@kernel.org>
To: Oleg Nesterov
Cc: linux-fsdevel@vger.kernel.org, Jeff Layton, Lennart Poettering,
 Daan De Meyer, Mike Yuan, Christian Brauner
X-Mailer: b4 0.15-dev-42535

Ensure that during a multi-threaded exec and premature thread-group
leader exit no exit notification is generated.

Signed-off-by: Christian Brauner
---
 tools/testing/selftests/pidfd/pidfd_info_test.c | 72 ++++++++++++++++---------
 1 file changed, 48 insertions(+), 24 deletions(-)

diff --git a/tools/testing/selftests/pidfd/pidfd_info_test.c b/tools/testing/selftests/pidfd/pidfd_info_test.c
index 28a28ae4686a..4169780c9e55 100644
--- a/tools/testing/selftests/pidfd/pidfd_info_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_info_test.c
@@ -413,7 +413,7 @@ static void *pidfd_info_thread_exec(void *arg)
 
 TEST_F(pidfd_info, thread_group_exec)
 {
-	pid_t pid_leader, pid_thread;
+	pid_t pid_leader, pid_poller, pid_thread;
 	pthread_t thread;
 	int nevents, pidfd_leader, pidfd_leader_thread, pidfd_thread, ret;
 	int ipc_sockets[2];
@@ -439,6 +439,37 @@ TEST_F(pidfd_info, thread_group_exec)
 		syscall(__NR_exit, EXIT_SUCCESS);
 	}
 
+	/* Open a thread-specific pidfd for the thread-group leader. */
+	pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD);
+	ASSERT_GE(pidfd_leader_thread, 0);
+
+	pid_poller = fork();
+	ASSERT_GE(pid_poller, 0);
+	if (pid_poller == 0) {
+		/*
+		 * We can't poll and wait for the old thread-group
+		 * leader to exit using a thread-specific pidfd. The
+		 * thread-group leader exited prematurely and
+		 * notification is delayed until all subthreads have
+		 * exited.
+		 *
+		 * When the thread has execed it will have taken over the
+		 * old thread-group leader's struct pid. Calling poll after
+		 * the thread execed will thus block again because a new
+		 * thread-group has started.
+		 */
+		fds.events = POLLIN;
+		fds.fd = pidfd_leader_thread;
+		nevents = poll(&fds, 1, 10000 /* wait 10 seconds */);
+		if (nevents != 0)
+			_exit(EXIT_FAILURE);
+		if (fds.revents & POLLIN)
+			_exit(EXIT_FAILURE);
+		if (fds.revents & POLLHUP)
+			_exit(EXIT_FAILURE);
+		_exit(EXIT_SUCCESS);
+	}
+
 	/* Retrieve the tid of the thread. */
 	EXPECT_EQ(close(ipc_sockets[1]), 0);
 	ASSERT_EQ(read_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread));
@@ -447,33 +478,12 @@ TEST_F(pidfd_info, thread_group_exec)
 	pidfd_thread = sys_pidfd_open(pid_thread, PIDFD_THREAD);
 	ASSERT_GE(pidfd_thread, 0);
 
-	/* Open a thread-specific pidfd for the thread-group leader. */
-	pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD);
-	ASSERT_GE(pidfd_leader_thread, 0);
-
-	/*
-	 * We can poll and wait for the old thread-group leader to exit
-	 * using a thread-specific pidfd.
-	 *
-	 * This only works until the thread has execed. When the thread
-	 * has execed it will have taken over the old thread-group
-	 * leaders struct pid. Calling poll after the thread execed will
-	 * thus block again because a new thread-group has started (Yes,
-	 * it's fscked.).
-	 */
-	fds.events = POLLIN;
-	fds.fd = pidfd_leader_thread;
-	nevents = poll(&fds, 1, -1);
-	ASSERT_EQ(nevents, 1);
-	/* The thread-group leader has exited. */
-	ASSERT_TRUE(!!(fds.revents & POLLIN));
-	/* The thread-group leader hasn't been reaped. */
-	ASSERT_FALSE(!!(fds.revents & POLLHUP));
-
 	/* Now that we've opened a thread-specific pidfd the thread can exec. */
 	ASSERT_EQ(write_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread));
 	EXPECT_EQ(close(ipc_sockets[0]), 0);
+
+	ASSERT_EQ(wait_for_pid(pid_poller), 0);
 
 	/* Wait until the kernel has SIGKILLed the thread. */
 	fds.events = POLLHUP;
 	fds.fd = pidfd_thread;
@@ -506,6 +516,20 @@ TEST_F(pidfd_info, thread_group_exec)
 
 	/* Take down the thread-group leader. */
 	EXPECT_EQ(sys_pidfd_send_signal(pidfd_leader, SIGKILL, NULL, 0), 0);
+
+	/*
+	 * After the exec we're dealing with an empty thread-group so now
+	 * we must see an exit notification on the thread-specific pidfd
+	 * for the thread-group leader as there's no subthread that can
+	 * revive the struct pid.
+	 */
+	fds.events = POLLIN;
+	fds.fd = pidfd_leader_thread;
+	nevents = poll(&fds, 1, -1);
+	ASSERT_EQ(nevents, 1);
+	ASSERT_TRUE(!!(fds.revents & POLLIN));
+	ASSERT_FALSE(!!(fds.revents & POLLHUP));
+
 	EXPECT_EQ(sys_waitid(P_PIDFD, pidfd_leader, NULL, WEXITED), 0);
 
 	/* Retrieve exit information for the thread-group leader. */

From patchwork Thu Mar 20 09:29:09 2025
X-Patchwork-Submitter: Christian Brauner
X-Patchwork-Id: 14023598
From: Christian Brauner
Date: Thu, 20 Mar 2025 10:29:09 +0100
Subject: [PATCH v3 4/4] selftests/pidfd: third test for multi-threaded exec polling
X-Mailing-List: linux-fsdevel@vger.kernel.org
Message-Id: <20250320-work-pidfs-thread_group-v3-4-b7e5f7e2c3b1@kernel.org>
References: <20250320-work-pidfs-thread_group-v3-0-b7e5f7e2c3b1@kernel.org>
In-Reply-To: <20250320-work-pidfs-thread_group-v3-0-b7e5f7e2c3b1@kernel.org>
To: Oleg Nesterov
Cc: linux-fsdevel@vger.kernel.org, Jeff Layton, Lennart Poettering,
 Daan De Meyer, Mike Yuan, Christian Brauner
X-Mailer: b4 0.15-dev-42535

Ensure that during a multi-threaded exec and premature thread-group
leader exit no exit notification is generated.
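The scenario this test walks through (premature leader exit, subthread exec, then SIGKILL of the new leader) can be traced with a small state model. This is a hypothetical sketch that only encodes the behaviour the cover letter describes, not the kernel's bookkeeping: struct model_pid and the model_*() functions are invented names, and de_thread() details such as how the other subthreads are torn down are collapsed into a single step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the struct pid a PIDFD_THREAD pidfd refers to. */
struct model_pid {
	bool leader_exited;  /* task attached to this pid has exit_state set */
	int nr_subthreads;   /* live subthreads in that task's thread-group */
	bool notified;       /* an exit notification has been delivered */
};

/* Post-patch rule: notify only once no subthreads remain. */
static void model_maybe_notify(struct model_pid *p)
{
	if (p->leader_exited && p->nr_subthreads == 0)
		p->notified = true;
}

/* The leader exits; with live subthreads the notification is deferred. */
void model_leader_exit(struct model_pid *p)
{
	p->leader_exited = true;
	model_maybe_notify(p);
}

/* A subthread is reaped; the last one releases the deferred notification. */
void model_subthread_reaped(struct model_pid *p)
{
	assert(p->nr_subthreads > 0);
	p->nr_subthreads--;
	model_maybe_notify(p);
}

/*
 * A subthread execs: the other threads are killed and the exec'ing
 * subthread takes over this struct pid as the live leader of an
 * otherwise empty thread-group, so no notification is generated.
 */
void model_subthread_exec(struct model_pid *p)
{
	assert(p->nr_subthreads > 0);
	p->nr_subthreads = 0;
	p->leader_exited = false;
}
```

For example, starting from { .nr_subthreads = 1 }, calling model_leader_exit() and then model_subthread_exec() leaves notified false, and a second model_leader_exit() (the new leader being SIGKILLed with no subthreads) sets it.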
Signed-off-by: Christian Brauner
---
 tools/testing/selftests/pidfd/pidfd_info_test.c | 147 ++++++++++++++++++++++++
 1 file changed, 147 insertions(+)

diff --git a/tools/testing/selftests/pidfd/pidfd_info_test.c b/tools/testing/selftests/pidfd/pidfd_info_test.c
index 4169780c9e55..1758a1b0457b 100644
--- a/tools/testing/selftests/pidfd/pidfd_info_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_info_test.c
@@ -542,4 +542,151 @@ TEST_F(pidfd_info, thread_group_exec)
 	EXPECT_EQ(close(pidfd_thread), 0);
 }
 
+static void *pidfd_info_thread_exec_sane(void *arg)
+{
+	pid_t pid_thread = gettid();
+	int ipc_socket = *(int *)arg;
+
+	/* Inform the grand-parent what the tid of this thread is. */
+	if (write_nointr(ipc_socket, &pid_thread, sizeof(pid_thread)) != sizeof(pid_thread))
+		return NULL;
+
+	if (read_nointr(ipc_socket, &pid_thread, sizeof(pid_thread)) != sizeof(pid_thread))
+		return NULL;
+
+	close(ipc_socket);
+
+	sys_execveat(AT_FDCWD, "pidfd_exec_helper", NULL, NULL, 0);
+	return NULL;
+}
+
+TEST_F(pidfd_info, thread_group_exec_thread)
+{
+	pid_t pid_leader, pid_poller, pid_thread;
+	pthread_t thread;
+	int nevents, pidfd_leader, pidfd_leader_thread, pidfd_thread, ret;
+	int ipc_sockets[2];
+	struct pollfd fds = {};
+	struct pidfd_info info = {
+		.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT,
+	};
+
+	ret = socketpair(AF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, ipc_sockets);
+	EXPECT_EQ(ret, 0);
+
+	pid_leader = create_child(&pidfd_leader, 0);
+	EXPECT_GE(pid_leader, 0);
+
+	if (pid_leader == 0) {
+		close(ipc_sockets[0]);
+
+		/* The thread will outlive the thread-group leader. */
+		if (pthread_create(&thread, NULL, pidfd_info_thread_exec_sane, &ipc_sockets[1]))
+			syscall(__NR_exit, EXIT_FAILURE);
+
+		/*
+		 * Pause the thread-group leader. It will be killed once
+		 * the subthread execs.
+		 */
+		pause();
+		syscall(__NR_exit, EXIT_SUCCESS);
+	}
+
+	/* Retrieve the tid of the thread. */
+	EXPECT_EQ(close(ipc_sockets[1]), 0);
+	ASSERT_EQ(read_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread));
+
+	/* Opening a thread as a PIDFD_THREAD must succeed. */
+	pidfd_thread = sys_pidfd_open(pid_thread, PIDFD_THREAD);
+	ASSERT_GE(pidfd_thread, 0);
+
+	/* Open a thread-specific pidfd for the thread-group leader. */
+	pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD);
+	ASSERT_GE(pidfd_leader_thread, 0);
+
+	pid_poller = fork();
+	ASSERT_GE(pid_poller, 0);
+	if (pid_poller == 0) {
+		/*
+		 * The subthread will now exec. The struct pid of the old
+		 * thread-group leader will be assumed by the subthread which
+		 * becomes the new thread-group leader. So no exit notification
+		 * must be generated. Wait for 10 seconds and call it a success
+		 * if no notification has been received.
+		 */
+		fds.events = POLLIN;
+		fds.fd = pidfd_leader_thread;
+		nevents = poll(&fds, 1, 10000 /* wait 10 seconds */);
+		if (nevents != 0)
+			_exit(EXIT_FAILURE);
+		if (fds.revents & POLLIN)
+			_exit(EXIT_FAILURE);
+		if (fds.revents & POLLHUP)
+			_exit(EXIT_FAILURE);
+		_exit(EXIT_SUCCESS);
+	}
+
+	/* Now that we've opened a thread-specific pidfd the thread can exec. */
+	ASSERT_EQ(write_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread));
+	EXPECT_EQ(close(ipc_sockets[0]), 0);
+	ASSERT_EQ(wait_for_pid(pid_poller), 0);
+
+	/* Wait until the kernel has SIGKILLed the thread. */
+	fds.events = POLLHUP;
+	fds.fd = pidfd_thread;
+	nevents = poll(&fds, 1, -1);
+	ASSERT_EQ(nevents, 1);
+	/* The thread has been reaped. */
+	ASSERT_TRUE(!!(fds.revents & POLLHUP));
+
+	/* Retrieve thread-specific exit info from pidfd. */
+	ASSERT_EQ(ioctl(pidfd_thread, PIDFD_GET_INFO, &info), 0);
+	ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS));
+	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT));
+	/*
+	 * While the kernel will have SIGKILLed the whole thread-group
+	 * during exec it will cause the individual threads to exit
+	 * cleanly.
+	 */
+	ASSERT_TRUE(WIFEXITED(info.exit_code));
+	ASSERT_EQ(WEXITSTATUS(info.exit_code), 0);
+
+	/*
+	 * The thread-group leader is still alive; the thread has taken
+	 * over its struct pid and thus its pid number.
+	 */
+	info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT;
+	ASSERT_EQ(ioctl(pidfd_leader, PIDFD_GET_INFO, &info), 0);
+	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_CREDS));
+	ASSERT_FALSE(!!(info.mask & PIDFD_INFO_EXIT));
+	ASSERT_EQ(info.pid, pid_leader);
+
+	/* Take down the thread-group leader. */
+	EXPECT_EQ(sys_pidfd_send_signal(pidfd_leader, SIGKILL, NULL, 0), 0);
+
+	/*
+	 * After the exec we're dealing with an empty thread-group so now
+	 * we must see an exit notification on the thread-specific pidfd
+	 * for the thread-group leader as there's no subthread that can
+	 * revive the struct pid.
+	 */
+	fds.events = POLLIN;
+	fds.fd = pidfd_leader_thread;
+	nevents = poll(&fds, 1, -1);
+	ASSERT_EQ(nevents, 1);
+	ASSERT_TRUE(!!(fds.revents & POLLIN));
+	ASSERT_FALSE(!!(fds.revents & POLLHUP));
+
+	EXPECT_EQ(sys_waitid(P_PIDFD, pidfd_leader, NULL, WEXITED), 0);
+
+	/* Retrieve exit information for the thread-group leader. */
+	info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT;
+	ASSERT_EQ(ioctl(pidfd_leader, PIDFD_GET_INFO, &info), 0);
+	ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS));
+	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT));
+
+	EXPECT_EQ(close(pidfd_leader), 0);
+	EXPECT_EQ(close(pidfd_thread), 0);
+}
+
 TEST_HARNESS_MAIN