From patchwork Fri Nov 25 13:54:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13055967 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3972C4332F for ; Fri, 25 Nov 2022 13:55:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229788AbiKYNzL (ORCPT ); Fri, 25 Nov 2022 08:55:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50776 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229628AbiKYNzK (ORCPT ); Fri, 25 Nov 2022 08:55:10 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3660B21E; Fri, 25 Nov 2022 05:55:10 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id CC87962457; Fri, 25 Nov 2022 13:55:09 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2BE43C433D6; Fri, 25 Nov 2022 13:55:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1669384509; bh=Q8JjNQ5bUk4AZ2CfSjRxxwMV7KD/wuw64pWCocyDRgA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=JaRLG4y8AXSHW+V/j1dU5Z7h2CjWGPm9A3Hve6Luaou4atZcm0ge2x8XLpWVJ3yV6 m7EpeG4ynFyGNSgYyCFDOs3gR+Re6P2x6BaqO0eOd76kOKWV14Ojnl4iKjsHVkPFsm WMJtIw5d8oqBJLgNd7cGWS8ovokOyJKTGI/OeWuDWgSRmIa6mlN+asvF+vijYbfHlD nvsmj3RVm19vGniDoCTbTsDJvWz2WCPcdtTHsNLE21Qjf2g9VR+igHQIrlyRyf9d+b u/st5dXe6yQ2xtnj2cPHEghdY+Dq+ecDmDZD6frC7fCFU10cWTSjlk301WXkaZEz4G aK5negbtrDb1g== From: Frederic Weisbecker To: "Paul E . McKenney" Cc: LKML , Frederic Weisbecker , "Eric W . Biederman" , Neeraj Upadhyay , Oleg Nesterov , Pengfei Xu , Boqun Feng , Lai Jiangshan , rcu@vger.kernel.org Subject: [PATCH 1/3] rcu-tasks: Improve comments explaining tasks_rcu_exit_srcu purpose Date: Fri, 25 Nov 2022 14:54:58 +0100 Message-Id: <20221125135500.1653800-2-frederic@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221125135500.1653800-1-frederic@kernel.org> References: <20221125135500.1653800-1-frederic@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org Make sure we don't need to look again into the depths of git blame in order not to miss a subtle part about how rcu-tasks is dealing with exiting tasks. Suggested-by: Boqun Feng Suggested-by: Neeraj Upadhyay Suggested-by: Paul E. McKenney Cc: Oleg Nesterov Cc: Lai Jiangshan Cc: Eric W. Biederman Signed-off-by: Frederic Weisbecker --- kernel/rcu/tasks.h | 37 +++++++++++++++++++++++++++++-------- 1 file changed, 29 insertions(+), 8 deletions(-) diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index 4a991311be9b..7deed6135f73 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -827,11 +827,21 @@ static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop) static void rcu_tasks_postscan(struct list_head *hop) { /* - * Wait for tasks that are in the process of exiting. This - * does only part of the job, ensuring that all tasks that were - * previously exiting reach the point where they have disabled - * preemption, allowing the later synchronize_rcu() to finish - * the job. + * Exiting tasks may escape the tasklist scan. Those are vulnerable + * until their final schedule() with TASK_DEAD state. To cope with + * this, divide the fragile exit path part in two intersecting + * read side critical sections: + * + * 1) An _SRCU_ read side starting before calling exit_notify(), + * which may remove the task from the tasklist, and ending after + * the final preempt_disable() call in do_exit(). + * + * 2) An _RCU_ read side starting with the final preempt_disable() + * call in do_exit() and ending with the final call to schedule() + * with TASK_DEAD state. + * + * This handles the part 1). And postgp will handle part 2) with a + * call to synchronize_rcu(). */ synchronize_srcu(&tasks_rcu_exit_srcu); } @@ -898,7 +908,10 @@ static void rcu_tasks_postgp(struct rcu_tasks *rtp) * * In addition, this synchronize_rcu() waits for exiting tasks * to complete their final preempt_disable() region of execution, - * cleaning up after the synchronize_srcu() above. + * cleaning up after synchronize_srcu(&tasks_rcu_exit_srcu), + * enforcing the whole region before tasklist removal until + * the final schedule() with TASK_DEAD state to be an RCU TASKS + * read side critical section. */ synchronize_rcu(); } @@ -988,7 +1001,11 @@ void show_rcu_tasks_classic_gp_kthread(void) EXPORT_SYMBOL_GPL(show_rcu_tasks_classic_gp_kthread); #endif // !defined(CONFIG_TINY_RCU) -/* Do the srcu_read_lock() for the above synchronize_srcu(). */ +/* + * Contribute to protect against tasklist scan blind spot while the + * task is exiting and may be removed from the tasklist. See + * corresponding synchronize_srcu() for further details. + */ void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu) { preempt_disable(); @@ -996,7 +1013,11 @@ void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu) preempt_enable(); } -/* Do the srcu_read_unlock() for the above synchronize_srcu(). */ +/* + * Contribute to protect against tasklist scan blind spot while the + * task is exiting and may be removed from the tasklist. See + * corresponding synchronize_srcu() for further details. + */ void exit_tasks_rcu_finish(void) __releases(&tasks_rcu_exit_srcu) { struct task_struct *t = current; From patchwork Fri Nov 25 13:54:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13055968 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5EBB6C46467 for ; Fri, 25 Nov 2022 13:55:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229628AbiKYNzO (ORCPT ); Fri, 25 Nov 2022 08:55:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50854 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229756AbiKYNzN (ORCPT ); Fri, 25 Nov 2022 08:55:13 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B54431AD8F; Fri, 25 Nov 2022 05:55:12 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 51DB36245A; Fri, 25 Nov 2022 13:55:12 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A4754C433C1; Fri, 25 Nov 2022 13:55:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1669384511; bh=ALLACmzXC6lYKmELWY0sATEsOCq01HNEAVNMSA5sSNU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=PWrTou2xJP0lqm6s5QPAORWmW0j8jfCxhnCMQeUbEiz3F0QA3AxuIpCRAkk6inw5p wEDcwPD1DlhwzX8kvszAKdvuVCvPhw+Dvkee6PRXx0kK9u+EG71tWNxOce0XX/WyOg 6I9TlVrT6Ds+SlvRt/8s1gXR8efW9OiazXwJwX0Uv8GBHlUx1UHellIjcY/tApFMOz gDfyGplUPEmCE0qcRmw1AyGPAcsYOKjecreO+qjAWsISQvsifq0uCCqmBnvsbWZ5t0 rjDY9oLXtHLIKgJe9vsrsBRNCqBRUkXaON89nIoi4DeUIM2KnFGrAvr8TNoTR9jDFa POLinGPxc460Q== From: Frederic Weisbecker To: "Paul E . McKenney" Cc: LKML , Frederic Weisbecker , "Eric W . Biederman" , Neeraj Upadhyay , Oleg Nesterov , Pengfei Xu , Boqun Feng , Lai Jiangshan , rcu@vger.kernel.org Subject: [PATCH 2/3] rcu-tasks: Remove preemption disablement around srcu_read_[un]lock() calls Date: Fri, 25 Nov 2022 14:54:59 +0100 Message-Id: <20221125135500.1653800-3-frederic@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221125135500.1653800-1-frederic@kernel.org> References: <20221125135500.1653800-1-frederic@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org Ever since the following commit: 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()") SRCU doesn't rely anymore on preemption to be disabled in order to modify the per-CPU counter. And even then it used to be done from the API itself. Therefore and after checking further, it appears to be safe to remove the preemption disablement around __srcu_read_[un]lock() in exit_tasks_rcu_start() and exit_tasks_rcu_finish() Suggested-by: Boqun Feng Suggested-by: Paul E. McKenney Suggested-by: Neeraj Upadhyay Cc: Lai Jiangshan Signed-off-by: Frederic Weisbecker --- kernel/rcu/tasks.h | 4 ---- 1 file changed, 4 deletions(-) diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index 7deed6135f73..9a8114114b48 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -1008,9 +1008,7 @@ EXPORT_SYMBOL_GPL(show_rcu_tasks_classic_gp_kthread); */ void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu) { - preempt_disable(); current->rcu_tasks_idx = __srcu_read_lock(&tasks_rcu_exit_srcu); - preempt_enable(); } /* @@ -1022,9 +1020,7 @@ void exit_tasks_rcu_finish(void) __releases(&tasks_rcu_exit_srcu) { struct task_struct *t = current; - preempt_disable(); __srcu_read_unlock(&tasks_rcu_exit_srcu, t->rcu_tasks_idx); - preempt_enable(); exit_tasks_rcu_finish_trace(t); } From patchwork Fri Nov 25 13:55:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 13055969 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4796FC4332F for ; Fri, 25 Nov 2022 13:55:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229823AbiKYNzZ (ORCPT ); Fri, 25 Nov 2022 08:55:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51166 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229896AbiKYNzW (ORCPT ); Fri, 25 Nov 2022 08:55:22 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3690221258; Fri, 25 Nov 2022 05:55:15 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id CA49C6245B; Fri, 25 Nov 2022 13:55:14 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2D975C433D6; Fri, 25 Nov 2022 13:55:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1669384514; bh=02dI1lNmveQrMuT/Cc1xXAnRLUb8g7J9hwX6UzoKBFc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=KouTAujdIoK+X5R0bHouk8vqaIt+Uy9J1A6+KgwwpYJ3AYYmoFXZnLIqJqWhthu+0 LB6nXrlZWPPq0lvRrz/IeIXqy2x0/ZH4aJoWJph3oOzxAYrOWqMc9PrfMy7/QEI4rG vSmOivxxi1G0DES8NNcOg/EYByyXiGT63lxyjmQ/BzupqnEP4dZmBQQWllDfjQYH0k BuTH/ydfgjl5T8rQdvCGwniEUmbZN5IdlZhpPH85qIO+5oqoNXXzKVXWmlZmHBZAzA AMJCvZ2gcpSUjnSFQ5RJ9166NT8dAHz0F9kiTSft1aZDe3g0D8c+LZynmvxEoNOEyS KqZCdzKoO/DBw== From: Frederic Weisbecker To: "Paul E . McKenney" Cc: LKML , Frederic Weisbecker , "Eric W . Biederman" , Neeraj Upadhyay , Oleg Nesterov , Pengfei Xu , Boqun Feng , Lai Jiangshan , rcu@vger.kernel.org Subject: [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() Date: Fri, 25 Nov 2022 14:55:00 +0100 Message-Id: <20221125135500.1653800-4-frederic@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221125135500.1653800-1-frederic@kernel.org> References: <20221125135500.1653800-1-frederic@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org RCU Tasks and PID-namespace unshare can interact in do_exit() in a complicated circular dependency: 1) TASK A calls unshare(CLONE_NEWPID), this creates a new PID namespace that every subsequent child of TASK A will belong to. But TASK A doesn't itself belong to that new PID namespace. 2) TASK A forks() and creates TASK B. TASK A stays attached to its PID namespace (let's say PID_NS1) and TASK B is the first task belonging to the new PID namespace created by unshare() (let's call it PID_NS2). 3) Since TASK B is the first task attached to PID_NS2, it becomes the PID_NS2 child reaper. 4) TASK A forks() again and creates TASK C which get attached to PID_NS2. Note how TASK C has TASK A as a parent (belonging to PID_NS1) but has TASK B (belonging to PID_NS2) as a pid_namespace child_reaper. 5) TASK B exits and since it is the child reaper for PID_NS2, it has to kill all other tasks attached to PID_NS2, and wait for all of them to die before getting reaped itself (zap_pid_ns_process()). 6) TASK A calls synchronize_rcu_tasks() which leads to synchronize_srcu(&tasks_rcu_exit_srcu). 7) TASK B is waiting for TASK C to get reaped. But TASK B is under a tasks_rcu_exit_srcu SRCU critical section (exit_notify() is between exit_tasks_rcu_start() and exit_tasks_rcu_finish()), blocking TASK A. 8) TASK C exits and since TASK A is its parent, it waits for it to reap TASK C, but it can't because TASK A waits for TASK B that waits for TASK C. Pid_namespace semantics can hardly be changed at this point. But the coverage of tasks_rcu_exit_srcu can be reduced instead. The current task is assumed not to be concurrently reapable at this stage of exit_notify() and therefore tasks_rcu_exit_srcu can be temporarily relaxed without breaking its constraints, providing a way out of the deadlock scenario. Fixes: 3f95aa81d265 ("rcu: Make TASKS_RCU handle tasks that are almost done exiting") Reported-by: Pengfei Xu Suggested-by: Boqun Feng Suggested-by: Neeraj Upadhyay Suggested-by: Paul E. McKenney Cc: Oleg Nesterov Cc: Lai Jiangshan Cc: Eric W . Biederman Signed-off-by: Frederic Weisbecker --- include/linux/rcupdate.h | 2 ++ kernel/pid_namespace.c | 17 +++++++++++++++++ kernel/rcu/tasks.h | 14 ++++++++++++-- 3 files changed, 31 insertions(+), 2 deletions(-) diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 89b3036746d2..a19d91d5461c 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -238,6 +238,7 @@ void synchronize_rcu_tasks_rude(void); #define rcu_note_voluntary_context_switch(t) rcu_tasks_qs(t, false) void exit_tasks_rcu_start(void); +void exit_tasks_rcu_stop(void); void exit_tasks_rcu_finish(void); #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */ #define rcu_tasks_classic_qs(t, preempt) do { } while (0) @@ -246,6 +247,7 @@ void exit_tasks_rcu_finish(void); #define call_rcu_tasks call_rcu #define synchronize_rcu_tasks synchronize_rcu static inline void exit_tasks_rcu_start(void) { } +static inline void exit_tasks_rcu_stop(void) { } static inline void exit_tasks_rcu_finish(void) { } #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */ diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index f4f8cb0435b4..fc21c5d5fd5d 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -244,7 +244,24 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns) set_current_state(TASK_INTERRUPTIBLE); if (pid_ns->pid_allocated == init_pids) break; + /* + * Release tasks_rcu_exit_srcu to avoid following deadlock: + * + * 1) TASK A unshare(CLONE_NEWPID) + * 2) TASK A fork() twice -> TASK B (child reaper for new ns) + * and TASK C + * 3) TASK B exits, kills TASK C, waits for TASK A to reap it + * 4) TASK A calls synchronize_rcu_tasks() + * -> synchronize_srcu(tasks_rcu_exit_srcu) + * 5) *DEADLOCK* + * + * It is considered safe to release tasks_rcu_exit_srcu here + * because we assume the current task can not be concurrently + * reaped at this point. + */ + exit_tasks_rcu_stop(); schedule(); + exit_tasks_rcu_start(); } __set_current_state(TASK_RUNNING); diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index 9a8114114b48..f9dfc2ece287 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -1016,12 +1016,22 @@ void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu) * task is exiting and may be removed from the tasklist. See * corresponding synchronize_srcu() for further details. */ -void exit_tasks_rcu_finish(void) __releases(&tasks_rcu_exit_srcu) +void exit_tasks_rcu_stop(void) __releases(&tasks_rcu_exit_srcu) { struct task_struct *t = current; __srcu_read_unlock(&tasks_rcu_exit_srcu, t->rcu_tasks_idx); - exit_tasks_rcu_finish_trace(t); +} + +/* + * Contribute to protect against tasklist scan blind spot while the + * task is exiting and may be removed from the tasklist. See + * corresponding synchronize_srcu() for further details. + */ +void exit_tasks_rcu_finish(void) +{ + exit_tasks_rcu_stop(); + exit_tasks_rcu_finish_trace(current); } #else /* #ifdef CONFIG_TASKS_RCU */