Message ID | 20250130185320.1651910-4-paulmck@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] 4/5] rcu-tasks: Move RCU Tasks self-tests to core_initcall() | expand |
On 2025-01-30 10:53:19 [-0800], Paul E. McKenney wrote: > The timer and hrtimer softirq processing has moved to dedicated threads > for kernels built with CONFIG_IRQ_FORCED_THREADING=y. This results in > timers not expiring until later in early boot, which in turn causes the > RCU Tasks self-tests to hang in kernels built with CONFIG_PROVE_RCU=y, > which further causes the entire kernel to hang. One fix would be to > make timers work during this time, but there are no known users of RCU > Tasks grace periods during that time, so no justification for the added > complexity. Not yet, anyway. > > This commit therefore moves the call to rcu_init_tasks_generic() from > kernel_init_freeable() to a core_initcall(). This works because the > timer and hrtimer kthreads are created at early_initcall() time. Fixes: 49a17639508c3 ("softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.") ? I played with it and I can reproduce the issue with !RT + threadirqs but not with RT (which implies threadirqs). Is there anything in RT that avoids the problem? Thank you for debugging and the patch. Sebastian
On Tue, Feb 04, 2025 at 11:26:11AM +0100, Sebastian Andrzej Siewior wrote: > On 2025-01-30 10:53:19 [-0800], Paul E. McKenney wrote: > > The timer and hrtimer softirq processing has moved to dedicated threads > > for kernels built with CONFIG_IRQ_FORCED_THREADING=y. This results in > > timers not expiring until later in early boot, which in turn causes the > > RCU Tasks self-tests to hang in kernels built with CONFIG_PROVE_RCU=y, > > which further causes the entire kernel to hang. One fix would be to > > make timers work during this time, but there are no known users of RCU > > Tasks grace periods during that time, so no justification for the added > > complexity. Not yet, anyway. > > > > This commit therefore moves the call to rcu_init_tasks_generic() from > > kernel_init_freeable() to a core_initcall(). This works because the > > timer and hrtimer kthreads are created at early_initcall() time. > > Fixes: 49a17639508c3 ("softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.") > ? Quite possibly... I freely confess that I was more focused on the fix than on the bug's origin. Would you be willing to try this commit and its predecessor? > I played with it and I can reproduce the issue with !RT + threadirqs but > not with RT (which implies threadirqs). > Is there anything in RT that avoids the problem? Not that I know of, but then again I did not try it. To your point, I do need to make a -rt rcutorture scenario. TREE03 has been intended to approximate this, and it uses the following Kconfig options: ------------------------------------------------------------------------ CONFIG_SMP=y CONFIG_NR_CPUS=16 CONFIG_PREEMPT_NONE=n CONFIG_PREEMPT_VOLUNTARY=n CONFIG_PREEMPT=y #CHECK#CONFIG_PREEMPT_RCU=y CONFIG_HZ_PERIODIC=y CONFIG_NO_HZ_IDLE=n CONFIG_NO_HZ_FULL=n CONFIG_RCU_TRACE=y CONFIG_HOTPLUG_CPU=y CONFIG_RCU_FANOUT=2 CONFIG_RCU_FANOUT_LEAF=2 CONFIG_RCU_NOCB_CPU=n CONFIG_DEBUG_LOCK_ALLOC=n CONFIG_RCU_BOOST=y CONFIG_DEBUG_OBJECTS_RCU_HEAD=n CONFIG_RCU_EXPERT=y ------------------------------------------------------------------------ And the following kernel-boot parameters: ------------------------------------------------------------------------ rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12 rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3 rcutree.kthread_prio=2 threadirqs rcutree.use_softirq=0 rcutorture.preempt_duration=10 ------------------------------------------------------------------------ Some of these are for RCU's benefit, but what should I change to more closely approximate a typical real-time deployment? > Thank you for debugging and the patch. And to you for digging into it! Thanx, Paul
On 2025-02-04 03:51:48 [-0800], Paul E. McKenney wrote: > On Tue, Feb 04, 2025 at 11:26:11AM +0100, Sebastian Andrzej Siewior wrote: > > On 2025-01-30 10:53:19 [-0800], Paul E. McKenney wrote: > > > The timer and hrtimer softirq processing has moved to dedicated threads > > > for kernels built with CONFIG_IRQ_FORCED_THREADING=y. This results in > > > timers not expiring until later in early boot, which in turn causes the > > > RCU Tasks self-tests to hang in kernels built with CONFIG_PROVE_RCU=y, > > > which further causes the entire kernel to hang. One fix would be to > > > make timers work during this time, but there are no known users of RCU > > > Tasks grace periods during that time, so no justification for the added > > > complexity. Not yet, anyway. > > > > > > This commit therefore moves the call to rcu_init_tasks_generic() from > > > kernel_init_freeable() to a core_initcall(). This works because the > > > timer and hrtimer kthreads are created at early_initcall() time. > > > > Fixes: 49a17639508c3 ("softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.") > > ? > > Quite possibly... I freely confess that I was more focused on the fix > than on the bug's origin. Would you be willing to try this commit and > its predecessor? Yes. Just verified. Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > I played with it and I can reproduce the issue with !RT + threadirqs but > > not with RT (which implies threadirqs). > > Is there anything in RT that avoids the problem? > > Not that I know of, but then again I did not try it. To your point, The change looks fine. > I do need to make a -rt rcutorture scenario. TREE03 has been intended to > approximate this, and it uses the following Kconfig options: > > ------------------------------------------------------------------------ > > CONFIG_SMP=y > CONFIG_NR_CPUS=16 > CONFIG_PREEMPT_NONE=n > CONFIG_PREEMPT_VOLUNTARY=n > CONFIG_PREEMPT=y > #CHECK#CONFIG_PREEMPT_RCU=y > CONFIG_HZ_PERIODIC=y > CONFIG_NO_HZ_IDLE=n > CONFIG_NO_HZ_FULL=n > CONFIG_RCU_TRACE=y > CONFIG_HOTPLUG_CPU=y > CONFIG_RCU_FANOUT=2 > CONFIG_RCU_FANOUT_LEAF=2 > CONFIG_RCU_NOCB_CPU=n > CONFIG_DEBUG_LOCK_ALLOC=n > CONFIG_RCU_BOOST=y > CONFIG_DEBUG_OBJECTS_RCU_HEAD=n > CONFIG_RCU_EXPERT=y You could enable CONFIG_PREEMPT_RT ;) CONFIG_PREEMPT_LAZY is probably also set a lot. That should be it. > ------------------------------------------------------------------------ > > And the following kernel-boot parameters: > > ------------------------------------------------------------------------ > > rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 > rcutree.gp_preinit_delay=12 > rcutree.gp_init_delay=3 > rcutree.gp_cleanup_delay=3 > rcutree.kthread_prio=2 > threadirqs > rcutree.use_softirq=0 > rcutorture.preempt_duration=10 > > ------------------------------------------------------------------------ > > Some of these are for RCU's benefit, but what should I change to more > closely approximate a typical real-time deployment? See above. > > Thank you for debugging and the patch. > > And to you for digging into it! Thanks. > Thanx, Paul Sebastian
On Tue, Feb 04, 2025 at 05:34:09PM +0100, Sebastian Andrzej Siewior wrote: > On 2025-02-04 03:51:48 [-0800], Paul E. McKenney wrote: > > On Tue, Feb 04, 2025 at 11:26:11AM +0100, Sebastian Andrzej Siewior wrote: > > > On 2025-01-30 10:53:19 [-0800], Paul E. McKenney wrote: > > > > The timer and hrtimer softirq processing has moved to dedicated threads > > > > for kernels built with CONFIG_IRQ_FORCED_THREADING=y. This results in > > > > timers not expiring until later in early boot, which in turn causes the > > > > RCU Tasks self-tests to hang in kernels built with CONFIG_PROVE_RCU=y, > > > > which further causes the entire kernel to hang. One fix would be to > > > > make timers work during this time, but there are no known users of RCU > > > > Tasks grace periods during that time, so no justification for the added > > > > complexity. Not yet, anyway. > > > > > > > > This commit therefore moves the call to rcu_init_tasks_generic() from > > > > kernel_init_freeable() to a core_initcall(). This works because the > > > > timer and hrtimer kthreads are created at early_initcall() time. > > > > > > Fixes: 49a17639508c3 ("softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.") > > > ? > > > > Quite possibly... I freely confess that I was more focused on the fix > > than on the bug's origin. Would you be willing to try this commit and > > its predecessor? > > Yes. Just verified. > Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Boqun, could you please apply Sebastian's tags, including the Fixes tag above? > > > I played with it and I can reproduce the issue with !RT + threadirqs but > > > not with RT (which implies threadirqs). > > > Is there anything in RT that avoids the problem? > > > > Not that I know of, but then again I did not try it. To your point, > > The change looks fine. > > > I do need to make a -rt rcutorture scenario. TREE03 has been intended to > > approximate this, and it uses the following Kconfig options: > > > > ------------------------------------------------------------------------ > > > > CONFIG_SMP=y > > CONFIG_NR_CPUS=16 > > CONFIG_PREEMPT_NONE=n > > CONFIG_PREEMPT_VOLUNTARY=n > > CONFIG_PREEMPT=y > > #CHECK#CONFIG_PREEMPT_RCU=y > > CONFIG_HZ_PERIODIC=y > > CONFIG_NO_HZ_IDLE=n > > CONFIG_NO_HZ_FULL=n > > CONFIG_RCU_TRACE=y > > CONFIG_HOTPLUG_CPU=y > > CONFIG_RCU_FANOUT=2 > > CONFIG_RCU_FANOUT_LEAF=2 > > CONFIG_RCU_NOCB_CPU=n > > CONFIG_DEBUG_LOCK_ALLOC=n > > CONFIG_RCU_BOOST=y > > CONFIG_DEBUG_OBJECTS_RCU_HEAD=n > > CONFIG_RCU_EXPERT=y > > You could enable CONFIG_PREEMPT_RT ;) > CONFIG_PREEMPT_LAZY is probably also set a lot. > > That should be it. > > > ------------------------------------------------------------------------ > > > > And the following kernel-boot parameters: > > > > ------------------------------------------------------------------------ > > > > rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 > > rcutree.gp_preinit_delay=12 > > rcutree.gp_init_delay=3 > > rcutree.gp_cleanup_delay=3 > > rcutree.kthread_prio=2 > > threadirqs > > rcutree.use_softirq=0 > > rcutorture.preempt_duration=10 > > > > ------------------------------------------------------------------------ > > > > Some of these are for RCU's benefit, but what should I change to more > > closely approximate a typical real-time deployment? > > See above. Which got me this diff: ------------------------------------------------------------------------ diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE03 b/tools/testing/selftests/rcutorture/configs/rcu/TREE03 index 2dc31b16e506..6158f5002497 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE03 +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE03 @@ -2,7 +2,9 @@ CONFIG_SMP=y CONFIG_NR_CPUS=16 CONFIG_PREEMPT_NONE=n CONFIG_PREEMPT_VOLUNTARY=n -CONFIG_PREEMPT=y +CONFIG_PREEMPT=n +CONFIG_PREEMPT_LAZY=y +CONFIG_PREEMPT_RT=y #CHECK#CONFIG_PREEMPT_RCU=y CONFIG_HZ_PERIODIC=y CONFIG_NO_HZ_IDLE=n @@ -15,4 +17,5 @@ CONFIG_RCU_NOCB_CPU=n CONFIG_DEBUG_LOCK_ALLOC=n CONFIG_RCU_BOOST=y CONFIG_DEBUG_OBJECTS_RCU_HEAD=n +CONFIG_EXPERT=y CONFIG_RCU_EXPERT=y ------------------------------------------------------------------------ But a 10-minute run got me the splat shown below, and in addition a shutdown-time hang. This is caused by RCU falling behind a callback-flooding kthread that invokes call_rcu() in a semi-tight loop. Setting rcutree.kthread_prio=40 avoids the splat, but still gets the shutdown-time hang. Retrying with the default rcutree.kthread_prio=2 failed to reproduce the splat, but it did reproduce the shutdown-time hang. OK, maybe printk buffers are not being flushed? A 100-millisecond sleep at the end of of rcu_torture_cleanup() got all of rcutorture's output flushed, but lost the subsequent shutdown-time console traffic. The pr_flush(HZ/10,1) seems more sensible, but this is private to printk(). I would like to log the shutdown-time console traffic because RCU can sometimes break things on that path. Thoughts? Thanx, Paul PS: I will do longer runs in case that splat was not a one-off. My concern is that I might need to adjust something more in order to get a reliable callback-flooding test. ------------------------------------------------------------------------ [ 133.107605] ------------[ cut here ]------------ [ 133.107611] rcutorture_oom_notify invoked upon OOM during forward-progress testing. [ 133.107636] WARNING: CPU: 0 PID: 236 at kernel/rcu/rcutorture.c:3117 rcutorture_oom_notify+0x48/0x2e0 [ 133.107651] Modules linked in: [ 133.107662] CPU: 0 UID: 0 PID: 236 Comm: rcu_torture_fwd Not tainted 6.13.0-rc2-00108-g56d12f072c66-dirty #1747 [ 133.107667] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 [ 133.107669] RIP: 0010:rcutorture_oom_notify+0x48/0x2e0 [ 133.107673] Code: e8 3d ca c3 00 48 8b 2d 5e 8e 17 02 48 85 ed 0f 84 82 02 00 00 90 48 c7 c6 a0 fc 20 a7 48 c7 c7 40 82 68 a7 e8 f9 38 f6 ff 90 <0f> 0b 90 90 8b 35 36 10 82 01 45 31 e4 85 f6 7e 3c 49 63 dc 41 83 [ 133.107675] RSP: 0018:ffffa3b10081fab0 EFLAGS: 00010286 [ 133.107679] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027 [ 133.107681] RDX: ffff8bf11f21c948 RSI: 0000000000000001 RDI: ffff8bf11f21c940 [ 133.107682] RBP: ffff8bf101ff8000 R08: ffffffffa7b42608 R09: 00000000ffffdfff [ 133.107683] R10: ffffffffa7a62620 R11: ffffffffa7b12620 R12: 0000000000000000 [ 133.107685] R13: ffffa3b10081fb60 R14: 00000000ffffffff R15: 0000000000000000 [ 133.107688] FS: 0000000000000000(0000) GS:ffff8bf11f200000(0000) knlGS:0000000000000000 [ 133.107690] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 133.107691] CR2: ffff8bf10fe01000 CR3: 0000000001f88000 CR4: 00000000000006f0 [ 133.107693] Call Trace: [ 133.107709] <TASK> [ 133.107713] ? __warn+0x83/0x130 [ 133.107716] ? rcutorture_oom_notify+0x48/0x2e0 [ 133.107720] ? report_bug+0x18e/0x1a0 [ 133.107723] ? handle_bug+0x54/0x90 [ 133.107727] ? exc_invalid_op+0x18/0x70 [ 133.107729] ? asm_exc_invalid_op+0x1a/0x20 [ 133.107733] ? rcutorture_oom_notify+0x48/0x2e0 [ 133.107736] notifier_call_chain+0x59/0xc0 [ 133.107744] blocking_notifier_call_chain+0x3c/0x60 [ 133.107746] out_of_memory+0x1f3/0x710 [ 133.107754] __alloc_pages_noprof+0xf27/0x10b0 [ 133.107764] ? timerqueue_del+0x2e/0x50 [ 133.107767] alloc_pages_mpol_noprof+0x43/0xf0 [ 133.107774] new_slab+0x37d/0x440 [ 133.107796] ___slab_alloc+0x480/0x9a0 [ 133.107801] ? rcu_torture_fwd_prog+0x344/0x9e0 [ 133.107803] ? preempt_schedule_thunk+0x1a/0x30 [ 133.107806] ? rcu_torture_fwd_prog+0x344/0x9e0 [ 133.107808] __kmalloc_cache_noprof+0x74/0x1f0 [ 133.107811] rcu_torture_fwd_prog+0x344/0x9e0 [ 133.107814] ? __pfx_rcu_torture_fwd_prog+0x10/0x10 [ 133.107816] ? kthread+0xd1/0x100 [ 133.107818] ? __pfx_rcu_torture_fwd_prog+0x10/0x10 [ 133.107819] kthread+0xd1/0x100 [ 133.107824] ? __pfx_kthread+0x10/0x10 [ 133.107826] ret_from_fork+0x2f/0x50 [ 133.107833] ? __pfx_kthread+0x10/0x10 [ 133.107835] ret_from_fork_asm+0x1a/0x30 [ 133.107842] </TASK> [ 133.107843] ---[ end trace 0000000000000000 ]---
On Tue, Feb 04, 2025 at 12:20:30PM -0800, Paul E. McKenney wrote: > On Tue, Feb 04, 2025 at 05:34:09PM +0100, Sebastian Andrzej Siewior wrote: > > On 2025-02-04 03:51:48 [-0800], Paul E. McKenney wrote: > > > On Tue, Feb 04, 2025 at 11:26:11AM +0100, Sebastian Andrzej Siewior wrote: > > > > On 2025-01-30 10:53:19 [-0800], Paul E. McKenney wrote: > > > > > The timer and hrtimer softirq processing has moved to dedicated threads > > > > > for kernels built with CONFIG_IRQ_FORCED_THREADING=y. This results in > > > > > timers not expiring until later in early boot, which in turn causes the > > > > > RCU Tasks self-tests to hang in kernels built with CONFIG_PROVE_RCU=y, > > > > > which further causes the entire kernel to hang. One fix would be to > > > > > make timers work during this time, but there are no known users of RCU > > > > > Tasks grace periods during that time, so no justification for the added > > > > > complexity. Not yet, anyway. > > > > > > > > > > This commit therefore moves the call to rcu_init_tasks_generic() from > > > > > kernel_init_freeable() to a core_initcall(). This works because the > > > > > timer and hrtimer kthreads are created at early_initcall() time. > > > > > > > > Fixes: 49a17639508c3 ("softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.") > > > > ? > > > > > > Quite possibly... I freely confess that I was more focused on the fix > > > than on the bug's origin. Would you be willing to try this commit and > > > its predecessor? > > > > Yes. Just verified. > > Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > Boqun, could you please apply Sebastian's tags, including the Fixes > tag above? > > > > > I played with it and I can reproduce the issue with !RT + threadirqs but > > > > not with RT (which implies threadirqs). > > > > Is there anything in RT that avoids the problem? > > > > > > Not that I know of, but then again I did not try it. To your point, > > > > The change looks fine. > > > > > I do need to make a -rt rcutorture scenario. TREE03 has been intended to > > > approximate this, and it uses the following Kconfig options: > > > > > > ------------------------------------------------------------------------ > > > > > > CONFIG_SMP=y > > > CONFIG_NR_CPUS=16 > > > CONFIG_PREEMPT_NONE=n > > > CONFIG_PREEMPT_VOLUNTARY=n > > > CONFIG_PREEMPT=y > > > #CHECK#CONFIG_PREEMPT_RCU=y > > > CONFIG_HZ_PERIODIC=y > > > CONFIG_NO_HZ_IDLE=n > > > CONFIG_NO_HZ_FULL=n > > > CONFIG_RCU_TRACE=y > > > CONFIG_HOTPLUG_CPU=y > > > CONFIG_RCU_FANOUT=2 > > > CONFIG_RCU_FANOUT_LEAF=2 > > > CONFIG_RCU_NOCB_CPU=n > > > CONFIG_DEBUG_LOCK_ALLOC=n > > > CONFIG_RCU_BOOST=y > > > CONFIG_DEBUG_OBJECTS_RCU_HEAD=n > > > CONFIG_RCU_EXPERT=y > > > > You could enable CONFIG_PREEMPT_RT ;) > > CONFIG_PREEMPT_LAZY is probably also set a lot. > > > > That should be it. > > > > > ------------------------------------------------------------------------ > > > > > > And the following kernel-boot parameters: > > > > > > ------------------------------------------------------------------------ > > > > > > rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 > > > rcutree.gp_preinit_delay=12 > > > rcutree.gp_init_delay=3 > > > rcutree.gp_cleanup_delay=3 > > > rcutree.kthread_prio=2 > > > threadirqs > > > rcutree.use_softirq=0 > > > rcutorture.preempt_duration=10 > > > > > > ------------------------------------------------------------------------ > > > > > > Some of these are for RCU's benefit, but what should I change to more > > > closely approximate a typical real-time deployment? > > > > See above. > > Which got me this diff: > > ------------------------------------------------------------------------ > > diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE03 b/tools/testing/selftests/rcutorture/configs/rcu/TREE03 > index 2dc31b16e506..6158f5002497 100644 > --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE03 > +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE03 > @@ -2,7 +2,9 @@ CONFIG_SMP=y > CONFIG_NR_CPUS=16 > CONFIG_PREEMPT_NONE=n > CONFIG_PREEMPT_VOLUNTARY=n > -CONFIG_PREEMPT=y > +CONFIG_PREEMPT=n > +CONFIG_PREEMPT_LAZY=y > +CONFIG_PREEMPT_RT=y > #CHECK#CONFIG_PREEMPT_RCU=y > CONFIG_HZ_PERIODIC=y > CONFIG_NO_HZ_IDLE=n > @@ -15,4 +17,5 @@ CONFIG_RCU_NOCB_CPU=n > CONFIG_DEBUG_LOCK_ALLOC=n > CONFIG_RCU_BOOST=y > CONFIG_DEBUG_OBJECTS_RCU_HEAD=n > +CONFIG_EXPERT=y > CONFIG_RCU_EXPERT=y > > ------------------------------------------------------------------------ > > But a 10-minute run got me the splat shown below, and in addition a > shutdown-time hang. > > This is caused by RCU falling behind a callback-flooding kthread that > invokes call_rcu() in a semi-tight loop. Setting rcutree.kthread_prio=40 > avoids the splat, but still gets the shutdown-time hang. Retrying with > the default rcutree.kthread_prio=2 failed to reproduce the splat, but > it did reproduce the shutdown-time hang. > > OK, maybe printk buffers are not being flushed? A 100-millisecond sleep > at the end of of rcu_torture_cleanup() got all of rcutorture's output > flushed, but lost the subsequent shutdown-time console traffic. The > pr_flush(HZ/10,1) seems more sensible, but this is private to printk(). > > I would like to log the shutdown-time console traffic because RCU can > sometimes break things on that path. > > Thoughts? Longer rcutorture runs showed (not unexpectedly) that the 100-millisecond sleep was not always sufficient, nor was a 500-milliseconds sleep. There is a call to kmsg_dump(KMSG_DUMP_SHUTDOWN) in kernel_power_off() that appears to be intended to dump out the printk() buffers, but it does not seem to do so in kernels built with CONFIG_PREEMPT_RT=y. Does there need to be a pr_flush() call prior to the call to migrate_to_reboot_cpu()? Or maybe even to do_kernel_power_off_prepare() or kernel_shutdown_prepare()? Adding John Ogness on CC so that he can tell me the error of my ways. > PS: I will do longer runs in case that splat was not a one-off. > My concern is that I might need to adjust something more in order > to get a reliable callback-flooding test. And this was not a one-off. Running 10 40-minute instances of the new-age CONFIG_PREEMPT_RT=y TREE03 reliably triggers this. At first glance, this appears to be an interaction between testing of RCU priority boosting and RCU-callback flooding forward-progress testing. And disabling testing of RCU priority boosting avoids these OOMs. As does running without CONFIG_PREEMPT_RT=y. My next step is to run with rcutorture.preempt_duration=0, which disables within-guest-OS random preempting of kthreads. If that doesn't help, I expect to play around with avoiding concurrent testing of RCU priority boosting and RCU callback flooding forward progress. Or is there a better way? Thanx, Paul
On 2025-02-05, "Paul E. McKenney" <paulmck@kernel.org> wrote: >> This is caused by RCU falling behind a callback-flooding kthread that >> invokes call_rcu() in a semi-tight loop. Setting rcutree.kthread_prio=40 >> avoids the splat, but still gets the shutdown-time hang. Retrying with >> the default rcutree.kthread_prio=2 failed to reproduce the splat, but >> it did reproduce the shutdown-time hang. >> >> OK, maybe printk buffers are not being flushed? A 100-millisecond sleep >> at the end of of rcu_torture_cleanup() got all of rcutorture's output >> flushed, but lost the subsequent shutdown-time console traffic. The >> pr_flush(HZ/10,1) seems more sensible, but this is private to printk(). >> >> I would like to log the shutdown-time console traffic because RCU can >> sometimes break things on that path. pr_flush() was changed to private because there were no users. It would not be a problem to make it available. Adding a pr_flush() to rcu_torture_cleanup() would be an appropriate workaround for now (more on this at the end). > There is a call to kmsg_dump(KMSG_DUMP_SHUTDOWN) in kernel_power_off() > that appears to be intended to dump out the printk() buffers, It only dumps the buffers to the registered kmsg_dumpers. It is not responsible for flushing console backlogs. > but it > does not seem to do so in kernels built with CONFIG_PREEMPT_RT=y. > Does there need to be a pr_flush() call prior to the call to > migrate_to_reboot_cpu()? Or maybe even to do_kernel_power_off_prepare() > or kernel_shutdown_prepare()? With CONFIG_PREEMPT_RT=y, legacy consoles only print via a dedicated kthread. Without a pr_flush() somewhere, there is basically no chance that they will get backlogs flushed because noone is waitig for them. The new console API (NBCON) provides support for "atomic consoles", which _do_ flush by transitioning to synchronous printing during shutdown/reboot. Unfortunately we still don't have any NBCON atomic console implemented in the kernel. The 8250 UART will be our first driver, most likely available in 6.15. (With the current PREEMPT_RT patch applied, the 8250 NBCON atomic driver is used.) Since only CONFIG_PREEMPT_RT=y has this issue, I am not sure if we want to sprinkle pr_flush() calls on all sleepable shutdown/reboot paths, although that is certainly one way to handle it. For your case, adding a pr_flush() to rcu_torture_cleanup() and making pr_flush() non-private would be an easy solution to avoid your problem. John Ogness
On Wed, Feb 05, 2025 at 09:00:07PM +0106, John Ogness wrote: > On 2025-02-05, "Paul E. McKenney" <paulmck@kernel.org> wrote: > >> This is caused by RCU falling behind a callback-flooding kthread that > >> invokes call_rcu() in a semi-tight loop. Setting rcutree.kthread_prio=40 > >> avoids the splat, but still gets the shutdown-time hang. Retrying with > >> the default rcutree.kthread_prio=2 failed to reproduce the splat, but > >> it did reproduce the shutdown-time hang. > >> > >> OK, maybe printk buffers are not being flushed? A 100-millisecond sleep > >> at the end of of rcu_torture_cleanup() got all of rcutorture's output > >> flushed, but lost the subsequent shutdown-time console traffic. The > >> pr_flush(HZ/10,1) seems more sensible, but this is private to printk(). > >> > >> I would like to log the shutdown-time console traffic because RCU can > >> sometimes break things on that path. > > pr_flush() was changed to private because there were no users. It would > not be a problem to make it available. Adding a pr_flush() to > rcu_torture_cleanup() would be an appropriate workaround for now (more > on this at the end). Ah, got it. > > There is a call to kmsg_dump(KMSG_DUMP_SHUTDOWN) in kernel_power_off() > > that appears to be intended to dump out the printk() buffers, > > It only dumps the buffers to the registered kmsg_dumpers. It is not > responsible for flushing console backlogs. That would explain its not doing much for me. ;-) > > but it > > does not seem to do so in kernels built with CONFIG_PREEMPT_RT=y. > > Does there need to be a pr_flush() call prior to the call to > > migrate_to_reboot_cpu()? Or maybe even to do_kernel_power_off_prepare() > > or kernel_shutdown_prepare()? > > With CONFIG_PREEMPT_RT=y, legacy consoles only print via a dedicated > kthread. Without a pr_flush() somewhere, there is basically no chance > that they will get backlogs flushed because noone is waitig for them. > > The new console API (NBCON) provides support for "atomic consoles", > which _do_ flush by transitioning to synchronous printing during > shutdown/reboot. Unfortunately we still don't have any NBCON atomic > console implemented in the kernel. The 8250 UART will be our first > driver, most likely available in 6.15. (With the current PREEMPT_RT > patch applied, the 8250 NBCON atomic driver is used.) > > Since only CONFIG_PREEMPT_RT=y has this issue, I am not sure if we want > to sprinkle pr_flush() calls on all sleepable shutdown/reboot paths, > although that is certainly one way to handle it. For your case, adding a > pr_flush() to rcu_torture_cleanup() and making pr_flush() non-private > would be an easy solution to avoid your problem. Would it make sense to put a call to pr_flush() at the beginning of the kernel_power_off() function? I suspect that would take care of much more than just rcutorture. Or maybe after the pr_emerg("Power down\n"), since that is normally the last thing that shows up on rcutorture's console logs. I will give that a shot and see what happens. ;-) Thanx, Paul
On Wed, Feb 05, 2025 at 12:10:53PM -0800, Paul E. McKenney wrote: > On Wed, Feb 05, 2025 at 09:00:07PM +0106, John Ogness wrote: > > On 2025-02-05, "Paul E. McKenney" <paulmck@kernel.org> wrote: > > >> This is caused by RCU falling behind a callback-flooding kthread that > > >> invokes call_rcu() in a semi-tight loop. Setting rcutree.kthread_prio=40 > > >> avoids the splat, but still gets the shutdown-time hang. Retrying with > > >> the default rcutree.kthread_prio=2 failed to reproduce the splat, but > > >> it did reproduce the shutdown-time hang. > > >> > > >> OK, maybe printk buffers are not being flushed? A 100-millisecond sleep > > >> at the end of of rcu_torture_cleanup() got all of rcutorture's output > > >> flushed, but lost the subsequent shutdown-time console traffic. The > > >> pr_flush(HZ/10,1) seems more sensible, but this is private to printk(). > > >> > > >> I would like to log the shutdown-time console traffic because RCU can > > >> sometimes break things on that path. > > > > pr_flush() was changed to private because there were no users. It would > > not be a problem to make it available. Adding a pr_flush() to > > rcu_torture_cleanup() would be an appropriate workaround for now (more > > on this at the end). > > Ah, got it. > > > > There is a call to kmsg_dump(KMSG_DUMP_SHUTDOWN) in kernel_power_off() > > > that appears to be intended to dump out the printk() buffers, > > > > It only dumps the buffers to the registered kmsg_dumpers. It is not > > responsible for flushing console backlogs. > > That would explain its not doing much for me. ;-) > > > > but it > > > does not seem to do so in kernels built with CONFIG_PREEMPT_RT=y. > > > Does there need to be a pr_flush() call prior to the call to > > > migrate_to_reboot_cpu()? Or maybe even to do_kernel_power_off_prepare() > > > or kernel_shutdown_prepare()? > > > > With CONFIG_PREEMPT_RT=y, legacy consoles only print via a dedicated > > kthread. Without a pr_flush() somewhere, there is basically no chance > > that they will get backlogs flushed because noone is waitig for them. > > > > The new console API (NBCON) provides support for "atomic consoles", > > which _do_ flush by transitioning to synchronous printing during > > shutdown/reboot. Unfortunately we still don't have any NBCON atomic > > console implemented in the kernel. The 8250 UART will be our first > > driver, most likely available in 6.15. (With the current PREEMPT_RT > > patch applied, the 8250 NBCON atomic driver is used.) > > > > Since only CONFIG_PREEMPT_RT=y has this issue, I am not sure if we want > > to sprinkle pr_flush() calls on all sleepable shutdown/reboot paths, > > although that is certainly one way to handle it. For your case, adding a > > pr_flush() to rcu_torture_cleanup() and making pr_flush() non-private > > would be an easy solution to avoid your problem. > > Would it make sense to put a call to pr_flush() at the beginning of the > kernel_power_off() function? I suspect that would take care of much > more than just rcutorture. > > Or maybe after the pr_emerg("Power down\n"), since that is normally > the last thing that shows up on rcutorture's console logs. > > I will give that a shot and see what happens. ;-) And that does the trick! I specified 500ms, but maybe -1 would be better? You tell me! ;-) Thanx, Paul ------------------------------------------------------------------------ commit 172b692a79ec5801eab57cb55d2f9c398f244c62 Author: Paul E. McKenney <paulmck@kernel.org> Date: Wed Feb 5 12:27:23 2025 -0800 printk: Flush console log from kernel_power_off() Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output and shutdown time, which hides shutdown-time RCU issues from rcutorture. Therefore, make pr_flush() public and invoke it after then last print in kernel_power_off(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> diff --git a/include/linux/printk.h b/include/linux/printk.h index 4217a9f412b2..b3e88ff1ecc3 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -806,4 +806,6 @@ static inline void print_hex_dump_debug(const char *prefix_str, int prefix_type, #define print_hex_dump_bytes(prefix_str, prefix_type, buf, len) \ print_hex_dump_debug(prefix_str, prefix_type, 16, 1, buf, len, true) + +bool pr_flush(int timeout_ms, bool reset_on_progress); #endif diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 80910bc3470c..e0c6e5a99897 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2461,7 +2461,7 @@ asmlinkage __visible int _printk(const char *fmt, ...) } EXPORT_SYMBOL(_printk); -static bool pr_flush(int timeout_ms, bool reset_on_progress); +bool pr_flush(int timeout_ms, bool reset_on_progress); static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress); #else /* CONFIG_PRINTK */ @@ -4466,7 +4466,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre * Context: Process context. May sleep while acquiring console lock. * Return: true if all usable printers are caught up. */ -static bool pr_flush(int timeout_ms, bool reset_on_progress) +bool pr_flush(int timeout_ms, bool reset_on_progress) { return __pr_flush(NULL, timeout_ms, reset_on_progress); } diff --git a/kernel/reboot.c b/kernel/reboot.c index a701000bab34..3448e6ae3556 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -704,6 +704,7 @@ void kernel_power_off(void) migrate_to_reboot_cpu(); syscore_shutdown(); pr_emerg("Power down\n"); + pr_flush(500, 1); kmsg_dump(KMSG_DUMP_SHUTDOWN); machine_power_off(); }
On 2025-02-05, "Paul E. McKenney" <paulmck@kernel.org> wrote: >> Would it make sense to put a call to pr_flush() at the beginning of the >> kernel_power_off() function? I suspect that would take care of much >> more than just rcutorture. >> >> Or maybe after the pr_emerg("Power down\n"), since that is normally >> the last thing that shows up on rcutorture's console logs. Yes, after the pr_emerg("Power down\n"), as in your patch below. > And that does the trick! I specified 500ms, but maybe -1 would be > better? You tell me! ;-) I probably would have chosen 1000ms, but if the kthread has made no forward progress after a few max timeslices, it probably is dead or will not be scheduled anyway. If there is no backlog, pr_flush() returns immediately. I recommend _not_ using timeout_ms=-1 here, in case the kthread cannot be scheduled (for whatever reason). > ------------------------------------------------------------------------ > > commit 172b692a79ec5801eab57cb55d2f9c398f244c62 > Author: Paul E. McKenney <paulmck@kernel.org> > Date: Wed Feb 5 12:27:23 2025 -0800 > > printk: Flush console log from kernel_power_off() > > Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output > and shutdown time, which hides shutdown-time RCU issues from rcutorture. > Therefore, make pr_flush() public and invoke it after then last print > in kernel_power_off(). > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > Cc: Petr Mladek <pmladek@suse.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: John Ogness <john.ogness@linutronix.de> > Cc: Sergey Senozhatsky <senozhatsky@chromium.org> > > diff --git a/include/linux/printk.h b/include/linux/printk.h > index 4217a9f412b2..b3e88ff1ecc3 100644 > --- a/include/linux/printk.h > +++ b/include/linux/printk.h > @@ -806,4 +806,6 @@ static inline void print_hex_dump_debug(const char *prefix_str, int prefix_type, > #define print_hex_dump_bytes(prefix_str, prefix_type, buf, len) \ > print_hex_dump_debug(prefix_str, prefix_type, 16, 1, buf, len, true) > > + > +bool pr_flush(int timeout_ms, bool reset_on_progress); This will fail to link for !CONFIG_PRINTK. In kernel/printk/printk.c currently there is: #ifdef CONFIG_PRINTK static bool pr_flush(int timeout_ms, bool reset_on_progress); #else static bool pr_flush(int timeout_ms, bool reset_on_progress) { return true; } #endif This would need to move to include/linux/printk.h There you will find an existing CONFIG_PRINTK if/else block to move those to. > #endif > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 80910bc3470c..e0c6e5a99897 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -2461,7 +2461,7 @@ asmlinkage __visible int _printk(const char *fmt, ...) > } > EXPORT_SYMBOL(_printk); > > -static bool pr_flush(int timeout_ms, bool reset_on_progress); > +bool pr_flush(int timeout_ms, bool reset_on_progress); > static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress); > > #else /* CONFIG_PRINTK */ > @@ -4466,7 +4466,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre > * Context: Process context. May sleep while acquiring console lock. > * Return: true if all usable printers are caught up. > */ > -static bool pr_flush(int timeout_ms, bool reset_on_progress) > +bool pr_flush(int timeout_ms, bool reset_on_progress) > { > return __pr_flush(NULL, timeout_ms, reset_on_progress); > } > diff --git a/kernel/reboot.c b/kernel/reboot.c > index a701000bab34..3448e6ae3556 100644 > --- a/kernel/reboot.c > +++ b/kernel/reboot.c > @@ -704,6 +704,7 @@ void kernel_power_off(void) > migrate_to_reboot_cpu(); > syscore_shutdown(); > pr_emerg("Power down\n"); > + pr_flush(500, 1); I would have chosen: pr_flush(1000, true); but I really don't have an argument either way. > kmsg_dump(KMSG_DUMP_SHUTDOWN); > machine_power_off(); > } John Ogness
On Wed, Feb 05, 2025 at 10:28:06PM +0106, John Ogness wrote: > On 2025-02-05, "Paul E. McKenney" <paulmck@kernel.org> wrote: > >> Would it make sense to put a call to pr_flush() at the beginning of the > >> kernel_power_off() function? I suspect that would take care of much > >> more than just rcutorture. > >> > >> Or maybe after the pr_emerg("Power down\n"), since that is normally > >> the last thing that shows up on rcutorture's console logs. > > Yes, after the pr_emerg("Power down\n"), as in your patch below. > > > And that does the trick! I specified 500ms, but maybe -1 would be > > better? You tell me! ;-) > > I probably would have chosen 1000ms, but if the kthread has made no > forward progress after a few max timeslices, it probably is dead or will > not be scheduled anyway. I picked 500ms randomly, so I am happy to make it be 1000ms. > If there is no backlog, pr_flush() returns immediately. OK, so I don't need to add "if (IS_ENABLED(CONFIG_PREEMPT_RT))" to guard it, then? > I recommend _not_ using timeout_ms=-1 here, in case the kthread cannot > be scheduled (for whatever reason). Sounds eminently reasonable to me! > > ------------------------------------------------------------------------ > > > > commit 172b692a79ec5801eab57cb55d2f9c398f244c62 > > Author: Paul E. McKenney <paulmck@kernel.org> > > Date: Wed Feb 5 12:27:23 2025 -0800 > > > > printk: Flush console log from kernel_power_off() > > > > Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output > > and shutdown time, which hides shutdown-time RCU issues from rcutorture. > > Therefore, make pr_flush() public and invoke it after then last print > > in kernel_power_off(). > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > Cc: Petr Mladek <pmladek@suse.com> > > Cc: Steven Rostedt <rostedt@goodmis.org> > > Cc: John Ogness <john.ogness@linutronix.de> > > Cc: Sergey Senozhatsky <senozhatsky@chromium.org> > > > > diff --git a/include/linux/printk.h b/include/linux/printk.h > > index 4217a9f412b2..b3e88ff1ecc3 100644 > > --- a/include/linux/printk.h > > +++ b/include/linux/printk.h > > @@ -806,4 +806,6 @@ static inline void print_hex_dump_debug(const char *prefix_str, int prefix_type, > > #define print_hex_dump_bytes(prefix_str, prefix_type, buf, len) \ > > print_hex_dump_debug(prefix_str, prefix_type, 16, 1, buf, len, true) > > > > + > > +bool pr_flush(int timeout_ms, bool reset_on_progress); > > This will fail to link for !CONFIG_PRINTK. > > In kernel/printk/printk.c currently there is: > > #ifdef CONFIG_PRINTK > static bool pr_flush(int timeout_ms, bool reset_on_progress); > #else > static bool pr_flush(int timeout_ms, bool reset_on_progress) { return true; } > #endif > > This would need to move to include/linux/printk.h > > There you will find an existing CONFIG_PRINTK if/else block to move > those to. Thank you, will do! > > #endif > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > index 80910bc3470c..e0c6e5a99897 100644 > > --- a/kernel/printk/printk.c > > +++ b/kernel/printk/printk.c > > @@ -2461,7 +2461,7 @@ asmlinkage __visible int _printk(const char *fmt, ...) > > } > > EXPORT_SYMBOL(_printk); > > > > -static bool pr_flush(int timeout_ms, bool reset_on_progress); > > +bool pr_flush(int timeout_ms, bool reset_on_progress); > > static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress); > > > > #else /* CONFIG_PRINTK */ > > @@ -4466,7 +4466,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre > > * Context: Process context. May sleep while acquiring console lock. > > * Return: true if all usable printers are caught up. > > */ > > -static bool pr_flush(int timeout_ms, bool reset_on_progress) > > +bool pr_flush(int timeout_ms, bool reset_on_progress) > > { > > return __pr_flush(NULL, timeout_ms, reset_on_progress); > > } > > diff --git a/kernel/reboot.c b/kernel/reboot.c > > index a701000bab34..3448e6ae3556 100644 > > --- a/kernel/reboot.c > > +++ b/kernel/reboot.c > > @@ -704,6 +704,7 @@ void kernel_power_off(void) > > migrate_to_reboot_cpu(); > > syscore_shutdown(); > > pr_emerg("Power down\n"); > > + pr_flush(500, 1); > > I would have chosen: > > pr_flush(1000, true); > > but I really don't have an argument either way. Hey, of the two of us, you are the printk() expert, so I will make it be 1000. ;-) Thanx, Paul > > kmsg_dump(KMSG_DUMP_SHUTDOWN); > > machine_power_off(); > > } > > John Ogness
On 2025-02-05, "Paul E. McKenney" <paulmck@kernel.org> wrote: >> If there is no backlog, pr_flush() returns immediately. > > OK, so I don't need to add "if (IS_ENABLED(CONFIG_PREEMPT_RT))" to guard > it, then? For !CONFIG_PREEMPT_RT, if there are legacy consoles registered, pr_flush() will additionally perform a blocking lock on the console_lock. The kernel is in a clean shutdown path, so it should not be a risk to block on the console_lock. But the console_lock is one of the ugly global locks (semaphore, actually) we are working to remove because sometimes it still causes problems. @pmladek: What do you think? Should we have a CONFIG_PREEMPT_RT guard, which would always be safe? Or do you feel comfortable performing a console_lock() in the shutdown path for !CONFIG_PREEMPT_RT? John
On 2025-02-05, John Ogness <john.ogness@linutronix.de> wrote: >> OK, so I don't need to add "if (IS_ENABLED(CONFIG_PREEMPT_RT))" to guard >> it, then? > > For !CONFIG_PREEMPT_RT, if there are legacy consoles registered, > pr_flush() will additionally perform a blocking lock on the > console_lock. Sorry, I was just thinking about the flushing component of pr_flush(). Later in the function it takes the console_lock even for CONFIG_PREEMPT_RT. So please do _not_ have a guard. pr_flush() should never hang on the console_lock during shutdown, but if does, that is something that will need to be debugged and fixed. @pmladek: Looking forward to reading your input on this. John
On Wed, Feb 05, 2025 at 11:37:25PM +0106, John Ogness wrote: > On 2025-02-05, John Ogness <john.ogness@linutronix.de> wrote: > >> OK, so I don't need to add "if (IS_ENABLED(CONFIG_PREEMPT_RT))" to guard > >> it, then? > > > > For !CONFIG_PREEMPT_RT, if there are legacy consoles registered, > > pr_flush() will additionally perform a blocking lock on the > > console_lock. > > Sorry, I was just thinking about the flushing component of > pr_flush(). Later in the function it takes the console_lock even for > CONFIG_PREEMPT_RT. So please do _not_ have a guard. > > pr_flush() should never hang on the console_lock during shutdown, but if > does, that is something that will need to be debugged and fixed. > > @pmladek: Looking forward to reading your input on this. Here is an updated commit, hopefully applying your feedback correctly. Thanx, Paul ------------------------------------------------------------------------ commit 35679c18b062368855e183ee6712ca5c16145d8c Author: Paul E. McKenney <paulmck@kernel.org> Date: Wed Feb 5 12:27:23 2025 -0800 printk: Flush console log from kernel_power_off() Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output and shutdown time, which hides shutdown-time RCU issues from rcutorture. Therefore, make pr_flush() public and invoke it after then last print in kernel_power_off(). [ paulmck: Apply John Ogness feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> diff --git a/include/linux/printk.h b/include/linux/printk.h index 4217a9f412b2..d658d7e81bef 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -207,6 +207,7 @@ void printk_legacy_allow_panic_sync(void); extern bool nbcon_device_try_acquire(struct console *con); extern void nbcon_device_release(struct console *con); void nbcon_atomic_flush_unsafe(void); +bool pr_flush(int timeout_ms, bool reset_on_progress); #else static inline __printf(1, 0) int vprintk(const char *s, va_list args) @@ -315,6 +316,11 @@ static inline void nbcon_atomic_flush_unsafe(void) { } +static bool pr_flush(int timeout_ms, bool reset_on_progress) +{ + return true; +} + #endif bool this_cpu_in_panic(void); diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 80910bc3470c..b9699eaacf19 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2461,7 +2461,6 @@ asmlinkage __visible int _printk(const char *fmt, ...) } EXPORT_SYMBOL(_printk); -static bool pr_flush(int timeout_ms, bool reset_on_progress); static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress); #else /* CONFIG_PRINTK */ @@ -2474,7 +2473,6 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre static u64 syslog_seq; -static bool pr_flush(int timeout_ms, bool reset_on_progress) { return true; } static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress) { return true; } #endif /* CONFIG_PRINTK */ @@ -4466,7 +4464,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre * Context: Process context. May sleep while acquiring console lock. * Return: true if all usable printers are caught up. */ -static bool pr_flush(int timeout_ms, bool reset_on_progress) +bool pr_flush(int timeout_ms, bool reset_on_progress) { return __pr_flush(NULL, timeout_ms, reset_on_progress); } diff --git a/kernel/reboot.c b/kernel/reboot.c index a701000bab34..a84828d0757b 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -704,6 +704,7 @@ void kernel_power_off(void) migrate_to_reboot_cpu(); syscore_shutdown(); pr_emerg("Power down\n"); + pr_flush(1000, 1); kmsg_dump(KMSG_DUMP_SHUTDOWN); machine_power_off(); }
On 2025-02-05, "Paul E. McKenney" <paulmck@kernel.org> wrote: > commit 35679c18b062368855e183ee6712ca5c16145d8c > Author: Paul E. McKenney <paulmck@kernel.org> > Date: Wed Feb 5 12:27:23 2025 -0800 > > printk: Flush console log from kernel_power_off() > > Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output > and shutdown time, which hides shutdown-time RCU issues from rcutorture. > Therefore, make pr_flush() public and invoke it after then last print > in kernel_power_off(). > > [ paulmck: Apply John Ogness feedback. ] > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: John Ogness <john.ogness@linutronix.de>
On 2025-02-05 15:50:14 [-0800], Paul E. McKenney wrote: > diff --git a/include/linux/printk.h b/include/linux/printk.h > index 4217a9f412b2..d658d7e81bef 100644 > --- a/include/linux/printk.h > +++ b/include/linux/printk.h > @@ -207,6 +207,7 @@ void printk_legacy_allow_panic_sync(void); > extern bool nbcon_device_try_acquire(struct console *con); > extern void nbcon_device_release(struct console *con); > void nbcon_atomic_flush_unsafe(void); > +bool pr_flush(int timeout_ms, bool reset_on_progress); … > diff --git a/kernel/reboot.c b/kernel/reboot.c > index a701000bab34..a84828d0757b 100644 > --- a/kernel/reboot.c > +++ b/kernel/reboot.c > @@ -704,6 +704,7 @@ void kernel_power_off(void) > migrate_to_reboot_cpu(); > syscore_shutdown(); > pr_emerg("Power down\n"); > + pr_flush(1000, 1); this could be true instead 1 given that the type is bool > kmsg_dump(KMSG_DUMP_SHUTDOWN); > machine_power_off(); > } Sebastian
On Thu, Feb 06, 2025 at 09:34:01AM +0106, John Ogness wrote: > On 2025-02-05, "Paul E. McKenney" <paulmck@kernel.org> wrote: > > commit 35679c18b062368855e183ee6712ca5c16145d8c > > Author: Paul E. McKenney <paulmck@kernel.org> > > Date: Wed Feb 5 12:27:23 2025 -0800 > > > > printk: Flush console log from kernel_power_off() > > > > Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output > > and shutdown time, which hides shutdown-time RCU issues from rcutorture. > > Therefore, make pr_flush() public and invoke it after then last print > > in kernel_power_off(). > > > > [ paulmck: Apply John Ogness feedback. ] > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > Reviewed-by: John Ogness <john.ogness@linutronix.de> Thank you! I will apply this on my next rebase. Thanx, Paul
On Thu, Feb 06, 2025 at 10:02:49AM +0100, Sebastian Andrzej Siewior wrote: > On 2025-02-05 15:50:14 [-0800], Paul E. McKenney wrote: > > diff --git a/include/linux/printk.h b/include/linux/printk.h > > index 4217a9f412b2..d658d7e81bef 100644 > > --- a/include/linux/printk.h > > +++ b/include/linux/printk.h > > @@ -207,6 +207,7 @@ void printk_legacy_allow_panic_sync(void); > > extern bool nbcon_device_try_acquire(struct console *con); > > extern void nbcon_device_release(struct console *con); > > void nbcon_atomic_flush_unsafe(void); > > +bool pr_flush(int timeout_ms, bool reset_on_progress); > … > > diff --git a/kernel/reboot.c b/kernel/reboot.c > > index a701000bab34..a84828d0757b 100644 > > --- a/kernel/reboot.c > > +++ b/kernel/reboot.c > > @@ -704,6 +704,7 @@ void kernel_power_off(void) > > migrate_to_reboot_cpu(); > > syscore_shutdown(); > > pr_emerg("Power down\n"); > > + pr_flush(1000, 1); > > this could be true instead 1 given that the type is bool Good point, and this matches other usage. Will fix on my next rebase. Thanx, Paul > > kmsg_dump(KMSG_DUMP_SHUTDOWN); > > machine_power_off(); > > } > > Sebastian
On Thu, Feb 06, 2025 at 01:59:00AM -0800, Paul E. McKenney wrote: > On Thu, Feb 06, 2025 at 10:02:49AM +0100, Sebastian Andrzej Siewior wrote: > > On 2025-02-05 15:50:14 [-0800], Paul E. McKenney wrote: > > > diff --git a/include/linux/printk.h b/include/linux/printk.h > > > index 4217a9f412b2..d658d7e81bef 100644 > > > --- a/include/linux/printk.h > > > +++ b/include/linux/printk.h > > > @@ -207,6 +207,7 @@ void printk_legacy_allow_panic_sync(void); > > > extern bool nbcon_device_try_acquire(struct console *con); > > > extern void nbcon_device_release(struct console *con); > > > void nbcon_atomic_flush_unsafe(void); > > > +bool pr_flush(int timeout_ms, bool reset_on_progress); > > … > > > diff --git a/kernel/reboot.c b/kernel/reboot.c > > > index a701000bab34..a84828d0757b 100644 > > > --- a/kernel/reboot.c > > > +++ b/kernel/reboot.c > > > @@ -704,6 +704,7 @@ void kernel_power_off(void) > > > migrate_to_reboot_cpu(); > > > syscore_shutdown(); > > > pr_emerg("Power down\n"); > > > + pr_flush(1000, 1); > > > > this could be true instead 1 given that the type is bool > > Good point, and this matches other usage. Will fix on my next rebase. And here is the updated commit. Thoughts? Thanx, Paul ------------------------------------------------------------------------ commit 04ccc09a6eb9a50f79c0a0e46fd273b5131b24f1 Author: Paul E. McKenney <paulmck@kernel.org> Date: Wed Feb 5 12:27:23 2025 -0800 printk: Flush console log from kernel_power_off() Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output and shutdown time, which hides shutdown-time RCU issues from rcutorture. Therefore, make pr_flush() public and invoke it after then last print in kernel_power_off(). [ paulmck: Apply John Ogness feedback. ] [ paulmck: Appy Sebastian Andrzej Siewior feedback. ] [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> diff --git a/include/linux/printk.h b/include/linux/printk.h index 4217a9f412b2..5b462029d03c 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -207,6 +207,7 @@ void printk_legacy_allow_panic_sync(void); extern bool nbcon_device_try_acquire(struct console *con); extern void nbcon_device_release(struct console *con); void nbcon_atomic_flush_unsafe(void); +bool pr_flush(int timeout_ms, bool reset_on_progress); #else static inline __printf(1, 0) int vprintk(const char *s, va_list args) @@ -315,6 +316,11 @@ static inline void nbcon_atomic_flush_unsafe(void) { } +static inline bool pr_flush(int timeout_ms, bool reset_on_progress) +{ + return true; +} + #endif bool this_cpu_in_panic(void); diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 80910bc3470c..b9699eaacf19 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2461,7 +2461,6 @@ asmlinkage __visible int _printk(const char *fmt, ...) } EXPORT_SYMBOL(_printk); -static bool pr_flush(int timeout_ms, bool reset_on_progress); static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress); #else /* CONFIG_PRINTK */ @@ -2474,7 +2473,6 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre static u64 syslog_seq; -static bool pr_flush(int timeout_ms, bool reset_on_progress) { return true; } static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress) { return true; } #endif /* CONFIG_PRINTK */ @@ -4466,7 +4464,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre * Context: Process context. May sleep while acquiring console lock. * Return: true if all usable printers are caught up. */ -static bool pr_flush(int timeout_ms, bool reset_on_progress) +bool pr_flush(int timeout_ms, bool reset_on_progress) { return __pr_flush(NULL, timeout_ms, reset_on_progress); } diff --git a/kernel/reboot.c b/kernel/reboot.c index a701000bab34..3ba15b2c4066 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -704,6 +704,7 @@ void kernel_power_off(void) migrate_to_reboot_cpu(); syscore_shutdown(); pr_emerg("Power down\n"); + pr_flush(1000, true); kmsg_dump(KMSG_DUMP_SHUTDOWN); machine_power_off(); }
On Wed 2025-02-05 15:50:14, Paul E. McKenney wrote: > On Wed, Feb 05, 2025 at 11:37:25PM +0106, John Ogness wrote: > > On 2025-02-05, John Ogness <john.ogness@linutronix.de> wrote: > > >> OK, so I don't need to add "if (IS_ENABLED(CONFIG_PREEMPT_RT))" to guard > > >> it, then? > > > > > > For !CONFIG_PREEMPT_RT, if there are legacy consoles registered, > > > pr_flush() will additionally perform a blocking lock on the > > > console_lock. > > > > Sorry, I was just thinking about the flushing component of > > pr_flush(). Later in the function it takes the console_lock even for > > CONFIG_PREEMPT_RT. So please do _not_ have a guard. It should be possible to avoid this console_lock() when only NBCON consoles are registered. We take it in pr_flush() for reading con->seq and potential race with register_console(). But it can be handled another way, for example, by restarting the cycle when non-NBCON console suddenly appears. But it might be done later. We do not have NBCON consoles at the moment so the console_lock() is needed anyway. > > pr_flush() should never hang on the console_lock during shutdown, but if > > does, that is something that will need to be debugged and fixed. > > > > @pmladek: Looking forward to reading your input on this. My only concern is that kernel_power_off() is called in many code paths so it is not easy to review all scenarios. But I haven't found any code path where it would be an obvious problem. Especially, it seems that it is _not_ called in panic situations. Also kernel_power_off() calls syscore_shutdown() which takes syscore_ops_lock mutex. So that is must be called only in a schedulable context even before this patch. Summary: I do not see any obvious problem in the end. Except, for some small details, see below. > Here is an updated commit, hopefully applying your feedback correctly. > > Thanx, Paul > > ------------------------------------------------------------------------ > > commit 35679c18b062368855e183ee6712ca5c16145d8c > Author: Paul E. McKenney <paulmck@kernel.org> > Date: Wed Feb 5 12:27:23 2025 -0800 > > printk: Flush console log from kernel_power_off() > > Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output > and shutdown time, which hides shutdown-time RCU issues from rcutorture. > Therefore, make pr_flush() public and invoke it after then last print > in kernel_power_off(). > > --- a/include/linux/printk.h > +++ b/include/linux/printk.h > @@ -207,6 +207,7 @@ void printk_legacy_allow_panic_sync(void); > extern bool nbcon_device_try_acquire(struct console *con); > extern void nbcon_device_release(struct console *con); > void nbcon_atomic_flush_unsafe(void); > +bool pr_flush(int timeout_ms, bool reset_on_progress); > #else > static inline __printf(1, 0) > int vprintk(const char *s, va_list args) > @@ -315,6 +316,11 @@ static inline void nbcon_atomic_flush_unsafe(void) > { > } > > +static bool pr_flush(int timeout_ms, bool reset_on_progress) There is missing "inline": static inline bool pr_flush(int timeout_ms, bool reset_on_progress) Otherwise, I get In file included from ./include/asm-generic/bug.h:22, from ./arch/x86/include/asm/bug.h:99, from ./include/linux/bug.h:5, from ./include/linux/page-flags.h:10, from kernel/bounds.c:10: ./include/linux/printk.h:319:13: warning: ‘pr_flush’ defined but not used [-Wunused-function] > +{ > + return true; > +} > + > #endif > > bool this_cpu_in_panic(void); > --- a/kernel/reboot.c > +++ b/kernel/reboot.c > @@ -704,6 +704,7 @@ void kernel_power_off(void) > migrate_to_reboot_cpu(); > syscore_shutdown(); > pr_emerg("Power down\n"); > + pr_flush(1000, 1); This should be pr_flush(1000, true) as suggested by Sebastian. > kmsg_dump(KMSG_DUMP_SHUTDOWN); > machine_power_off(); > } With the two changes: Reviewed-by: Petr Mladek <pmladek@suse.com> Paul, I assume that you will pass this via the RCU tree together with the other changes. Or would you prefer to route this particular change via printk tree? Both ways are fine to me. Best Regards, Petr
On Thu, Feb 06, 2025 at 12:06:08PM +0100, Petr Mladek wrote: > On Wed 2025-02-05 15:50:14, Paul E. McKenney wrote: > > On Wed, Feb 05, 2025 at 11:37:25PM +0106, John Ogness wrote: > > > On 2025-02-05, John Ogness <john.ogness@linutronix.de> wrote: > > > >> OK, so I don't need to add "if (IS_ENABLED(CONFIG_PREEMPT_RT))" to guard > > > >> it, then? > > > > > > > > For !CONFIG_PREEMPT_RT, if there are legacy consoles registered, > > > > pr_flush() will additionally perform a blocking lock on the > > > > console_lock. > > > > > > Sorry, I was just thinking about the flushing component of > > > pr_flush(). Later in the function it takes the console_lock even for > > > CONFIG_PREEMPT_RT. So please do _not_ have a guard. Thank you all for looking this over! > It should be possible to avoid this console_lock() when only NBCON > consoles are registered. We take it in pr_flush() for reading con->seq > and potential race with register_console(). But it can be handled > another way, for example, by restarting the cycle when non-NBCON > console suddenly appears. > > But it might be done later. We do not have NBCON consoles at the > moment so the console_lock() is needed anyway. Fair enough. > > > pr_flush() should never hang on the console_lock during shutdown, but if > > > does, that is something that will need to be debugged and fixed. > > > > > > @pmladek: Looking forward to reading your input on this. > > My only concern is that kernel_power_off() is called in many > code paths so it is not easy to review all scenarios. > > But I haven't found any code path where it would be an obvious problem. > Especially, it seems that it is _not_ called in panic situations. > > Also kernel_power_off() calls syscore_shutdown() which takes > syscore_ops_lock mutex. So that is must be called only in > a schedulable context even before this patch. > > Summary: I do not see any obvious problem in the end. > Except, for some small details, see below. The presence of that syscore_shutdown() call greatly reduced my level of anxiety as well. ;-) > > Here is an updated commit, hopefully applying your feedback correctly. > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > commit 35679c18b062368855e183ee6712ca5c16145d8c > > Author: Paul E. McKenney <paulmck@kernel.org> > > Date: Wed Feb 5 12:27:23 2025 -0800 > > > > printk: Flush console log from kernel_power_off() > > > > Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output > > and shutdown time, which hides shutdown-time RCU issues from rcutorture. > > Therefore, make pr_flush() public and invoke it after then last print > > in kernel_power_off(). > > > > --- a/include/linux/printk.h > > +++ b/include/linux/printk.h > > @@ -207,6 +207,7 @@ void printk_legacy_allow_panic_sync(void); > > extern bool nbcon_device_try_acquire(struct console *con); > > extern void nbcon_device_release(struct console *con); > > void nbcon_atomic_flush_unsafe(void); > > +bool pr_flush(int timeout_ms, bool reset_on_progress); > > #else > > static inline __printf(1, 0) > > int vprintk(const char *s, va_list args) > > @@ -315,6 +316,11 @@ static inline void nbcon_atomic_flush_unsafe(void) > > { > > } > > > > +static bool pr_flush(int timeout_ms, bool reset_on_progress) > > There is missing "inline": > > static inline bool pr_flush(int timeout_ms, bool reset_on_progress) > > > Otherwise, I get > > In file included from ./include/asm-generic/bug.h:22, > from ./arch/x86/include/asm/bug.h:99, > from ./include/linux/bug.h:5, > from ./include/linux/page-flags.h:10, > from kernel/bounds.c:10: > ./include/linux/printk.h:319:13: warning: ‘pr_flush’ defined but not used [-Wunused-function] Fixed, thank you both! > > +{ > > + return true; > > +} > > + > > #endif > > > > bool this_cpu_in_panic(void); > > --- a/kernel/reboot.c > > +++ b/kernel/reboot.c > > @@ -704,6 +704,7 @@ void kernel_power_off(void) > > migrate_to_reboot_cpu(); > > syscore_shutdown(); > > pr_emerg("Power down\n"); > > + pr_flush(1000, 1); > > This should be pr_flush(1000, true) as suggested by Sebastian. Also fixed, also thank you both! > > kmsg_dump(KMSG_DUMP_SHUTDOWN); > > machine_power_off(); > > } > > With the two changes: > > Reviewed-by: Petr Mladek <pmladek@suse.com> > > Paul, I assume that you will pass this via the RCU tree together with > the other changes. Or would you prefer to route this particular > change via printk tree? Both ways are fine to me. One advantage of carrying it via RCU is that I can arrange things so that rcutorture users don't see this problem. So unless a problem arises, thank you, and I do plan to send this up via RCU. Thanx, Paul
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 3bb554723074..c6c4aee47df2 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -121,12 +121,6 @@ void rcu_init(void); extern int rcu_scheduler_active; void rcu_sched_clock_irq(int user); -#ifdef CONFIG_TASKS_RCU_GENERIC -void rcu_init_tasks_generic(void); -#else -static inline void rcu_init_tasks_generic(void) { } -#endif - #ifdef CONFIG_RCU_STALL_COMMON void rcu_sysrq_start(void); void rcu_sysrq_end(void); diff --git a/init/main.c b/init/main.c index 00fac1170294..fb880b6fb524 100644 --- a/init/main.c +++ b/init/main.c @@ -1562,7 +1562,6 @@ static noinline void __init kernel_init_freeable(void) init_mm_internals(); - rcu_init_tasks_generic(); do_pre_smp_initcalls(); lockup_detector_init(); diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index 59314da5eb60..466668eb4fad 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -2256,7 +2256,7 @@ void __init tasks_cblist_init_generic(void) #endif } -void __init rcu_init_tasks_generic(void) +static int __init rcu_init_tasks_generic(void) { #ifdef CONFIG_TASKS_RCU rcu_spawn_tasks_kthread(); @@ -2272,7 +2272,10 @@ void __init rcu_init_tasks_generic(void) // Run the self-tests. rcu_tasks_initiate_self_tests(); + + return 0; } +core_initcall(rcu_init_tasks_generic); #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */ static inline void rcu_tasks_bootup_oddness(void) {}
The timer and hrtimer softirq processing has moved to dedicated threads for kernels built with CONFIG_IRQ_FORCED_THREADING=y. This results in timers not expiring until later in early boot, which in turn causes the RCU Tasks self-tests to hang in kernels built with CONFIG_PROVE_RCU=y, which further causes the entire kernel to hang. One fix would be to make timers work during this time, but there are no known users of RCU Tasks grace periods during that time, so no justification for the added complexity. Not yet, anyway. This commit therefore moves the call to rcu_init_tasks_generic() from kernel_init_freeable() to a core_initcall(). This works because the timer and hrtimer kthreads are created at early_initcall() time. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: <linux-trace-kernel@vger.kernel.org> --- include/linux/rcupdate.h | 6 ------ init/main.c | 1 - kernel/rcu/tasks.h | 5 ++++- 3 files changed, 4 insertions(+), 8 deletions(-)