[RFC,0/1] sched: defer completion task to online CPU

Message ID 20241213203739.1519801-1-usamaarif642@gmail.com (mailing list archive)

Usama Arif Dec. 13, 2024, 8:33 p.m. UTC
We (Meta) are running the 6.12 release kernel in production and are
encountering the warning below, mostly at boot time. It was reported by
Vlad Poenaru.

           ------------[ cut here ]------------
           WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
           Modules linked in:
           CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
           Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
           RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
           Code: 41 5c 41 5d 41 5e 41 5f 5d e9 63 94 ea 00 0f 0b 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d e9 39 fc 15 01 0f 0b e9 c1 fd ff ff <0f> 0b 48 8b 45 00 e9 59 ff ff ff f3 0f 1e fa 65 8b 05 1d ec e8 7e
           RSP: 0018:ffffc900019cbcc8 EFLAGS: 00010046
           RAX: ffff88bf449a4c40 RBX: 0000000000000082 RCX: 0000000000000001
           RDX: 0000000000000001 RSI: ffff88bf43224c80 RDI: ffff88bf449a4c40
           RBP: ffff88bf449a4c80 R08: ffff888280970090 R09: 0000000000000000
           R10: ffff88bf432252e0 R11: ffffffff811abf70 R12: ffff88bf449a4c40
           R13: ffff88bf43234b28 R14: ffff88bf43224c80 R15: 0000000000000000
           FS:  0000000000000000(0000) GS:ffff88bf44980000(0000) knlGS:0000000000000000
           CR2: 0000000000000000 CR3: 000000404b230001 CR4: 0000000000770ef0
           PKRU: 55555554
           Call Trace:
            <TASK>
            ? __warn+0xcf/0x1b0
            ? hrtimer_start_range_ns+0x289/0x2d0
            ? report_bug+0x120/0x1a0
            ? handle_bug+0x60/0x90
            ? exc_invalid_op+0x1a/0x50
            ? asm_exc_invalid_op+0x1a/0x20
            ? register_refined_jiffies+0xb0/0xb0
            ? hrtimer_start_range_ns+0x289/0x2d0
            ? hrtimer_start_range_ns+0x186/0x2d0
            start_dl_timer+0xfc/0x150
            enqueue_dl_entity+0x367/0x640
            dl_server_start+0x53/0xa0
            enqueue_task_fair+0x363/0x460
            enqueue_task+0x3c/0x200
            ttwu_do_activate+0x94/0x240
            try_to_wake_up+0x315/0x600
            complete+0x4b/0x80
            ? stop_two_cpus+0x2f0/0x2f0
            cpu_stopper_thread+0xb1/0x120
            ? smpboot_unregister_percpu_thread+0xc0/0xc0
            smpboot_thread_fn+0xf7/0x150
            kthread+0x121/0x130
            ? kthread_blkcg+0x40/0x40
            ret_from_fork+0x39/0x50
            ? kthread_blkcg+0x40/0x40
            ret_from_fork_asm+0x11/0x20
            </TASK>
           ---[ end trace 0000000000000000 ]---

It looks like a completion whose wakeup needs to arm an hrtimer (see
dl_server_start() -> start_dl_timer() in the trace) is being processed on a
CPU that is not yet completely online. Other hrtimer issues have been fixed
recently [1]. This bug might have been introduced in [2].

We don't have a reliable reproducer for this (we just see it popping up in
production). A possible fix is to defer the completion to a CPU that is
already online, which is what this RFC does. It would be good to get
feedback on how this could be reproduced, whether the RFC makes sense, or
whether there is another way to solve this.
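
To make the deferral idea concrete, here is a minimal user-space model in C.
It is not the code from the patch, and it does not use any kernel API: every
name in it (MODEL_NR_CPUS, model_cpu_online, model_complete and so on) is
hypothetical, and the counters only stand in for the real wakeup. The only
thing it is meant to show is the decision: do the wakeup locally if the
current CPU is online, otherwise hand it to a CPU that is.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, small fixed CPU count for this model. */
#define MODEL_NR_CPUS 8

/* Hypothetical stand-in for an online-CPU mask (all offline initially). */
static bool model_cpu_online[MODEL_NR_CPUS];

/* Per-CPU counters: wakeups done locally vs. wakeups handed over. */
static int model_direct[MODEL_NR_CPUS];
static int model_deferred[MODEL_NR_CPUS];

/* Return the lowest-numbered online CPU, or -1 if none is online. */
static int model_first_online_cpu(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (model_cpu_online[cpu])
			return cpu;
	}
	return -1;
}

/*
 * Model of "complete, but never on an offline CPU".
 * Returns the CPU that ends up doing the wakeup, or -1 if no CPU is
 * online and the wakeup could not be placed anywhere.
 */
static int model_complete(int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < MODEL_NR_CPUS);

	if (model_cpu_online[this_cpu]) {
		/* Safe to do the wakeup (and arm any timer) right here. */
		model_direct[this_cpu]++;
		return this_cpu;
	}

	/* This CPU is not fully online: hand the wakeup to one that is. */
	int target = model_first_online_cpu();
	if (target >= 0)
		model_deferred[target]++;
	return target;
}
```

With only CPU 0 marked online, model_complete(0) does the wakeup locally and
returns 0, while model_complete(5) also returns 0 because the wakeup is
handed over to CPU 0. How the handover would actually be done in the kernel
is exactly the kind of feedback being asked for.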

Thanks!

[1] https://lore.kernel.org/all/20240913214205.12359-2-frederic@kernel.org/
[2] https://lore.kernel.org/all/169972295552.3135.1094880886431606890.tip-bot2@tip-bot2/

Usama Arif (1):
  sched: defer completion task to online CPU

 kernel/sched/completion.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)