Message ID | 20230515134133.108780-1-mcanal@igalia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/vkms: Fix race-condition between the hrtimer and the atomic commit | expand |
Hello, kernel test robot noticed "WARNING:inconsistent_lock_state" on: commit: 82ef6a95c29b55f39cf65ac6a29926fca6025161 ("[PATCH] drm/vkms: Fix race-condition between the hrtimer and the atomic commit") url: https://github.com/intel-lab-lkp/linux/commits/Ma-ra-Canal/drm-vkms-Fix-race-condition-between-the-hrtimer-and-the-atomic-commit/20230515-214326 base: git://anongit.freedesktop.org/drm/drm-misc drm-misc-next patch link: https://lore.kernel.org/all/20230515134133.108780-1-mcanal@igalia.com/ patch subject: [PATCH] drm/vkms: Fix race-condition between the hrtimer and the atomic commit in testcase: boot compiler: clang-14 test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G (please refer to attached dmesg/kmsg for entire log/backtrace) +-------------------------------------------------+------------+------------+ | | 812562b8d8 | 82ef6a95c2 | +-------------------------------------------------+------------+------------+ | boot_successes | 10 | 0 | | boot_failures | 0 | 10 | | WARNING:inconsistent_lock_state | 0 | 10 | | inconsistent{HARDIRQ-ON-W}->{IN-HARDIRQ-W}usage | 0 | 10 | | EIP:default_idle | 0 | 10 | +-------------------------------------------------+------------+------------+ If you fix the issue, kindly add following tag | Reported-by: kernel test robot <yujie.liu@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202305221659.6504ded7-yujie.liu@intel.com [ 9.581561][ C0] WARNING: inconsistent lock state [ 9.581563][ C0] 6.4.0-rc1-00152-g82ef6a95c29b #1 Not tainted [ 9.581565][ C0] -------------------------------- [ 9.581567][ C0] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. [ 9.581569][ C0] swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes: [ 9.581574][ C0] c88bc220 (&dev->event_lock){?.+.}-{2:2}, at: drm_handle_vblank (drivers/gpu/drm/drm_vblank.c:?) [ 9.581591][ C0] {HARDIRQ-ON-W} state was registered at: [ 9.581593][ C0] lock_acquire (kernel/locking/lockdep.c:5691) [ 9.581599][ C0] _raw_spin_lock (include/linux/spinlock_api_smp.h:133 kernel/locking/spinlock.c:154) [ 9.581605][ C0] vkms_crtc_atomic_flush (drivers/gpu/drm/vkms/vkms_crtc.c:255) [ 9.581610][ C0] drm_atomic_helper_commit_planes (drivers/gpu/drm/drm_atomic_helper.c:2765) [ 9.581615][ C0] vkms_atomic_commit_tail (drivers/gpu/drm/vkms/vkms_drv.c:75) [ 9.581618][ C0] commit_tail (drivers/gpu/drm/drm_atomic_helper.c:?) [ 9.581621][ C0] drm_atomic_helper_commit (drivers/gpu/drm/drm_atomic_helper.c:?) [ 9.581623][ C0] drm_atomic_commit (drivers/gpu/drm/drm_atomic.c:1504) [ 9.581628][ C0] drm_client_modeset_commit_atomic (drivers/gpu/drm/drm_client_modeset.c:?) [ 9.581633][ C0] drm_client_modeset_commit_locked (drivers/gpu/drm/drm_client_modeset.c:1148) [ 9.581636][ C0] drm_client_modeset_commit (drivers/gpu/drm/drm_client_modeset.c:1174) [ 9.581640][ C0] __drm_fb_helper_restore_fbdev_mode_unlocked (drivers/gpu/drm/drm_fb_helper.c:?) [ 9.581645][ C0] drm_fb_helper_set_par (drivers/gpu/drm/drm_fb_helper.c:?) [ 9.581648][ C0] fbcon_init (drivers/video/fbdev/core/fbcon.c:?) [ 9.581651][ C0] visual_init (drivers/tty/vt/vt.c:1025) [ 9.581654][ C0] do_bind_con_driver (drivers/tty/vt/vt.c:3643) [ 9.581656][ C0] do_take_over_console (drivers/tty/vt/vt.c:4222) [ 9.581658][ C0] do_fbcon_takeover (drivers/video/fbdev/core/fbcon.c:530) [ 9.581661][ C0] do_fb_registered (drivers/video/fbdev/core/fbcon.c:3002) [ 9.581664][ C0] fbcon_fb_registered (drivers/video/fbdev/core/fbcon.c:3022) [ 9.581666][ C0] register_framebuffer (drivers/video/fbdev/core/fbmem.c:1499) [ 9.581673][ C0] __drm_fb_helper_initial_config_and_unlock (drivers/gpu/drm/drm_fb_helper.c:2053) [ 9.581676][ C0] drm_fb_helper_initial_config (drivers/gpu/drm/drm_fb_helper.c:?) [ 9.581679][ C0] drm_fbdev_generic_client_hotplug (drivers/gpu/drm/drm_fbdev_generic.c:?) [ 9.581682][ C0] drm_fbdev_generic_setup (drivers/gpu/drm/drm_fbdev_generic.c:344) [ 9.581685][ C0] vkms_init (drivers/gpu/drm/vkms/vkms_drv.c:?) [ 9.581690][ C0] do_one_initcall (init/main.c:1246) [ 9.581694][ C0] do_initcall_level (init/main.c:1318) [ 9.581699][ C0] do_initcalls (init/main.c:1332) [ 9.581701][ C0] do_basic_setup (init/main.c:1355) [ 9.581704][ C0] kernel_init_freeable (init/main.c:1575) [ 9.581706][ C0] kernel_init (init/main.c:1464) [ 9.581713][ C0] ret_from_fork (arch/x86/entry/entry_32.S:770) [ 9.581716][ C0] irq event stamp: 127808 [ 9.581717][ C0] hardirqs last enabled at (127807): do_idle (kernel/sched/idle.c:?) [ 9.581726][ C0] hardirqs last disabled at (127808): sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1106) [ 9.581731][ C0] softirqs last enabled at (127802): do_softirq_own_stack (arch/x86/kernel/irq_32.c:57 arch/x86/kernel/irq_32.c:147) [ 9.581735][ C0] softirqs last disabled at (127791): do_softirq_own_stack (arch/x86/kernel/irq_32.c:57 arch/x86/kernel/irq_32.c:147) [ 9.581738][ C0] [ 9.581738][ C0] other info that might help us debug this: [ 9.581739][ C0] Possible unsafe locking scenario: [ 9.581739][ C0] [ 9.581740][ C0] CPU0 [ 9.581741][ C0] ---- [ 9.581742][ C0] lock(&dev->event_lock); [ 9.581745][ C0] <Interrupt> [ 9.581746][ C0] lock(&dev->event_lock); [ 9.581748][ C0] [ 9.581748][ C0] *** DEADLOCK *** [ 9.581748][ C0] [ 9.581750][ C0] no locks held by swapper/0/0. [ 9.581751][ C0] [ 9.581751][ C0] stack backtrace: [ 9.581754][ C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc1-00152-g82ef6a95c29b #1 714d45c675a05052bf3771f9fbc7c7e12a497b74 [ 9.581758][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 9.581760][ C0] Call Trace: [ 9.581764][ C0] dump_stack_lvl (lib/dump_stack.c:107) [ 9.581769][ C0] dump_stack (lib/dump_stack.c:113) [ 9.581773][ C0] print_usage_bug (kernel/locking/lockdep.c:?) [ 9.581778][ C0] mark_lock_irq (kernel/locking/lockdep.c:?) [ 9.581781][ C0] ? stack_trace_save (kernel/stacktrace.c:122) [ 9.581787][ C0] ? save_trace (kernel/locking/lockdep.c:582) [ 9.581791][ C0] mark_lock (kernel/locking/lockdep.c:4651) [ 9.581794][ C0] __lock_acquire (kernel/locking/lockdep.c:4537) [ 9.581799][ C0] ? __lock_acquire (kernel/locking/lockdep.c:4572) [ 9.581803][ C0] ? __lock_acquire (kernel/locking/lockdep.c:4572) [ 9.581808][ C0] lock_acquire (kernel/locking/lockdep.c:5691) [ 9.581812][ C0] ? drm_handle_vblank (drivers/gpu/drm/drm_vblank.c:?) [ 9.581816][ C0] ? ktime_get (kernel/time/timekeeping.c:846 kernel/time/timekeeping.c:846) [ 9.581823][ C0] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162) [ 9.581826][ C0] ? drm_handle_vblank (drivers/gpu/drm/drm_vblank.c:?) [ 9.581829][ C0] drm_handle_vblank (drivers/gpu/drm/drm_vblank.c:?) [ 9.581833][ C0] drm_crtc_handle_vblank (drivers/gpu/drm/drm_vblank.c:2050) [ 9.581836][ C0] vkms_vblank_simulate (drivers/gpu/drm/vkms/vkms_crtc.c:29) [ 9.581839][ C0] __run_hrtimer (kernel/time/hrtimer.c:?) [ 9.581843][ C0] ? vkms_get_vblank_timestamp (drivers/gpu/drm/vkms/vkms_crtc.c:13) [ 9.581847][ C0] __hrtimer_run_queues (kernel/time/hrtimer.c:1749) [ 9.581851][ C0] ? _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162) [ 9.581855][ C0] hrtimer_interrupt (kernel/time/hrtimer.c:1811) [ 9.581860][ C0] __sysvec_apic_timer_interrupt (arch/x86/include/asm/jump_label.h:41 include/linux/jump_label.h:207 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1113) [ 9.581867][ C0] ? sysvec_call_function_single (arch/x86/kernel/apic/apic.c:1106) [ 9.581871][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1106) [ 9.581875][ C0] ? do_idle (kernel/sched/idle.c:?) [ 9.581880][ C0] handle_exception (arch/x86/entry/entry_32.S:1076) [ 9.581888][ C0] EIP: default_idle (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:72 arch/x86/kernel/process.c:711) [ 9.581893][ C0] Code: cd 56 68 30 6e 42 c3 e8 10 91 69 ff 83 c4 08 eb cf 00 00 cc cc 00 00 cc cc 00 00 cc 55 89 e5 eb 07 0f 00 2d 0c f8 18 c2 fb f4 <fa> 5d c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 e8 All code ======== 0: cd 56 int $0x56 2: 68 30 6e 42 c3 push $0xffffffffc3426e30 7: e8 10 91 69 ff call 0xffffffffff69911c c: 83 c4 08 add $0x8,%esp f: eb cf jmp 0xffffffffffffffe0 11: 00 00 add %al,(%rax) 13: cc int3 14: cc int3 15: 00 00 add %al,(%rax) 17: cc int3 18: cc int3 19: 00 00 add %al,(%rax) 1b: cc int3 1c: 55 push %rbp 1d: 89 e5 mov %esp,%ebp 1f: eb 07 jmp 0x28 21: 0f 00 2d 0c f8 18 c2 verw -0x3de707f4(%rip) # 0xffffffffc218f834 28: fb sti 29: f4 hlt 2a:* fa cli <-- trapping instruction 2b: 5d pop %rbp 2c: c3 ret 2d: 90 nop 2e: 90 nop 2f: 90 nop 30: 90 nop 31: 90 nop 32: 90 nop 33: 90 nop 34: 90 nop 35: 90 nop 36: 90 nop 37: 90 nop 38: 90 nop 39: 90 nop 3a: 90 nop 3b: 90 nop 3c: 55 push %rbp 3d: 89 e5 mov %esp,%ebp 3f: e8 .byte 0xe8 Code starting with the faulting instruction =========================================== 0: fa cli 1: 5d pop %rbp 2: c3 ret 3: 90 nop 4: 90 nop 5: 90 nop 6: 90 nop 7: 90 nop 8: 90 nop 9: 90 nop a: 90 nop b: 90 nop c: 90 nop d: 90 nop e: 90 nop f: 90 nop 10: 90 nop 11: 90 nop 12: 55 push %rbp 13: 89 e5 mov %esp,%ebp 15: e8 .byte 0xe8 To reproduce: # build kernel cd linux cp config-6.4.0-rc1-00152-g82ef6a95c29b .config make HOSTCC=clang-14 CC=clang-14 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules make HOSTCC=clang-14 CC=clang-14 ARCH=i386 INSTALL_MOD_PATH=<mod-install-dir> modules_install cd <mod-install-dir> find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email # if come across any failure that blocks the test, # please remove ~/.lkp and /lkp dir to run from a clean state.
diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c index 906d3df40cdb..e65104a2203d 100644 --- a/drivers/gpu/drm/vkms/vkms_composer.c +++ b/drivers/gpu/drm/vkms/vkms_composer.c @@ -320,10 +320,15 @@ void vkms_set_composer(struct vkms_output *out, bool enabled) if (enabled) drm_crtc_vblank_get(&out->crtc); - spin_lock_irq(&out->lock); + mutex_lock(&out->lock); old_enabled = out->composer_enabled; out->composer_enabled = enabled; - spin_unlock_irq(&out->lock); + + /* the composition wasn't enabled, so unlock the lock to make sure the lock + * will be balanced even if we have a failed commit + */ + if (!out->composer_enabled) + mutex_unlock(&out->lock); if (old_enabled) drm_crtc_vblank_put(&out->crtc); diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c index 515f6772b866..5a0ee9530748 100644 --- a/drivers/gpu/drm/vkms/vkms_crtc.c +++ b/drivers/gpu/drm/vkms/vkms_crtc.c @@ -16,7 +16,7 @@ static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) struct drm_crtc *crtc = &output->crtc; struct vkms_crtc_state *state; u64 ret_overrun; - bool ret, fence_cookie; + bool ret, fence_cookie, composer_enabled; fence_cookie = dma_fence_begin_signalling(); @@ -25,15 +25,15 @@ static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) if (ret_overrun != 1) pr_warn("%s: vblank timer overrun\n", __func__); - spin_lock(&output->lock); ret = drm_crtc_handle_vblank(crtc); if (!ret) DRM_ERROR("vkms failure on handling vblank"); state = output->composer_state; - spin_unlock(&output->lock); + composer_enabled = output->composer_enabled; + mutex_unlock(&output->lock); - if (state && output->composer_enabled) { + if (state && composer_enabled) { u64 frame = drm_crtc_accurate_vblank_count(crtc); /* update frame_start only if a queued vkms_composer_worker() @@ -241,7 +241,7 @@ static void vkms_crtc_atomic_begin(struct drm_crtc *crtc, /* This lock is held across the atomic commit to block vblank timer * from scheduling vkms_composer_worker until the composer is updated */ - spin_lock_irq(&vkms_output->lock); + mutex_lock(&vkms_output->lock); } static void vkms_crtc_atomic_flush(struct drm_crtc *crtc, @@ -264,7 +264,7 @@ static void vkms_crtc_atomic_flush(struct drm_crtc *crtc, vkms_output->composer_state = to_vkms_crtc_state(crtc->state); - spin_unlock_irq(&vkms_output->lock); + mutex_unlock(&vkms_output->lock); } static const struct drm_crtc_helper_funcs vkms_crtc_helper_funcs = { @@ -290,7 +290,7 @@ int vkms_crtc_init(struct drm_device *dev, struct drm_crtc *crtc, drm_crtc_helper_add(crtc, &vkms_crtc_helper_funcs); - spin_lock_init(&vkms_out->lock); + mutex_init(&vkms_out->lock); spin_lock_init(&vkms_out->composer_lock); vkms_out->composer_workq = alloc_ordered_workqueue("vkms_composer", 0); diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h index 5f1a0a44a78c..883266bb0d4a 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.h +++ b/drivers/gpu/drm/vkms/vkms_drv.h @@ -99,7 +99,7 @@ struct vkms_output { /* ordered wq for composer_work */ struct workqueue_struct *composer_workq; /* protects concurrent access to composer */ - spinlock_t lock; + struct mutex lock; /* protected by @lock */ bool composer_enabled;
Currently, it is possible for the composer to be set as enabled and then as disabled without a proper call for the vkms_vblank_simulate(). This is problematic, because the driver would skip one CRC output, causing CRC tests to fail. Therefore, we need to make sure that, for each time the composer is set as enabled, a composer job is added to the queue. In order to provide this guarantee, add a mutex that will lock before the composer is set as enabled and will unlock only after the composer job is added to the queue. This way, we can have a guarantee that the driver won't skip a CRC entry. This race-condition is affecting the IGT test "writeback-check-output", making the test fail and also, leaking writeback framebuffers, as the writeback job is queued, but it is not signaled. This patch avoids both problems. Signed-off-by: Maíra Canal <mcanal@igalia.com> --- drivers/gpu/drm/vkms/vkms_composer.c | 9 +++++++-- drivers/gpu/drm/vkms/vkms_crtc.c | 14 +++++++------- drivers/gpu/drm/vkms/vkms_drv.h | 2 +- 3 files changed, 15 insertions(+), 10 deletions(-)