diff mbox series

drm/vkms: Fix race-condition between the hrtimer and the atomic commit

Message ID 20230515134133.108780-1-mcanal@igalia.com (mailing list archive)
State New, archived
Headers show
Series drm/vkms: Fix race-condition between the hrtimer and the atomic commit | expand

Commit Message

Maíra Canal May 15, 2023, 1:41 p.m. UTC
Currently, it is possible for the composer to be set as enabled and then
as disabled without a proper call for the vkms_vblank_simulate(). This
is problematic, because the driver would skip one CRC output, causing CRC
tests to fail. Therefore, we need to make sure that, for each time the
composer is set as enabled, a composer job is added to the queue.

In order to provide this guarantee, add a mutex that will lock before
the composer is set as enabled and will unlock only after the composer
job is added to the queue. This way, we can have a guarantee that the
driver won't skip a CRC entry.

This race-condition is affecting the IGT test "writeback-check-output",
making the test fail and also, leaking writeback framebuffers, as the
writeback job is queued, but it is not signaled. This patch avoids both
problems.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c |  9 +++++++--
 drivers/gpu/drm/vkms/vkms_crtc.c     | 14 +++++++-------
 drivers/gpu/drm/vkms/vkms_drv.h      |  2 +-
 3 files changed, 15 insertions(+), 10 deletions(-)

Comments

Yujie Liu May 22, 2023, 4:57 p.m. UTC | #1
Hello,

kernel test robot noticed "WARNING:inconsistent_lock_state" on:

commit: 82ef6a95c29b55f39cf65ac6a29926fca6025161 ("[PATCH] drm/vkms: Fix race-condition between the hrtimer and the atomic commit")
url: https://github.com/intel-lab-lkp/linux/commits/Ma-ra-Canal/drm-vkms-Fix-race-condition-between-the-hrtimer-and-the-atomic-commit/20230515-214326
base: git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link: https://lore.kernel.org/all/20230515134133.108780-1-mcanal@igalia.com/
patch subject: [PATCH] drm/vkms: Fix race-condition between the hrtimer and the atomic commit

in testcase: boot

compiler: clang-14
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+-------------------------------------------------+------------+------------+
|                                                 | 812562b8d8 | 82ef6a95c2 |
+-------------------------------------------------+------------+------------+
| boot_successes                                  | 10         | 0          |
| boot_failures                                   | 0          | 10         |
| WARNING:inconsistent_lock_state                 | 0          | 10         |
| inconsistent{HARDIRQ-ON-W}->{IN-HARDIRQ-W}usage | 0          | 10         |
| EIP:default_idle                                | 0          | 10         |
+-------------------------------------------------+------------+------------+


If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <yujie.liu@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202305221659.6504ded7-yujie.liu@intel.com


[    9.581561][    C0] WARNING: inconsistent lock state
[    9.581563][    C0] 6.4.0-rc1-00152-g82ef6a95c29b #1 Not tainted
[    9.581565][    C0] --------------------------------
[    9.581567][    C0] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[    9.581569][    C0] swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 9.581574][ C0] c88bc220 (&dev->event_lock){?.+.}-{2:2}, at: drm_handle_vblank (drivers/gpu/drm/drm_vblank.c:?) 
[    9.581591][    C0] {HARDIRQ-ON-W} state was registered at:
[ 9.581593][ C0] lock_acquire (kernel/locking/lockdep.c:5691) 
[ 9.581599][ C0] _raw_spin_lock (include/linux/spinlock_api_smp.h:133 kernel/locking/spinlock.c:154) 
[ 9.581605][ C0] vkms_crtc_atomic_flush (drivers/gpu/drm/vkms/vkms_crtc.c:255) 
[ 9.581610][ C0] drm_atomic_helper_commit_planes (drivers/gpu/drm/drm_atomic_helper.c:2765) 
[ 9.581615][ C0] vkms_atomic_commit_tail (drivers/gpu/drm/vkms/vkms_drv.c:75) 
[ 9.581618][ C0] commit_tail (drivers/gpu/drm/drm_atomic_helper.c:?) 
[ 9.581621][ C0] drm_atomic_helper_commit (drivers/gpu/drm/drm_atomic_helper.c:?) 
[ 9.581623][ C0] drm_atomic_commit (drivers/gpu/drm/drm_atomic.c:1504) 
[ 9.581628][ C0] drm_client_modeset_commit_atomic (drivers/gpu/drm/drm_client_modeset.c:?) 
[ 9.581633][ C0] drm_client_modeset_commit_locked (drivers/gpu/drm/drm_client_modeset.c:1148) 
[ 9.581636][ C0] drm_client_modeset_commit (drivers/gpu/drm/drm_client_modeset.c:1174) 
[ 9.581640][ C0] __drm_fb_helper_restore_fbdev_mode_unlocked (drivers/gpu/drm/drm_fb_helper.c:?) 
[ 9.581645][ C0] drm_fb_helper_set_par (drivers/gpu/drm/drm_fb_helper.c:?) 
[ 9.581648][ C0] fbcon_init (drivers/video/fbdev/core/fbcon.c:?) 
[ 9.581651][ C0] visual_init (drivers/tty/vt/vt.c:1025) 
[ 9.581654][ C0] do_bind_con_driver (drivers/tty/vt/vt.c:3643) 
[ 9.581656][ C0] do_take_over_console (drivers/tty/vt/vt.c:4222) 
[ 9.581658][ C0] do_fbcon_takeover (drivers/video/fbdev/core/fbcon.c:530) 
[ 9.581661][ C0] do_fb_registered (drivers/video/fbdev/core/fbcon.c:3002) 
[ 9.581664][ C0] fbcon_fb_registered (drivers/video/fbdev/core/fbcon.c:3022) 
[ 9.581666][ C0] register_framebuffer (drivers/video/fbdev/core/fbmem.c:1499) 
[ 9.581673][ C0] __drm_fb_helper_initial_config_and_unlock (drivers/gpu/drm/drm_fb_helper.c:2053) 
[ 9.581676][ C0] drm_fb_helper_initial_config (drivers/gpu/drm/drm_fb_helper.c:?) 
[ 9.581679][ C0] drm_fbdev_generic_client_hotplug (drivers/gpu/drm/drm_fbdev_generic.c:?) 
[ 9.581682][ C0] drm_fbdev_generic_setup (drivers/gpu/drm/drm_fbdev_generic.c:344) 
[ 9.581685][ C0] vkms_init (drivers/gpu/drm/vkms/vkms_drv.c:?) 
[ 9.581690][ C0] do_one_initcall (init/main.c:1246) 
[ 9.581694][ C0] do_initcall_level (init/main.c:1318) 
[ 9.581699][ C0] do_initcalls (init/main.c:1332) 
[ 9.581701][ C0] do_basic_setup (init/main.c:1355) 
[ 9.581704][ C0] kernel_init_freeable (init/main.c:1575) 
[ 9.581706][ C0] kernel_init (init/main.c:1464) 
[ 9.581713][ C0] ret_from_fork (arch/x86/entry/entry_32.S:770) 
[    9.581716][    C0] irq event stamp: 127808
[ 9.581717][ C0] hardirqs last enabled at (127807): do_idle (kernel/sched/idle.c:?) 
[ 9.581726][ C0] hardirqs last disabled at (127808): sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1106) 
[ 9.581731][ C0] softirqs last enabled at (127802): do_softirq_own_stack (arch/x86/kernel/irq_32.c:57 arch/x86/kernel/irq_32.c:147) 
[ 9.581735][ C0] softirqs last disabled at (127791): do_softirq_own_stack (arch/x86/kernel/irq_32.c:57 arch/x86/kernel/irq_32.c:147) 
[    9.581738][    C0]
[    9.581738][    C0] other info that might help us debug this:
[    9.581739][    C0]  Possible unsafe locking scenario:
[    9.581739][    C0]
[    9.581740][    C0]        CPU0
[    9.581741][    C0]        ----
[    9.581742][    C0]   lock(&dev->event_lock);
[    9.581745][    C0]   <Interrupt>
[    9.581746][    C0]     lock(&dev->event_lock);
[    9.581748][    C0]
[    9.581748][    C0]  *** DEADLOCK ***
[    9.581748][    C0]
[    9.581750][    C0] no locks held by swapper/0/0.
[    9.581751][    C0]
[    9.581751][    C0] stack backtrace:
[    9.581754][    C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc1-00152-g82ef6a95c29b #1 714d45c675a05052bf3771f9fbc7c7e12a497b74
[    9.581758][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[    9.581760][    C0] Call Trace:
[ 9.581764][ C0] dump_stack_lvl (lib/dump_stack.c:107) 
[ 9.581769][ C0] dump_stack (lib/dump_stack.c:113) 
[ 9.581773][ C0] print_usage_bug (kernel/locking/lockdep.c:?) 
[ 9.581778][ C0] mark_lock_irq (kernel/locking/lockdep.c:?) 
[ 9.581781][ C0] ? stack_trace_save (kernel/stacktrace.c:122) 
[ 9.581787][ C0] ? save_trace (kernel/locking/lockdep.c:582) 
[ 9.581791][ C0] mark_lock (kernel/locking/lockdep.c:4651) 
[ 9.581794][ C0] __lock_acquire (kernel/locking/lockdep.c:4537) 
[ 9.581799][ C0] ? __lock_acquire (kernel/locking/lockdep.c:4572) 
[ 9.581803][ C0] ? __lock_acquire (kernel/locking/lockdep.c:4572) 
[ 9.581808][ C0] lock_acquire (kernel/locking/lockdep.c:5691) 
[ 9.581812][ C0] ? drm_handle_vblank (drivers/gpu/drm/drm_vblank.c:?) 
[ 9.581816][ C0] ? ktime_get (kernel/time/timekeeping.c:846 kernel/time/timekeeping.c:846) 
[ 9.581823][ C0] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162) 
[ 9.581826][ C0] ? drm_handle_vblank (drivers/gpu/drm/drm_vblank.c:?) 
[ 9.581829][ C0] drm_handle_vblank (drivers/gpu/drm/drm_vblank.c:?) 
[ 9.581833][ C0] drm_crtc_handle_vblank (drivers/gpu/drm/drm_vblank.c:2050) 
[ 9.581836][ C0] vkms_vblank_simulate (drivers/gpu/drm/vkms/vkms_crtc.c:29) 
[ 9.581839][ C0] __run_hrtimer (kernel/time/hrtimer.c:?) 
[ 9.581843][ C0] ? vkms_get_vblank_timestamp (drivers/gpu/drm/vkms/vkms_crtc.c:13) 
[ 9.581847][ C0] __hrtimer_run_queues (kernel/time/hrtimer.c:1749) 
[ 9.581851][ C0] ? _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:110 kernel/locking/spinlock.c:162) 
[ 9.581855][ C0] hrtimer_interrupt (kernel/time/hrtimer.c:1811) 
[ 9.581860][ C0] __sysvec_apic_timer_interrupt (arch/x86/include/asm/jump_label.h:41 include/linux/jump_label.h:207 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1113) 
[ 9.581867][ C0] ? sysvec_call_function_single (arch/x86/kernel/apic/apic.c:1106) 
[ 9.581871][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1106) 
[ 9.581875][ C0] ? do_idle (kernel/sched/idle.c:?) 
[ 9.581880][ C0] handle_exception (arch/x86/entry/entry_32.S:1076) 
[ 9.581888][ C0] EIP: default_idle (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:72 arch/x86/kernel/process.c:711) 
[ 9.581893][ C0] Code: cd 56 68 30 6e 42 c3 e8 10 91 69 ff 83 c4 08 eb cf 00 00 cc cc 00 00 cc cc 00 00 cc 55 89 e5 eb 07 0f 00 2d 0c f8 18 c2 fb f4 <fa> 5d c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 e8
All code
========
   0:	cd 56                	int    $0x56
   2:	68 30 6e 42 c3       	push   $0xffffffffc3426e30
   7:	e8 10 91 69 ff       	call   0xffffffffff69911c
   c:	83 c4 08             	add    $0x8,%esp
   f:	eb cf                	jmp    0xffffffffffffffe0
  11:	00 00                	add    %al,(%rax)
  13:	cc                   	int3
  14:	cc                   	int3
  15:	00 00                	add    %al,(%rax)
  17:	cc                   	int3
  18:	cc                   	int3
  19:	00 00                	add    %al,(%rax)
  1b:	cc                   	int3
  1c:	55                   	push   %rbp
  1d:	89 e5                	mov    %esp,%ebp
  1f:	eb 07                	jmp    0x28
  21:	0f 00 2d 0c f8 18 c2 	verw   -0x3de707f4(%rip)        # 0xffffffffc218f834
  28:	fb                   	sti
  29:	f4                   	hlt
  2a:*	fa                   	cli		<-- trapping instruction
  2b:	5d                   	pop    %rbp
  2c:	c3                   	ret
  2d:	90                   	nop
  2e:	90                   	nop
  2f:	90                   	nop
  30:	90                   	nop
  31:	90                   	nop
  32:	90                   	nop
  33:	90                   	nop
  34:	90                   	nop
  35:	90                   	nop
  36:	90                   	nop
  37:	90                   	nop
  38:	90                   	nop
  39:	90                   	nop
  3a:	90                   	nop
  3b:	90                   	nop
  3c:	55                   	push   %rbp
  3d:	89 e5                	mov    %esp,%ebp
  3f:	e8                   	.byte 0xe8

Code starting with the faulting instruction
===========================================
   0:	fa                   	cli
   1:	5d                   	pop    %rbp
   2:	c3                   	ret
   3:	90                   	nop
   4:	90                   	nop
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop
   8:	90                   	nop
   9:	90                   	nop
   a:	90                   	nop
   b:	90                   	nop
   c:	90                   	nop
   d:	90                   	nop
   e:	90                   	nop
   f:	90                   	nop
  10:	90                   	nop
  11:	90                   	nop
  12:	55                   	push   %rbp
  13:	89 e5                	mov    %esp,%ebp
  15:	e8                   	.byte 0xe8


To reproduce:

        # build kernel
	cd linux
	cp config-6.4.0-rc1-00152-g82ef6a95c29b .config
	make HOSTCC=clang-14 CC=clang-14 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
	make HOSTCC=clang-14 CC=clang-14 ARCH=i386 INSTALL_MOD_PATH=<mod-install-dir> modules_install
	cd <mod-install-dir>
	find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 906d3df40cdb..e65104a2203d 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -320,10 +320,15 @@  void vkms_set_composer(struct vkms_output *out, bool enabled)
 	if (enabled)
 		drm_crtc_vblank_get(&out->crtc);
 
-	spin_lock_irq(&out->lock);
+	mutex_lock(&out->lock);
 	old_enabled = out->composer_enabled;
 	out->composer_enabled = enabled;
-	spin_unlock_irq(&out->lock);
+
+	/* the composition wasn't enabled, so unlock the lock to make sure the lock
+	 * will be balanced even if we have a failed commit
+	 */
+	if (!out->composer_enabled)
+		mutex_unlock(&out->lock);
 
 	if (old_enabled)
 		drm_crtc_vblank_put(&out->crtc);
diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
index 515f6772b866..5a0ee9530748 100644
--- a/drivers/gpu/drm/vkms/vkms_crtc.c
+++ b/drivers/gpu/drm/vkms/vkms_crtc.c
@@ -16,7 +16,7 @@  static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
 	struct drm_crtc *crtc = &output->crtc;
 	struct vkms_crtc_state *state;
 	u64 ret_overrun;
-	bool ret, fence_cookie;
+	bool ret, fence_cookie, composer_enabled;
 
 	fence_cookie = dma_fence_begin_signalling();
 
@@ -25,15 +25,15 @@  static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
 	if (ret_overrun != 1)
 		pr_warn("%s: vblank timer overrun\n", __func__);
 
-	spin_lock(&output->lock);
 	ret = drm_crtc_handle_vblank(crtc);
 	if (!ret)
 		DRM_ERROR("vkms failure on handling vblank");
 
 	state = output->composer_state;
-	spin_unlock(&output->lock);
+	composer_enabled = output->composer_enabled;
+	mutex_unlock(&output->lock);
 
-	if (state && output->composer_enabled) {
+	if (state && composer_enabled) {
 		u64 frame = drm_crtc_accurate_vblank_count(crtc);
 
 		/* update frame_start only if a queued vkms_composer_worker()
@@ -241,7 +241,7 @@  static void vkms_crtc_atomic_begin(struct drm_crtc *crtc,
 	/* This lock is held across the atomic commit to block vblank timer
 	 * from scheduling vkms_composer_worker until the composer is updated
 	 */
-	spin_lock_irq(&vkms_output->lock);
+	mutex_lock(&vkms_output->lock);
 }
 
 static void vkms_crtc_atomic_flush(struct drm_crtc *crtc,
@@ -264,7 +264,7 @@  static void vkms_crtc_atomic_flush(struct drm_crtc *crtc,
 
 	vkms_output->composer_state = to_vkms_crtc_state(crtc->state);
 
-	spin_unlock_irq(&vkms_output->lock);
+	mutex_unlock(&vkms_output->lock);
 }
 
 static const struct drm_crtc_helper_funcs vkms_crtc_helper_funcs = {
@@ -290,7 +290,7 @@  int vkms_crtc_init(struct drm_device *dev, struct drm_crtc *crtc,
 
 	drm_crtc_helper_add(crtc, &vkms_crtc_helper_funcs);
 
-	spin_lock_init(&vkms_out->lock);
+	mutex_init(&vkms_out->lock);
 	spin_lock_init(&vkms_out->composer_lock);
 
 	vkms_out->composer_workq = alloc_ordered_workqueue("vkms_composer", 0);
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 5f1a0a44a78c..883266bb0d4a 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -99,7 +99,7 @@  struct vkms_output {
 	/* ordered wq for composer_work */
 	struct workqueue_struct *composer_workq;
 	/* protects concurrent access to composer */
-	spinlock_t lock;
+	struct mutex lock;
 
 	/* protected by @lock */
 	bool composer_enabled;