diff mbox series

[1/3] drm/gma500: Fix BUG: sleeping function called from invalid context errors

Message ID 20220905133738.466490-2-hdegoede@redhat.com (mailing list archive)
State New, archived
Headers show
Series drm/gma500: Fix 2 locking related WARN_ON oopses | expand

Commit Message

Hans de Goede Sept. 5, 2022, 1:37 p.m. UTC
gma_crtc_page_flip() was holding the event_lock spinlock while calling
crtc_funcs->mode_set_base() which takes ww_mutex.

The only reason to hold event_lock is to clear gma_crtc->page_flip_event
on mode_set_base() errors.

Instead unlock it after setting gma_crtc->page_flip_event and on
errors re-take the lock and clear gma_crtc->page_flip_event it
it is still set.

This fixes the following WARN/stacktrace:

[  512.122953] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:870
[  512.123004] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1253, name: gnome-shell
[  512.123031] preempt_count: 1, expected: 0
[  512.123048] RCU nest depth: 0, expected: 0
[  512.123066] INFO: lockdep is turned off.
[  512.123080] irq event stamp: 0
[  512.123094] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  512.123134] hardirqs last disabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0
[  512.123176] softirqs last  enabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0
[  512.123207] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  512.123233] Preemption disabled at:
[  512.123241] [<0000000000000000>] 0x0
[  512.123275] CPU: 3 PID: 1253 Comm: gnome-shell Tainted: G        W         5.19.0+ #1
[  512.123304] Hardware name: Packard Bell dot s/SJE01_CT, BIOS V1.10 07/23/2013
[  512.123323] Call Trace:
[  512.123346]  <TASK>
[  512.123370]  dump_stack_lvl+0x5b/0x77
[  512.123412]  __might_resched.cold+0xff/0x13a
[  512.123458]  ww_mutex_lock+0x1e/0xa0
[  512.123495]  psb_gem_pin+0x2c/0x150 [gma500_gfx]
[  512.123601]  gma_pipe_set_base+0x76/0x240 [gma500_gfx]
[  512.123708]  gma_crtc_page_flip+0x95/0x130 [gma500_gfx]
[  512.123808]  drm_mode_page_flip_ioctl+0x57d/0x5d0
[  512.123897]  ? drm_mode_cursor2_ioctl+0x10/0x10
[  512.123936]  drm_ioctl_kernel+0xa1/0x150
[  512.123984]  drm_ioctl+0x21f/0x420
[  512.124025]  ? drm_mode_cursor2_ioctl+0x10/0x10
[  512.124070]  ? rcu_read_lock_bh_held+0xb/0x60
[  512.124104]  ? lock_release+0x1ef/0x2d0
[  512.124161]  __x64_sys_ioctl+0x8d/0xd0
[  512.124203]  do_syscall_64+0x58/0x80
[  512.124239]  ? do_syscall_64+0x67/0x80
[  512.124267]  ? trace_hardirqs_on_prepare+0x55/0xe0
[  512.124300]  ? do_syscall_64+0x67/0x80
[  512.124340]  ? rcu_read_lock_sched_held+0x10/0x80
[  512.124377]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  512.124411] RIP: 0033:0x7fcc4a70740f
[  512.124442] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[  512.124470] RSP: 002b:00007ffda73f5390 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  512.124503] RAX: ffffffffffffffda RBX: 000055cc9e474500 RCX: 00007fcc4a70740f
[  512.124524] RDX: 00007ffda73f5420 RSI: 00000000c01864b0 RDI: 0000000000000009
[  512.124544] RBP: 00007ffda73f5420 R08: 000055cc9c0b0cb0 R09: 0000000000000034
[  512.124564] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c01864b0
[  512.124584] R13: 0000000000000009 R14: 000055cc9df484d0 R15: 000055cc9af5d0c0
[  512.124647]  </TASK>

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/gpu/drm/gma500/gma_display.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Patrik Jakobsson Sept. 6, 2022, 12:50 p.m. UTC | #1
On Mon, Sep 5, 2022 at 3:37 PM Hans de Goede <hdegoede@redhat.com> wrote:
>
> gma_crtc_page_flip() was holding the event_lock spinlock while calling
> crtc_funcs->mode_set_base() which takes ww_mutex.
>
> The only reason to hold event_lock is to clear gma_crtc->page_flip_event
> on mode_set_base() errors.
>
> Instead unlock it after setting gma_crtc->page_flip_event and on
> errors re-take the lock and clear gma_crtc->page_flip_event it
> it is still set.

Hi Hans, thanks for having a look at gma500.

See comments below.

>
> This fixes the following WARN/stacktrace:
>
> [  512.122953] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:870
> [  512.123004] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1253, name: gnome-shell
> [  512.123031] preempt_count: 1, expected: 0
> [  512.123048] RCU nest depth: 0, expected: 0
> [  512.123066] INFO: lockdep is turned off.
> [  512.123080] irq event stamp: 0
> [  512.123094] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [  512.123134] hardirqs last disabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0
> [  512.123176] softirqs last  enabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0
> [  512.123207] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [  512.123233] Preemption disabled at:
> [  512.123241] [<0000000000000000>] 0x0
> [  512.123275] CPU: 3 PID: 1253 Comm: gnome-shell Tainted: G        W         5.19.0+ #1
> [  512.123304] Hardware name: Packard Bell dot s/SJE01_CT, BIOS V1.10 07/23/2013
> [  512.123323] Call Trace:
> [  512.123346]  <TASK>
> [  512.123370]  dump_stack_lvl+0x5b/0x77
> [  512.123412]  __might_resched.cold+0xff/0x13a
> [  512.123458]  ww_mutex_lock+0x1e/0xa0
> [  512.123495]  psb_gem_pin+0x2c/0x150 [gma500_gfx]
> [  512.123601]  gma_pipe_set_base+0x76/0x240 [gma500_gfx]
> [  512.123708]  gma_crtc_page_flip+0x95/0x130 [gma500_gfx]
> [  512.123808]  drm_mode_page_flip_ioctl+0x57d/0x5d0
> [  512.123897]  ? drm_mode_cursor2_ioctl+0x10/0x10
> [  512.123936]  drm_ioctl_kernel+0xa1/0x150
> [  512.123984]  drm_ioctl+0x21f/0x420
> [  512.124025]  ? drm_mode_cursor2_ioctl+0x10/0x10
> [  512.124070]  ? rcu_read_lock_bh_held+0xb/0x60
> [  512.124104]  ? lock_release+0x1ef/0x2d0
> [  512.124161]  __x64_sys_ioctl+0x8d/0xd0
> [  512.124203]  do_syscall_64+0x58/0x80
> [  512.124239]  ? do_syscall_64+0x67/0x80
> [  512.124267]  ? trace_hardirqs_on_prepare+0x55/0xe0
> [  512.124300]  ? do_syscall_64+0x67/0x80
> [  512.124340]  ? rcu_read_lock_sched_held+0x10/0x80
> [  512.124377]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [  512.124411] RIP: 0033:0x7fcc4a70740f
> [  512.124442] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> [  512.124470] RSP: 002b:00007ffda73f5390 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [  512.124503] RAX: ffffffffffffffda RBX: 000055cc9e474500 RCX: 00007fcc4a70740f
> [  512.124524] RDX: 00007ffda73f5420 RSI: 00000000c01864b0 RDI: 0000000000000009
> [  512.124544] RBP: 00007ffda73f5420 R08: 000055cc9c0b0cb0 R09: 0000000000000034
> [  512.124564] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c01864b0
> [  512.124584] R13: 0000000000000009 R14: 000055cc9df484d0 R15: 000055cc9af5d0c0
> [  512.124647]  </TASK>
>
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
>  drivers/gpu/drm/gma500/gma_display.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/gma500/gma_display.c b/drivers/gpu/drm/gma500/gma_display.c
> index bd40c040a2c9..cf038e322164 100644
> --- a/drivers/gpu/drm/gma500/gma_display.c
> +++ b/drivers/gpu/drm/gma500/gma_display.c
> @@ -532,15 +532,19 @@ int gma_crtc_page_flip(struct drm_crtc *crtc,
>                 WARN_ON(drm_crtc_vblank_get(crtc) != 0);
>
>                 gma_crtc->page_flip_event = event;
> +               spin_unlock_irqrestore(&dev->event_lock, flags);
>
>                 /* Call this locked if we want an event at vblank interrupt. */

If we don't hold the event_lock around mode_set_base() we could
potentially get a vblank before we do the modeset. That would send the
event prematurely. I think this is what the comment tries to tell us.

-Patrik

>                 ret = crtc_funcs->mode_set_base(crtc, crtc->x, crtc->y, old_fb);
>                 if (ret) {
> -                       gma_crtc->page_flip_event = NULL;
> -                       drm_crtc_vblank_put(crtc);
> +                       spin_lock_irqsave(&dev->event_lock, flags);
> +                       if (gma_crtc->page_flip_event) {
> +                               gma_crtc->page_flip_event = NULL;
> +                               drm_crtc_vblank_put(crtc);
> +                       }
> +                       spin_unlock_irqrestore(&dev->event_lock, flags);
>                 }
>
> -               spin_unlock_irqrestore(&dev->event_lock, flags);
>         } else {
>                 ret = crtc_funcs->mode_set_base(crtc, crtc->x, crtc->y, old_fb);
>         }
> --
> 2.36.1
>
Hans de Goede Sept. 6, 2022, 4:49 p.m. UTC | #2
Hi,

On 9/6/22 14:50, Patrik Jakobsson wrote:
> On Mon, Sep 5, 2022 at 3:37 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> gma_crtc_page_flip() was holding the event_lock spinlock while calling
>> crtc_funcs->mode_set_base() which takes ww_mutex.
>>
>> The only reason to hold event_lock is to clear gma_crtc->page_flip_event
>> on mode_set_base() errors.
>>
>> Instead unlock it after setting gma_crtc->page_flip_event and on
>> errors re-take the lock and clear gma_crtc->page_flip_event it
>> it is still set.
> 
> Hi Hans, thanks for having a look at gma500.
> 
> See comments below.
> 
>>
>> This fixes the following WARN/stacktrace:
>>
>> [  512.122953] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:870
>> [  512.123004] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1253, name: gnome-shell
>> [  512.123031] preempt_count: 1, expected: 0
>> [  512.123048] RCU nest depth: 0, expected: 0
>> [  512.123066] INFO: lockdep is turned off.
>> [  512.123080] irq event stamp: 0
>> [  512.123094] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
>> [  512.123134] hardirqs last disabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0
>> [  512.123176] softirqs last  enabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0
>> [  512.123207] softirqs last disabled at (0): [<0000000000000000>] 0x0
>> [  512.123233] Preemption disabled at:
>> [  512.123241] [<0000000000000000>] 0x0
>> [  512.123275] CPU: 3 PID: 1253 Comm: gnome-shell Tainted: G        W         5.19.0+ #1
>> [  512.123304] Hardware name: Packard Bell dot s/SJE01_CT, BIOS V1.10 07/23/2013
>> [  512.123323] Call Trace:
>> [  512.123346]  <TASK>
>> [  512.123370]  dump_stack_lvl+0x5b/0x77
>> [  512.123412]  __might_resched.cold+0xff/0x13a
>> [  512.123458]  ww_mutex_lock+0x1e/0xa0
>> [  512.123495]  psb_gem_pin+0x2c/0x150 [gma500_gfx]
>> [  512.123601]  gma_pipe_set_base+0x76/0x240 [gma500_gfx]
>> [  512.123708]  gma_crtc_page_flip+0x95/0x130 [gma500_gfx]
>> [  512.123808]  drm_mode_page_flip_ioctl+0x57d/0x5d0
>> [  512.123897]  ? drm_mode_cursor2_ioctl+0x10/0x10
>> [  512.123936]  drm_ioctl_kernel+0xa1/0x150
>> [  512.123984]  drm_ioctl+0x21f/0x420
>> [  512.124025]  ? drm_mode_cursor2_ioctl+0x10/0x10
>> [  512.124070]  ? rcu_read_lock_bh_held+0xb/0x60
>> [  512.124104]  ? lock_release+0x1ef/0x2d0
>> [  512.124161]  __x64_sys_ioctl+0x8d/0xd0
>> [  512.124203]  do_syscall_64+0x58/0x80
>> [  512.124239]  ? do_syscall_64+0x67/0x80
>> [  512.124267]  ? trace_hardirqs_on_prepare+0x55/0xe0
>> [  512.124300]  ? do_syscall_64+0x67/0x80
>> [  512.124340]  ? rcu_read_lock_sched_held+0x10/0x80
>> [  512.124377]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> [  512.124411] RIP: 0033:0x7fcc4a70740f
>> [  512.124442] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
>> [  512.124470] RSP: 002b:00007ffda73f5390 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>> [  512.124503] RAX: ffffffffffffffda RBX: 000055cc9e474500 RCX: 00007fcc4a70740f
>> [  512.124524] RDX: 00007ffda73f5420 RSI: 00000000c01864b0 RDI: 0000000000000009
>> [  512.124544] RBP: 00007ffda73f5420 R08: 000055cc9c0b0cb0 R09: 0000000000000034
>> [  512.124564] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c01864b0
>> [  512.124584] R13: 0000000000000009 R14: 000055cc9df484d0 R15: 000055cc9af5d0c0
>> [  512.124647]  </TASK>
>>
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>> ---
>>  drivers/gpu/drm/gma500/gma_display.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/gma500/gma_display.c b/drivers/gpu/drm/gma500/gma_display.c
>> index bd40c040a2c9..cf038e322164 100644
>> --- a/drivers/gpu/drm/gma500/gma_display.c
>> +++ b/drivers/gpu/drm/gma500/gma_display.c
>> @@ -532,15 +532,19 @@ int gma_crtc_page_flip(struct drm_crtc *crtc,
>>                 WARN_ON(drm_crtc_vblank_get(crtc) != 0);
>>
>>                 gma_crtc->page_flip_event = event;
>> +               spin_unlock_irqrestore(&dev->event_lock, flags);
>>
>>                 /* Call this locked if we want an event at vblank interrupt. */
> 
> If we don't hold the event_lock around mode_set_base() we could
> potentially get a vblank before we do the modeset. That would send the
> event prematurely. I think this is what the comment tries to tell us.

There is no way to avoid the vblank-irq and the mode_set_base() call racing
with each other and then delivering a blank event from what is actually
the page-flip which just completed.

Even with the old code, of we hold the lock over the mode_set_base() and then
the vblank-irq triggers while we are holding the lock this will block the
irq-handler until the mode_set_base() has completed (which is probably
only a couple 100s of usecs / max 1 msec) and then as soon as
gma_crtc_page_flip() releases the lock the irq-handler will continue
running and still prematurely deliver the vblank event.

In practice this is not a problem because userspace does:

-submit frame
-wait for vblank
-render new frame
-submit new frame
-wait for vblank

And the time it takes to render a new frame means that after the
first wait for vblank userspace never hits the race, unless
userspace cannot keep up (is rendering at say less then 60 fps)
in that case it may hit the race if it is only barely keeping up
and if it is only barely keeping up then userspace hitting / winning
the race is actually a good thing, because then it should start
rendering the next frame asap.

So TL;DR: we cannot avoid sometimes racing but in practice this
is not an issue. The kernel oopses this fixes OTOH are a real
issue.

I hope this helps explain why I still believe this is the right fix.

Regards,

Hans





> 
> -Patrik
> 
>>                 ret = crtc_funcs->mode_set_base(crtc, crtc->x, crtc->y, old_fb);
>>                 if (ret) {
>> -                       gma_crtc->page_flip_event = NULL;
>> -                       drm_crtc_vblank_put(crtc);
>> +                       spin_lock_irqsave(&dev->event_lock, flags);
>> +                       if (gma_crtc->page_flip_event) {
>> +                               gma_crtc->page_flip_event = NULL;
>> +                               drm_crtc_vblank_put(crtc);
>> +                       }
>> +                       spin_unlock_irqrestore(&dev->event_lock, flags);
>>                 }
>>
>> -               spin_unlock_irqrestore(&dev->event_lock, flags);
>>         } else {
>>                 ret = crtc_funcs->mode_set_base(crtc, crtc->x, crtc->y, old_fb);
>>         }
>> --
>> 2.36.1
>>
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/gma500/gma_display.c b/drivers/gpu/drm/gma500/gma_display.c
index bd40c040a2c9..cf038e322164 100644
--- a/drivers/gpu/drm/gma500/gma_display.c
+++ b/drivers/gpu/drm/gma500/gma_display.c
@@ -532,15 +532,19 @@  int gma_crtc_page_flip(struct drm_crtc *crtc,
 		WARN_ON(drm_crtc_vblank_get(crtc) != 0);
 
 		gma_crtc->page_flip_event = event;
+		spin_unlock_irqrestore(&dev->event_lock, flags);
 
 		/* Call this locked if we want an event at vblank interrupt. */
 		ret = crtc_funcs->mode_set_base(crtc, crtc->x, crtc->y, old_fb);
 		if (ret) {
-			gma_crtc->page_flip_event = NULL;
-			drm_crtc_vblank_put(crtc);
+			spin_lock_irqsave(&dev->event_lock, flags);
+			if (gma_crtc->page_flip_event) {
+				gma_crtc->page_flip_event = NULL;
+				drm_crtc_vblank_put(crtc);
+			}
+			spin_unlock_irqrestore(&dev->event_lock, flags);
 		}
 
-		spin_unlock_irqrestore(&dev->event_lock, flags);
 	} else {
 		ret = crtc_funcs->mode_set_base(crtc, crtc->x, crtc->y, old_fb);
 	}