drm/radeon: add missing ttm_eu_backoff_reservation to radeon_bo_list_validate

Message ID 51DD36F0.3080500@canonical.com (mailing list archive)
State New, archived

Commit Message

Maarten Lankhorst July 10, 2013, 10:26 a.m. UTC
On 10-07-13 12:03, Markus Trippelsdorf wrote:
> On 2013.07.10 at 11:56 +0200, Maarten Lankhorst wrote:
>> On 10-07-13 11:46, Markus Trippelsdorf wrote:
>>> On 2013.07.10 at 11:29 +0200, Maarten Lankhorst wrote:
>>>> On 10-07-13 11:22, Markus Trippelsdorf wrote:
>>>>> Simply copy/pasting a big document in LibreOffice hangs my system.
>>>>> Only a hard reset gets it working again.
>>>>> see also: https://bugs.freedesktop.org/show_bug.cgi?id=66551
>>>>>
>>>>> I've bisected the issue to:
>>>>>
>>>>> commit ecff665f5e3f1c6909353e00b9420e45ae23d995
>>>>> Author: Maarten Lankhorst <m.b.lankhorst@gmail.com>
>>>>> Date:   Thu Jun 27 13:48:17 2013 +0200
>>>>>
>>>>>     drm/ttm: make ttm reservation calls behave like reservation calls
>>>>>     
>>>>>     This commit converts the source of the val_seq counter to
>>>>>     the ww_mutex api. The reservation objects are converted later,
>>>>>     because there is still a lockdep splat in nouveau that has to
>>>>>     be resolved first.
>>>>>     
>>>>>     Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>>>>     Reviewed-by: Jerome Glisse <jglisse@redhat.com>
>>>>>     Signed-off-by: Dave Airlie <airlied@redhat.com>
>>>> Hey,
>>>>
>>>> Can you try current head with CONFIG_PROVE_LOCKING set and post the
>>>> lockdep splat from dmesg, if any? If there is any locking issue
>>>> lockdep should warn about it.  Lockdep will turn itself off after the
>>>> first splat, so if the lockdep splat happens before running the
>>>> affected parts those will have to be fixed first.
>>> There was an unrelated EDAC lockdep splat, so I simply disabled it.
>>>
>>> This is what I get:
>>>
>>> Jul 10 11:40:44 x4 kernel: ================================================
>>> Jul 10 11:40:44 x4 kernel: [ BUG: lock held when returning to user space! ]
>>> Jul 10 11:40:44 x4 kernel: 3.10.0-08587-g496322b #35 Not tainted
>>> Jul 10 11:40:44 x4 kernel: ------------------------------------------------
>>> Jul 10 11:40:44 x4 kernel: X/211 is leaving the kernel with locks still held!
>>> Jul 10 11:40:44 x4 kernel: 2 locks held by X/211:
>>> Jul 10 11:40:44 x4 kernel: #0:  (reservation_ww_class_acquire){+.+.+.}, at: [<ffffffff813279f0>] radeon_bo_list_validate+0x20/0xd0
>>> Jul 10 11:40:44 x4 kernel: #1:  (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffff81309306>] ttm_eu_reserve_buffers+0x126/0x4b0
>>> Jul 10 11:40:52 x4 kernel: SysRq : Emergency Sync
>>> Jul 10 11:40:53 x4 kernel: Emergency Sync complete
>>>
>> Thanks, exactly what I thought. I missed a backoff somewhere..
>>
>> Does the below patch fix it?
> Yes. Thank you for your quick reply.

8<------
If radeon_cs_parser_relocs fails, ttm_eu_backoff_reservation doesn't get called.
This left open a bug where ttm_eu_reserve_buffers succeeded but the BOs were
not unlocked afterwards:

Jul 10 11:40:44 x4 kernel: ================================================
Jul 10 11:40:44 x4 kernel: [ BUG: lock held when returning to user space! ]
Jul 10 11:40:44 x4 kernel: 3.10.0-08587-g496322b #35 Not tainted
Jul 10 11:40:44 x4 kernel: ------------------------------------------------
Jul 10 11:40:44 x4 kernel: X/211 is leaving the kernel with locks still held!
Jul 10 11:40:44 x4 kernel: 2 locks held by X/211:
Jul 10 11:40:44 x4 kernel: #0:  (reservation_ww_class_acquire){+.+.+.}, at: [<ffffffff813279f0>] radeon_bo_list_validate+0x20/0xd0
Jul 10 11:40:44 x4 kernel: #1:  (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffff81309306>] ttm_eu_reserve_buffers+0x126/0x4b0
Jul 10 11:40:52 x4 kernel: SysRq : Emergency Sync
Jul 10 11:40:53 x4 kernel: Emergency Sync complete

This is a regression caused by commit ecff665f5e
("drm/ttm: make ttm reservation calls behave like reservation calls").

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---

Comments

Alex Deucher July 11, 2013, 7:38 p.m. UTC | #1
I've picked up the patch for my fixes queue.  Thanks!

Alex


Patch

diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 0219d26..2020bf4 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -377,6 +377,7 @@  int radeon_bo_list_validate(struct ww_acquire_ctx *ticket,
 					domain = lobj->alt_domain;
 					goto retry;
 				}
+				ttm_eu_backoff_reservation(ticket, head);
 				return r;
 			}
 		}