diff mbox

drm/amdgpu: release parent fence reference

Message ID 1477247507-11378-4-git-send-email-notasas@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Grazvydas Ignotas Oct. 23, 2016, 6:31 p.m. UTC
It's done in amd_sched_hw_job_reset(), but not in normal job processing.
Leak reported by CONFIG_SLUB_DEBUG.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
---
 CONFIG_SLUB_DEBUG reports more leaks related to ioctls,
 but I was unable to track them down...

 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Chunming Zhou Oct. 24, 2016, 2:25 a.m. UTC | #1
On 2016年10月24日 02:31, Grazvydas Ignotas wrote:
> It's done in amd_sched_hw_job_reset(), but not in normal job processing.
> Leak reported by CONFIG_SLUB_DEBUG.
>
> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
> ---
>   CONFIG_SLUB_DEBUG reports more leaks related to ioctls,
>   but I was unable to track them down...
>
>   drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> index 910b8d5..cfb686e 100644
> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> @@ -522,6 +522,8 @@ static void amd_sched_process_job(struct fence *f, struct fence_cb *cb)
>   
>   	trace_amd_sched_process_job(s_fence);
>   	fence_put(&s_fence->finished);
> +	fence_put(s_fence->parent);
If I remember correctly, parent was put in sched fence release.

Regards,
David Zhou
> +	s_fence->parent = NULL;
>   	wake_up_interruptible(&sched->wake_up_worker);
>   }
>
Christian König Oct. 24, 2016, 9:13 a.m. UTC | #2
Am 24.10.2016 um 04:25 schrieb zhoucm1:
>
>
> On 2016年10月24日 02:31, Grazvydas Ignotas wrote:
>> It's done in amd_sched_hw_job_reset(), but not in normal job processing.
>> Leak reported by CONFIG_SLUB_DEBUG.
>>
>> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
>> ---
>>   CONFIG_SLUB_DEBUG reports more leaks related to ioctls,
>>   but I was unable to track them down...
>>
>>   drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c 
>> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>> index 910b8d5..cfb686e 100644
>> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>> @@ -522,6 +522,8 @@ static void amd_sched_process_job(struct fence 
>> *f, struct fence_cb *cb)
>>         trace_amd_sched_process_job(s_fence);
>>       fence_put(&s_fence->finished);
>> +    fence_put(s_fence->parent);
> If I remember correctly, parent was put in sched fence release.

Yes, exactly. It's only released in amd_sched_hw_job_reset() because we 
want to assign a new parent fence.

So this is a clear NAK.

Regards,
Christian.

>
> Regards,
> David Zhou
>> +    s_fence->parent = NULL;
>>       wake_up_interruptible(&sched->wake_up_worker);
>>   }
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Grazvydas Ignotas Oct. 24, 2016, 9:43 a.m. UTC | #3
On Mon, Oct 24, 2016 at 12:13 PM, Christian König
<deathsimple@vodafone.de> wrote:
> Am 24.10.2016 um 04:25 schrieb zhoucm1:
>>
>>
>>
>> On 2016年10月24日 02:31, Grazvydas Ignotas wrote:
>>>
>>> It's done in amd_sched_hw_job_reset(), but not in normal job processing.
>>> Leak reported by CONFIG_SLUB_DEBUG.
>>>
>>> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
>>> ---
>>>   CONFIG_SLUB_DEBUG reports more leaks related to ioctls,
>>>   but I was unable to track them down...
>>>
>>>   drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>> index 910b8d5..cfb686e 100644
>>> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>> @@ -522,6 +522,8 @@ static void amd_sched_process_job(struct fence *f,
>>> struct fence_cb *cb)
>>>         trace_amd_sched_process_job(s_fence);
>>>       fence_put(&s_fence->finished);
>>> +    fence_put(s_fence->parent);
>>
>> If I remember correctly, parent was put in sched fence release.
>
>
> Yes, exactly. It's only released in amd_sched_hw_job_reset() because we want
> to assign a new parent fence.

Well then it doesn't work, SLUB_DEBUG detects leaks on teardown, and
with my patch they're gone.
Perhaps the problem is that when new parent fence is assigned, the old
one is not put? I won't be able to check until the weekend, so if
anybody else can do it, please go ahead.

Gražvydas
Christian König Oct. 28, 2016, 11:47 a.m. UTC | #4
Am 24.10.2016 um 11:43 schrieb Grazvydas Ignotas:
> On Mon, Oct 24, 2016 at 12:13 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> Am 24.10.2016 um 04:25 schrieb zhoucm1:
>>>
>>>
>>> On 2016年10月24日 02:31, Grazvydas Ignotas wrote:
>>>> It's done in amd_sched_hw_job_reset(), but not in normal job processing.
>>>> Leak reported by CONFIG_SLUB_DEBUG.
>>>>
>>>> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
>>>> ---
>>>>    CONFIG_SLUB_DEBUG reports more leaks related to ioctls,
>>>>    but I was unable to track them down...
>>>>
>>>>    drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 2 ++
>>>>    1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>> index 910b8d5..cfb686e 100644
>>>> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>> @@ -522,6 +522,8 @@ static void amd_sched_process_job(struct fence *f,
>>>> struct fence_cb *cb)
>>>>          trace_amd_sched_process_job(s_fence);
>>>>        fence_put(&s_fence->finished);
>>>> +    fence_put(s_fence->parent);
>>> If I remember correctly, parent was put in sched fence release.
>>
>> Yes, exactly. It's only released in amd_sched_hw_job_reset() because we want
>> to assign a new parent fence.
> Well then it doesn't work, SLUB_DEBUG detects leaks on teardown, and
> with my patch they're gone.
> Perhaps the problem is that when new parent fence is assigned, the old
> one is not put? I won't be able to check until the weekend, so if
> anybody else can do it, please go ahead.

Mhm, what steps do you do to reproduce this?

I'm looking into this a bit and it sounds like we just doesn't correctly 
tear down the scheduler on driver unload.

Regards,
Christian.

>
> Gražvydas
Christian König Oct. 28, 2016, 12:21 p.m. UTC | #5
Am 28.10.2016 um 13:47 schrieb Christian König:
> Am 24.10.2016 um 11:43 schrieb Grazvydas Ignotas:
>> On Mon, Oct 24, 2016 at 12:13 PM, Christian König
>> <deathsimple@vodafone.de> wrote:
>>> Am 24.10.2016 um 04:25 schrieb zhoucm1:
>>>>
>>>>
>>>> On 2016年10月24日 02:31, Grazvydas Ignotas wrote:
>>>>> It's done in amd_sched_hw_job_reset(), but not in normal job 
>>>>> processing.
>>>>> Leak reported by CONFIG_SLUB_DEBUG.
>>>>>
>>>>> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
>>>>> ---
>>>>>    CONFIG_SLUB_DEBUG reports more leaks related to ioctls,
>>>>>    but I was unable to track them down...
>>>>>
>>>>>    drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 2 ++
>>>>>    1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>>> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>>> index 910b8d5..cfb686e 100644
>>>>> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>>> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>>> @@ -522,6 +522,8 @@ static void amd_sched_process_job(struct fence 
>>>>> *f,
>>>>> struct fence_cb *cb)
>>>>>          trace_amd_sched_process_job(s_fence);
>>>>>        fence_put(&s_fence->finished);
>>>>> +    fence_put(s_fence->parent);
>>>> If I remember correctly, parent was put in sched fence release.
>>>
>>> Yes, exactly. It's only released in amd_sched_hw_job_reset() because 
>>> we want
>>> to assign a new parent fence.
>> Well then it doesn't work, SLUB_DEBUG detects leaks on teardown, and
>> with my patch they're gone.
>> Perhaps the problem is that when new parent fence is assigned, the old
>> one is not put? I won't be able to check until the weekend, so if
>> anybody else can do it, please go ahead.
>
> Mhm, what steps do you do to reproduce this?

Never mind, I was able to figure out how to trigger it. You actually 
have to create a fence to leak it :)

Regards,
Christian.

>
> I'm looking into this a bit and it sounds like we just doesn't 
> correctly tear down the scheduler on driver unload.
>
> Regards,
> Christian.
>
>>
>> Gražvydas
>
>
diff mbox

Patch

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index 910b8d5..cfb686e 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -522,6 +522,8 @@  static void amd_sched_process_job(struct fence *f, struct fence_cb *cb)
 
 	trace_amd_sched_process_job(s_fence);
 	fence_put(&s_fence->finished);
+	fence_put(s_fence->parent);
+	s_fence->parent = NULL;
 	wake_up_interruptible(&sched->wake_up_worker);
 }