drm/i915: Fix syncmap memory leak

Message ID 20210730195342.110234-1-matthew.brost@intel.com (mailing list archive)
State New, archived

Commit Message

Matthew Brost July 30, 2021, 7:53 p.m. UTC
A small race exists between intel_gt_retire_requests_timeout and
intel_timeline_exit which could result in the syncmap not getting
freed. Rather than work too hard to seal this race, simply clean up the
syncmap on fini.

unreferenced object 0xffff88813bc53b18 (size 96):
  comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00  ................
    00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00  ........kkkk....
  backtrace:
    [<00000000120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915]
    [<00000000042f6959>] __sync_set+0x1bb/0x240 [i915]
    [<0000000090f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915]
    [<0000000056a48219>] i915_request_await_object+0x222/0x360 [i915]
    [<00000000aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915]
    [<000000003c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915]
    [<00000000fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm]
    [<00000000e721ee87>] drm_ioctl+0x305/0x3c0 [drm]
    [<000000008b0d8986>] __x64_sys_ioctl+0x71/0xb0
    [<0000000076c362a4>] do_syscall_64+0x33/0x80
    [<00000000eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit")
Cc: <stable@vger.kernel.org>
---
 drivers/gpu/drm/i915/gt/intel_timeline.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

John Harrison Aug. 6, 2021, 6:23 p.m. UTC | #1
On 7/30/2021 12:53, Matthew Brost wrote:
> A small race exists between intel_gt_retire_requests_timeout and
> intel_timeline_exit which could result in the syncmap not getting
> free'd. Rather than work to hard to seal this race, simply cleanup the
free'd -> freed

> syncmap on fini.
>
> unreferenced object 0xffff88813bc53b18 (size 96):
>    comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s)
>    hex dump (first 32 bytes):
>      01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00  ................
>      00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00  ........kkkk....
>    backtrace:
>      [<00000000120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915]
>      [<00000000042f6959>] __sync_set+0x1bb/0x240 [i915]
>      [<0000000090f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915]
>      [<0000000056a48219>] i915_request_await_object+0x222/0x360 [i915]
>      [<00000000aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915]
>      [<000000003c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915]
>      [<00000000fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm]
>      [<00000000e721ee87>] drm_ioctl+0x305/0x3c0 [drm]
>      [<000000008b0d8986>] __x64_sys_ioctl+0x71/0xb0
>      [<0000000076c362a4>] do_syscall_64+0x33/0x80
>      [<00000000eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit")
> Cc: <stable@vger.kernel.org>
> ---
>   drivers/gpu/drm/i915/gt/intel_timeline.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> index c4a126c8caef..1257f4f11e66 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> @@ -127,6 +127,15 @@ static void intel_timeline_fini(struct rcu_head *rcu)
>   
>   	i915_vma_put(timeline->hwsp_ggtt);
>   	i915_active_fini(&timeline->active);
> +
> +	/*
> +	 * A small race exists between intel_gt_retire_requests_timeout and
> +	 * intel_timeline_exit which could result in the syncmap not getting
> +	 * free'd. Rather than work to hard to seal this race, simply cleanup
> +	 * the syncmap on fini.
What is the race? I'm going round in circles just trying to work out how 
intel_gt_retire_requests_timeout is supposed to get to 
intel_timeline_exit in the first place.

Also, free'd -> freed.

John.


> +	 */
> +	i915_syncmap_free(&timeline->sync);
> +
>   	kfree(timeline);
>   }
>
Matthew Brost Aug. 6, 2021, 6:29 p.m. UTC | #2
On Fri, Aug 06, 2021 at 11:23:06AM -0700, John Harrison wrote:
> On 7/30/2021 12:53, Matthew Brost wrote:
> > A small race exists between intel_gt_retire_requests_timeout and
> > intel_timeline_exit which could result in the syncmap not getting
> > free'd. Rather than work to hard to seal this race, simply cleanup the
> free'd -> freed
> 

Sure.

> > syncmap on fini.
> > 
> > unreferenced object 0xffff88813bc53b18 (size 96):
> >    comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s)
> >    hex dump (first 32 bytes):
> >      01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00  ................
> >      00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00  ........kkkk....
> >    backtrace:
> >      [<00000000120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915]
> >      [<00000000042f6959>] __sync_set+0x1bb/0x240 [i915]
> >      [<0000000090f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915]
> >      [<0000000056a48219>] i915_request_await_object+0x222/0x360 [i915]
> >      [<00000000aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915]
> >      [<000000003c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915]
> >      [<00000000fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm]
> >      [<00000000e721ee87>] drm_ioctl+0x305/0x3c0 [drm]
> >      [<000000008b0d8986>] __x64_sys_ioctl+0x71/0xb0
> >      [<0000000076c362a4>] do_syscall_64+0x33/0x80
> >      [<00000000eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit")
> > Cc: <stable@vger.kernel.org>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_timeline.c | 9 +++++++++
> >   1 file changed, 9 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> > index c4a126c8caef..1257f4f11e66 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> > @@ -127,6 +127,15 @@ static void intel_timeline_fini(struct rcu_head *rcu)
> >   	i915_vma_put(timeline->hwsp_ggtt);
> >   	i915_active_fini(&timeline->active);
> > +
> > +	/*
> > +	 * A small race exists between intel_gt_retire_requests_timeout and
> > +	 * intel_timeline_exit which could result in the syncmap not getting
> > +	 * free'd. Rather than work to hard to seal this race, simply cleanup
> > +	 * the syncmap on fini.
> What is the race? I'm going round in circles just trying to work out how
> intel_gt_retire_requests_timeout is supposed to get to intel_timeline_exit
> in the first place.
> 

intel_gt_retire_requests_timeout increments tl->active_count, active_count == 2
intel_timeline_exit is called, returns on atomic_add_unless, active_count == 1
intel_gt_retire_requests_timeout decrements tl->active_count, active_count == 0
i915_syncmap_free is never called, memory leak
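
The four steps above can be replayed in a small userspace model. This is only a sketch under stated assumptions, not the driver code: struct timeline_model, add_unless(), model_timeline_exit(), model_timeline_fini() and syncmap_leaks() are hypothetical names, C11 atomics stand in for the kernel's atomic_t helpers, a boolean stands in for timeline->sync, and the exit path is assumed to return early whenever atomic_add_unless(&tl->active_count, -1, 1) succeeds, as the step list describes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two timeline fields that matter here. */
struct timeline_model {
	atomic_int active_count;
	bool syncmap_allocated;	/* models timeline->sync holding memory */
};

/*
 * Userspace approximation of the kernel's atomic_add_unless():
 * add a to *v unless *v == u; return true if the add happened.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int c = atomic_load(v);

	while (c != u) {
		/* On failure, c is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &c, c + a))
			return true;
	}
	return false;
}

/* Assumed shape of the exit path: only the last user frees the syncmap. */
static void model_timeline_exit(struct timeline_model *tl)
{
	assert(atomic_load(&tl->active_count) > 0);

	/* Fast path: not the last user, drop one count and return. */
	if (add_unless(&tl->active_count, -1, 1))
		return;

	/* Slow path: last user, drop to zero and free the syncmap. */
	atomic_fetch_sub(&tl->active_count, 1);
	tl->syncmap_allocated = false;
}

/* With the patch applied, fini frees the syncmap unconditionally. */
static void model_timeline_fini(struct timeline_model *tl, bool patched)
{
	if (patched)
		tl->syncmap_allocated = false;
}

/* Replays the interleaving; returns true if the syncmap is leaked. */
bool syncmap_leaks(bool patched)
{
	struct timeline_model tl = { .syncmap_allocated = true };

	atomic_init(&tl.active_count, 1);	/* timeline is active */
	atomic_fetch_add(&tl.active_count, 1);	/* retire takes a count: 2 */
	model_timeline_exit(&tl);		/* fast path returns: 1 */
	atomic_fetch_sub(&tl.active_count, 1);	/* retire drops its count: 0 */
	assert(atomic_load(&tl.active_count) == 0);

	model_timeline_fini(&tl, patched);
	return tl.syncmap_allocated;
}
```

In this model syncmap_leaks(false) reports the leak, because the count reaches zero on a path that never frees the syncmap, while syncmap_leaks(true) does not, because the fini path frees it regardless of which decrement came last.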

Matt

> Also, free'd -> freed.
> 
> John.
> 
> 
> > +	 */
> > +	i915_syncmap_free(&timeline->sync);
> > +
> >   	kfree(timeline);
> >   }
>
John Harrison Aug. 6, 2021, 8:16 p.m. UTC | #3
On 8/6/2021 11:29, Matthew Brost wrote:
> On Fri, Aug 06, 2021 at 11:23:06AM -0700, John Harrison wrote:
>> On 7/30/2021 12:53, Matthew Brost wrote:
>>> A small race exists between intel_gt_retire_requests_timeout and
>>> intel_timeline_exit which could result in the syncmap not getting
>>> free'd. Rather than work to hard to seal this race, simply cleanup the
>> free'd -> freed
>>
> Sure.
>
>>> syncmap on fini.
>>>
>>> unreferenced object 0xffff88813bc53b18 (size 96):
>>>     comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s)
>>>     hex dump (first 32 bytes):
>>>       01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00  ................
>>>       00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00  ........kkkk....
>>>     backtrace:
>>>       [<00000000120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915]
>>>       [<00000000042f6959>] __sync_set+0x1bb/0x240 [i915]
>>>       [<0000000090f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915]
>>>       [<0000000056a48219>] i915_request_await_object+0x222/0x360 [i915]
>>>       [<00000000aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915]
>>>       [<000000003c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915]
>>>       [<00000000fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm]
>>>       [<00000000e721ee87>] drm_ioctl+0x305/0x3c0 [drm]
>>>       [<000000008b0d8986>] __x64_sys_ioctl+0x71/0xb0
>>>       [<0000000076c362a4>] do_syscall_64+0x33/0x80
>>>       [<00000000eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit")
>>> Cc: <stable@vger.kernel.org>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_timeline.c | 9 +++++++++
>>>    1 file changed, 9 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> index c4a126c8caef..1257f4f11e66 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
>>> @@ -127,6 +127,15 @@ static void intel_timeline_fini(struct rcu_head *rcu)
>>>    	i915_vma_put(timeline->hwsp_ggtt);
>>>    	i915_active_fini(&timeline->active);
>>> +
>>> +	/*
>>> +	 * A small race exists between intel_gt_retire_requests_timeout and
>>> +	 * intel_timeline_exit which could result in the syncmap not getting
>>> +	 * free'd. Rather than work to hard to seal this race, simply cleanup
>>> +	 * the syncmap on fini.
>> What is the race? I'm going round in circles just trying to work out how
>> intel_gt_retire_requests_timeout is supposed to get to intel_timeline_exit
>> in the first place.
>>
> intel_gt_retire_requests_timeout increments tl->active_count, active_count == 2
> intel_timeline_exit is called, returns on atomic_add_unless, active_count == 1
> intel_gt_retire_requests_timeout decrements tl->active_count, active_count == 0
> i915_syncmap_free is never called, memory leak
>
> Matt
Okay. Think I follow it now.

Seems like the syncmap free should have been in timeline_fini instead of 
timeline_exit in the first place?

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>


>
>> Also, free'd -> freed.
>>
>> John.
>>
>>
>>> +	 */
>>> +	i915_syncmap_free(&timeline->sync);
>>> +
>>>    	kfree(timeline);
>>>    }
Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index c4a126c8caef..1257f4f11e66 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -127,6 +127,15 @@  static void intel_timeline_fini(struct rcu_head *rcu)
 
 	i915_vma_put(timeline->hwsp_ggtt);
 	i915_active_fini(&timeline->active);
+
+	/*
+	 * A small race exists between intel_gt_retire_requests_timeout and
+	 * intel_timeline_exit which could result in the syncmap not getting
+	 * free'd. Rather than work to hard to seal this race, simply cleanup
+	 * the syncmap on fini.
+	 */
+	i915_syncmap_free(&timeline->sync);
+
 	kfree(timeline);
 }