Message ID | 20210730195342.110234-1-matthew.brost@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915: Fix syncmap memory leak |
On 7/30/2021 12:53, Matthew Brost wrote:
> A small race exists between intel_gt_retire_requests_timeout and
> intel_timeline_exit which could result in the syncmap not getting
> free'd. Rather than work to hard to seal this race, simply cleanup the

free'd -> freed

> syncmap on fini.
>
> unreferenced object 0xffff88813bc53b18 (size 96):
>   comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s)
>   hex dump (first 32 bytes):
>     01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00  ................
>     00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00  ........kkkk....
>   backtrace:
>     [<00000000120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915]
>     [<00000000042f6959>] __sync_set+0x1bb/0x240 [i915]
>     [<0000000090f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915]
>     [<0000000056a48219>] i915_request_await_object+0x222/0x360 [i915]
>     [<00000000aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915]
>     [<000000003c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915]
>     [<00000000fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm]
>     [<00000000e721ee87>] drm_ioctl+0x305/0x3c0 [drm]
>     [<000000008b0d8986>] __x64_sys_ioctl+0x71/0xb0
>     [<0000000076c362a4>] do_syscall_64+0x33/0x80
>     [<00000000eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit")
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/gpu/drm/i915/gt/intel_timeline.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> index c4a126c8caef..1257f4f11e66 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> @@ -127,6 +127,15 @@ static void intel_timeline_fini(struct rcu_head *rcu)
>
>  	i915_vma_put(timeline->hwsp_ggtt);
>  	i915_active_fini(&timeline->active);
> +
> +	/*
> +	 * A small race exists between intel_gt_retire_requests_timeout and
> +	 * intel_timeline_exit which could result in the syncmap not getting
> +	 * free'd. Rather than work to hard to seal this race, simply cleanup
> +	 * the syncmap on fini.

What is the race? I'm going round in circles just trying to work out how
intel_gt_retire_requests_timeout is supposed to get to intel_timeline_exit
in the first place.

Also, free'd -> freed.

John.

> +	 */
> +	i915_syncmap_free(&timeline->sync);
> +
>  	kfree(timeline);
>  }
>
On Fri, Aug 06, 2021 at 11:23:06AM -0700, John Harrison wrote:
> On 7/30/2021 12:53, Matthew Brost wrote:
> > A small race exists between intel_gt_retire_requests_timeout and
> > intel_timeline_exit which could result in the syncmap not getting
> > free'd. Rather than work to hard to seal this race, simply cleanup the
> free'd -> freed
>

Sure.

> > syncmap on fini.
[...]
> > @@ -127,6 +127,15 @@ static void intel_timeline_fini(struct rcu_head *rcu)
> >
> >  	i915_vma_put(timeline->hwsp_ggtt);
> >  	i915_active_fini(&timeline->active);
> > +
> > +	/*
> > +	 * A small race exists between intel_gt_retire_requests_timeout and
> > +	 * intel_timeline_exit which could result in the syncmap not getting
> > +	 * free'd. Rather than work to hard to seal this race, simply cleanup
> > +	 * the syncmap on fini.
> What is the race? I'm going round in circles just trying to work out how
> intel_gt_retire_requests_timeout is supposed to get to intel_timeline_exit
> in the first place.
>

intel_gt_retire_requests_timeout increments tl->active_count, active_count == 2
intel_timeline_exit is called, returns on atomic_add_unless, active_count == 1
intel_gt_retire_requests_timeout decrements tl->active_count, active_count == 0
i915_syncmap_free is never called, memory leak

Matt

> Also, free'd -> freed.
>
> John.
>
> > +	 */
> > +	i915_syncmap_free(&timeline->sync);
> > +
> >  	kfree(timeline);
> >  }
>
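The four-step interleaving above can be replayed deterministically in userspace. What follows is a minimal model rather than the i915 code: `model_timeline`, `model_add_unless`, `model_timeline_exit` and `replay` are hypothetical names, C11 atomics stand in for the kernel's `atomic_t`, and a boolean stands in for the syncmap. It assumes the exit path only discards the syncmap when it drops the last count, which is what step 2 implies.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct intel_timeline: only the fields the race touches. */
struct model_timeline {
	atomic_int active_count;
	bool sync_live;		/* true while the (modelled) syncmap is allocated */
};

/* Userspace model of atomic_add_unless(): add a to *v unless *v == u; true if added. */
static bool model_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, old is refreshed with the current value and we retry. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Model of the exit path: only the last holder discards the syncmap. */
static void model_timeline_exit(struct model_timeline *tl)
{
	assert(atomic_load(&tl->active_count) > 0);

	/* Not the last holder: drop one count and return without freeing. */
	if (model_add_unless(&tl->active_count, -1, 1))
		return;

	/* Last holder: drop the final count and discard the syncmap. */
	atomic_fetch_sub(&tl->active_count, 1);
	tl->sync_live = false;
}

/*
 * Replay one timeline going idle. With retire_races set, the retire path
 * pins the timeline around the exit call, as in the steps described above.
 * Returns true if the timeline ends up idle with its syncmap still live.
 */
static bool replay(bool retire_races)
{
	struct model_timeline tl;

	atomic_init(&tl.active_count, 1);
	tl.sync_live = true;

	if (retire_races)
		atomic_fetch_add(&tl.active_count, 1);	/* step 1: count == 2 */

	model_timeline_exit(&tl);			/* step 2: early return, count == 1 */

	if (retire_races)
		atomic_fetch_sub(&tl.active_count, 1);	/* step 3: count == 0 */

	/* step 4: nobody is left to discard the syncmap */
	return atomic_load(&tl.active_count) == 0 && tl.sync_live;
}
```

In this model `replay(true)` reports a leak and `replay(false)` does not: the retire path drops the last count, but only the exit path knows how to discard the syncmap.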
On 8/6/2021 11:29, Matthew Brost wrote:
> On Fri, Aug 06, 2021 at 11:23:06AM -0700, John Harrison wrote:
>> On 7/30/2021 12:53, Matthew Brost wrote:
[...]
>>> +	/*
>>> +	 * A small race exists between intel_gt_retire_requests_timeout and
>>> +	 * intel_timeline_exit which could result in the syncmap not getting
>>> +	 * free'd. Rather than work to hard to seal this race, simply cleanup
>>> +	 * the syncmap on fini.
>> What is the race? I'm going round in circles just trying to work out how
>> intel_gt_retire_requests_timeout is supposed to get to intel_timeline_exit
>> in the first place.
>>
> intel_gt_retire_requests_timeout increments tl->active_count, active_count == 2
> intel_timeline_exit is called, returns on atomic_add_unless, active_count == 1
> intel_gt_retire_requests_timeout decrements tl->active_count, active_count == 0
> i915_syncmap_free is never called, memory leak
>
> Matt

Okay. Think I follow it now. Seems like the syncmap free should have been in
timeline_fini instead of timeline_exit in the first place?

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

>
>> Also, free'd -> freed.
>>
>> John.
>>
>>
>>> +	 */
>>> +	i915_syncmap_free(&timeline->sync);
>>> +
>>>  	kfree(timeline);
>>>  }
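Freeing on fini is only safe if the free helper tolerates a syncmap that the exit path has already discarded. The sketch below models that property in userspace, on the assumption that the helper takes a pointer to the root, returns early when it is NULL, and clears it after freeing. `model_timeline`, `model_syncmap_free`, `model_timeline_fini` and `frees_for_lifetime` are hypothetical names, and `malloc`/`free` stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct intel_timeline: just the syncmap root. */
struct model_timeline {
	void *sync;
};

/* Counts real frees so a caller can check for leaks and double frees. */
static int nr_frees;

/* Assumed helper behaviour: no-op on an empty root, clears the root after freeing. */
static void model_syncmap_free(void **root)
{
	if (!*root)
		return;

	free(*root);
	*root = NULL;
	nr_frees++;
}

/* Model of the patched fini path: always try to free the syncmap. */
static void model_timeline_fini(struct model_timeline *tl)
{
	model_syncmap_free(&tl->sync);
	free(tl);
}

/*
 * Run one timeline lifetime and return how many times the syncmap was
 * really freed. exit_freed_first models the normal case where the exit
 * path already discarded the syncmap before fini runs.
 */
static int frees_for_lifetime(bool exit_freed_first)
{
	struct model_timeline *tl = calloc(1, sizeof(*tl));

	assert(tl);
	tl->sync = malloc(96);	/* same size as the leaked object in the report */
	assert(tl->sync);
	nr_frees = 0;

	if (exit_freed_first)
		model_syncmap_free(&tl->sync);

	model_timeline_fini(tl);
	return nr_frees;
}
```

Under these assumptions both `frees_for_lifetime(true)` and `frees_for_lifetime(false)` return 1: the syncmap is released exactly once whether or not the exit path got to it first.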
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index c4a126c8caef..1257f4f11e66 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -127,6 +127,15 @@ static void intel_timeline_fini(struct rcu_head *rcu)

 	i915_vma_put(timeline->hwsp_ggtt);
 	i915_active_fini(&timeline->active);
+
+	/*
+	 * A small race exists between intel_gt_retire_requests_timeout and
+	 * intel_timeline_exit which could result in the syncmap not getting
+	 * free'd. Rather than work to hard to seal this race, simply cleanup
+	 * the syncmap on fini.
+	 */
+	i915_syncmap_free(&timeline->sync);
+
 	kfree(timeline);
 }
A small race exists between intel_gt_retire_requests_timeout and
intel_timeline_exit which could result in the syncmap not getting
free'd. Rather than work to hard to seal this race, simply cleanup the
syncmap on fini.

unreferenced object 0xffff88813bc53b18 (size 96):
  comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00  ................
    00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00  ........kkkk....
  backtrace:
    [<00000000120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915]
    [<00000000042f6959>] __sync_set+0x1bb/0x240 [i915]
    [<0000000090f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915]
    [<0000000056a48219>] i915_request_await_object+0x222/0x360 [i915]
    [<00000000aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915]
    [<000000003c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915]
    [<00000000fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm]
    [<00000000e721ee87>] drm_ioctl+0x305/0x3c0 [drm]
    [<000000008b0d8986>] __x64_sys_ioctl+0x71/0xb0
    [<0000000076c362a4>] do_syscall_64+0x33/0x80
    [<00000000eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit")
Cc: <stable@vger.kernel.org>
---
 drivers/gpu/drm/i915/gt/intel_timeline.c | 9 +++++++++
 1 file changed, 9 insertions(+)