diff mbox series

drm/i915: Release ctx->syncobj on final put, not on ctx close

Message ID 20210806201852.1345184-1-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived
Headers show
Series drm/i915: Release ctx->syncobj on final put, not on ctx close | expand

Commit Message

Daniel Vetter Aug. 6, 2021, 8:18 p.m. UTC
gem context refcounting is another exercise in least locking design it
seems, where most things get destroyed upon context closure (which can
race with anything really). Only the actual memory allocation and the
locks survive while holding a reference.

This tripped up Jason when reimplementing the single timeline feature
in

commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d
Author: Jason Ekstrand <jason@jlekstrand.net>
Date:   Thu Jul 8 10:48:12 2021 -0500

    drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)

We could fix the bug by holding ctx->mutex, but it's cleaner to just
make the context object actually invariant over its _entire_ lifetime.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Jason Ekstrand Aug. 8, 2021, 12:56 a.m. UTC | #1
On August 6, 2021 15:18:59 Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> gem context refcounting is another exercise in least locking design it
> seems, where most things get destroyed upon context closure (which can
> race with anything really). Only the actual memory allocation and the
> locks survive while holding a reference.
>
> This tripped up Jason when reimplementing the single timeline feature
> in
>
> commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d
> Author: Jason Ekstrand <jason@jlekstrand.net>
> Date:   Thu Jul 8 10:48:12 2021 -0500
>
>    drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
>
> We could fix the bug by holding ctx->mutex, but it's cleaner to just

What bug is this fixing, exactly?

--Jason

>
> make the context object actually invariant over its _entire_ lifetime.
>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")
> Cc: Jason Ekstrand <jason@jlekstrand.net>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Dave Airlie <airlied@redhat.com>
> ---
> drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 754b9b8d4981..93ba0197d70a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref)
>  trace_i915_context_free(ctx);
>  GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
>
> + if (ctx->syncobj)
> + drm_syncobj_put(ctx->syncobj);
> +
>  mutex_destroy(&ctx->engines_mutex);
>  mutex_destroy(&ctx->lut_mutex);
>
> @@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx)
>  if (vm)
>  i915_vm_close(vm);
>
> - if (ctx->syncobj)
> - drm_syncobj_put(ctx->syncobj);
> -
>  ctx->file_priv = ERR_PTR(-EBADF);
>
>  /*
> --
> 2.32.0
Daniel Vetter Aug. 9, 2021, 6:47 p.m. UTC | #2
On Sun, Aug 8, 2021 at 2:56 AM Jason Ekstrand <jason@jlekstrand.net> wrote:
>
> On August 6, 2021 15:18:59 Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
>> gem context refcounting is another exercise in least locking design it
>> seems, where most things get destroyed upon context closure (which can
>> race with anything really). Only the actual memory allocation and the
>> locks survive while holding a reference.
>>
>> This tripped up Jason when reimplementing the single timeline feature
>> in
>>
>> commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d
>> Author: Jason Ekstrand <jason@jlekstrand.net>
>> Date:   Thu Jul 8 10:48:12 2021 -0500
>>
>>     drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
>>
>> We could fix the bug by holding ctx->mutex, but it's cleaner to just
>
>
> What bug is this fixing, exactly?

Oh lol I was all busy ranting and not explaining :-) I found it while
auditing other context stuff, so that other patch has the longer
commit message with more history, but that patch is also now tied into
the vm-dercuify, so short summary: You put the syncobj in context
close (i.e. CTX_DESTRY ioctl or close(drmfd)), not in the final
kref_put. Which means you're open to a use-after-free if you race
against an execbuf. ctx->vm is equally broken (but for other ioctl),
once this fix here is merged I send out the ctx->vm fix because that's
tied into the vm-dercuify now due to conflicts.
-Daniel

>
> --Jason
>
>>
>> make the context object actually invariant over its _entire_ lifetime.
>>
>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>> Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")
>> Cc: Jason Ekstrand <jason@jlekstrand.net>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Matthew Auld <matthew.auld@intel.com>
>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Dave Airlie <airlied@redhat.com>
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>> index 754b9b8d4981..93ba0197d70a 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>> @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref)
>>   trace_i915_context_free(ctx);
>>   GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
>>
>> + if (ctx->syncobj)
>> + drm_syncobj_put(ctx->syncobj);
>> +
>>   mutex_destroy(&ctx->engines_mutex);
>>   mutex_destroy(&ctx->lut_mutex);
>>
>> @@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx)
>>   if (vm)
>>   i915_vm_close(vm);
>>
>> - if (ctx->syncobj)
>> - drm_syncobj_put(ctx->syncobj);
>> -
>>   ctx->file_priv = ERR_PTR(-EBADF);
>>
>>   /*
>> --
>> 2.32.0
>
>
Daniel Vetter Aug. 10, 2021, 10:13 a.m. UTC | #3
On Mon, Aug 09, 2021 at 08:47:14PM +0200, Daniel Vetter wrote:
> On Sun, Aug 8, 2021 at 2:56 AM Jason Ekstrand <jason@jlekstrand.net> wrote:
> >
> > On August 6, 2021 15:18:59 Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> >> gem context refcounting is another exercise in least locking design it
> >> seems, where most things get destroyed upon context closure (which can
> >> race with anything really). Only the actual memory allocation and the
> >> locks survive while holding a reference.
> >>
> >> This tripped up Jason when reimplementing the single timeline feature
> >> in
> >>
> >> commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d
> >> Author: Jason Ekstrand <jason@jlekstrand.net>
> >> Date:   Thu Jul 8 10:48:12 2021 -0500
> >>
> >>     drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
> >>
> >> We could fix the bug by holding ctx->mutex, but it's cleaner to just
> >
> >
> > What bug is this fixing, exactly?
> 
> Oh lol I was all busy ranting and not explaining :-) I found it while
> auditing other context stuff, so that other patch has the longer
> commit message with more history, but that patch is also now tied into
> the vm-dercuify, so short summary: You put the syncobj in context
> close (i.e. CTX_DESTRY ioctl or close(drmfd)), not in the final
> kref_put. Which means you're open to a use-after-free if you race
> against an execbuf. ctx->vm is equally broken (but for other ioctl),
> once this fix here is merged I send out the ctx->vm fix because that's
> tied into the vm-dercuify now due to conflicts.

CI caught more, so just explaining what I'm fixing here isn't going to be
enough.

The troubel is that drm_syncobj_put is now called from very awkward
places, and I need to see whether we can fix that. Or whether we need more
work_struct (like we have already for i915_address_space) for the final
release.
-Daniel

> -Daniel
> 
> >
> > --Jason
> >
> >>
> >> make the context object actually invariant over its _entire_ lifetime.
> >>
> >> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >> Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")
> >> Cc: Jason Ekstrand <jason@jlekstrand.net>
> >> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >> Cc: Matthew Brost <matthew.brost@intel.com>
> >> Cc: Matthew Auld <matthew.auld@intel.com>
> >> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >> Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
> >> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> >> Cc: Dave Airlie <airlied@redhat.com>
> >> ---
> >>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >> index 754b9b8d4981..93ba0197d70a 100644
> >> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >> @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref)
> >>   trace_i915_context_free(ctx);
> >>   GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
> >>
> >> + if (ctx->syncobj)
> >> + drm_syncobj_put(ctx->syncobj);
> >> +
> >>   mutex_destroy(&ctx->engines_mutex);
> >>   mutex_destroy(&ctx->lut_mutex);
> >>
> >> @@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx)
> >>   if (vm)
> >>   i915_vm_close(vm);
> >>
> >> - if (ctx->syncobj)
> >> - drm_syncobj_put(ctx->syncobj);
> >> -
> >>   ctx->file_priv = ERR_PTR(-EBADF);
> >>
> >>   /*
> >> --
> >> 2.32.0
> >
> >
> 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 754b9b8d4981..93ba0197d70a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -940,6 +940,9 @@  void i915_gem_context_release(struct kref *ref)
 	trace_i915_context_free(ctx);
 	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
 
+	if (ctx->syncobj)
+		drm_syncobj_put(ctx->syncobj);
+
 	mutex_destroy(&ctx->engines_mutex);
 	mutex_destroy(&ctx->lut_mutex);
 
@@ -1159,9 +1162,6 @@  static void context_close(struct i915_gem_context *ctx)
 	if (vm)
 		i915_vm_close(vm);
 
-	if (ctx->syncobj)
-		drm_syncobj_put(ctx->syncobj);
-
 	ctx->file_priv = ERR_PTR(-EBADF);
 
 	/*