Message ID | 20210621114123.3131534-1-maarten.lankhorst@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] drm/i915/gt: Do not allow setting ring size for legacy ring submission | expand |
I had some questions on the trybot mailing list, let me copy&paste.. On 21/06/2021 12:41, Maarten Lankhorst wrote: > It doesn't work for legacy ring submission, and is in the best case > ignored. Looks rejected instead of ignored: static int set_ringsize(struct i915_gem_context *ctx, struct drm_i915_gem_context_param *args) { if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915)) return -ENODEV; > > In the worst case we end up freeing engine->legacy.ring for all other > active engines, resulting in a use-after-free. Worst case is cloning because ring_context_alloc is not taking a reference to engine->legacy.ring, or something else? Regards, Tvrtko > > Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > Fixes: 88be76cdafc7 ("drm/i915: Allow userspace to specify ringsize on construction") > Cc: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/gt/intel_context_param.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/intel_context_param.c b/drivers/gpu/drm/i915/gt/intel_context_param.c > index 65dcd090245d..412c36d1b1dd 100644 > --- a/drivers/gpu/drm/i915/gt/intel_context_param.c > +++ b/drivers/gpu/drm/i915/gt/intel_context_param.c > @@ -12,6 +12,9 @@ int intel_context_set_ring_size(struct intel_context *ce, long sz) > { > int err; > > + if (ce->engine->gt->submission_method == INTEL_SUBMISSION_RING) > + return 0; > + > if (intel_context_lock_pinned(ce)) > return -EINTR; > >
Op 21-06-2021 om 14:08 schreef Tvrtko Ursulin: > > I had some questions on the trybot mailing list, let me copy&paste.. > > On 21/06/2021 12:41, Maarten Lankhorst wrote: >> It doesn't work for legacy ring submission, and is in the best case >> ignored. > > Looks rejected instead of ignored: > > static int set_ringsize(struct i915_gem_context *ctx, > struct drm_i915_gem_context_param *args) > { > if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915)) > return -ENODEV; >> >> In the worst case we end up freeing engine->legacy.ring for all other >> active engines, resulting in a use-after-free. > > Worst case is cloning because ring_context_alloc is not taking a reference to engine->legacy.ring, or something else? > > Regards, > > Tvrtko I only noticed this because tests started failing, if it should already hit -ENODEV then that's weird.. See: https://patchwork.freedesktop.org/series/91501/ for the failure. It should not hit the INCOMPLETEs there. The legacy contexts don't grab a reference to engine->legacy.ring, but a copy to the pointer, presumably because its lifetime is always shorter than the ring lifetime, so it will actually free it. ~Maarten
On 21/06/2021 13:08, Tvrtko Ursulin wrote: > > I had some questions on the trybot mailing list, let me copy&paste.. > > On 21/06/2021 12:41, Maarten Lankhorst wrote: >> It doesn't work for legacy ring submission, and is in the best case >> ignored. > > Looks rejected instead of ignored: > > static int set_ringsize(struct i915_gem_context *ctx, > struct drm_i915_gem_context_param *args) > { > if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915)) > return -ENODEV; >> >> In the worst case we end up freeing engine->legacy.ring for all other >> active engines, resulting in a use-after-free. > > Worst case is cloning because ring_context_alloc is not taking a > reference to engine->legacy.ring, or something else? No can't be that, it was my incomplete analysis last week. Since ring_context_destroy does not actually free the legacy ring I don't see any use after free paths. Regards, Tvrtko >> >> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> >> Fixes: 88be76cdafc7 ("drm/i915: Allow userspace to specify ringsize on >> construction") >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> --- >> drivers/gpu/drm/i915/gt/intel_context_param.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/gt/intel_context_param.c >> b/drivers/gpu/drm/i915/gt/intel_context_param.c >> index 65dcd090245d..412c36d1b1dd 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_context_param.c >> +++ b/drivers/gpu/drm/i915/gt/intel_context_param.c >> @@ -12,6 +12,9 @@ int intel_context_set_ring_size(struct intel_context >> *ce, long sz) >> { >> int err; >> + if (ce->engine->gt->submission_method == INTEL_SUBMISSION_RING) >> + return 0; >> + >> if (intel_context_lock_pinned(ce)) >> return -EINTR; >> > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Op 21-06-2021 om 14:52 schreef Tvrtko Ursulin: > > On 21/06/2021 13:08, Tvrtko Ursulin wrote: >> >> I had some questions on the trybot mailing list, let me copy&paste.. >> >> On 21/06/2021 12:41, Maarten Lankhorst wrote: >>> It doesn't work for legacy ring submission, and is in the best case >>> ignored. >> >> Looks rejected instead of ignored: >> >> static int set_ringsize(struct i915_gem_context *ctx, >> struct drm_i915_gem_context_param *args) >> { >> if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915)) >> return -ENODEV; >>> >>> In the worst case we end up freeing engine->legacy.ring for all other >>> active engines, resulting in a use-after-free. >> >> Worst case is cloning because ring_context_alloc is not taking a reference to engine->legacy.ring, or something else? > > No can't be that, it was my incomplete analysis last week. Since ring_context_destroy does not actually free the legacy ring I don't see any use after free paths. > > Regards, Hmm, it gets stuck inside intel_context_set_ring_size when cloning engines.. I guess it can't happen in practice, just the code introduces the race by preallocating inside intel_context_lock_pinned().. copy_ring_size() should only be called for HAS_LOGICAL_RING_CONTEXTS(). I guess that makes this patch obsolete. It can safely be dropped from the series, I think I should probably introduce a check to only set the size when HAS_LOGICAL_RING_CONTEXTS evaluates to true, but that wouldn't block the rest of this series. ~Maarten
On 21/06/2021 14:07, Maarten Lankhorst wrote: > Op 21-06-2021 om 14:52 schreef Tvrtko Ursulin: >> >> On 21/06/2021 13:08, Tvrtko Ursulin wrote: >>> >>> I had some questions on the trybot mailing list, let me copy&paste.. >>> >>> On 21/06/2021 12:41, Maarten Lankhorst wrote: >>>> It doesn't work for legacy ring submission, and is in the best case >>>> ignored. >>> >>> Looks rejected instead of ignored: >>> >>> static int set_ringsize(struct i915_gem_context *ctx, >>> struct drm_i915_gem_context_param *args) >>> { >>> if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915)) >>> return -ENODEV; >>>> >>>> In the worst case we end up freeing engine->legacy.ring for all other >>>> active engines, resulting in a use-after-free. >>> >>> Worst case is cloning because ring_context_alloc is not taking a reference to engine->legacy.ring, or something else? >> >> No can't be that, it was my incomplete analysis last week. Since ring_context_destroy does not actually free the legacy ring I don't see any use after free paths. >> >> Regards, > > Hmm, it gets stuck inside intel_context_set_ring_size when cloning engines.. > > I guess it can't happen in practice, just the code introduces the race by preallocating > inside intel_context_lock_pinned().. "The code" being the rest of your series? Haven't looked in there, but can't find a problem in upstream. Since as you say, copy_ring_size will run but intel_context_set_ring_size will not free-and-allocate old/new ring since cloned context does not have a state allocated yet. Regards, Tvrtko > copy_ring_size() should only be called for HAS_LOGICAL_RING_CONTEXTS(). > I guess that makes this patch obsolete. It can safely be dropped from the series, > I think I should probably introduce a check to only set the size when HAS_LOGICAL_RING_CONTEXTS > evaluates to true, but that wouldn't block the rest of this series. > > ~Maarten >
On 21/06/2021 14:12, Tvrtko Ursulin wrote: > > On 21/06/2021 14:07, Maarten Lankhorst wrote: >> Op 21-06-2021 om 14:52 schreef Tvrtko Ursulin: >>> >>> On 21/06/2021 13:08, Tvrtko Ursulin wrote: >>>> >>>> I had some questions on the trybot mailing list, let me copy&paste.. >>>> >>>> On 21/06/2021 12:41, Maarten Lankhorst wrote: >>>>> It doesn't work for legacy ring submission, and is in the best case >>>>> ignored. >>>> >>>> Looks rejected instead of ignored: >>>> >>>> static int set_ringsize(struct i915_gem_context *ctx, >>>> struct drm_i915_gem_context_param *args) >>>> { >>>> if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915)) >>>> return -ENODEV; >>>>> >>>>> In the worst case we end up freeing engine->legacy.ring for all other >>>>> active engines, resulting in a use-after-free. >>>> >>>> Worst case is cloning because ring_context_alloc is not taking a >>>> reference to engine->legacy.ring, or something else? >>> >>> No can't be that, it was my incomplete analysis last week. Since >>> ring_context_destroy does not actually free the legacy ring I don't >>> see any use after free paths. >>> >>> Regards, >> >> Hmm, it gets stuck inside intel_context_set_ring_size when cloning >> engines.. >> >> I guess it can't happen in practice, just the code introduces the race >> by preallocating >> inside intel_context_lock_pinned().. > > "The code" being the rest of your series? Haven't looked in there, but > can't find a problem in upstream. Since as you say, copy_ring_size will > run but intel_context_set_ring_size will not free-and-allocate old/new > ring since cloned context does not have a state allocated yet. P.S. Putting a HAS_LOGICAL_RING_CONTEXTS check in copy_ring_size would be a bit unfortunate because layering is a bit broken at the moment and that wouldn't make it better. To clarify my thinking: At the moment allocating the ring is responsibility of a backend specific hook. Apart from the generic intel_context_set_ring_size which breaks that by allocating in the layer above the backend. So proper fix could be to introduce backend specific hooks for ring allocation/freeing. *If* you need to allocate the state so early.. not sure about that. I'd first need to understand why. If you say it is a race then it was all accidental? Regards, Tvrtko > Regards, > > Tvrtko > >> copy_ring_size() should only be called for HAS_LOGICAL_RING_CONTEXTS(). >> I guess that makes this patch obsolete. It can safely be dropped from >> the series, >> I think I should probably introduce a check to only set the size when >> HAS_LOGICAL_RING_CONTEXTS >> evaluates to true, but that wouldn't block the rest of this series. >> >> ~Maarten >>
Op 21-06-2021 om 15:20 schreef Tvrtko Ursulin: > > On 21/06/2021 14:12, Tvrtko Ursulin wrote: >> >> On 21/06/2021 14:07, Maarten Lankhorst wrote: >>> Op 21-06-2021 om 14:52 schreef Tvrtko Ursulin: >>>> >>>> On 21/06/2021 13:08, Tvrtko Ursulin wrote: >>>>> >>>>> I had some questions on the trybot mailing list, let me copy&paste.. >>>>> >>>>> On 21/06/2021 12:41, Maarten Lankhorst wrote: >>>>>> It doesn't work for legacy ring submission, and is in the best case >>>>>> ignored. >>>>> >>>>> Looks rejected instead of ignored: >>>>> >>>>> static int set_ringsize(struct i915_gem_context *ctx, >>>>> struct drm_i915_gem_context_param *args) >>>>> { >>>>> if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915)) >>>>> return -ENODEV; >>>>>> >>>>>> In the worst case we end up freeing engine->legacy.ring for all other >>>>>> active engines, resulting in a use-after-free. >>>>> >>>>> Worst case is cloning because ring_context_alloc is not taking a reference to engine->legacy.ring, or something else? >>>> >>>> No can't be that, it was my incomplete analysis last week. Since ring_context_destroy does not actually free the legacy ring I don't see any use after free paths. >>>> >>>> Regards, >>> >>> Hmm, it gets stuck inside intel_context_set_ring_size when cloning engines.. >>> >>> I guess it can't happen in practice, just the code introduces the race by preallocating >>> inside intel_context_lock_pinned().. >> >> "The code" being the rest of your series? Haven't looked in there, but can't find a problem in upstream. Since as you say, copy_ring_size will run but intel_context_set_ring_size will not free-and-allocate old/new ring since cloned context does not have a state allocated yet. > > P.S. Putting a HAS_LOGICAL_RING_CONTEXTS check in copy_ring_size would be a bit unfortunate because layering is a bit broken at the moment and that wouldn't make it better. > > To clarify my thinking: At the moment allocating the ring is responsibility of a backend specific hook. Apart from the generic intel_context_set_ring_size which breaks that by allocating in the layer above the backend. So proper fix could be to introduce backend specific hooks for ring allocation/freeing. > > *If* you need to allocate the state so early.. not sure about that. I'd first need to understand why. If you say it is a race then it was all accidental? I noticed it mostly when debugging. I fixed it currenly by not allocating state in set_ring_size unnecessarily, hence this patch is no longer needed. :) So if that's the only thing, I can just drop this patch entirely.
diff --git a/drivers/gpu/drm/i915/gt/intel_context_param.c b/drivers/gpu/drm/i915/gt/intel_context_param.c index 65dcd090245d..412c36d1b1dd 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_param.c +++ b/drivers/gpu/drm/i915/gt/intel_context_param.c @@ -12,6 +12,9 @@ int intel_context_set_ring_size(struct intel_context *ce, long sz) { int err; + if (ce->engine->gt->submission_method == INTEL_SUBMISSION_RING) + return 0; + if (intel_context_lock_pinned(ce)) return -EINTR;
It doesn't work for legacy ring submission, and is in the best case ignored. In the worst case we end up freeing engine->legacy.ring for all other active engines, resulting in a use-after-free. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Fixes: 88be76cdafc7 ("drm/i915: Allow userspace to specify ringsize on construction") Cc: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/gt/intel_context_param.c | 3 +++ 1 file changed, 3 insertions(+)