diff mbox series

[1/3] drm/i915/gt: Do not allow setting ring size for legacy ring submission

Message ID 20210621114123.3131534-1-maarten.lankhorst@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series [1/3] drm/i915/gt: Do not allow setting ring size for legacy ring submission | expand

Commit Message

Maarten Lankhorst June 21, 2021, 11:41 a.m. UTC
It doesn't work for legacy ring submission, and is in the best case
ignored.

In the worst case we end up freeing engine->legacy.ring for all other
active engines, resulting in a use-after-free.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: 88be76cdafc7 ("drm/i915: Allow userspace to specify ringsize on construction")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_context_param.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Tvrtko Ursulin June 21, 2021, 12:08 p.m. UTC | #1
I had some questions on the trybot mailing list, let me copy&paste..

On 21/06/2021 12:41, Maarten Lankhorst wrote:
> It doesn't work for legacy ring submission, and is in the best case
> ignored.

Looks rejected instead of ignored:

static int set_ringsize(struct i915_gem_context *ctx,
             struct drm_i915_gem_context_param *args)
{
     if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915))
         return -ENODEV;
> 
> In the worst case we end up freeing engine->legacy.ring for all other
> active engines, resulting in a use-after-free.

Worst case is cloning because ring_context_alloc is not taking a 
reference to engine->legacy.ring, or something else?

Regards,

Tvrtko

> 
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Fixes: 88be76cdafc7 ("drm/i915: Allow userspace to specify ringsize on construction")
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_context_param.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_param.c b/drivers/gpu/drm/i915/gt/intel_context_param.c
> index 65dcd090245d..412c36d1b1dd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_param.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context_param.c
> @@ -12,6 +12,9 @@ int intel_context_set_ring_size(struct intel_context *ce, long sz)
>   {
>   	int err;
>   
> +	if (ce->engine->gt->submission_method == INTEL_SUBMISSION_RING)
> +		return 0;
> +
>   	if (intel_context_lock_pinned(ce))
>   		return -EINTR;
>   
>
Maarten Lankhorst June 21, 2021, 12:49 p.m. UTC | #2
Op 21-06-2021 om 14:08 schreef Tvrtko Ursulin:
>
> I had some questions on the trybot mailing list, let me copy&paste..
>
> On 21/06/2021 12:41, Maarten Lankhorst wrote:
>> It doesn't work for legacy ring submission, and is in the best case
>> ignored.
>
> Looks rejected instead of ignored:
>
> static int set_ringsize(struct i915_gem_context *ctx,
>            struct drm_i915_gem_context_param *args)
> {
>    if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915))
>        return -ENODEV;
>>
>> In the worst case we end up freeing engine->legacy.ring for all other
>> active engines, resulting in a use-after-free.
>
> Worst case is cloning because ring_context_alloc is not taking a reference to engine->legacy.ring, or something else?
>
> Regards,
>
> Tvrtko 

I only noticed this because tests started failing, if it should already hit -ENODEV then that's weird..

See: https://patchwork.freedesktop.org/series/91501/ for the failure.

It should not hit the INCOMPLETEs there.

The legacy contexts don't grab a reference to engine->legacy.ring, but a copy to the pointer, presumably because its lifetime is always shorter than the ring lifetime, so it will actually free it.

~Maarten
Tvrtko Ursulin June 21, 2021, 12:52 p.m. UTC | #3
On 21/06/2021 13:08, Tvrtko Ursulin wrote:
> 
> I had some questions on the trybot mailing list, let me copy&paste..
> 
> On 21/06/2021 12:41, Maarten Lankhorst wrote:
>> It doesn't work for legacy ring submission, and is in the best case
>> ignored.
> 
> Looks rejected instead of ignored:
> 
> static int set_ringsize(struct i915_gem_context *ctx,
>              struct drm_i915_gem_context_param *args)
> {
>      if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915))
>          return -ENODEV;
>>
>> In the worst case we end up freeing engine->legacy.ring for all other
>> active engines, resulting in a use-after-free.
> 
> Worst case is cloning because ring_context_alloc is not taking a 
> reference to engine->legacy.ring, or something else?

No can't be that, it was my incomplete analysis last week. Since 
ring_context_destroy does not actually free the legacy ring I don't see 
any use after free paths.

Regards,

Tvrtko

>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Fixes: 88be76cdafc7 ("drm/i915: Allow userspace to specify ringsize on 
>> construction")
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_context_param.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_context_param.c 
>> b/drivers/gpu/drm/i915/gt/intel_context_param.c
>> index 65dcd090245d..412c36d1b1dd 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_context_param.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_context_param.c
>> @@ -12,6 +12,9 @@ int intel_context_set_ring_size(struct intel_context 
>> *ce, long sz)
>>   {
>>       int err;
>> +    if (ce->engine->gt->submission_method == INTEL_SUBMISSION_RING)
>> +        return 0;
>> +
>>       if (intel_context_lock_pinned(ce))
>>           return -EINTR;
>>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Maarten Lankhorst June 21, 2021, 1:07 p.m. UTC | #4
Op 21-06-2021 om 14:52 schreef Tvrtko Ursulin:
>
> On 21/06/2021 13:08, Tvrtko Ursulin wrote:
>>
>> I had some questions on the trybot mailing list, let me copy&paste..
>>
>> On 21/06/2021 12:41, Maarten Lankhorst wrote:
>>> It doesn't work for legacy ring submission, and is in the best case
>>> ignored.
>>
>> Looks rejected instead of ignored:
>>
>> static int set_ringsize(struct i915_gem_context *ctx,
>>              struct drm_i915_gem_context_param *args)
>> {
>>      if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915))
>>          return -ENODEV;
>>>
>>> In the worst case we end up freeing engine->legacy.ring for all other
>>> active engines, resulting in a use-after-free.
>>
>> Worst case is cloning because ring_context_alloc is not taking a reference to engine->legacy.ring, or something else?
>
> No can't be that, it was my incomplete analysis last week. Since ring_context_destroy does not actually free the legacy ring I don't see any use after free paths.
>
> Regards,

Hmm, it gets stuck inside intel_context_set_ring_size when cloning engines..

I guess it can't happen in practice, just the code introduces the race by preallocating
inside intel_context_lock_pinned()..

copy_ring_size() should only be called for HAS_LOGICAL_RING_CONTEXTS().
I guess that makes this patch obsolete. It can safely be dropped from the series,
I think I should probably introduce a check to only set the size when HAS_LOGICAL_RING_CONTEXTS
evaluates to true, but that wouldn't block the rest of this series.

~Maarten
Tvrtko Ursulin June 21, 2021, 1:12 p.m. UTC | #5
On 21/06/2021 14:07, Maarten Lankhorst wrote:
> Op 21-06-2021 om 14:52 schreef Tvrtko Ursulin:
>>
>> On 21/06/2021 13:08, Tvrtko Ursulin wrote:
>>>
>>> I had some questions on the trybot mailing list, let me copy&paste..
>>>
>>> On 21/06/2021 12:41, Maarten Lankhorst wrote:
>>>> It doesn't work for legacy ring submission, and is in the best case
>>>> ignored.
>>>
>>> Looks rejected instead of ignored:
>>>
>>> static int set_ringsize(struct i915_gem_context *ctx,
>>>               struct drm_i915_gem_context_param *args)
>>> {
>>>       if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915))
>>>           return -ENODEV;
>>>>
>>>> In the worst case we end up freeing engine->legacy.ring for all other
>>>> active engines, resulting in a use-after-free.
>>>
>>> Worst case is cloning because ring_context_alloc is not taking a reference to engine->legacy.ring, or something else?
>>
>> No can't be that, it was my incomplete analysis last week. Since ring_context_destroy does not actually free the legacy ring I don't see any use after free paths.
>>
>> Regards,
> 
> Hmm, it gets stuck inside intel_context_set_ring_size when cloning engines..
> 
> I guess it can't happen in practice, just the code introduces the race by preallocating
> inside intel_context_lock_pinned()..

"The code" being the rest of your series? Haven't looked in there, but 
can't find a problem in upstream. Since as you say, copy_ring_size will 
run but intel_context_set_ring_size will not free-and-allocate old/new 
ring since cloned context does not have a state allocated yet.

Regards,

Tvrtko

> copy_ring_size() should only be called for HAS_LOGICAL_RING_CONTEXTS().
> I guess that makes this patch obsolete. It can safely be dropped from the series,
> I think I should probably introduce a check to only set the size when HAS_LOGICAL_RING_CONTEXTS
> evaluates to true, but that wouldn't block the rest of this series.
> 
> ~Maarten
>
Tvrtko Ursulin June 21, 2021, 1:20 p.m. UTC | #6
On 21/06/2021 14:12, Tvrtko Ursulin wrote:
> 
> On 21/06/2021 14:07, Maarten Lankhorst wrote:
>> Op 21-06-2021 om 14:52 schreef Tvrtko Ursulin:
>>>
>>> On 21/06/2021 13:08, Tvrtko Ursulin wrote:
>>>>
>>>> I had some questions on the trybot mailing list, let me copy&paste..
>>>>
>>>> On 21/06/2021 12:41, Maarten Lankhorst wrote:
>>>>> It doesn't work for legacy ring submission, and is in the best case
>>>>> ignored.
>>>>
>>>> Looks rejected instead of ignored:
>>>>
>>>> static int set_ringsize(struct i915_gem_context *ctx,
>>>>               struct drm_i915_gem_context_param *args)
>>>> {
>>>>       if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915))
>>>>           return -ENODEV;
>>>>>
>>>>> In the worst case we end up freeing engine->legacy.ring for all other
>>>>> active engines, resulting in a use-after-free.
>>>>
>>>> Worst case is cloning because ring_context_alloc is not taking a 
>>>> reference to engine->legacy.ring, or something else?
>>>
>>> No can't be that, it was my incomplete analysis last week. Since 
>>> ring_context_destroy does not actually free the legacy ring I don't 
>>> see any use after free paths.
>>>
>>> Regards,
>>
>> Hmm, it gets stuck inside intel_context_set_ring_size when cloning 
>> engines..
>>
>> I guess it can't happen in practice, just the code introduces the race 
>> by preallocating
>> inside intel_context_lock_pinned()..
> 
> "The code" being the rest of your series? Haven't looked in there, but 
> can't find a problem in upstream. Since as you say, copy_ring_size will 
> run but intel_context_set_ring_size will not free-and-allocate old/new 
> ring since cloned context does not have a state allocated yet.

P.S. Putting a HAS_LOGICAL_RING_CONTEXTS check in copy_ring_size would 
be a bit unfortunate because layering is a bit broken at the moment and 
that wouldn't make it better.

To clarify my thinking: At the moment allocating the ring is 
responsibility of a backend specific hook. Apart from the generic 
intel_context_set_ring_size which breaks that by allocating in the layer 
above the backend. So proper fix could be to introduce backend specific 
hooks for ring allocation/freeing.

*If* you need to allocate the state so early.. not sure about that. I'd 
first need to understand why. If you say it is a race then it was all 
accidental?

Regards,

Tvrtko

> Regards,
> 
> Tvrtko
> 
>> copy_ring_size() should only be called for HAS_LOGICAL_RING_CONTEXTS().
>> I guess that makes this patch obsolete. It can safely be dropped from 
>> the series,
>> I think I should probably introduce a check to only set the size when 
>> HAS_LOGICAL_RING_CONTEXTS
>> evaluates to true, but that wouldn't block the rest of this series.
>>
>> ~Maarten
>>
Maarten Lankhorst June 21, 2021, 1:28 p.m. UTC | #7
Op 21-06-2021 om 15:20 schreef Tvrtko Ursulin:
>
> On 21/06/2021 14:12, Tvrtko Ursulin wrote:
>>
>> On 21/06/2021 14:07, Maarten Lankhorst wrote:
>>> Op 21-06-2021 om 14:52 schreef Tvrtko Ursulin:
>>>>
>>>> On 21/06/2021 13:08, Tvrtko Ursulin wrote:
>>>>>
>>>>> I had some questions on the trybot mailing list, let me copy&paste..
>>>>>
>>>>> On 21/06/2021 12:41, Maarten Lankhorst wrote:
>>>>>> It doesn't work for legacy ring submission, and is in the best case
>>>>>> ignored.
>>>>>
>>>>> Looks rejected instead of ignored:
>>>>>
>>>>> static int set_ringsize(struct i915_gem_context *ctx,
>>>>>               struct drm_i915_gem_context_param *args)
>>>>> {
>>>>>       if (!HAS_LOGICAL_RING_CONTEXTS(ctx->i915))
>>>>>           return -ENODEV;
>>>>>>
>>>>>> In the worst case we end up freeing engine->legacy.ring for all other
>>>>>> active engines, resulting in a use-after-free.
>>>>>
>>>>> Worst case is cloning because ring_context_alloc is not taking a reference to engine->legacy.ring, or something else?
>>>>
>>>> No can't be that, it was my incomplete analysis last week. Since ring_context_destroy does not actually free the legacy ring I don't see any use after free paths.
>>>>
>>>> Regards,
>>>
>>> Hmm, it gets stuck inside intel_context_set_ring_size when cloning engines..
>>>
>>> I guess it can't happen in practice, just the code introduces the race by preallocating
>>> inside intel_context_lock_pinned()..
>>
>> "The code" being the rest of your series? Haven't looked in there, but can't find a problem in upstream. Since as you say, copy_ring_size will run but intel_context_set_ring_size will not free-and-allocate old/new ring since cloned context does not have a state allocated yet.
>
> P.S. Putting a HAS_LOGICAL_RING_CONTEXTS check in copy_ring_size would be a bit unfortunate because layering is a bit broken at the moment and that wouldn't make it better.
>
> To clarify my thinking: At the moment allocating the ring is responsibility of a backend specific hook. Apart from the generic intel_context_set_ring_size which breaks that by allocating in the layer above the backend. So proper fix could be to introduce backend specific hooks for ring allocation/freeing.
>
> *If* you need to allocate the state so early.. not sure about that. I'd first need to understand why. If you say it is a race then it was all accidental? 

I noticed it mostly when debugging. I fixed it currenly by not allocating state in set_ring_size unnecessarily, hence this patch is no longer needed. :)

So if that's the only thing, I can just drop this patch entirely.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_context_param.c b/drivers/gpu/drm/i915/gt/intel_context_param.c
index 65dcd090245d..412c36d1b1dd 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_param.c
+++ b/drivers/gpu/drm/i915/gt/intel_context_param.c
@@ -12,6 +12,9 @@  int intel_context_set_ring_size(struct intel_context *ce, long sz)
 {
 	int err;
 
+	if (ce->engine->gt->submission_method == INTEL_SUBMISSION_RING)
+		return 0;
+
 	if (intel_context_lock_pinned(ce))
 		return -EINTR;