drm/i915: Clear the in-use marker on execbuf failure

Message ID 20180219140144.24004-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson Feb. 19, 2018, 2:01 p.m. UTC
If we fail to unbind the vma (due to a signal on an active buffer that
needs to be moved for the next execbuf), then we need to clear the
persistent tracking state we set up for this execbuf.

Fixes: c7c6e46f913b ("drm/i915: Convert execbuf to use struct-of-array packing for critical fields")
Testcase: igt/gem_fenced_exec_thrash/no-spare-fences-busy*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.14+
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Tvrtko Ursulin Feb. 19, 2018, 6:31 p.m. UTC | #1
On 19/02/2018 14:01, Chris Wilson wrote:
> If we fail to unbind the vma (due to a signal on an active buffer that
> needs to be moved for the next execbuf), then we need to clear the
> persistent tracking state we setup for this execbuf.
> 
> Fixes: c7c6e46f913b ("drm/i915: Convert execbuf to use struct-of-array packing for critical fields")
> Testcase: igt/gem_fenced_exec_thrash/no-spare-fences-busy*
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v4.14+
> ---
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 51f3c32c64bf..4eb28e84fda4 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -505,6 +505,8 @@ eb_add_vma(struct i915_execbuffer *eb, unsigned int i, struct i915_vma *vma)
>   		list_add_tail(&vma->exec_link, &eb->unbound);
>   		if (drm_mm_node_allocated(&vma->node))
>   			err = i915_vma_unbind(vma);
> +		if (unlikely(err))
> +			vma->exec_flags = NULL;
>   	}
>   	return err;
>   }
> 

I was trying to track down what actually explodes for like 15 minutes.

My track was:

eb_relocate -> eb_lookup_vmas fails -> eb_relocate -> eb_relocate_slow 
-> eb_reset_vmas -> second pass to eb_lookup_vmas -> resets 
vma->exec_flags. So no explosion.

So in other words I've failed to find what goes wrong and under which 
circumstances.

Regards,

Tvrtko
Chris Wilson Feb. 19, 2018, 6:35 p.m. UTC | #2
Quoting Tvrtko Ursulin (2018-02-19 18:31:31)
> 
> On 19/02/2018 14:01, Chris Wilson wrote:
> > If we fail to unbind the vma (due to a signal on an active buffer that
> > needs to be moved for the next execbuf), then we need to clear the
> > persistent tracking state we setup for this execbuf.
> > 
> > Fixes: c7c6e46f913b ("drm/i915: Convert execbuf to use struct-of-array packing for critical fields")
> > Testcase: igt/gem_fenced_exec_thrash/no-spare-fences-busy*
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: <stable@vger.kernel.org> # v4.14+
> > ---
> >   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 51f3c32c64bf..4eb28e84fda4 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -505,6 +505,8 @@ eb_add_vma(struct i915_execbuffer *eb, unsigned int i, struct i915_vma *vma)
> >               list_add_tail(&vma->exec_link, &eb->unbound);
> >               if (drm_mm_node_allocated(&vma->node))
> >                       err = i915_vma_unbind(vma);
> > +             if (unlikely(err))
> > +                     vma->exec_flags = NULL;
> >       }
> >       return err;
> >   }
> > 
> 
> I was trying to track down what actually explodes for like 15 minutes.
> 
> My track was:
> 
> eb_relocate -> eb_lookup_vmas fails -> eb_relocate -> eb_relocate_slow 
> -> eb_reset_vmas -> second pass to eb_lookup_vmas -> resets 
> vma->exec_flags. So no explosion.
> 
> So in other words I've failed to find what goes wrong and under which 
> circumstances.

The first eb_relocate calls eb_lookup_vmas, which triggers the failure and
the exit from execbuf. In that path, we mark the current index as the
sentinel (err_vma: eb->vma[i] = NULL), which means we do not clear the last
vma when unwinding in eb_release_vmas. So the vma->exec_flags was carried
over into the next execbuf call from userspace.
-Chris
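
[Editorial illustration, not part of the original thread.] The failure path
described above can be sketched as a small userspace model. Everything here
is an assumption made for illustration: the model_* structures and functions,
MODEL_ERR, and stale_after_failure() are hypothetical stand-ins, not the real
i915 definitions. Only the three behaviours discussed in the thread are
modelled: eb_add_vma setting vma->exec_flags before the unbind can fail, the
err_vma path writing a NULL sentinel into eb->vma[i], and eb_release_vmas
skipping NULL slots.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX 4
#define MODEL_ERR (-4)	/* hypothetical value standing in for the unbind error */

/* Simplified stand-ins; the real structs carry far more state. */
struct model_vma {
	unsigned int *exec_flags;	/* non-NULL marks the vma as in use */
};

struct model_eb {
	struct model_vma *vma[MODEL_MAX];
	unsigned int flags[MODEL_MAX];
	unsigned int count;
};

/* Models eb_add_vma(): mark the vma in use, then the unbind may fail. */
static int model_add_vma(struct model_eb *eb, unsigned int i,
			 struct model_vma *vma, int unbind_err, int with_fix)
{
	assert(i < MODEL_MAX);
	eb->vma[i] = vma;
	vma->exec_flags = &eb->flags[i];
	if (unbind_err && with_fix)
		vma->exec_flags = NULL;	/* the two lines added by the patch */
	return unbind_err;
}

/* Models the lookup loop: on error the failing slot becomes the sentinel. */
static int model_lookup_vmas(struct model_eb *eb, struct model_vma *vmas,
			     unsigned int count, unsigned int fail_at,
			     int with_fix)
{
	unsigned int i;

	assert(count <= MODEL_MAX);
	eb->count = count;
	for (i = 0; i < count; i++)
		eb->vma[i] = NULL;

	for (i = 0; i < count; i++) {
		int err = model_add_vma(eb, i, &vmas[i],
					i == fail_at ? MODEL_ERR : 0, with_fix);
		if (err) {
			eb->vma[i] = NULL;	/* err_vma: sentinel */
			return err;
		}
	}
	return 0;
}

/* Models eb_release_vmas(): NULL slots are skipped, so never cleared. */
static void model_release_vmas(struct model_eb *eb)
{
	unsigned int i;

	for (i = 0; i < eb->count; i++) {
		struct model_vma *vma = eb->vma[i];

		if (!vma)
			continue;
		vma->exec_flags = NULL;
		eb->vma[i] = NULL;
	}
}

/* Returns 1 if the failing vma still looks in use after the unwind. */
int stale_after_failure(int with_fix)
{
	struct model_vma vmas[3] = { { NULL }, { NULL }, { NULL } };
	struct model_eb eb;
	int err = model_lookup_vmas(&eb, vmas, 3, 1, with_fix);

	assert(err == MODEL_ERR);
	model_release_vmas(&eb);
	return vmas[1].exec_flags != NULL;
}
```

In this model, stale_after_failure(0) reports that the failing vma keeps a
non-NULL exec_flags after the unwind, which corresponds to the marker being
carried into the next execbuf; stale_after_failure(1) applies the patched
behaviour and the marker is cleared before the sentinel hides the slot.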
Tvrtko Ursulin Feb. 19, 2018, 7:16 p.m. UTC | #3
On 19/02/2018 18:35, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-19 18:31:31)
>>
>> On 19/02/2018 14:01, Chris Wilson wrote:
>>> If we fail to unbind the vma (due to a signal on an active buffer that
>>> needs to be moved for the next execbuf), then we need to clear the
>>> persistent tracking state we setup for this execbuf.
>>>
>>> Fixes: c7c6e46f913b ("drm/i915: Convert execbuf to use struct-of-array packing for critical fields")
>>> Testcase: igt/gem_fenced_exec_thrash/no-spare-fences-busy*
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Cc: <stable@vger.kernel.org> # v4.14+
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> index 51f3c32c64bf..4eb28e84fda4 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> @@ -505,6 +505,8 @@ eb_add_vma(struct i915_execbuffer *eb, unsigned int i, struct i915_vma *vma)
>>>                list_add_tail(&vma->exec_link, &eb->unbound);
>>>                if (drm_mm_node_allocated(&vma->node))
>>>                        err = i915_vma_unbind(vma);
>>> +             if (unlikely(err))
>>> +                     vma->exec_flags = NULL;
>>>        }
>>>        return err;
>>>    }
>>>
>>
>> I was trying to track down what actually explodes for like 15 minutes.
>>
>> My track was:
>>
>> eb_relocate -> eb_lookup_vmas fails -> eb_relocate -> eb_relocate_slow
>> -> eb_reset_vmas -> second pass to eb_lookup_vmas -> resets
>> vma->exec_flags. So no explosion.
>>
>> So in other words I've failed to find what goes wrong and under which
>> circumstances.
> 
> The first eb_relocate calls eb_lookup_vma triggers the failure and exit
> from execbuf. In that path, we mark the current index as the sentinel
> (err_vma: eb->vma[i] = NULL) which means we do not clear the last vma
> when unwinding in eb_release_vmas. So the vma->exec_flags was carried
> over into the next execbuf call from userspace.

Ah yes, I missed the !vma continue bit in eb_release_vmas.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
Chris Wilson Feb. 19, 2018, 7:44 p.m. UTC | #4
Quoting Tvrtko Ursulin (2018-02-19 19:16:36)
> 
> On 19/02/2018 18:35, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-02-19 18:31:31)
> >>
> >> On 19/02/2018 14:01, Chris Wilson wrote:
> >>> If we fail to unbind the vma (due to a signal on an active buffer that
> >>> needs to be moved for the next execbuf), then we need to clear the
> >>> persistent tracking state we setup for this execbuf.
> >>>
> >>> Fixes: c7c6e46f913b ("drm/i915: Convert execbuf to use struct-of-array packing for critical fields")
> >>> Testcase: igt/gem_fenced_exec_thrash/no-spare-fences-busy*
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >>> Cc: <stable@vger.kernel.org> # v4.14+
> >>> ---
[snip]
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

And grabbed because I want to see some improvement in the morning :)

Thanks for submitting yourself to the horrors of i915_gem_execbuf,
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 51f3c32c64bf..4eb28e84fda4 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -505,6 +505,8 @@  eb_add_vma(struct i915_execbuffer *eb, unsigned int i, struct i915_vma *vma)
 		list_add_tail(&vma->exec_link, &eb->unbound);
 		if (drm_mm_node_allocated(&vma->node))
 			err = i915_vma_unbind(vma);
+		if (unlikely(err))
+			vma->exec_flags = NULL;
 	}
 	return err;
 }