Message ID | 20180219140144.24004-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
On 19/02/2018 14:01, Chris Wilson wrote:
> If we fail to unbind the vma (due to a signal on an active buffer that
> needs to be moved for the next execbuf), then we need to clear the
> persistent tracking state we setup for this execbuf.
>
> Fixes: c7c6e46f913b ("drm/i915: Convert execbuf to use struct-of-array packing for critical fields")
> Testcase: igt/gem_fenced_exec_thrash/no-spare-fences-busy*
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v4.14+
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 51f3c32c64bf..4eb28e84fda4 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -505,6 +505,8 @@ eb_add_vma(struct i915_execbuffer *eb, unsigned int i, struct i915_vma *vma)
>  		list_add_tail(&vma->exec_link, &eb->unbound);
>  		if (drm_mm_node_allocated(&vma->node))
>  			err = i915_vma_unbind(vma);
> +		if (unlikely(err))
> +			vma->exec_flags = NULL;
>  	}
>  	return err;
>  }
>

I was trying to track down what actually explodes for like 15 minutes.

My track was:

eb_relocate -> eb_lookup_vmas fails -> eb_relocate -> eb_relocate_slow
-> eb_reset_vmas -> second pass to eb_lookup_vmas -> resets
vma->exec_flags. So no explosion.

So in other words I've failed to find what goes wrong and under which
circumstances.

Regards,

Tvrtko
Quoting Tvrtko Ursulin (2018-02-19 18:31:31)
>
> On 19/02/2018 14:01, Chris Wilson wrote:
> > If we fail to unbind the vma (due to a signal on an active buffer that
> > needs to be moved for the next execbuf), then we need to clear the
> > persistent tracking state we setup for this execbuf.
> >
> > Fixes: c7c6e46f913b ("drm/i915: Convert execbuf to use struct-of-array packing for critical fields")
> > Testcase: igt/gem_fenced_exec_thrash/no-spare-fences-busy*
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: <stable@vger.kernel.org> # v4.14+
> > ---
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 51f3c32c64bf..4eb28e84fda4 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -505,6 +505,8 @@ eb_add_vma(struct i915_execbuffer *eb, unsigned int i, struct i915_vma *vma)
> >  		list_add_tail(&vma->exec_link, &eb->unbound);
> >  		if (drm_mm_node_allocated(&vma->node))
> >  			err = i915_vma_unbind(vma);
> > +		if (unlikely(err))
> > +			vma->exec_flags = NULL;
> >  	}
> >  	return err;
> >  }
> >
>
> I was trying to track down what actually explodes for like 15 minutes.
>
> My track was:
>
> eb_relocate -> eb_lookup_vmas fails -> eb_relocate -> eb_relocate_slow
> -> eb_reset_vmas -> second pass to eb_lookup_vmas -> resets
> vma->exec_flags. So no explosion.
>
> So in other words I've failed to find what goes wrong and under which
> circumstances.

The first eb_relocate calls eb_lookup_vmas, which triggers the failure and
exit from execbuf. In that path, we mark the current index as the sentinel
(err_vma: eb->vma[i] = NULL) which means we do not clear the last vma
when unwinding in eb_release_vmas. So the vma->exec_flags was carried
over into the next execbuf call from userspace.
-Chris
On 19/02/2018 18:35, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-19 18:31:31)
>>
>> On 19/02/2018 14:01, Chris Wilson wrote:
>>> If we fail to unbind the vma (due to a signal on an active buffer that
>>> needs to be moved for the next execbuf), then we need to clear the
>>> persistent tracking state we setup for this execbuf.
>>>
>>> Fixes: c7c6e46f913b ("drm/i915: Convert execbuf to use struct-of-array packing for critical fields")
>>> Testcase: igt/gem_fenced_exec_thrash/no-spare-fences-busy*
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Cc: <stable@vger.kernel.org> # v4.14+
>>> ---
>>>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> index 51f3c32c64bf..4eb28e84fda4 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> @@ -505,6 +505,8 @@ eb_add_vma(struct i915_execbuffer *eb, unsigned int i, struct i915_vma *vma)
>>>  		list_add_tail(&vma->exec_link, &eb->unbound);
>>>  		if (drm_mm_node_allocated(&vma->node))
>>>  			err = i915_vma_unbind(vma);
>>> +		if (unlikely(err))
>>> +			vma->exec_flags = NULL;
>>>  	}
>>>  	return err;
>>>  }
>>>
>>
>> I was trying to track down what actually explodes for like 15 minutes.
>>
>> My track was:
>>
>> eb_relocate -> eb_lookup_vmas fails -> eb_relocate -> eb_relocate_slow
>> -> eb_reset_vmas -> second pass to eb_lookup_vmas -> resets
>> vma->exec_flags. So no explosion.
>>
>> So in other words I've failed to find what goes wrong and under which
>> circumstances.
>
> The first eb_relocate calls eb_lookup_vmas, which triggers the failure and
> exit from execbuf. In that path, we mark the current index as the sentinel
> (err_vma: eb->vma[i] = NULL) which means we do not clear the last vma
> when unwinding in eb_release_vmas. So the vma->exec_flags was carried
> over into the next execbuf call from userspace.

Ah yes, I missed the !vma continue bit in eb_release_vmas.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
Quoting Tvrtko Ursulin (2018-02-19 19:16:36)
>
> On 19/02/2018 18:35, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-02-19 18:31:31)
> >>
> >> On 19/02/2018 14:01, Chris Wilson wrote:
> >>> If we fail to unbind the vma (due to a signal on an active buffer that
> >>> needs to be moved for the next execbuf), then we need to clear the
> >>> persistent tracking state we setup for this execbuf.
> >>>
> >>> Fixes: c7c6e46f913b ("drm/i915: Convert execbuf to use struct-of-array packing for critical fields")
> >>> Testcase: igt/gem_fenced_exec_thrash/no-spare-fences-busy*
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >>> Cc: <stable@vger.kernel.org> # v4.14+
> >>> ---
[snip]
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

And grabbed because I want to see some improvement in the morning :)

Thanks for submitting yourself to the horrors of i915_gem_execbuf,
-Chris
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 51f3c32c64bf..4eb28e84fda4 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -505,6 +505,8 @@ eb_add_vma(struct i915_execbuffer *eb, unsigned int i, struct i915_vma *vma)
 		list_add_tail(&vma->exec_link, &eb->unbound);
 		if (drm_mm_node_allocated(&vma->node))
 			err = i915_vma_unbind(vma);
+		if (unlikely(err))
+			vma->exec_flags = NULL;
 	}
 	return err;
 }
If we fail to unbind the vma (due to a signal on an active buffer that
needs to be moved for the next execbuf), then we need to clear the
persistent tracking state we setup for this execbuf.

Fixes: c7c6e46f913b ("drm/i915: Convert execbuf to use struct-of-array packing for critical fields")
Testcase: igt/gem_fenced_exec_thrash/no-spare-fences-busy*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.14+
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 ++
 1 file changed, 2 insertions(+)
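
The failure mode discussed in the review can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver code: the `toy_*` types and functions and `stale_flags_after_failure()` are hypothetical stand-ins for struct i915_vma, struct i915_execbuffer, eb_add_vma, eb_lookup_vmas and eb_release_vmas, and the unbind failure is simulated by a stored error value. It shows that once the failing slot has been replaced by the NULL sentinel, the release loop never visits that vma, so its exec_flags pointer survives unless eb_add_vma clears it on error.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BUFFERS 4

/* Hypothetical stand-in for struct i915_vma. */
struct toy_vma {
	unsigned int *exec_flags; /* points into the owning execbuf while in use */
	int bound;                /* non-zero: must be unbound before reuse */
	int unbind_err;           /* error the simulated unbind returns */
};

/* Hypothetical stand-in for struct i915_execbuffer. */
struct toy_execbuf {
	struct toy_vma *vma[MAX_BUFFERS];
	unsigned int flags[MAX_BUFFERS];
	unsigned int count;
};

/* Models eb_add_vma: set up tracking state, then try to unbind. */
static int toy_add_vma(struct toy_execbuf *eb, unsigned int i,
		       struct toy_vma *vma, int apply_fix)
{
	int err = 0;

	assert(i < MAX_BUFFERS);
	eb->vma[i] = vma;
	eb->flags[i] = 0;
	vma->exec_flags = &eb->flags[i];

	if (vma->bound)
		err = vma->unbind_err; /* simulated i915_vma_unbind() result */
	if (apply_fix && err)
		vma->exec_flags = NULL; /* the two lines added by the patch */
	return err;
}

/* Models the lookup loop: on error, the current slot becomes the sentinel. */
static int toy_lookup_vmas(struct toy_execbuf *eb, struct toy_vma **objs,
			   unsigned int n, int apply_fix)
{
	unsigned int i;

	eb->count = n;
	for (i = 0; i < n; i++) {
		int err = toy_add_vma(eb, i, objs[i], apply_fix);

		if (err) {
			eb->vma[i] = NULL; /* err_vma: eb->vma[i] = NULL */
			return err;
		}
	}
	return 0;
}

/* Models the unwind: slots holding NULL are skipped, so their vma is untouched. */
static void toy_release_vmas(struct toy_execbuf *eb)
{
	unsigned int i;

	for (i = 0; i < eb->count; i++) {
		struct toy_vma *vma = eb->vma[i];

		if (!vma)
			continue;
		vma->exec_flags = NULL;
		eb->vma[i] = NULL;
	}
}

/* Returns 1 if the failing vma still carries exec_flags after the unwind. */
int stale_flags_after_failure(int apply_fix)
{
	struct toy_vma a = { NULL, 0, 0 };
	struct toy_vma b = { NULL, 1, -1 }; /* unbind of b is made to fail */
	struct toy_vma *objs[] = { &a, &b };
	struct toy_execbuf eb = { 0 };
	int err = toy_lookup_vmas(&eb, objs, 2, apply_fix);

	assert(err != 0);
	toy_release_vmas(&eb);
	return b.exec_flags != NULL;
}
```

In this model, calling stale_flags_after_failure(0) reports that the pointer is left behind, while stale_flags_after_failure(1) reports that it has been cleared, which corresponds to the state being "carried over into the next execbuf call" without the fix.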