mbox series

[0/5] drm/i915: Get rid of fence error propagation

Message ID 20210602164149.391653-1-jason@jlekstrand.net (mailing list archive)
Headers show
Series drm/i915: Get rid of fence error propagation | expand

Message

Jason Ekstrand June 2, 2021, 4:41 p.m. UTC
Fence error propagation is sketchy at best.  Instead of explicitly handling
fences which might have errors set in the code which is aware of errors, we
just kick them down the line and hope that userspace knows what to do when
a wait eventually fails.  This is sketchy at best because most userspace
isn't prepared to handle errors in those places.  To make things worse, it
allows errors to propagate across processes in unpredictable ways.  This is
causing hangs in one client to kill X11.

Unfortunately, there's no quick path from here to there thanks to the fact
that we're now running the command parser asynchronously and relying on
fence errors for when it fails.  This series first gets rid of asynchronous
command parsing and then cleans up from there.  There was never any real
use-case for asynchronous parsing and the platforms that rely heavily on
the command parser are old enough (Gen7) that, when we changed the way the
command parser works, it wasn't really a change anyone was asking for
anyway.

I think we probably want this whole mess back-ported.  I'm happy to take
suggestions on the strategy there because the history there is a bit
annoying and I'm not 100% sure where the Linux release cuts land.  In any
case, I'm happy to make a version of this series per-release if needed for
Greg to back-port.

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>

Jason Ekstrand (5):
  drm/i915: Revert "drm/i915/gem: Asynchronous cmdparser"
  drm/i915: Remove allow_alloc from i915_gem_object_get_sg*
  drm/i915: Drop error handling from dma_fence_work
  Revert "drm/i915: Propagate errors on awaiting already signaled
    fences"
  Revert "drm/i915: Skip over MI_NOOP when parsing"

 drivers/gpu/drm/i915/gem/i915_gem_clflush.c   |   4 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 227 +-----------------
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     |  21 +-
 .../i915/gem/selftests/i915_gem_execbuffer.c  |   4 +
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |   2 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c        | 199 ++++++++-------
 drivers/gpu/drm/i915/i915_drv.h               |   7 +-
 drivers/gpu/drm/i915/i915_request.c           |   8 +-
 drivers/gpu/drm/i915/i915_sw_fence_work.c     |   5 +-
 drivers/gpu/drm/i915/i915_sw_fence_work.h     |   2 +-
 drivers/gpu/drm/i915/i915_vma.c               |   3 +-
 12 files changed, 141 insertions(+), 351 deletions(-)

Comments

Daniel Vetter June 3, 2021, 8:29 a.m. UTC | #1
On Wed, Jun 02, 2021 at 11:41:44AM -0500, Jason Ekstrand wrote:
> Fence error propagation is sketchy at best.  Instead of explicitly handling
> fences which might have errors set in the code which is aware of errors, we
> just kick them down the line and hope that userspace knows what to do when
> a wait eventually fails.  This is sketchy at best because most userspace
> isn't prepared to handle errors in those places.  To make things worse, it
> allows errors to propagate across processes in unpredictable ways.  This is
> causing hangs in one client to kill X11.
> 
> Unfortunately, there's no quick path from here to there thanks to the fact
> that we're now running the command parser asynchronously and relying on
> fence errors for when it fails.  This series first gets rid of asynchronous
> command parsing and then cleans up from there.  There was never any real
> use-case for asynchronous parsing and the platforms that rely heavily on
> the command parser are old enough (Gen7) that, when we changed the way the
> command parser works, it wasn't really a change anyone was asking for
> anyway.
> 
> I think we probably want this whole mess back-ported.  I'm happy to take
> suggestions on the strategy there because the history there is a bit
> annoying and I'm not 100% sure where the Linux release cuts land.  In any
> case, I'm happy to make a version of this series per-release if needed for
> Greg to back-port.

I think just the two reversts are enough to be backported, other 3 are
cleanups.

Also I guess this will need to come with an igt patch to adjust the
cmdparser test.

With all the nits addressed, on the series.

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> 
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> 
> Jason Ekstrand (5):
>   drm/i915: Revert "drm/i915/gem: Asynchronous cmdparser"
>   drm/i915: Remove allow_alloc from i915_gem_object_get_sg*
>   drm/i915: Drop error handling from dma_fence_work
>   Revert "drm/i915: Propagate errors on awaiting already signaled
>     fences"
>   Revert "drm/i915: Skip over MI_NOOP when parsing"
> 
>  drivers/gpu/drm/i915/gem/i915_gem_clflush.c   |   4 +-
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 227 +-----------------
>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |  10 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c     |  21 +-
>  .../i915/gem/selftests/i915_gem_execbuffer.c  |   4 +
>  drivers/gpu/drm/i915/gt/intel_ggtt.c          |   2 +-
>  drivers/gpu/drm/i915/i915_cmd_parser.c        | 199 ++++++++-------
>  drivers/gpu/drm/i915/i915_drv.h               |   7 +-
>  drivers/gpu/drm/i915/i915_request.c           |   8 +-
>  drivers/gpu/drm/i915/i915_sw_fence_work.c     |   5 +-
>  drivers/gpu/drm/i915/i915_sw_fence_work.h     |   2 +-
>  drivers/gpu/drm/i915/i915_vma.c               |   3 +-
>  12 files changed, 141 insertions(+), 351 deletions(-)
> 
> -- 
> 2.31.1
>