[0/5] Allow to extend the timeout without jobs disappearing (v2)

Message ID 20201204031722.24040-1-luben.tuikov@amd.com (mailing list archive)

Luben Tuikov Dec. 4, 2020, 3:17 a.m. UTC
Hi guys,

This series of patches implements a pending list for
jobs which are in the hardware, and a done list for
tasks which are done and need to be freed.

As tasks complete and call their DRM callback, their
fences are signalled, the tasks are added to the done
list, and the main scheduler thread is woken up. The
main scheduler thread then frees them.
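To illustrate the flow described above, here is a minimal
user-space sketch. It is not the code from this series: every
sketch_* name is hypothetical, plain singly linked lists stand in
for the kernel's list_head, and there is no locking. It only
shows a job moving from the pending list to the done list in the
completion callback, and the main thread freeing the done list.

```c
/* User-space model only; all sketch_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_job {
	int id;
	bool fence_signalled;
	bool freed;
	struct sketch_job *next;
};

struct sketch_sched {
	struct sketch_job *pending;	/* jobs the hardware still owns */
	struct sketch_job *done;	/* finished jobs waiting to be freed */
	bool thread_woken;		/* stand-in for waking the main thread */
};

/* Submitting a job hands it to the hardware: put it on pending. */
static void sketch_submit(struct sketch_sched *s, struct sketch_job *j)
{
	assert(s && j);
	j->next = s->pending;
	s->pending = j;
}

/* Unlink j from the list at *head; returns true if it was found. */
static bool sketch_unlink(struct sketch_job **head, struct sketch_job *j)
{
	for (; *head; head = &(*head)->next) {
		if (*head == j) {
			*head = j->next;
			j->next = NULL;
			return true;
		}
	}
	return false;
}

/* Completion callback: signal, move pending -> done, wake thread. */
static void sketch_job_done_cb(struct sketch_sched *s, struct sketch_job *j)
{
	if (!sketch_unlink(&s->pending, j))
		return;		/* not pending; nothing to do */
	j->fence_signalled = true;
	j->next = s->done;
	s->done = j;
	s->thread_woken = true;
}

/* Main thread: free everything on the done list; returns the count. */
static int sketch_free_done(struct sketch_sched *s)
{
	int n = 0;

	while (s->done) {
		struct sketch_job *j = s->done;

		s->done = j->next;
		j->next = NULL;
		j->freed = true;	/* stand-in for the driver's free callback */
		n++;
	}
	s->thread_woken = false;
	return n;
}
```

Keeping the free out of the completion callback is the point of
the second list: the callback only moves the job and wakes the
thread, and the actual freeing happens later in thread context.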

When a task times out, the timeout callback now returns
a value back to DRM. The reason for this is that the
GPU driver has intimate knowledge of the hardware and
can tell DRM what to do: whether to attempt to abort
the task (say, by calling a driver abort function, as
the implementation dictates), or whether the task needs
more time. Note that the task is not moved off the
pending list unless it is no longer in the GPU.
(The pending list holds tasks which are pending from
DRM's point of view, i.e. the GPU has control over
them: that could be things like DMA being active or CUs
being active for the task.)

The idea really is that what DRM wants to know is
whether the task is in the GPU or not. So now
drm_sched_backend_ops::timedout_job() returns
DRM_TASK_STATUS_COMPLETE if the task is no longer with
the GPU, or DRM_TASK_STATUS_ALIVE if the task needs
more time.
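
A rough user-space sketch of how the returned status might be
consumed is below. Only the two DRM_TASK_STATUS_* names come from
this cover letter; the enum tag, the struct, the callback
typedef and the re-arm logic are assumptions made for
illustration, not the code in the patches.

```c
/* User-space model only; everything named sketch_* is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_task_status {
	DRM_TASK_STATUS_COMPLETE,	/* task is no longer with the GPU */
	DRM_TASK_STATUS_ALIVE,		/* task needs more time */
};

struct sketch_tjob {
	bool in_hw;			/* does the GPU still own the task? */
	bool on_pending_list;
	unsigned long deadline;		/* when the timeout fires next */
};

/* Plays the role of drm_sched_backend_ops::timedout_job(). */
typedef enum sketch_task_status (*sketch_timedout_fn)(struct sketch_tjob *job);

/* Example driver callback: it just reports what the hardware says. */
static enum sketch_task_status sketch_driver_timedout(struct sketch_tjob *job)
{
	return job->in_hw ? DRM_TASK_STATUS_ALIVE : DRM_TASK_STATUS_COMPLETE;
}

/*
 * Scheduler side: ask the driver, then either extend the timeout
 * (task stays on the pending list) or take the task off the
 * pending list. Returns true if the timer was re-armed.
 */
static bool sketch_handle_timeout(struct sketch_tjob *job,
				  sketch_timedout_fn cb,
				  unsigned long now, unsigned long timeout)
{
	assert(job && cb);

	switch (cb(job)) {
	case DRM_TASK_STATUS_ALIVE:
		job->deadline = now + timeout;
		return true;
	case DRM_TASK_STATUS_COMPLETE:
	default:
		job->on_pending_list = false;
		return false;
	}
}
```

A real driver callback could also try to abort the task before
reporting back; the sketch leaves that out and only shows the
two outcomes DRM cares about.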

This series applies to drm-misc-next at 0a260e731d6c.

Tested and works, but I get a lot of
WARN_ON(bo->pin_count) warnings from ttm_bo_release()
for the VCN ring of amdgpu.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Luben Tuikov (5):
  drm/scheduler: "node" --> "list"
  gpu/drm: ring_mirror_list --> pending_list
  drm/scheduler: Essentialize the job done callback
  drm/scheduler: Job timeout handler returns status (v2)
  drm/sched: Make use of a "done" list (v2)

 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |   8 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c     |  10 +-
 drivers/gpu/drm/lima/lima_sched.c           |   4 +-
 drivers/gpu/drm/panfrost/panfrost_job.c     |   9 +-
 drivers/gpu/drm/scheduler/sched_main.c      | 345 +++++++++++---------
 drivers/gpu/drm/v3d/v3d_sched.c             |  32 +-
 include/drm/gpu_scheduler.h                 |  38 ++-
 9 files changed, 255 insertions(+), 201 deletions(-)