Message ID | 20210629073510.2764391-2-boris.brezillon@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/panfrost: Misc improvements |
On Tue, Jun 29, 2021 at 09:34:55AM +0200, Boris Brezillon wrote:
> The documentation is a bit vague and doesn't really describe what the
> ->timedout_job() is expected to do. Let's add a few more details.
>
> v5:
> * New patch
>
> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  include/drm/gpu_scheduler.h | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 10225a0a35d0..65700511e074 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -239,6 +239,20 @@ struct drm_sched_backend_ops {
>  	 * @timedout_job: Called when a job has taken too long to execute,
>  	 * to trigger GPU recovery.
>  	 *
> +	 * This method is called in a workqueue context.
> +	 *
> +	 * Drivers typically issue a reset to recover from GPU hangs, and this
> +	 * procedure usually follows the following workflow:
> +	 *
> +	 * 1. Stop the scheduler using drm_sched_stop(). This will park the
> +	 *    scheduler thread and cancel the timeout work, guaranteeing that
> +	 *    nothing is queued while we reset the hardware queue
> +	 * 2. Try to gracefully stop non-faulty jobs (optional)
> +	 * 3. Issue a GPU reset (driver-specific)
> +	 * 4. Re-submit jobs using drm_sched_resubmit_jobs()
> +	 * 5. Restart the scheduler using drm_sched_start(). At that point, new
> +	 *    jobs can be queued, and the scheduler thread is unblocked
> +	 *
>  	 * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
>  	 * and the underlying driver has started or completed recovery.
>  	 *
> --
> 2.31.1
>
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 10225a0a35d0..65700511e074 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -239,6 +239,20 @@ struct drm_sched_backend_ops {
 	 * @timedout_job: Called when a job has taken too long to execute,
 	 * to trigger GPU recovery.
 	 *
+	 * This method is called in a workqueue context.
+	 *
+	 * Drivers typically issue a reset to recover from GPU hangs, and this
+	 * procedure usually follows the following workflow:
+	 *
+	 * 1. Stop the scheduler using drm_sched_stop(). This will park the
+	 *    scheduler thread and cancel the timeout work, guaranteeing that
+	 *    nothing is queued while we reset the hardware queue
+	 * 2. Try to gracefully stop non-faulty jobs (optional)
+	 * 3. Issue a GPU reset (driver-specific)
+	 * 4. Re-submit jobs using drm_sched_resubmit_jobs()
+	 * 5. Restart the scheduler using drm_sched_start(). At that point, new
+	 *    jobs can be queued, and the scheduler thread is unblocked
+	 *
 	 * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
 	 * and the underlying driver has started or completed recovery.
 	 *
The documentation is a bit vague and doesn't really describe what the
->timedout_job() is expected to do. Let's add a few more details.

v5:
* New patch

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 include/drm/gpu_scheduler.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
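
For readers who want to see the five-step workflow from the new kernel-doc comment as a call sequence, here is a rough userspace sketch. It is not driver code and does not use the kernel API: every `mock_*` type and function is a hypothetical stand-in invented for illustration (for example, `mock_sched_stop()` only stands in for the role that drm_sched_stop() plays), and the real functions take different arguments. The only thing the sketch tries to capture is the ordering and the "scheduler is parked during reset" guarantee described in step 1.

```c
/*
 * Userspace mock of the documented ->timedout_job() workflow.
 * ASSUMPTION: all names below are hypothetical stand-ins, not kernel APIs.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mock_sched_stat { MOCK_SCHED_STAT_NOMINAL = 1 };

enum mock_step {
	STEP_STOP, STEP_QUIESCE, STEP_RESET, STEP_RESUBMIT, STEP_START
};

struct mock_sched {
	bool running;            /* false while the scheduler is parked */
	enum mock_step trace[8]; /* order in which the steps ran */
	size_t ntrace;
};

struct mock_job {
	struct mock_sched *sched; /* scheduler the timed-out job belongs to */
};

static void record(struct mock_sched *s, enum mock_step step)
{
	assert(s->ntrace < sizeof(s->trace) / sizeof(s->trace[0]));
	s->trace[s->ntrace++] = step;
}

/* Stands in for step 1: park the scheduler so nothing new is queued. */
static void mock_sched_stop(struct mock_sched *s)
{
	s->running = false;
	record(s, STEP_STOP);
}

/* Stands in for step 2 (optional): gracefully stop non-faulty jobs. */
static void mock_quiesce_other_jobs(struct mock_sched *s)
{
	record(s, STEP_QUIESCE);
}

/* Stands in for step 3: the driver-specific GPU reset. */
static void mock_gpu_reset(struct mock_sched *s)
{
	assert(!s->running); /* reset must only happen while parked */
	record(s, STEP_RESET);
}

/* Stands in for step 4: re-submit the jobs that were in flight. */
static void mock_sched_resubmit_jobs(struct mock_sched *s)
{
	record(s, STEP_RESUBMIT);
}

/* Stands in for step 5: unpark the scheduler; new jobs may be queued. */
static void mock_sched_start(struct mock_sched *s)
{
	s->running = true;
	record(s, STEP_START);
}

/* Hypothetical handler following the documented order of operations. */
static enum mock_sched_stat mock_timedout_job(struct mock_job *bad)
{
	struct mock_sched *s = bad->sched;

	mock_sched_stop(s);
	mock_quiesce_other_jobs(s);
	mock_gpu_reset(s);
	mock_sched_resubmit_jobs(s);
	mock_sched_start(s);

	return MOCK_SCHED_STAT_NOMINAL;
}
```

Calling `mock_timedout_job()` on a job whose scheduler starts with `running = true` leaves the scheduler running again, with the five steps recorded in `trace` in the documented order. A real handler would do the equivalent work from the workqueue context mentioned in the patch.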