[v5,01/16] drm/sched: Document what the timedout_job method should do

Message ID 20210629073510.2764391-2-boris.brezillon@collabora.com (mailing list archive)
State New, archived
Series drm/panfrost: Misc improvements

Commit Message

Boris Brezillon June 29, 2021, 7:34 a.m. UTC
The documentation is a bit vague and doesn't really describe what the
->timedout_job() is expected to do. Let's add a few more details.

v5:
* New patch

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 include/drm/gpu_scheduler.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Daniel Vetter June 29, 2021, 9:05 a.m. UTC | #1
On Tue, Jun 29, 2021 at 09:34:55AM +0200, Boris Brezillon wrote:
> The documentation is a bit vague and doesn't really describe what the
> ->timedout_job() is expected to do. Let's add a few more details.
> 
> v5:
> * New patch
> 
> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  include/drm/gpu_scheduler.h | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 10225a0a35d0..65700511e074 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -239,6 +239,20 @@ struct drm_sched_backend_ops {
>  	 * @timedout_job: Called when a job has taken too long to execute,
>  	 * to trigger GPU recovery.
>  	 *
> +	 * This method is called in a workqueue context.
> +	 *
> +	 * Drivers typically issue a reset to recover from GPU hangs, and this
> +	 * procedure usually follows the following workflow:
> +	 *
> +	 * 1. Stop the scheduler using drm_sched_stop(). This will park the
> +	 *    scheduler thread and cancel the timeout work, guaranteeing that
> +	 *    nothing is queued while we reset the hardware queue
> +	 * 2. Try to gracefully stop non-faulty jobs (optional)
> +	 * 3. Issue a GPU reset (driver-specific)
> +	 * 4. Re-submit jobs using drm_sched_resubmit_jobs()
> +	 * 5. Restart the scheduler using drm_sched_start(). At that point, new
> +	 *    jobs can be queued, and the scheduler thread is unblocked
> +	 *
>  	 * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
>  	 * and the underlying driver has started or completed recovery.
>  	 *
> -- 
> 2.31.1
>

Patch

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 10225a0a35d0..65700511e074 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -239,6 +239,20 @@  struct drm_sched_backend_ops {
 	 * @timedout_job: Called when a job has taken too long to execute,
 	 * to trigger GPU recovery.
 	 *
+	 * This method is called in a workqueue context.
+	 *
+	 * Drivers typically issue a reset to recover from GPU hangs, and this
+	 * procedure usually follows the following workflow:
+	 *
+	 * 1. Stop the scheduler using drm_sched_stop(). This will park the
+	 *    scheduler thread and cancel the timeout work, guaranteeing that
+	 *    nothing is queued while we reset the hardware queue
+	 * 2. Try to gracefully stop non-faulty jobs (optional)
+	 * 3. Issue a GPU reset (driver-specific)
+	 * 4. Re-submit jobs using drm_sched_resubmit_jobs()
+	 * 5. Restart the scheduler using drm_sched_start(). At that point, new
+	 *    jobs can be queued, and the scheduler thread is unblocked
+	 *
 	 * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
 	 * and the underlying driver has started or completed recovery.
 	 *
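
The five-step workflow documented above can be sketched as a small user-space model. This is only an illustration of the ordering, not driver code: every `mock_*` name, the `struct mock_sched` type, and its fields are hypothetical stand-ins invented for this sketch. The real drm_sched_stop(), drm_sched_resubmit_jobs() and drm_sched_start() have their own signatures, which are not reproduced here, and the optional step 2 is skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler status returned by the handler. */
enum mock_sched_stat { MOCK_SCHED_STAT_NOMINAL };

/* Hypothetical scheduler state, reduced to what the workflow touches. */
struct mock_sched {
	bool stopped;      /* true between the stop and start steps */
	int pending_jobs;  /* jobs that were in flight when the timeout fired */
	int resubmitted;   /* jobs handed back to the hardware after the reset */
	int resets;        /* number of hardware resets issued */
};

/* Step 1: park the scheduler so nothing is queued during the reset. */
static void mock_sched_stop(struct mock_sched *s)
{
	s->stopped = true;
}

/* Step 3: driver-specific reset; only valid while the scheduler is stopped. */
static void mock_hw_reset(struct mock_sched *s)
{
	assert(s->stopped);
	s->resets++;
}

/* Step 4: hand the in-flight jobs back to the freshly reset hardware. */
static void mock_sched_resubmit_jobs(struct mock_sched *s)
{
	assert(s->stopped);
	s->resubmitted += s->pending_jobs;
}

/* Step 5: unblock the scheduler; new jobs can be queued again. */
static void mock_sched_start(struct mock_sched *s)
{
	s->stopped = false;
}

/* Model of a timedout_job handler following the documented order. */
static enum mock_sched_stat mock_timedout_job(struct mock_sched *s)
{
	mock_sched_stop(s);
	/* Step 2 (gracefully stopping non-faulty jobs) is optional; omitted. */
	mock_hw_reset(s);
	mock_sched_resubmit_jobs(s);
	mock_sched_start(s);
	return MOCK_SCHED_STAT_NOMINAL;
}
```

The asserts in the reset and resubmit helpers encode the one ordering constraint the comment spells out: both must run while the scheduler is stopped, so calling them outside the stop/start window aborts the model.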