@@ -1333,6 +1333,19 @@ EXPORT_SYMBOL(drm_sched_init);
* @sched: scheduler instance
*
* Tears down and cleans up the scheduler.
+ *
+ * This stops submission of new jobs to the hardware through
+ * drm_sched_backend_ops.run_job(). Consequently, drm_sched_backend_ops.free_job()
+ * will not be called for all jobs still in drm_gpu_scheduler.pending_list.
+ * There is no solution for this currently. Thus, it is up to the driver to make
+ * sure that
+ * a) drm_sched_fini() is only called after for all submitted jobs
+ * drm_sched_backend_ops.free_job() has been called or that
+ * b) the jobs for which drm_sched_backend_ops.free_job() has not been called
+ * after drm_sched_fini() ran are freed manually.
+ *
+ * FIXME: Take care of the above problem and prevent this function from leaking
+ * the jobs in drm_gpu_scheduler.pending_list under any circumstances.
*/
void drm_sched_fini(struct drm_gpu_scheduler *sched)
{
@@ -1428,8 +1441,10 @@ EXPORT_SYMBOL(drm_sched_wqueue_ready);
/**
* drm_sched_wqueue_stop - stop scheduler submission
- *
* @sched: scheduler instance
+ *
+ * Stops the scheduler from pulling new jobs from entities. It also stops
+ * freeing jobs automatically through drm_sched_backend_ops.free_job().
*/
void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
{
@@ -1441,8 +1456,12 @@ EXPORT_SYMBOL(drm_sched_wqueue_stop);
/**
* drm_sched_wqueue_start - start scheduler submission
- *
* @sched: scheduler instance
+ *
+ * Restarts the scheduler after drm_sched_wqueue_stop() has stopped it.
+ *
+ * This function is not necessary for 'conventional' startup. The scheduler is
+ * fully operational after drm_sched_init() succeeded.
*/
void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched)
{
If jobs are still enqueued in struct drm_gpu_scheduler.pending_list when drm_sched_fini() gets called, those jobs will be leaked since that function stops both job-submission and (automatic) job-cleanup. It is, thus, up to the driver to take care of preventing leaks. The related function drm_sched_wqueue_stop() also prevents automatic job cleanup. Those pitfals are not reflected in the documentation, currently. Explicitly inform about the leak problem in the docstring of drm_sched_fini(). Additionally, detail the purpose of drm_sched_wqueue_{start,stop} and hint at the consequences for automatic cleanup. Signed-off-by: Philipp Stanner <pstanner@redhat.com> --- Hi, in our discussion about my proposed waitque-cleanup for this problem Sima suggested [1] that we document the problems first and by doing so get to a consenus what the problems actually are and how we could solve them. This is my proposal for documenting the leaks on teardown. Feedback very welcome. P. [1] https://lore.kernel.org/dri-devel/ZtidJ8S9THvzkQ-6@phenom.ffwll.local/ --- drivers/gpu/drm/scheduler/sched_main.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-)