Message ID | 20200904081552.38052-2-ysugi@idein.jp (mailing list archive) |
---|---|
State | New, archived |
Series | drm/v3d: CL/CSD job timeout fixes |
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 0747614a78f0..001216f22017 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -226,6 +226,17 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
 	struct dma_fence *fence;
 	int i;
 
+	/* This error is set to -ECANCELED by drm_sched_resubmit_jobs() if this
+	 * job timed out more than sched_job->sched->hang_limit times.
+	 */
+	int error = sched_job->s_fence->finished.error;
+
+	if (unlikely(error < 0)) {
+		DRM_WARN("Skipping CSD job resubmission due to previous error (%d)\n",
+			 error);
+		return ERR_PTR(error);
+	}
+
 	v3d->csd_job = job;
 
 	v3d_invalidate_caches(v3d);
The previous code misses a check for the timeout error set by
drm_sched_resubmit_jobs(), which results in an infinite GPU reset loop
once a timeout occurs:

[ 178.799106] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[ 178.807836] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[ 179.839132] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[ 179.847865] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[ 180.879146] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[ 180.887925] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[ 181.919188] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[ 181.928002] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
...

This commit adds the check for timeout as in v3d_{bin,render}_job_run():

[ 66.408962] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[ 66.417734] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[ 66.428296] [drm] Skipping CSD job resubmission due to previous error (-125)

Here -125 is -ECANCELED. Note that users currently have no way to tell
that the timeout has occurred other than inspecting dmesg.

Signed-off-by: Yukimasa Sugizaki <ysugi@idein.jp>
---
 drivers/gpu/drm/v3d/v3d_sched.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

-- 
2.7.4
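
For illustration only, the reset loop described above can be modelled in userspace C. This is a minimal sketch, not kernel code: struct mock_job, mock_resubmit_prepare(), mock_csd_job_run() and simulate_hanging_job() are hypothetical names, and the karma counter is an assumed simplification of how the scheduler tracks timeouts against hang_limit. Only the rule taken from the patch comment is modelled: the fence error becomes -ECANCELED once the job has timed out more than hang_limit times, and the run callback must honour that error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a scheduler job; not the kernel's type. */
struct mock_job {
	int karma;       /* assumed: number of timeouts blamed on this job */
	int fence_error; /* models sched_job->s_fence->finished.error */
};

/*
 * Models the behaviour the patch comment attributes to
 * drm_sched_resubmit_jobs(): cancel the job once it has timed out
 * more than hang_limit times.
 */
static void mock_resubmit_prepare(struct mock_job *job, int hang_limit)
{
	if (job->karma > hang_limit)
		job->fence_error = -ECANCELED;
}

/*
 * Models the run callback. Returns true if the job is handed to the
 * (mock) hardware, false if it is skipped. With check_error == false
 * this corresponds to the code before the patch.
 */
static bool mock_csd_job_run(const struct mock_job *job, bool check_error)
{
	if (check_error && job->fence_error < 0)
		return false;
	return true;
}

/*
 * Simulates a job that hangs every time it runs. Returns the number
 * of GPU resets performed, capped at max_resets so that the unpatched
 * case terminates in this model.
 */
static int simulate_hanging_job(int hang_limit, bool check_error,
				int max_resets)
{
	struct mock_job job = { 0, 0 };
	int resets = 0;
	bool running;

	assert(hang_limit >= 0 && max_resets >= 0);

	running = mock_csd_job_run(&job, check_error);
	while (running && resets < max_resets) {
		job.karma++;	/* the job timed out */
		resets++;	/* the timeout handler resets the GPU */
		mock_resubmit_prepare(&job, hang_limit);
		running = mock_csd_job_run(&job, check_error);
	}
	return resets;
}
```

In this model, simulate_hanging_job(0, true, 100) stops after a single reset, while simulate_hanging_job(0, false, 100) keeps resetting until it hits the cap, which mirrors the two dmesg excerpts in the commit message.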