Message ID | 20250313-v3d-gpu-reset-fixes-v4-1-c1e780d8e096@igalia.com |
---|---|
State | New |
Series | drm/v3d: Fix GPU reset issues on the Raspberry Pi 5 |
On 13/03/25 11:43, Maíra Canal wrote:
> The V3D driver still relies on `drm_sched_increase_karma()` and
> `drm_sched_resubmit_jobs()` for resubmissions when a timeout occurs.
> The function `drm_sched_increase_karma()` marks the job as guilty, while
> `drm_sched_resubmit_jobs()` sets an error (-ECANCELED) in the DMA fence of
> that guilty job.
>
> Because of this, we must check whether the job’s DMA fence has been
> flagged with an error before executing the job. Otherwise, the same guilty
> job may be resubmitted indefinitely, causing repeated GPU resets.
>
> This patch adds a check for an error on the job's fence to prevent running
> a guilty job that was previously flagged when the GPU timed out.
>
> Note that the CPU and CACHE_CLEAN queues do not require this check, as
> their jobs are executed synchronously once the DRM scheduler starts them.
>
> Cc: stable@vger.kernel.org
> Fixes: d223f98f0209 ("drm/v3d: Add support for compute shader dispatch.")
> Fixes: 1584f16ca96e ("drm/v3d: Add support for submitting jobs to the TFU.")
> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>

As patches 1/7 and 2/7 prevent the same faulty job from being resubmitted
in a loop, I just applied them to misc/kernel.git (drm-misc-fixes).

Best Regards,
- Maíra

> ---
>  drivers/gpu/drm/v3d/v3d_sched.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 80466ce8c7df669280e556c0793490b79e75d2c7..c2010ecdb08f4ba3b54f7783ed33901552d0eba1 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -327,11 +327,15 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
>  	struct drm_device *dev = &v3d->drm;
>  	struct dma_fence *fence;
>  
> +	if (unlikely(job->base.base.s_fence->finished.error))
> +		return NULL;
> +
> +	v3d->tfu_job = job;
> +
>  	fence = v3d_fence_create(v3d, V3D_TFU);
>  	if (IS_ERR(fence))
>  		return NULL;
>  
> -	v3d->tfu_job = job;
>  	if (job->base.irq_fence)
>  		dma_fence_put(job->base.irq_fence);
>  	job->base.irq_fence = dma_fence_get(fence);
> @@ -369,6 +373,9 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
>  	struct dma_fence *fence;
>  	int i, csd_cfg0_reg;
>  
> +	if (unlikely(job->base.base.s_fence->finished.error))
> +		return NULL;
> +
>  	v3d->csd_job = job;
>  
>  	v3d_invalidate_caches(v3d);
>
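For readers following the thread without the driver source at hand, the guard added by the patch can be illustrated with a small userspace mock. This is only a sketch under stated assumptions: `struct mock_fence`, `struct mock_job`, `mock_flag_guilty()`, `mock_job_run()` and `mock_timeout_cycle()` are hypothetical stand-ins invented for illustration, not kernel or V3D APIs, and the real timeout path does considerably more than set one field.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a DMA fence; only the error field is modelled. */
struct mock_fence {
	int error;	/* 0, or a negative errno such as -ECANCELED */
};

/* Hypothetical stand-in for a V3D job. */
struct mock_job {
	struct mock_fence finished;	/* plays the role of s_fence->finished */
	struct mock_fence hw_fence;	/* handed back when the job reaches the hardware */
	int hw_submissions;		/* number of times the job was actually run */
};

/*
 * Assumption: models only the effect described in the commit message,
 * i.e. the guilty job's fence ends up flagged with -ECANCELED.
 */
static void mock_flag_guilty(struct mock_job *job)
{
	job->finished.error = -ECANCELED;
}

/* Mock run callback carrying the same early-return guard as the patch. */
static struct mock_fence *mock_job_run(struct mock_job *job)
{
	/* A job whose fence already carries an error must not run again. */
	if (job->finished.error)
		return NULL;

	job->hw_submissions++;
	return &job->hw_fence;
}

/*
 * Run the job once, pretend it timed out, then resubmit it `resubmits`
 * times. Returns how many times the job actually reached the hardware.
 */
static int mock_timeout_cycle(struct mock_job *job, int resubmits)
{
	int i;

	if (!mock_job_run(job))
		return job->hw_submissions;

	mock_flag_guilty(job);

	for (i = 0; i < resubmits; i++) {
		struct mock_fence *fence = mock_job_run(job);

		/* With the guard in place, every resubmission is skipped. */
		assert(fence == NULL);
		(void)fence;
	}

	return job->hw_submissions;
}
```

Calling `mock_timeout_cycle()` on a zero-initialised `struct mock_job` with any number of resubmissions leaves `hw_submissions` at 1 in this model. Without the early return in `mock_job_run()`, each resubmission would reach the hardware again, which is the loop the commit message describes.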