Message ID | 20230202104822.1.I0e49003bf4dd1dead9be4a29dbee41f3b1236e48@changeid (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | drm/msm/a6xx: Make GPU destroy a bit safer | expand |
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c index f3c9600221d4..7f5bc73b2040 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c @@ -974,7 +974,7 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu) int status, ret; if (WARN(!gmu->initialized, "The GMU is not set up yet\n")) - return 0; + return -EINVAL; gmu->hung = false; diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index aae60cbd9164..6faea5049f76 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -1746,7 +1746,9 @@ static void a6xx_destroy(struct msm_gpu *gpu) a6xx_llc_slices_destroy(a6xx_gpu); + mutex_lock(&a6xx_gpu->gmu.lock); a6xx_gmu_remove(a6xx_gpu); + mutex_unlock(&a6xx_gpu->gmu.lock); adreno_gpu_cleanup(adreno_gpu);
If, for whatever reason, we're trying process adreno_runtime_resume() at the same time that a6xx_destroy() is running then things can go boom. Specifically adreno_runtime_resume() will eventually call a6xx_pm_resume() and that may try to resume the gmu. Let's grab the GMU lock as we're destroying the GMU. That will solve the race because a6xx_pm_resume() grabs the same lock. That makes the access of `gmu->initialized` in a6xx_gmu_resume() safe. We'll also return an error code in a6xx_gmu_resume() if we see that `gmu->initialized` was false. If this happens we'll bail out of the rest of a6xx_pm_resume(), which is good because the rest of that function is also not good to do if we're racing with a6xx_destroy(). Signed-off-by: Douglas Anderson <dianders@chromium.org> --- This doesn't _really_ matter for upstream, but downstream in ChromeOS we have a GPU inputboost patch. That inputboost patch was related to adreno_runtime_resume() getting called at the same time that a6xx_destroy() was running. This was seen at bootup when the panel failed to probe. Despite the fact that this isn't truly fixing any bugs upstream, it still seems like a general improvement for the GPU driver. drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 2 +- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-)