diff mbox series

drm: msm: a6xx: fix gpu failure after system resume

Message ID 1594733130-398-1-git-send-email-akhilpo@codeaurora.org (mailing list archive)
State New, archived
Headers show
Series drm: msm: a6xx: fix gpu failure after system resume | expand

Commit Message

Akhil P Oommen July 14, 2020, 1:25 p.m. UTC
On targets where GMU is available, GMU takes over the ownership of GX GDSC
during its initialization. So, take a refcount on the GX PD on behalf of
GMU before we initialize it. This makes sure that nobody can collapse the
GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures
during GPU wake up during system resume.

Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

Comments

Jordan Crouse July 14, 2020, 2:57 p.m. UTC | #1
On Tue, Jul 14, 2020 at 06:55:30PM +0530, Akhil P Oommen wrote:
> On targets where GMU is available, GMU takes over the ownership of GX GDSC
> during its initialization. So, take a refcount on the GX PD on behalf of
> GMU before we initialize it. This makes sure that nobody can collapse the
> GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures
> during GPU wake up during system resume.

The change looks fine but this explanation is confusing. When I read it I
thought "oh, man, we weren't taking a reference to the GX PD during resume???"
but that's not really the case. We *are* taking a reference, just not soon
enough to avoid possible issues. It would be helpful if you reworded this to
explain that you are moving the reference and perhaps to shine a bit more light
on what the "weird" failures are.

Jordan

> Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index a6f43ff..5b2df7d 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -873,10 +873,19 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
>  	/* Turn on the resources */
>  	pm_runtime_get_sync(gmu->dev);
>  
> +	/*
> +	 * "enable" the GX power domain which won't actually do anything but it
> +	 * will make sure that the refcounting is correct in case we need to
> +	 * bring down the GX after a GMU failure
> +	 */
> +	if (!IS_ERR_OR_NULL(gmu->gxpd))
> +		pm_runtime_get_sync(gmu->gxpd);
> +
>  	/* Use a known rate to bring up the GMU */
>  	clk_set_rate(gmu->core_clk, 200000000);
>  	ret = clk_bulk_prepare_enable(gmu->nr_clocks, gmu->clocks);
>  	if (ret) {
> +		pm_runtime_put(gmu->gxpd);
>  		pm_runtime_put(gmu->dev);
>  		return ret;
>  	}
> @@ -919,19 +928,12 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
>  	/* Set the GPU to the current freq */
>  	a6xx_gmu_set_initial_freq(gpu, gmu);
>  
> -	/*
> -	 * "enable" the GX power domain which won't actually do anything but it
> -	 * will make sure that the refcounting is correct in case we need to
> -	 * bring down the GX after a GMU failure
> -	 */
> -	if (!IS_ERR_OR_NULL(gmu->gxpd))
> -		pm_runtime_get(gmu->gxpd);
> -
>  out:
>  	/* On failure, shut down the GMU to leave it in a good state */
>  	if (ret) {
>  		disable_irq(gmu->gmu_irq);
>  		a6xx_rpmh_stop(gmu);
> +		pm_runtime_put(gmu->gxpd);
>  		pm_runtime_put(gmu->dev);
>  	}
>  
> -- 
> 2.7.4
>
Matthias Kaehlcke July 14, 2020, 5:10 p.m. UTC | #2
On Tue, Jul 14, 2020 at 06:55:30PM +0530, Akhil P Oommen wrote:
> On targets where GMU is available, GMU takes over the ownership of GX GDSC
> during its initialization. So, take a refcount on the GX PD on behalf of
> GMU before we initialize it. This makes sure that nobody can collapse the
> GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures
> during GPU wake up during system resume.
> 
> Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>

I went through a few dozen suspend/resume cycles on SC7180 and didn't run
into the kernel panic that typically occurs after a few iterations without
this patch.

Reported-by: Matthias Kaehlcke <mka@chromium.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>

On which tree is this patch based on? I had to apply it manually because
'git am' is unhappy when I try to apply it:

  error: sha1 information is lacking or useless (drivers/gpu/drm/msm/adreno/a6xx_gmu.c).
  error: could not build fake ancestor

Both upstream and drm-msm are in my remotes and synced, so I suspect it's
some private tree. Please make sure to base patches on the corresponding
maintainer tree or upstream, whichs makes life easier for maintainers,
testers and reviewers.
Rob Clark July 14, 2020, 6:42 p.m. UTC | #3
On Tue, Jul 14, 2020 at 10:10 AM Matthias Kaehlcke <mka@chromium.org> wrote:
>
> On Tue, Jul 14, 2020 at 06:55:30PM +0530, Akhil P Oommen wrote:
> > On targets where GMU is available, GMU takes over the ownership of GX GDSC
> > during its initialization. So, take a refcount on the GX PD on behalf of
> > GMU before we initialize it. This makes sure that nobody can collapse the
> > GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures
> > during GPU wake up during system resume.
> >
> > Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
>
> I went through a few dozen suspend/resume cycles on SC7180 and didn't run
> into the kernel panic that typically occurs after a few iterations without
> this patch.
>
> Reported-by: Matthias Kaehlcke <mka@chromium.org>
> Tested-by: Matthias Kaehlcke <mka@chromium.org>
>
> On which tree is this patch based on? I had to apply it manually because
> 'git am' is unhappy when I try to apply it:
>
>   error: sha1 information is lacking or useless (drivers/gpu/drm/msm/adreno/a6xx_gmu.c).
>   error: could not build fake ancestor
>
> Both upstream and drm-msm are in my remotes and synced, so I suspect it's
> some private tree. Please make sure to base patches on the corresponding
> maintainer tree or upstream, whichs makes life easier for maintainers,
> testers and reviewers.

I've run into the same issue frequently :-(

BR,
-R
Akhil P Oommen July 17, 2020, 2:36 p.m. UTC | #4
On 7/15/2020 12:12 AM, Rob Clark wrote:
> On Tue, Jul 14, 2020 at 10:10 AM Matthias Kaehlcke <mka@chromium.org> wrote:
>>
>> On Tue, Jul 14, 2020 at 06:55:30PM +0530, Akhil P Oommen wrote:
>>> On targets where GMU is available, GMU takes over the ownership of GX GDSC
>>> during its initialization. So, take a refcount on the GX PD on behalf of
>>> GMU before we initialize it. This makes sure that nobody can collapse the
>>> GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures
>>> during GPU wake up during system resume.
>>>
>>> Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
>>
>> I went through a few dozen suspend/resume cycles on SC7180 and didn't run
>> into the kernel panic that typically occurs after a few iterations without
>> this patch.
>>
>> Reported-by: Matthias Kaehlcke <mka@chromium.org>
>> Tested-by: Matthias Kaehlcke <mka@chromium.org>
>>
>> On which tree is this patch based on? I had to apply it manually because
>> 'git am' is unhappy when I try to apply it:
>>
>>    error: sha1 information is lacking or useless (drivers/gpu/drm/msm/adreno/a6xx_gmu.c).
>>    error: could not build fake ancestor
>>
>> Both upstream and drm-msm are in my remotes and synced, so I suspect it's
>> some private tree. Please make sure to base patches on the corresponding
>> maintainer tree or upstream, whichs makes life easier for maintainers,
>> testers and reviewers.
> 
> I've run into the same issue frequently :-(
> 
> BR,
> -R
> 
Sorry, I was using msm-next brand as the base, but had the opp-next 
branch merged too inadvertently.

-Akhil
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index a6f43ff..5b2df7d 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -873,10 +873,19 @@  int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
 	/* Turn on the resources */
 	pm_runtime_get_sync(gmu->dev);
 
+	/*
+	 * "enable" the GX power domain which won't actually do anything but it
+	 * will make sure that the refcounting is correct in case we need to
+	 * bring down the GX after a GMU failure
+	 */
+	if (!IS_ERR_OR_NULL(gmu->gxpd))
+		pm_runtime_get_sync(gmu->gxpd);
+
 	/* Use a known rate to bring up the GMU */
 	clk_set_rate(gmu->core_clk, 200000000);
 	ret = clk_bulk_prepare_enable(gmu->nr_clocks, gmu->clocks);
 	if (ret) {
+		pm_runtime_put(gmu->gxpd);
 		pm_runtime_put(gmu->dev);
 		return ret;
 	}
@@ -919,19 +928,12 @@  int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
 	/* Set the GPU to the current freq */
 	a6xx_gmu_set_initial_freq(gpu, gmu);
 
-	/*
-	 * "enable" the GX power domain which won't actually do anything but it
-	 * will make sure that the refcounting is correct in case we need to
-	 * bring down the GX after a GMU failure
-	 */
-	if (!IS_ERR_OR_NULL(gmu->gxpd))
-		pm_runtime_get(gmu->gxpd);
-
 out:
 	/* On failure, shut down the GMU to leave it in a good state */
 	if (ret) {
 		disable_irq(gmu->gmu_irq);
 		a6xx_rpmh_stop(gmu);
+		pm_runtime_put(gmu->gxpd);
 		pm_runtime_put(gmu->dev);
 	}