drm: msm: a6xx: fix gpu failure after system resume

Message ID	1594733130-398-1-git-send-email-akhilpo@codeaurora.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=WUjm=AZ=lists.freedesktop.org=dri-devel-bounces@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2862722203 Date: Subject: Cc: To: From: Sender; bh=ZuuVvtXOSn5P69Z0eecIOVwDd77Y6Cz0UPcGzE05C34=; b=BcbcNW/IwtwzyRvnAlzgRB4kFoT037A/2FuOgFTszwBeNHIkIqYSetwuZ4brzp/CELQQfaGO 6dNYcdazRlJEL866LNf3/324XwB0XqD0eQz+y/MpPcEYvd4yNvzk9zlRQHDEXg3jU/cTXPLV 1Vvs4o5Npkw3wFVlTkeoQoI8JuI= sender: akhilpo) by smtp.codeaurora.org (Postfix) with ESMTPSA id DDDF8C433CA; Tue, 14 Jul 2020 13:25:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org DDDF8C433CA From: Akhil P Oommen <akhilpo@codeaurora.org> To: freedreno@lists.freedesktop.org Subject: [PATCH] drm: msm: a6xx: fix gpu failure after system resume Date: Tue, 14 Jul 2020 18:55:30 +0530 Message-Id: <1594733130-398-1-git-send-email-akhilpo@codeaurora.org> Precedence: list Cc: devicetree@vger.kernel.org, rnayak@codeaurora.org, jonathan@marek.ca, linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org, mka@chromium.org, dri-devel@freedesktop.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	drm: msm: a6xx: fix gpu failure after system resume \| expand drm: msm: a6xx: fix gpu failure after system resume

Message ID

1594733130-398-1-git-send-email-akhilpo@codeaurora.org (mailing list archive)

State

New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2862722203
DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org DDDF8C433CA
From: Akhil P Oommen <akhilpo@codeaurora.org>
To: freedreno@lists.freedesktop.org
Subject: [PATCH] drm: msm: a6xx: fix gpu failure after system resume
Date: Tue, 14 Jul 2020 18:55:30 +0530
Message-Id: <1594733130-398-1-git-send-email-akhilpo@codeaurora.org>
Precedence: list
Cc: devicetree@vger.kernel.org, rnayak@codeaurora.org, jonathan@marek.ca,
 linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
 mka@chromium.org,
 dri-devel@freedesktop.org
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Series

drm: msm: a6xx: fix gpu failure after system resume | expand

Commit Message

Akhil P Oommen July 14, 2020, 1:25 p.m. UTC

On targets where GMU is available, GMU takes over the ownership of GX GDSC
during its initialization. So, take a refcount on the GX PD on behalf of
GMU before we initialize it. This makes sure that nobody can collapse the
GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures
during GPU wake up during system resume.

Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

Comments

Jordan Crouse July 14, 2020, 2:57 p.m. UTC | #1

On Tue, Jul 14, 2020 at 06:55:30PM +0530, Akhil P Oommen wrote:
> On targets where GMU is available, GMU takes over the ownership of GX GDSC
> during its initialization. So, take a refcount on the GX PD on behalf of
> GMU before we initialize it. This makes sure that nobody can collapse the
> GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures
> during GPU wake up during system resume.

The change looks fine but this explanation is confusing. When I read it I
thought "oh, man, we weren't taking a reference to the GX PD during resume???"
but that's not really the case. We *are* taking a reference, just not soon
enough to avoid possible issues. It would be helpful if you reworded this to
explain that you are moving the reference and perhaps to shine a bit more light
on what the "weird" failures are.

Jordan

> Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index a6f43ff..5b2df7d 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -873,10 +873,19 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
>  	/* Turn on the resources */
>  	pm_runtime_get_sync(gmu->dev);
>  
> +	/*
> +	 * "enable" the GX power domain which won't actually do anything but it
> +	 * will make sure that the refcounting is correct in case we need to
> +	 * bring down the GX after a GMU failure
> +	 */
> +	if (!IS_ERR_OR_NULL(gmu->gxpd))
> +		pm_runtime_get_sync(gmu->gxpd);
> +
>  	/* Use a known rate to bring up the GMU */
>  	clk_set_rate(gmu->core_clk, 200000000);
>  	ret = clk_bulk_prepare_enable(gmu->nr_clocks, gmu->clocks);
>  	if (ret) {
> +		pm_runtime_put(gmu->gxpd);
>  		pm_runtime_put(gmu->dev);
>  		return ret;
>  	}
> @@ -919,19 +928,12 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
>  	/* Set the GPU to the current freq */
>  	a6xx_gmu_set_initial_freq(gpu, gmu);
>  
> -	/*
> -	 * "enable" the GX power domain which won't actually do anything but it
> -	 * will make sure that the refcounting is correct in case we need to
> -	 * bring down the GX after a GMU failure
> -	 */
> -	if (!IS_ERR_OR_NULL(gmu->gxpd))
> -		pm_runtime_get(gmu->gxpd);
> -
>  out:
>  	/* On failure, shut down the GMU to leave it in a good state */
>  	if (ret) {
>  		disable_irq(gmu->gmu_irq);
>  		a6xx_rpmh_stop(gmu);
> +		pm_runtime_put(gmu->gxpd);
>  		pm_runtime_put(gmu->dev);
>  	}
>  
> -- 
> 2.7.4
>

Matthias Kaehlcke July 14, 2020, 5:10 p.m. UTC | #2

On Tue, Jul 14, 2020 at 06:55:30PM +0530, Akhil P Oommen wrote:
> On targets where GMU is available, GMU takes over the ownership of GX GDSC
> during its initialization. So, take a refcount on the GX PD on behalf of
> GMU before we initialize it. This makes sure that nobody can collapse the
> GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures
> during GPU wake up during system resume.
> 
> Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>

I went through a few dozen suspend/resume cycles on SC7180 and didn't run
into the kernel panic that typically occurs after a few iterations without
this patch.

Reported-by: Matthias Kaehlcke <mka@chromium.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>

On which tree is this patch based on? I had to apply it manually because
'git am' is unhappy when I try to apply it:

  error: sha1 information is lacking or useless (drivers/gpu/drm/msm/adreno/a6xx_gmu.c).
  error: could not build fake ancestor

Both upstream and drm-msm are in my remotes and synced, so I suspect it's
some private tree. Please make sure to base patches on the corresponding
maintainer tree or upstream, whichs makes life easier for maintainers,
testers and reviewers.

Rob Clark July 14, 2020, 6:42 p.m. UTC | #3

On Tue, Jul 14, 2020 at 10:10 AM Matthias Kaehlcke <mka@chromium.org> wrote:
>
> On Tue, Jul 14, 2020 at 06:55:30PM +0530, Akhil P Oommen wrote:
> > On targets where GMU is available, GMU takes over the ownership of GX GDSC
> > during its initialization. So, take a refcount on the GX PD on behalf of
> > GMU before we initialize it. This makes sure that nobody can collapse the
> > GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures
> > during GPU wake up during system resume.
> >
> > Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
>
> I went through a few dozen suspend/resume cycles on SC7180 and didn't run
> into the kernel panic that typically occurs after a few iterations without
> this patch.
>
> Reported-by: Matthias Kaehlcke <mka@chromium.org>
> Tested-by: Matthias Kaehlcke <mka@chromium.org>
>
> On which tree is this patch based on? I had to apply it manually because
> 'git am' is unhappy when I try to apply it:
>
>   error: sha1 information is lacking or useless (drivers/gpu/drm/msm/adreno/a6xx_gmu.c).
>   error: could not build fake ancestor
>
> Both upstream and drm-msm are in my remotes and synced, so I suspect it's
> some private tree. Please make sure to base patches on the corresponding
> maintainer tree or upstream, whichs makes life easier for maintainers,
> testers and reviewers.

I've run into the same issue frequently :-(

BR,
-R

Akhil P Oommen July 17, 2020, 2:36 p.m. UTC | #4

On 7/15/2020 12:12 AM, Rob Clark wrote:
> On Tue, Jul 14, 2020 at 10:10 AM Matthias Kaehlcke <mka@chromium.org> wrote:
>>
>> On Tue, Jul 14, 2020 at 06:55:30PM +0530, Akhil P Oommen wrote:
>>> On targets where GMU is available, GMU takes over the ownership of GX GDSC
>>> during its initialization. So, take a refcount on the GX PD on behalf of
>>> GMU before we initialize it. This makes sure that nobody can collapse the
>>> GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures
>>> during GPU wake up during system resume.
>>>
>>> Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
>>
>> I went through a few dozen suspend/resume cycles on SC7180 and didn't run
>> into the kernel panic that typically occurs after a few iterations without
>> this patch.
>>
>> Reported-by: Matthias Kaehlcke <mka@chromium.org>
>> Tested-by: Matthias Kaehlcke <mka@chromium.org>
>>
>> On which tree is this patch based on? I had to apply it manually because
>> 'git am' is unhappy when I try to apply it:
>>
>>    error: sha1 information is lacking or useless (drivers/gpu/drm/msm/adreno/a6xx_gmu.c).
>>    error: could not build fake ancestor
>>
>> Both upstream and drm-msm are in my remotes and synced, so I suspect it's
>> some private tree. Please make sure to base patches on the corresponding
>> maintainer tree or upstream, whichs makes life easier for maintainers,
>> testers and reviewers.
> 
> I've run into the same issue frequently :-(
> 
> BR,
> -R
> 
Sorry, I was using msm-next brand as the base, but had the opp-next 
branch merged too inadvertently.

-Akhil

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index a6f43ff..5b2df7d 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -873,10 +873,19 @@  int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
 	/* Turn on the resources */
 	pm_runtime_get_sync(gmu->dev);
 
+	/*
+	 * "enable" the GX power domain which won't actually do anything but it
+	 * will make sure that the refcounting is correct in case we need to
+	 * bring down the GX after a GMU failure
+	 */
+	if (!IS_ERR_OR_NULL(gmu->gxpd))
+		pm_runtime_get_sync(gmu->gxpd);
+
 	/* Use a known rate to bring up the GMU */
 	clk_set_rate(gmu->core_clk, 200000000);
 	ret = clk_bulk_prepare_enable(gmu->nr_clocks, gmu->clocks);
 	if (ret) {
+		pm_runtime_put(gmu->gxpd);
 		pm_runtime_put(gmu->dev);
 		return ret;
 	}
@@ -919,19 +928,12 @@  int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
 	/* Set the GPU to the current freq */
 	a6xx_gmu_set_initial_freq(gpu, gmu);
 
-	/*
-	 * "enable" the GX power domain which won't actually do anything but it
-	 * will make sure that the refcounting is correct in case we need to
-	 * bring down the GX after a GMU failure
-	 */
-	if (!IS_ERR_OR_NULL(gmu->gxpd))
-		pm_runtime_get(gmu->gxpd);
-
 out:
 	/* On failure, shut down the GMU to leave it in a good state */
 	if (ret) {
 		disable_irq(gmu->gmu_irq);
 		a6xx_rpmh_stop(gmu);
+		pm_runtime_put(gmu->gxpd);
 		pm_runtime_put(gmu->dev);
 	}

drm: msm: a6xx: fix gpu failure after system resume

Commit Message

Comments

Patch