diff mbox series

[v2] drm: msm: a6xx: fix gpu failure after system resume

Message ID 1594996458-15529-1-git-send-email-akhilpo@codeaurora.org (mailing list archive)
State New, archived
Headers show
Series [v2] drm: msm: a6xx: fix gpu failure after system resume | expand

Commit Message

Akhil P Oommen July 17, 2020, 2:34 p.m. UTC
On targets where GMU is available, GMU takes over the ownership of GX GDSC
during its initialization. So, move the refcount-get on GX PD before we
initialize the GMU. This ensures that nobody can collapse the GX GDSC
once GMU owns the GX GDSC. This patch fixes some GMU OOB errors seen
during GPU wake up during a system resume.

Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes from v1:
- Reworded the commit text
- Added Reported-by & Tested-by tags

 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

Comments

Jordan Crouse July 17, 2020, 2:46 p.m. UTC | #1
On Fri, Jul 17, 2020 at 08:04:18PM +0530, Akhil P Oommen wrote:
> On targets where GMU is available, GMU takes over the ownership of GX GDSC
> during its initialization. So, move the refcount-get on GX PD before we
> initialize the GMU. This ensures that nobody can collapse the GX GDSC
> once GMU owns the GX GDSC. This patch fixes some GMU OOB errors seen
> during GPU wake up during a system resume.

> Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
> Reported-by: Matthias Kaehlcke <mka@chromium.org>
> Tested-by: Matthias Kaehlcke <mka@chromium.org>

The Signed-off-by needs to be at the end but I think Rob can do that for you.

Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>

> ---
> Changes from v1:
> - Reworded the commit text
> - Added Reported-by & Tested-by tags
> 
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 21e77d6..1d33020 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -854,10 +854,19 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
>  	/* Turn on the resources */
>  	pm_runtime_get_sync(gmu->dev);
>  
> +	/*
> +	 * "enable" the GX power domain which won't actually do anything but it
> +	 * will make sure that the refcounting is correct in case we need to
> +	 * bring down the GX after a GMU failure
> +	 */
> +	if (!IS_ERR_OR_NULL(gmu->gxpd))
> +		pm_runtime_get_sync(gmu->gxpd);
> +
>  	/* Use a known rate to bring up the GMU */
>  	clk_set_rate(gmu->core_clk, 200000000);
>  	ret = clk_bulk_prepare_enable(gmu->nr_clocks, gmu->clocks);
>  	if (ret) {
> +		pm_runtime_put(gmu->gxpd);
>  		pm_runtime_put(gmu->dev);
>  		return ret;
>  	}
> @@ -903,19 +912,12 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
>  	else
>  		a6xx_hfi_set_freq(gmu, gmu->current_perf_index);
>  
> -	/*
> -	 * "enable" the GX power domain which won't actually do anything but it
> -	 * will make sure that the refcounting is correct in case we need to
> -	 * bring down the GX after a GMU failure
> -	 */
> -	if (!IS_ERR_OR_NULL(gmu->gxpd))
> -		pm_runtime_get(gmu->gxpd);
> -
>  out:
>  	/* On failure, shut down the GMU to leave it in a good state */
>  	if (ret) {
>  		disable_irq(gmu->gmu_irq);
>  		a6xx_rpmh_stop(gmu);
> +		pm_runtime_put(gmu->gxpd);
>  		pm_runtime_put(gmu->dev);
>  	}
>  
> -- 
> 2.7.4
>
Doug Anderson July 17, 2020, 5:39 p.m. UTC | #2
Hi,

On Fri, Jul 17, 2020 at 7:46 AM Jordan Crouse <jcrouse@codeaurora.org> wrote:
>
> On Fri, Jul 17, 2020 at 08:04:18PM +0530, Akhil P Oommen wrote:
> > On targets where GMU is available, GMU takes over the ownership of GX GDSC
> > during its initialization. So, move the refcount-get on GX PD before we
> > initialize the GMU. This ensures that nobody can collapse the GX GDSC
> > once GMU owns the GX GDSC. This patch fixes some GMU OOB errors seen
> > during GPU wake up during a system resume.
>
> > Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
> > Reported-by: Matthias Kaehlcke <mka@chromium.org>
> > Tested-by: Matthias Kaehlcke <mka@chromium.org>
>
> The Signed-off-by needs to be at the end but I think Rob can do that for you.

It does?  I've always been told that this is supposed to be roughly a
log of what happens.  In that sense you added your SoB before the
review/test happened so it should come before.  I know some
maintainers seem to do things differently but that seems to be the
most common.  In that sense, I think the order that Akhil has is
correct.  ...but, obviously, it's up to the maintainer.  ;-)
Rob Clark July 17, 2020, 8:24 p.m. UTC | #3
On Fri, Jul 17, 2020 at 10:39 AM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Fri, Jul 17, 2020 at 7:46 AM Jordan Crouse <jcrouse@codeaurora.org> wrote:
> >
> > On Fri, Jul 17, 2020 at 08:04:18PM +0530, Akhil P Oommen wrote:
> > > On targets where GMU is available, GMU takes over the ownership of GX GDSC
> > > during its initialization. So, move the refcount-get on GX PD before we
> > > initialize the GMU. This ensures that nobody can collapse the GX GDSC
> > > once GMU owns the GX GDSC. This patch fixes some GMU OOB errors seen
> > > during GPU wake up during a system resume.
> >
> > > Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
> > > Reported-by: Matthias Kaehlcke <mka@chromium.org>
> > > Tested-by: Matthias Kaehlcke <mka@chromium.org>
> >
> > The Signed-off-by needs to be at the end but I think Rob can do that for you.
>
> It does?  I've always been told that this is supposed to be roughly a
> log of what happens.  In that sense you added your SoB before the
> review/test happened so it should come before.  I know some
> maintainers seem to do things differently but that seems to be the
> most common.  In that sense, I think the order that Akhil has is
> correct.  ...but, obviously, it's up to the maintainer.  ;-)

yeah, I chronological order was my understanding too.. but presumably
the Reported-by happened before the Signed-of-by (which is how I
reordered things when applying the patch)

BR,
-R
Doug Anderson July 17, 2020, 8:29 p.m. UTC | #4
Hi,


On Fri, Jul 17, 2020 at 1:24 PM Rob Clark <robdclark@gmail.com> wrote:
>
> On Fri, Jul 17, 2020 at 10:39 AM Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> > On Fri, Jul 17, 2020 at 7:46 AM Jordan Crouse <jcrouse@codeaurora.org> wrote:
> > >
> > > On Fri, Jul 17, 2020 at 08:04:18PM +0530, Akhil P Oommen wrote:
> > > > On targets where GMU is available, GMU takes over the ownership of GX GDSC
> > > > during its initialization. So, move the refcount-get on GX PD before we
> > > > initialize the GMU. This ensures that nobody can collapse the GX GDSC
> > > > once GMU owns the GX GDSC. This patch fixes some GMU OOB errors seen
> > > > during GPU wake up during a system resume.
> > >
> > > > Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
> > > > Reported-by: Matthias Kaehlcke <mka@chromium.org>
> > > > Tested-by: Matthias Kaehlcke <mka@chromium.org>
> > >
> > > The Signed-off-by needs to be at the end but I think Rob can do that for you.
> >
> > It does?  I've always been told that this is supposed to be roughly a
> > log of what happens.  In that sense you added your SoB before the
> > review/test happened so it should come before.  I know some
> > maintainers seem to do things differently but that seems to be the
> > most common.  In that sense, I think the order that Akhil has is
> > correct.  ...but, obviously, it's up to the maintainer.  ;-)
>
> yeah, I chronological order was my understanding too.. but presumably
> the Reported-by happened before the Signed-of-by (which is how I
> reordered things when applying the patch)

Doh!  Yeah, I somehow read that as Reviewed-by.  Thanks!  :-)

-Doug
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 21e77d6..1d33020 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -854,10 +854,19 @@  int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
 	/* Turn on the resources */
 	pm_runtime_get_sync(gmu->dev);
 
+	/*
+	 * "enable" the GX power domain which won't actually do anything but it
+	 * will make sure that the refcounting is correct in case we need to
+	 * bring down the GX after a GMU failure
+	 */
+	if (!IS_ERR_OR_NULL(gmu->gxpd))
+		pm_runtime_get_sync(gmu->gxpd);
+
 	/* Use a known rate to bring up the GMU */
 	clk_set_rate(gmu->core_clk, 200000000);
 	ret = clk_bulk_prepare_enable(gmu->nr_clocks, gmu->clocks);
 	if (ret) {
+		pm_runtime_put(gmu->gxpd);
 		pm_runtime_put(gmu->dev);
 		return ret;
 	}
@@ -903,19 +912,12 @@  int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
 	else
 		a6xx_hfi_set_freq(gmu, gmu->current_perf_index);
 
-	/*
-	 * "enable" the GX power domain which won't actually do anything but it
-	 * will make sure that the refcounting is correct in case we need to
-	 * bring down the GX after a GMU failure
-	 */
-	if (!IS_ERR_OR_NULL(gmu->gxpd))
-		pm_runtime_get(gmu->gxpd);
-
 out:
 	/* On failure, shut down the GMU to leave it in a good state */
 	if (ret) {
 		disable_irq(gmu->gmu_irq);
 		a6xx_rpmh_stop(gmu);
+		pm_runtime_put(gmu->gxpd);
 		pm_runtime_put(gmu->dev);
 	}