diff mbox series

[4/4] drm/msm/gpu: Attach to the GPU GX power domain

Message ID 20181119234706.5821-5-jcrouse@codeaurora.org (mailing list archive)
State Not Applicable, archived
Headers show
Series msm: clk: Define a special power domain for SDM845 GX | expand

Commit Message

Jordan Crouse Nov. 19, 2018, 11:47 p.m. UTC
99.999% of the time during normal operation the GMU is responsible
for power and clock control on the GX domain and the CPU remains
blissfully unaware. However, there is one situation where the CPU
needs to get involved:

The power sequencing rules dictate that the GX needs to be turned
off before the CX so that the CX can be turned on before the GX
during power up. During normal operation when the CPU is taking
down the CX domain a stop command is sent to the GMU which turns
off the GX domain and then the CPU handles the CX domain.

But if the GMU happened to be unresponsive while the GX domain was
left then the CPU will need to step in and turn off the GX domain
before resetting the CX and rebooting the GMU. This unfortunately
means that the CPU needs to be marginally aware of the GX domain
even though it is expected to usually keep its hands off.

To support this we create a semi-disabled GX power domain that
does nothing to the hardware on power up but tries to shut it
down normally on power down. In this method the reference counting
is correct and we can step in with the pm_runtime_put() at the right
time during the failure path.

This patch sets up the connection to the GX power domain and does
the magic to "enable" and disable it at the right points.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 41 ++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  2 ++
 2 files changed, 42 insertions(+), 1 deletion(-)

Comments

Stephen Boyd Nov. 21, 2018, 7:54 a.m. UTC | #1
Quoting Jordan Crouse (2018-11-19 15:47:06)
> 99.999% of the time during normal operation the GMU is responsible
> for power and clock control on the GX domain and the CPU remains
> blissfully unaware. However, there is one situation where the CPU
> needs to get involved:
> 
> The power sequencing rules dictate that the GX needs to be turned
> off before the CX so that the CX can be turned on before the GX
> during power up. During normal operation when the CPU is taking
> down the CX domain a stop command is sent to the GMU which turns
> off the GX domain and then the CPU handles the CX domain.
> 
> But if the GMU happened to be unresponsive while the GX domain was
> left then the CPU will need to step in and turn off the GX domain

      ^ left on?

> before resetting the CX and rebooting the GMU. This unfortunately
> means that the CPU needs to be marginally aware of the GX domain
> even though it is expected to usually keep its hands off.
> 
> To support this we create a semi-disabled GX power domain that
> does nothing to the hardware on power up but tries to shut it
> down normally on power down. In this method the reference counting
> is correct and we can step in with the pm_runtime_put() at the right
> time during the failure path.
> 
> This patch sets up the connection to the GX power domain and does
> the magic to "enable" and disable it at the right points.
> 
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

The pm_runtime gymnastics is scary! But I'm willing to go with it.

> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 41 ++++++++++++++++++++++++++-
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  2 ++
>  2 files changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 51493f409358..ca71709efc94 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -1142,9 +1169,15 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
>         if (IS_ERR_OR_NULL(gmu->mmio))
>                 return;
>  
> -       pm_runtime_disable(gmu->dev);
>         a6xx_gmu_stop(a6xx_gpu);
>  
> +       pm_runtime_disable(gmu->dev);
> +
> +       if (!IS_ERR(gmu->gxpd)) {
> +               pm_runtime_disable(gmu->gxpd);
> +               dev_pm_domain_detach(gmu->gxpd, false);
> +       }
> +
>         a6xx_gmu_irq_disable(gmu);
>         a6xx_gmu_memory_free(gmu, gmu->hfi);
>  
> @@ -1203,6 +1236,12 @@ int a6xx_gmu_probe(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
>         if (gmu->hfi_irq < 0 || gmu->gmu_irq < 0)
>                 goto err;
>  
> +       /*
> +        * Get a link to the GX power domain to reset the GPU in case of GMU
> +        * crash
> +        */
> +       gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");

Are there a6xx devices that don't have this genpd? Just curious if we
could always assume that if it's an a6xx gpu then this must be there and
we can always call pm_runtime_get/put on it and avoid the if (!IS_ERR())
checks.

> +
>         /* Get the power levels for the GMU and GPU */
>         a6xx_gmu_pwrlevels_probe(gmu);
>
Jordan Crouse Nov. 21, 2018, 3 p.m. UTC | #2
On Tue, Nov 20, 2018 at 11:54:46PM -0800, Stephen Boyd wrote:
> Quoting Jordan Crouse (2018-11-19 15:47:06)
> > 99.999% of the time during normal operation the GMU is responsible
> > for power and clock control on the GX domain and the CPU remains
> > blissfully unaware. However, there is one situation where the CPU
> > needs to get involved:
> > 
> > The power sequencing rules dictate that the GX needs to be turned
> > off before the CX so that the CX can be turned on before the GX
> > during power up. During normal operation when the CPU is taking
> > down the CX domain a stop command is sent to the GMU which turns
> > off the GX domain and then the CPU handles the CX domain.
> > 
> > But if the GMU happened to be unresponsive while the GX domain was
> > left then the CPU will need to step in and turn off the GX domain
> 
>       ^ left on?
> 
> > before resetting the CX and rebooting the GMU. This unfortunately
> > means that the CPU needs to be marginally aware of the GX domain
> > even though it is expected to usually keep its hands off.
> > 
> > To support this we create a semi-disabled GX power domain that
> > does nothing to the hardware on power up but tries to shut it
> > down normally on power down. In this method the reference counting
> > is correct and we can step in with the pm_runtime_put() at the right
> > time during the failure path.
> > 
> > This patch sets up the connection to the GX power domain and does
> > the magic to "enable" and disable it at the right points.
> > 
> > Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> 
> The pm_runtime gymnastics is scary! But I'm willing to go with it.
> 
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 41 ++++++++++++++++++++++++++-
> >  drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  2 ++
> >  2 files changed, 42 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > index 51493f409358..ca71709efc94 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > @@ -1142,9 +1169,15 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
> >         if (IS_ERR_OR_NULL(gmu->mmio))
> >                 return;
> >  
> > -       pm_runtime_disable(gmu->dev);
> >         a6xx_gmu_stop(a6xx_gpu);
> >  
> > +       pm_runtime_disable(gmu->dev);
> > +
> > +       if (!IS_ERR(gmu->gxpd)) {
> > +               pm_runtime_disable(gmu->gxpd);
> > +               dev_pm_domain_detach(gmu->gxpd, false);
> > +       }
> > +
> >         a6xx_gmu_irq_disable(gmu);
> >         a6xx_gmu_memory_free(gmu, gmu->hfi);
> >  
> > @@ -1203,6 +1236,12 @@ int a6xx_gmu_probe(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> >         if (gmu->hfi_irq < 0 || gmu->gmu_irq < 0)
> >                 goto err;
> >  
> > +       /*
> > +        * Get a link to the GX power domain to reset the GPU in case of GMU
> > +        * crash
> > +        */
> > +       gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> 
> Are there a6xx devices that don't have this genpd? Just curious if we
> could always assume that if it's an a6xx gpu then this must be there and
> we can always call pm_runtime_get/put on it and avoid the if (!IS_ERR())
> checks.

Mainly I was trying to be backwards compatible with the device tree. I didn't
want to derail various development efforts. I don't mind the checks (though I
wish that pm_runtime_* was error tolerant) but folks really hate them we can
replace this with an error return and force people to update their device trees.

Jordan

> +
> >         /* Get the power levels for the GMU and GPU */
> >         a6xx_gmu_pwrlevels_probe(gmu);
> >
Stephen Boyd Nov. 21, 2018, 7:24 p.m. UTC | #3
Quoting Jordan Crouse (2018-11-21 07:00:06)
> On Tue, Nov 20, 2018 at 11:54:46PM -0800, Stephen Boyd wrote:
> > Quoting Jordan Crouse (2018-11-19 15:47:06)
> > > @@ -1203,6 +1236,12 @@ int a6xx_gmu_probe(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
> > >         if (gmu->hfi_irq < 0 || gmu->gmu_irq < 0)
> > >                 goto err;
> > >  
> > > +       /*
> > > +        * Get a link to the GX power domain to reset the GPU in case of GMU
> > > +        * crash
> > > +        */
> > > +       gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
> > 
> > Are there a6xx devices that don't have this genpd? Just curious if we
> > could always assume that if it's an a6xx gpu then this must be there and
> > we can always call pm_runtime_get/put on it and avoid the if (!IS_ERR())
> > checks.
> 
> Mainly I was trying to be backwards compatible with the device tree. I didn't
> want to derail various development efforts. I don't mind the checks (though I
> wish that pm_runtime_* was error tolerant) but folks really hate them we can
> replace this with an error return and force people to update their device trees.
> 

Ok. Well we don't have this DT node in mainline yet so either followup
after the merge cycle and remove the checks, or just take the pain now
and remove the checks and tell people using out of tree DT to update?
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 51493f409358..ca71709efc94 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -2,6 +2,7 @@ 
 /* Copyright (c) 2017-2018 The Linux Foundation. All rights reserved. */
 
 #include <linux/clk.h>
+#include <linux/pm_domain.h>
 #include <linux/pm_opp.h>
 #include <soc/qcom/cmd-db.h>
 
@@ -646,6 +647,16 @@  int a6xx_gmu_reset(struct a6xx_gpu *a6xx_gpu)
 	gmu_poll_timeout(gmu, REG_A6XX_RSCC_TCS3_DRV0_STATUS, val,
 		(val & 1), 100, 1000);
 
+	/*
+	 * Depending on the state of the GMU at this point the GX domain might
+	 * have been left on. Hardware sequencing rules state that the GX has to
+	 * be turned off before the CX domain so this is that one time that
+	 * that calling pm_runtime_put_sync() is expected to do something useful
+	 * (turn off the headswitch)
+	 */
+	if (!IS_ERR(gmu->gxpd))
+		pm_runtime_put_sync(gmu->gxpd);
+
 	/* Disable the resources */
 	clk_bulk_disable_unprepare(gmu->nr_clocks, gmu->clocks);
 	pm_runtime_put_sync(gmu->dev);
@@ -707,6 +718,14 @@  int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
 	/* Set the GPU to the highest power frequency */
 	__a6xx_gmu_set_freq(gmu, gmu->nr_gpu_freqs - 1);
 
+	/*
+	 * "enable" the GX power domain which won't actually do anything but it
+	 * will make sure that the refcounting is correct in case we need to
+	 * bring down the GX after a GMU failure
+	 */
+	if (!IS_ERR(gmu->gxpd))
+		pm_runtime_get(gmu->gxpd);
+
 out:
 	/* Make sure to turn off the boot OOB request on error */
 	if (ret)
@@ -778,6 +797,14 @@  int a6xx_gmu_stop(struct a6xx_gpu *a6xx_gpu)
 	/* Tell RPMh to power off the GPU */
 	a6xx_rpmh_stop(gmu);
 
+	/*
+	 * Mark the GPU power domain as off. During the shutdown process the GMU
+	 * should actually turn off the power so this is really just a
+	 * houskeeping step
+	 */
+	if (!IS_ERR(gmu->gxpd))
+		pm_runtime_put_sync(gmu->gxpd);
+
 	clk_bulk_disable_unprepare(gmu->nr_clocks, gmu->clocks);
 
 	pm_runtime_put_sync(gmu->dev);
@@ -1142,9 +1169,15 @@  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
 	if (IS_ERR_OR_NULL(gmu->mmio))
 		return;
 
-	pm_runtime_disable(gmu->dev);
 	a6xx_gmu_stop(a6xx_gpu);
 
+	pm_runtime_disable(gmu->dev);
+
+	if (!IS_ERR(gmu->gxpd)) {
+		pm_runtime_disable(gmu->gxpd);
+		dev_pm_domain_detach(gmu->gxpd, false);
+	}
+
 	a6xx_gmu_irq_disable(gmu);
 	a6xx_gmu_memory_free(gmu, gmu->hfi);
 
@@ -1203,6 +1236,12 @@  int a6xx_gmu_probe(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
 	if (gmu->hfi_irq < 0 || gmu->gmu_irq < 0)
 		goto err;
 
+	/*
+	 * Get a link to the GX power domain to reset the GPU in case of GMU
+	 * crash
+	 */
+	gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx");
+
 	/* Get the power levels for the GMU and GPU */
 	a6xx_gmu_pwrlevels_probe(gmu);
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
index a871cae2fc5e..dcc172d55f49 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
@@ -55,6 +55,8 @@  struct a6xx_gmu {
 	struct iommu_domain *domain;
 	u64 uncached_iova_base;
 
+	struct device *gxpd;
+
 	int idle_level;
 
 	struct a6xx_gmu_bo *hfi;