diff mbox series

[v2,1/2] drm/msm/adreno: De-spaghettify the use of memory barriers

Message ID 20240625-adreno_barriers-v2-1-c01f2ef4b62a@linaro.org (mailing list archive)
State New, archived
Headers show
Series Clean up barriers | expand

Commit Message

Konrad Dybcio June 25, 2024, 6:54 p.m. UTC
Memory barriers help ensure instruction ordering, NOT time and order
of actual write arrival at other observers (e.g. memory-mapped IP).
On architectures employing weak memory ordering, the latter can be a
giant pain point, and it has been as part of this driver.

Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of
readl/writel, which include r/w (respectively) barriers.

Replace the barriers with a readback (or drop altogether where possible)
that ensures the previous writes have exited the write buffer (as the CPU
must flush the write to the register it's trying to read back).

Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  4 +---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 10 ++++++----
 2 files changed, 7 insertions(+), 7 deletions(-)

Comments

Daniel Vetter June 26, 2024, 7:59 a.m. UTC | #1
On Tue, Jun 25, 2024 at 08:54:41PM +0200, Konrad Dybcio wrote:
> Memory barriers help ensure instruction ordering, NOT time and order
> of actual write arrival at other observers (e.g. memory-mapped IP).
> On architectures employing weak memory ordering, the latter can be a
> giant pain point, and it has been as part of this driver.
> 
> Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of
> readl/writel, which include r/w (respectively) barriers.
> 
> Replace the barriers with a readback (or drop altogether where possible)
> that ensures the previous writes have exited the write buffer (as the CPU
> must flush the write to the register it's trying to read back).
> 
> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>

Some in pci these readbacks are actually part of the spec and called
posting reads. I'd very much recommend drivers create a small wrapper
function for these cases with a void return value, because it makes the
code so much more legible and easier to understand.
-Sima

> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  4 +---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 10 ++++++----
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 0e3dfd4c2bc8..09d640165b18 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -466,9 +466,7 @@ static int a6xx_rpmh_start(struct a6xx_gmu *gmu)
>  	int ret;
>  	u32 val;
>  
> -	gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, 1 << 1);
> -	/* Wait for the register to finish posting */
> -	wmb();
> +	gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1));
>  
>  	ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_RSCC_CONTROL_ACK, val,
>  		val & (1 << 1), 100, 10000);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index c98cdb1e9326..4083d0cad782 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -855,14 +855,16 @@ static int hw_init(struct msm_gpu *gpu)
>  	/* Clear GBIF halt in case GX domain was not collapsed */
>  	if (adreno_is_a619_holi(adreno_gpu)) {
>  		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> +		gpu_read(gpu, REG_A6XX_GBIF_HALT);
> +
>  		gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, 0);
> -		/* Let's make extra sure that the GPU can access the memory.. */
> -		mb();
> +		gpu_read(gpu, REG_A6XX_RBBM_GPR0_CNTL);
>  	} else if (a6xx_has_gbif(adreno_gpu)) {
>  		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> +		gpu_read(gpu, REG_A6XX_GBIF_HALT);
> +
>  		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> -		/* Let's make extra sure that the GPU can access the memory.. */
> -		mb();
> +		gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT);
>  	}
>  
>  	/* Some GPUs are stubborn and take their sweet time to unhalt GBIF! */
> 
> -- 
> 2.45.2
>
Akhil P Oommen June 26, 2024, 9:24 p.m. UTC | #2
On Wed, Jun 26, 2024 at 09:59:39AM +0200, Daniel Vetter wrote:
> On Tue, Jun 25, 2024 at 08:54:41PM +0200, Konrad Dybcio wrote:
> > Memory barriers help ensure instruction ordering, NOT time and order
> > of actual write arrival at other observers (e.g. memory-mapped IP).
> > On architectures employing weak memory ordering, the latter can be a
> > giant pain point, and it has been as part of this driver.
> > 
> > Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of
> > readl/writel, which include r/w (respectively) barriers.
> > 
> > Replace the barriers with a readback (or drop altogether where possible)
> > that ensures the previous writes have exited the write buffer (as the CPU
> > must flush the write to the register it's trying to read back).
> > 
> > Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> 
> Some in pci these readbacks are actually part of the spec and called
> posting reads. I'd very much recommend drivers create a small wrapper
> function for these cases with a void return value, because it makes the
> code so much more legible and easier to understand.

For Adreno which is configured via mmio, we don't need to do this often. GBIF_HALT
is a scenario where we need to be extra careful as it can potentially cause some
internal lockup. Another scenario I can think of is GPU soft reset where need to
keep a delay on cpu side after triggering. We should closely scrutinize any
other instance that comes up. So I feel a good justification as a comment here
would be enough, to remind the reader. Think of it as a way to discourage the
use by making it hard.

This is a bit subjective, I am fine if you have a strong opinion on this.

-Akhil.

> -Sima
> 
> > ---
> >  drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  4 +---
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 10 ++++++----
> >  2 files changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > index 0e3dfd4c2bc8..09d640165b18 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > @@ -466,9 +466,7 @@ static int a6xx_rpmh_start(struct a6xx_gmu *gmu)
> >  	int ret;
> >  	u32 val;
> >  
> > -	gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, 1 << 1);
> > -	/* Wait for the register to finish posting */
> > -	wmb();
> > +	gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1));
> >  
> >  	ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_RSCC_CONTROL_ACK, val,
> >  		val & (1 << 1), 100, 10000);
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index c98cdb1e9326..4083d0cad782 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -855,14 +855,16 @@ static int hw_init(struct msm_gpu *gpu)
> >  	/* Clear GBIF halt in case GX domain was not collapsed */
> >  	if (adreno_is_a619_holi(adreno_gpu)) {
> >  		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > +		gpu_read(gpu, REG_A6XX_GBIF_HALT);
> > +
> >  		gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, 0);
> > -		/* Let's make extra sure that the GPU can access the memory.. */
> > -		mb();
> > +		gpu_read(gpu, REG_A6XX_RBBM_GPR0_CNTL);
> >  	} else if (a6xx_has_gbif(adreno_gpu)) {
> >  		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > +		gpu_read(gpu, REG_A6XX_GBIF_HALT);
> > +
> >  		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> > -		/* Let's make extra sure that the GPU can access the memory.. */
> > -		mb();
> > +		gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT);
> >  	}
> >  
> >  	/* Some GPUs are stubborn and take their sweet time to unhalt GBIF! */
> > 
> > -- 
> > 2.45.2
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Daniel Vetter June 27, 2024, 7:56 a.m. UTC | #3
On Thu, Jun 27, 2024 at 02:54:57AM +0530, Akhil P Oommen wrote:
> On Wed, Jun 26, 2024 at 09:59:39AM +0200, Daniel Vetter wrote:
> > On Tue, Jun 25, 2024 at 08:54:41PM +0200, Konrad Dybcio wrote:
> > > Memory barriers help ensure instruction ordering, NOT time and order
> > > of actual write arrival at other observers (e.g. memory-mapped IP).
> > > On architectures employing weak memory ordering, the latter can be a
> > > giant pain point, and it has been as part of this driver.
> > > 
> > > Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of
> > > readl/writel, which include r/w (respectively) barriers.
> > > 
> > > Replace the barriers with a readback (or drop altogether where possible)
> > > that ensures the previous writes have exited the write buffer (as the CPU
> > > must flush the write to the register it's trying to read back).
> > > 
> > > Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> > 
> > Some in pci these readbacks are actually part of the spec and called
> > posting reads. I'd very much recommend drivers create a small wrapper
> > function for these cases with a void return value, because it makes the
> > code so much more legible and easier to understand.
> 
> For Adreno which is configured via mmio, we don't need to do this often. GBIF_HALT
> is a scenario where we need to be extra careful as it can potentially cause some
> internal lockup. Another scenario I can think of is GPU soft reset where need to
> keep a delay on cpu side after triggering. We should closely scrutinize any
> other instance that comes up. So I feel a good justification as a comment here
> would be enough, to remind the reader. Think of it as a way to discourage the
> use by making it hard.
> 
> This is a bit subjective, I am fine if you have a strong opinion on this.

Eh it's up to you, but "we don't do this often" is a reason to make them
stand out even more. Similar reasons why cpu memory barriers must all have
a comment, to explain what they're synchronizing against.

Up to you if you just want a comment rule or make them stand out even more
with an explicit name (and still have the comment rule) that's different
from normal reads. Again comparing to cpu barriers, the nice thing is that
they're (in most cases at least, unless you do really scary stuff) very
easy to spot in the code and the ring alarm bells when doing reviews.
-Sima
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 0e3dfd4c2bc8..09d640165b18 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -466,9 +466,7 @@  static int a6xx_rpmh_start(struct a6xx_gmu *gmu)
 	int ret;
 	u32 val;
 
-	gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, 1 << 1);
-	/* Wait for the register to finish posting */
-	wmb();
+	gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1));
 
 	ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_RSCC_CONTROL_ACK, val,
 		val & (1 << 1), 100, 10000);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index c98cdb1e9326..4083d0cad782 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -855,14 +855,16 @@  static int hw_init(struct msm_gpu *gpu)
 	/* Clear GBIF halt in case GX domain was not collapsed */
 	if (adreno_is_a619_holi(adreno_gpu)) {
 		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
+		gpu_read(gpu, REG_A6XX_GBIF_HALT);
+
 		gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, 0);
-		/* Let's make extra sure that the GPU can access the memory.. */
-		mb();
+		gpu_read(gpu, REG_A6XX_RBBM_GPR0_CNTL);
 	} else if (a6xx_has_gbif(adreno_gpu)) {
 		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
+		gpu_read(gpu, REG_A6XX_GBIF_HALT);
+
 		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
-		/* Let's make extra sure that the GPU can access the memory.. */
-		mb();
+		gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT);
 	}
 
 	/* Some GPUs are stubborn and take their sweet time to unhalt GBIF! */