diff mbox

[3/3] drm/msm: gpu Add new gpu register read/write functions

Message ID 1479829657-23698-4-git-send-email-jcrouse@codeaurora.org (mailing list archive)
State Not Applicable, archived
Delegated to: Andy Gross
Headers show

Commit Message

Jordan Crouse Nov. 22, 2016, 3:47 p.m. UTC
Add some new functions to manipulate GPU registers.  gpu_read64 and
gpu_write64 can read/write a 64 bit value to two 32 bit registers.
For 4XX and older these are normally perfcounter registers, but
future targets will use 64 bit addressing so there will be many
more spots where a 64 bit read and write are needed.

gpu_rmw() does a read/modify/write on a 32 bit register given a mask
and bits to OR in.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 12 ++---------
 drivers/gpu/drm/msm/msm_gpu.h         | 39 +++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 10 deletions(-)

Comments

Stephen Boyd Nov. 23, 2016, 7:34 p.m. UTC | #1
On 11/22/2016 07:47 AM, Jordan Crouse wrote:
> Add some new functions to manipulate GPU registers.  gpu_read64 and
> gpu_write64 can read/write a 64 bit value to two 32 bit registers.
> For 4XX and older these are normally perfcounter registers, but
> future targets will use 64 bit addressing so there will be many
> more spots where a 64 bit read and write are needed.
>
> gpu_rmw() does a read/modify/write on a 32 bit register given a mask
> and bits to OR in.
>
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 12 ++---------
>  drivers/gpu/drm/msm/msm_gpu.h         | 39 +++++++++++++++++++++++++++++++++++
>  2 files changed, 41 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> index ba16507..b82210c 100644
> --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
> @@ -513,16 +513,8 @@ static int a4xx_pm_suspend(struct msm_gpu *gpu) {
>  
>  static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>  {
> -	uint32_t hi, lo, tmp;
> -
> -	tmp = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_HI);
> -	do {
> -		hi = tmp;
> -		lo = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO);
> -		tmp = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_HI);
> -	} while (tmp != hi);
> -
> -	*value = (((uint64_t)hi) << 32) | lo;
> +	*value = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO,
> +		REG_A4XX_RBBM_PERFCTR_CP_0_HI);

Did we stop caring about the case where the high bits changed while the
low bits were read? gpu_read64 (pretty poor name by the way considering
how many GPUs there are supported in the kernel) doesn't look to check
for or handle that case.
Rob Clark Nov. 23, 2016, 7:46 p.m. UTC | #2
On Wed, Nov 23, 2016 at 2:34 PM, Stephen Boyd <sboyd@codeaurora.org> wrote:
> On 11/22/2016 07:47 AM, Jordan Crouse wrote:
>> Add some new functions to manipulate GPU registers.  gpu_read64 and
>> gpu_write64 can read/write a 64 bit value to two 32 bit registers.
>> For 4XX and older these are normally perfcounter registers, but
>> future targets will use 64 bit addressing so there will be many
>> more spots where a 64 bit read and write are needed.
>>
>> gpu_rmw() does a read/modify/write on a 32 bit register given a mask
>> and bits to OR in.
>>
>> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
>> ---
>>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 12 ++---------
>>  drivers/gpu/drm/msm/msm_gpu.h         | 39 +++++++++++++++++++++++++++++++++++
>>  2 files changed, 41 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
>> index ba16507..b82210c 100644
>> --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
>> @@ -513,16 +513,8 @@ static int a4xx_pm_suspend(struct msm_gpu *gpu) {
>>
>>  static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>  {
>> -     uint32_t hi, lo, tmp;
>> -
>> -     tmp = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_HI);
>> -     do {
>> -             hi = tmp;
>> -             lo = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO);
>> -             tmp = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_HI);
>> -     } while (tmp != hi);
>> -
>> -     *value = (((uint64_t)hi) << 32) | lo;
>> +     *value = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO,
>> +             REG_A4XX_RBBM_PERFCTR_CP_0_HI);
>
> Did we stop caring about the case where the high bits changed while the
> low bits were read? gpu_read64 (pretty poor name by the way considering
> how many GPUs there are supported in the kernel) doesn't look to check
> for or handle that case.

fancy hw is fancy ;-)

seems like for the perf ctrs reading _LO latches _HI so the loop was
overkill (and only worked in the first place because I happened to
read _LO first)

I was planning to smash in a comment to that effect when I merged this patch

BR,
-R


> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Boyd Nov. 23, 2016, 7:55 p.m. UTC | #3
On 11/23/2016 11:46 AM, Rob Clark wrote:
> On Wed, Nov 23, 2016 at 2:34 PM, Stephen Boyd <sboyd@codeaurora.org> wrote:
>> On 11/22/2016 07:47 AM, Jordan Crouse wrote:
>>> Add some new functions to manipulate GPU registers.  gpu_read64 and
>>> gpu_write64 can read/write a 64 bit value to two 32 bit registers.
>>> For 4XX and older these are normally perfcounter registers, but
>>> future targets will use 64 bit addressing so there will be many
>>> more spots where a 64 bit read and write are needed.
>>>
>>> gpu_rmw() does a read/modify/write on a 32 bit register given a mask
>>> and bits to OR in.
>>>
>>> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
>>> ---
>>>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 12 ++---------
>>>  drivers/gpu/drm/msm/msm_gpu.h         | 39 +++++++++++++++++++++++++++++++++++
>>>  2 files changed, 41 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
>>> index ba16507..b82210c 100644
>>> --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
>>> +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
>>> @@ -513,16 +513,8 @@ static int a4xx_pm_suspend(struct msm_gpu *gpu) {
>>>
>>>  static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
>>>  {
>>> -     uint32_t hi, lo, tmp;
>>> -
>>> -     tmp = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_HI);
>>> -     do {
>>> -             hi = tmp;
>>> -             lo = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO);
>>> -             tmp = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_HI);
>>> -     } while (tmp != hi);
>>> -
>>> -     *value = (((uint64_t)hi) << 32) | lo;
>>> +     *value = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO,
>>> +             REG_A4XX_RBBM_PERFCTR_CP_0_HI);
>> Did we stop caring about the case where the high bits changed while the
>> low bits were read? gpu_read64 (pretty poor name by the way considering
>> how many GPUs there are supported in the kernel) doesn't look to check
>> for or handle that case.
> fancy hw is fancy ;-)
>
> seems like for the perf ctrs reading _LO latches _HI so the loop was
> overkill (and only worked in the first place because I happened to
> read _LO first)
>
> I was planning to smash in a comment to that effect when I merged this patch
>
>

Ah ok. There's already a comment in the newly introduced function to
that effect but no mention in the commit text hence the confusion.
diff mbox

Patch

diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index ba16507..b82210c 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -513,16 +513,8 @@  static int a4xx_pm_suspend(struct msm_gpu *gpu) {
 
 static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 {
-	uint32_t hi, lo, tmp;
-
-	tmp = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_HI);
-	do {
-		hi = tmp;
-		lo = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO);
-		tmp = gpu_read(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_HI);
-	} while (tmp != hi);
-
-	*value = (((uint64_t)hi) << 32) | lo;
+	*value = gpu_read64(gpu, REG_A4XX_RBBM_PERFCTR_CP_0_LO,
+		REG_A4XX_RBBM_PERFCTR_CP_0_HI);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 4ee95ca..f8e6657 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -154,6 +154,45 @@  static inline u32 gpu_read(struct msm_gpu *gpu, u32 reg)
 	return msm_readl(gpu->mmio + (reg << 2));
 }
 
+static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
+{
+	uint32_t val = gpu_read(gpu, reg);
+
+	val &= ~mask;
+	gpu_write(gpu, reg, val | or);
+}
+
+static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)
+{
+	u64 val;
+
+	/*
+	 * Why not a readq here? Two reasons: 1) many of the LO registers are
+	 * not quad word aligned and 2) the GPU hardware designers have a bit
+	 * of a history of putting registers where they fit, especially in
+	 * spins. The longer a GPU family goes the higher the chance that
+	 * we'll get burned.  We could do a series of validity checks if we
+	 * wanted to, but really is a readq() that much better? Nah.
+	 */
+
+	/*
+	 * For some lo/hi registers (like perfcounters), the hi value is latched
+	 * when the lo is read, so make sure to read the lo first to trigger
+	 * that
+	 */
+	val = (u64) msm_readl(gpu->mmio + (lo << 2));
+	val |= ((u64) msm_readl(gpu->mmio + (hi << 2)) << 32);
+
+	return val;
+}
+
+static inline void gpu_write64(struct msm_gpu *gpu, u32 lo, u32 hi, u64 val)
+{
+	/* Why not a writeq here? Read the screed above */
+	msm_writel(lower_32_bits(val), gpu->mmio + (lo << 2));
+	msm_writel(upper_32_bits(val), gpu->mmio + (hi << 2));
+}
+
 int msm_gpu_pm_suspend(struct msm_gpu *gpu);
 int msm_gpu_pm_resume(struct msm_gpu *gpu);