[4/4] drm/i915: Use czclk_freq in vlv c0 residency calculations

Message ID 1443126560-26006-5-git-send-email-ville.syrjala@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Ville Syrjälä Sept. 24, 2015, 8:29 p.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

Replace the use of mem_freq/4 with czclk_freq in the vlv c0 residency
calculations.

Also deal with VLV_COUNT_RANGE_HIGH which affects all RCx residency
counters. We have just enough bits to do this without intermediate
divisions.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
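
One plausible reading of "just enough bits" is that the c0 scale factor still fits in a 32-bit unsigned value after the extra shift by 8. This is only a sketch: it assumes VLV_CZ_CLOCK_TO_MILLI_SEC is 100000 (a 10 ns timestamp unit), and CZ_TICKS_PER_MS, c0_scale(), fits_in_u32() and fits_in_s32() are hypothetical names that do not exist in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for VLV_CZ_CLOCK_TO_MILLI_SEC, taken to be 100000. */
#define CZ_TICKS_PER_MS 100000u

/* Hypothetical helper: the factor the c0 delta is multiplied by. */
static uint64_t c0_scale(bool range_high)
{
	uint64_t mul = 100;	/* percent scale, as in the patch */

	if (range_high)
		mul <<= 8;	/* counters tick 256 times slower */

	return mul * CZ_TICKS_PER_MS;
}

/* Would the factor survive being computed in a 32-bit unsigned type? */
static bool fits_in_u32(uint64_t v)
{
	return v <= UINT32_MAX;
}

/* Would it survive in a 32-bit signed type? */
static bool fits_in_s32(uint64_t v)
{
	return v <= INT32_MAX;
}
```

Under these assumptions the high-range factor is 2560000000, which fits in an unsigned 32-bit value but not in a signed one. That would explain why mul is declared as unsigned int rather than int.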

Comments

Imre Deak Sept. 28, 2015, 8:47 p.m. UTC | #1
On Thu, 2015-09-24 at 23:29 +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Replace the use of mem_freq/4 with czclk_freq in the vlv c0 residency
> calculations.
> 
> Also deal with VLV_COUNT_RANGE_HIGH which affects all RCx residency
> counters. We have just enough bits to do this without intermediate
> divisions.
> 
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 07c87e0..d78ef64 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -998,12 +998,16 @@ static bool vlv_c0_above(struct drm_i915_private *dev_priv,
>  			 int threshold)
>  {
>  	u64 time, c0;
> +	unsigned int mul = 100;
>  
>  	if (old->cz_clock == 0)
>  		return false;
>  
> +	if (I915_READ(VLV_COUNTER_CONTROL) & VLV_COUNT_RANGE_HIGH)
> +		mul <<= 8;

Could've been a separate patch.

> +
>  	time = now->cz_clock - old->cz_clock;
> -	time *= threshold * dev_priv->mem_freq;
> +	time *= threshold * dev_priv->czclk_freq;

Not introduced in this patch, but the above doesn't look correct to me.
Time is cycles _divided_ by frequency, so imo the above should be either
a division, or better we should calculate c0 (10ns) cycles here.

>  
>  	/* Workload can be split between render + media, e.g. SwapBuffers
>  	 * being blitted in X after being rendered in mesa. To account for
> @@ -1011,7 +1015,7 @@ static bool vlv_c0_above(struct drm_i915_private *dev_priv,
>  	 */
>  	c0 = now->render_c0 - old->render_c0;
>  	c0 += now->media_c0 - old->media_c0;
> -	c0 *= 100 * VLV_CZ_CLOCK_TO_MILLI_SEC * 4 / 1000;
> +	c0 *= mul * VLV_CZ_CLOCK_TO_MILLI_SEC;

Based on the above this would need to be fixed too.

The above can be done as a follow-up if needed; this patch does what it
says, so:
Reviewed-by: Imre Deak <imre.deak@intel.com>

>  
>  	return c0 >= time;
>  }
Imre Deak Sept. 28, 2015, 9:46 p.m. UTC | #2
On Mon, 2015-09-28 at 23:47 +0300, Imre Deak wrote:
> On Thu, 2015-09-24 at 23:29 +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > Replace the use of mem_freq/4 with czclk_freq in the vlv c0 residency
> > calculations.
> > 
> > Also deal with VLV_COUNT_RANGE_HIGH which affects all RCx residency
> > counters. We have just enough bits to do this without intermediate
> > divisions.
> > 
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_irq.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 07c87e0..d78ef64 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -998,12 +998,16 @@ static bool vlv_c0_above(struct drm_i915_private *dev_priv,
> >  			 int threshold)
> >  {
> >  	u64 time, c0;
> > +	unsigned int mul = 100;
> >  
> >  	if (old->cz_clock == 0)
> >  		return false;
> >  
> > +	if (I915_READ(VLV_COUNTER_CONTROL) & VLV_COUNT_RANGE_HIGH)
> > +		mul <<= 8;
> 
> Could've been a separate patch.
> 
> > +
> >  	time = now->cz_clock - old->cz_clock;
> > -	time *= threshold * dev_priv->mem_freq;
> > +	time *= threshold * dev_priv->czclk_freq;
> 
> Not introduced in this patch, but the above doesn't look correct to me.
> Time is cycles _divided_ by frequency, so imo the above should be either
> a division, or better we should calculate c0 (10ns) cycles here.
> 
> >  
> >  	/* Workload can be split between render + media, e.g. SwapBuffers
> >  	 * being blitted in X after being rendered in mesa. To account for
> > @@ -1011,7 +1015,7 @@ static bool vlv_c0_above(struct drm_i915_private *dev_priv,
> >  	 */
> >  	c0 = now->render_c0 - old->render_c0;
> >  	c0 += now->media_c0 - old->media_c0;
> > -	c0 *= 100 * VLV_CZ_CLOCK_TO_MILLI_SEC * 4 / 1000;
> > +	c0 *= mul * VLV_CZ_CLOCK_TO_MILLI_SEC;
> 
> Based on the above this would need to be fixed too.

Nvm the above, I realize now how it works :) I was confused by seeing
that each time value gets scaled by the "opposite" frequency (czclk freq
vs. the 10ns freq). Sorry for the noise.

> The above can be done as a follow-up if needed; this patch does what it
> says, so:
> Reviewed-by: Imre Deak <imre.deak@intel.com>
> 
> >  
> >  	return c0 >= time;
> >  }
> 
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter Sept. 29, 2015, 9:08 a.m. UTC | #3
Pulled in entire series, thanks.
-Daniel

On Tue, Sep 29, 2015 at 12:46:20AM +0300, Imre Deak wrote:
> On Mon, 2015-09-28 at 23:47 +0300, Imre Deak wrote:
> > On Thu, 2015-09-24 at 23:29 +0300, ville.syrjala@linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > 
> > > Replace the use of mem_freq/4 with czclk_freq in the vlv c0 residency
> > > calculations.
> > > 
> > > Also deal with VLV_COUNT_RANGE_HIGH which affects all RCx residency
> > > counters. We have just enough bits to do this without intermediate
> > > divisions.
> > > 
> > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_irq.c | 8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > index 07c87e0..d78ef64 100644
> > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > @@ -998,12 +998,16 @@ static bool vlv_c0_above(struct drm_i915_private *dev_priv,
> > >  			 int threshold)
> > >  {
> > >  	u64 time, c0;
> > > +	unsigned int mul = 100;
> > >  
> > >  	if (old->cz_clock == 0)
> > >  		return false;
> > >  
> > > +	if (I915_READ(VLV_COUNTER_CONTROL) & VLV_COUNT_RANGE_HIGH)
> > > +		mul <<= 8;
> > 
> > Could've been a separate patch.
> > 
> > > +
> > >  	time = now->cz_clock - old->cz_clock;
> > > -	time *= threshold * dev_priv->mem_freq;
> > > +	time *= threshold * dev_priv->czclk_freq;
> > 
> > Not introduced in this patch, but the above doesn't look correct to me.
> > Time is cycles _divided_ by frequency, so imo the above should be either
> > a division, or better we should calculate c0 (10ns) cycles here.
> > 
> > >  
> > >  	/* Workload can be split between render + media, e.g. SwapBuffers
> > >  	 * being blitted in X after being rendered in mesa. To account for
> > > @@ -1011,7 +1015,7 @@ static bool vlv_c0_above(struct drm_i915_private *dev_priv,
> > >  	 */
> > >  	c0 = now->render_c0 - old->render_c0;
> > >  	c0 += now->media_c0 - old->media_c0;
> > > -	c0 *= 100 * VLV_CZ_CLOCK_TO_MILLI_SEC * 4 / 1000;
> > > +	c0 *= mul * VLV_CZ_CLOCK_TO_MILLI_SEC;
> > 
> > Based on the above this would need to be fixed too.
> 
> Nvm the above, I realized now how it works:) I was confused seeing that
> we scale by czclk freq and the 10ns freq the "opposite" time value.
> Sorry for the noise.
> 
> > The above can be done as a follow-up if needed; this patch does what it
> > says, so:
> > Reviewed-by: Imre Deak <imre.deak@intel.com>
> > 
> > >  
> > >  	return c0 >= time;
> > >  }
Ville Syrjälä Sept. 29, 2015, 12:29 p.m. UTC | #4
On Mon, Sep 28, 2015 at 11:47:15PM +0300, Imre Deak wrote:
> On Thu, 2015-09-24 at 23:29 +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > Replace the use of mem_freq/4 with czclk_freq in the vlv c0 residency
> > calculations.
> > 
> > Also deal with VLV_COUNT_RANGE_HIGH which affects all RCx residency
> > counters. We have just enough bits to do this without intermediate
> > divisions.
> > 
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_irq.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 07c87e0..d78ef64 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -998,12 +998,16 @@ static bool vlv_c0_above(struct drm_i915_private *dev_priv,
> >  			 int threshold)
> >  {
> >  	u64 time, c0;
> > +	unsigned int mul = 100;
> >  
> >  	if (old->cz_clock == 0)
> >  		return false;
> >  
> > +	if (I915_READ(VLV_COUNTER_CONTROL) & VLV_COUNT_RANGE_HIGH)
> > +		mul <<= 8;
> 
> Could've been a separate patch.
> 
> > +
> >  	time = now->cz_clock - old->cz_clock;
> > -	time *= threshold * dev_priv->mem_freq;
> > +	time *= threshold * dev_priv->czclk_freq;
> 
> Not introduced in this patch, but the above doesn't look correct to me.
> Time is cycles _divided_ by frequency, so imo the above should be either
> a division, or better we should calculate c0 (10ns) cycles here.

I think it's correct. It just moves the division over to the other
side. So what we want to check is:

threshold * (czts - czts_old)     mul * (c0 - c0_old)
----------------------------- <= --------------------
     cz_to_milli_sec                  czclk_freq

Or actually maybe better to think of it as

            (czts - czts_old) * czclk_freq
threshold * ------------------------------ <= mul * (c0 - c0_old)
                   cz_to_milli_sec

The fact that the "cz" timestamp is not in cz clock units forces us to
do this silly conversion. I have no idea why Punit wants to give out the
timestamp in some normalized units. If it instead gave us the raw
cz clock timestamp we could just do
"threshold * (czts - czts_old) <= mul * (c0 - c0_old)"

So yeah, another case of the hardware (well, Punit firmware in this case
I suppose) being "helpful" :(

I think I even tried looking for a raw cz timestamp register so that we
could avoid this mess, but I couldn't find one.
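
For illustration, the second form of the inequality above can be checked
without any division by multiplying both sides by cz_to_milli_sec. This is
a minimal sketch, not the driver code: c0_above() and CZ_TICKS_PER_MS are
hypothetical names, the timestamp is assumed to tick every 10 ns (100000
ticks per ms), czclk_khz is assumed to be in kHz and small enough that the
left-hand product fits in 64 bits, and c0_delta is assumed to already hold
the summed render + media delta.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for VLV_CZ_CLOCK_TO_MILLI_SEC (10 ns units). */
#define CZ_TICKS_PER_MS 100000u

/*
 * Hypothetical helper: true when C0 residency over the sample window is
 * at least threshold_pct percent. Both sides of
 *
 *   threshold * czts_delta * czclk_freq / cz_to_milli_sec <= mul * c0_delta
 *
 * are multiplied by cz_to_milli_sec, so no division is needed.
 */
static bool c0_above(uint32_t czts_delta, uint32_t c0_delta,
		     uint32_t czclk_khz, unsigned int threshold_pct,
		     bool range_high)
{
	unsigned int mul = 100;
	uint64_t time, c0;

	assert(threshold_pct <= 100);

	if (range_high)
		mul <<= 8;	/* each count is worth 256 cz clocks */

	/* Left side: threshold * elapsed timestamp ticks * cz clock rate. */
	time = czts_delta;
	time *= (uint64_t)threshold_pct * czclk_khz;

	/* Right side: mul * c0 counts * timestamp ticks per ms. */
	c0 = c0_delta;
	c0 *= (uint64_t)mul * CZ_TICKS_PER_MS;

	return c0 >= time;
}
```

With a 1 ms window (czts_delta = 100000) and a 200000 kHz cz clock, 180000
c0 counts correspond to 90% residency, so the check passes for a threshold
of 90 and fails for 95.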
 
> >  
> >  	/* Workload can be split between render + media, e.g. SwapBuffers
> >  	 * being blitted in X after being rendered in mesa. To account for
> > @@ -1011,7 +1015,7 @@ static bool vlv_c0_above(struct drm_i915_private *dev_priv,
> >  	 */
> >  	c0 = now->render_c0 - old->render_c0;
> >  	c0 += now->media_c0 - old->media_c0;
> > -	c0 *= 100 * VLV_CZ_CLOCK_TO_MILLI_SEC * 4 / 1000;
> > +	c0 *= mul * VLV_CZ_CLOCK_TO_MILLI_SEC;
> 
> Based on the above this would need to be fixed too.
> 
> The above can be done as a follow-up if needed; this patch does what it
> says, so:
> Reviewed-by: Imre Deak <imre.deak@intel.com>
> 
> >  
> >  	return c0 >= time;
> >  }
>

Patch

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 07c87e0..d78ef64 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -998,12 +998,16 @@  static bool vlv_c0_above(struct drm_i915_private *dev_priv,
 			 int threshold)
 {
 	u64 time, c0;
+	unsigned int mul = 100;
 
 	if (old->cz_clock == 0)
 		return false;
 
+	if (I915_READ(VLV_COUNTER_CONTROL) & VLV_COUNT_RANGE_HIGH)
+		mul <<= 8;
+
 	time = now->cz_clock - old->cz_clock;
-	time *= threshold * dev_priv->mem_freq;
+	time *= threshold * dev_priv->czclk_freq;
 
 	/* Workload can be split between render + media, e.g. SwapBuffers
 	 * being blitted in X after being rendered in mesa. To account for
@@ -1011,7 +1015,7 @@  static bool vlv_c0_above(struct drm_i915_private *dev_priv,
 	 */
 	c0 = now->render_c0 - old->render_c0;
 	c0 += now->media_c0 - old->media_c0;
-	c0 *= 100 * VLV_CZ_CLOCK_TO_MILLI_SEC * 4 / 1000;
+	c0 *= mul * VLV_CZ_CLOCK_TO_MILLI_SEC;
 
 	return c0 >= time;
 }