diff mbox

[10/12] drm/i915/skl+: use linetime latency if ddb size is not available

Message ID 20170515083437.31394-11-mahesh1.kumar@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Kumar, Mahesh May 15, 2017, 8:34 a.m. UTC
From: "Kumar, Mahesh" <mahesh1.kumar@intel.com>

This patch make changes to use linetime latency if allocated
DDB size during plane watermark calculation is not available,
This is required to implement new DDB allocation algorithm.

In New Algorithm DDB is allocated based on WM values, because of which
number of DDB blocks will not be available during WM calculation,
So this "linetime latency" is suggested by SV/HW team to be used during
switch-case for WM blocks selection.

Changes since v1:
 - Rebase on top of Paulo's patch series
Changes since v2:
 - Fix if-else condition (pointed by Maarten)
Changes since v3:
 - Use common function for timetime_us calculation (Paulo)
 - rebase on drm-tip
Changes since v4:
 - Use consistent name for fixed_point operation

Signed-off-by: "Mahesh Kumar" <mahesh1.kumar@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  7 +++++++
 drivers/gpu/drm/i915/intel_pm.c | 42 ++++++++++++++++++++++++++++++++---------
 2 files changed, 40 insertions(+), 9 deletions(-)

Comments

Matt Roper May 15, 2017, 10:38 p.m. UTC | #1
On Mon, May 15, 2017 at 02:04:35PM +0530, Mahesh Kumar wrote:
> From: "Kumar, Mahesh" <mahesh1.kumar@intel.com>
> 
> This patch make changes to use linetime latency if allocated
> DDB size during plane watermark calculation is not available,
> This is required to implement new DDB allocation algorithm.
> 
> In New Algorithm DDB is allocated based on WM values, because of which
> number of DDB blocks will not be available during WM calculation,
> So this "linetime latency" is suggested by SV/HW team to be used during
> switch-case for WM blocks selection.

(Just realized I never actually sent my review of this patch on the
previous series review; sorry for the delay.)

I'm not sure if the term "linetime" is completely self-explanatory.  You
might want to explicitly clarify that it's the time it takes to fill a
single row at a given clock rate.  So "linetime latency" here is defined
as "line time in microseconds = pipe horizontal total number of pixels /
pixel rate in MHz."

> 
> Changes since v1:
>  - Rebase on top of Paulo's patch series
> Changes since v2:
>  - Fix if-else condition (pointed by Maarten)
> Changes since v3:
>  - Use common function for timetime_us calculation (Paulo)
>  - rebase on drm-tip
> Changes since v4:
>  - Use consistent name for fixed_point operation
> 
> Signed-off-by: "Mahesh Kumar" <mahesh1.kumar@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h |  7 +++++++
>  drivers/gpu/drm/i915/intel_pm.c | 42 ++++++++++++++++++++++++++++++++---------
>  2 files changed, 40 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a1858c3eb33a..84052990e695 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -115,6 +115,13 @@ typedef struct {
>  	fp; \
>  })
>  
> +static inline bool is_fixed16_zero(uint_fixed_16_16_t val)
> +{
> +	if (val.val == 0)
> +		return true;
> +	return false;
> +}
> +
>  static inline uint_fixed_16_16_t u32_to_fixed_16_16(uint32_t val)
>  {
>  	uint_fixed_16_16_t fp;
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 0f1d9f672e83..d73369c2c2d9 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -4197,6 +4197,27 @@ static uint_fixed_16_16_t skl_wm_method2(uint32_t pixel_rate,
>  	return ret;
>  }
>  
> +static uint_fixed_16_16_t
> +skl_get_linetime_us(struct intel_crtc_state *cstate)

We use the same definition of linetime for all platforms that use the
concept of 'linetime' (hsw+).  So maybe the skl_ prefix here isn't
really appropriate?

For consistency, should we also call this new function when calculating
the HSW/BDW watermarks in hsw_compute_linetime_wm()?  Or consolidate
skl_compute_linetime_wm() with hsw_compute_linetime_wm()?  Potentially
something we could do as a follow-up patch if we don't want to deal with
it while getting this series in.

> +{
> +	uint32_t pixel_rate;
> +	uint32_t crtc_htotal;
> +	uint_fixed_16_16_t linetime_us;
> +
> +	if (!cstate->base.active)
> +		return u32_to_fixed_16_16(0);
> +
> +	pixel_rate = cstate->pixel_rate;
> +
> +	if (WARN_ON(pixel_rate == 0))
> +		return u32_to_fixed_16_16(0);
> +
> +	crtc_htotal = cstate->base.adjusted_mode.crtc_htotal;
> +	linetime_us = fixed_16_16_div_u64(crtc_htotal * 1000, pixel_rate);
> +
> +	return linetime_us;
> +}
> +
>  static uint32_t
>  skl_adjusted_plane_pixel_rate(const struct intel_crtc_state *cstate,
>  			      const struct intel_plane_state *pstate)
> @@ -4331,12 +4352,18 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
>  	if (y_tiled) {
>  		selected_result = max_fixed_16_16(method2, y_tile_minimum);
>  	} else {
> +		uint32_t linetime_us;
> +
> +		linetime_us = fixed_16_16_to_u32_round_up(
> +				skl_get_linetime_us(cstate));
>  		if ((cpp * cstate->base.adjusted_mode.crtc_htotal / 512 < 1) &&
>  		    (plane_bytes_per_line / 512 < 1))
>  			selected_result = method2;
> -		else if ((ddb_allocation /
> +		else if ((ddb_allocation && ddb_allocation /
>  			fixed_16_16_to_u32_round_up(plane_blocks_per_line)) >= 1)
>  			selected_result = min_fixed_16_16(method1, method2);
> +		else if (latency >= linetime_us)
> +			selected_result = method2;

Based on the commit message, I'm a little bit confused.  The plan is to
switch to the new algorithm where we calculate watermarks first and then
partition DDB second.  So at the end, ddb_allocation should never be
available here, right?  In that case, can we just remove it as a
parameter to this function and drop your first 'else if' branch here?
That would make us take the new decision logic immediately and give us
an earlier idea whether it really behaves the way we expect it to,
rather than waiting until we've made a bunch of other changes too.

It's also not clear to me why with the old algorithm, the DDB allocation
test being true meant "use the lower of the two watermark methods"
whereas the linetime test means "aways use method2."  Is there any more
explanation available for how/why the linetime latency plays into the
watermark method decision?  The new decision isn't reflected in the
bspec yet (that I can see), so it's kind of hard to review that we're
definitely doing the right thing.


Matt

>  		else
>  			selected_result = method1;
>  	}
> @@ -4424,19 +4451,16 @@ skl_compute_linetime_wm(struct intel_crtc_state *cstate)
>  {
>  	struct drm_atomic_state *state = cstate->base.state;
>  	struct drm_i915_private *dev_priv = to_i915(state->dev);
> -	uint32_t pixel_rate;
> +	uint_fixed_16_16_t linetime_us;
>  	uint32_t linetime_wm;
>  
> -	if (!cstate->base.active)
> -		return 0;
> +	linetime_us = skl_get_linetime_us(cstate);
>  
> -	pixel_rate = cstate->pixel_rate;
> -
> -	if (WARN_ON(pixel_rate == 0))
> +	if (is_fixed16_zero(linetime_us))
>  		return 0;
>  
> -	linetime_wm = DIV_ROUND_UP(8 * cstate->base.adjusted_mode.crtc_htotal *
> -				   1000, pixel_rate);
> +	linetime_wm = fixed_16_16_to_u32_round_up(mul_u32_fixed_16_16(8,
> +				linetime_us));
>  
>  	/* Display WA #1135: bxt. */
>  	if (IS_BROXTON(dev_priv) && dev_priv->ipc_enabled)
> -- 
> 2.11.0
>
Kumar, Mahesh May 16, 2017, 9:46 a.m. UTC | #2
Hi,


On Tuesday 16 May 2017 04:08 AM, Matt Roper wrote:
> On Mon, May 15, 2017 at 02:04:35PM +0530, Mahesh Kumar wrote:
>> From: "Kumar, Mahesh" <mahesh1.kumar@intel.com>
>>
>> This patch make changes to use linetime latency if allocated
>> DDB size during plane watermark calculation is not available,
>> This is required to implement new DDB allocation algorithm.
>>
>> In New Algorithm DDB is allocated based on WM values, because of which
>> number of DDB blocks will not be available during WM calculation,
>> So this "linetime latency" is suggested by SV/HW team to be used during
>> switch-case for WM blocks selection.
> (Just realized I never actually sent my review of this patch on the
> previous series review; sorry for the delay.)
>
> I'm not sure if the term "linetime" is completely self-explanatory.  You
> might want to explicitly clarify that it's the time it takes to fill a
> single row at a given clock rate.  So "linetime latency" here is defined
> as "line time in microseconds = pipe horizontal total number of pixels /
> pixel rate in MHz."
Will update commit message.
>
>> Changes since v1:
>>   - Rebase on top of Paulo's patch series
>> Changes since v2:
>>   - Fix if-else condition (pointed by Maarten)
>> Changes since v3:
>>   - Use common function for timetime_us calculation (Paulo)
>>   - rebase on drm-tip
>> Changes since v4:
>>   - Use consistent name for fixed_point operation
>>
>> Signed-off-by: "Mahesh Kumar" <mahesh1.kumar@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h |  7 +++++++
>>   drivers/gpu/drm/i915/intel_pm.c | 42 ++++++++++++++++++++++++++++++++---------
>>   2 files changed, 40 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index a1858c3eb33a..84052990e695 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -115,6 +115,13 @@ typedef struct {
>>   	fp; \
>>   })
>>   
>> +static inline bool is_fixed16_zero(uint_fixed_16_16_t val)
>> +{
>> +	if (val.val == 0)
>> +		return true;
>> +	return false;
>> +}
>> +
>>   static inline uint_fixed_16_16_t u32_to_fixed_16_16(uint32_t val)
>>   {
>>   	uint_fixed_16_16_t fp;
>> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
>> index 0f1d9f672e83..d73369c2c2d9 100644
>> --- a/drivers/gpu/drm/i915/intel_pm.c
>> +++ b/drivers/gpu/drm/i915/intel_pm.c
>> @@ -4197,6 +4197,27 @@ static uint_fixed_16_16_t skl_wm_method2(uint32_t pixel_rate,
>>   	return ret;
>>   }
>>   
>> +static uint_fixed_16_16_t
>> +skl_get_linetime_us(struct intel_crtc_state *cstate)
> We use the same definition of linetime for all platforms that use the
> concept of 'linetime' (hsw+).  So maybe the skl_ prefix here isn't
> really appropriate?
will make it intel_get_linetime_us.
> For consistency, should we also call this new function when calculating
> the HSW/BDW watermarks in hsw_compute_linetime_wm()?  Or consolidate
> skl_compute_linetime_wm() with hsw_compute_linetime_wm()?  Potentially
> something we could do as a follow-up patch if we don't want to deal with
> it while getting this series in.
sounds good :), will create a new patch to use intel_get_linetime_us in 
hsw_compute_linetime_wm.
hsw & skl require bunch of other calculations/WA in linetime_wm, so will 
keep them separate function only.
>
>> +{
>> +	uint32_t pixel_rate;
>> +	uint32_t crtc_htotal;
>> +	uint_fixed_16_16_t linetime_us;
>> +
>> +	if (!cstate->base.active)
>> +		return u32_to_fixed_16_16(0);
>> +
>> +	pixel_rate = cstate->pixel_rate;
>> +
>> +	if (WARN_ON(pixel_rate == 0))
>> +		return u32_to_fixed_16_16(0);
>> +
>> +	crtc_htotal = cstate->base.adjusted_mode.crtc_htotal;
>> +	linetime_us = fixed_16_16_div_u64(crtc_htotal * 1000, pixel_rate);
>> +
>> +	return linetime_us;
>> +}
>> +
>>   static uint32_t
>>   skl_adjusted_plane_pixel_rate(const struct intel_crtc_state *cstate,
>>   			      const struct intel_plane_state *pstate)
>> @@ -4331,12 +4352,18 @@ static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
>>   	if (y_tiled) {
>>   		selected_result = max_fixed_16_16(method2, y_tile_minimum);
>>   	} else {
>> +		uint32_t linetime_us;
>> +
>> +		linetime_us = fixed_16_16_to_u32_round_up(
>> +				skl_get_linetime_us(cstate));
>>   		if ((cpp * cstate->base.adjusted_mode.crtc_htotal / 512 < 1) &&
>>   		    (plane_bytes_per_line / 512 < 1))
>>   			selected_result = method2;
>> -		else if ((ddb_allocation /
>> +		else if ((ddb_allocation && ddb_allocation /
>>   			fixed_16_16_to_u32_round_up(plane_blocks_per_line)) >= 1)
>>   			selected_result = min_fixed_16_16(method1, method2);
>> +		else if (latency >= linetime_us)
>> +			selected_result = method2;
> Based on the commit message, I'm a little bit confused.  The plan is to
> switch to the new algorithm where we calculate watermarks first and then
> partition DDB second.  So at the end, ddb_allocation should never be
> available here, right?  In that case, can we just remove it as a
> parameter to this function and drop your first 'else if' branch here?
> That would make us take the new decision logic immediately and give us
> an earlier idea whether it really behaves the way we expect it to,
> rather than waiting until we've made a bunch of other changes too.
right, We will eventually remove first "else if" check in next patch 
itself :)
I kept it in this patch due to earlier comment by Paulo not to remove 
(ddb_allocation / plane blocks per line)
check until patch which removes the ddb_allocation is merged/submitted.
https://patchwork.freedesktop.org/patch/115537/
>
> It's also not clear to me why with the old algorithm, the DDB allocation
> test being true meant "use the lower of the two watermark methods"
> whereas the linetime test means "aways use method2."  Is there any more
> explanation available for how/why the linetime latency plays into the
> watermark method decision?  The new decision isn't reflected in the
> bspec yet (that I can see), so it's kind of hard to review that we're
> definitely doing the right thing.
Thanks alot for catching this, I revisited the BSpec & it says to use 
minimum of method1, method2.
BTW these changes are already included in internal Bspec, After new 
version release it should be part of open BSpec.
I don't know about release process & timeline though :P

-Mahesh
>
>
> Matt
>
>>   		else
>>   			selected_result = method1;
>>   	}
>> @@ -4424,19 +4451,16 @@ skl_compute_linetime_wm(struct intel_crtc_state *cstate)
>>   {
>>   	struct drm_atomic_state *state = cstate->base.state;
>>   	struct drm_i915_private *dev_priv = to_i915(state->dev);
>> -	uint32_t pixel_rate;
>> +	uint_fixed_16_16_t linetime_us;
>>   	uint32_t linetime_wm;
>>   
>> -	if (!cstate->base.active)
>> -		return 0;
>> +	linetime_us = skl_get_linetime_us(cstate);
>>   
>> -	pixel_rate = cstate->pixel_rate;
>> -
>> -	if (WARN_ON(pixel_rate == 0))
>> +	if (is_fixed16_zero(linetime_us))
>>   		return 0;
>>   
>> -	linetime_wm = DIV_ROUND_UP(8 * cstate->base.adjusted_mode.crtc_htotal *
>> -				   1000, pixel_rate);
>> +	linetime_wm = fixed_16_16_to_u32_round_up(mul_u32_fixed_16_16(8,
>> +				linetime_us));
>>   
>>   	/* Display WA #1135: bxt. */
>>   	if (IS_BROXTON(dev_priv) && dev_priv->ipc_enabled)
>> -- 
>> 2.11.0
>>
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a1858c3eb33a..84052990e695 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -115,6 +115,13 @@  typedef struct {
 	fp; \
 })
 
+static inline bool is_fixed16_zero(uint_fixed_16_16_t val)
+{
+	if (val.val == 0)
+		return true;
+	return false;
+}
+
 static inline uint_fixed_16_16_t u32_to_fixed_16_16(uint32_t val)
 {
 	uint_fixed_16_16_t fp;
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 0f1d9f672e83..d73369c2c2d9 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4197,6 +4197,27 @@  static uint_fixed_16_16_t skl_wm_method2(uint32_t pixel_rate,
 	return ret;
 }
 
+static uint_fixed_16_16_t
+skl_get_linetime_us(struct intel_crtc_state *cstate)
+{
+	uint32_t pixel_rate;
+	uint32_t crtc_htotal;
+	uint_fixed_16_16_t linetime_us;
+
+	if (!cstate->base.active)
+		return u32_to_fixed_16_16(0);
+
+	pixel_rate = cstate->pixel_rate;
+
+	if (WARN_ON(pixel_rate == 0))
+		return u32_to_fixed_16_16(0);
+
+	crtc_htotal = cstate->base.adjusted_mode.crtc_htotal;
+	linetime_us = fixed_16_16_div_u64(crtc_htotal * 1000, pixel_rate);
+
+	return linetime_us;
+}
+
 static uint32_t
 skl_adjusted_plane_pixel_rate(const struct intel_crtc_state *cstate,
 			      const struct intel_plane_state *pstate)
@@ -4331,12 +4352,18 @@  static int skl_compute_plane_wm(const struct drm_i915_private *dev_priv,
 	if (y_tiled) {
 		selected_result = max_fixed_16_16(method2, y_tile_minimum);
 	} else {
+		uint32_t linetime_us;
+
+		linetime_us = fixed_16_16_to_u32_round_up(
+				skl_get_linetime_us(cstate));
 		if ((cpp * cstate->base.adjusted_mode.crtc_htotal / 512 < 1) &&
 		    (plane_bytes_per_line / 512 < 1))
 			selected_result = method2;
-		else if ((ddb_allocation /
+		else if ((ddb_allocation && ddb_allocation /
 			fixed_16_16_to_u32_round_up(plane_blocks_per_line)) >= 1)
 			selected_result = min_fixed_16_16(method1, method2);
+		else if (latency >= linetime_us)
+			selected_result = method2;
 		else
 			selected_result = method1;
 	}
@@ -4424,19 +4451,16 @@  skl_compute_linetime_wm(struct intel_crtc_state *cstate)
 {
 	struct drm_atomic_state *state = cstate->base.state;
 	struct drm_i915_private *dev_priv = to_i915(state->dev);
-	uint32_t pixel_rate;
+	uint_fixed_16_16_t linetime_us;
 	uint32_t linetime_wm;
 
-	if (!cstate->base.active)
-		return 0;
+	linetime_us = skl_get_linetime_us(cstate);
 
-	pixel_rate = cstate->pixel_rate;
-
-	if (WARN_ON(pixel_rate == 0))
+	if (is_fixed16_zero(linetime_us))
 		return 0;
 
-	linetime_wm = DIV_ROUND_UP(8 * cstate->base.adjusted_mode.crtc_htotal *
-				   1000, pixel_rate);
+	linetime_wm = fixed_16_16_to_u32_round_up(mul_u32_fixed_16_16(8,
+				linetime_us));
 
 	/* Display WA #1135: bxt. */
 	if (IS_BROXTON(dev_priv) && dev_priv->ipc_enabled)