drm/i915: program wm blocks to at least blocks required per line

Message ID 20220404134918.729038-1-vinod.govindapillai@intel.com (mailing list archive)
State New, archived
Series drm/i915: program wm blocks to at least blocks required per line

Commit Message

Vinod Govindapillai April 4, 2022, 1:49 p.m. UTC
In configurations with a single DRAM channel, for use cases like
4K 60 Hz, FIFO underruns are observed quite frequently. It looks
like the wm0 watermark values need to be bumped up because the wm0
memory latency calculations are probably not taking the DRAM
channel's impact into account.

As per Bspec 49325, if the ddb allocation can hold at least
one plane_blocks_per_line we should have selected method2.
Assuming that modern HW versions have enough dbuf to hold
at least one line, set the wm blocks to at least the blocks
per line.

cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>

Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

Comments

Stanislav Lisovskiy April 6, 2022, 8:14 a.m. UTC | #1
On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> In configurations with single DRAM channel, for usecases like
> 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> like the wm0 watermark values need to bumped up because the wm0
> memory latency calculations are probably not taking the DRAM
> channel's impact into account.
> 
> As per the Bspec 49325, if the ddb allocation can hold at least
> one plane_blocks_per_line we should have selected method2.
> Assuming that modern HW versions have enough dbuf to hold
> at least one line, set the wm blocks to equivalent to blocks
> per line.
> 
> cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> 
> Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 8824f269e5f5..ae28a8c63ca4 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state *crtc_state,
>  		}
>  	}
>  
> -	blocks = fixed16_to_u32_round_up(selected_result) + 1;
> +	/*
> +	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
> +	 * as there will be at minimum one line for lines configuration.
> +	 *
> +	 * As per the Bspec 49325, if the ddb allocation can hold at least
> +	 * one plane_blocks_per_line, we should have selected method2 in
> +	 * the above logic. Assuming that modern versions have enough dbuf
> +	 * and method2 guarantees blocks equivalent to at least 1 line,
> +	 * select the blocks as plane_blocks_per_line.
> +	 *
> +	 * TODO: Revisit the logic when we have better understanding on DRAM
> +	 * channels' impact on the level 0 memory latency and the relevant
> +	 * wm calculations.
> +	 */
> +	blocks = skl_wm_has_lines(dev_priv, level) ?
> +			max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> +				  fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> +			fixed16_to_u32_round_up(selected_result) + 1;
>  	lines = div_round_up_fixed16(selected_result,
>  				     wp->plane_blocks_per_line);

I think this is a good fix: no IGT/BAT regressions are visible, and
it also fixes some of the current issues on the customer side. So I don't
see any reason for it not to be merged.

Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>

P.S.: there is a checkpatch warning, which probably needs to be addressed :)

Stan

>  
> -- 
> 2.25.1
>
Vinod Govindapillai April 6, 2022, 9:21 a.m. UTC | #2
On Wed, 2022-04-06 at 11:14 +0300, Lisovskiy, Stanislav wrote:
> On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> > In configurations with single DRAM channel, for usecases like
> > 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> > like the wm0 watermark values need to bumped up because the wm0
> > memory latency calculations are probably not taking the DRAM
> > channel's impact into account.
> > 
> > As per the Bspec 49325, if the ddb allocation can hold at least
> > one plane_blocks_per_line we should have selected method2.
> > Assuming that modern HW versions have enough dbuf to hold
> > at least one line, set the wm blocks to equivalent to blocks
> > per line.
> > 
> > cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > 
> > Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 8824f269e5f5..ae28a8c63ca4 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state
> > *crtc_state,
> >  		}
> >  	}
> >  
> > -	blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > +	/*
> > +	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
> > +	 * as there will be at minimum one line for lines configuration.
> > +	 *
> > +	 * As per the Bspec 49325, if the ddb allocation can hold at least
> > +	 * one plane_blocks_per_line, we should have selected method2 in
> > +	 * the above logic. Assuming that modern versions have enough dbuf
> > +	 * and method2 guarantees blocks equivalent to at least 1 line,
> > +	 * select the blocks as plane_blocks_per_line.
> > +	 *
> > +	 * TODO: Revisit the logic when we have better understanding on DRAM
> > +	 * channels' impact on the level 0 memory latency and the relevant
> > +	 * wm calculations.
> > +	 */
> > +	blocks = skl_wm_has_lines(dev_priv, level) ?
> > +			max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> > +				  fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> > +			fixed16_to_u32_round_up(selected_result) + 1;
> >  	lines = div_round_up_fixed16(selected_result,
> >  				     wp->plane_blocks_per_line);
> 
> I think this is a good fix, no IGT/BAT regressions are visible, also 
> it fixes some of the current issues at customer side. So don't see any reason
> for it not to be merged.
> 
> Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> 
> P.S: there is some checkpatch warning, which probably needs to be addressed :)

Thanks Stan. I will check this and update.

BR
vinod
> 
> Stan
> 
> >  
> > -- 
> > 2.25.1
> >
Ville Syrjälä April 6, 2022, 12:48 p.m. UTC | #3
On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> In configurations with single DRAM channel, for usecases like
> 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> like the wm0 watermark values need to bumped up because the wm0
> memory latency calculations are probably not taking the DRAM
> channel's impact into account.
> 
> As per the Bspec 49325, if the ddb allocation can hold at least
> one plane_blocks_per_line we should have selected method2.
> Assuming that modern HW versions have enough dbuf to hold
> at least one line, set the wm blocks to equivalent to blocks
> per line.
> 
> cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> 
> Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 8824f269e5f5..ae28a8c63ca4 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state *crtc_state,
>  		}
>  	}
>  
> -	blocks = fixed16_to_u32_round_up(selected_result) + 1;
> +	/*
> +	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
> +	 * as there will be at minimum one line for lines configuration.
> +	 *
> +	 * As per the Bspec 49325, if the ddb allocation can hold at least
> +	 * one plane_blocks_per_line, we should have selected method2 in
> +	 * the above logic. Assuming that modern versions have enough dbuf
> +	 * and method2 guarantees blocks equivalent to at least 1 line,
> +	 * select the blocks as plane_blocks_per_line.
> +	 *
> +	 * TODO: Revisit the logic when we have better understanding on DRAM
> +	 * channels' impact on the level 0 memory latency and the relevant
> +	 * wm calculations.
> +	 */
> +	blocks = skl_wm_has_lines(dev_priv, level) ?
> +			max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> +				  fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> +			fixed16_to_u32_round_up(selected_result) + 1;

That looks rather convoluted.

  blocks = fixed16_to_u32_round_up(selected_result) + 1;
+ /* blah */
+ if (has_lines)
+	blocks = max(blocks, fixed16_to_u32_round_up(wp->plane_blocks_per_line));

Also since Art said nothing like this should actually be needed
I think the comment should make it a bit more clear that this
is just a hack to work around the underruns with some single
memory channel configurations.
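
The simplified form suggested above can be sketched as a standalone C
function. This is only an illustration, not the driver code: the
fixed16_t type and the fixed16_round_up()/fixed16_from_u32() helpers
below are hypothetical local stand-ins for the kernel's 16.16
fixed-point helpers, and has_lines stands for the result of
skl_wm_has_lines().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's 16.16 fixed-point type. */
typedef struct { uint32_t val; } fixed16_t;

/* Round a 16.16 fixed-point value up to the next integer. */
static uint32_t fixed16_round_up(fixed16_t fp)
{
	return (uint32_t)(((uint64_t)fp.val + 0xffffu) >> 16);
}

/* Build a 16.16 value from an integer (helper for examples only). */
static fixed16_t fixed16_from_u32(uint32_t v)
{
	fixed16_t fp = { v << 16 };
	return fp;
}

/*
 * Compute the block count first, then raise it to at least one
 * line's worth of blocks when the level uses a lines value.
 */
static uint32_t wm_blocks(fixed16_t selected_result,
			  fixed16_t plane_blocks_per_line,
			  bool has_lines)
{
	uint32_t blocks = fixed16_round_up(selected_result) + 1;

	if (has_lines) {
		uint32_t per_line = fixed16_round_up(plane_blocks_per_line);

		if (blocks < per_line)
			blocks = per_line;
	}

	return blocks;
}
```

With a selected result of 10 blocks and 40 blocks per line, the sketch
returns 40 when has_lines is true and 11 otherwise, which matches the
behaviour of the ternary in the patch.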


>  	lines = div_round_up_fixed16(selected_result,
>  				     wp->plane_blocks_per_line);
>  
> -- 
> 2.25.1
Stanislav Lisovskiy April 6, 2022, 1:45 p.m. UTC | #4
On Wed, Apr 06, 2022 at 03:48:02PM +0300, Ville Syrjälä wrote:
> On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> > In configurations with single DRAM channel, for usecases like
> > 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> > like the wm0 watermark values need to bumped up because the wm0
> > memory latency calculations are probably not taking the DRAM
> > channel's impact into account.
> > 
> > As per the Bspec 49325, if the ddb allocation can hold at least
> > one plane_blocks_per_line we should have selected method2.
> > Assuming that modern HW versions have enough dbuf to hold
> > at least one line, set the wm blocks to equivalent to blocks
> > per line.
> > 
> > cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > 
> > Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 8824f269e5f5..ae28a8c63ca4 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state *crtc_state,
> >  		}
> >  	}
> >  
> > -	blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > +	/*
> > +	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
> > +	 * as there will be at minimum one line for lines configuration.
> > +	 *
> > +	 * As per the Bspec 49325, if the ddb allocation can hold at least
> > +	 * one plane_blocks_per_line, we should have selected method2 in
> > +	 * the above logic. Assuming that modern versions have enough dbuf
> > +	 * and method2 guarantees blocks equivalent to at least 1 line,
> > +	 * select the blocks as plane_blocks_per_line.
> > +	 *
> > +	 * TODO: Revisit the logic when we have better understanding on DRAM
> > +	 * channels' impact on the level 0 memory latency and the relevant
> > +	 * wm calculations.
> > +	 */
> > +	blocks = skl_wm_has_lines(dev_priv, level) ?
> > +			max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> > +				  fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> > +			fixed16_to_u32_round_up(selected_result) + 1;
> 
> That's looks rather convoluted.
> 
>   blocks = fixed16_to_u32_round_up(selected_result) + 1;
> + /* blah */
> + if (has_lines)
> +	blocks = max(blocks, fixed16_to_u32_round_up(wp->plane_blocks_per_line));

We probably need to do similar refactoring in the whole function ;-)

> 
> Also since Art said nothing like this should actually be needed
> I think the comment should make it a bit more clear that this
> is just a hack to work around the underruns with some single
> memory channel configurations.

It is actually not quite a hack, because we are missing that condition
implementation from BSpec 49325, which instructs us to select method2
when ddb blocks allocation is known and that ratio is >= 1.

I mean this one:

"If ('plane buffer allocation' is known and (plane buffer allocation / plane blocks per line) >=1)
Selected Result Blocks = Method 2"
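
Read literally, the quoted clause is a predicate on the plane's buffer
allocation. A minimal sketch, assuming integer block counts (the
function and parameter names are made up for illustration and do not
exist in the driver):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the quoted clause would select Method 2:
 * the allocation is known and holds at least one line of blocks.
 * For positive values, (alloc / blocks_per_line) >= 1 is the same
 * as alloc >= blocks_per_line, so no division is needed.
 */
static bool bspec_clause_selects_method2(bool alloc_known,
					 uint32_t plane_alloc_blocks,
					 uint32_t blocks_per_line)
{
	if (!alloc_known || blocks_per_line == 0)
		return false;

	return plane_alloc_blocks >= blocks_per_line;
}
```

The sketch only covers this one clause; it says nothing about which
method is selected when the clause does not apply.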

Stan

> 
> 
> >  	lines = div_round_up_fixed16(selected_result,
> >  				     wp->plane_blocks_per_line);
> >  
> > -- 
> > 2.25.1
> 
> -- 
> Ville Syrjälä
> Intel
Ville Syrjälä April 6, 2022, 2:01 p.m. UTC | #5
On Wed, Apr 06, 2022 at 04:45:26PM +0300, Lisovskiy, Stanislav wrote:
> On Wed, Apr 06, 2022 at 03:48:02PM +0300, Ville Syrjälä wrote:
> > On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> > > In configurations with single DRAM channel, for usecases like
> > > 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> > > like the wm0 watermark values need to bumped up because the wm0
> > > memory latency calculations are probably not taking the DRAM
> > > channel's impact into account.
> > > 
> > > As per the Bspec 49325, if the ddb allocation can hold at least
> > > one plane_blocks_per_line we should have selected method2.
> > > Assuming that modern HW versions have enough dbuf to hold
> > > at least one line, set the wm blocks to equivalent to blocks
> > > per line.
> > > 
> > > cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > > 
> > > Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
> > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > index 8824f269e5f5..ae28a8c63ca4 100644
> > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state *crtc_state,
> > >  		}
> > >  	}
> > >  
> > > -	blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > +	/*
> > > +	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
> > > +	 * as there will be at minimum one line for lines configuration.
> > > +	 *
> > > +	 * As per the Bspec 49325, if the ddb allocation can hold at least
> > > +	 * one plane_blocks_per_line, we should have selected method2 in
> > > +	 * the above logic. Assuming that modern versions have enough dbuf
> > > +	 * and method2 guarantees blocks equivalent to at least 1 line,
> > > +	 * select the blocks as plane_blocks_per_line.
> > > +	 *
> > > +	 * TODO: Revisit the logic when we have better understanding on DRAM
> > > +	 * channels' impact on the level 0 memory latency and the relevant
> > > +	 * wm calculations.
> > > +	 */
> > > +	blocks = skl_wm_has_lines(dev_priv, level) ?
> > > +			max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> > > +				  fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> > > +			fixed16_to_u32_round_up(selected_result) + 1;
> > 
> > That's looks rather convoluted.
> > 
> >   blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > + /* blah */
> > + if (has_lines)
> > +	blocks = max(blocks, fixed16_to_u32_round_up(wp->plane_blocks_per_line));
> 
> We probably need to do similar refactoring in the whole function ;-)
> 
> > 
> > Also since Art said nothing like this should actually be needed
> > I think the comment should make it a bit more clear that this
> > is just a hack to work around the underruns with some single
> > memory channel configurations.
> 
> It is actually not quite a hack, because we are missing that condition
> implementation from BSpec 49325, which instructs us to select method2
> when ddb blocks allocation is known and that ratio is >= 1.

The ddb allocation is not yet known, so we're implementing the
algorithm 100% correctly.

And this patch does not implement that missing part anyway.

> 
> Mean this one:
> 
> "If ('plane buffer allocation' is known and (plane buffer allocation / plane blocks per line) >=1)
> Selected Result Blocks = Method 2"
> 
> Stan
> 
> > 
> > 
> > >  	lines = div_round_up_fixed16(selected_result,
> > >  				     wp->plane_blocks_per_line);
> > >  
> > > -- 
> > > 2.25.1
> > 
> > -- 
> > Ville Syrjälä
> > Intel
Vinod Govindapillai April 6, 2022, 2:15 p.m. UTC | #6
On Wed, 2022-04-06 at 17:01 +0300, Ville Syrjälä wrote:
> On Wed, Apr 06, 2022 at 04:45:26PM +0300, Lisovskiy, Stanislav wrote:
> > On Wed, Apr 06, 2022 at 03:48:02PM +0300, Ville Syrjälä wrote:
> > > On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> > > > In configurations with single DRAM channel, for usecases like
> > > > 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> > > > like the wm0 watermark values need to bumped up because the wm0
> > > > memory latency calculations are probably not taking the DRAM
> > > > channel's impact into account.
> > > > 
> > > > As per the Bspec 49325, if the ddb allocation can hold at least
> > > > one plane_blocks_per_line we should have selected method2.
> > > > Assuming that modern HW versions have enough dbuf to hold
> > > > at least one line, set the wm blocks to equivalent to blocks
> > > > per line.
> > > > 
> > > > cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > > > 
> > > > Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
> > > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > index 8824f269e5f5..ae28a8c63ca4 100644
> > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state
> > > > *crtc_state,
> > > >  		}
> > > >  	}
> > > >  
> > > > -	blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > +	/*
> > > > +	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
> > > > +	 * as there will be at minimum one line for lines configuration.
> > > > +	 *
> > > > +	 * As per the Bspec 49325, if the ddb allocation can hold at least
> > > > +	 * one plane_blocks_per_line, we should have selected method2 in
> > > > +	 * the above logic. Assuming that modern versions have enough dbuf
> > > > +	 * and method2 guarantees blocks equivalent to at least 1 line,
> > > > +	 * select the blocks as plane_blocks_per_line.
> > > > +	 *
> > > > +	 * TODO: Revisit the logic when we have better understanding on DRAM
> > > > +	 * channels' impact on the level 0 memory latency and the relevant
> > > > +	 * wm calculations.
> > > > +	 */
> > > > +	blocks = skl_wm_has_lines(dev_priv, level) ?
> > > > +			max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> > > > +				  fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> > > > +			fixed16_to_u32_round_up(selected_result) + 1;
> > > 
> > > That's looks rather convoluted.
> > > 
> > >   blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > + /* blah */
> > > + if (has_lines)
> > > +	blocks = max(blocks, fixed16_to_u32_round_up(wp->plane_blocks_per_line));
> > 
> > We probably need to do similar refactoring in the whole function ;-)
> > 
> > > Also since Art said nothing like this should actually be needed
> > > I think the comment should make it a bit more clear that this
> > > is just a hack to work around the underruns with some single
> > > memory channel configurations.
> > 
> > It is actually not quite a hack, because we are missing that condition
> > implementation from BSpec 49325, which instructs us to select method2
> > when ddb blocks allocation is known and that ratio is >= 1.

In the slides sent by Art, it is mentioned that the driver should use the
wm results to arrive at the optimum ddb allocations. So I guess the best
solution would be to identify the extra latency caused by the single DRAM
channel and account for it when calculating the wm.

> 
> The ddb allocation is not yet known, so we're implementing the
> algorithm 100% correctly.
> 
> And this patch does not implement that misisng part anyway.

Thanks. Updated the patch as per your comments and V2 sent.

BR
Vinod

> > Mean this one:
> > 
> > > "If ('plane buffer allocation' is known and (plane buffer allocation / plane blocks per line) >=1)
> > Selected Result Blocks = Method 2"
> > 
> > Stan
> > 
> > > 
> > > >  	lines = div_round_up_fixed16(selected_result,
> > > >  				     wp->plane_blocks_per_line);
> > > >  
> > > > -- 
> > > > 2.25.1
> > > 
> > > -- 
> > > Ville Syrjälä
> > > Intel
Stanislav Lisovskiy April 6, 2022, 5:14 p.m. UTC | #7
On Wed, Apr 06, 2022 at 05:01:39PM +0300, Ville Syrjälä wrote:
> On Wed, Apr 06, 2022 at 04:45:26PM +0300, Lisovskiy, Stanislav wrote:
> > On Wed, Apr 06, 2022 at 03:48:02PM +0300, Ville Syrjälä wrote:
> > > On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> > > > In configurations with single DRAM channel, for usecases like
> > > > 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> > > > like the wm0 watermark values need to bumped up because the wm0
> > > > memory latency calculations are probably not taking the DRAM
> > > > channel's impact into account.
> > > > 
> > > > As per the Bspec 49325, if the ddb allocation can hold at least
> > > > one plane_blocks_per_line we should have selected method2.
> > > > Assuming that modern HW versions have enough dbuf to hold
> > > > at least one line, set the wm blocks to equivalent to blocks
> > > > per line.
> > > > 
> > > > cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > > > 
> > > > Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
> > > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > index 8824f269e5f5..ae28a8c63ca4 100644
> > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state *crtc_state,
> > > >  		}
> > > >  	}
> > > >  
> > > > -	blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > +	/*
> > > > +	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
> > > > +	 * as there will be at minimum one line for lines configuration.
> > > > +	 *
> > > > +	 * As per the Bspec 49325, if the ddb allocation can hold at least
> > > > +	 * one plane_blocks_per_line, we should have selected method2 in
> > > > +	 * the above logic. Assuming that modern versions have enough dbuf
> > > > +	 * and method2 guarantees blocks equivalent to at least 1 line,
> > > > +	 * select the blocks as plane_blocks_per_line.
> > > > +	 *
> > > > +	 * TODO: Revisit the logic when we have better understanding on DRAM
> > > > +	 * channels' impact on the level 0 memory latency and the relevant
> > > > +	 * wm calculations.
> > > > +	 */
> > > > +	blocks = skl_wm_has_lines(dev_priv, level) ?
> > > > +			max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> > > > +				  fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> > > > +			fixed16_to_u32_round_up(selected_result) + 1;
> > > 
> > > That's looks rather convoluted.
> > > 
> > >   blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > + /* blah */
> > > + if (has_lines)
> > > +	blocks = max(blocks, fixed16_to_u32_round_up(wp->plane_blocks_per_line));
> > 
> > We probably need to do similar refactoring in the whole function ;-)
> > 
> > > 
> > > Also since Art said nothing like this should actually be needed
> > > I think the comment should make it a bit more clear that this
> > > is just a hack to work around the underruns with some single
> > > memory channel configurations.
> > 
> > It is actually not quite a hack, because we are missing that condition
> > implementation from BSpec 49325, which instructs us to select method2
> > when ddb blocks allocation is known and that ratio is >= 1.
> 
> The ddb allocation is not yet known, so we're implementing the
> algorithm 100% correctly.
> 
> And this patch does not implement that misisng part anyway.

Yes, as I understood it, method2 would just give a block count of
at least the dbuf blocks per line.

I wonder whether we should actually fully implement this BSpec clause
and add it at the point where the ddb allocation is known, or are there
any obstacles to doing that, besides having to reshuffle this function a bit?

Stan

> 
> > 
> > Mean this one:
> > 
> > "If ('plane buffer allocation' is known and (plane buffer allocation / plane blocks per line) >=1)
> > Selected Result Blocks = Method 2"
> > 
> > Stan
> > 
> > > 
> > > 
> > > >  	lines = div_round_up_fixed16(selected_result,
> > > >  				     wp->plane_blocks_per_line);
> > > >  
> > > > -- 
> > > > 2.25.1
> > > 
> > > -- 
> > > Ville Syrjälä
> > > Intel
> 
> -- 
> Ville Syrjälä
> Intel
Ville Syrjälä April 6, 2022, 6:09 p.m. UTC | #8
On Wed, Apr 06, 2022 at 08:14:58PM +0300, Lisovskiy, Stanislav wrote:
> On Wed, Apr 06, 2022 at 05:01:39PM +0300, Ville Syrjälä wrote:
> > On Wed, Apr 06, 2022 at 04:45:26PM +0300, Lisovskiy, Stanislav wrote:
> > > On Wed, Apr 06, 2022 at 03:48:02PM +0300, Ville Syrjälä wrote:
> > > > On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> > > > > In configurations with single DRAM channel, for usecases like
> > > > > 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> > > > > like the wm0 watermark values need to bumped up because the wm0
> > > > > memory latency calculations are probably not taking the DRAM
> > > > > channel's impact into account.
> > > > > 
> > > > > As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > one plane_blocks_per_line we should have selected method2.
> > > > > Assuming that modern HW versions have enough dbuf to hold
> > > > > at least one line, set the wm blocks to equivalent to blocks
> > > > > per line.
> > > > > 
> > > > > cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > > > > 
> > > > > Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
> > > > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > index 8824f269e5f5..ae28a8c63ca4 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state *crtc_state,
> > > > >  		}
> > > > >  	}
> > > > >  
> > > > > -	blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > +	/*
> > > > > +	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
> > > > > +	 * as there will be at minimum one line for lines configuration.
> > > > > +	 *
> > > > > +	 * As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > +	 * one plane_blocks_per_line, we should have selected method2 in
> > > > > +	 * the above logic. Assuming that modern versions have enough dbuf
> > > > > +	 * and method2 guarantees blocks equivalent to at least 1 line,
> > > > > +	 * select the blocks as plane_blocks_per_line.
> > > > > +	 *
> > > > > +	 * TODO: Revisit the logic when we have better understanding on DRAM
> > > > > +	 * channels' impact on the level 0 memory latency and the relevant
> > > > > +	 * wm calculations.
> > > > > +	 */
> > > > > +	blocks = skl_wm_has_lines(dev_priv, level) ?
> > > > > +			max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> > > > > +				  fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> > > > > +			fixed16_to_u32_round_up(selected_result) + 1;
> > > > 
> > > > That's looks rather convoluted.
> > > > 
> > > >   blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > + /* blah */
> > > > + if (has_lines)
> > > > +	blocks = max(blocks, fixed16_to_u32_round_up(wp->plane_blocks_per_line));
> > > 
> > > We probably need to do similar refactoring in the whole function ;-)
> > > 
> > > > 
> > > > Also since Art said nothing like this should actually be needed
> > > > I think the comment should make it a bit more clear that this
> > > > is just a hack to work around the underruns with some single
> > > > memory channel configurations.
> > > 
> > > It is actually not quite a hack, because we are missing that condition
> > > implementation from BSpec 49325, which instructs us to select method2
> > > when ddb blocks allocation is known and that ratio is >= 1.
> > 
> > The ddb allocation is not yet known, so we're implementing the
> > algorithm 100% correctly.
> > 
> > And this patch does not implement that misisng part anyway.
> 
> Yes, as I understood method2 would just give amount of blocks to be
> at least as dbuf blocks per line.
> 
> Wonder whether should we actually fully implement this BSpec clause 
> and add it to the point where ddb allocation is known or are there 
> any obstacles to do that, besides having to reshuffle this function a bit?

We need to calculate the wm to figure out how much ddb to allocate,
and then we'd need the ddb allocation to figure out how to calculate
the wm. Very much chicken vs. egg right there. We'd have to do some
kind of hideous loop where we'd calculate everything twice. I don't
really want to do that since I'd actually like to move the wm
calculation to happen already much earlier during .check_plane()
as that could reduce the amount of redundant wm calculations we
are currently doing.
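
The circular dependency described above can be shown with a toy model.
Everything in this sketch is hypothetical (the struct, the function
names and the "watermark plus spare" allocator are invented for
illustration and do not reflect how the driver allocates ddb); it only
shows why the calculation would have to run twice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy per-plane inputs; all fields are illustrative, not driver state. */
struct toy_plane {
	uint32_t method1_blocks;
	uint32_t method2_blocks;
	uint32_t blocks_per_line;
};

/* The allocation-based clause can only apply once alloc is known. */
static uint32_t toy_wm(const struct toy_plane *p, bool alloc_known,
		       uint32_t alloc)
{
	if (alloc_known && p->blocks_per_line &&
	    alloc >= p->blocks_per_line)
		return p->method2_blocks;

	return p->method1_blocks;
}

/* Toy allocator: the watermark plus a fixed amount of spare space. */
static uint32_t toy_ddb_alloc(uint32_t wm_blocks, uint32_t spare)
{
	return wm_blocks + spare;
}

/* Pass 1 sizes the allocation, pass 2 recomputes the wm from it. */
static uint32_t toy_two_pass_wm(const struct toy_plane *p, uint32_t spare)
{
	uint32_t wm = toy_wm(p, false, 0);         /* allocation unknown */
	uint32_t alloc = toy_ddb_alloc(wm, spare); /* needs the wm */

	return toy_wm(p, true, alloc);             /* needs the allocation */
}
```

Even in this toy form the second pass can return a larger value than
the one the allocation was sized for, so a real implementation would
also have to decide whether to re-run the allocation afterwards.
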
Stanislav Lisovskiy April 7, 2022, 6:43 a.m. UTC | #9
On Wed, Apr 06, 2022 at 09:09:06PM +0300, Ville Syrjälä wrote:
> On Wed, Apr 06, 2022 at 08:14:58PM +0300, Lisovskiy, Stanislav wrote:
> > On Wed, Apr 06, 2022 at 05:01:39PM +0300, Ville Syrjälä wrote:
> > > On Wed, Apr 06, 2022 at 04:45:26PM +0300, Lisovskiy, Stanislav wrote:
> > > > On Wed, Apr 06, 2022 at 03:48:02PM +0300, Ville Syrjälä wrote:
> > > > > On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> > > > > > In configurations with single DRAM channel, for usecases like
> > > > > > 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> > > > > > like the wm0 watermark values need to bumped up because the wm0
> > > > > > memory latency calculations are probably not taking the DRAM
> > > > > > channel's impact into account.
> > > > > > 
> > > > > > As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > > one plane_blocks_per_line we should have selected method2.
> > > > > > Assuming that modern HW versions have enough dbuf to hold
> > > > > > at least one line, set the wm blocks to equivalent to blocks
> > > > > > per line.
> > > > > > 
> > > > > > cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > > > > > 
> > > > > > Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
> > > > > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > index 8824f269e5f5..ae28a8c63ca4 100644
> > > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state *crtc_state,
> > > > > >  		}
> > > > > >  	}
> > > > > >  
> > > > > > -	blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > +	/*
> > > > > > +	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
> > > > > > +	 * as there will be at minimum one line for lines configuration.
> > > > > > +	 *
> > > > > > +	 * As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > > +	 * one plane_blocks_per_line, we should have selected method2 in
> > > > > > +	 * the above logic. Assuming that modern versions have enough dbuf
> > > > > > +	 * and method2 guarantees blocks equivalent to at least 1 line,
> > > > > > +	 * select the blocks as plane_blocks_per_line.
> > > > > > +	 *
> > > > > > +	 * TODO: Revisit the logic when we have better understanding on DRAM
> > > > > > +	 * channels' impact on the level 0 memory latency and the relevant
> > > > > > +	 * wm calculations.
> > > > > > +	 */
> > > > > > +	blocks = skl_wm_has_lines(dev_priv, level) ?
> > > > > > +			max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> > > > > > +				  fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> > > > > > +			fixed16_to_u32_round_up(selected_result) + 1;
> > > > > 
> > > > > That's looks rather convoluted.
> > > > > 
> > > > >   blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > + /* blah */
> > > > > + if (has_lines)
> > > > > +	blocks = max(blocks, fixed16_to_u32_round_up(wp->plane_blocks_per_line));
> > > > 
> > > > We probably need to do similar refactoring in the whole function ;-)
> > > > 
> > > > > 
> > > > > Also since Art said nothing like this should actually be needed
> > > > > I think the comment should make it a bit more clear that this
> > > > > is just a hack to work around the underruns with some single
> > > > > memory channel configurations.
> > > > 
> > > > It is actually not quite a hack, because we are missing that condition
> > > > implementation from BSpec 49325, which instructs us to select method2
> > > > when ddb blocks allocation is known and that ratio is >= 1.
> > > 
> > > The ddb allocation is not yet known, so we're implementing the
> > > algorithm 100% correctly.
> > > 
> > > And this patch does not implement that misisng part anyway.
> > 
> > Yes, as I understood method2 would just give amount of blocks to be
> > at least as dbuf blocks per line.
> > 
> > Wonder whether should we actually fully implement this BSpec clause 
> > and add it to the point where ddb allocation is known or are there 
> > any obstacles to do that, besides having to reshuffle this function a bit?
> 
> We need to calculate the wm to figure out how much ddb to allocate,
> and then we'd need the ddb allocation to figure out how to calculate
> the wm. Very much chicken vs. egg right there. We'd have to do some
> kind of hideous loop where we'd calculate everything twice. I don't
> really want to do that since I'd actually like to move the wm
> calculation to happen already much earlier during .check_plane()
> as that could reduce the amount of redundant wm calculations we
> are currently doing.

I might be missing some details right now, but why do we need the ddb
allocation to calculate the wms?

I thought we usually calculate the wm levels + min_ddb_allocation,
and then based on that we allocate min_ddb + extra for each plane.
It is correct that at the moment we calculate the wms we have only
min_ddb available, so if this level gets enabled at all, we would
need at least min_ddb blocks.

I think we could just use that min_ddb value here for that purpose,
because the condition only checks whether
(plane buffer allocation / plane blocks per line) >= 1. If this wm
level gets enabled, the plane buffer allocation would be at least
min_ddb _or higher_. That won't affect the condition, because even
if it happens to be "plane buffer allocation + some extra" the
ratio would still be valid.
So if the condition holds for min_ddb / plane blocks per line, we can
probably safely state that it will also hold later on.
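
To make the argument concrete, here is a rough sketch using plain
integers in place of the driver's fixed-point values. The helper names
are made up for illustration and are not i915 functions; it also
assumes min_ddb + extra does not overflow a u32.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not i915 code: true when an allocation of
 * alloc_blocks can hold at least one line worth of blocks, i.e.
 * (plane buffer allocation / plane blocks per line) >= 1.
 */
static bool alloc_holds_one_line(uint32_t alloc_blocks,
				 uint32_t blocks_per_line)
{
	return blocks_per_line != 0 && alloc_blocks >= blocks_per_line;
}

/*
 * The final allocation is assumed to be min_ddb plus some non-negative
 * extra, so a check that passes for min_ddb also passes here.
 */
static bool final_alloc_holds_one_line(uint32_t min_ddb, uint32_t extra,
				       uint32_t blocks_per_line)
{
	return alloc_holds_one_line(min_ddb + extra, blocks_per_line);
}
```

With min_ddb = 44 and 40 blocks per line the check passes for any
extra. The reverse does not follow: with min_ddb = 30 the check fails,
yet a final allocation of 30 + 20 would pass it.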

Stan

> 
> -- 
> Ville Syrjälä
> Intel
Vinod Govindapillai April 7, 2022, 12:09 p.m. UTC | #10
On Thu, 2022-04-07 at 09:43 +0300, Lisovskiy, Stanislav wrote:
> On Wed, Apr 06, 2022 at 09:09:06PM +0300, Ville Syrjälä wrote:
> > On Wed, Apr 06, 2022 at 08:14:58PM +0300, Lisovskiy, Stanislav wrote:
> > > On Wed, Apr 06, 2022 at 05:01:39PM +0300, Ville Syrjälä wrote:
> > > > On Wed, Apr 06, 2022 at 04:45:26PM +0300, Lisovskiy, Stanislav wrote:
> > > > > On Wed, Apr 06, 2022 at 03:48:02PM +0300, Ville Syrjälä wrote:
> > > > > > On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> > > > > > > In configurations with single DRAM channel, for usecases like
> > > > > > > 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> > > > > > > like the wm0 watermark values need to bumped up because the wm0
> > > > > > > memory latency calculations are probably not taking the DRAM
> > > > > > > channel's impact into account.
> > > > > > > 
> > > > > > > As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > > > one plane_blocks_per_line we should have selected method2.
> > > > > > > Assuming that modern HW versions have enough dbuf to hold
> > > > > > > at least one line, set the wm blocks to equivalent to blocks
> > > > > > > per line.
> > > > > > > 
> > > > > > > cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > > cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > > > > > > 
> > > > > > > Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
> > > > > > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > index 8824f269e5f5..ae28a8c63ca4 100644
> > > > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state
> > > > > > > *crtc_state,
> > > > > > >  		}
> > > > > > >  	}
> > > > > > >  
> > > > > > > -	blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > > +	/*
> > > > > > > +	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
> > > > > > > +	 * as there will be at minimum one line for lines configuration.
> > > > > > > +	 *
> > > > > > > +	 * As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > > > +	 * one plane_blocks_per_line, we should have selected method2 in
> > > > > > > +	 * the above logic. Assuming that modern versions have enough dbuf
> > > > > > > +	 * and method2 guarantees blocks equivalent to at least 1 line,
> > > > > > > +	 * select the blocks as plane_blocks_per_line.
> > > > > > > +	 *
> > > > > > > +	 * TODO: Revisit the logic when we have better understanding on DRAM
> > > > > > > +	 * channels' impact on the level 0 memory latency and the relevant
> > > > > > > +	 * wm calculations.
> > > > > > > +	 */
> > > > > > > +	blocks = skl_wm_has_lines(dev_priv, level) ?
> > > > > > > +			max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> > > > > > > +				  fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> > > > > > > +			fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > 
> > > > > > That's looks rather convoluted.
> > > > > > 
> > > > > >   blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > + /* blah */
> > > > > > + if (has_lines)
> > > > > > +	blocks = max(blocks, fixed16_to_u32_round_up(wp->plane_blocks_per_line));
> > > > > 
> > > > > We probably need to do similar refactoring in the whole function ;-)
> > > > > 
> > > > > > Also since Art said nothing like this should actually be needed
> > > > > > I think the comment should make it a bit more clear that this
> > > > > > is just a hack to work around the underruns with some single
> > > > > > memory channel configurations.
> > > > > 
> > > > > It is actually not quite a hack, because we are missing that condition
> > > > > implementation from BSpec 49325, which instructs us to select method2
> > > > > when ddb blocks allocation is known and that ratio is >= 1.
> > > > 
> > > > The ddb allocation is not yet known, so we're implementing the
> > > > algorithm 100% correctly.
> > > > 
> > > > And this patch does not implement that misisng part anyway.
> > > 
> > > Yes, as I understood method2 would just give amount of blocks to be
> > > at least as dbuf blocks per line.
> > > 
> > > Wonder whether should we actually fully implement this BSpec clause 
> > > and add it to the point where ddb allocation is known or are there 
> > > any obstacles to do that, besides having to reshuffle this function a bit?
> > 
> > We need to calculate the wm to figure out how much ddb to allocate,
> > and then we'd need the ddb allocation to figure out how to calculate
> > the wm. Very much chicken vs. egg right there. We'd have to do some
> > kind of hideous loop where we'd calculate everything twice. I don't
> > really want to do that since I'd actually like to move the wm
> > calculation to happen already much earlier during .check_plane()
> > as that could reduce the amount of redundant wm calculations we
> > are currently doing.
> 
> I might be missing some details right now, but why do we need a ddb
> allocation to count wms?
> 
> I thought its like we usually calculate wm levels + min_ddb_allocation,
> then based on that we do allocate min_ddb + extra for each plane.
> This is correct that by this moment when we calculate wms we have only
> min_ddb available, so if this level would be even enabled, we would
> at least need min_ddb blocks.
> 
> I think we could just use that min_ddb value here for that purpose,
> because the condition anyway checks if 
> (plane buffer allocation / plane blocks per line) >=1 so, even if
> if this wm level would be enabled plane buffer allocation would
> be at least min_ddb _or higher_ - however that won't affect this 
> condition because even if it happens to be "plane buffer allocation
> + some extra" the ratio would still be valid.
> So if it executes for min_ddb / plane blocks per line, we can
> probably safely state, further it will be also true.

min_ddb = 110% of the blocks calculated from the 2 methods (blocks + 10%).
It depends on which method we choose, so I don't think we can use it for any assumptions.

But in any case, I think this patch does not cause any harm in most of the usecases expected of
skl+ platforms which have enough dbuf!

Per plane ddb allocation happens based on the highest wm level whose min_ddb can fit into the
allocation. If one level does not fit, then that level and the higher ones (package C state
transitions) are disabled.
Now if you look at the logic to select which method to use - if the latency >= linetime, we select
the large buffer method, which guarantees that there is at least plane_blocks_per_line. So I think we
can safely assume that the latency for the wake wm levels will mostly be higher, which implies using
the "large buffer" method.

So this change is mostly limited to wm0, and hence should not impact the ddb allocation, but the
memory fetch bursts might happen slightly more frequently when the processor is in C0?
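
A simplified sketch of the reasoning above, with plain integers instead of fixed16 values. The
function names are hypothetical, the method selection is reduced to the single latency vs. linetime
rule mentioned here (the real skl_compute_plane_wm has more cases), and the clamp is written in the
shorter form suggested in review.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Reduced selection rule: the large buffer method (method2) is used
 * once the latency is at least one line time, else method1. */
static uint32_t select_result(uint32_t method1, uint32_t method2,
			      uint32_t latency_us, uint32_t linetime_us)
{
	return latency_us >= linetime_us ? method2 : method1;
}

/* The clamp from the patch: for levels that program lines, never go
 * below one line worth of blocks. */
static uint32_t wm_blocks(uint32_t selected_result,
			  uint32_t blocks_per_line, bool has_lines)
{
	uint32_t blocks = selected_result + 1;

	if (has_lines)
		blocks = max_u32(blocks, blocks_per_line);

	return blocks;
}

/* min_ddb as described above: blocks + 10%, rounded up. */
static uint32_t min_ddb_from_blocks(uint32_t blocks)
{
	return blocks + (blocks + 9) / 10;
}
```

If method2 already yields at least blocks_per_line, the clamp changes nothing for that level. It only
kicks in when method1 was selected and came out below one line, which is the wm0 case argued above.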

BR
vinod

> 
> Stan
> 
> > -- 
> > Ville Syrjälä
> > Intel
Stanislav Lisovskiy April 7, 2022, 12:31 p.m. UTC | #11
On Thu, Apr 07, 2022 at 03:09:48PM +0300, Govindapillai, Vinod wrote:
> On Thu, 2022-04-07 at 09:43 +0300, Lisovskiy, Stanislav wrote:
> > On Wed, Apr 06, 2022 at 09:09:06PM +0300, Ville Syrjälä wrote:
> > > On Wed, Apr 06, 2022 at 08:14:58PM +0300, Lisovskiy, Stanislav wrote:
> > > > On Wed, Apr 06, 2022 at 05:01:39PM +0300, Ville Syrjälä wrote:
> > > > > On Wed, Apr 06, 2022 at 04:45:26PM +0300, Lisovskiy, Stanislav wrote:
> > > > > > On Wed, Apr 06, 2022 at 03:48:02PM +0300, Ville Syrjälä wrote:
> > > > > > > On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> > > > > > > > In configurations with single DRAM channel, for usecases like
> > > > > > > > 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> > > > > > > > like the wm0 watermark values need to bumped up because the wm0
> > > > > > > > memory latency calculations are probably not taking the DRAM
> > > > > > > > channel's impact into account.
> > > > > > > >
> > > > > > > > As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > > > > one plane_blocks_per_line we should have selected method2.
> > > > > > > > Assuming that modern HW versions have enough dbuf to hold
> > > > > > > > at least one line, set the wm blocks to equivalent to blocks
> > > > > > > > per line.
> > > > > > > >
> > > > > > > > cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > > > cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > > > > > > >
> > > > > > > > Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
> > > > > > > > ---
> > > > > > > >  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
> > > > > > > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > > index 8824f269e5f5..ae28a8c63ca4 100644
> > > > > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > > @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state
> > > > > > > > *crtc_state,
> > > > > > > >           }
> > > > > > > >   }
> > > > > > > >
> > > > > > > > - blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > > > + /*
> > > > > > > > +  * Lets have blocks at minimum equivalent to plane_blocks_per_line
> > > > > > > > +  * as there will be at minimum one line for lines configuration.
> > > > > > > > +  *
> > > > > > > > +  * As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > > > > +  * one plane_blocks_per_line, we should have selected method2 in
> > > > > > > > +  * the above logic. Assuming that modern versions have enough dbuf
> > > > > > > > +  * and method2 guarantees blocks equivalent to at least 1 line,
> > > > > > > > +  * select the blocks as plane_blocks_per_line.
> > > > > > > > +  *
> > > > > > > > +  * TODO: Revisit the logic when we have better understanding on DRAM
> > > > > > > > +  * channels' impact on the level 0 memory latency and the relevant
> > > > > > > > +  * wm calculations.
> > > > > > > > +  */
> > > > > > > > + blocks = skl_wm_has_lines(dev_priv, level) ?
> > > > > > > > +                 max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> > > > > > > > +                           fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> > > > > > > > +                 fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > >
> > > > > > > That's looks rather convoluted.
> > > > > > >
> > > > > > >   blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > > + /* blah */
> > > > > > > + if (has_lines)
> > > > > > > +   blocks = max(blocks, fixed16_to_u32_round_up(wp->plane_blocks_per_line));
> > > > > >
> > > > > > We probably need to do similar refactoring in the whole function ;-)
> > > > > >
> > > > > > > Also since Art said nothing like this should actually be needed
> > > > > > > I think the comment should make it a bit more clear that this
> > > > > > > is just a hack to work around the underruns with some single
> > > > > > > memory channel configurations.
> > > > > >
> > > > > > It is actually not quite a hack, because we are missing that condition
> > > > > > implementation from BSpec 49325, which instructs us to select method2
> > > > > > when ddb blocks allocation is known and that ratio is >= 1.
> > > > >
> > > > > The ddb allocation is not yet known, so we're implementing the
> > > > > algorithm 100% correctly.
> > > > >
> > > > > And this patch does not implement that misisng part anyway.
> > > >
> > > > Yes, as I understood method2 would just give amount of blocks to be
> > > > at least as dbuf blocks per line.
> > > >
> > > > Wonder whether should we actually fully implement this BSpec clause
> > > > and add it to the point where ddb allocation is known or are there
> > > > any obstacles to do that, besides having to reshuffle this function a bit?
> > >
> > > We need to calculate the wm to figure out how much ddb to allocate,
> > > and then we'd need the ddb allocation to figure out how to calculate
> > > the wm. Very much chicken vs. egg right there. We'd have to do some
> > > kind of hideous loop where we'd calculate everything twice. I don't
> > > really want to do that since I'd actually like to move the wm
> > > calculation to happen already much earlier during .check_plane()
> > > as that could reduce the amount of redundant wm calculations we
> > > are currently doing.
> >
> > I might be missing some details right now, but why do we need a ddb
> > allocation to count wms?
> >
> > I thought its like we usually calculate wm levels + min_ddb_allocation,
> > then based on that we do allocate min_ddb + extra for each plane.
> > This is correct that by this moment when we calculate wms we have only
> > min_ddb available, so if this level would be even enabled, we would
> > at least need min_ddb blocks.
> >
> > I think we could just use that min_ddb value here for that purpose,
> > because the condition anyway checks if
> > (plane buffer allocation / plane blocks per line) >=1 so, even if
> > if this wm level would be enabled plane buffer allocation would
> > be at least min_ddb _or higher_ - however that won't affect this
> > condition because even if it happens to be "plane buffer allocation
> > + some extra" the ratio would still be valid.
> > So if it executes for min_ddb / plane blocks per line, we can
> > probably safely state, further it will be also true.
> 
> min_ddb = 110% of the blocks calculated from the 2 methods (blocks + 10%)
> It depends on what method we choose. So I dont think we can use it for any assumptions.

min_ddb is what matters for us because it is the actual ddb allocation we use,
not the wm level.
As I understand it, whether (plane buffer allocation / plane blocks per line) >= 1 stays valid
depends only on whether the allocation can drop below min_ddb once we do the full allocation in
skl_allocate_plane_ddb, and it can't be smaller than min_ddb.

The allocation algorithm works in such a way that it tries to allocate at least min_ddb;
if it can't, the wm level is disabled.
However, if it succeeds it might try to add some extra blocks to the allocation
(see skl_allocate_plane_ddb).
So yes, even though we don't know the exact allocation in skl_compute_plane_wm,
we can for sure assume it won't be less than min_ddb, which means
that if min_ddb / plane_blocks_per_line >= 1 is true, it will also stay true later on,
if that wm level gets enabled at all.
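
Roughly, the allocation step described above could be sketched like this. It is only an
illustration with made-up names, not what skl_allocate_plane_ddb actually looks like: it picks the
highest level whose min_ddb fits and treats everything above it as disabled.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: min_ddb[level] is the minimum allocation each wm
 * level needs, avail is what the plane can get. Returns the highest
 * level that fits, or -1 if not even level 0 fits. Levels above the
 * returned one are considered disabled.
 */
static int highest_fitting_level(const uint32_t min_ddb[], int num_levels,
				 uint32_t avail)
{
	int level;

	for (level = num_levels - 1; level >= 0; level--) {
		if (min_ddb[level] <= avail)
			return level;
	}

	return -1;
}
```

Whatever level is returned, the plane ends up with at least min_ddb[level] blocks, so a ratio check
done against min_ddb for that level cannot be invalidated by the final allocation.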

Stan


> 
> But in any case, I think this patch do not cause any harm in most of the usecases expected out of
> skl+ platforms which have enough dbuf!
> 
> Per plane ddb allocation happens based on the highest wm level min_ddb which can fit into the
> allocation. If one level is not fit, then that level + above package C state transitions are
> disabled.
> Now if you look at the logic to select which method to use - if the latency >= linetime, we select
> the large buffer method which guantees that there is atleast plane_blocks_per_line. So I think we
> can safely assume that latency for wake wm level will be mostly higher, which implies using the
> "large buffer" method.
> 
> So this change mostly limits to wm0. And hence should not impact ddb allocation, but the memory
> fetch bursts might happen slightly more frequently when the processor is in C0?
> 
> BR
> vinod
> 
> >
> > Stan
> >
> > > --
> > > Ville Syrjälä
> > > Intel

Patch

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 8824f269e5f5..ae28a8c63ca4 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5474,7 +5474,24 @@  static void skl_compute_plane_wm(const struct intel_crtc_state *crtc_state,
 		}
 	}
 
-	blocks = fixed16_to_u32_round_up(selected_result) + 1;
+	/*
+	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
+	 * as there will be at minimum one line for lines configuration.
+	 *
+	 * As per the Bspec 49325, if the ddb allocation can hold at least
+	 * one plane_blocks_per_line, we should have selected method2 in
+	 * the above logic. Assuming that modern versions have enough dbuf
+	 * and method2 guarantees blocks equivalent to at least 1 line,
+	 * select the blocks as plane_blocks_per_line.
+	 *
+	 * TODO: Revisit the logic when we have better understanding on DRAM
+	 * channels' impact on the level 0 memory latency and the relevant
+	 * wm calculations.
+	 */
+	blocks = skl_wm_has_lines(dev_priv, level) ?
+			max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
+				  fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
+			fixed16_to_u32_round_up(selected_result) + 1;
 	lines = div_round_up_fixed16(selected_result,
 				     wp->plane_blocks_per_line);