[v6,6/6] drm/i915/skl: Update DDB values atomically with wms/plane attrs

Message ID 1470177458-31984-7-git-send-email-cpaul@redhat.com (mailing list archive)
State New, archived

Commit Message

cpaul@redhat.com Aug. 2, 2016, 10:37 p.m. UTC
Now that we can hook into update_crtcs and control the order in which we
update CRTCs at each modeset, we can finish the final step of fixing
Skylake's watermark handling by performing DDB updates at the same time
as plane updates and watermark updates.

The first major change in this patch is skl_update_crtcs(), which
handles ensuring that we order each CRTC update in our atomic commits
properly so that they honor the DDB flush order.

The second major change in this patch is the order in which we flush the
pipes. While the previous order may have worked, it can't be used in
this approach since it no longer will do the right thing. For example,
using the old ddb flush order:

We have pipes A, B, and C enabled, and we're disabling C. Initial ddb
allocation looks like this:

|   A   |   B   |xxxxxxx|

Since we're performing the ddb updates after performing any CRTC
disablements in intel_atomic_commit_tail(), the space to the right of
pipe B is unallocated.

1. Flush pipes with new allocation contained into old space. None
   apply, so we skip this
2. Flush pipes having their allocation reduced, but overlapping with a
   previous allocation. None apply, so we also skip this
3. Flush pipes that got more space allocated. This applies to A and B,
   giving us the following update order: A, B

This is wrong, since updating pipe A first will cause it to overlap with
B and potentially burst into flames. Our new order (see the code
comments for details) would update the pipes in the proper order: B, A.
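
The ordering argument above can be checked with a small standalone C
sketch. It is not the driver code: struct ddb_entry, the pipe enum, and
both function names are hypothetical stand-ins, and the interval test is
an equivalent rewrite of the overlap check rather than a copy of it.
Pipes whose next allocation touches nobody else's current allocation go
first, the rest follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct skl_ddb_entry: blocks [start, end). */
struct ddb_entry {
	unsigned int start, end;	/* end == 0 means the pipe is off */
};

enum { PIPE_A, PIPE_B, PIPE_C, NUM_PIPES };

/* Does pipe's next allocation intersect another pipe's current one? */
static bool overlaps_other_pipe(const struct ddb_entry *cur,
				const struct ddb_entry *next, int pipe)
{
	for (int other = 0; other < NUM_PIPES; other++) {
		/* Skip ourselves and pipes that are off before or after. */
		if (other == pipe || !cur[other].end || !next[other].end)
			continue;
		if (next[pipe].start < cur[other].end &&
		    cur[other].start < next[pipe].end)
			return true;
	}
	return false;
}

/*
 * Fill order[] with the pipes to update: first the ones whose next
 * allocation overlaps nobody, then the rest. Returns the count.
 */
static size_t ddb_update_order(const struct ddb_entry *cur,
			       const struct ddb_entry *next, int *order)
{
	size_t n = 0;

	assert(cur && next && order);

	for (int pass = 0; pass < 2; pass++) {
		for (int pipe = 0; pipe < NUM_PIPES; pipe++) {
			bool changed = cur[pipe].start != next[pipe].start ||
				       cur[pipe].end != next[pipe].end;

			/* Only pipes that stay active and actually move. */
			if (!cur[pipe].end || !next[pipe].end || !changed)
				continue;
			if (overlaps_other_pipe(cur, next, pipe) == (pass == 1))
				order[n++] = pipe;
		}
	}
	return n;
}
```

Assuming a made-up 900-block DDB with A, B, C at (0,300), (300,600),
(600,900) and C being disabled so that A and B grow to (0,450) and
(450,900), ddb_update_order() yields B first and A second.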

As well, we calculate the order for each DDB update during the check
phase, and reference it later in the commit phase when we hit
skl_update_crtcs().

This long overdue patch fixes the rest of the underruns on Skylake.

Changes since v1:
 - Add skl_ddb_entry_write() for cursor into skl_write_cursor_wm()

Fixes: 0e8fb7ba7ca5 ("drm/i915/skl: Flush the WM configuration")
Fixes: 8211bd5bdf5e ("drm/i915/skl: Program the DDB allocation")
Signed-off-by: Lyude <cpaul@redhat.com>
[omitting CC for stable, since this patch will need to be changed for
such backports first]
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 100 ++++++++++--
 drivers/gpu/drm/i915/intel_drv.h     |  10 ++
 drivers/gpu/drm/i915/intel_pm.c      | 288 ++++++++++++++++-------------------
 3 files changed, 233 insertions(+), 165 deletions(-)

Comments

Ville Syrjälä Aug. 3, 2016, 3 p.m. UTC | #1
On Tue, Aug 02, 2016 at 06:37:37PM -0400, Lyude wrote:
> Now that we can hook into update_crtcs and control the order in which we
> update CRTCs at each modeset, we can finish the final step of fixing
> Skylake's watermark handling by performing DDB updates at the same time
> as plane updates and watermark updates.
> 
> The first major change in this patch is skl_update_crtcs(), which
> handles ensuring that we order each CRTC update in our atomic commits
> properly so that they honor the DDB flush order.
> 
> The second major change in this patch is the order in which we flush the
> pipes. While the previous order may have worked, it can't be used in
> this approach since it no longer will do the right thing. For example,
> using the old ddb flush order:
> 
> We have pipes A, B, and C enabled, and we're disabling C. Initial ddb
> allocation looks like this:
> 
> |   A   |   B   |xxxxxxx|
> 
> Since we're performing the ddb updates after performing any CRTC
> disablements in intel_atomic_commit_tail(), the space to the right of
> pipe B is unallocated.
> 
> 1. Flush pipes with new allocation contained into old space. None
>    apply, so we skip this
> 2. Flush pipes having their allocation reduced, but overlapping with a
>    previous allocation. None apply, so we also skip this
> 3. Flush pipes that got more space allocated. This applies to A and B,
>    giving us the following update order: A, B
> 
> This is wrong, since updating pipe A first will cause it to overlap with
> B and potentially burst into flames. Our new order (see the code
> comments for details) would update the pipes in the proper order: B, A.
> 
> As well, we calculate the order for each DDB update during the check
> phase, and reference it later in the commit phase when we hit
> skl_update_crtcs().
> 
> This long overdue patch fixes the rest of the underruns on Skylake.
> 
> Changes since v1:
>  - Add skl_ddb_entry_write() for cursor into skl_write_cursor_wm()
> 
> Fixes: 0e8fb7ba7ca5 ("drm/i915/skl: Flush the WM configuration")
> Fixes: 8211bd5bdf5e ("drm/i915/skl: Program the DDB allocation")
> Signed-off-by: Lyude <cpaul@redhat.com>
> [omitting CC for stable, since this patch will need to be changed for
> such backports first]
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
> Cc: Hans de Goede <hdegoede@redhat.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_display.c | 100 ++++++++++--
>  drivers/gpu/drm/i915/intel_drv.h     |  10 ++
>  drivers/gpu/drm/i915/intel_pm.c      | 288 ++++++++++++++++-------------------
>  3 files changed, 233 insertions(+), 165 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 59cf513..06295f7 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -12897,16 +12897,23 @@ static void verify_wm_state(struct drm_crtc *crtc,
>  			  hw_entry->start, hw_entry->end);
>  	}
>  
> -	/* cursor */
> -	hw_entry = &hw_ddb.plane[pipe][PLANE_CURSOR];
> -	sw_entry = &sw_ddb->plane[pipe][PLANE_CURSOR];
> -
> -	if (!skl_ddb_entry_equal(hw_entry, sw_entry)) {
> -		DRM_ERROR("mismatch in DDB state pipe %c cursor "
> -			  "(expected (%u,%u), found (%u,%u))\n",
> -			  pipe_name(pipe),
> -			  sw_entry->start, sw_entry->end,
> -			  hw_entry->start, hw_entry->end);
> +	/*
> +	 * cursor
> +	 * If the cursor plane isn't active, we may not have updated it's ddb
> +	 * allocation. In that case since the ddb allocation will be updated
> +	 * once the plane becomes visible, we can skip this check
> +	 */
> +	if (intel_crtc->cursor_addr) {
> +		hw_entry = &hw_ddb.plane[pipe][PLANE_CURSOR];
> +		sw_entry = &sw_ddb->plane[pipe][PLANE_CURSOR];
> +
> +		if (!skl_ddb_entry_equal(hw_entry, sw_entry)) {
> +			DRM_ERROR("mismatch in DDB state pipe %c cursor "
> +				  "(expected (%u,%u), found (%u,%u))\n",
> +				  pipe_name(pipe),
> +				  sw_entry->start, sw_entry->end,
> +				  hw_entry->start, hw_entry->end);
> +		}
>  	}
>  }
>  
> @@ -13658,6 +13665,72 @@ static void intel_update_crtcs(struct drm_atomic_state *state,
>  	}
>  }
>  
> +static inline void
> +skl_do_ddb_step(struct drm_atomic_state *state,
> +		enum skl_ddb_step step)
> +{
> +	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> +	struct drm_crtc *crtc;
> +	struct drm_crtc_state *old_crtc_state;
> +	unsigned int crtc_vblank_mask; /* unused */
> +	int i;
> +
> +	for_each_crtc_in_state(state, crtc, old_crtc_state, i) {
> +		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> +		struct intel_crtc_state *cstate =
> +			to_intel_crtc_state(crtc->state);
> +		bool vblank_wait = false;
> +
> +		if (cstate->wm.skl.ddb_realloc != step || !crtc->state->active)
> +			continue;
> +
> +		/*
> +		 * If we're changing the ddb allocation of this pipe to make
> +		 * room for another pipe, we have to wait for the pipe's ddb
> +		 * allocations to actually update by waiting for a vblank.
> +		 * Otherwise we risk the next pipe updating before this pipe
> +		 * finishes, resulting in the pipe fetching from ddb space for
> +		 * the wrong pipe.
> +		 *
> +		 * However, if we know we don't have any more pipes to move
> +		 * around, we can skip this wait and the new ddb allocation
> +		 * will take effect at the start of the next vblank.
> +		 */
> +		switch (step) {
> +		case SKL_DDB_STEP_NO_OVERLAP:
> +		case SKL_DDB_STEP_OVERLAP:
> +			if (step != intel_state->last_ddb_step)
> +				vblank_wait = true;
> +
> +		/* drop through */
> +		case SKL_DDB_STEP_FINAL:
> +			DRM_DEBUG_KMS(
> +			    "Updating [CRTC:%d:pipe %c] for DDB step %d\n",
> +			    crtc->base.id, pipe_name(intel_crtc->pipe),
> +			    step);
> +
> +		case SKL_DDB_STEP_NONE:
> +			break;
> +		}

Not sure we really need this step stuff. How about?

for_each_crtc
	if (crtc_needs_disabling)
		disable_crtc();

do {
	progress = false;
	wait_vbl_pipes=0;
	for_each_crtc() {
		if (!active || needs_modeset)
			continue;
		if (!ddb_changed)
			continue;
		if (new_ddb_overlaps_with_any_other_pipes_current_ddb)
			continue;
		commit;
		wait_vbl_pipes |= pipe;
		progress = true;
	}
	wait_vbls(wait_vbl_pipes);
} while (progress);

for_each_crtc {
	if (crtc_needs_enabling)
		enable_crtc();
	commit;
}

Or if we're paranoid, we could also have an upper bound on the
loop and assert that we never reach it.
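
A standalone C sketch of the progress loop with such an upper bound,
under a few assumptions: struct ddb_entry, MAX_PIPES, and the function
names are hypothetical, nothing here touches real hardware, and the
vblank wait is modelled by copying the committed allocations into cur[]
at the end of each round.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PIPES 3

/* Hypothetical stand-in for struct skl_ddb_entry: blocks [start, end). */
struct ddb_entry {
	unsigned int start, end;	/* end == 0 means the pipe is off */
};

static bool entry_equal(const struct ddb_entry *a, const struct ddb_entry *b)
{
	return a->start == b->start && a->end == b->end;
}

/* Does next[pipe] intersect the current allocation of any other pipe? */
static bool overlaps_current(const struct ddb_entry *cur,
			     const struct ddb_entry *next, int pipe)
{
	for (int other = 0; other < MAX_PIPES; other++) {
		if (other == pipe)
			continue;
		if (next[pipe].start < cur[other].end &&
		    cur[other].start < next[pipe].end)
			return true;
	}
	return false;
}

/*
 * Move every active pipe from cur[] to next[]. Returns the number of
 * rounds that committed something (one vblank wait each). Pipes that
 * can never move, e.g. two pipes swapping places, are left untouched.
 */
static int ddb_commit_loop(struct ddb_entry *cur, const struct ddb_entry *next)
{
	int rounds = 0;
	bool progress;

	do {
		bool committed[MAX_PIPES] = { false };

		progress = false;
		for (int pipe = 0; pipe < MAX_PIPES; pipe++) {
			if (!cur[pipe].end || !next[pipe].end)
				continue;	/* left to enable/disable */
			if (entry_equal(&cur[pipe], &next[pipe]))
				continue;	/* ddb unchanged */
			if (overlaps_current(cur, next, pipe))
				continue;	/* retry next round */
			committed[pipe] = true;
			progress = true;
		}

		/* Stand-in for wait_vbls(): new allocations take effect. */
		for (int pipe = 0; pipe < MAX_PIPES; pipe++)
			if (committed[pipe])
				cur[pipe] = next[pipe];

		if (progress)
			rounds++;
		/* Every productive round settles at least one pipe. */
		assert(rounds <= MAX_PIPES);
	} while (progress);

	return rounds;
}
```

With the example from the patch's code comment (B and C shrinking to
make room for A, here on a made-up 900-block DDB), the loop takes two
rounds: B moves first, then C. The bound holds because a committed pipe
reaches its target and is skipped in every later round.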


Though one thing I don't particularly like about this
commit-while-changing-the-ddb approach is that it's going to make the
update appear even less atomic. What I'd rather like to do for the
normal commit path is this:

for_each_crtc
	if (crtc_needs_disabling)
		disable_planes
for_each_crtc
	if (crtc_needs_disabling)
		disable_crtc
for_each_crtc
	if (crtc_needs_enabling)
		enable_crtc
for_each_crtc
	if (active)
		commit_planes;

That way everything would pop in and out as close together as possible.
Hmm. Actually, I wonder... I'm thinking we should be able to enable all
crtcs prior to entering the ddb commit loop, on account of no planes
being enabled on those crtcs until we commit them. And if no planes are
enabled, running the pipe w/o allocated ddb should be fine. So with that
approach, I think we should be able to commit all planes within a few
iterations of the loop, and hence within a few vblanks.

> +
> +		intel_update_crtc(crtc, state, old_crtc_state,
> +				  &crtc_vblank_mask);
> +
> +		if (vblank_wait)
> +			intel_wait_for_vblank(state->dev, intel_crtc->pipe);
> +	}
> +}
> +
> +static void skl_update_crtcs(struct drm_atomic_state *state,
> +			     unsigned int *crtc_vblank_mask)
> +{
> +	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> +	enum skl_ddb_step step;
> +
> +	for (step = 0; step <= intel_state->last_ddb_step; step++)
> +		skl_do_ddb_step(state, step);
> +}
> +
>  static void intel_atomic_commit_tail(struct drm_atomic_state *state)
>  {
>  	struct drm_device *dev = state->dev;
> @@ -15235,8 +15308,6 @@ void intel_init_display_hooks(struct drm_i915_private *dev_priv)
>  		dev_priv->display.crtc_disable = i9xx_crtc_disable;
>  	}
>  
> -	dev_priv->display.update_crtcs = intel_update_crtcs;
> -
>  	/* Returns the core display clock speed */
>  	if (IS_SKYLAKE(dev_priv) || IS_KABYLAKE(dev_priv))
>  		dev_priv->display.get_display_clock_speed =
> @@ -15326,6 +15397,11 @@ void intel_init_display_hooks(struct drm_i915_private *dev_priv)
>  			skl_modeset_calc_cdclk;
>  	}
>  
> +	if (dev_priv->info.gen >= 9)
> +		dev_priv->display.update_crtcs = skl_update_crtcs;
> +	else
> +		dev_priv->display.update_crtcs = intel_update_crtcs;
> +
>  	switch (INTEL_INFO(dev_priv)->gen) {
>  	case 2:
>  		dev_priv->display.queue_flip = intel_gen2_queue_flip;
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 1b444d3..cf5da83 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -334,6 +334,7 @@ struct intel_atomic_state {
>  
>  	/* Gen9+ only */
>  	struct skl_wm_values wm_results;
> +	int last_ddb_step;
>  };
>  
>  struct intel_plane_state {
> @@ -437,6 +438,13 @@ struct skl_pipe_wm {
>  	uint32_t linetime;
>  };
>  
> +enum skl_ddb_step {
> +	SKL_DDB_STEP_NONE = 0,
> +	SKL_DDB_STEP_NO_OVERLAP,
> +	SKL_DDB_STEP_OVERLAP,
> +	SKL_DDB_STEP_FINAL
> +};
> +
>  struct intel_crtc_wm_state {
>  	union {
>  		struct {
> @@ -467,6 +475,8 @@ struct intel_crtc_wm_state {
>  			/* minimum block allocation */
>  			uint16_t minimum_blocks[I915_MAX_PLANES];
>  			uint16_t minimum_y_blocks[I915_MAX_PLANES];
> +
> +			enum skl_ddb_step ddb_realloc;
>  		} skl;
>  	};
>  
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 6f5beb3..636c90a 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3816,6 +3816,11 @@ void skl_write_plane_wm(struct intel_crtc *intel_crtc,
>  			   wm->plane[pipe][plane][level]);
>  	}
>  	I915_WRITE(PLANE_WM_TRANS(pipe, plane), wm->plane_trans[pipe][plane]);
> +
> +	skl_ddb_entry_write(dev_priv, PLANE_BUF_CFG(pipe, plane),
> +			    &wm->ddb.plane[pipe][plane]);
> +	skl_ddb_entry_write(dev_priv, PLANE_NV12_BUF_CFG(pipe, plane),
> +			    &wm->ddb.y_plane[pipe][plane]);
>  }
>  
>  void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
> @@ -3832,170 +3837,51 @@ void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
>  			   wm->plane[pipe][PLANE_CURSOR][level]);
>  	}
>  	I915_WRITE(CUR_WM_TRANS(pipe), wm->plane_trans[pipe][PLANE_CURSOR]);
> -}
> -
> -static void skl_write_wm_values(struct drm_i915_private *dev_priv,
> -				const struct skl_wm_values *new)
> -{
> -	struct drm_device *dev = &dev_priv->drm;
> -	struct intel_crtc *crtc;
> -
> -	for_each_intel_crtc(dev, crtc) {
> -		int i;
> -		enum pipe pipe = crtc->pipe;
> -
> -		if ((new->dirty_pipes & drm_crtc_mask(&crtc->base)) == 0)
> -			continue;
> -		if (!crtc->active)
> -			continue;
>  
> -		for (i = 0; i < intel_num_planes(crtc); i++) {
> -			skl_ddb_entry_write(dev_priv,
> -					    PLANE_BUF_CFG(pipe, i),
> -					    &new->ddb.plane[pipe][i]);
> -			skl_ddb_entry_write(dev_priv,
> -					    PLANE_NV12_BUF_CFG(pipe, i),
> -					    &new->ddb.y_plane[pipe][i]);
> -		}
> -
> -		skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
> -				    &new->ddb.plane[pipe][PLANE_CURSOR]);
> -	}
> +	skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
> +			    &wm->ddb.plane[pipe][PLANE_CURSOR]);
>  }
>  
> -/*
> - * When setting up a new DDB allocation arrangement, we need to correctly
> - * sequence the times at which the new allocations for the pipes are taken into
> - * account or we'll have pipes fetching from space previously allocated to
> - * another pipe.
> - *
> - * Roughly the sequence looks like:
> - *  1. re-allocate the pipe(s) with the allocation being reduced and not
> - *     overlapping with a previous light-up pipe (another way to put it is:
> - *     pipes with their new allocation strickly included into their old ones).
> - *  2. re-allocate the other pipes that get their allocation reduced
> - *  3. allocate the pipes having their allocation increased
> - *
> - * Steps 1. and 2. are here to take care of the following case:
> - * - Initially DDB looks like this:
> - *     |   B    |   C    |
> - * - enable pipe A.
> - * - pipe B has a reduced DDB allocation that overlaps with the old pipe C
> - *   allocation
> - *     |  A  |  B  |  C  |
> - *
> - * We need to sequence the re-allocation: C, B, A (and not B, C, A).
> - */
> -
> -static void
> -skl_wm_flush_pipe(struct drm_i915_private *dev_priv, enum pipe pipe, int pass)
> +static bool
> +skl_ddb_allocation_equals(const struct skl_ddb_allocation *old,
> +			  const struct skl_ddb_allocation *new,
> +			  enum pipe pipe)
>  {
> -	int plane;
> -
> -	DRM_DEBUG_KMS("flush pipe %c (pass %d)\n", pipe_name(pipe), pass);
> -
> -	for_each_plane(dev_priv, pipe, plane) {
> -		I915_WRITE(PLANE_SURF(pipe, plane),
> -			   I915_READ(PLANE_SURF(pipe, plane)));
> -	}
> -	I915_WRITE(CURBASE(pipe), I915_READ(CURBASE(pipe)));
> +	return new->pipe[pipe].start == old->pipe[pipe].start &&
> +	       new->pipe[pipe].end == old->pipe[pipe].end;
>  }
>  
>  static bool
> -skl_ddb_allocation_included(const struct skl_ddb_allocation *old,
> +skl_ddb_allocation_overlaps(struct drm_atomic_state *state,
> +			    const struct skl_ddb_allocation *old,
>  			    const struct skl_ddb_allocation *new,
>  			    enum pipe pipe)
>  {
> -	uint16_t old_size, new_size;
> -
> -	old_size = skl_ddb_entry_size(&old->pipe[pipe]);
> -	new_size = skl_ddb_entry_size(&new->pipe[pipe]);
> -
> -	return old_size != new_size &&
> -	       new->pipe[pipe].start >= old->pipe[pipe].start &&
> -	       new->pipe[pipe].end <= old->pipe[pipe].end;
> -}
> -
> -static void skl_flush_wm_values(struct drm_i915_private *dev_priv,
> -				struct skl_wm_values *new_values)
> -{
> -	struct drm_device *dev = &dev_priv->drm;
> -	struct skl_ddb_allocation *cur_ddb, *new_ddb;
> -	bool reallocated[I915_MAX_PIPES] = {};
> -	struct intel_crtc *crtc;
> -	enum pipe pipe;
> -
> -	new_ddb = &new_values->ddb;
> -	cur_ddb = &dev_priv->wm.skl_hw.ddb;
> -
> -	/*
> -	 * First pass: flush the pipes with the new allocation contained into
> -	 * the old space.
> -	 *
> -	 * We'll wait for the vblank on those pipes to ensure we can safely
> -	 * re-allocate the freed space without this pipe fetching from it.
> -	 */
> -	for_each_intel_crtc(dev, crtc) {
> -		if (!crtc->active)
> -			continue;
> -
> -		pipe = crtc->pipe;
> -
> -		if (!skl_ddb_allocation_included(cur_ddb, new_ddb, pipe))
> -			continue;
> -
> -		skl_wm_flush_pipe(dev_priv, pipe, 1);
> -		intel_wait_for_vblank(dev, pipe);
> -
> -		reallocated[pipe] = true;
> -	}
> -
> -
> -	/*
> -	 * Second pass: flush the pipes that are having their allocation
> -	 * reduced, but overlapping with a previous allocation.
> -	 *
> -	 * Here as well we need to wait for the vblank to make sure the freed
> -	 * space is not used anymore.
> -	 */
> -	for_each_intel_crtc(dev, crtc) {
> -		if (!crtc->active)
> -			continue;
> -
> -		pipe = crtc->pipe;
> -
> -		if (reallocated[pipe])
> -			continue;
> -
> -		if (skl_ddb_entry_size(&new_ddb->pipe[pipe]) <
> -		    skl_ddb_entry_size(&cur_ddb->pipe[pipe])) {
> -			skl_wm_flush_pipe(dev_priv, pipe, 2);
> -			intel_wait_for_vblank(dev, pipe);
> -			reallocated[pipe] = true;
> -		}
> -	}
> -
> -	/*
> -	 * Third pass: flush the pipes that got more space allocated.
> -	 *
> -	 * We don't need to actively wait for the update here, next vblank
> -	 * will just get more DDB space with the correct WM values.
> -	 */
> -	for_each_intel_crtc(dev, crtc) {
> -		if (!crtc->active)
> -			continue;
> +	struct drm_device *dev = state->dev;
> +	struct intel_crtc *intel_crtc;
> +	enum pipe otherp;
>  
> -		pipe = crtc->pipe;
> +	for_each_intel_crtc(dev, intel_crtc) {
> +		otherp = intel_crtc->pipe;
>  
>  		/*
> -		 * At this point, only the pipes more space than before are
> -		 * left to re-allocate.
> +		 * When checking for overlaps, we don't want to:
> +		 *  - Compare against ourselves
> +		 *  - Compare against pipes that will be disabled in step 0
> +		 *  - Compare against pipes that won't be enabled until step 3
>  		 */
> -		if (reallocated[pipe])
> +		if (otherp == pipe || !new->pipe[otherp].end ||
> +		    !old->pipe[otherp].end)
>  			continue;
>  
> -		skl_wm_flush_pipe(dev_priv, pipe, 3);
> +		if ((new->pipe[pipe].start >= old->pipe[otherp].start &&
> +		     new->pipe[pipe].start < old->pipe[otherp].end) ||
> +		    (old->pipe[otherp].start >= new->pipe[pipe].start &&
> +		     old->pipe[otherp].start < new->pipe[pipe].end))
> +			return true;
>  	}
> +
> +	return false;
>  }
>  
>  static int skl_update_pipe_wm(struct drm_crtc_state *cstate,
> @@ -4038,8 +3924,10 @@ skl_compute_ddb(struct drm_atomic_state *state)
>  	struct drm_device *dev = state->dev;
>  	struct drm_i915_private *dev_priv = to_i915(dev);
>  	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> +	struct intel_crtc_state *cstate;
>  	struct intel_crtc *intel_crtc;
> -	struct skl_ddb_allocation *ddb = &intel_state->wm_results.ddb;
> +	struct skl_ddb_allocation *old_ddb = &dev_priv->wm.skl_hw.ddb;
> +	struct skl_ddb_allocation *new_ddb = &intel_state->wm_results.ddb;
>  	uint32_t realloc_pipes = pipes_modified(state);
>  	int ret;
>  
> @@ -4071,13 +3959,11 @@ skl_compute_ddb(struct drm_atomic_state *state)
>  	}
>  
>  	for_each_intel_crtc_mask(dev, intel_crtc, realloc_pipes) {
> -		struct intel_crtc_state *cstate;
> -
>  		cstate = intel_atomic_get_crtc_state(state, intel_crtc);
>  		if (IS_ERR(cstate))
>  			return PTR_ERR(cstate);
>  
> -		ret = skl_allocate_pipe_ddb(cstate, ddb);
> +		ret = skl_allocate_pipe_ddb(cstate, new_ddb);
>  		if (ret)
>  			return ret;
>  
> @@ -4086,6 +3972,73 @@ skl_compute_ddb(struct drm_atomic_state *state)
>  			return ret;
>  	}
>  
> +	/*
> +	 * When setting up a new DDB allocation arrangement, we need to
> +	 * correctly sequence the times at which the new allocations for the
> +	 * pipes are taken into account or we'll have pipes fetching from space
> +	 * previously allocated to another pipe.
> +	 *
> +	 * Roughly the final sequence we want looks like this:
> +	 *  1. Disable any pipes we're not going to be using anymore
> +	 *  2. Reallocate all of the active pipes whose new ddb allocations
> +	 *  won't overlap with another active pipe's ddb allocation.
> +	 *  3. Reallocate remaining active pipes, if any.
> +	 *  4. Enable any new pipes, if any.
> +	 *
> +	 * Example:
> +	 * Initially DDB looks like this:
> +	 *   |   B    |   C    |
> +	 * And the final DDB should look like this:
> +	 *   |  B  |  C  |  A  |
> +	 *
> +	 * 1. We're not disabling any pipes, so do nothing on this step.
> +	 * 2. Pipe B's new allocation wouldn't overlap with pipe C, however
> +	 * pipe C's new allocation does overlap with pipe B's current
> +	 * allocation. Reallocate B first so the DDB looks like this:
> +	 *   |  B  |xx|   C    |
> +	 * 3. Now we can safely reallocate pipe C to it's new location:
> +	 *   |  B  |  C  |xxxxx|
> +	 * 4. Enable any remaining pipes, in this case A
> +	 *   |  B  |  C  |  A  |
> +	 *
> +	 * As well, between every pipe reallocation we have to wait for a
> +	 * vblank on the pipe so that we ensure it's new allocation has taken
> +	 * effect by the time we start moving the next pipe. This can be
> +	 * skipped on the last step we need to perform, which is why we keep
> +	 * track of that information here. For example, if we've reallocated
> +	 * all the pipes that need changing by the time we reach step 3, we can
> +	 * finish without waiting for the pipes we changed in step 3 to update.
> +	 */
> +	for_each_intel_crtc_mask(dev, intel_crtc, realloc_pipes) {
> +		enum pipe pipe = intel_crtc->pipe;
> +		enum skl_ddb_step step;
> +
> +		cstate = intel_atomic_get_crtc_state(state, intel_crtc);
> +		if (IS_ERR(cstate))
> +			return PTR_ERR(cstate);
> +
> +		/* Step 1: Pipes we're disabling / haven't changed */
> +		if (skl_ddb_allocation_equals(old_ddb, new_ddb, pipe) ||
> +		    new_ddb->pipe[pipe].end == 0) {
> +			step = SKL_DDB_STEP_NONE;
> +		/* Step 2-3: Active pipes we're reallocating */
> +		} else if (old_ddb->pipe[pipe].end != 0) {
> +			if (skl_ddb_allocation_overlaps(state, old_ddb, new_ddb,
> +							pipe))
> +				step = SKL_DDB_STEP_OVERLAP;
> +			else
> +				step = SKL_DDB_STEP_NO_OVERLAP;
> +		/* Step 4: Pipes we're enabling */
> +		} else {
> +			step = SKL_DDB_STEP_FINAL;
> +		}
> +
> +		cstate->wm.skl.ddb_realloc = step;
> +
> +		if (step > intel_state->last_ddb_step)
> +			intel_state->last_ddb_step = step;
> +	}
> +
>  	return 0;
>  }
>  
> @@ -4110,10 +4063,13 @@ skl_copy_wm_for_pipe(struct skl_wm_values *dst,
>  static int
>  skl_compute_wm(struct drm_atomic_state *state)
>  {
> +	struct drm_i915_private *dev_priv = to_i915(state->dev);
>  	struct drm_crtc *crtc;
>  	struct drm_crtc_state *cstate;
>  	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
>  	struct skl_wm_values *results = &intel_state->wm_results;
> +	struct skl_ddb_allocation *old_ddb = &dev_priv->wm.skl_hw.ddb;
> +	struct skl_ddb_allocation *new_ddb = &results->ddb;
>  	struct skl_pipe_wm *pipe_wm;
>  	bool changed = false;
>  	int ret, i;
> @@ -4152,7 +4108,10 @@ skl_compute_wm(struct drm_atomic_state *state)
>  		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
>  		struct intel_crtc_state *intel_cstate =
>  			to_intel_crtc_state(cstate);
> +		enum skl_ddb_step step;
> +		enum pipe pipe;
>  
> +		pipe = intel_crtc->pipe;
>  		pipe_wm = &intel_cstate->wm.skl.optimal;
>  		ret = skl_update_pipe_wm(cstate, &results->ddb, pipe_wm,
>  					 &changed);
> @@ -4167,7 +4126,18 @@ skl_compute_wm(struct drm_atomic_state *state)
>  			continue;
>  
>  		intel_cstate->update_wm_pre = true;
> +		step = intel_cstate->wm.skl.ddb_realloc;
>  		skl_compute_wm_results(crtc->dev, pipe_wm, results, intel_crtc);
> +
> +		if (!skl_ddb_entry_equal(&old_ddb->pipe[pipe],
> +					 &new_ddb->pipe[pipe])) {
> +			DRM_DEBUG_KMS(
> +			    "DDB changes for [CRTC:%d:pipe %c]: (%3d - %3d) -> (%3d - %3d) on step %d\n",
> +			    intel_crtc->base.base.id, pipe_name(pipe),
> +			    old_ddb->pipe[pipe].start, old_ddb->pipe[pipe].end,
> +			    new_ddb->pipe[pipe].start, new_ddb->pipe[pipe].end,
> +			    step);
> +		}
>  	}
>  
>  	return 0;
> @@ -4191,8 +4161,20 @@ static void skl_update_wm(struct drm_crtc *crtc)
>  
>  	mutex_lock(&dev_priv->wm.wm_mutex);
>  
> -	skl_write_wm_values(dev_priv, results);
> -	skl_flush_wm_values(dev_priv, results);
> +	/*
> +	 * If this pipe isn't active already, we're going to be enabling it
> +	 * very soon. Since it's safe to update these while the pipe's shut off,
> +	 * just do so here. Already active pipes will have their watermarks
> +	 * updated once we update their planes.
> +	 */
> +	if (!intel_crtc->active) {
> +		int plane;
> +
> +		for (plane = 0; plane < intel_num_planes(intel_crtc); plane++)
> +			skl_write_plane_wm(intel_crtc, results, plane);
> +
> +		skl_write_cursor_wm(intel_crtc, results);
> +	}
>  
>  	/*
>  	 * Store the new configuration (but only for the pipes that have
> -- 
> 2.7.4
cpaul@redhat.com Aug. 3, 2016, 9:39 p.m. UTC | #2
On Wed, 2016-08-03 at 18:00 +0300, Ville Syrjälä wrote:
> On Tue, Aug 02, 2016 at 06:37:37PM -0400, Lyude wrote:
> > 
> > Now that we can hook into update_crtcs and control the order in which we
> > update CRTCs at each modeset, we can finish the final step of fixing
> > Skylake's watermark handling by performing DDB updates at the same time
> > as plane updates and watermark updates.
> > 
> > The first major change in this patch is skl_update_crtcs(), which
> > handles ensuring that we order each CRTC update in our atomic commits
> > properly so that they honor the DDB flush order.
> > 
> > The second major change in this patch is the order in which we flush the
> > pipes. While the previous order may have worked, it can't be used in
> > this approach since it no longer will do the right thing. For example,
> > using the old ddb flush order:
> > 
> > We have pipes A, B, and C enabled, and we're disabling C. Initial ddb
> > allocation looks like this:
> > 
> > |   A   |   B   |xxxxxxx|
> > 
> > Since we're performing the ddb updates after performing any CRTC
> > disablements in intel_atomic_commit_tail(), the space to the right of
> > pipe B is unallocated.
> > 
> > 1. Flush pipes with new allocation contained into old space. None
> >    apply, so we skip this
> > 2. Flush pipes having their allocation reduced, but overlapping with a
> >    previous allocation. None apply, so we also skip this
> > 3. Flush pipes that got more space allocated. This applies to A and B,
> >    giving us the following update order: A, B
> > 
> > This is wrong, since updating pipe A first will cause it to overlap with
> > B and potentially burst into flames. Our new order (see the code
> > comments for details) would update the pipes in the proper order: B, A.
> > 
> > As well, we calculate the order for each DDB update during the check
> > phase, and reference it later in the commit phase when we hit
> > skl_update_crtcs().
> > 
> > This long overdue patch fixes the rest of the underruns on Skylake.
> > 
> > Changes since v1:
> >  - Add skl_ddb_entry_write() for cursor into skl_write_cursor_wm()
> > 
> > Fixes: 0e8fb7ba7ca5 ("drm/i915/skl: Flush the WM configuration")
> > Fixes: 8211bd5bdf5e ("drm/i915/skl: Program the DDB allocation")
> > Signed-off-by: Lyude <cpaul@redhat.com>
> > [omitting CC for stable, since this patch will need to be changed for
> > such backports first]
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Cc: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
> > Cc: Hans de Goede <hdegoede@redhat.com>
> > Cc: Matt Roper <matthew.d.roper@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_display.c | 100 ++++++++++--
> >  drivers/gpu/drm/i915/intel_drv.h     |  10 ++
> >  drivers/gpu/drm/i915/intel_pm.c      | 288 ++++++++++++++++-------------------
> >  3 files changed, 233 insertions(+), 165 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > index 59cf513..06295f7 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -12897,16 +12897,23 @@ static void verify_wm_state(struct drm_crtc *crtc,
> >  			  hw_entry->start, hw_entry->end);
> >  	}
> >  
> > -	/* cursor */
> > -	hw_entry = &hw_ddb.plane[pipe][PLANE_CURSOR];
> > -	sw_entry = &sw_ddb->plane[pipe][PLANE_CURSOR];
> > -
> > -	if (!skl_ddb_entry_equal(hw_entry, sw_entry)) {
> > -		DRM_ERROR("mismatch in DDB state pipe %c cursor "
> > -			  "(expected (%u,%u), found (%u,%u))\n",
> > -			  pipe_name(pipe),
> > -			  sw_entry->start, sw_entry->end,
> > -			  hw_entry->start, hw_entry->end);
> > +	/*
> > +	 * cursor
> > +	 * If the cursor plane isn't active, we may not have updated it's ddb
> > +	 * allocation. In that case since the ddb allocation will be updated
> > +	 * once the plane becomes visible, we can skip this check
> > +	 */
> > +	if (intel_crtc->cursor_addr) {
> > +		hw_entry = &hw_ddb.plane[pipe][PLANE_CURSOR];
> > +		sw_entry = &sw_ddb->plane[pipe][PLANE_CURSOR];
> > +
> > +		if (!skl_ddb_entry_equal(hw_entry, sw_entry)) {
> > +			DRM_ERROR("mismatch in DDB state pipe %c cursor "
> > +				  "(expected (%u,%u), found (%u,%u))\n",
> > +				  pipe_name(pipe),
> > +				  sw_entry->start, sw_entry->end,
> > +				  hw_entry->start, hw_entry->end);
> > +		}
> >  	}
> >  }
> >  
> > @@ -13658,6 +13665,72 @@ static void intel_update_crtcs(struct drm_atomic_state *state,
> >  	}
> >  }
> >  
> > +static inline void
> > +skl_do_ddb_step(struct drm_atomic_state *state,
> > +		enum skl_ddb_step step)
> > +{
> > +	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> > +	struct drm_crtc *crtc;
> > +	struct drm_crtc_state *old_crtc_state;
> > +	unsigned int crtc_vblank_mask; /* unused */
> > +	int i;
> > +
> > +	for_each_crtc_in_state(state, crtc, old_crtc_state, i) {
> > +		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> > +		struct intel_crtc_state *cstate =
> > +			to_intel_crtc_state(crtc->state);
> > +		bool vblank_wait = false;
> > +
> > +		if (cstate->wm.skl.ddb_realloc != step || !crtc->state->active)
> > +			continue;
> > +
> > +		/*
> > +		 * If we're changing the ddb allocation of this pipe to make
> > +		 * room for another pipe, we have to wait for the pipe's ddb
> > +		 * allocations to actually update by waiting for a vblank.
> > +		 * Otherwise we risk the next pipe updating before this pipe
> > +		 * finishes, resulting in the pipe fetching from ddb space for
> > +		 * the wrong pipe.
> > +		 *
> > +		 * However, if we know we don't have any more pipes to move
> > +		 * around, we can skip this wait and the new ddb allocation
> > +		 * will take effect at the start of the next vblank.
> > +		 */
> > +		switch (step) {
> > +		case SKL_DDB_STEP_NO_OVERLAP:
> > +		case SKL_DDB_STEP_OVERLAP:
> > +			if (step != intel_state->last_ddb_step)
> > +				vblank_wait = true;
> > +
> > +		/* drop through */
> > +		case SKL_DDB_STEP_FINAL:
> > +			DRM_DEBUG_KMS(
> > +			    "Updating [CRTC:%d:pipe %c] for DDB step %d\n",
> > +			    crtc->base.id, pipe_name(intel_crtc->pipe),
> > +			    step);
> > +
> > +		case SKL_DDB_STEP_NONE:
> > +			break;
> > +		}
> 
> Not sure we really need this step stuff. How about?
> 
> for_each_crtc
> 	if (crtc_needs_disabling)
> 		disable_crtc();
> 
> do {
> 	progress = false;
> 	wait_vbl_pipes=0;
> 	for_each_crtc() {
> 		if (!active || needs_modeset)
> 			continue;
> 		if (!ddb_changed)
> 			continue;
> 		if (new_ddb_overlaps_with_any_other_pipes_current_ddb)
> 			continue;
> 		commit;
> 		wait_vbl_pipes |= pipe;
> 		progress = true;
> 	}
> 	wait_vbls(wait_vbl_pipes);
> } while (progress);
> 
> for_each_crtc
> 	if (crtc_needs_enabling)
> 		enable_crtc();
> 	commit;
> }

I'm fine with this; it might make this logic a little easier to read.
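
For illustration, here is a minimal sketch of that loop in plain C, run on
made-up data rather than the driver's real structures. `struct ddb_range`,
`ddb_commit_order()` and the block counts are hypothetical, not i915 code.
Ranges are treated as half-open, each pass checks new allocations against
what the other pipes held at the start of the pass (my assumption about what
"current ddb" means above), and the pass counter is a paranoid bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one pipe's DDB allocation: half-open [start, end). */
struct ddb_range {
	unsigned int start;
	unsigned int end;	/* 0 means the pipe has no allocation */
};

static bool ddb_range_equal(const struct ddb_range *a, const struct ddb_range *b)
{
	return a->start == b->start && a->end == b->end;
}

static bool ddb_range_overlaps(const struct ddb_range *a, const struct ddb_range *b)
{
	return a->start < b->end && b->start < a->end;
}

/*
 * Move already-active pipes from cur[] to new_ddb[]. Pipes being enabled or
 * disabled (no allocation on one side) are assumed to be handled outside
 * this loop. The commit order is stored in order[]; the return value is the
 * number of pipes committed.
 */
static size_t ddb_commit_order(struct ddb_range *cur,
			       const struct ddb_range *new_ddb,
			       size_t num_pipes, size_t *order)
{
	size_t committed = 0, passes = 0;
	bool progress;

	do {
		unsigned int ready = 0;	/* pipes safe to commit in this pass */
		size_t i, j;

		progress = false;

		/* Decide against the allocations held at the start of the pass. */
		for (i = 0; i < num_pipes; i++) {
			bool blocked = false;

			if (cur[i].end == 0 || new_ddb[i].end == 0)
				continue;
			if (ddb_range_equal(&cur[i], &new_ddb[i]))
				continue;

			for (j = 0; j < num_pipes; j++) {
				if (j != i &&
				    ddb_range_overlaps(&new_ddb[i], &cur[j]))
					blocked = true;
			}
			if (!blocked)
				ready |= 1u << i;
		}

		/* Commit; a real driver would wait for these pipes' vblanks. */
		for (i = 0; i < num_pipes; i++) {
			if (!(ready & (1u << i)))
				continue;
			cur[i] = new_ddb[i];
			order[committed++] = i;
			progress = true;
		}

		/* Each pass that makes progress retires at least one pipe. */
		passes++;
		assert(passes <= num_pipes + 1);
	} while (progress);

	return committed;
}
```

With A = (0, 300) and B = (300, 600) growing to A = (0, 450) and
B = (450, 900) once C is gone, this commits B on the first pass and A on the
second, i.e. the B, A order described in the commit message.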
> 
> Or if we're paranoid, we could also have an upper bound on the
> loop and assert that we never reach it.
> 
> 
> Though one thing I don't particularly like about this commit while
> changing the ddb approach is that it's going to make the update
> appear even less atomic. What I'd rather like to do for the normal
> commit path is this:
> 
> for_each_crtc
> 	if (crtc_needs_disabling)
> 		disable_planes
> for_each_crtc
> 	if (crtc_needs_disabling)
> 		disable_crtc
> for_each_crtc
> 	if (crtc_needs_enabling)
> 		enable_crtc
> for_each_crtc
> 	if (active)
> 		commit_planes;
> 
> That way everything would pop in and out as close together as possible.
> Hmm. Actually, I wonder... I'm thinking we should be able to enable all
> crtcs prior to entering the ddb commit loop, on account of no planes
> being enabled on those crtcs until we commit them. And if no planes are
> enabled, running the pipe w/o allocated ddb should be fine. So with that
> approach, I think we should be able to commit all planes within a few
> iterations of the loop, and hence within a few vblanks.

I can't see any issues with this, and it would definitely make the code a lot
cleaner. I'm alright with going this route if Matt doesn't see any issues with
it either.

Cheers,
	Lyude

> 
> > 
> > +
> > +		intel_update_crtc(crtc, state, old_crtc_state,
> > +				  &crtc_vblank_mask);
> > +
> > +		if (vblank_wait)
> > +			intel_wait_for_vblank(state->dev, intel_crtc->pipe);
> > +	}
> > +}
> > +
> > +static void skl_update_crtcs(struct drm_atomic_state *state,
> > +			     unsigned int *crtc_vblank_mask)
> > +{
> > +	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> > +	enum skl_ddb_step step;
> > +
> > +	for (step = 0; step <= intel_state->last_ddb_step; step++)
> > +		skl_do_ddb_step(state, step);
> > +}
> > +
> >  static void intel_atomic_commit_tail(struct drm_atomic_state *state)
> >  {
> >  	struct drm_device *dev = state->dev;
> > @@ -15235,8 +15308,6 @@ void intel_init_display_hooks(struct drm_i915_private *dev_priv)
> >  		dev_priv->display.crtc_disable = i9xx_crtc_disable;
> >  	}
> >  
> > -	dev_priv->display.update_crtcs = intel_update_crtcs;
> > -
> >  	/* Returns the core display clock speed */
> >  	if (IS_SKYLAKE(dev_priv) || IS_KABYLAKE(dev_priv))
> >  		dev_priv->display.get_display_clock_speed =
> > @@ -15326,6 +15397,11 @@ void intel_init_display_hooks(struct drm_i915_private *dev_priv)
> >  			skl_modeset_calc_cdclk;
> >  	}
> >  
> > +	if (dev_priv->info.gen >= 9)
> > +		dev_priv->display.update_crtcs = skl_update_crtcs;
> > +	else
> > +		dev_priv->display.update_crtcs = intel_update_crtcs;
> > +
> >  	switch (INTEL_INFO(dev_priv)->gen) {
> >  	case 2:
> >  		dev_priv->display.queue_flip = intel_gen2_queue_flip;
> > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > index 1b444d3..cf5da83 100644
> > --- a/drivers/gpu/drm/i915/intel_drv.h
> > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > @@ -334,6 +334,7 @@ struct intel_atomic_state {
> >  
> >  	/* Gen9+ only */
> >  	struct skl_wm_values wm_results;
> > +	int last_ddb_step;
> >  };
> >  
> >  struct intel_plane_state {
> > @@ -437,6 +438,13 @@ struct skl_pipe_wm {
> >  	uint32_t linetime;
> >  };
> >  
> > +enum skl_ddb_step {
> > +	SKL_DDB_STEP_NONE = 0,
> > +	SKL_DDB_STEP_NO_OVERLAP,
> > +	SKL_DDB_STEP_OVERLAP,
> > +	SKL_DDB_STEP_FINAL
> > +};
> > +
> >  struct intel_crtc_wm_state {
> >  	union {
> >  		struct {
> > @@ -467,6 +475,8 @@ struct intel_crtc_wm_state {
> >  			/* minimum block allocation */
> >  			uint16_t minimum_blocks[I915_MAX_PLANES];
> >  			uint16_t minimum_y_blocks[I915_MAX_PLANES];
> > +
> > +			enum skl_ddb_step ddb_realloc;
> >  		} skl;
> >  	};
> >  
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 6f5beb3..636c90a 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -3816,6 +3816,11 @@ void skl_write_plane_wm(struct intel_crtc *intel_crtc,
> >  			   wm->plane[pipe][plane][level]);
> >  	}
> >  	I915_WRITE(PLANE_WM_TRANS(pipe, plane), wm->plane_trans[pipe][plane]);
> > +
> > +	skl_ddb_entry_write(dev_priv, PLANE_BUF_CFG(pipe, plane),
> > +			    &wm->ddb.plane[pipe][plane]);
> > +	skl_ddb_entry_write(dev_priv, PLANE_NV12_BUF_CFG(pipe, plane),
> > +			    &wm->ddb.y_plane[pipe][plane]);
> >  }
> >  
> >  void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
> > @@ -3832,170 +3837,51 @@ void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
> >  			   wm->plane[pipe][PLANE_CURSOR][level]);
> >  	}
> >  	I915_WRITE(CUR_WM_TRANS(pipe), wm->plane_trans[pipe][PLANE_CURSOR]);
> > -}
> > -
> > -static void skl_write_wm_values(struct drm_i915_private *dev_priv,
> > -				const struct skl_wm_values *new)
> > -{
> > -	struct drm_device *dev = &dev_priv->drm;
> > -	struct intel_crtc *crtc;
> > -
> > -	for_each_intel_crtc(dev, crtc) {
> > -		int i;
> > -		enum pipe pipe = crtc->pipe;
> > -
> > -		if ((new->dirty_pipes & drm_crtc_mask(&crtc->base)) == 0)
> > -			continue;
> > -		if (!crtc->active)
> > -			continue;
> >  
> > -		for (i = 0; i < intel_num_planes(crtc); i++) {
> > -			skl_ddb_entry_write(dev_priv,
> > -					    PLANE_BUF_CFG(pipe, i),
> > -					    &new->ddb.plane[pipe][i]);
> > -			skl_ddb_entry_write(dev_priv,
> > -					    PLANE_NV12_BUF_CFG(pipe, i),
> > -					    &new->ddb.y_plane[pipe][i]);
> > -		}
> > -
> > -		skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
> > -				    &new->ddb.plane[pipe][PLANE_CURSOR]);
> > -	}
> > +	skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
> > +			    &wm->ddb.plane[pipe][PLANE_CURSOR]);
> >  }
> >  
> > -/*
> > - * When setting up a new DDB allocation arrangement, we need to correctly
> > - * sequence the times at which the new allocations for the pipes are taken into
> > - * account or we'll have pipes fetching from space previously allocated to
> > - * another pipe.
> > - *
> > - * Roughly the sequence looks like:
> > - *  1. re-allocate the pipe(s) with the allocation being reduced and not
> > - *     overlapping with a previous light-up pipe (another way to put it is:
> > - *     pipes with their new allocation strickly included into their old ones).
> > - *  2. re-allocate the other pipes that get their allocation reduced
> > - *  3. allocate the pipes having their allocation increased
> > - *
> > - * Steps 1. and 2. are here to take care of the following case:
> > - * - Initially DDB looks like this:
> > - *     |   B    |   C    |
> > - * - enable pipe A.
> > - * - pipe B has a reduced DDB allocation that overlaps with the old pipe C
> > - *   allocation
> > - *     |  A  |  B  |  C  |
> > - *
> > - * We need to sequence the re-allocation: C, B, A (and not B, C, A).
> > - */
> > -
> > -static void
> > -skl_wm_flush_pipe(struct drm_i915_private *dev_priv, enum pipe pipe, int pass)
> > +static bool
> > +skl_ddb_allocation_equals(const struct skl_ddb_allocation *old,
> > +			  const struct skl_ddb_allocation *new,
> > +			  enum pipe pipe)
> >  {
> > -	int plane;
> > -
> > -	DRM_DEBUG_KMS("flush pipe %c (pass %d)\n", pipe_name(pipe), pass);
> > -
> > -	for_each_plane(dev_priv, pipe, plane) {
> > -		I915_WRITE(PLANE_SURF(pipe, plane),
> > -			   I915_READ(PLANE_SURF(pipe, plane)));
> > -	}
> > -	I915_WRITE(CURBASE(pipe), I915_READ(CURBASE(pipe)));
> > +	return new->pipe[pipe].start == old->pipe[pipe].start &&
> > +	       new->pipe[pipe].end == old->pipe[pipe].end;
> >  }
> >  
> >  static bool
> > -skl_ddb_allocation_included(const struct skl_ddb_allocation *old,
> > +skl_ddb_allocation_overlaps(struct drm_atomic_state *state,
> > +			    const struct skl_ddb_allocation *old,
> >  			    const struct skl_ddb_allocation *new,
> >  			    enum pipe pipe)
> >  {
> > -	uint16_t old_size, new_size;
> > -
> > -	old_size = skl_ddb_entry_size(&old->pipe[pipe]);
> > -	new_size = skl_ddb_entry_size(&new->pipe[pipe]);
> > -
> > -	return old_size != new_size &&
> > -	       new->pipe[pipe].start >= old->pipe[pipe].start &&
> > -	       new->pipe[pipe].end <= old->pipe[pipe].end;
> > -}
> > -
> > -static void skl_flush_wm_values(struct drm_i915_private *dev_priv,
> > -				struct skl_wm_values *new_values)
> > -{
> > -	struct drm_device *dev = &dev_priv->drm;
> > -	struct skl_ddb_allocation *cur_ddb, *new_ddb;
> > -	bool reallocated[I915_MAX_PIPES] = {};
> > -	struct intel_crtc *crtc;
> > -	enum pipe pipe;
> > -
> > -	new_ddb = &new_values->ddb;
> > -	cur_ddb = &dev_priv->wm.skl_hw.ddb;
> > -
> > -	/*
> > -	 * First pass: flush the pipes with the new allocation contained into
> > -	 * the old space.
> > -	 *
> > -	 * We'll wait for the vblank on those pipes to ensure we can safely
> > -	 * re-allocate the freed space without this pipe fetching from it.
> > -	 */
> > -	for_each_intel_crtc(dev, crtc) {
> > -		if (!crtc->active)
> > -			continue;
> > -
> > -		pipe = crtc->pipe;
> > -
> > -		if (!skl_ddb_allocation_included(cur_ddb, new_ddb, pipe))
> > -			continue;
> > -
> > -		skl_wm_flush_pipe(dev_priv, pipe, 1);
> > -		intel_wait_for_vblank(dev, pipe);
> > -
> > -		reallocated[pipe] = true;
> > -	}
> > -
> > -
> > -	/*
> > -	 * Second pass: flush the pipes that are having their allocation
> > -	 * reduced, but overlapping with a previous allocation.
> > -	 *
> > -	 * Here as well we need to wait for the vblank to make sure the freed
> > -	 * space is not used anymore.
> > -	 */
> > -	for_each_intel_crtc(dev, crtc) {
> > -		if (!crtc->active)
> > -			continue;
> > -
> > -		pipe = crtc->pipe;
> > -
> > -		if (reallocated[pipe])
> > -			continue;
> > -
> > -		if (skl_ddb_entry_size(&new_ddb->pipe[pipe]) <
> > -		    skl_ddb_entry_size(&cur_ddb->pipe[pipe])) {
> > -			skl_wm_flush_pipe(dev_priv, pipe, 2);
> > -			intel_wait_for_vblank(dev, pipe);
> > -			reallocated[pipe] = true;
> > -		}
> > -	}
> > -
> > -	/*
> > -	 * Third pass: flush the pipes that got more space allocated.
> > -	 *
> > -	 * We don't need to actively wait for the update here, next vblank
> > -	 * will just get more DDB space with the correct WM values.
> > -	 */
> > -	for_each_intel_crtc(dev, crtc) {
> > -		if (!crtc->active)
> > -			continue;
> > +	struct drm_device *dev = state->dev;
> > +	struct intel_crtc *intel_crtc;
> > +	enum pipe otherp;
> >  
> > -		pipe = crtc->pipe;
> > +	for_each_intel_crtc(dev, intel_crtc) {
> > +		otherp = intel_crtc->pipe;
> >  
> >  		/*
> > -		 * At this point, only the pipes more space than before are
> > -		 * left to re-allocate.
> > +		 * When checking for overlaps, we don't want to:
> > +		 *  - Compare against ourselves
> > +		 *  - Compare against pipes that will be disabled in step 0
> > +		 *  - Compare against pipes that won't be enabled until step 3
> >  		 */
> > -		if (reallocated[pipe])
> > +		if (otherp == pipe || !new->pipe[otherp].end ||
> > +		    !old->pipe[otherp].end)
> >  			continue;
> >  
> > -		skl_wm_flush_pipe(dev_priv, pipe, 3);
> > +		if ((new->pipe[pipe].start >= old->pipe[otherp].start &&
> > +		     new->pipe[pipe].start < old->pipe[otherp].end) ||
> > +		    (old->pipe[otherp].start >= new->pipe[pipe].start &&
> > +		     old->pipe[otherp].start < new->pipe[pipe].end))
> > +			return true;
> >  	}
> > +
> > +	return false;
> >  }
> >  
> >  static int skl_update_pipe_wm(struct drm_crtc_state *cstate,
> > @@ -4038,8 +3924,10 @@ skl_compute_ddb(struct drm_atomic_state *state)
> >  	struct drm_device *dev = state->dev;
> >  	struct drm_i915_private *dev_priv = to_i915(dev);
> >  	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> > +	struct intel_crtc_state *cstate;
> >  	struct intel_crtc *intel_crtc;
> > -	struct skl_ddb_allocation *ddb = &intel_state->wm_results.ddb;
> > +	struct skl_ddb_allocation *old_ddb = &dev_priv->wm.skl_hw.ddb;
> > +	struct skl_ddb_allocation *new_ddb = &intel_state->wm_results.ddb;
> >  	uint32_t realloc_pipes = pipes_modified(state);
> >  	int ret;
> >  
> > @@ -4071,13 +3959,11 @@ skl_compute_ddb(struct drm_atomic_state *state)
> >  	}
> >  
> >  	for_each_intel_crtc_mask(dev, intel_crtc, realloc_pipes) {
> > -		struct intel_crtc_state *cstate;
> > -
> >  		cstate = intel_atomic_get_crtc_state(state, intel_crtc);
> >  		if (IS_ERR(cstate))
> >  			return PTR_ERR(cstate);
> >  
> > -		ret = skl_allocate_pipe_ddb(cstate, ddb);
> > +		ret = skl_allocate_pipe_ddb(cstate, new_ddb);
> >  		if (ret)
> >  			return ret;
> >  
> > @@ -4086,6 +3972,73 @@ skl_compute_ddb(struct drm_atomic_state *state)
> >  			return ret;
> >  	}
> >  
> > +	/*
> > +	 * When setting up a new DDB allocation arrangement, we need to
> > +	 * correctly sequence the times at which the new allocations for the
> > +	 * pipes are taken into account or we'll have pipes fetching from space
> > +	 * previously allocated to another pipe.
> > +	 *
> > +	 * Roughly the final sequence we want looks like this:
> > +	 *  1. Disable any pipes we're not going to be using anymore
> > +	 *  2. Reallocate all of the active pipes whose new ddb allocations
> > +	 *  won't overlap with another active pipe's ddb allocation.
> > +	 *  3. Reallocate remaining active pipes, if any.
> > +	 *  4. Enable any new pipes, if any.
> > +	 *
> > +	 * Example:
> > +	 * Initially DDB looks like this:
> > +	 *   |   B    |   C    |
> > +	 * And the final DDB should look like this:
> > +	 *   |  B  |  C  |  A  |
> > +	 *
> > +	 * 1. We're not disabling any pipes, so do nothing on this step.
> > +	 * 2. Pipe B's new allocation wouldn't overlap with pipe C, however
> > +	 * pipe C's new allocation does overlap with pipe B's current
> > +	 * allocation. Reallocate B first so the DDB looks like this:
> > +	 *   |  B  |xx|   C    |
> > +	 * 3. Now we can safely reallocate pipe C to it's new location:
> > +	 *   |  B  |  C  |xxxxx|
> > +	 * 4. Enable any remaining pipes, in this case A
> > +	 *   |  B  |  C  |  A  |
> > +	 *
> > +	 * As well, between every pipe reallocation we have to wait for a
> > +	 * vblank on the pipe so that we ensure it's new allocation has taken
> > +	 * effect by the time we start moving the next pipe. This can be
> > +	 * skipped on the last step we need to perform, which is why we keep
> > +	 * track of that information here. For example, if we've reallocated
> > +	 * all the pipes that need changing by the time we reach step 3, we can
> > +	 * finish without waiting for the pipes we changed in step 3 to update.
> > +	 */
> > +	for_each_intel_crtc_mask(dev, intel_crtc, realloc_pipes) {
> > +		enum pipe pipe = intel_crtc->pipe;
> > +		enum skl_ddb_step step;
> > +
> > +		cstate = intel_atomic_get_crtc_state(state, intel_crtc);
> > +		if (IS_ERR(cstate))
> > +			return PTR_ERR(cstate);
> > +
> > +		/* Step 1: Pipes we're disabling / haven't changed */
> > +		if (skl_ddb_allocation_equals(old_ddb, new_ddb, pipe) ||
> > +		    new_ddb->pipe[pipe].end == 0) {
> > +			step = SKL_DDB_STEP_NONE;
> > +		/* Step 2-3: Active pipes we're reallocating */
> > +		} else if (old_ddb->pipe[pipe].end != 0) {
> > +			if (skl_ddb_allocation_overlaps(state, old_ddb, new_ddb,
> > +							pipe))
> > +				step = SKL_DDB_STEP_OVERLAP;
> > +			else
> > +				step = SKL_DDB_STEP_NO_OVERLAP;
> > +		/* Step 4: Pipes we're enabling */
> > +		} else {
> > +			step = SKL_DDB_STEP_FINAL;
> > +		}
> > +
> > +		cstate->wm.skl.ddb_realloc = step;
> > +
> > +		if (step > intel_state->last_ddb_step)
> > +			intel_state->last_ddb_step = step;
> > +	}
> > +
> >  	return 0;
> >  }
> >  
> > @@ -4110,10 +4063,13 @@ skl_copy_wm_for_pipe(struct skl_wm_values *dst,
> >  static int
> >  skl_compute_wm(struct drm_atomic_state *state)
> >  {
> > +	struct drm_i915_private *dev_priv = to_i915(state->dev);
> >  	struct drm_crtc *crtc;
> >  	struct drm_crtc_state *cstate;
> >  	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> >  	struct skl_wm_values *results = &intel_state->wm_results;
> > +	struct skl_ddb_allocation *old_ddb = &dev_priv->wm.skl_hw.ddb;
> > +	struct skl_ddb_allocation *new_ddb = &results->ddb;
> >  	struct skl_pipe_wm *pipe_wm;
> >  	bool changed = false;
> >  	int ret, i;
> > @@ -4152,7 +4108,10 @@ skl_compute_wm(struct drm_atomic_state *state)
> >  		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> >  		struct intel_crtc_state *intel_cstate =
> >  			to_intel_crtc_state(cstate);
> > +		enum skl_ddb_step step;
> > +		enum pipe pipe;
> >  
> > +		pipe = intel_crtc->pipe;
> >  		pipe_wm = &intel_cstate->wm.skl.optimal;
> >  		ret = skl_update_pipe_wm(cstate, &results->ddb, pipe_wm,
> >  					 &changed);
> > @@ -4167,7 +4126,18 @@ skl_compute_wm(struct drm_atomic_state *state)
> >  			continue;
> >  
> >  		intel_cstate->update_wm_pre = true;
> > +		step = intel_cstate->wm.skl.ddb_realloc;
> >  		skl_compute_wm_results(crtc->dev, pipe_wm, results, intel_crtc);
> > +
> > +		if (!skl_ddb_entry_equal(&old_ddb->pipe[pipe],
> > +					 &new_ddb->pipe[pipe])) {
> > +			DRM_DEBUG_KMS(
> > +			    "DDB changes for [CRTC:%d:pipe %c]: (%3d - %3d) -> (%3d - %3d) on step %d\n",
> > +			    intel_crtc->base.base.id, pipe_name(pipe),
> > +			    old_ddb->pipe[pipe].start, old_ddb->pipe[pipe].end,
> > +			    new_ddb->pipe[pipe].start, new_ddb->pipe[pipe].end,
> > +			    step);
> > +		}
> >  	}
> >  
> >  	return 0;
> > @@ -4191,8 +4161,20 @@ static void skl_update_wm(struct drm_crtc *crtc)
> >  
> >  	mutex_lock(&dev_priv->wm.wm_mutex);
> >  
> > -	skl_write_wm_values(dev_priv, results);
> > -	skl_flush_wm_values(dev_priv, results);
> > +	/*
> > +	 * If this pipe isn't active already, we're going to be enabling it
> > +	 * very soon. Since it's safe to update these while the pipe's shut off,
> > +	 * just do so here. Already active pipes will have their watermarks
> > +	 * updated once we update their planes.
> > +	 */
> > +	if (!intel_crtc->active) {
> > +		int plane;
> > +
> > +		for (plane = 0; plane < intel_num_planes(intel_crtc); plane++)
> > +			skl_write_plane_wm(intel_crtc, results, plane);
> > +
> > +		skl_write_cursor_wm(intel_crtc, results);
> > +	}
> >  
> >  	/*
> >  	 * Store the new configuration (but only for the pipes that have
> > -- 
> > 2.7.4
>
Matt Roper Aug. 3, 2016, 10:19 p.m. UTC | #3
On Wed, Aug 03, 2016 at 06:00:42PM +0300, Ville Syrjälä wrote:
> On Tue, Aug 02, 2016 at 06:37:37PM -0400, Lyude wrote:
> > Now that we can hook into update_crtcs and control the order in which we
> > update CRTCs at each modeset, we can finish the final step of fixing
> > Skylake's watermark handling by performing DDB updates at the same time
> > as plane updates and watermark updates.
> > 
> > The first major change in this patch is skl_update_crtcs(), which
> > handles ensuring that we order each CRTC update in our atomic commits
> > properly so that they honor the DDB flush order.
> > 
> > The second major change in this patch is the order in which we flush the
> > pipes. While the previous order may have worked, it can't be used in
> > this approach since it no longer will do the right thing. For example,
> > using the old ddb flush order:
> > 
> > We have pipes A, B, and C enabled, and we're disabling C. Initial ddb
> > allocation looks like this:
> > 
> > |   A   |   B   |xxxxxxx|
> > 
> > Since we're performing the ddb updates after performing any CRTC
> > disablements in intel_atomic_commit_tail(), the space to the right of
> > pipe B is unallocated.
> > 
> > 1. Flush pipes with new allocation contained into old space. None
> >    apply, so we skip this
> > 2. Flush pipes having their allocation reduced, but overlapping with a
> >    previous allocation. None apply, so we also skip this
> > 3. Flush pipes that got more space allocated. This applies to A and B,
> >    giving us the following update order: A, B
> > 
> > This is wrong, since updating pipe A first will cause it to overlap with
> > B and potentially burst into flames. Our new order (see the code
> > comments for details) would update the pipes in the proper order: B, A.
> > 
> > As well, we calculate the order for each DDB update during the check
> > phase, and reference it later in the commit phase when we hit
> > skl_update_crtcs().
> > 
> > This long overdue patch fixes the rest of the underruns on Skylake.
> > 
> > Changes since v1:
> >  - Add skl_ddb_entry_write() for cursor into skl_write_cursor_wm()
> > 
> > Fixes: 0e8fb7ba7ca5 ("drm/i915/skl: Flush the WM configuration")
> > Fixes: 8211bd5bdf5e ("drm/i915/skl: Program the DDB allocation")
> > Signed-off-by: Lyude <cpaul@redhat.com>
> > [omitting CC for stable, since this patch will need to be changed for
> > such backports first]
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Cc: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
> > Cc: Hans de Goede <hdegoede@redhat.com>
> > Cc: Matt Roper <matthew.d.roper@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_display.c | 100 ++++++++++--
> >  drivers/gpu/drm/i915/intel_drv.h     |  10 ++
> >  drivers/gpu/drm/i915/intel_pm.c      | 288 ++++++++++++++++-------------------
> >  3 files changed, 233 insertions(+), 165 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > index 59cf513..06295f7 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -12897,16 +12897,23 @@ static void verify_wm_state(struct drm_crtc *crtc,
> >  			  hw_entry->start, hw_entry->end);
> >  	}
> >  
> > -	/* cursor */
> > -	hw_entry = &hw_ddb.plane[pipe][PLANE_CURSOR];
> > -	sw_entry = &sw_ddb->plane[pipe][PLANE_CURSOR];
> > -
> > -	if (!skl_ddb_entry_equal(hw_entry, sw_entry)) {
> > -		DRM_ERROR("mismatch in DDB state pipe %c cursor "
> > -			  "(expected (%u,%u), found (%u,%u))\n",
> > -			  pipe_name(pipe),
> > -			  sw_entry->start, sw_entry->end,
> > -			  hw_entry->start, hw_entry->end);
> > +	/*
> > +	 * cursor
> > +	 * If the cursor plane isn't active, we may not have updated it's ddb
> > +	 * allocation. In that case since the ddb allocation will be updated
> > +	 * once the plane becomes visible, we can skip this check
> > +	 */
> > +	if (intel_crtc->cursor_addr) {
> > +		hw_entry = &hw_ddb.plane[pipe][PLANE_CURSOR];
> > +		sw_entry = &sw_ddb->plane[pipe][PLANE_CURSOR];
> > +
> > +		if (!skl_ddb_entry_equal(hw_entry, sw_entry)) {
> > +			DRM_ERROR("mismatch in DDB state pipe %c cursor "
> > +				  "(expected (%u,%u), found (%u,%u))\n",
> > +				  pipe_name(pipe),
> > +				  sw_entry->start, sw_entry->end,
> > +				  hw_entry->start, hw_entry->end);
> > +		}
> >  	}
> >  }
> >  
> > @@ -13658,6 +13665,72 @@ static void intel_update_crtcs(struct drm_atomic_state *state,
> >  	}
> >  }
> >  
> > +static inline void
> > +skl_do_ddb_step(struct drm_atomic_state *state,
> > +		enum skl_ddb_step step)
> > +{
> > +	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> > +	struct drm_crtc *crtc;
> > +	struct drm_crtc_state *old_crtc_state;
> > +	unsigned int crtc_vblank_mask; /* unused */
> > +	int i;
> > +
> > +	for_each_crtc_in_state(state, crtc, old_crtc_state, i) {
> > +		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> > +		struct intel_crtc_state *cstate =
> > +			to_intel_crtc_state(crtc->state);
> > +		bool vblank_wait = false;
> > +
> > +		if (cstate->wm.skl.ddb_realloc != step || !crtc->state->active)
> > +			continue;
> > +
> > +		/*
> > +		 * If we're changing the ddb allocation of this pipe to make
> > +		 * room for another pipe, we have to wait for the pipe's ddb
> > +		 * allocations to actually update by waiting for a vblank.
> > +		 * Otherwise we risk the next pipe updating before this pipe
> > +		 * finishes, resulting in the pipe fetching from ddb space for
> > +		 * the wrong pipe.
> > +		 *
> > +		 * However, if we know we don't have any more pipes to move
> > +		 * around, we can skip this wait and the new ddb allocation
> > +		 * will take effect at the start of the next vblank.
> > +		 */
> > +		switch (step) {
> > +		case SKL_DDB_STEP_NO_OVERLAP:
> > +		case SKL_DDB_STEP_OVERLAP:
> > +			if (step != intel_state->last_ddb_step)
> > +				vblank_wait = true;
> > +
> > +		/* drop through */
> > +		case SKL_DDB_STEP_FINAL:
> > +			DRM_DEBUG_KMS(
> > +			    "Updating [CRTC:%d:pipe %c] for DDB step %d\n",
> > +			    crtc->base.id, pipe_name(intel_crtc->pipe),
> > +			    step);
> > +
> > +		case SKL_DDB_STEP_NONE:
> > +			break;
> > +		}
> 
> Not sure we really need this step stuff. How about?
> 
> for_each_crtc
> 	if (crtc_needs_disabling)
> 		disable_crtc();
> 
> do {
> 	progress = false;
> 	wait_vbl_pipes=0;
> 	for_each_crtc() {
> 		if (!active || needs_modeset)
> 			continue;
> 		if (!ddb_changed)
> 			continue;
> 		if (new_ddb_overlaps_with_any_other_pipes_current_ddb)
> 			continue;
> 		commit;
> 		wait_vbl_pipes |= pipe;
> 		progress = true;
> 	}
> 	wait_vbls(wait_vbl_pipes);
> } while (progress);
> 
> for_each_crtc
> 	if (crtc_needs_enabling)
> 		enable_crtc();
> 	commit;
> }

Yeah, this approach looks nicer.  It's a bit simpler to follow code-wise
and doesn't require us to precompute any ordering during the check phase
so it's a bit more self-contained.  It should also scale properly if
future platforms decide to add more pipes.

> 
> Or if we're paranoid, we could also have an upper bound on the
> loop and assert that we never reach it.
> 
> 
> Though one thing I don't particularly like about this commit while
> changing the ddb approach is that it's going to make the update
> appear even less atomic. What I'd rather like to do for the normal
> commit path is this:
> 
> for_each_crtc
> 	if (crtc_needs_disabling)
> 		disable_planes
> for_each_crtc
> 	if (crtc_needs_disabling)
> 		disable_crtc
> for_each_crtc
> 	if (crtc_needs_enabling)
> 		enable_crtc
> for_each_crtc
> 	if (active)
> 		commit_planes;
> 
> That way everything would pop in and out as close together as possible.
> Hmm. Actually, I wonder... I'm thinking we should be able to enable all
> crtcs prior to entering the ddb commit loop, on account of no planes
> being enabled on those crtcs until we commit them. And if no planes are
> enabled, running the pipe w/o allocated ddb should be fine. So with that
> approach, I think we should be able to commit all planes within a few
> iterations of the loop, and hence within a few vblanks.

So this is pretty similar to what we do today, except that we do the
enabling/disabling of each CRTC and its planes all together, right?
Sounds reasonable to me, although I'm not sure we want to mix that
change in with the gen9-specific series Lyude is working on here.  Maybe
just do the new gen9 handler that way as part of that series and then
come back and update the non-gen9 handler to follow the new flow as a
separate patch?
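
As a rough sketch of that split: the hook is picked once per device, so the
gen9 handler can adopt the new flow first while older platforms keep the
existing one until a follow-up patch. The names below are hypothetical,
trimmed-down stand-ins, not the real i915 structures.

```c
/* Hypothetical record of which handler ran, used only for illustration. */
struct commit_log {
	const char *handler;
};

/* Hypothetical, cut-down version of the per-device display hook table. */
struct display_funcs {
	void (*update_crtcs)(struct commit_log *log);
};

/* Existing flow: one pass over the CRTCs with no DDB-specific ordering. */
static void generic_update_crtcs(struct commit_log *log)
{
	log->handler = "generic";
}

/* New flow: this is where the DDB-aware commit loop would run. */
static void gen9_update_crtcs(struct commit_log *log)
{
	log->handler = "gen9";
}

/* Choose the handler by hardware generation, as the patch does for gen >= 9. */
static void init_display_hooks(struct display_funcs *funcs, int gen)
{
	if (gen >= 9)
		funcs->update_crtcs = gen9_update_crtcs;
	else
		funcs->update_crtcs = generic_update_crtcs;
}
```

Converting the non-gen9 path later would then only mean changing
generic_update_crtcs(), without touching the gen9 handler.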


Matt

> 
> > +
> > +		intel_update_crtc(crtc, state, old_crtc_state,
> > +				  &crtc_vblank_mask);
> > +
> > +		if (vblank_wait)
> > +			intel_wait_for_vblank(state->dev, intel_crtc->pipe);
> > +	}
> > +}
> > +
> > +static void skl_update_crtcs(struct drm_atomic_state *state,
> > +			     unsigned int *crtc_vblank_mask)
> > +{
> > +	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> > +	enum skl_ddb_step step;
> > +
> > +	for (step = 0; step <= intel_state->last_ddb_step; step++)
> > +		skl_do_ddb_step(state, step);
> > +}
> > +
> >  static void intel_atomic_commit_tail(struct drm_atomic_state *state)
> >  {
> >  	struct drm_device *dev = state->dev;
> > @@ -15235,8 +15308,6 @@ void intel_init_display_hooks(struct drm_i915_private *dev_priv)
> >  		dev_priv->display.crtc_disable = i9xx_crtc_disable;
> >  	}
> >  
> > -	dev_priv->display.update_crtcs = intel_update_crtcs;
> > -
> >  	/* Returns the core display clock speed */
> >  	if (IS_SKYLAKE(dev_priv) || IS_KABYLAKE(dev_priv))
> >  		dev_priv->display.get_display_clock_speed =
> > @@ -15326,6 +15397,11 @@ void intel_init_display_hooks(struct drm_i915_private *dev_priv)
> >  			skl_modeset_calc_cdclk;
> >  	}
> >  
> > +	if (dev_priv->info.gen >= 9)
> > +		dev_priv->display.update_crtcs = skl_update_crtcs;
> > +	else
> > +		dev_priv->display.update_crtcs = intel_update_crtcs;
> > +
> >  	switch (INTEL_INFO(dev_priv)->gen) {
> >  	case 2:
> >  		dev_priv->display.queue_flip = intel_gen2_queue_flip;
> > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > index 1b444d3..cf5da83 100644
> > --- a/drivers/gpu/drm/i915/intel_drv.h
> > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > @@ -334,6 +334,7 @@ struct intel_atomic_state {
> >  
> >  	/* Gen9+ only */
> >  	struct skl_wm_values wm_results;
> > +	int last_ddb_step;
> >  };
> >  
> >  struct intel_plane_state {
> > @@ -437,6 +438,13 @@ struct skl_pipe_wm {
> >  	uint32_t linetime;
> >  };
> >  
> > +enum skl_ddb_step {
> > +	SKL_DDB_STEP_NONE = 0,
> > +	SKL_DDB_STEP_NO_OVERLAP,
> > +	SKL_DDB_STEP_OVERLAP,
> > +	SKL_DDB_STEP_FINAL
> > +};
> > +
> >  struct intel_crtc_wm_state {
> >  	union {
> >  		struct {
> > @@ -467,6 +475,8 @@ struct intel_crtc_wm_state {
> >  			/* minimum block allocation */
> >  			uint16_t minimum_blocks[I915_MAX_PLANES];
> >  			uint16_t minimum_y_blocks[I915_MAX_PLANES];
> > +
> > +			enum skl_ddb_step ddb_realloc;
> >  		} skl;
> >  	};
> >  
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 6f5beb3..636c90a 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -3816,6 +3816,11 @@ void skl_write_plane_wm(struct intel_crtc *intel_crtc,
> >  			   wm->plane[pipe][plane][level]);
> >  	}
> >  	I915_WRITE(PLANE_WM_TRANS(pipe, plane), wm->plane_trans[pipe][plane]);
> > +
> > +	skl_ddb_entry_write(dev_priv, PLANE_BUF_CFG(pipe, plane),
> > +			    &wm->ddb.plane[pipe][plane]);
> > +	skl_ddb_entry_write(dev_priv, PLANE_NV12_BUF_CFG(pipe, plane),
> > +			    &wm->ddb.y_plane[pipe][plane]);
> >  }
> >  
> >  void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
> > @@ -3832,170 +3837,51 @@ void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
> >  			   wm->plane[pipe][PLANE_CURSOR][level]);
> >  	}
> >  	I915_WRITE(CUR_WM_TRANS(pipe), wm->plane_trans[pipe][PLANE_CURSOR]);
> > -}
> > -
> > -static void skl_write_wm_values(struct drm_i915_private *dev_priv,
> > -				const struct skl_wm_values *new)
> > -{
> > -	struct drm_device *dev = &dev_priv->drm;
> > -	struct intel_crtc *crtc;
> > -
> > -	for_each_intel_crtc(dev, crtc) {
> > -		int i;
> > -		enum pipe pipe = crtc->pipe;
> > -
> > -		if ((new->dirty_pipes & drm_crtc_mask(&crtc->base)) == 0)
> > -			continue;
> > -		if (!crtc->active)
> > -			continue;
> >  
> > -		for (i = 0; i < intel_num_planes(crtc); i++) {
> > -			skl_ddb_entry_write(dev_priv,
> > -					    PLANE_BUF_CFG(pipe, i),
> > -					    &new->ddb.plane[pipe][i]);
> > -			skl_ddb_entry_write(dev_priv,
> > -					    PLANE_NV12_BUF_CFG(pipe, i),
> > -					    &new->ddb.y_plane[pipe][i]);
> > -		}
> > -
> > -		skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
> > -				    &new->ddb.plane[pipe][PLANE_CURSOR]);
> > -	}
> > +	skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
> > +			    &wm->ddb.plane[pipe][PLANE_CURSOR]);
> >  }
> >  
> > -/*
> > - * When setting up a new DDB allocation arrangement, we need to correctly
> > - * sequence the times at which the new allocations for the pipes are taken into
> > - * account or we'll have pipes fetching from space previously allocated to
> > - * another pipe.
> > - *
> > - * Roughly the sequence looks like:
> > - *  1. re-allocate the pipe(s) with the allocation being reduced and not
> > - *     overlapping with a previous light-up pipe (another way to put it is:
> > - *     pipes with their new allocation strickly included into their old ones).
> > - *  2. re-allocate the other pipes that get their allocation reduced
> > - *  3. allocate the pipes having their allocation increased
> > - *
> > - * Steps 1. and 2. are here to take care of the following case:
> > - * - Initially DDB looks like this:
> > - *     |   B    |   C    |
> > - * - enable pipe A.
> > - * - pipe B has a reduced DDB allocation that overlaps with the old pipe C
> > - *   allocation
> > - *     |  A  |  B  |  C  |
> > - *
> > - * We need to sequence the re-allocation: C, B, A (and not B, C, A).
> > - */
> > -
> > -static void
> > -skl_wm_flush_pipe(struct drm_i915_private *dev_priv, enum pipe pipe, int pass)
> > +static bool
> > +skl_ddb_allocation_equals(const struct skl_ddb_allocation *old,
> > +			  const struct skl_ddb_allocation *new,
> > +			  enum pipe pipe)
> >  {
> > -	int plane;
> > -
> > -	DRM_DEBUG_KMS("flush pipe %c (pass %d)\n", pipe_name(pipe), pass);
> > -
> > -	for_each_plane(dev_priv, pipe, plane) {
> > -		I915_WRITE(PLANE_SURF(pipe, plane),
> > -			   I915_READ(PLANE_SURF(pipe, plane)));
> > -	}
> > -	I915_WRITE(CURBASE(pipe), I915_READ(CURBASE(pipe)));
> > +	return new->pipe[pipe].start == old->pipe[pipe].start &&
> > +	       new->pipe[pipe].end == old->pipe[pipe].end;
> >  }
> >  
> >  static bool
> > -skl_ddb_allocation_included(const struct skl_ddb_allocation *old,
> > +skl_ddb_allocation_overlaps(struct drm_atomic_state *state,
> > +			    const struct skl_ddb_allocation *old,
> >  			    const struct skl_ddb_allocation *new,
> >  			    enum pipe pipe)
> >  {
> > -	uint16_t old_size, new_size;
> > -
> > -	old_size = skl_ddb_entry_size(&old->pipe[pipe]);
> > -	new_size = skl_ddb_entry_size(&new->pipe[pipe]);
> > -
> > -	return old_size != new_size &&
> > -	       new->pipe[pipe].start >= old->pipe[pipe].start &&
> > -	       new->pipe[pipe].end <= old->pipe[pipe].end;
> > -}
> > -
> > -static void skl_flush_wm_values(struct drm_i915_private *dev_priv,
> > -				struct skl_wm_values *new_values)
> > -{
> > -	struct drm_device *dev = &dev_priv->drm;
> > -	struct skl_ddb_allocation *cur_ddb, *new_ddb;
> > -	bool reallocated[I915_MAX_PIPES] = {};
> > -	struct intel_crtc *crtc;
> > -	enum pipe pipe;
> > -
> > -	new_ddb = &new_values->ddb;
> > -	cur_ddb = &dev_priv->wm.skl_hw.ddb;
> > -
> > -	/*
> > -	 * First pass: flush the pipes with the new allocation contained into
> > -	 * the old space.
> > -	 *
> > -	 * We'll wait for the vblank on those pipes to ensure we can safely
> > -	 * re-allocate the freed space without this pipe fetching from it.
> > -	 */
> > -	for_each_intel_crtc(dev, crtc) {
> > -		if (!crtc->active)
> > -			continue;
> > -
> > -		pipe = crtc->pipe;
> > -
> > -		if (!skl_ddb_allocation_included(cur_ddb, new_ddb, pipe))
> > -			continue;
> > -
> > -		skl_wm_flush_pipe(dev_priv, pipe, 1);
> > -		intel_wait_for_vblank(dev, pipe);
> > -
> > -		reallocated[pipe] = true;
> > -	}
> > -
> > -
> > -	/*
> > -	 * Second pass: flush the pipes that are having their allocation
> > -	 * reduced, but overlapping with a previous allocation.
> > -	 *
> > -	 * Here as well we need to wait for the vblank to make sure the freed
> > -	 * space is not used anymore.
> > -	 */
> > -	for_each_intel_crtc(dev, crtc) {
> > -		if (!crtc->active)
> > -			continue;
> > -
> > -		pipe = crtc->pipe;
> > -
> > -		if (reallocated[pipe])
> > -			continue;
> > -
> > -		if (skl_ddb_entry_size(&new_ddb->pipe[pipe]) <
> > -		    skl_ddb_entry_size(&cur_ddb->pipe[pipe])) {
> > -			skl_wm_flush_pipe(dev_priv, pipe, 2);
> > -			intel_wait_for_vblank(dev, pipe);
> > -			reallocated[pipe] = true;
> > -		}
> > -	}
> > -
> > -	/*
> > -	 * Third pass: flush the pipes that got more space allocated.
> > -	 *
> > -	 * We don't need to actively wait for the update here, next vblank
> > -	 * will just get more DDB space with the correct WM values.
> > -	 */
> > -	for_each_intel_crtc(dev, crtc) {
> > -		if (!crtc->active)
> > -			continue;
> > +	struct drm_device *dev = state->dev;
> > +	struct intel_crtc *intel_crtc;
> > +	enum pipe otherp;
> >  
> > -		pipe = crtc->pipe;
> > +	for_each_intel_crtc(dev, intel_crtc) {
> > +		otherp = intel_crtc->pipe;
> >  
> >  		/*
> > -		 * At this point, only the pipes more space than before are
> > -		 * left to re-allocate.
> > +		 * When checking for overlaps, we don't want to:
> > +		 *  - Compare against ourselves
> > +		 *  - Compare against pipes that will be disabled in step 0
> > +		 *  - Compare against pipes that won't be enabled until step 3
> >  		 */
> > -		if (reallocated[pipe])
> > +		if (otherp == pipe || !new->pipe[otherp].end ||
> > +		    !old->pipe[otherp].end)
> >  			continue;
> >  
> > -		skl_wm_flush_pipe(dev_priv, pipe, 3);
> > +		if ((new->pipe[pipe].start >= old->pipe[otherp].start &&
> > +		     new->pipe[pipe].start < old->pipe[otherp].end) ||
> > +		    (old->pipe[otherp].start >= new->pipe[pipe].start &&
> > +		     old->pipe[otherp].start < new->pipe[pipe].end))
> > +			return true;
> >  	}
> > +
> > +	return false;
> >  }
> >  
> >  static int skl_update_pipe_wm(struct drm_crtc_state *cstate,
> > @@ -4038,8 +3924,10 @@ skl_compute_ddb(struct drm_atomic_state *state)
> >  	struct drm_device *dev = state->dev;
> >  	struct drm_i915_private *dev_priv = to_i915(dev);
> >  	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> > +	struct intel_crtc_state *cstate;
> >  	struct intel_crtc *intel_crtc;
> > -	struct skl_ddb_allocation *ddb = &intel_state->wm_results.ddb;
> > +	struct skl_ddb_allocation *old_ddb = &dev_priv->wm.skl_hw.ddb;
> > +	struct skl_ddb_allocation *new_ddb = &intel_state->wm_results.ddb;
> >  	uint32_t realloc_pipes = pipes_modified(state);
> >  	int ret;
> >  
> > @@ -4071,13 +3959,11 @@ skl_compute_ddb(struct drm_atomic_state *state)
> >  	}
> >  
> >  	for_each_intel_crtc_mask(dev, intel_crtc, realloc_pipes) {
> > -		struct intel_crtc_state *cstate;
> > -
> >  		cstate = intel_atomic_get_crtc_state(state, intel_crtc);
> >  		if (IS_ERR(cstate))
> >  			return PTR_ERR(cstate);
> >  
> > -		ret = skl_allocate_pipe_ddb(cstate, ddb);
> > +		ret = skl_allocate_pipe_ddb(cstate, new_ddb);
> >  		if (ret)
> >  			return ret;
> >  
> > @@ -4086,6 +3972,73 @@ skl_compute_ddb(struct drm_atomic_state *state)
> >  			return ret;
> >  	}
> >  
> > +	/*
> > +	 * When setting up a new DDB allocation arrangement, we need to
> > +	 * correctly sequence the times at which the new allocations for the
> > +	 * pipes are taken into account or we'll have pipes fetching from space
> > +	 * previously allocated to another pipe.
> > +	 *
> > +	 * Roughly the final sequence we want looks like this:
> > +	 *  1. Disable any pipes we're not going to be using anymore
> > +	 *  2. Reallocate all of the active pipes whose new ddb allocations
> > +	 *  won't overlap with another active pipe's ddb allocation.
> > +	 *  3. Reallocate remaining active pipes, if any.
> > +	 *  4. Enable any new pipes, if any.
> > +	 *
> > +	 * Example:
> > +	 * Initially DDB looks like this:
> > +	 *   |   B    |   C    |
> > +	 * And the final DDB should look like this:
> > +	 *   |  B  |  C  |  A  |
> > +	 *
> > +	 * 1. We're not disabling any pipes, so do nothing on this step.
> > +	 * 2. Pipe B's new allocation wouldn't overlap with pipe C, however
> > +	 * pipe C's new allocation does overlap with pipe B's current
> > +	 * allocation. Reallocate B first so the DDB looks like this:
> > +	 *   |  B  |xx|   C    |
> > +	 * 3. Now we can safely reallocate pipe C to it's new location:
> > +	 *   |  B  |  C  |xxxxx|
> > +	 * 4. Enable any remaining pipes, in this case A
> > +	 *   |  B  |  C  |  A  |
> > +	 *
> > +	 * As well, between every pipe reallocation we have to wait for a
> > +	 * vblank on the pipe so that we ensure it's new allocation has taken
> > +	 * effect by the time we start moving the next pipe. This can be
> > +	 * skipped on the last step we need to perform, which is why we keep
> > +	 * track of that information here. For example, if we've reallocated
> > +	 * all the pipes that need changing by the time we reach step 3, we can
> > +	 * finish without waiting for the pipes we changed in step 3 to update.
> > +	 */
> > +	for_each_intel_crtc_mask(dev, intel_crtc, realloc_pipes) {
> > +		enum pipe pipe = intel_crtc->pipe;
> > +		enum skl_ddb_step step;
> > +
> > +		cstate = intel_atomic_get_crtc_state(state, intel_crtc);
> > +		if (IS_ERR(cstate))
> > +			return PTR_ERR(cstate);
> > +
> > +		/* Step 1: Pipes we're disabling / haven't changed */
> > +		if (skl_ddb_allocation_equals(old_ddb, new_ddb, pipe) ||
> > +		    new_ddb->pipe[pipe].end == 0) {
> > +			step = SKL_DDB_STEP_NONE;
> > +		/* Step 2-3: Active pipes we're reallocating */
> > +		} else if (old_ddb->pipe[pipe].end != 0) {
> > +			if (skl_ddb_allocation_overlaps(state, old_ddb, new_ddb,
> > +							pipe))
> > +				step = SKL_DDB_STEP_OVERLAP;
> > +			else
> > +				step = SKL_DDB_STEP_NO_OVERLAP;
> > +		/* Step 4: Pipes we're enabling */
> > +		} else {
> > +			step = SKL_DDB_STEP_FINAL;
> > +		}
> > +
> > +		cstate->wm.skl.ddb_realloc = step;
> > +
> > +		if (step > intel_state->last_ddb_step)
> > +			intel_state->last_ddb_step = step;
> > +	}
> > +
> >  	return 0;
> >  }
> >  
> > @@ -4110,10 +4063,13 @@ skl_copy_wm_for_pipe(struct skl_wm_values *dst,
> >  static int
> >  skl_compute_wm(struct drm_atomic_state *state)
> >  {
> > +	struct drm_i915_private *dev_priv = to_i915(state->dev);
> >  	struct drm_crtc *crtc;
> >  	struct drm_crtc_state *cstate;
> >  	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> >  	struct skl_wm_values *results = &intel_state->wm_results;
> > +	struct skl_ddb_allocation *old_ddb = &dev_priv->wm.skl_hw.ddb;
> > +	struct skl_ddb_allocation *new_ddb = &results->ddb;
> >  	struct skl_pipe_wm *pipe_wm;
> >  	bool changed = false;
> >  	int ret, i;
> > @@ -4152,7 +4108,10 @@ skl_compute_wm(struct drm_atomic_state *state)
> >  		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> >  		struct intel_crtc_state *intel_cstate =
> >  			to_intel_crtc_state(cstate);
> > +		enum skl_ddb_step step;
> > +		enum pipe pipe;
> >  
> > +		pipe = intel_crtc->pipe;
> >  		pipe_wm = &intel_cstate->wm.skl.optimal;
> >  		ret = skl_update_pipe_wm(cstate, &results->ddb, pipe_wm,
> >  					 &changed);
> > @@ -4167,7 +4126,18 @@ skl_compute_wm(struct drm_atomic_state *state)
> >  			continue;
> >  
> >  		intel_cstate->update_wm_pre = true;
> > +		step = intel_cstate->wm.skl.ddb_realloc;
> >  		skl_compute_wm_results(crtc->dev, pipe_wm, results, intel_crtc);
> > +
> > +		if (!skl_ddb_entry_equal(&old_ddb->pipe[pipe],
> > +					 &new_ddb->pipe[pipe])) {
> > +			DRM_DEBUG_KMS(
> > +			    "DDB changes for [CRTC:%d:pipe %c]: (%3d - %3d) -> (%3d - %3d) on step %d\n",
> > +			    intel_crtc->base.base.id, pipe_name(pipe),
> > +			    old_ddb->pipe[pipe].start, old_ddb->pipe[pipe].end,
> > +			    new_ddb->pipe[pipe].start, new_ddb->pipe[pipe].end,
> > +			    step);
> > +		}
> >  	}
> >  
> >  	return 0;
> > @@ -4191,8 +4161,20 @@ static void skl_update_wm(struct drm_crtc *crtc)
> >  
> >  	mutex_lock(&dev_priv->wm.wm_mutex);
> >  
> > -	skl_write_wm_values(dev_priv, results);
> > -	skl_flush_wm_values(dev_priv, results);
> > +	/*
> > +	 * If this pipe isn't active already, we're going to be enabling it
> > +	 * very soon. Since it's safe to update these while the pipe's shut off,
> > +	 * just do so here. Already active pipes will have their watermarks
> > +	 * updated once we update their planes.
> > +	 */
> > +	if (!intel_crtc->active) {
> > +		int plane;
> > +
> > +		for (plane = 0; plane < intel_num_planes(intel_crtc); plane++)
> > +			skl_write_plane_wm(intel_crtc, results, plane);
> > +
> > +		skl_write_cursor_wm(intel_crtc, results);
> > +	}
> >  
> >  	/*
> >  	 * Store the new configuration (but only for the pipes that have
> > -- 
> > 2.7.4
> 
> -- 
> Ville Syrjälä
> Intel OTC
Ville Syrjälä Aug. 4, 2016, 6:34 a.m. UTC | #4
On Wed, Aug 03, 2016 at 03:19:28PM -0700, Matt Roper wrote:
> On Wed, Aug 03, 2016 at 06:00:42PM +0300, Ville Syrjälä wrote:
> > On Tue, Aug 02, 2016 at 06:37:37PM -0400, Lyude wrote:
> > > Now that we can hook into update_crtcs and control the order in which we
> > > update CRTCs at each modeset, we can finish the final step of fixing
> > > Skylake's watermark handling by performing DDB updates at the same time
> > > as plane updates and watermark updates.
> > > 
> > > The first major change in this patch is skl_update_crtcs(), which
> > > handles ensuring that we order each CRTC update in our atomic commits
> > > properly so that they honor the DDB flush order.
> > > 
> > > The second major change in this patch is the order in which we flush the
> > > pipes. While the previous order may have worked, it can't be used in
> > > this approach since it no longer will do the right thing. For example,
> > > using the old ddb flush order:
> > > 
> > > We have pipes A, B, and C enabled, and we're disabling C. Initial ddb
> > > allocation looks like this:
> > > 
> > > |   A   |   B   |xxxxxxx|
> > > 
> > > Since we're performing the ddb updates after performing any CRTC
> > > disablements in intel_atomic_commit_tail(), the space to the right of
> > > pipe B is unallocated.
> > > 
> > > 1. Flush pipes with new allocation contained into old space. None
> > >    apply, so we skip this
> > > 2. Flush pipes having their allocation reduced, but overlapping with a
> > >    previous allocation. None apply, so we also skip this
> > > 3. Flush pipes that got more space allocated. This applies to A and B,
> > >    giving us the following update order: A, B
> > > 
> > > This is wrong, since updating pipe A first will cause it to overlap with
> > > B and potentially burst into flames. Our new order (see the code
> > > comments for details) would update the pipes in the proper order: B, A.
> > > 
> > > As well, we calculate the order for each DDB update during the check
> > > phase, and reference it later in the commit phase when we hit
> > > skl_update_crtcs().
> > > 
> > > This long overdue patch fixes the rest of the underruns on Skylake.
> > > 
> > > Changes since v1:
> > >  - Add skl_ddb_entry_write() for cursor into skl_write_cursor_wm()
> > > 
> > > Fixes: 0e8fb7ba7ca5 ("drm/i915/skl: Flush the WM configuration")
> > > Fixes: 8211bd5bdf5e ("drm/i915/skl: Program the DDB allocation")
> > > Signed-off-by: Lyude <cpaul@redhat.com>
> > > [omitting CC for stable, since this patch will need to be changed for
> > > such backports first]
> > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > Cc: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
> > > Cc: Hans de Goede <hdegoede@redhat.com>
> > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_display.c | 100 ++++++++++--
> > >  drivers/gpu/drm/i915/intel_drv.h     |  10 ++
> > >  drivers/gpu/drm/i915/intel_pm.c      | 288 ++++++++++++++++-------------------
> > >  3 files changed, 233 insertions(+), 165 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > > index 59cf513..06295f7 100644
> > > --- a/drivers/gpu/drm/i915/intel_display.c
> > > +++ b/drivers/gpu/drm/i915/intel_display.c
> > > @@ -12897,16 +12897,23 @@ static void verify_wm_state(struct drm_crtc *crtc,
> > >  			  hw_entry->start, hw_entry->end);
> > >  	}
> > >  
> > > -	/* cursor */
> > > -	hw_entry = &hw_ddb.plane[pipe][PLANE_CURSOR];
> > > -	sw_entry = &sw_ddb->plane[pipe][PLANE_CURSOR];
> > > -
> > > -	if (!skl_ddb_entry_equal(hw_entry, sw_entry)) {
> > > -		DRM_ERROR("mismatch in DDB state pipe %c cursor "
> > > -			  "(expected (%u,%u), found (%u,%u))\n",
> > > -			  pipe_name(pipe),
> > > -			  sw_entry->start, sw_entry->end,
> > > -			  hw_entry->start, hw_entry->end);
> > > +	/*
> > > +	 * cursor
> > > +	 * If the cursor plane isn't active, we may not have updated it's ddb
> > > +	 * allocation. In that case since the ddb allocation will be updated
> > > +	 * once the plane becomes visible, we can skip this check
> > > +	 */
> > > +	if (intel_crtc->cursor_addr) {
> > > +		hw_entry = &hw_ddb.plane[pipe][PLANE_CURSOR];
> > > +		sw_entry = &sw_ddb->plane[pipe][PLANE_CURSOR];
> > > +
> > > +		if (!skl_ddb_entry_equal(hw_entry, sw_entry)) {
> > > +			DRM_ERROR("mismatch in DDB state pipe %c cursor "
> > > +				  "(expected (%u,%u), found (%u,%u))\n",
> > > +				  pipe_name(pipe),
> > > +				  sw_entry->start, sw_entry->end,
> > > +				  hw_entry->start, hw_entry->end);
> > > +		}
> > >  	}
> > >  }
> > >  
> > > @@ -13658,6 +13665,72 @@ static void intel_update_crtcs(struct drm_atomic_state *state,
> > >  	}
> > >  }
> > >  
> > > +static inline void
> > > +skl_do_ddb_step(struct drm_atomic_state *state,
> > > +		enum skl_ddb_step step)
> > > +{
> > > +	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> > > +	struct drm_crtc *crtc;
> > > +	struct drm_crtc_state *old_crtc_state;
> > > +	unsigned int crtc_vblank_mask; /* unused */
> > > +	int i;
> > > +
> > > +	for_each_crtc_in_state(state, crtc, old_crtc_state, i) {
> > > +		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> > > +		struct intel_crtc_state *cstate =
> > > +			to_intel_crtc_state(crtc->state);
> > > +		bool vblank_wait = false;
> > > +
> > > +		if (cstate->wm.skl.ddb_realloc != step || !crtc->state->active)
> > > +			continue;
> > > +
> > > +		/*
> > > +		 * If we're changing the ddb allocation of this pipe to make
> > > +		 * room for another pipe, we have to wait for the pipe's ddb
> > > +		 * allocations to actually update by waiting for a vblank.
> > > +		 * Otherwise we risk the next pipe updating before this pipe
> > > +		 * finishes, resulting in the pipe fetching from ddb space for
> > > +		 * the wrong pipe.
> > > +		 *
> > > +		 * However, if we know we don't have any more pipes to move
> > > +		 * around, we can skip this wait and the new ddb allocation
> > > +		 * will take effect at the start of the next vblank.
> > > +		 */
> > > +		switch (step) {
> > > +		case SKL_DDB_STEP_NO_OVERLAP:
> > > +		case SKL_DDB_STEP_OVERLAP:
> > > +			if (step != intel_state->last_ddb_step)
> > > +				vblank_wait = true;
> > > +
> > > +		/* drop through */
> > > +		case SKL_DDB_STEP_FINAL:
> > > +			DRM_DEBUG_KMS(
> > > +			    "Updating [CRTC:%d:pipe %c] for DDB step %d\n",
> > > +			    crtc->base.id, pipe_name(intel_crtc->pipe),
> > > +			    step);
> > > +
> > > +		case SKL_DDB_STEP_NONE:
> > > +			break;
> > > +		}
> > 
> > Not sure we really need this step stuff. How about?
> > 
> > for_each_crtc
> > 	if (crtc_needs_disabling)
> > 		disable_crtc();
> > 
> > do {
> > 	progress = false;
> > 	wait_vbl_pipes=0;
> > 	for_each_crtc() {
> > 		if (!active || needs_modeset)
> > 			continue;
> > 		if (!ddb_changed)
> > 			continue;
> > 		if (new_ddb_overlaps_with_any_other_pipes_current_ddb)
> > 			continue;
> > 		commit;
> > 		wait_vbl_pipes |= pipe;
> > 		progress = true;
> > 	}
> > 	wait_vbls(wait_vbl_pipes);
> > } while (progress);
> > 
> > for_each_crtc {
> > 	if (crtc_needs_enabling)
> > 		enable_crtc();
> > 	commit;
> > }
> 
> Yeah, this approach looks nicer.  It's a bit simpler to follow code-wise
> and doesn't require us to precompute any ordering during the check phase
> so it's a bit more self-contained.  It should also scale properly if
> future platforms decide to add more pipes.

Yep.
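
For reference, the `new_ddb_overlaps_with_any_other_pipes_current_ddb` test in the loop quoted above boils down to a plain interval check. A minimal sketch, assuming DDB entries are half-open block ranges with an exclusive `end` (which is what the `<` comparisons in the patch suggest) and that an entry with `end == 0` means "no allocation". `struct ddb_entry` and `ddb_entries_overlap()` are hypothetical names used only for illustration, not driver API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a DDB entry: blocks [start, end). */
struct ddb_entry {
	uint16_t start;
	uint16_t end;	/* assumed exclusive; 0 means no allocation */
};

/*
 * Two half-open ranges overlap iff each one starts before the other
 * ends. An empty entry (start == end) never overlaps anything.
 */
static bool ddb_entries_overlap(const struct ddb_entry *a,
				const struct ddb_entry *b)
{
	/* Sanity check on the inputs, not part of the overlap rule. */
	assert(a->start <= a->end && b->start <= b->end);

	return a->start < b->end && b->start < a->end;
}
```

This covers the same cases as the two-clause comparison in skl_ddb_allocation_overlaps(), and it makes the disabled-pipe checks fall out naturally since a (0, 0) entry can't overlap.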

> 
> > 
> > Or if we're paranoid, we could also have an upper bound on the
> > loop and assert that we never reach it.
> > 
> > 
> > Though one thing I don't particularly like about this commit while
> > changing the ddb approach is that it's going to make the update
> > appear even less atomic. What I'd rather like to do for the normal
> > commit path is this:
> > 
> > for_each_crtc
> > 	if (crtc_needs_disabling)
> > 		disable_planes
> > for_each_crtc
> > 	if (crtc_needs_disabling)
> > 		disable_crtc
> > for_each_crtc
> > 	if (crtc_needs_enabling)
> > 		enable_crtc
> > for_each_crtc
> > 	if (active)
> > 		commit_planes;
> > 
> > That way everything would pop in and out as close together as possible.
> > Hmm. Actually, I wonder... I'm thinking we should be able to enable all
> > crtcs prior to entering the ddb commit loop, on account of no planes
> > being enabled on those crtcs until we commit them. And if no planes are
> > enabled, running the pipe w/o allocated ddb should be fine. So with that
> > approach, I think we should be able to commit all planes within a few
> > iterations of the loop, and hence within a few vblanks.
> 
> So this is pretty similar to what we do today, except that we do the
> enabling/disabling of each CRTC and its planes all together, right?

Yeah. That should provide a better experience in the case of "genlocked"
pipes at least, e.g. for those two-part 4k MST monitors.

> Sounds reasonable to me, although I'm not sure we want to mix that
> change in with the gen9-specific series Lyude is working on here.  Maybe
> just do the new gen9 handler that way as part of that series and then
> come back and update the non-gen9 handler to follow the new flow as a
> separate patch?

Sounds good. First fix gen9, then make things pretty :)

As far as my idea of enabling the pipes on gen9 before the commit loop,
I think that would also avoid having to commit the planes separately on
those newly enabled crtcs. My 'progress' loop would take care of those
pipes as well (would just have to drop the needs_modeset check).
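
To make the ordering concrete, here's a rough userspace simulation of the 'progress' loop. It's only a sketch under assumptions: `struct pipe_sim`, `ddb_commit_order()` and friends are made-up names, DDB entries are treated as half-open ranges, and "waiting for vblank" is modelled by copying the new allocation over the current one at the end of each pass (so pipes flushed in the same pass are still checked against each other's old allocation). The pass counter is the paranoid upper bound mentioned earlier:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PIPES 3

struct ddb_entry {
	uint16_t start, end;	/* [start, end), end == 0: no allocation */
};

/* Hypothetical per-pipe state for the simulation. */
struct pipe_sim {
	struct ddb_entry cur_ddb;	/* what the hardware uses right now */
	struct ddb_entry new_ddb;	/* what we want it to use */
};

static bool ddb_overlap(const struct ddb_entry *a, const struct ddb_entry *b)
{
	return a->start < b->end && b->start < a->end;
}

static bool ddb_changed(const struct pipe_sim *p)
{
	return p->cur_ddb.start != p->new_ddb.start ||
	       p->cur_ddb.end != p->new_ddb.end;
}

/*
 * Flush pipes until no more progress can be made. Records the flush
 * order in order[] and returns how many pipes were flushed.
 */
static int ddb_commit_order(struct pipe_sim *pipes, int n, int *order)
{
	int committed = 0, passes = 0;
	bool progress;

	assert(n <= MAX_PIPES);

	do {
		bool flushed[MAX_PIPES] = { false };

		progress = false;
		for (int i = 0; i < n; i++) {
			bool blocked = false;

			if (!ddb_changed(&pipes[i]))
				continue;

			/* Would we fetch from space another pipe still uses? */
			for (int j = 0; j < n; j++)
				if (j != i && ddb_overlap(&pipes[i].new_ddb,
							  &pipes[j].cur_ddb))
					blocked = true;
			if (blocked)
				continue;

			flushed[i] = true;
			order[committed++] = i;
			progress = true;
		}

		/* Stand-in for wait_vbls(): new allocations take effect. */
		for (int i = 0; i < n; i++)
			if (flushed[i])
				pipes[i].cur_ddb = pipes[i].new_ddb;

		/* Each productive pass flushes at least one pipe. */
		passes++;
		assert(passes <= n + 1);
	} while (progress);

	return committed;
}
```

With the example from the commit message (C already disabled, A going from (0 - 100) to (0 - 150) and B from (100 - 200) to (150 - 300), numbers invented for illustration), the first pass skips A and flushes B, and the second pass flushes A, i.e. the B, A order we want. If the return value is less than the number of changed pipes, the loop stalled, which is the case the real code would want to WARN about.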

> 
> 
> Matt
> 
> > 
> > > +
> > > +		intel_update_crtc(crtc, state, old_crtc_state,
> > > +				  &crtc_vblank_mask);
> > > +
> > > +		if (vblank_wait)
> > > +			intel_wait_for_vblank(state->dev, intel_crtc->pipe);
> > > +	}
> > > +}
> > > +
> > > +static void skl_update_crtcs(struct drm_atomic_state *state,
> > > +			     unsigned int *crtc_vblank_mask)
> > > +{
> > > +	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> > > +	enum skl_ddb_step step;
> > > +
> > > +	for (step = 0; step <= intel_state->last_ddb_step; step++)
> > > +		skl_do_ddb_step(state, step);
> > > +}
> > > +
> > >  static void intel_atomic_commit_tail(struct drm_atomic_state *state)
> > >  {
> > >  	struct drm_device *dev = state->dev;
> > > @@ -15235,8 +15308,6 @@ void intel_init_display_hooks(struct drm_i915_private *dev_priv)
> > >  		dev_priv->display.crtc_disable = i9xx_crtc_disable;
> > >  	}
> > >  
> > > -	dev_priv->display.update_crtcs = intel_update_crtcs;
> > > -
> > >  	/* Returns the core display clock speed */
> > >  	if (IS_SKYLAKE(dev_priv) || IS_KABYLAKE(dev_priv))
> > >  		dev_priv->display.get_display_clock_speed =
> > > @@ -15326,6 +15397,11 @@ void intel_init_display_hooks(struct drm_i915_private *dev_priv)
> > >  			skl_modeset_calc_cdclk;
> > >  	}
> > >  
> > > +	if (dev_priv->info.gen >= 9)
> > > +		dev_priv->display.update_crtcs = skl_update_crtcs;
> > > +	else
> > > +		dev_priv->display.update_crtcs = intel_update_crtcs;
> > > +
> > >  	switch (INTEL_INFO(dev_priv)->gen) {
> > >  	case 2:
> > >  		dev_priv->display.queue_flip = intel_gen2_queue_flip;
> > > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > > index 1b444d3..cf5da83 100644
> > > --- a/drivers/gpu/drm/i915/intel_drv.h
> > > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > > @@ -334,6 +334,7 @@ struct intel_atomic_state {
> > >  
> > >  	/* Gen9+ only */
> > >  	struct skl_wm_values wm_results;
> > > +	int last_ddb_step;
> > >  };
> > >  
> > >  struct intel_plane_state {
> > > @@ -437,6 +438,13 @@ struct skl_pipe_wm {
> > >  	uint32_t linetime;
> > >  };
> > >  
> > > +enum skl_ddb_step {
> > > +	SKL_DDB_STEP_NONE = 0,
> > > +	SKL_DDB_STEP_NO_OVERLAP,
> > > +	SKL_DDB_STEP_OVERLAP,
> > > +	SKL_DDB_STEP_FINAL
> > > +};
> > > +
> > >  struct intel_crtc_wm_state {
> > >  	union {
> > >  		struct {
> > > @@ -467,6 +475,8 @@ struct intel_crtc_wm_state {
> > >  			/* minimum block allocation */
> > >  			uint16_t minimum_blocks[I915_MAX_PLANES];
> > >  			uint16_t minimum_y_blocks[I915_MAX_PLANES];
> > > +
> > > +			enum skl_ddb_step ddb_realloc;
> > >  		} skl;
> > >  	};
> > >  
> > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > index 6f5beb3..636c90a 100644
> > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > @@ -3816,6 +3816,11 @@ void skl_write_plane_wm(struct intel_crtc *intel_crtc,
> > >  			   wm->plane[pipe][plane][level]);
> > >  	}
> > >  	I915_WRITE(PLANE_WM_TRANS(pipe, plane), wm->plane_trans[pipe][plane]);
> > > +
> > > +	skl_ddb_entry_write(dev_priv, PLANE_BUF_CFG(pipe, plane),
> > > +			    &wm->ddb.plane[pipe][plane]);
> > > +	skl_ddb_entry_write(dev_priv, PLANE_NV12_BUF_CFG(pipe, plane),
> > > +			    &wm->ddb.y_plane[pipe][plane]);
> > >  }
> > >  
> > >  void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
> > > @@ -3832,170 +3837,51 @@ void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
> > >  			   wm->plane[pipe][PLANE_CURSOR][level]);
> > >  	}
> > >  	I915_WRITE(CUR_WM_TRANS(pipe), wm->plane_trans[pipe][PLANE_CURSOR]);
> > > -}
> > > -
> > > -static void skl_write_wm_values(struct drm_i915_private *dev_priv,
> > > -				const struct skl_wm_values *new)
> > > -{
> > > -	struct drm_device *dev = &dev_priv->drm;
> > > -	struct intel_crtc *crtc;
> > > -
> > > -	for_each_intel_crtc(dev, crtc) {
> > > -		int i;
> > > -		enum pipe pipe = crtc->pipe;
> > > -
> > > -		if ((new->dirty_pipes & drm_crtc_mask(&crtc->base)) == 0)
> > > -			continue;
> > > -		if (!crtc->active)
> > > -			continue;
> > >  
> > > -		for (i = 0; i < intel_num_planes(crtc); i++) {
> > > -			skl_ddb_entry_write(dev_priv,
> > > -					    PLANE_BUF_CFG(pipe, i),
> > > -					    &new->ddb.plane[pipe][i]);
> > > -			skl_ddb_entry_write(dev_priv,
> > > -					    PLANE_NV12_BUF_CFG(pipe, i),
> > > -					    &new->ddb.y_plane[pipe][i]);
> > > -		}
> > > -
> > > -		skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
> > > -				    &new->ddb.plane[pipe][PLANE_CURSOR]);
> > > -	}
> > > +	skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
> > > +			    &wm->ddb.plane[pipe][PLANE_CURSOR]);
> > >  }
> > >  
> > > -/*
> > > - * When setting up a new DDB allocation arrangement, we need to correctly
> > > - * sequence the times at which the new allocations for the pipes are taken into
> > > - * account or we'll have pipes fetching from space previously allocated to
> > > - * another pipe.
> > > - *
> > > - * Roughly the sequence looks like:
> > > - *  1. re-allocate the pipe(s) with the allocation being reduced and not
> > > - *     overlapping with a previous light-up pipe (another way to put it is:
> > > - *     pipes with their new allocation strickly included into their old ones).
> > > - *  2. re-allocate the other pipes that get their allocation reduced
> > > - *  3. allocate the pipes having their allocation increased
> > > - *
> > > - * Steps 1. and 2. are here to take care of the following case:
> > > - * - Initially DDB looks like this:
> > > - *     |   B    |   C    |
> > > - * - enable pipe A.
> > > - * - pipe B has a reduced DDB allocation that overlaps with the old pipe C
> > > - *   allocation
> > > - *     |  A  |  B  |  C  |
> > > - *
> > > - * We need to sequence the re-allocation: C, B, A (and not B, C, A).
> > > - */
> > > -
> > > -static void
> > > -skl_wm_flush_pipe(struct drm_i915_private *dev_priv, enum pipe pipe, int pass)
> > > +static bool
> > > +skl_ddb_allocation_equals(const struct skl_ddb_allocation *old,
> > > +			  const struct skl_ddb_allocation *new,
> > > +			  enum pipe pipe)
> > >  {
> > > -	int plane;
> > > -
> > > -	DRM_DEBUG_KMS("flush pipe %c (pass %d)\n", pipe_name(pipe), pass);
> > > -
> > > -	for_each_plane(dev_priv, pipe, plane) {
> > > -		I915_WRITE(PLANE_SURF(pipe, plane),
> > > -			   I915_READ(PLANE_SURF(pipe, plane)));
> > > -	}
> > > -	I915_WRITE(CURBASE(pipe), I915_READ(CURBASE(pipe)));
> > > +	return new->pipe[pipe].start == old->pipe[pipe].start &&
> > > +	       new->pipe[pipe].end == old->pipe[pipe].end;
> > >  }
> > >  
> > >  static bool
> > > -skl_ddb_allocation_included(const struct skl_ddb_allocation *old,
> > > +skl_ddb_allocation_overlaps(struct drm_atomic_state *state,
> > > +			    const struct skl_ddb_allocation *old,
> > >  			    const struct skl_ddb_allocation *new,
> > >  			    enum pipe pipe)
> > >  {
> > > -	uint16_t old_size, new_size;
> > > -
> > > -	old_size = skl_ddb_entry_size(&old->pipe[pipe]);
> > > -	new_size = skl_ddb_entry_size(&new->pipe[pipe]);
> > > -
> > > -	return old_size != new_size &&
> > > -	       new->pipe[pipe].start >= old->pipe[pipe].start &&
> > > -	       new->pipe[pipe].end <= old->pipe[pipe].end;
> > > -}
> > > -
> > > -static void skl_flush_wm_values(struct drm_i915_private *dev_priv,
> > > -				struct skl_wm_values *new_values)
> > > -{
> > > -	struct drm_device *dev = &dev_priv->drm;
> > > -	struct skl_ddb_allocation *cur_ddb, *new_ddb;
> > > -	bool reallocated[I915_MAX_PIPES] = {};
> > > -	struct intel_crtc *crtc;
> > > -	enum pipe pipe;
> > > -
> > > -	new_ddb = &new_values->ddb;
> > > -	cur_ddb = &dev_priv->wm.skl_hw.ddb;
> > > -
> > > -	/*
> > > -	 * First pass: flush the pipes with the new allocation contained into
> > > -	 * the old space.
> > > -	 *
> > > -	 * We'll wait for the vblank on those pipes to ensure we can safely
> > > -	 * re-allocate the freed space without this pipe fetching from it.
> > > -	 */
> > > -	for_each_intel_crtc(dev, crtc) {
> > > -		if (!crtc->active)
> > > -			continue;
> > > -
> > > -		pipe = crtc->pipe;
> > > -
> > > -		if (!skl_ddb_allocation_included(cur_ddb, new_ddb, pipe))
> > > -			continue;
> > > -
> > > -		skl_wm_flush_pipe(dev_priv, pipe, 1);
> > > -		intel_wait_for_vblank(dev, pipe);
> > > -
> > > -		reallocated[pipe] = true;
> > > -	}
> > > -
> > > -
> > > -	/*
> > > -	 * Second pass: flush the pipes that are having their allocation
> > > -	 * reduced, but overlapping with a previous allocation.
> > > -	 *
> > > -	 * Here as well we need to wait for the vblank to make sure the freed
> > > -	 * space is not used anymore.
> > > -	 */
> > > -	for_each_intel_crtc(dev, crtc) {
> > > -		if (!crtc->active)
> > > -			continue;
> > > -
> > > -		pipe = crtc->pipe;
> > > -
> > > -		if (reallocated[pipe])
> > > -			continue;
> > > -
> > > -		if (skl_ddb_entry_size(&new_ddb->pipe[pipe]) <
> > > -		    skl_ddb_entry_size(&cur_ddb->pipe[pipe])) {
> > > -			skl_wm_flush_pipe(dev_priv, pipe, 2);
> > > -			intel_wait_for_vblank(dev, pipe);
> > > -			reallocated[pipe] = true;
> > > -		}
> > > -	}
> > > -
> > > -	/*
> > > -	 * Third pass: flush the pipes that got more space allocated.
> > > -	 *
> > > -	 * We don't need to actively wait for the update here, next vblank
> > > -	 * will just get more DDB space with the correct WM values.
> > > -	 */
> > > -	for_each_intel_crtc(dev, crtc) {
> > > -		if (!crtc->active)
> > > -			continue;
> > > +	struct drm_device *dev = state->dev;
> > > +	struct intel_crtc *intel_crtc;
> > > +	enum pipe otherp;
> > >  
> > > -		pipe = crtc->pipe;
> > > +	for_each_intel_crtc(dev, intel_crtc) {
> > > +		otherp = intel_crtc->pipe;
> > >  
> > >  		/*
> > > -		 * At this point, only the pipes more space than before are
> > > -		 * left to re-allocate.
> > > +		 * When checking for overlaps, we don't want to:
> > > +		 *  - Compare against ourselves
> > > +		 *  - Compare against pipes that will be disabled in step 0
> > > +		 *  - Compare against pipes that won't be enabled until step 3
> > >  		 */
> > > -		if (reallocated[pipe])
> > > +		if (otherp == pipe || !new->pipe[otherp].end ||
> > > +		    !old->pipe[otherp].end)
> > >  			continue;
> > >  
> > > -		skl_wm_flush_pipe(dev_priv, pipe, 3);
> > > +		if ((new->pipe[pipe].start >= old->pipe[otherp].start &&
> > > +		     new->pipe[pipe].start < old->pipe[otherp].end) ||
> > > +		    (old->pipe[otherp].start >= new->pipe[pipe].start &&
> > > +		     old->pipe[otherp].start < new->pipe[pipe].end))
> > > +			return true;
> > >  	}
> > > +
> > > +	return false;
> > >  }
> > >  
> > >  static int skl_update_pipe_wm(struct drm_crtc_state *cstate,
> > > @@ -4038,8 +3924,10 @@ skl_compute_ddb(struct drm_atomic_state *state)
> > >  	struct drm_device *dev = state->dev;
> > >  	struct drm_i915_private *dev_priv = to_i915(dev);
> > >  	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> > > +	struct intel_crtc_state *cstate;
> > >  	struct intel_crtc *intel_crtc;
> > > -	struct skl_ddb_allocation *ddb = &intel_state->wm_results.ddb;
> > > +	struct skl_ddb_allocation *old_ddb = &dev_priv->wm.skl_hw.ddb;
> > > +	struct skl_ddb_allocation *new_ddb = &intel_state->wm_results.ddb;
> > >  	uint32_t realloc_pipes = pipes_modified(state);
> > >  	int ret;
> > >  
> > > @@ -4071,13 +3959,11 @@ skl_compute_ddb(struct drm_atomic_state *state)
> > >  	}
> > >  
> > >  	for_each_intel_crtc_mask(dev, intel_crtc, realloc_pipes) {
> > > -		struct intel_crtc_state *cstate;
> > > -
> > >  		cstate = intel_atomic_get_crtc_state(state, intel_crtc);
> > >  		if (IS_ERR(cstate))
> > >  			return PTR_ERR(cstate);
> > >  
> > > -		ret = skl_allocate_pipe_ddb(cstate, ddb);
> > > +		ret = skl_allocate_pipe_ddb(cstate, new_ddb);
> > >  		if (ret)
> > >  			return ret;
> > >  
> > > @@ -4086,6 +3972,73 @@ skl_compute_ddb(struct drm_atomic_state *state)
> > >  			return ret;
> > >  	}
> > >  
> > > +	/*
> > > +	 * When setting up a new DDB allocation arrangement, we need to
> > > +	 * correctly sequence the times at which the new allocations for the
> > > +	 * pipes are taken into account or we'll have pipes fetching from space
> > > +	 * previously allocated to another pipe.
> > > +	 *
> > > +	 * Roughly the final sequence we want looks like this:
> > > +	 *  1. Disable any pipes we're not going to be using anymore
> > > +	 *  2. Reallocate all of the active pipes whose new ddb allocations
> > > +	 *  won't overlap with another active pipe's ddb allocation.
> > > +	 *  3. Reallocate remaining active pipes, if any.
> > > +	 *  4. Enable any new pipes, if any.
> > > +	 *
> > > +	 * Example:
> > > +	 * Initially DDB looks like this:
> > > +	 *   |   B    |   C    |
> > > +	 * And the final DDB should look like this:
> > > +	 *   |  B  |  C  |  A  |
> > > +	 *
> > > +	 * 1. We're not disabling any pipes, so do nothing on this step.
> > > +	 * 2. Pipe B's new allocation wouldn't overlap with pipe C, however
> > > +	 * pipe C's new allocation does overlap with pipe B's current
> > > +	 * allocation. Reallocate B first so the DDB looks like this:
> > > +	 *   |  B  |xx|   C    |
> > > +	 * 3. Now we can safely reallocate pipe C to its new location:
> > > +	 *   |  B  |  C  |xxxxx|
> > > +	 * 4. Enable any remaining pipes, in this case A
> > > +	 *   |  B  |  C  |  A  |
> > > +	 *
> > > +	 * As well, between every pipe reallocation we have to wait for a
> > > +	 * vblank on the pipe so that we ensure its new allocation has taken
> > > +	 * effect by the time we start moving the next pipe. This can be
> > > +	 * skipped on the last step we need to perform, which is why we keep
> > > +	 * track of that information here. For example, if we've reallocated
> > > +	 * all the pipes that need changing by the time we reach step 3, we can
> > > +	 * finish without waiting for the pipes we changed in step 3 to update.
> > > +	 */
> > > +	for_each_intel_crtc_mask(dev, intel_crtc, realloc_pipes) {
> > > +		enum pipe pipe = intel_crtc->pipe;
> > > +		enum skl_ddb_step step;
> > > +
> > > +		cstate = intel_atomic_get_crtc_state(state, intel_crtc);
> > > +		if (IS_ERR(cstate))
> > > +			return PTR_ERR(cstate);
> > > +
> > > +		/* Step 1: Pipes we're disabling / haven't changed */
> > > +		if (skl_ddb_allocation_equals(old_ddb, new_ddb, pipe) ||
> > > +		    new_ddb->pipe[pipe].end == 0) {
> > > +			step = SKL_DDB_STEP_NONE;
> > > +		/* Step 2-3: Active pipes we're reallocating */
> > > +		} else if (old_ddb->pipe[pipe].end != 0) {
> > > +			if (skl_ddb_allocation_overlaps(state, old_ddb, new_ddb,
> > > +							pipe))
> > > +				step = SKL_DDB_STEP_OVERLAP;
> > > +			else
> > > +				step = SKL_DDB_STEP_NO_OVERLAP;
> > > +		/* Step 4: Pipes we're enabling */
> > > +		} else {
> > > +			step = SKL_DDB_STEP_FINAL;
> > > +		}
> > > +
> > > +		cstate->wm.skl.ddb_realloc = step;
> > > +
> > > +		if (step > intel_state->last_ddb_step)
> > > +			intel_state->last_ddb_step = step;
> > > +	}
> > > +
> > >  	return 0;
> > >  }
> > >  
> > > @@ -4110,10 +4063,13 @@ skl_copy_wm_for_pipe(struct skl_wm_values *dst,
> > >  static int
> > >  skl_compute_wm(struct drm_atomic_state *state)
> > >  {
> > > +	struct drm_i915_private *dev_priv = to_i915(state->dev);
> > >  	struct drm_crtc *crtc;
> > >  	struct drm_crtc_state *cstate;
> > >  	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> > >  	struct skl_wm_values *results = &intel_state->wm_results;
> > > +	struct skl_ddb_allocation *old_ddb = &dev_priv->wm.skl_hw.ddb;
> > > +	struct skl_ddb_allocation *new_ddb = &results->ddb;
> > >  	struct skl_pipe_wm *pipe_wm;
> > >  	bool changed = false;
> > >  	int ret, i;
> > > @@ -4152,7 +4108,10 @@ skl_compute_wm(struct drm_atomic_state *state)
> > >  		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> > >  		struct intel_crtc_state *intel_cstate =
> > >  			to_intel_crtc_state(cstate);
> > > +		enum skl_ddb_step step;
> > > +		enum pipe pipe;
> > >  
> > > +		pipe = intel_crtc->pipe;
> > >  		pipe_wm = &intel_cstate->wm.skl.optimal;
> > >  		ret = skl_update_pipe_wm(cstate, &results->ddb, pipe_wm,
> > >  					 &changed);
> > > @@ -4167,7 +4126,18 @@ skl_compute_wm(struct drm_atomic_state *state)
> > >  			continue;
> > >  
> > >  		intel_cstate->update_wm_pre = true;
> > > +		step = intel_cstate->wm.skl.ddb_realloc;
> > >  		skl_compute_wm_results(crtc->dev, pipe_wm, results, intel_crtc);
> > > +
> > > +		if (!skl_ddb_entry_equal(&old_ddb->pipe[pipe],
> > > +					 &new_ddb->pipe[pipe])) {
> > > +			DRM_DEBUG_KMS(
> > > +			    "DDB changes for [CRTC:%d:pipe %c]: (%3d - %3d) -> (%3d - %3d) on step %d\n",
> > > +			    intel_crtc->base.base.id, pipe_name(pipe),
> > > +			    old_ddb->pipe[pipe].start, old_ddb->pipe[pipe].end,
> > > +			    new_ddb->pipe[pipe].start, new_ddb->pipe[pipe].end,
> > > +			    step);
> > > +		}
> > >  	}
> > >  
> > >  	return 0;
> > > @@ -4191,8 +4161,20 @@ static void skl_update_wm(struct drm_crtc *crtc)
> > >  
> > >  	mutex_lock(&dev_priv->wm.wm_mutex);
> > >  
> > > -	skl_write_wm_values(dev_priv, results);
> > > -	skl_flush_wm_values(dev_priv, results);
> > > +	/*
> > > +	 * If this pipe isn't active already, we're going to be enabling it
> > > +	 * very soon. Since it's safe to update these while the pipe's shut off,
> > > +	 * just do so here. Already active pipes will have their watermarks
> > > +	 * updated once we update their planes.
> > > +	 */
> > > +	if (!intel_crtc->active) {
> > > +		int plane;
> > > +
> > > +		for (plane = 0; plane < intel_num_planes(intel_crtc); plane++)
> > > +			skl_write_plane_wm(intel_crtc, results, plane);
> > > +
> > > +		skl_write_cursor_wm(intel_crtc, results);
> > > +	}
> > >  
> > >  	/*
> > >  	 * Store the new configuration (but only for the pipes that have
> > > -- 
> > > 2.7.4
> > 
> > -- 
> > Ville Syrjälä
> > Intel OTC
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> IoTG Platform Enabling & Development
> Intel Corporation
> (916) 356-2795
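
For anyone following the ordering logic in the skl_compute_ddb() hunk quoted above, here is a minimal standalone C sketch of the step classification. It is only an illustration under stated assumptions: ddb_entry, ddb_step, overlaps() and classify() are hypothetical stand-ins rather than the driver's own types and helpers, and the block numbers (0..900) are made up. The sketch mirrors the overlap test and the if/else chain from the patch and applies them to the "B,C -> B,C,A" example from the code comment.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's types; not the i915 definitions. */
enum pipe { PIPE_A, PIPE_B, PIPE_C, MAX_PIPES };

/* Same ordering as enum skl_ddb_step in the patch. */
enum ddb_step { STEP_NONE, STEP_NO_OVERLAP, STEP_OVERLAP, STEP_FINAL };

/* Half-open block range [start, end); end == 0 means the pipe has no DDB. */
struct ddb_entry { int start, end; };

/* Made-up block numbers for the "B,C -> B,C,A" example in the comment. */
static const struct ddb_entry example_old[MAX_PIPES] = {
	[PIPE_A] = { 0, 0 }, [PIPE_B] = { 0, 450 }, [PIPE_C] = { 450, 900 },
};
static const struct ddb_entry example_new[MAX_PIPES] = {
	[PIPE_A] = { 600, 900 }, [PIPE_B] = { 0, 300 }, [PIPE_C] = { 300, 600 },
};

/*
 * Does this pipe's NEW range intersect the OLD range of any other pipe
 * that is active both before and after the update?
 */
static bool overlaps(const struct ddb_entry *old, const struct ddb_entry *new,
		     enum pipe pipe)
{
	for (int other = 0; other < MAX_PIPES; other++) {
		/* Skip ourselves and pipes being disabled or newly enabled. */
		if (other == (int)pipe || !new[other].end || !old[other].end)
			continue;

		if ((new[pipe].start >= old[other].start &&
		     new[pipe].start < old[other].end) ||
		    (old[other].start >= new[pipe].start &&
		     old[other].start < new[pipe].end))
			return true;
	}

	return false;
}

/* Pick the step in which a pipe's DDB allocation would be updated. */
static enum ddb_step classify(const struct ddb_entry *old,
			      const struct ddb_entry *new, enum pipe pipe)
{
	bool unchanged = old[pipe].start == new[pipe].start &&
			 old[pipe].end == new[pipe].end;

	if (unchanged || new[pipe].end == 0)
		return STEP_NONE;	/* untouched or being disabled */
	if (old[pipe].end != 0)		/* already active: reallocate */
		return overlaps(old, new, pipe) ? STEP_OVERLAP
						: STEP_NO_OVERLAP;
	return STEP_FINAL;		/* newly enabled pipe */
}
```

With these example tables, classify() places pipe B in the no-overlap step, pipe C in the overlap step, and pipe A in the final step, which matches the B, C, A order described in the comment.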

Patch

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 59cf513..06295f7 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12897,16 +12897,23 @@  static void verify_wm_state(struct drm_crtc *crtc,
 			  hw_entry->start, hw_entry->end);
 	}
 
-	/* cursor */
-	hw_entry = &hw_ddb.plane[pipe][PLANE_CURSOR];
-	sw_entry = &sw_ddb->plane[pipe][PLANE_CURSOR];
-
-	if (!skl_ddb_entry_equal(hw_entry, sw_entry)) {
-		DRM_ERROR("mismatch in DDB state pipe %c cursor "
-			  "(expected (%u,%u), found (%u,%u))\n",
-			  pipe_name(pipe),
-			  sw_entry->start, sw_entry->end,
-			  hw_entry->start, hw_entry->end);
+	/*
+	 * cursor
+	 * If the cursor plane isn't active, we may not have updated its ddb
+	 * allocation. In that case since the ddb allocation will be updated
+	 * once the plane becomes visible, we can skip this check
+	 */
+	if (intel_crtc->cursor_addr) {
+		hw_entry = &hw_ddb.plane[pipe][PLANE_CURSOR];
+		sw_entry = &sw_ddb->plane[pipe][PLANE_CURSOR];
+
+		if (!skl_ddb_entry_equal(hw_entry, sw_entry)) {
+			DRM_ERROR("mismatch in DDB state pipe %c cursor "
+				  "(expected (%u,%u), found (%u,%u))\n",
+				  pipe_name(pipe),
+				  sw_entry->start, sw_entry->end,
+				  hw_entry->start, hw_entry->end);
+		}
 	}
 }
 
@@ -13658,6 +13665,72 @@  static void intel_update_crtcs(struct drm_atomic_state *state,
 	}
 }
 
+static inline void
+skl_do_ddb_step(struct drm_atomic_state *state,
+		enum skl_ddb_step step)
+{
+	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
+	struct drm_crtc *crtc;
+	struct drm_crtc_state *old_crtc_state;
+	unsigned int crtc_vblank_mask; /* unused */
+	int i;
+
+	for_each_crtc_in_state(state, crtc, old_crtc_state, i) {
+		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+		struct intel_crtc_state *cstate =
+			to_intel_crtc_state(crtc->state);
+		bool vblank_wait = false;
+
+		if (cstate->wm.skl.ddb_realloc != step || !crtc->state->active)
+			continue;
+
+		/*
+		 * If we're changing the ddb allocation of this pipe to make
+		 * room for another pipe, we have to wait for the pipe's ddb
+		 * allocations to actually update by waiting for a vblank.
+		 * Otherwise we risk the next pipe updating before this pipe
+		 * finishes, resulting in the pipe fetching from ddb space for
+		 * the wrong pipe.
+		 *
+		 * However, if we know we don't have any more pipes to move
+		 * around, we can skip this wait and the new ddb allocation
+		 * will take effect at the start of the next vblank.
+		 */
+		switch (step) {
+		case SKL_DDB_STEP_NO_OVERLAP:
+		case SKL_DDB_STEP_OVERLAP:
+			if (step != intel_state->last_ddb_step)
+				vblank_wait = true;
+
+		/* drop through */
+		case SKL_DDB_STEP_FINAL:
+			DRM_DEBUG_KMS(
+			    "Updating [CRTC:%d:pipe %c] for DDB step %d\n",
+			    crtc->base.id, pipe_name(intel_crtc->pipe),
+			    step);
+
+		case SKL_DDB_STEP_NONE:
+			break;
+		}
+
+		intel_update_crtc(crtc, state, old_crtc_state,
+				  &crtc_vblank_mask);
+
+		if (vblank_wait)
+			intel_wait_for_vblank(state->dev, intel_crtc->pipe);
+	}
+}
+
+static void skl_update_crtcs(struct drm_atomic_state *state,
+			     unsigned int *crtc_vblank_mask)
+{
+	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
+	enum skl_ddb_step step;
+
+	for (step = 0; step <= intel_state->last_ddb_step; step++)
+		skl_do_ddb_step(state, step);
+}
+
 static void intel_atomic_commit_tail(struct drm_atomic_state *state)
 {
 	struct drm_device *dev = state->dev;
@@ -15235,8 +15308,6 @@  void intel_init_display_hooks(struct drm_i915_private *dev_priv)
 		dev_priv->display.crtc_disable = i9xx_crtc_disable;
 	}
 
-	dev_priv->display.update_crtcs = intel_update_crtcs;
-
 	/* Returns the core display clock speed */
 	if (IS_SKYLAKE(dev_priv) || IS_KABYLAKE(dev_priv))
 		dev_priv->display.get_display_clock_speed =
@@ -15326,6 +15397,11 @@  void intel_init_display_hooks(struct drm_i915_private *dev_priv)
 			skl_modeset_calc_cdclk;
 	}
 
+	if (dev_priv->info.gen >= 9)
+		dev_priv->display.update_crtcs = skl_update_crtcs;
+	else
+		dev_priv->display.update_crtcs = intel_update_crtcs;
+
 	switch (INTEL_INFO(dev_priv)->gen) {
 	case 2:
 		dev_priv->display.queue_flip = intel_gen2_queue_flip;
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 1b444d3..cf5da83 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -334,6 +334,7 @@  struct intel_atomic_state {
 
 	/* Gen9+ only */
 	struct skl_wm_values wm_results;
+	int last_ddb_step;
 };
 
 struct intel_plane_state {
@@ -437,6 +438,13 @@  struct skl_pipe_wm {
 	uint32_t linetime;
 };
 
+enum skl_ddb_step {
+	SKL_DDB_STEP_NONE = 0,
+	SKL_DDB_STEP_NO_OVERLAP,
+	SKL_DDB_STEP_OVERLAP,
+	SKL_DDB_STEP_FINAL
+};
+
 struct intel_crtc_wm_state {
 	union {
 		struct {
@@ -467,6 +475,8 @@  struct intel_crtc_wm_state {
 			/* minimum block allocation */
 			uint16_t minimum_blocks[I915_MAX_PLANES];
 			uint16_t minimum_y_blocks[I915_MAX_PLANES];
+
+			enum skl_ddb_step ddb_realloc;
 		} skl;
 	};
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 6f5beb3..636c90a 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3816,6 +3816,11 @@  void skl_write_plane_wm(struct intel_crtc *intel_crtc,
 			   wm->plane[pipe][plane][level]);
 	}
 	I915_WRITE(PLANE_WM_TRANS(pipe, plane), wm->plane_trans[pipe][plane]);
+
+	skl_ddb_entry_write(dev_priv, PLANE_BUF_CFG(pipe, plane),
+			    &wm->ddb.plane[pipe][plane]);
+	skl_ddb_entry_write(dev_priv, PLANE_NV12_BUF_CFG(pipe, plane),
+			    &wm->ddb.y_plane[pipe][plane]);
 }
 
 void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
@@ -3832,170 +3837,51 @@  void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
 			   wm->plane[pipe][PLANE_CURSOR][level]);
 	}
 	I915_WRITE(CUR_WM_TRANS(pipe), wm->plane_trans[pipe][PLANE_CURSOR]);
-}
-
-static void skl_write_wm_values(struct drm_i915_private *dev_priv,
-				const struct skl_wm_values *new)
-{
-	struct drm_device *dev = &dev_priv->drm;
-	struct intel_crtc *crtc;
-
-	for_each_intel_crtc(dev, crtc) {
-		int i;
-		enum pipe pipe = crtc->pipe;
-
-		if ((new->dirty_pipes & drm_crtc_mask(&crtc->base)) == 0)
-			continue;
-		if (!crtc->active)
-			continue;
 
-		for (i = 0; i < intel_num_planes(crtc); i++) {
-			skl_ddb_entry_write(dev_priv,
-					    PLANE_BUF_CFG(pipe, i),
-					    &new->ddb.plane[pipe][i]);
-			skl_ddb_entry_write(dev_priv,
-					    PLANE_NV12_BUF_CFG(pipe, i),
-					    &new->ddb.y_plane[pipe][i]);
-		}
-
-		skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
-				    &new->ddb.plane[pipe][PLANE_CURSOR]);
-	}
+	skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
+			    &wm->ddb.plane[pipe][PLANE_CURSOR]);
 }
 
-/*
- * When setting up a new DDB allocation arrangement, we need to correctly
- * sequence the times at which the new allocations for the pipes are taken into
- * account or we'll have pipes fetching from space previously allocated to
- * another pipe.
- *
- * Roughly the sequence looks like:
- *  1. re-allocate the pipe(s) with the allocation being reduced and not
- *     overlapping with a previous light-up pipe (another way to put it is:
- *     pipes with their new allocation strickly included into their old ones).
- *  2. re-allocate the other pipes that get their allocation reduced
- *  3. allocate the pipes having their allocation increased
- *
- * Steps 1. and 2. are here to take care of the following case:
- * - Initially DDB looks like this:
- *     |   B    |   C    |
- * - enable pipe A.
- * - pipe B has a reduced DDB allocation that overlaps with the old pipe C
- *   allocation
- *     |  A  |  B  |  C  |
- *
- * We need to sequence the re-allocation: C, B, A (and not B, C, A).
- */
-
-static void
-skl_wm_flush_pipe(struct drm_i915_private *dev_priv, enum pipe pipe, int pass)
+static bool
+skl_ddb_allocation_equals(const struct skl_ddb_allocation *old,
+			  const struct skl_ddb_allocation *new,
+			  enum pipe pipe)
 {
-	int plane;
-
-	DRM_DEBUG_KMS("flush pipe %c (pass %d)\n", pipe_name(pipe), pass);
-
-	for_each_plane(dev_priv, pipe, plane) {
-		I915_WRITE(PLANE_SURF(pipe, plane),
-			   I915_READ(PLANE_SURF(pipe, plane)));
-	}
-	I915_WRITE(CURBASE(pipe), I915_READ(CURBASE(pipe)));
+	return new->pipe[pipe].start == old->pipe[pipe].start &&
+	       new->pipe[pipe].end == old->pipe[pipe].end;
 }
 
 static bool
-skl_ddb_allocation_included(const struct skl_ddb_allocation *old,
+skl_ddb_allocation_overlaps(struct drm_atomic_state *state,
+			    const struct skl_ddb_allocation *old,
 			    const struct skl_ddb_allocation *new,
 			    enum pipe pipe)
 {
-	uint16_t old_size, new_size;
-
-	old_size = skl_ddb_entry_size(&old->pipe[pipe]);
-	new_size = skl_ddb_entry_size(&new->pipe[pipe]);
-
-	return old_size != new_size &&
-	       new->pipe[pipe].start >= old->pipe[pipe].start &&
-	       new->pipe[pipe].end <= old->pipe[pipe].end;
-}
-
-static void skl_flush_wm_values(struct drm_i915_private *dev_priv,
-				struct skl_wm_values *new_values)
-{
-	struct drm_device *dev = &dev_priv->drm;
-	struct skl_ddb_allocation *cur_ddb, *new_ddb;
-	bool reallocated[I915_MAX_PIPES] = {};
-	struct intel_crtc *crtc;
-	enum pipe pipe;
-
-	new_ddb = &new_values->ddb;
-	cur_ddb = &dev_priv->wm.skl_hw.ddb;
-
-	/*
-	 * First pass: flush the pipes with the new allocation contained into
-	 * the old space.
-	 *
-	 * We'll wait for the vblank on those pipes to ensure we can safely
-	 * re-allocate the freed space without this pipe fetching from it.
-	 */
-	for_each_intel_crtc(dev, crtc) {
-		if (!crtc->active)
-			continue;
-
-		pipe = crtc->pipe;
-
-		if (!skl_ddb_allocation_included(cur_ddb, new_ddb, pipe))
-			continue;
-
-		skl_wm_flush_pipe(dev_priv, pipe, 1);
-		intel_wait_for_vblank(dev, pipe);
-
-		reallocated[pipe] = true;
-	}
-
-
-	/*
-	 * Second pass: flush the pipes that are having their allocation
-	 * reduced, but overlapping with a previous allocation.
-	 *
-	 * Here as well we need to wait for the vblank to make sure the freed
-	 * space is not used anymore.
-	 */
-	for_each_intel_crtc(dev, crtc) {
-		if (!crtc->active)
-			continue;
-
-		pipe = crtc->pipe;
-
-		if (reallocated[pipe])
-			continue;
-
-		if (skl_ddb_entry_size(&new_ddb->pipe[pipe]) <
-		    skl_ddb_entry_size(&cur_ddb->pipe[pipe])) {
-			skl_wm_flush_pipe(dev_priv, pipe, 2);
-			intel_wait_for_vblank(dev, pipe);
-			reallocated[pipe] = true;
-		}
-	}
-
-	/*
-	 * Third pass: flush the pipes that got more space allocated.
-	 *
-	 * We don't need to actively wait for the update here, next vblank
-	 * will just get more DDB space with the correct WM values.
-	 */
-	for_each_intel_crtc(dev, crtc) {
-		if (!crtc->active)
-			continue;
+	struct drm_device *dev = state->dev;
+	struct intel_crtc *intel_crtc;
+	enum pipe otherp;
 
-		pipe = crtc->pipe;
+	for_each_intel_crtc(dev, intel_crtc) {
+		otherp = intel_crtc->pipe;
 
 		/*
-		 * At this point, only the pipes more space than before are
-		 * left to re-allocate.
+		 * When checking for overlaps, we don't want to:
+		 *  - Compare against ourselves
+		 *  - Compare against pipes being disabled (SKL_DDB_STEP_NONE)
+		 *  - Compare against pipes not enabled until SKL_DDB_STEP_FINAL
 		 */
-		if (reallocated[pipe])
+		if (otherp == pipe || !new->pipe[otherp].end ||
+		    !old->pipe[otherp].end)
 			continue;
 
-		skl_wm_flush_pipe(dev_priv, pipe, 3);
+		if ((new->pipe[pipe].start >= old->pipe[otherp].start &&
+		     new->pipe[pipe].start < old->pipe[otherp].end) ||
+		    (old->pipe[otherp].start >= new->pipe[pipe].start &&
+		     old->pipe[otherp].start < new->pipe[pipe].end))
+			return true;
 	}
+
+	return false;
 }
 
 static int skl_update_pipe_wm(struct drm_crtc_state *cstate,
@@ -4038,8 +3924,10 @@  skl_compute_ddb(struct drm_atomic_state *state)
 	struct drm_device *dev = state->dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
+	struct intel_crtc_state *cstate;
 	struct intel_crtc *intel_crtc;
-	struct skl_ddb_allocation *ddb = &intel_state->wm_results.ddb;
+	struct skl_ddb_allocation *old_ddb = &dev_priv->wm.skl_hw.ddb;
+	struct skl_ddb_allocation *new_ddb = &intel_state->wm_results.ddb;
 	uint32_t realloc_pipes = pipes_modified(state);
 	int ret;
 
@@ -4071,13 +3959,11 @@  skl_compute_ddb(struct drm_atomic_state *state)
 	}
 
 	for_each_intel_crtc_mask(dev, intel_crtc, realloc_pipes) {
-		struct intel_crtc_state *cstate;
-
 		cstate = intel_atomic_get_crtc_state(state, intel_crtc);
 		if (IS_ERR(cstate))
 			return PTR_ERR(cstate);
 
-		ret = skl_allocate_pipe_ddb(cstate, ddb);
+		ret = skl_allocate_pipe_ddb(cstate, new_ddb);
 		if (ret)
 			return ret;
 
@@ -4086,6 +3972,73 @@  skl_compute_ddb(struct drm_atomic_state *state)
 			return ret;
 	}
 
+	/*
+	 * When setting up a new DDB allocation arrangement, we need to
+	 * correctly sequence the times at which the new allocations for the
+	 * pipes are taken into account or we'll have pipes fetching from space
+	 * previously allocated to another pipe.
+	 *
+	 * Roughly the final sequence we want looks like this:
+	 *  1. Disable any pipes we're not going to be using anymore
+	 *  2. Reallocate all of the active pipes whose new ddb allocations
+	 *  won't overlap with another active pipe's ddb allocation.
+	 *  3. Reallocate remaining active pipes, if any.
+	 *  4. Enable any new pipes, if any.
+	 *
+	 * Example:
+	 * Initially DDB looks like this:
+	 *   |   B    |   C    |
+	 * And the final DDB should look like this:
+	 *   |  B  |  C  |  A  |
+	 *
+	 * 1. We're not disabling any pipes, so do nothing on this step.
+	 * 2. Pipe B's new allocation wouldn't overlap with pipe C, however
+	 * pipe C's new allocation does overlap with pipe B's current
+	 * allocation. Reallocate B first so the DDB looks like this:
+	 *   |  B  |xx|   C    |
+	 * 3. Now we can safely reallocate pipe C to its new location:
+	 *   |  B  |  C  |xxxxx|
+	 * 4. Enable any remaining pipes, in this case A
+	 *   |  B  |  C  |  A  |
+	 *
+	 * As well, between every pipe reallocation we have to wait for a
+	 * vblank on the pipe so that we ensure its new allocation has taken
+	 * effect by the time we start moving the next pipe. This can be
+	 * skipped on the last step we need to perform, which is why we keep
+	 * track of that information here. For example, if we've reallocated
+	 * all the pipes that need changing by the time we reach step 3, we can
+	 * finish without waiting for the pipes we changed in step 3 to update.
+	 */
+	for_each_intel_crtc_mask(dev, intel_crtc, realloc_pipes) {
+		enum pipe pipe = intel_crtc->pipe;
+		enum skl_ddb_step step;
+
+		cstate = intel_atomic_get_crtc_state(state, intel_crtc);
+		if (IS_ERR(cstate))
+			return PTR_ERR(cstate);
+
+		/* Step 1: Pipes we're disabling / haven't changed */
+		if (skl_ddb_allocation_equals(old_ddb, new_ddb, pipe) ||
+		    new_ddb->pipe[pipe].end == 0) {
+			step = SKL_DDB_STEP_NONE;
+		/* Step 2-3: Active pipes we're reallocating */
+		} else if (old_ddb->pipe[pipe].end != 0) {
+			if (skl_ddb_allocation_overlaps(state, old_ddb, new_ddb,
+							pipe))
+				step = SKL_DDB_STEP_OVERLAP;
+			else
+				step = SKL_DDB_STEP_NO_OVERLAP;
+		/* Step 4: Pipes we're enabling */
+		} else {
+			step = SKL_DDB_STEP_FINAL;
+		}
+
+		cstate->wm.skl.ddb_realloc = step;
+
+		if (step > intel_state->last_ddb_step)
+			intel_state->last_ddb_step = step;
+	}
+
 	return 0;
 }
 
@@ -4110,10 +4063,13 @@  skl_copy_wm_for_pipe(struct skl_wm_values *dst,
 static int
 skl_compute_wm(struct drm_atomic_state *state)
 {
+	struct drm_i915_private *dev_priv = to_i915(state->dev);
 	struct drm_crtc *crtc;
 	struct drm_crtc_state *cstate;
 	struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
 	struct skl_wm_values *results = &intel_state->wm_results;
+	struct skl_ddb_allocation *old_ddb = &dev_priv->wm.skl_hw.ddb;
+	struct skl_ddb_allocation *new_ddb = &results->ddb;
 	struct skl_pipe_wm *pipe_wm;
 	bool changed = false;
 	int ret, i;
@@ -4152,7 +4108,10 @@  skl_compute_wm(struct drm_atomic_state *state)
 		struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 		struct intel_crtc_state *intel_cstate =
 			to_intel_crtc_state(cstate);
+		enum skl_ddb_step step;
+		enum pipe pipe;
 
+		pipe = intel_crtc->pipe;
 		pipe_wm = &intel_cstate->wm.skl.optimal;
 		ret = skl_update_pipe_wm(cstate, &results->ddb, pipe_wm,
 					 &changed);
@@ -4167,7 +4126,18 @@  skl_compute_wm(struct drm_atomic_state *state)
 			continue;
 
 		intel_cstate->update_wm_pre = true;
+		step = intel_cstate->wm.skl.ddb_realloc;
 		skl_compute_wm_results(crtc->dev, pipe_wm, results, intel_crtc);
+
+		if (!skl_ddb_entry_equal(&old_ddb->pipe[pipe],
+					 &new_ddb->pipe[pipe])) {
+			DRM_DEBUG_KMS(
+			    "DDB changes for [CRTC:%d:pipe %c]: (%3d - %3d) -> (%3d - %3d) on step %d\n",
+			    intel_crtc->base.base.id, pipe_name(pipe),
+			    old_ddb->pipe[pipe].start, old_ddb->pipe[pipe].end,
+			    new_ddb->pipe[pipe].start, new_ddb->pipe[pipe].end,
+			    step);
+		}
 	}
 
 	return 0;
@@ -4191,8 +4161,20 @@  static void skl_update_wm(struct drm_crtc *crtc)
 
 	mutex_lock(&dev_priv->wm.wm_mutex);
 
-	skl_write_wm_values(dev_priv, results);
-	skl_flush_wm_values(dev_priv, results);
+	/*
+	 * If this pipe isn't active already, we're going to be enabling it
+	 * very soon. Since it's safe to update these while the pipe's shut off,
+	 * just do so here. Already active pipes will have their watermarks
+	 * updated once we update their planes.
+	 */
+	if (!intel_crtc->active) {
+		int plane;
+
+		for (plane = 0; plane < intel_num_planes(intel_crtc); plane++)
+			skl_write_plane_wm(intel_crtc, results, plane);
+
+		skl_write_cursor_wm(intel_crtc, results);
+	}
 
 	/*
 	 * Store the new configuration (but only for the pipes that have