[v16,6/7] drm/i915: Protect intel_dbuf_slices_update with mutex

Message ID 20200124084456.2961-7-stanislav.lisovskiy@intel.com (mailing list archive)
State New, archived
Series Enable second DBuf slice for ICL and TGL

Commit Message

Lisovskiy, Stanislav Jan. 24, 2020, 8:44 a.m. UTC
Use the power_domains mutex to protect against a race condition:
intel_dbuf_slices_update might run in parallel with
gen9_dc_off_power_well_enable (called from intel_dp_detect, for
instance). In that case gen9_assert_dbuf_enabled can preempt the update
after the DBUF_CTL registers have been written but before dev_priv has
been updated, triggering the assertion.

Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
---
 drivers/gpu/drm/i915/display/intel_display_power.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
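
For context, a rough sketch of the check that this race trips, together with
the path that reaches it while already holding power_domains->lock. This is an
illustration based on how the series reworks gen9_assert_dbuf_enabled; the
exact code may differ:

/*
 * intel_dp_detect()
 *   -> intel_display_power_get()            takes power_domains->lock
 *     -> gen9_dc_off_power_well_enable()
 *       -> gen9_assert_dbuf_enabled()       compares hw state vs. dev_priv
 */
static void gen9_assert_dbuf_enabled(struct drm_i915_private *dev_priv)
{
	u8 hw_enabled_slices = intel_enabled_dbuf_slices_mask(dev_priv);
	u8 enabled_slices = dev_priv->enabled_dbuf_slices_mask;

	WARN(hw_enabled_slices != enabled_slices,
	     "Unexpected DBuf power state (0x%02x, expected 0x%02x)\n",
	     hw_enabled_slices, enabled_slices);
}

If icl_dbuf_slices_update() is preempted between writing DBUF_CTL and updating
dev_priv->enabled_dbuf_slices_mask, this check can observe a half-updated
state; taking the same mutex around both steps closes that window.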

Comments

Matt Roper Jan. 28, 2020, 11:33 p.m. UTC | #1
On Fri, Jan 24, 2020 at 10:44:55AM +0200, Stanislav Lisovskiy wrote:
> Now using power_domain mutex to protect from race condition, which
> can occur because intel_dbuf_slices_update might be running in
> parallel to gen9_dc_off_power_well_enable being called from
> intel_dp_detect for instance, which causes assertion triggered by
> race condition, as gen9_assert_dbuf_enabled might preempt this
> when registers were already updated, while dev_priv was not.

I may be overlooking something, but I think your next patch already
takes care of this by ensuring we only do dbuf updates during modesets.
We already had POWER_DOMAIN_MODESET in our various DC_OFF_POWER_DOMAINS
definitions which would ensure that the "DC off" power well is enabled
(and DC states themselves are disabled) for the entire duration of the
modeset process.

If we need this, I'm not sure whether it's a good idea to use
power_domains->lock rather than a new, dedicated lock.  Anything that
touches power domains in any manner grabs this lock, even though we only
really care about it for stopping races with the specific "DC off" power
well.
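
For comparison, a rough sketch of what such a dedicated lock could look like
(the dbuf_lock name and placement are hypothetical, purely for illustration):

/* Hypothetical field in struct drm_i915_private:
 *	struct mutex dbuf_lock;  protects DBUF_CTL vs. enabled_dbuf_slices_mask
 */
void icl_dbuf_slices_update(struct drm_i915_private *dev_priv,
			    u8 req_slices)
{
	mutex_lock(&dev_priv->dbuf_lock);
	/* ... program DBUF_CTL for each supported slice ... */
	dev_priv->enabled_dbuf_slices_mask = req_slices;
	mutex_unlock(&dev_priv->dbuf_lock);
}

/* ... and gen9_assert_dbuf_enabled() would need to take dbuf_lock too,
 * since it is what races with the update in the first place.
 */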

Also, if we bisect to the point right before these last two patches,
don't we have a problem since there's a point in the git history where
we potentially face a race?


Matt

> 
> Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> ---
>  drivers/gpu/drm/i915/display/intel_display_power.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c b/drivers/gpu/drm/i915/display/intel_display_power.c
> index 96b38252578b..99ddc21e004c 100644
> --- a/drivers/gpu/drm/i915/display/intel_display_power.c
> +++ b/drivers/gpu/drm/i915/display/intel_display_power.c
> @@ -4404,12 +4404,22 @@ void icl_dbuf_slices_update(struct drm_i915_private *dev_priv,
>  {
>  	int i;
>  	int max_slices = INTEL_INFO(dev_priv)->num_supported_dbuf_slices;
> +	struct i915_power_domains *power_domains = &dev_priv->power_domains;
>  
>  	WARN(hweight8(req_slices) > max_slices,
>  	     "Invalid number of dbuf slices requested\n");
>  
>  	DRM_DEBUG_KMS("Updating dbuf slices to 0x%x\n", req_slices);
>  
> +	/*
> +	 * Might be running this in parallel to gen9_dc_off_power_well_enable
> +	 * being called from intel_dp_detect for instance,
> +	 * which causes assertion triggered by race condition,
> +	 * as gen9_assert_dbuf_enabled might preempt this when registers
> +	 * were already updated, while dev_priv was not.
> +	 */
> +	mutex_lock(&power_domains->lock);
> +
>  	for (i = 0; i < max_slices; i++) {
>  		intel_dbuf_slice_set(dev_priv,
>  				     _DBUF_CTL_S(i),
> @@ -4417,6 +4427,8 @@ void icl_dbuf_slices_update(struct drm_i915_private *dev_priv,
>  	}
>  
>  	dev_priv->enabled_dbuf_slices_mask = req_slices;
> +
> +	mutex_unlock(&power_domains->lock);
>  }
>  
>  static void icl_dbuf_enable(struct drm_i915_private *dev_priv)
> -- 
> 2.24.1.485.gad05a3d8e5
>
Lisovskiy, Stanislav Jan. 29, 2020, 9:22 a.m. UTC | #2
On Tue, 2020-01-28 at 15:33 -0800, Matt Roper wrote:
> On Fri, Jan 24, 2020 at 10:44:55AM +0200, Stanislav Lisovskiy wrote:
> > Now using power_domain mutex to protect from race condition, which
> > can occur because intel_dbuf_slices_update might be running in
> > parallel to gen9_dc_off_power_well_enable being called from
> > intel_dp_detect for instance, which causes assertion triggered by
> > race condition, as gen9_assert_dbuf_enabled might preempt this
> > when registers were already updated, while dev_priv was not.
> 
> I may be overlooking something, but I think your next patch already
> takes care of this by ensuring we only do dbuf updates during modesets.
> We already had POWER_DOMAIN_MODESET in our various DC_OFF_POWER_DOMAINS
> definitions which would ensure that the "DC off" power well is enabled
> (and DC states themselves are disabled) for the entire duration of the
> modeset process.

I probably should have clarified this better in the commit message.
With the previous patch series I ran into an assertion that turned out
to be a consequence of two bugs:
The first problem was that we tried to update the dbuf slices to 0 in a
non-modeset commit, which had no crtcs in its state. That was wrong, so
the next patch now checks that we are actually doing a modeset;
otherwise we would occasionally update the dbuf mask to 0, which
according to BSpec should be done only by icl_dbuf_disable. Also, if
the commit has no crtcs in its state, we should not update the global
state in dev_priv, because access to it is not serialized. Per Ville's
scheme, reading the global state requires holding at least one crtc,
and writing it requires grabbing all crtcs.
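
As an illustration of that first fix, a hedged sketch of the modeset gate
(this is not the actual patch 7/7; the helper name is made up and the slice
mask is passed in explicitly to keep the example self-contained):

static void sketch_commit_dbuf_slices(struct intel_atomic_state *state,
				      u8 req_slices)
{
	struct drm_i915_private *dev_priv = to_i915(state->base.dev);

	/*
	 * A plane-only commit has no crtcs in its state, so its computed
	 * slice mask would be 0; skip the dbuf update entirely instead of
	 * writing that bogus mask (only icl_dbuf_disable may write 0).
	 */
	if (!state->modeset)
		return;

	icl_dbuf_slices_update(dev_priv, req_slices);
}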

The second problem was a race condition in the driver, which this patch
takes care of: after the device was suspended/resumed we had a no-op
commit which, due to the previous problem, updated the dbuf slices to 0.
It called icl_dbuf_slices_update, which first wrote that value to the
DBUF_CTL registers and then updated dev_priv. During that window an
intel_dp_detect call could run in parallel (triggered by an hpd irq),
which in turn enabled the dc_off power well and hit the assertion in
gen9_assert_dbuf_enabled, which now checks that the dev_priv slices
mask matches the actual hardware; because icl_dbuf_slices_update had
been preempted in the middle, the two did not match. I reproduced and
confirmed this by adding an artificial delay to the update plus
additional traces.
The most trivial solution, as per the discussion with Ville, was to use
the power domains lock here: it protects us from racing with the dc_off
power well enabling, because that path also takes this mutex first.
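
To make the locking argument concrete, a simplified sketch of the two paths
(in reality the power domain framework takes the mutex inside
intel_display_power_get() before calling the power well hook, rather than the
hook open-coding it as shown here):

/* Path A: hpd irq -> intel_dp_detect() -> power domain get, under the lock */
mutex_lock(&power_domains->lock);
gen9_dc_off_power_well_enable(dev_priv, power_well);
	/* -> gen9_assert_dbuf_enabled(dev_priv) */
mutex_unlock(&power_domains->lock);

/* Path B (this patch): the register writes and the dev_priv update
 * become one atomic step with respect to path A.
 */
mutex_lock(&power_domains->lock);
/* ... write DBUF_CTL for each slice ... */
dev_priv->enabled_dbuf_slices_mask = req_slices;
mutex_unlock(&power_domains->lock);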

Previously we didn't hit this issue because icl_dbuf_slices_update
simply updated the registers themselves and there was no corresponding
global state in dev_priv; we also never updated the slice configuration
during a modeset.

Stan

> 
> If we need this, I'm not sure whether it's a good idea to use
> power_domains->lock rather than a new, dedicated lock.  Anything that
> touches power domains in any manner grabs this lock, even though we only
> really care about it for stopping races with the specific "DC off" power
> well.
> 
> Also, if we bisect to the point right before these last two patches,
> don't we have a problem since there's a point in the git history where
> we potentially face a race?
> 
> 
> Matt
> 
> > 
> > Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_display_power.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c b/drivers/gpu/drm/i915/display/intel_display_power.c
> > index 96b38252578b..99ddc21e004c 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display_power.c
> > +++ b/drivers/gpu/drm/i915/display/intel_display_power.c
> > @@ -4404,12 +4404,22 @@ void icl_dbuf_slices_update(struct drm_i915_private *dev_priv,
> >  {
> >  	int i;
> >  	int max_slices = INTEL_INFO(dev_priv)->num_supported_dbuf_slices;
> > +	struct i915_power_domains *power_domains = &dev_priv->power_domains;
> >  
> >  	WARN(hweight8(req_slices) > max_slices,
> >  	     "Invalid number of dbuf slices requested\n");
> >  
> >  	DRM_DEBUG_KMS("Updating dbuf slices to 0x%x\n", req_slices);
> >  
> > +	/*
> > +	 * Might be running this in parallel to gen9_dc_off_power_well_enable
> > +	 * being called from intel_dp_detect for instance,
> > +	 * which causes assertion triggered by race condition,
> > +	 * as gen9_assert_dbuf_enabled might preempt this when registers
> > +	 * were already updated, while dev_priv was not.
> > +	 */
> > +	mutex_lock(&power_domains->lock);
> > +
> >  	for (i = 0; i < max_slices; i++) {
> >  		intel_dbuf_slice_set(dev_priv,
> >  				     _DBUF_CTL_S(i),
> > @@ -4417,6 +4427,8 @@ void icl_dbuf_slices_update(struct drm_i915_private *dev_priv,
> >  	}
> >  
> >  	dev_priv->enabled_dbuf_slices_mask = req_slices;
> > +
> > +	mutex_unlock(&power_domains->lock);
> >  }
> >  
> >  static void icl_dbuf_enable(struct drm_i915_private *dev_priv)
> > -- 
> > 2.24.1.485.gad05a3d8e5
> > 
> 
>
Ville Syrjälä Jan. 31, 2020, 3:22 p.m. UTC | #3
On Tue, Jan 28, 2020 at 03:33:11PM -0800, Matt Roper wrote:
> On Fri, Jan 24, 2020 at 10:44:55AM +0200, Stanislav Lisovskiy wrote:
> > Now using power_domain mutex to protect from race condition, which
> > can occur because intel_dbuf_slices_update might be running in
> > parallel to gen9_dc_off_power_well_enable being called from
> > intel_dp_detect for instance, which causes assertion triggered by
> > race condition, as gen9_assert_dbuf_enabled might preempt this
> > when registers were already updated, while dev_priv was not.
> 
> I may be overlooking something, but I think your next patch already
> takes care of this by ensuring we only do dbuf updates during modesets.
> We already had POWER_DOMAIN_MODESET in our various DC_OFF_POWER_DOMAINS
> definitions which would ensure that the "DC off" power well is enabled
> (and DC states themselves are disabled) for the entire duration of the
> modeset process.

Hmm. That's assuming we only do the dbuf assert from the dc off
power well hook. Can't remember if that's the case. If that's not
the only place then we probably miss the lock somewhere else too.

> 
> If we need this, I'm not sure whether it's a good idea to use
> power_domains->lock rather than a new, dedicated lock.  Anything that
> touches power domains in any manner grabs this lock, even though we only
> really care about it for stopping races with the specific "DC off" power
> well.

Separate lock feels a bit overkill to me for something small
like this.

> 
> Also, if we bisect to the point right before these last two patches,
> don't we have a problem since there's a point in the git history where
> we potentially face a race?

Yeah should be earlier in the series I guess. If we need it at all,
which as you point out maybe we don't with the state->modeset checks.
Though maybe we want to get rid of that state->modeset dependency.
I *think* we should start using the global state stuff for dbuf
management, but haven't really looked at the details to figure out
how to organize it in the end. So at that point we may not anymore
be holding the dc off reference (although one might argue that we
should always hold that for dbuf programming so the "wait for it
to enable" thing can't be perturbed by dc transitions).

Anyways for now this seems fine by me
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> 
> 
> Matt
> 
> > 
> > Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_display_power.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c b/drivers/gpu/drm/i915/display/intel_display_power.c
> > index 96b38252578b..99ddc21e004c 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display_power.c
> > +++ b/drivers/gpu/drm/i915/display/intel_display_power.c
> > @@ -4404,12 +4404,22 @@ void icl_dbuf_slices_update(struct drm_i915_private *dev_priv,
> >  {
> >  	int i;
> >  	int max_slices = INTEL_INFO(dev_priv)->num_supported_dbuf_slices;
> > +	struct i915_power_domains *power_domains = &dev_priv->power_domains;
> >  
> >  	WARN(hweight8(req_slices) > max_slices,
> >  	     "Invalid number of dbuf slices requested\n");
> >  
> >  	DRM_DEBUG_KMS("Updating dbuf slices to 0x%x\n", req_slices);
> >  
> > +	/*
> > +	 * Might be running this in parallel to gen9_dc_off_power_well_enable
> > +	 * being called from intel_dp_detect for instance,
> > +	 * which causes assertion triggered by race condition,
> > +	 * as gen9_assert_dbuf_enabled might preempt this when registers
> > +	 * were already updated, while dev_priv was not.
> > +	 */
> > +	mutex_lock(&power_domains->lock);
> > +
> >  	for (i = 0; i < max_slices; i++) {
> >  		intel_dbuf_slice_set(dev_priv,
> >  				     _DBUF_CTL_S(i),
> > @@ -4417,6 +4427,8 @@ void icl_dbuf_slices_update(struct drm_i915_private *dev_priv,
> >  	}
> >  
> >  	dev_priv->enabled_dbuf_slices_mask = req_slices;
> > +
> > +	mutex_unlock(&power_domains->lock);
> >  }
> >  
> >  static void icl_dbuf_enable(struct drm_i915_private *dev_priv)
> > -- 
> > 2.24.1.485.gad05a3d8e5
> > 
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> VTT-OSGC Platform Enablement
> Intel Corporation
> (916) 356-2795
Patch

diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c b/drivers/gpu/drm/i915/display/intel_display_power.c
index 96b38252578b..99ddc21e004c 100644
--- a/drivers/gpu/drm/i915/display/intel_display_power.c
+++ b/drivers/gpu/drm/i915/display/intel_display_power.c
@@ -4404,12 +4404,22 @@  void icl_dbuf_slices_update(struct drm_i915_private *dev_priv,
 {
 	int i;
 	int max_slices = INTEL_INFO(dev_priv)->num_supported_dbuf_slices;
+	struct i915_power_domains *power_domains = &dev_priv->power_domains;
 
 	WARN(hweight8(req_slices) > max_slices,
 	     "Invalid number of dbuf slices requested\n");
 
 	DRM_DEBUG_KMS("Updating dbuf slices to 0x%x\n", req_slices);
 
+	/*
+	 * Might be running this in parallel to gen9_dc_off_power_well_enable
+	 * being called from intel_dp_detect for instance,
+	 * which causes assertion triggered by race condition,
+	 * as gen9_assert_dbuf_enabled might preempt this when registers
+	 * were already updated, while dev_priv was not.
+	 */
+	mutex_lock(&power_domains->lock);
+
 	for (i = 0; i < max_slices; i++) {
 		intel_dbuf_slice_set(dev_priv,
 				     _DBUF_CTL_S(i),
@@ -4417,6 +4427,8 @@  void icl_dbuf_slices_update(struct drm_i915_private *dev_priv,
 	}
 
 	dev_priv->enabled_dbuf_slices_mask = req_slices;
+
+	mutex_unlock(&power_domains->lock);
 }
 
 static void icl_dbuf_enable(struct drm_i915_private *dev_priv)