diff mbox

drm/i915/psr: Fix race in intel_psr_work()

Message ID 20180623044506.3938-1-dhinakaran.pandiyan@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dhinakaran Pandiyan June 23, 2018, 4:45 a.m. UTC
commit 5422b37c907e ("drm/i915/psr: Kill delays when activating psr
back.") switched from delayed work to the plain variant and while doing so
remove the check for work_busy() before scheduling a PSR activation.
This appears to cause consecutive executions of psr_activate() in this
scenario - after a worker picks up the PSR work item for execution and
before the work function can acquire the PSR mutex, a psr_flush() can
get hold of the mutex and schedule another PSR work. Without a psr_exit()
between two psr_activate() calls, the warning messages get printed.
Further, since we drop the mutex in the midst of psr_work() to wait for
PSR to idle, another work item can also get scheduled. Fix this by
returning if PSR was already active.

Note:
I am not 100% sure this is what is going on as I could not reproduce
the bug (https://bugs.freedesktop.org/show_bug.cgi?id=106948)

This patch sort of defeats the point of the WARN_ON()s in psr_activate()
now, do we still need them?

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
---
 drivers/gpu/drm/i915/intel_psr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Chris Wilson June 24, 2018, 8:54 a.m. UTC | #1
Quoting Dhinakaran Pandiyan (2018-06-23 05:45:06)
> commit 5422b37c907e ("drm/i915/psr: Kill delays when activating psr
> back.") switched from delayed work to the plain variant and while doing so
> remove the check for work_busy() before scheduling a PSR activation.
> This appears to cause consecutive executions of psr_activate() in this
> scenario - after a worker picks up the PSR work item for execution and
> before the work function can acquire the PSR mutex, a psr_flush() can
> get hold of the mutex and schedule another PSR work. Without a psr_exit()
> between two psr_activate() calls, the warning messages get printed.

Ok, that would explain the second work.

> Further, since we drop the mutex in the midst of psr_work() to wait for
> PSR to idle, another work item can also get scheduled. Fix this by
> returning if PSR was already active.
> 
> Note:
> I am not 100% sure this is what is going on as I could not reproduce
> the bug (https://bugs.freedesktop.org/show_bug.cgi?id=106948)
> 
> This patch sort of defeats the point of the WARN_ON()s in psr_activate()
> now, do we still need them?

WARN_ON(active), yup. Seems reasonable to keep for the moment.
WARN_ON(hw_state) would be worth keeping for some time. Hmm, shouldn't
it be checking both PSR_CTL irrespective of intended mode. After
switching psr1 to psr2 we should check that psr1 is disabled as well,
and vice versa.

> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Dhinakaran Pandiyan June 25, 2018, 6:04 p.m. UTC | #2
On Sun, 2018-06-24 at 09:54 +0100, Chris Wilson wrote:
> Quoting Dhinakaran Pandiyan (2018-06-23 05:45:06)
> > 
> > commit 5422b37c907e ("drm/i915/psr: Kill delays when activating psr
> > back.") switched from delayed work to the plain variant and while
> > doing so
> > remove the check for work_busy() before scheduling a PSR
> > activation.
> > This appears to cause consecutive executions of psr_activate() in
> > this
> > scenario - after a worker picks up the PSR work item for execution
> > and
> > before the work function can acquire the PSR mutex, a psr_flush()
> > can
> > get hold of the mutex and schedule another PSR work. Without a
> > psr_exit()
> > between two psr_activate() calls, the warning messages get printed.
> Ok, that would explain the second work.
> 
> > 
> > Further, since we drop the mutex in the midst of psr_work() to wait
> > for
> > PSR to idle, another work item can also get scheduled. Fix this by
> > returning if PSR was already active.
> > 
> > Note:
> > I am not 100% sure this is what is going on as I could not
> > reproduce
> > the bug (https://bugs.freedesktop.org/show_bug.cgi?id=106948)
> > 
> > This patch sort of defeats the point of the WARN_ON()s in
> > psr_activate()
> > now, do we still need them?
> WARN_ON(active), yup. Seems reasonable to keep for the moment.
> WARN_ON(hw_state) would be worth keeping for some time. Hmm,
> shouldn't
> it be checking both PSR_CTL irrespective of intended mode. After
> switching psr1 to psr2 we should check that psr1 is disabled as well,
> and vice versa.
> 
Yeah, I've sent a patch to check both PSR_CTL and PSR2_CTL. Thanks for
the review.

-DK

> > 
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_psr.c b/drivers/gpu/drm/i915/intel_psr.c
index aea81ace854b..7aa324f0d1f7 100644
--- a/drivers/gpu/drm/i915/intel_psr.c
+++ b/drivers/gpu/drm/i915/intel_psr.c
@@ -811,7 +811,7 @@  static void intel_psr_work(struct work_struct *work)
 	 * recheck. Since psr_flush first clears this and then reschedules we
 	 * won't ever miss a flush when bailing out here.
 	 */
-	if (dev_priv->psr.busy_frontbuffer_bits)
+	if (dev_priv->psr.busy_frontbuffer_bits || dev_priv->psr.active)
 		goto unlock;
 
 	intel_psr_activate(dev_priv->psr.enabled);