diff mbox

[v3,1/2] drm/dp_helper: add workarounds from intel_dp_dpcd_read_wake()

Message ID 1458761417-12165-1-git-send-email-cpaul@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

cpaul@redhat.com March 23, 2016, 7:30 p.m. UTC
Some sinks need some time during the process of resuming the system from
sleep before they're ready to handle transactions. While it would be
nice if they responded with NACKs in these scenarios, this isn't always
the case as a few sinks will just timeout on all of the transactions
they receive.

This patch was originally intended to be a workaround for a very
mysterious bug on the T560, where any monitors connected to the dock
would fail to turn back on after resume. When resuming the laptop, it
appears that there's a short period of time where we're unable to
complete any aux transactions, as they all immediately timeout. The only
machine I'm able to reproduce this on is the T560 as other production
Skylake models seem to be fine. The period during which AUX transactions
fail appears to be around 22ms long. AFAIK, the dock for the T560 never
actually turns off, the only difference is that it's in SST mode at the
start of the resume process, so it's unclear as to why it would need so
much time to come back up.

There's been a discussion on this issue going on for a while on the
intel-gfx mailing list about this that has, in addition to including
developers from Intel, also had the correspondence of one of the
hardware engineers for Intel:

http://www.spinics.net/lists/intel-gfx/msg88831.html
http://www.spinics.net/lists/intel-gfx/msg88410.html

We've already looked into a couple of possible explanations for the
problem:

- Calling intel_dp_mst_resume() before right fix.
  intel_runtime_pm_enable_interrupts(). This was the first fix I tried,
  and while it worked it definitely wasn't the right fix. This worked
  because DP aux transactions don't actually require interrupts to work:

	static uint32_t
	intel_dp_aux_wait_done(struct intel_dp *intel_dp, bool has_aux_irq)
	{
		struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
		struct drm_device *dev = intel_dig_port->base.base.dev;
		struct drm_i915_private *dev_priv = dev->dev_private;
		i915_reg_t ch_ctl = intel_dp->aux_ch_ctl_reg;
		uint32_t status;
		bool done;

	#define C (((status = I915_READ_NOTRACE(ch_ctl)) & DP_AUX_CH_CTL_SEND_BUSY) == 0)
		if (has_aux_irq)
			done = wait_event_timeout(dev_priv->gmbus_wait_queue, C,
						  msecs_to_jiffies_timeout(10));
		else
			done = wait_for_atomic(C, 10) == 0;
		if (!done)
			DRM_ERROR("dp aux hw did not signal timeout (has irq: %i)!\n",
				  has_aux_irq);
	#undef C

		return status;
	}

  When there's no interrupts enabled, we end up timing out on the
  wait_event_timeout() call, which causes us to check the DP status
  register once to see if the transaction was successful or not. Since
  this adds a 10ms delay to each aux transaction, it ends up adding a
  long enough delay to the resume process for aux transactions to become
  functional again. This gave us the illusion that enabling interrupts
  had something to do with making things work again, and put me on the
  wrong track for a while.

- Interrupts occurring when we try to perform the aux transactions
  required to put the dock back into MST mode. This isn't the problem,
  as the only interrupts I've observed that come during this timeout
  period are from the snd_hda_intel driver, and disabling that driver
  doesn't appear to change the behavior at all.

- Skylake's PSR block causing issues by performing aux transactions
  while we try to bring the dock out of MST mode. Disabling PSR through
  i915's command line options doesn't seem to change the behavior
  either, nor does preventing the DMC firmware from being loaded.

Since this investigation went on for about 2 weeks, we decided it would
be better for the time being to just workaround this issue until we find
a better fix. This patch adds some behavior we want in
drm_dp_dpcd_access() anyway, since DP aux transactions aren't exactly
robust and this will probably fix quite a few other issues with DP MST
hardware not responding in time. Plus, this is something we already do
in the i915 driver with intel_dp_dpcd_read_wake(), except that that
function is somewhat of a hack and DRM helpers can't make use of it.

			    Changes since v2
- Reworked the patch again to incorporate all of the behavior of
  intel_dp_dpcd_read_wake() into drm_dp_dpcd_read() and
  drm_dp_dpcd_access()

			    Changes since v1

- Patch has been reworked to take the retry logic out of
  intel_dp_mst_resume() and into drm_dp_dpcd_access(), based off a
  suggestion from Daniel Vetter

- Commit message is much longer and gives a better description of the
  issue this was originally intended to workaround.

Signed-off-by: Lyude <cpaul@redhat.com>

squash! drm/dp_helper: retry on -ETIMEDOUT in drm_dp_dpcd_access()

squash! drm/dp_helper: retry on -ETIMEDOUT in drm_dp_dpcd_access()
---
 drivers/gpu/drm/drm_dp_helper.c | 47 +++++++++++++++++++++--------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

Comments

cpaul@redhat.com March 24, 2016, 6:27 p.m. UTC | #1
This series of patches takes all of the workarounds we used in
intel_dp_dpcd_read_wake() for working around misbehaving sinks into drm's DP
aux transaction helpers, so that they can be applied to all aux transactions
across each driver. While this patch series was intended to fix issues with the
ThinkPad T560 not bringing displays connected to it's dock back up properly
after suspend, this should fix a lot of other various DP issues by ensuring
that we retry transactions appropriately in every possible failure scenario.

				Changes since v3
- Split the patch that moves the logic out of intel_dp_dpcd_read_wake() into
  multiple patches for each workaround that we apply, so that bisecting this
  change isn't difficult in the event that this breaks something

- Made sure that drm_dp_dpcd_read() only returns the error it encountered during
  the first attempted aux transaction, since the error that following retries
  encounter might be different from the original

				Changes since v2
- Reworked the patch again to incorporate all of the behavior of
  intel_dp_dpcd_read_wake() into drm_dp_dpcd_read() and drm_dp_dpcd_access()

				Changes since v1
- Patch has been reworked to take the retry logic out of intel_dp_mst_resume()
  and into drm_dp_dpcd_access(), based off a suggestion from Daniel Vetter

- Commit message is much longer and gives a better description of the
  issue this was originally intended to workaround.

Lyude (5):
  drm/dp_helper: Increase retry interval to 1000us
  drm/dp_helper: Always wait before retrying native aux transactions
  drm/dp_helper: Retry aux transactions on all errors
  drm/dp_helper: Perform throw-away read before actual read in
    drm_dp_dpcd_read()
  drm/i915: Get rid of intel_dp_dpcd_read_wake()

 drivers/gpu/drm/drm_dp_helper.c | 55 ++++++++++++++++------------
 drivers/gpu/drm/i915/intel_dp.c | 79 ++++++++++++-----------------------------
 2 files changed, 54 insertions(+), 80 deletions(-)
diff mbox

Patch

diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
index 9535c5b..9b3426c 100644
--- a/drivers/gpu/drm/drm_dp_helper.c
+++ b/drivers/gpu/drm/drm_dp_helper.c
@@ -159,7 +159,7 @@  int drm_dp_bw_code_to_link_rate(u8 link_bw)
 }
 EXPORT_SYMBOL(drm_dp_bw_code_to_link_rate);
 
-#define AUX_RETRY_INTERVAL 500 /* us */
+#define AUX_RETRY_INTERVAL 1000 /* us */
 
 /**
  * DOC: dp helpers
@@ -177,8 +177,8 @@  static int drm_dp_dpcd_access(struct drm_dp_aux *aux, u8 request,
 			      unsigned int offset, void *buffer, size_t size)
 {
 	struct drm_dp_aux_msg msg;
-	unsigned int retry;
-	int err;
+	unsigned int retry, native_reply;
+	int ret = 0;
 
 	memset(&msg, 0, sizeof(msg));
 	msg.address = offset;
@@ -193,35 +193,29 @@  static int drm_dp_dpcd_access(struct drm_dp_aux *aux, u8 request,
 	 * sufficient, bump to 32 which makes Dell 4k monitors happier.
 	 */
 	for (retry = 0; retry < 32; retry++) {
+		if (ret != 0 && ret != -ETIMEDOUT) {
+			usleep_range(AUX_RETRY_INTERVAL,
+				     AUX_RETRY_INTERVAL + 100);
+		}
 
 		mutex_lock(&aux->hw_mutex);
-		err = aux->transfer(aux, &msg);
+		ret = aux->transfer(aux, &msg);
 		mutex_unlock(&aux->hw_mutex);
-		if (err < 0) {
-			if (err == -EBUSY)
-				continue;
-
-			return err;
-		}
-
-
-		switch (msg.reply & DP_AUX_NATIVE_REPLY_MASK) {
-		case DP_AUX_NATIVE_REPLY_ACK:
-			if (err < size)
-				return -EPROTO;
-			return err;
 
-		case DP_AUX_NATIVE_REPLY_NACK:
-			return -EIO;
+		if (ret > 0) {
+			native_reply = msg.reply & DP_AUX_NATIVE_REPLY_MASK;
+			if (native_reply == DP_AUX_NATIVE_REPLY_ACK) {
+				if (ret == size)
+					return ret;
 
-		case DP_AUX_NATIVE_REPLY_DEFER:
-			usleep_range(AUX_RETRY_INTERVAL, AUX_RETRY_INTERVAL + 100);
-			break;
+				ret = -EPROTO;
+			} else
+				ret = -EIO;
 		}
 	}
 
 	DRM_DEBUG_KMS("too many retries, giving up\n");
-	return -EIO;
+	return ret;
 }
 
 /**
@@ -241,6 +235,13 @@  static int drm_dp_dpcd_access(struct drm_dp_aux *aux, u8 request,
 ssize_t drm_dp_dpcd_read(struct drm_dp_aux *aux, unsigned int offset,
 			 void *buffer, size_t size)
 {
+	/*
+	 * Sometimes we just get the same incorrect byte repeated over the
+	 * entire buffer. Doing one throw away read initially seems to "solve"
+	 * it.
+	 */
+	drm_dp_dpcd_access(aux, DP_AUX_NATIVE_READ, DP_DPCD_REV, buffer, 1);
+
 	return drm_dp_dpcd_access(aux, DP_AUX_NATIVE_READ, offset, buffer,
 				  size);
 }