
[07/10] IB/hfi1: Add a retry for the first-time QSFP access

Message ID: 20160524195052.19706.43009.stgit@scvm10.sc.intel.com (mailing list archive)
State: Changes Requested

Commit Message

Dennis Dalessandro May 24, 2016, 7:50 p.m. UTC
From: Dean Luick <dean.luick@intel.com>

Some QSFPs do not respond within the expected time, but later
appear fine.  Add a limited retry on the first access.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
---
 drivers/infiniband/hw/hfi1/qsfp.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Dennis Dalessandro May 26, 2016, 5:39 p.m. UTC | #1
On Thu, May 26, 2016 at 12:17:16PM -0400, Doug Ledford wrote:
>On 05/24/2016 03:50 PM, Dennis Dalessandro wrote:
>> From: Dean Luick <dean.luick@intel.com>
>> 
>> Some QSFPs do not respond within the expected time, but later
>> appear fine.  Add a limited retry on the first access.
>
>10 seconds is an awful long delay period.  Admittedly I didn't look
>through the sources to see if the refresh is already happening in the
>context of a delayed work queue or similar, so maybe you can ignore
>this, but if you're going to delay for 10 seconds, it should probably be
>done from a workqueue and not via msleep(2000); goto retry;.

Yeah, the 10 seconds may be a bit long. I think we can go ahead and drop this
from the queue for the 4.7 merge window, since it is winding down.

Using a workqueue may be the right way to go, but there may be some subtle
issues there that I'd like to think through, so rather than rush we can wait
for an rc or even 4.8.

-Denny

Patch

diff --git a/drivers/infiniband/hw/hfi1/qsfp.c b/drivers/infiniband/hw/hfi1/qsfp.c
index 2441669..aa7ed23 100644
--- a/drivers/infiniband/hw/hfi1/qsfp.c
+++ b/drivers/infiniband/hw/hfi1/qsfp.c
@@ -362,6 +362,7 @@  int refresh_qsfp_cache(struct hfi1_pportdata *ppd, struct qsfp_data *cp)
 {
 	u32 target = ppd->dd->hfi1_id;
 	int ret;
+	int retry_count;
 	unsigned long flags;
 	u8 *cache = &cp->cache[0];
 
@@ -376,8 +377,23 @@  int refresh_qsfp_cache(struct hfi1_pportdata *ppd, struct qsfp_data *cp)
 		goto bail;
 	}
 
+	retry_count = 0;
+retry:
 	ret = qsfp_read(ppd, target, 0, cache, QSFP_PAGESIZE);
 	if (ret != QSFP_PAGESIZE) {
+		/*
+		 * This is the first QSFP access the driver makes.
+		 * Some QSFPs don't respond within the expected time,
+		 * but later appear fine.  Retry at 2s intervals for up
+		 * to 10s.
+		 */
+		if (ret < 0 && retry_count < 5) {
+			retry_count++;
+			dd_dev_info(ppd->dd, "%s: QSFP not responding, waiting and retrying %d\n",
+				    __func__, retry_count);
+			msleep(2000);
+			goto retry;
+		}
 		dd_dev_info(ppd->dd,
 			    "%s: Page 0 read failed, expected %d, got %d\n",
 			    __func__, QSFP_PAGESIZE, ret);
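
For readers following the review discussion, the retry policy in the hunk above (retry only on a negative return, at most 5 times, 2 s apart, so up to 10 s of sleeping) can be modeled outside the kernel. The following is a minimal userspace sketch, not driver code: `read_with_retry`, `flaky_read`, `record_sleep`, and `QSFP_PAGE_LEN` are hypothetical names standing in for `qsfp_read()`, `msleep()`, and `QSFP_PAGESIZE`, and the sleep is injected as a callback so the loop can be exercised without actually waiting.

```c
#include <stddef.h>
#include <string.h>

#define QSFP_PAGE_LEN   128   /* hypothetical stand-in for QSFP_PAGESIZE */
#define MAX_RETRIES     5     /* from the patch: retry_count < 5 */
#define RETRY_DELAY_MS  2000  /* from the patch: msleep(2000) */

/* Hypothetical callback types so the read and the delay can be stubbed. */
typedef int (*read_fn)(void *ctx, unsigned char *buf, int len);
typedef void (*sleep_fn)(unsigned int ms);

/*
 * Mirror of the patch's control flow, written as a loop instead of
 * "goto retry": retry only when the read returns a negative error,
 * and give up after MAX_RETRIES retries.  A short (non-negative)
 * read is returned to the caller without retrying, as in the patch.
 */
int read_with_retry(read_fn rd, sleep_fn slp, void *ctx,
                    unsigned char *buf, int len, int *retries_out)
{
    int retries = 0;
    int ret;

    for (;;) {
        ret = rd(ctx, buf, len);
        if (ret >= 0 || retries >= MAX_RETRIES)
            break;
        retries++;
        slp(RETRY_DELAY_MS);
    }
    if (retries_out)
        *retries_out = retries;
    return ret;
}

/* Stub device: fails a fixed number of times, then returns a full page. */
struct flaky_ctx {
    int failures_left;
};

int flaky_read(void *ctx, unsigned char *buf, int len)
{
    struct flaky_ctx *c = ctx;

    if (c->failures_left > 0) {
        c->failures_left--;
        return -5;  /* arbitrary negative error code for illustration */
    }
    memset(buf, 0, (size_t)len);
    return len;
}

/* Stub delay: records the requested time instead of sleeping. */
unsigned int slept_ms;

void record_sleep(unsigned int ms)
{
    slept_ms += ms;
}
```

With a stub that fails three times and then answers, `read_with_retry` returns the full page length after three recorded 2000 ms delays. A stub that never answers gives up after five retries, which is the 10 s of blocking that the review comment questions; moving the wait into a delayed work item, as suggested in the thread, would change where that time is spent but not the retry policy itself.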