Message ID | 20160524195052.19706.43009.stgit@scvm10.sc.intel.com (mailing list archive) |
---|---|
State | Changes Requested |
On Thu, May 26, 2016 at 12:17:16PM -0400, Doug Ledford wrote:
>On 05/24/2016 03:50 PM, Dennis Dalessandro wrote:
>> From: Dean Luick <dean.luick@intel.com>
>>
>> Some QSFPs do not respond within the expected time, but later
>> appear fine. Add a limited retry on the first access.
>
>10 seconds is an awful long delay period. Admittedly I didn't look
>through the sources to see if the refresh is already happening in the
>context of a delayed work queue or similar, so maybe you can ignore
>this, but if you're going to delay for 10 seconds, it should probably be
>done from a workqueue and not via msleep(2000); goto retry;.

Yeah the 10 seconds may be a bit long. I think we can go ahead and drop this
from the queue for 4.7 merge window since it is winding down. Using a
workqueue may be the right way to go but there may be some subtle issues
there I'd like to think through so rather than rush we can wait for rc or
4.8 even.

-Denny
diff --git a/drivers/infiniband/hw/hfi1/qsfp.c b/drivers/infiniband/hw/hfi1/qsfp.c
index 2441669..aa7ed23 100644
--- a/drivers/infiniband/hw/hfi1/qsfp.c
+++ b/drivers/infiniband/hw/hfi1/qsfp.c
@@ -362,6 +362,7 @@ int refresh_qsfp_cache(struct hfi1_pportdata *ppd, struct qsfp_data *cp)
 {
 	u32 target = ppd->dd->hfi1_id;
 	int ret;
+	int retry_count;
 	unsigned long flags;
 	u8 *cache = &cp->cache[0];
 
@@ -376,8 +377,23 @@ int refresh_qsfp_cache(struct hfi1_pportdata *ppd, struct qsfp_data *cp)
 		goto bail;
 	}
 
+	retry_count = 0;
+retry:
 	ret = qsfp_read(ppd, target, 0, cache, QSFP_PAGESIZE);
 	if (ret != QSFP_PAGESIZE) {
+		/*
+		 * This is the first QSFP access the driver makes.
+		 * Some QSFPs don't respond within the expected time,
+		 * but later appear fine. Retry at 2s intervals for up
+		 * to 10s.
+		 */
+		if (ret < 0 && retry_count < 5) {
+			retry_count++;
+			dd_dev_info(ppd->dd, "%s: QSFP not responding, waiting and retrying %d\n",
+				    __func__, retry_count);
+			msleep(2000);
+			goto retry;
+		}
 		dd_dev_info(ppd->dd,
 			    "%s: Page 0 read failed, expected %d, got %d\n",
 			    __func__, QSFP_PAGESIZE, ret);
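
For readers who want to check the retry arithmetic in the hunk above, here is a minimal userspace sketch of the same policy: up to 5 retries, 2 s apart, only when the read returns a negative error. It is not driver code. The struct fake_qsfp model, fake_qsfp_read(), fake_sleep_ms() and read_page0_with_retry() are hypothetical names invented for illustration, the page size of 128 is an assumed stand-in for QSFP_PAGESIZE, and the sleep is only recorded rather than performed so the worst case can be inspected without waiting.

```c
#include <errno.h>

#define FAKE_PAGESIZE   128   /* assumed stand-in for QSFP_PAGESIZE */
#define MAX_RETRIES     5     /* matches "retry_count < 5" in the patch */
#define RETRY_DELAY_MS  2000  /* matches msleep(2000) in the patch */

/* Hypothetical device model: fails a set number of reads, then succeeds. */
struct fake_qsfp {
	int failures_left; /* reads that will still return an error */
	int reads;         /* total read attempts made */
	int slept_ms;      /* total delay that would have been slept */
};

/* Stand-in for qsfp_read(): returns bytes read or a negative errno. */
static int fake_qsfp_read(struct fake_qsfp *dev, int len)
{
	dev->reads++;
	if (dev->failures_left > 0) {
		dev->failures_left--;
		return -EIO;
	}
	return len;
}

/* Stand-in for msleep(): records the delay instead of blocking. */
static void fake_sleep_ms(struct fake_qsfp *dev, int ms)
{
	dev->slept_ms += ms;
}

/* Same policy as the patch, written as a loop instead of "goto retry". */
static int read_page0_with_retry(struct fake_qsfp *dev)
{
	int retry_count = 0;
	int ret;

	for (;;) {
		ret = fake_qsfp_read(dev, FAKE_PAGESIZE);
		if (ret == FAKE_PAGESIZE)
			return 0;
		/* Only hard errors are retried; short reads fail at once. */
		if (ret < 0 && retry_count < MAX_RETRIES) {
			retry_count++;
			fake_sleep_ms(dev, RETRY_DELAY_MS);
			continue;
		}
		return ret < 0 ? ret : -EIO;
	}
}
```

With a device that never responds, this model makes six read attempts and accumulates 10000 ms of delay before giving up, which is the blocking time the review comment is concerned about.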