Message ID | 20250129004337.36898-3-shannon.nelson@amd.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | pds_core: fixes for adminq overflow | expand |
On Tue, Jan 28, 2025 at 04:43:37PM -0800, Shannon Nelson wrote: > From: Brett Creeley <brett.creeley@amd.com> > > If the adminq is full, the driver reports failure when trying to post > new adminq commands. This is a bit aggressive and unexpected because > technically the adminq post didn't fail in this case, it was just full. > To harden this path add support for a bounded retry mechanism. > > It's possible some commands take longer than expected, maybe hundreds > of milliseconds or seconds due to other processing on the device side, > so to further reduce the chance of failure due to adminq full increase > the PDS_CORE_DEVCMD_TIMEOUT from 5 to 10 seconds. > > The caller of pdsc_adminq_post() may still see -EAGAIN reported if the > space in the adminq never freed up. In this case they can choose to > call the function again or fail. For now, no callers will retry. > > Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands") > Signed-off-by: Brett Creeley <brett.creeley@amd.com> > Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> > --- > drivers/net/ethernet/amd/pds_core/adminq.c | 22 ++++++++++++++++++---- > include/linux/pds/pds_core_if.h | 2 +- > 2 files changed, 19 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/ethernet/amd/pds_core/adminq.c b/drivers/net/ethernet/amd/pds_core/adminq.c > index c83a0a80d533..387de1712827 100644 > --- a/drivers/net/ethernet/amd/pds_core/adminq.c > +++ b/drivers/net/ethernet/amd/pds_core/adminq.c > @@ -181,7 +181,10 @@ static int __pdsc_adminq_post(struct pdsc *pdsc, > else > avail -= q->head_idx + 1; > if (!avail) { > - ret = -ENOSPC; > + if (!pdsc_is_fw_running(pdsc)) > + ret = -ENXIO; > + else > + ret = -EAGAIN; Short if will fit nice here, anyway: Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > goto err_out_unlock; > } > > @@ -251,14 +254,25 @@ int pdsc_adminq_post(struct pdsc *pdsc, > } > > wc.qcq = &pdsc->adminqcq; > - index = __pdsc_adminq_post(pdsc, &pdsc->adminqcq, cmd, comp, &wc); > + time_start = jiffies; > + time_limit = time_start + HZ * pdsc->devcmd_timeout; > + do { > + index = __pdsc_adminq_post(pdsc, &pdsc->adminqcq, cmd, comp, > + &wc); > + if (index != -EAGAIN) > + break; > + > + dev_dbg(pdsc->dev, "Retrying adminq cmd opcode %u\n", > + cmd->opcode); > + /* Give completion processing a chance to free up space */ > + msleep(1); > + } while (time_before(jiffies, time_limit)); > + > if (index < 0) { > err = index; > goto err_out; > } > > - time_start = jiffies; > - time_limit = time_start + HZ * pdsc->devcmd_timeout; > do { > /* Timeslice the actual wait to catch IO errors etc early */ > poll_jiffies = msecs_to_jiffies(poll_interval); > diff --git a/include/linux/pds/pds_core_if.h b/include/linux/pds/pds_core_if.h > index 17a87c1a55d7..babc6d573acd 100644 > --- a/include/linux/pds/pds_core_if.h > +++ b/include/linux/pds/pds_core_if.h > @@ -22,7 +22,7 @@ > #define PDS_CORE_BAR0_INTR_CTRL_OFFSET 0x2000 > #define PDS_CORE_DEV_CMD_DONE 0x00000001 > > -#define PDS_CORE_DEVCMD_TIMEOUT 5 > +#define PDS_CORE_DEVCMD_TIMEOUT 10 > > #define PDS_CORE_CLIENT_ID 0 > #define PDS_CORE_ASIC_TYPE_CAPRI 0 > -- > 2.17.1
On Tue, 28 Jan 2025 16:43:37 -0800 Shannon Nelson wrote: > If the adminq is full, the driver reports failure when trying to post > new adminq commands. This is a bit aggressive and unexpected because > technically the adminq post didn't fail in this case, it was just full. > To harden this path add support for a bounded retry mechanism. > > It's possible some commands take longer than expected, maybe hundreds > of milliseconds or seconds due to other processing on the device side, > so to further reduce the chance of failure due to adminq full increase > the PDS_CORE_DEVCMD_TIMEOUT from 5 to 10 seconds. > > The caller of pdsc_adminq_post() may still see -EAGAIN reported if the > space in the adminq never freed up. In this case they can choose to > call the function again or fail. For now, no callers will retry. How about a semaphore? You can initialize it to the number of slots in the queue, and use down_timeout() if you want the 10 sec timeout?
diff --git a/drivers/net/ethernet/amd/pds_core/adminq.c b/drivers/net/ethernet/amd/pds_core/adminq.c index c83a0a80d533..387de1712827 100644 --- a/drivers/net/ethernet/amd/pds_core/adminq.c +++ b/drivers/net/ethernet/amd/pds_core/adminq.c @@ -181,7 +181,10 @@ static int __pdsc_adminq_post(struct pdsc *pdsc, else avail -= q->head_idx + 1; if (!avail) { - ret = -ENOSPC; + if (!pdsc_is_fw_running(pdsc)) + ret = -ENXIO; + else + ret = -EAGAIN; goto err_out_unlock; } @@ -251,14 +254,25 @@ int pdsc_adminq_post(struct pdsc *pdsc, } wc.qcq = &pdsc->adminqcq; - index = __pdsc_adminq_post(pdsc, &pdsc->adminqcq, cmd, comp, &wc); + time_start = jiffies; + time_limit = time_start + HZ * pdsc->devcmd_timeout; + do { + index = __pdsc_adminq_post(pdsc, &pdsc->adminqcq, cmd, comp, + &wc); + if (index != -EAGAIN) + break; + + dev_dbg(pdsc->dev, "Retrying adminq cmd opcode %u\n", + cmd->opcode); + /* Give completion processing a chance to free up space */ + msleep(1); + } while (time_before(jiffies, time_limit)); + if (index < 0) { err = index; goto err_out; } - time_start = jiffies; - time_limit = time_start + HZ * pdsc->devcmd_timeout; do { /* Timeslice the actual wait to catch IO errors etc early */ poll_jiffies = msecs_to_jiffies(poll_interval); diff --git a/include/linux/pds/pds_core_if.h b/include/linux/pds/pds_core_if.h index 17a87c1a55d7..babc6d573acd 100644 --- a/include/linux/pds/pds_core_if.h +++ b/include/linux/pds/pds_core_if.h @@ -22,7 +22,7 @@ #define PDS_CORE_BAR0_INTR_CTRL_OFFSET 0x2000 #define PDS_CORE_DEV_CMD_DONE 0x00000001 -#define PDS_CORE_DEVCMD_TIMEOUT 5 +#define PDS_CORE_DEVCMD_TIMEOUT 10 #define PDS_CORE_CLIENT_ID 0 #define PDS_CORE_ASIC_TYPE_CAPRI 0