
[v2,4/6] cxlflash: Fix to resolve cmd leak after host reset

Message ID 1450127222-48145-1-git-send-email-ukrishn@linux.vnet.ibm.com
State Accepted, archived

Commit Message

Uma Krishnan Dec. 14, 2015, 9:07 p.m. UTC
From: Manoj Kumar <manoj@linux.vnet.ibm.com>

After a few iterations of resetting the card, either during EEH
recovery or a host_reset, the following is seen in the logs:
cxlflash 0008:00: cxlflash_queuecommand: could not get a free command

On every reset of the card, any outstanding commands are leaked,
because nothing reaps them.  A few more resets later, the above error
message floods the logs and the card is rendered unusable, as no free
commands remain.

Iterating through the 'cmd' queue and printing the 'free' counter
showed that on each reset certain commands were in use and stayed in
use through subsequent resets.

To resolve this issue, when the card is reset, reap all the commands
that are active/outstanding.

Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
---
 drivers/scsi/cxlflash/main.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Andrew Donnellan Dec. 15, 2015, 2:45 a.m. UTC | #1
On 15/12/15 08:07, Uma Krishnan wrote:

Looks reasonable.

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Patch

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 35a3202..ac39856 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -632,15 +632,30 @@  static void free_mem(struct cxlflash_cfg *cfg)
  * @cfg:	Internal structure associated with the host.
  *
  * Safe to call with AFU in a partially allocated/initialized state.
+ *
+ * Cleans up all state associated with the command queue, and unmaps
+ * the MMIO space.
+ *
+ *  - complete() will take care of commands we initiated (they'll be checked
+ *  in as part of the cleanup that occurs after the completion)
+ *
+ *  - cmd_checkin() will take care of entries that we did not initiate and that
+ *  have not (and will not) complete because they are sitting on a [now stale]
+ *  hardware queue
  */
 static void stop_afu(struct cxlflash_cfg *cfg)
 {
 	int i;
 	struct afu *afu = cfg->afu;
+	struct afu_cmd *cmd;
 
 	if (likely(afu)) {
-		for (i = 0; i < CXLFLASH_NUM_CMDS; i++)
-			complete(&afu->cmd[i].cevent);
+		for (i = 0; i < CXLFLASH_NUM_CMDS; i++) {
+			cmd = &afu->cmd[i];
+			complete(&cmd->cevent);
+			if (!atomic_read(&cmd->free))
+				cmd_checkin(cmd);
+		}
 
 		if (likely(afu->afu_map)) {
 			cxl_psa_unmap((void __iomem *)afu->afu_map);