From patchwork Mon Mar 26 16:35:34 2018
X-Patchwork-Submitter: Uma Krishnan
X-Patchwork-Id: 10308269
From: Uma Krishnan
To: linux-scsi@vger.kernel.org, James Bottomley, "Martin K. Petersen", "Matthew R. Ochs", "Manoj N. Kumar"
Cc: linuxppc-dev@lists.ozlabs.org, Andrew Donnellan, Frederic Barrat, Christophe Lombard
Subject: [PATCH v3 40/41] cxlflash: Remove commmands from pending list on timeout
Date: Mon, 26 Mar 2018 11:35:34 -0500
X-Mailer: git-send-email 2.1.0
In-Reply-To: <1522081759-57431-1-git-send-email-ukrishn@linux.vnet.ibm.com>
References: <1522081759-57431-1-git-send-email-ukrishn@linux.vnet.ibm.com>
Message-Id: <1522082134-58938-1-git-send-email-ukrishn@linux.vnet.ibm.com>

The following Oops can occur if an internal command sent to the AFU
does not complete within the timeout:

[c000000ff101b810] c008000016020d94 term_mc+0xfc/0x1b0 [cxlflash]
[c000000ff101b8a0] c008000016020fb0 term_afu+0x168/0x280 [cxlflash]
[c000000ff101b930] c0080000160232ec cxlflash_pci_error_detected+0x184/0x230 [cxlflash]
[c000000ff101b9e0] c00800000d95d468 cxl_vphb_error_detected+0x90/0x150[cxl]
[c000000ff101ba20] c00800000d95f27c cxl_pci_error_detected+0xa4/0x240 [cxl]
[c000000ff101bac0] c00000000003eaf8 eeh_report_error+0xd8/0x1b0
[c000000ff101bb20] c00000000003d0b8 eeh_pe_dev_traverse+0x98/0x170
[c000000ff101bbb0] c00000000003f438 eeh_handle_normal_event+0x198/0x580
[c000000ff101bc60] c00000000003fba4 eeh_handle_event+0x2a4/0x338
[c000000ff101bd10] c0000000000400b8 eeh_event_handler+0x1f8/0x200
[c000000ff101bdc0] c00000000013da48 kthread+0x1a8/0x1b0
[c000000ff101be30] c00000000000b528 ret_from_kernel_thread+0x5c/0xb4

When an internal command times out, the command buffer is freed while
it is still in the pending commands list of the context. This corrupts
the list and when the context is cleaned up, a crash is encountered.

To resolve this issue, when an AFU command or TMF command times out,
the command should be deleted from the hardware queue pending command
list before freeing the buffer.

Signed-off-by: Uma Krishnan
Acked-by: Matthew R. Ochs
---
 drivers/scsi/cxlflash/main.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index dfe7648..c920328 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -473,6 +473,7 @@ static int send_tmf(struct cxlflash_cfg *cfg, struct scsi_device *sdev,
 	struct afu_cmd *cmd = NULL;
 	struct device *dev = &cfg->dev->dev;
 	struct hwq *hwq = get_hwq(afu, PRIMARY_HWQ);
+	bool needs_deletion = false;
 	char *buf = NULL;
 	ulong lock_flags;
 	int rc = 0;
@@ -527,6 +528,7 @@ static int send_tmf(struct cxlflash_cfg *cfg, struct scsi_device *sdev,
 	if (!to) {
 		dev_err(dev, "%s: TMF timed out\n", __func__);
 		rc = -ETIMEDOUT;
+		needs_deletion = true;
 	} else if (cmd->cmd_aborted) {
 		dev_err(dev, "%s: TMF aborted\n", __func__);
 		rc = -EAGAIN;
@@ -537,6 +539,12 @@ static int send_tmf(struct cxlflash_cfg *cfg, struct scsi_device *sdev,
 	}
 	cfg->tmf_active = false;
 	spin_unlock_irqrestore(&cfg->tmf_slock, lock_flags);
+
+	if (needs_deletion) {
+		spin_lock_irqsave(&hwq->hsq_slock, lock_flags);
+		list_del(&cmd->list);
+		spin_unlock_irqrestore(&hwq->hsq_slock, lock_flags);
+	}
 out:
 	kfree(buf);
 	return rc;
@@ -2284,6 +2292,7 @@ static int send_afu_cmd(struct afu *afu, struct sisl_ioarcb *rcb)
 	struct device *dev = &cfg->dev->dev;
 	struct afu_cmd *cmd = NULL;
 	struct hwq *hwq = get_hwq(afu, PRIMARY_HWQ);
+	ulong lock_flags;
 	char *buf = NULL;
 	int rc = 0;
 	int nretry = 0;
@@ -2329,6 +2338,11 @@ static int send_afu_cmd(struct afu *afu, struct sisl_ioarcb *rcb)
 	case -ETIMEDOUT:
 		rc = afu->context_reset(hwq);
 		if (rc) {
+			/* Delete the command from pending_cmds list */
+			spin_lock_irqsave(&hwq->hsq_slock, lock_flags);
+			list_del(&cmd->list);
+			spin_unlock_irqrestore(&hwq->hsq_slock, lock_flags);
+
 			cxlflash_schedule_async_reset(cfg);
 			break;
 		}
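
For readers unfamiliar with the failure mode, the unlink-before-free
ordering described above can be sketched in user-space C. This is only
an illustration under stated assumptions, not driver code: the
demo_hwq, demo_cmd and demo_* names are hypothetical, the list helpers
merely imitate the kernel's list_head, and a C11 atomic_flag stands in
for the hsq_slock spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, imitating the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail_node(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del_node(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = NULL;
}

/* Hypothetical stand-ins for struct hwq and struct afu_cmd. */
struct demo_hwq {
	atomic_flag lock;		/* plays the role of hsq_slock */
	struct list_node pending_cmds;
};

struct demo_cmd {
	struct list_node list;		/* first member: node address == cmd address */
	int id;
};

static void demo_lock(struct demo_hwq *q)
{
	while (atomic_flag_test_and_set(&q->lock))
		;			/* spin until the flag is released */
}

static void demo_unlock(struct demo_hwq *q)
{
	atomic_flag_clear(&q->lock);
}

static void demo_hwq_init(struct demo_hwq *q)
{
	atomic_flag_clear(&q->lock);
	list_init(&q->pending_cmds);
}

/* Allocate a command and queue it on the pending list. */
static struct demo_cmd *demo_submit(struct demo_hwq *q, int id)
{
	struct demo_cmd *cmd = malloc(sizeof(*cmd));

	if (!cmd)
		return NULL;
	cmd->id = id;
	demo_lock(q);
	list_add_tail_node(&cmd->list, &q->pending_cmds);
	demo_unlock(q);
	return cmd;
}

/* Timeout path: unlink under the lock first, free only afterwards. */
static void demo_timeout(struct demo_hwq *q, struct demo_cmd *cmd)
{
	demo_lock(q);
	list_del_node(&cmd->list);
	demo_unlock(q);
	free(cmd);
}

/* Teardown walks the list, so every node on it must still be live memory. */
static size_t demo_flush(struct demo_hwq *q)
{
	size_t flushed = 0;

	demo_lock(q);
	while (q->pending_cmds.next != &q->pending_cmds) {
		struct list_node *node = q->pending_cmds.next;

		list_del_node(node);
		free((struct demo_cmd *)node);
		flushed++;
	}
	demo_unlock(q);
	return flushed;
}

/* Two commands are queued, one times out, teardown flushes the other. */
static size_t demo_run(void)
{
	struct demo_hwq q;
	struct demo_cmd *a, *b;

	demo_hwq_init(&q);
	a = demo_submit(&q, 1);
	b = demo_submit(&q, 2);
	if (!a || !b)
		return 0;
	demo_timeout(&q, a);
	return demo_flush(&q);
}
```

If demo_timeout() called free() without the list_del_node() step,
demo_flush() would later dereference freed memory while walking
pending_cmds, which is the user-space analogue of the crash in
term_mc() shown in the trace.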