From patchwork Wed Mar 15 21:58:41 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Brian King
X-Patchwork-Id: 9626793
Subject: [PATCH 5/6] ipr: Fix SATA EH hang
To: James.Bottomley@HansenPartnership.com
Cc: martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
 wenxiong@linux.vnet.ibm.com, brking@linux.vnet.ibm.com, djeffery@redhat.com
From: Brian King
Date: Wed, 15 Mar 2017 16:58:41 -0500
Message-Id: <20170315215841.E563EC6042@b03ledav006.gho.boulder.ibm.com>
Sender: linux-scsi-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-scsi@vger.kernel.org

This patch fixes a hang that can occur in ATA EH with ipr. With ipr's
usage of libata, commands should never end up on ap->eh_done_q. The
timeout function we use for ipr, even for SATA devices, is
scsi_times_out, so ATA_QCFLAG_EH_SCHEDULED never gets set for ipr and
EH is driven completely by ipr and SCSI.

The SCSI EH thread ends up calling ipr's eh_device_reset_handler, which
then calls ata_std_error_handler. This ends up calling ipr_sata_reset,
which issues a reset to the device. This should result in all pending
commands getting failed back and having ata_qc_complete called for
them, which should end up clearing ATA_QCFLAG_FAILED as qc->flags gets
zeroed in ata_qc_free. This ensures that when we end up in
ata_eh_finish, we don't do anything more with the command.

On adapters that only support a single interrupt, and when running with
two or fewer MSI-X vectors, the adapter firmware guarantees that
responses to all outstanding commands are sent back prior to sending
the response to the SATA reset command. On newer adapters supporting
multiple HRRQs, however, this can no longer be guaranteed, since the
command responses and the reset response may be processed on different
HRRQs. If ipr returns from ipr_sata_reset before an outstanding command
has been returned, we go down the path of __ata_eh_qc_complete, which
moves the associated scsi_cmd from the work_q in
scsi_eh_bus_device_reset to ap->eh_done_q, where it sits forever and
leaves us wedged.

This patch fixes this by ensuring that any outstanding commands are
flushed before returning from eh_device_reset_handler for a SATA
device.

Reported-by: David Jeffery
Signed-off-by: Brian King
---

 drivers/scsi/ipr.c |   62 +++++++++++++++++++++++++++++++++++------------------
 1 file changed, 41 insertions(+), 21 deletions(-)

diff -puN drivers/scsi/ipr.c~ipr_fix_sata_eh_hang2 drivers/scsi/ipr.c
--- linux-2.6.git/drivers/scsi/ipr.c~ipr_fix_sata_eh_hang2	2017-03-13 16:32:37.663087624 -0500
+++ linux-2.6.git-bjking1/drivers/scsi/ipr.c	2017-03-13 16:32:37.670087596 -0500
@@ -5067,6 +5067,23 @@ static bool ipr_cmnd_is_free(struct ipr_
 }
 
 /**
+ * ipr_match_res - Match function for specified resource entry
+ * @ipr_cmd:	ipr command struct
+ * @resource:	resource entry to match
+ *
+ * Returns:
+ *	1 if command matches sdev / 0 if command does not match sdev
+ **/
+static int ipr_match_res(struct ipr_cmnd *ipr_cmd, void *resource)
+{
+	struct ipr_resource_entry *res = resource;
+
+	if (res && ipr_cmd->ioarcb.res_handle == res->res_handle)
+		return 1;
+	return 0;
+}
+
+/**
  * ipr_wait_for_ops - Wait for matching commands to complete
  * @ipr_cmd:	ipr command struct
  * @device:		device to match (sdev)
@@ -5246,7 +5263,7 @@ static int ipr_sata_reset(struct ata_lin
 	struct ipr_ioa_cfg *ioa_cfg = sata_port->ioa_cfg;
 	struct ipr_resource_entry *res;
 	unsigned long lock_flags = 0;
-	int rc = -ENXIO;
+	int rc = -ENXIO, ret;
 
 	ENTER;
 	spin_lock_irqsave(ioa_cfg->host->host_lock, lock_flags);
@@ -5260,9 +5277,19 @@ static int ipr_sata_reset(struct ata_lin
 	if (res) {
 		rc = ipr_device_reset(ioa_cfg, res);
 		*classes = res->ata_class;
-	}
+		spin_unlock_irqrestore(ioa_cfg->host->host_lock, lock_flags);
+
+		ret = ipr_wait_for_ops(ioa_cfg, res, ipr_match_res);
+		if (ret != SUCCESS) {
+			spin_lock_irqsave(ioa_cfg->host->host_lock, lock_flags);
+			ipr_initiate_ioa_reset(ioa_cfg, IPR_SHUTDOWN_ABBREV);
+			spin_unlock_irqrestore(ioa_cfg->host->host_lock, lock_flags);
+
+			wait_event(ioa_cfg->reset_wait_q, !ioa_cfg->in_reset_reload);
+		}
+	} else
+		spin_unlock_irqrestore(ioa_cfg->host->host_lock, lock_flags);
 
-	spin_unlock_irqrestore(ioa_cfg->host->host_lock, lock_flags);
 	LEAVE;
 	return rc;
 }
@@ -5291,9 +5318,6 @@ static int __ipr_eh_dev_reset(struct scs
 	ioa_cfg = (struct ipr_ioa_cfg *) scsi_cmd->device->host->hostdata;
 	res = scsi_cmd->device->hostdata;
 
-	if (!res)
-		return FAILED;
-
 	/*
 	 * If we are currently going through reset/reload, return failed. This will force the
 	 * mid-layer to call ipr_eh_host_reset, which will then go to sleep and wait for the
@@ -5332,19 +5356,6 @@ static int __ipr_eh_dev_reset(struct scs
 		spin_unlock_irq(scsi_cmd->device->host->host_lock);
 		ata_std_error_handler(ap);
 		spin_lock_irq(scsi_cmd->device->host->host_lock);
-
-		for_each_hrrq(hrrq, ioa_cfg) {
-			spin_lock(&hrrq->_lock);
-			list_for_each_entry(ipr_cmd,
-					    &hrrq->hrrq_pending_q, queue) {
-				if (ipr_cmd->ioarcb.res_handle ==
-				    res->res_handle) {
-					rc = -EIO;
-					break;
-				}
-			}
-			spin_unlock(&hrrq->_lock);
-		}
 	} else
 		rc = ipr_device_reset(ioa_cfg, res);
 	res->resetting_device = 0;
@@ -5358,15 +5369,24 @@ static int ipr_eh_dev_reset(struct scsi_
 {
 	int rc;
 	struct ipr_ioa_cfg *ioa_cfg;
+	struct ipr_resource_entry *res;
 
 	ioa_cfg = (struct ipr_ioa_cfg *) cmd->device->host->hostdata;
+	res = cmd->device->hostdata;
+
+	if (!res)
+		return FAILED;
 
 	spin_lock_irq(cmd->device->host->host_lock);
 	rc = __ipr_eh_dev_reset(cmd);
 	spin_unlock_irq(cmd->device->host->host_lock);
 
-	if (rc == SUCCESS)
-		rc = ipr_wait_for_ops(ioa_cfg, cmd->device, ipr_match_lun);
+	if (rc == SUCCESS) {
+		if (ipr_is_gata(res) && res->sata_port)
+			rc = ipr_wait_for_ops(ioa_cfg, res, ipr_match_res);
+		else
+			rc = ipr_wait_for_ops(ioa_cfg, cmd->device, ipr_match_lun);
+	}
 
 	return rc;
 }
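
For readers unfamiliar with the driver, the flush-before-return idea in this patch can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver code: every `sketch_*` name is hypothetical, the flat array stands in for the per-HRRQ pending queues, and the polling loop (which "delivers" one late response per poll) stands in for however the real ipr_wait_for_ops waits for outstanding commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's command and resource structs. */
struct sketch_cmd {
	uint32_t res_handle;	/* which device the command targets */
	bool pending;		/* true until its response is processed */
};

struct sketch_resource {
	uint32_t res_handle;
};

/* Match callback type, mirroring the (command, opaque target) shape. */
typedef int (*sketch_match_fn)(const struct sketch_cmd *cmd,
			       const void *target);

/* Same test as ipr_match_res in the patch: non-NULL target, equal handle. */
static int sketch_match_res(const struct sketch_cmd *cmd, const void *target)
{
	const struct sketch_resource *res = target;

	return res && cmd->res_handle == res->res_handle;
}

/* Count commands that are still pending and match the target. */
static size_t sketch_count_pending(const struct sketch_cmd *cmds, size_t n,
				   const void *target, sketch_match_fn match)
{
	size_t i, count = 0;

	assert(match != NULL);
	for (i = 0; i < n; i++)
		if (cmds[i].pending && match(&cmds[i], target))
			count++;
	return count;
}

/* Simulate one late response arriving for a matching command. */
static bool sketch_complete_one(struct sketch_cmd *cmds, size_t n,
				const void *target, sketch_match_fn match)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (cmds[i].pending && match(&cmds[i], target)) {
			cmds[i].pending = false;
			return true;
		}
	}
	return false;
}

/*
 * Do not report the reset as finished until no matching command is
 * pending. Returns false if the poll budget runs out first, which is
 * the case where the patch escalates to an adapter reset.
 */
static bool sketch_wait_for_ops(struct sketch_cmd *cmds, size_t n,
				const void *target, sketch_match_fn match,
				unsigned int max_polls)
{
	while (sketch_count_pending(cmds, n, target, match) > 0) {
		if (max_polls == 0)
			return false;
		max_polls--;
		sketch_complete_one(cmds, n, target, match);
	}
	return true;
}
```

A caller would treat a `false` return the way the patched ipr_sata_reset treats `ret != SUCCESS`: the outstanding commands were not flushed in time, so a heavier recovery step is needed before EH can safely continue.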