From patchwork Wed Aug 10 16:30:46 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steffen Maier X-Patchwork-Id: 9273679 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 7E58E6022E for ; Wed, 10 Aug 2016 19:15:05 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 665782810E for ; Wed, 10 Aug 2016 19:15:05 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 58FDA28413; Wed, 10 Aug 2016 19:15:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C762E2810E for ; Wed, 10 Aug 2016 19:15:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S938760AbcHJTOv (ORCPT ); Wed, 10 Aug 2016 15:14:51 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:11779 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S938692AbcHJTOu (ORCPT ); Wed, 10 Aug 2016 15:14:50 -0400 Received: from pps.filterd (m0098399.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.11/8.16.0.11) with SMTP id u7AGYeam124555 for ; Wed, 10 Aug 2016 12:35:33 -0400 Received: from e06smtp13.uk.ibm.com (e06smtp13.uk.ibm.com [195.75.94.109]) by mx0a-001b2d01.pphosted.com with ESMTP id 24qm9sfcd7-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Wed, 10 Aug 2016 12:35:32 -0400 Received: from localhost by e06smtp13.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 10 Aug 2016 17:35:30 +0100 Received: from d06dlp01.portsmouth.uk.ibm.com (9.149.20.13) by e06smtp13.uk.ibm.com (192.168.101.143) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Wed, 10 Aug 2016 17:35:28 +0100 X-IBM-Helo: d06dlp01.portsmouth.uk.ibm.com X-IBM-MailFrom: maier@linux.vnet.ibm.com X-IBM-RcptTo: linux-s390@vger.kernel.org; linux-scsi@vger.kernel.org; stable@vger.kernel.org Received: from b06cxnps4076.portsmouth.uk.ibm.com (d06relay13.portsmouth.uk.ibm.com [9.149.109.198]) by d06dlp01.portsmouth.uk.ibm.com (Postfix) with ESMTP id 95D1017D8042; Wed, 10 Aug 2016 17:37:07 +0100 (BST) Received: from d06av03.portsmouth.uk.ibm.com (d06av03.portsmouth.uk.ibm.com [9.149.37.213]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id u7AGZRKL9699344; Wed, 10 Aug 2016 16:35:27 GMT Received: from d06av03.portsmouth.uk.ibm.com (localhost [127.0.0.1]) by d06av03.portsmouth.uk.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id u7AGZQTK022760; Wed, 10 Aug 2016 10:35:27 -0600 Received: from tuxmaker.boeblingen.de.ibm.com (tuxmaker.boeblingen.de.ibm.com [9.152.85.9]) by d06av03.portsmouth.uk.ibm.com (8.14.4/8.14.4/NCO v10.0 AVin) with ESMTP id u7AGVT6U015618 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA256 bits=256 verify=NO); Wed, 10 Aug 2016 10:35:26 -0600 From: Steffen Maier To: "James E . J . Bottomley" , "Martin K . Petersen" Cc: linux-scsi@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Heiko Carstens , Steffen Maier , "#2 . 6 . 32+" Subject: [PATCH 03/10] zfcp: close window with unblocked rport during rport gone Date: Wed, 10 Aug 2016 18:30:46 +0200 X-Mailer: git-send-email 2.6.6 In-Reply-To: <1470846653-90691-1-git-send-email-maier@linux.vnet.ibm.com> References: <1470846653-90691-1-git-send-email-maier@linux.vnet.ibm.com> X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16081016-0012-0000-0000-000004432780 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16081016-0013-0000-0000-000014FC8D58 Message-Id: <1470846653-90691-4-git-send-email-maier@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2016-08-10_13:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1604210000 definitions=main-1608100173 Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On a successful end of reopen port forced, zfcp_erp_strategy_followup_success() re-uses the port erp_action and the subsequent zfcp_erp_action_cleanup() now sees ZFCP_ERP_SUCCEEDED with erp_action->action==ZFCP_ERP_ACTION_REOPEN_PORT instead of ZFCP_ERP_ACTION_REOPEN_PORT_FORCED but must not perform zfcp_scsi_schedule_rport_register(). We can detect this because the fresh port reopen erp_action is in its very first step ZFCP_ERP_STEP_UNINITIALIZED. Otherwise this opens a time window with unblocked rport (until the followup port reopen recovery would block it again). If a scsi_cmnd timeout occurs during this time window fc_timed_out() cannot work as desired and such command would indeed time out and trigger scsi_eh. This prevents a clean and timely path failover. This should not happen if the path issue can be recovered on FC transport layer such as path issues involving RSCNs. Also, unnecessary and repeated DID_IMM_RETRY for pending and undesired new requests occur because internally zfcp still has its zfcp_port blocked. As follow-on errors with scsi_eh, it can cause, in the worst case, permanently lost paths due to one of: sd : [] Medium access timeout failure. Offlining disk! sd : Device offlined - not ready after error recovery For fix validation and to aid future debugging with other recoveries we now also trace (un)blocking of rports. Signed-off-by: Steffen Maier Fixes: 5767620c383a ("[SCSI] zfcp: Do not unblock rport from REOPEN_PORT_FORCED") Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors") Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI") Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable") Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again") Cc: #2.6.32+ Reviewed-by: Benjamin Block Reviewed-by: Hannes Reinecke --- drivers/s390/scsi/zfcp_dbf.h | 7 ++++++- drivers/s390/scsi/zfcp_erp.c | 12 +++++++++--- drivers/s390/scsi/zfcp_scsi.c | 8 +++++++- 3 files changed, 22 insertions(+), 5 deletions(-) diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h index 0be3d48681ae..7901deb4ba89 100644 --- a/drivers/s390/scsi/zfcp_dbf.h +++ b/drivers/s390/scsi/zfcp_dbf.h @@ -2,7 +2,7 @@ * zfcp device driver * debug feature declarations * - * Copyright IBM Corp. 2008, 2010 + * Copyright IBM Corp. 2008, 2015 */ #ifndef ZFCP_DBF_H @@ -17,6 +17,11 @@ #define ZFCP_DBF_INVALID_LUN 0xFFFFFFFFFFFFFFFFull +enum zfcp_dbf_pseudo_erp_act_type { + ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD = 0xff, + ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL = 0xfe, +}; + /** * struct zfcp_dbf_rec_trigger - trace record for triggered recovery action * @ready: number of ready recovery actions diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c index 3fb410977014..a59d678125bd 100644 --- a/drivers/s390/scsi/zfcp_erp.c +++ b/drivers/s390/scsi/zfcp_erp.c @@ -3,7 +3,7 @@ * * Error Recovery Procedures (ERP). * - * Copyright IBM Corp. 2002, 2010 + * Copyright IBM Corp. 2002, 2015 */ #define KMSG_COMPONENT "zfcp" @@ -1217,8 +1217,14 @@ static void zfcp_erp_action_cleanup(struct zfcp_erp_action *act, int result) break; case ZFCP_ERP_ACTION_REOPEN_PORT: - if (result == ZFCP_ERP_SUCCEEDED) - zfcp_scsi_schedule_rport_register(port); + /* This switch case might also happen after a forced reopen + * was successfully done and thus overwritten with a new + * non-forced reopen at `ersfs_2'. In this case, we must not + * do the clean-up of the non-forced version. + */ + if (act->step != ZFCP_ERP_STEP_UNINITIALIZED) + if (result == ZFCP_ERP_SUCCEEDED) + zfcp_scsi_schedule_rport_register(port); /* fall through */ case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED: put_device(&port->dev); diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c index b3c6ff49103b..9069f98a1817 100644 --- a/drivers/s390/scsi/zfcp_scsi.c +++ b/drivers/s390/scsi/zfcp_scsi.c @@ -3,7 +3,7 @@ * * Interface to Linux SCSI midlayer. * - * Copyright IBM Corp. 2002, 2013 + * Copyright IBM Corp. 2002, 2015 */ #define KMSG_COMPONENT "zfcp" @@ -556,6 +556,9 @@ static void zfcp_scsi_rport_register(struct zfcp_port *port) ids.port_id = port->d_id; ids.roles = FC_RPORT_ROLE_FCP_TARGET; + zfcp_dbf_rec_trig("scpaddy", port->adapter, port, NULL, + ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD, + ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD); rport = fc_remote_port_add(port->adapter->scsi_host, 0, &ids); if (!rport) { dev_err(&port->adapter->ccw_device->dev, @@ -577,6 +580,9 @@ static void zfcp_scsi_rport_block(struct zfcp_port *port) struct fc_rport *rport = port->rport; if (rport) { + zfcp_dbf_rec_trig("scpdely", port->adapter, port, NULL, + ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL, + ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL); fc_remote_port_delete(rport); port->rport = NULL; }