From patchwork Thu May 3 11:52:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steffen Maier X-Patchwork-Id: 10377965 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 058A160327 for ; Thu, 3 May 2018 11:55:53 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DB74B285EE for ; Thu, 3 May 2018 11:55:52 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CF03C2875E; Thu, 3 May 2018 11:55:52 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 33D2B285EE for ; Thu, 3 May 2018 11:55:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751861AbeECLxX (ORCPT ); Thu, 3 May 2018 07:53:23 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:32838 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751848AbeECLxP (ORCPT ); Thu, 3 May 2018 07:53:15 -0400 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w43BntvZ045839 for ; Thu, 3 May 2018 07:53:15 -0400 Received: from e06smtp11.uk.ibm.com (e06smtp11.uk.ibm.com [195.75.94.107]) by mx0b-001b2d01.pphosted.com with ESMTP id 2hr1mfrsmh-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 03 May 2018 07:53:14 -0400 Received: from localhost by e06smtp11.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 3 May 2018 12:53:12 +0100 Received: from b06cxnps4076.portsmouth.uk.ibm.com (9.149.109.198) by e06smtp11.uk.ibm.com (192.168.101.141) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Thu, 3 May 2018 12:53:09 +0100 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w43Br8MY4194792; Thu, 3 May 2018 11:53:08 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id AEB0F11C04A; Thu, 3 May 2018 12:44:42 +0100 (BST) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5030411C04C; Thu, 3 May 2018 12:44:42 +0100 (BST) Received: from tuxmaker.boeblingen.de.ibm.com (unknown [9.152.85.9]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTPS; Thu, 3 May 2018 12:44:42 +0100 (BST) From: Steffen Maier To: "James E . J . Bottomley" , "Martin K . Petersen" Cc: linux-scsi@vger.kernel.org, linux-s390@vger.kernel.org, Martin Schwidefsky , Heiko Carstens , Steffen Maier , Jens Remus , stable@vger.kernel.org Subject: [PATCH] zfcp: fix infinite iteration on ERP ready list Date: Thu, 3 May 2018 13:52:47 +0200 X-Mailer: git-send-email 2.13.5 X-TM-AS-GCONF: 00 x-cbid: 18050311-0040-0000-0000-00000454898A X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18050311-0041-0000-0000-000020F8AFD2 Message-Id: <20180503115247.12859-1-maier@linux.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-05-03_06:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1805030108 Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Jens Remus zfcp_erp_adapter_reopen() schedules blocking of all of the adapter's rports via zfcp_scsi_schedule_rports_block() and enqueues a reopen adapter ERP action via zfcp_erp_action_enqueue(). Both are separately processed asynchronously and concurrently. Blocking of rports is done in a kworker by zfcp_scsi_rport_work(). It calls zfcp_scsi_rport_block(), which then traces a DBF REC "scpdely" via zfcp_dbf_rec_trig(). zfcp_dbf_rec_trig() acquires the DBF REC spin lock and then iterates with list_for_each() over the adapter's ERP ready list without holding the ERP lock. This opens a race window in which the current list entry can be moved to another list, causing list_for_each() to iterate forever on the wrong list, as the erp_ready_head is never encountered as terminal condition. Meanwhile the ERP action can be processed in the ERP thread by zfcp_erp_thread(). It calls zfcp_erp_strategy(), which acquires the ERP lock and then calls zfcp_erp_action_to_running() to move the ERP action from the ready to the running list. zfcp_erp_action_to_running() can move the ERP action using list_move() just during the aforementioned race window. It then traces a REC RUN "erator1" via zfcp_dbf_rec_run(). zfcp_dbf_rec_run() tries to acquire the DBF REC spin lock. If this is held by the infinitely looping kworker, it effectively spins forever. Example Sequence Diagram: Process ERP Thread rport_work ------------------- ------------------- ------------------- zfcp_erp_adapter_reopen() zfcp_erp_adapter_block() zfcp_scsi_schedule_rports_block() lock ERP zfcp_scsi_rport_work() zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER) list_add_tail() on ready !(rport_task==RPORT_ADD) wake_up() ERP thread zfcp_scsi_rport_block() zfcp_dbf_rec_trig() zfcp_erp_strategy() zfcp_dbf_rec_trig() unlock ERP lock DBF REC zfcp_erp_wait() lock ERP | zfcp_erp_action_to_running() | list_for_each() ready | list_move() current entry | ready to running | zfcp_dbf_rec_run() endless loop over running | zfcp_dbf_rec_run_lvl() | lock DBF REC spins forever Any adapter recovery can trigger this, such as setting the device offline or reboot. V4.9 commit 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport during rport gone") introduced additional tracing of (un)blocking of rports. It missed that the adapter->erp_lock must be held when calling zfcp_dbf_rec_trig(). This fix uses the approach formerly introduced by commit aa0fec62391c ("[SCSI] zfcp: Fix sparse warning by providing new entry in dbf") that got later removed by commit ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions."). Introduce zfcp_dbf_rec_trig_lock(), a wrapper for zfcp_dbf_rec_trig() that acquires and releases the adapter->erp_lock for read. Reported-by: Sebastian Ott Signed-off-by: Jens Remus Fixes: 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport during rport gone") Cc: # 2.6.32+ Reviewed-by: Benjamin Block Signed-off-by: Steffen Maier --- James, Martin, this is an important zfcp regression fix. It would be nice if it could make it into 4.17-rcX. The patch applies to James' fixes branch or Martin's 4.17/scsi-fixes branch. Regards, Steffen drivers/s390/scsi/zfcp_dbf.c | 23 ++++++++++++++++++++++- drivers/s390/scsi/zfcp_ext.h | 5 ++++- drivers/s390/scsi/zfcp_scsi.c | 14 +++++++------- 3 files changed, 33 insertions(+), 9 deletions(-) diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c index a8b831000b2d..18c4f933e8b9 100644 --- a/drivers/s390/scsi/zfcp_dbf.c +++ b/drivers/s390/scsi/zfcp_dbf.c @@ -4,7 +4,7 @@ * * Debug traces for zfcp. * - * Copyright IBM Corp. 2002, 2017 + * Copyright IBM Corp. 2002, 2018 */ #define KMSG_COMPONENT "zfcp" @@ -308,6 +308,27 @@ void zfcp_dbf_rec_trig(char *tag, struct zfcp_adapter *adapter, spin_unlock_irqrestore(&dbf->rec_lock, flags); } +/** + * zfcp_dbf_rec_trig_lock - trace event related to triggered recovery with lock + * @tag: identifier for event + * @adapter: adapter on which the erp_action should run + * @port: remote port involved in the erp_action + * @sdev: scsi device involved in the erp_action + * @want: wanted erp_action + * @need: required erp_action + * + * The adapter->erp_lock must not be held. + */ +void zfcp_dbf_rec_trig_lock(char *tag, struct zfcp_adapter *adapter, + struct zfcp_port *port, struct scsi_device *sdev, + u8 want, u8 need) +{ + unsigned long flags; + + read_lock_irqsave(&adapter->erp_lock, flags); + zfcp_dbf_rec_trig(tag, adapter, port, sdev, want, need); + read_unlock_irqrestore(&adapter->erp_lock, flags); +} /** * zfcp_dbf_rec_run_lvl - trace event related to running recovery diff --git a/drivers/s390/scsi/zfcp_ext.h b/drivers/s390/scsi/zfcp_ext.h index bf8ea4df2bb8..e5eed8aac0ce 100644 --- a/drivers/s390/scsi/zfcp_ext.h +++ b/drivers/s390/scsi/zfcp_ext.h @@ -4,7 +4,7 @@ * * External function declarations. * - * Copyright IBM Corp. 2002, 2016 + * Copyright IBM Corp. 2002, 2018 */ #ifndef ZFCP_EXT_H @@ -35,6 +35,9 @@ extern int zfcp_dbf_adapter_register(struct zfcp_adapter *); extern void zfcp_dbf_adapter_unregister(struct zfcp_adapter *); extern void zfcp_dbf_rec_trig(char *, struct zfcp_adapter *, struct zfcp_port *, struct scsi_device *, u8, u8); +extern void zfcp_dbf_rec_trig_lock(char *tag, struct zfcp_adapter *adapter, + struct zfcp_port *port, + struct scsi_device *sdev, u8 want, u8 need); extern void zfcp_dbf_rec_run(char *, struct zfcp_erp_action *); extern void zfcp_dbf_rec_run_lvl(int level, char *tag, struct zfcp_erp_action *erp); diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c index 4d2ba5682493..22f9562f415c 100644 --- a/drivers/s390/scsi/zfcp_scsi.c +++ b/drivers/s390/scsi/zfcp_scsi.c @@ -4,7 +4,7 @@ * * Interface to Linux SCSI midlayer. * - * Copyright IBM Corp. 2002, 2017 + * Copyright IBM Corp. 2002, 2018 */ #define KMSG_COMPONENT "zfcp" @@ -618,9 +618,9 @@ static void zfcp_scsi_rport_register(struct zfcp_port *port) ids.port_id = port->d_id; ids.roles = FC_RPORT_ROLE_FCP_TARGET; - zfcp_dbf_rec_trig("scpaddy", port->adapter, port, NULL, - ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD, - ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD); + zfcp_dbf_rec_trig_lock("scpaddy", port->adapter, port, NULL, + ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD, + ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD); rport = fc_remote_port_add(port->adapter->scsi_host, 0, &ids); if (!rport) { dev_err(&port->adapter->ccw_device->dev, @@ -642,9 +642,9 @@ static void zfcp_scsi_rport_block(struct zfcp_port *port) struct fc_rport *rport = port->rport; if (rport) { - zfcp_dbf_rec_trig("scpdely", port->adapter, port, NULL, - ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL, - ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL); + zfcp_dbf_rec_trig_lock("scpdely", port->adapter, port, NULL, + ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL, + ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL); fc_remote_port_delete(rport); port->rport = NULL; }