From patchwork Mon Mar 26 16:35:27 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Uma Krishnan X-Patchwork-Id: 10308267 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 0966E60212 for ; Mon, 26 Mar 2018 16:35:38 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id ED30D29796 for ; Mon, 26 Mar 2018 16:35:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E1C4329798; Mon, 26 Mar 2018 16:35:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8C03F29796 for ; Mon, 26 Mar 2018 16:35:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752502AbeCZQfg (ORCPT ); Mon, 26 Mar 2018 12:35:36 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:59678 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751091AbeCZQfe (ORCPT ); Mon, 26 Mar 2018 12:35:34 -0400 Received: from pps.filterd (m0098393.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w2QGZ2ds096082 for ; Mon, 26 Mar 2018 12:35:34 -0400 Received: from e14.ny.us.ibm.com (e14.ny.us.ibm.com [129.33.205.204]) by mx0a-001b2d01.pphosted.com with ESMTP id 2gy1th8nsd-1 (version=TLSv1.2 cipher=AES256-SHA256 bits=256 verify=NOT) for ; Mon, 26 Mar 2018 12:35:33 -0400 Received: from localhost by e14.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 26 Mar 2018 12:35:32 -0400 Received: from b01cxnp22033.gho.pok.ibm.com (9.57.198.23) by e14.ny.us.ibm.com (146.89.104.201) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Mon, 26 Mar 2018 12:35:30 -0400 Received: from b01ledav005.gho.pok.ibm.com (b01ledav005.gho.pok.ibm.com [9.57.199.110]) by b01cxnp22033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w2QGZTxp45940804; Mon, 26 Mar 2018 16:35:29 GMT Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7B4EDAE03B; Mon, 26 Mar 2018 12:37:05 -0400 (EDT) Received: from p8tul1-build.aus.stglabs.ibm.com (unknown [9.3.141.206]) by b01ledav005.gho.pok.ibm.com (Postfix) with ESMTP id C4F8DAE03C; Mon, 26 Mar 2018 12:37:04 -0400 (EDT) From: Uma Krishnan To: linux-scsi@vger.kernel.org, James Bottomley , "Martin K. Petersen" , "Matthew R. Ochs" , "Manoj N. Kumar" Cc: linuxppc-dev@lists.ozlabs.org, Andrew Donnellan , Frederic Barrat , Christophe Lombard Subject: [PATCH v3 39/41] cxlflash: Synchronize reset and remove ops Date: Mon, 26 Mar 2018 11:35:27 -0500 X-Mailer: git-send-email 2.1.0 In-Reply-To: <1522081759-57431-1-git-send-email-ukrishn@linux.vnet.ibm.com> References: <1522081759-57431-1-git-send-email-ukrishn@linux.vnet.ibm.com> X-TM-AS-GCONF: 00 x-cbid: 18032616-0052-0000-0000-000002CF310C X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00008748; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000255; SDB=6.01008740; UDB=6.00513807; IPR=6.00788022; MB=3.00020251; MTD=3.00000008; XFM=3.00000015; UTC=2018-03-26 16:35:31 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18032616-0053-0000-0000-00005C1EC15C Message-Id: <1522082127-58900-1-git-send-email-ukrishn@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2018-03-26_07:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1803260169 Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The following Oops can be encountered if a device removal or system shutdown is initiated while an EEH recovery is in process: [c000000ff2f479c0] c008000015256f18 cxlflash_pci_slot_reset+0xa0/0x100 [cxlflash] [c000000ff2f47a30] c00800000dae22e0 cxl_pci_slot_reset+0x168/0x290 [cxl] [c000000ff2f47ae0] c00000000003ef1c eeh_report_reset+0xec/0x170 [c000000ff2f47b20] c00000000003d0b8 eeh_pe_dev_traverse+0x98/0x170 [c000000ff2f47bb0] c00000000003f80c eeh_handle_normal_event+0x56c/0x580 [c000000ff2f47c60] c00000000003fba4 eeh_handle_event+0x2a4/0x338 [c000000ff2f47d10] c0000000000400b8 eeh_event_handler+0x1f8/0x200 [c000000ff2f47dc0] c00000000013da48 kthread+0x1a8/0x1b0 [c000000ff2f47e30] c00000000000b528 ret_from_kernel_thread+0x5c/0xb4 The remove handler frees AFU memory while the EEH recovery is in progress, leading to a race condition. This can result in a crash if the recovery thread tries to access this memory. To resolve this issue, the cxlflash remove handler will evaluate the device state and yield to any active reset or probing threads. Signed-off-by: Uma Krishnan Acked-by: Matthew R. Ochs --- drivers/scsi/cxlflash/main.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c index 42a95b7..dfe7648 100644 --- a/drivers/scsi/cxlflash/main.c +++ b/drivers/scsi/cxlflash/main.c @@ -946,9 +946,9 @@ static void cxlflash_remove(struct pci_dev *pdev) return; } - /* If a Task Management Function is active, wait for it to complete - * before continuing with remove. - */ + /* Yield to running recovery threads before continuing with remove */ + wait_event(cfg->reset_waitq, cfg->state != STATE_RESET && + cfg->state != STATE_PROBING); spin_lock_irqsave(&cfg->tmf_slock, lock_flags); if (cfg->tmf_active) wait_event_interruptible_lock_irq(cfg->tmf_waitq,