From patchwork Tue Mar 5 10:58:41 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Saurav Kashyap
X-Patchwork-Id: 10839273
From: Saurav Kashyap
Subject: [PATCH 06/26] qedf: Modify abort and tmf handler to handle edge condition and flush.
Date: Tue, 5 Mar 2019 02:58:41 -0800
Message-ID: <20190305105901.13185-7-skashyap@marvell.com>
X-Mailer: git-send-email 2.12.0
In-Reply-To: <20190305105901.13185-1-skashyap@marvell.com>
References: <20190305105901.13185-1-skashyap@marvell.com>
Sender: linux-scsi-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-scsi@vger.kernel.org

An I/O can be in any state when flush is called: in abort, waiting for
abort, RRQ sent and waiting, or TMF sent.

- HZ can differ between architectures; set the abort timeout value
  correctly.
- Flush can complete I/Os prematurely; handle the refcount for aborted
  I/Os and for I/Os that have an RRQ pending.
- Differentiate LUN and TARGET reset, as cleanup needs to be sent to the
  firmware accordingly.
- Add a flush mutex to synchronize cleanup calls from the abort and flush
  routines.
- Clear the abort/outstanding bit on timeout.

Signed-off-by: Shyam Sundar
Signed-off-by: Chad Dupuis
Signed-off-by: Saurav Kashyap
---
 drivers/scsi/qedf/qedf.h      |  10 ++-
 drivers/scsi/qedf/qedf_els.c  |  31 +++++++-
 drivers/scsi/qedf/qedf_io.c   | 178 +++++++++++++++++++++++++++++++++++-------
 drivers/scsi/qedf/qedf_main.c | 124 +++++++++++++++++++++--------
 4 files changed, 276 insertions(+), 67 deletions(-)

diff --git a/drivers/scsi/qedf/qedf.h b/drivers/scsi/qedf/qedf.h
index 787cc12..9e5e183 100644
--- a/drivers/scsi/qedf/qedf.h
+++ b/drivers/scsi/qedf/qedf.h
@@ -49,8 +49,8 @@
 	sizeof(struct fc_frame_header))
 #define QEDF_MAX_NPIV		64
 #define QEDF_TM_TIMEOUT		10
-#define QEDF_ABORT_TIMEOUT	10
-#define QEDF_CLEANUP_TIMEOUT	10
+#define QEDF_ABORT_TIMEOUT	(10 * 1000)
+#define QEDF_CLEANUP_TIMEOUT	1
 #define QEDF_MAX_CDB_LEN	16

 #define UPSTREAM_REMOVE		1
@@ -82,6 +82,7 @@ struct qedf_els_cb_arg {
 };

 enum qedf_ioreq_event {
+	QEDF_IOREQ_EV_NONE,
 	QEDF_IOREQ_EV_ABORT_SUCCESS,
 	QEDF_IOREQ_EV_ABORT_FAILED,
 	QEDF_IOREQ_EV_SEND_RRQ,
@@ -182,7 +183,10 @@ struct qedf_rport {
 #define QEDF_RPORT_SESSION_READY	1
 #define QEDF_RPORT_UPLOADING_CONNECTION	2
 #define QEDF_RPORT_IN_RESET		3
+#define QEDF_RPORT_IN_LUN_RESET		4
+#define QEDF_RPORT_IN_TARGET_RESET	5
 	unsigned long flags;
+	int lun_reset_lun;
 	unsigned long retry_delay_timestamp;
 	struct fc_rport *rport;
 	struct fc_rport_priv *rdata;
@@ -395,6 +399,8 @@ struct qedf_ctx {
 	u8 target_resets;
 	u8 task_set_fulls;
 	u8 busy;
+	/* Used for flush routine */
+	struct mutex flush_mutex;
 };

 struct io_bdt {
diff --git a/drivers/scsi/qedf/qedf_els.c b/drivers/scsi/qedf/qedf_els.c
index a60819b..ac2bfc2 100644
--- a/drivers/scsi/qedf/qedf_els.c
+++ b/drivers/scsi/qedf/qedf_els.c
@@ -201,8 +201,12 @@ static void qedf_rrq_compl(struct qedf_els_cb_arg *cb_arg)
 	    " orig xid = 0x%x, rrq_xid = 0x%x, refcount=%d\n",
 	    orig_io_req, orig_io_req->xid, rrq_req->xid, refcount);

-	/* This should return the aborted io_req to the command pool */
-	if (orig_io_req)
+	/*
+	 * This should return the aborted io_req to the command pool. Note that
+	 * we need to check the refcound in case the original request was
+	 * flushed but we get a completion on this xid.
+	 */
+	if (orig_io_req && refcount > 0)
 		kref_put(&orig_io_req->refcount, qedf_release_cmd);

 out_free:
@@ -229,6 +233,7 @@ int qedf_send_rrq(struct qedf_ioreq *aborted_io_req)
 	uint32_t sid;
 	uint32_t r_a_tov;
 	int rc;
+	int refcount;

 	if (!aborted_io_req) {
 		QEDF_ERR(NULL, "abort_io_req is NULL.\n");
@@ -237,6 +242,15 @@ int qedf_send_rrq(struct qedf_ioreq *aborted_io_req)

 	fcport = aborted_io_req->fcport;

+	if (!fcport) {
+		refcount = kref_read(&aborted_io_req->refcount);
+		QEDF_ERR(NULL,
+			 "RRQ work was queued prior to a flush xid=0x%x, refcount=%d.\n",
+			 aborted_io_req->xid, refcount);
+		kref_put(&aborted_io_req->refcount, qedf_release_cmd);
+		return -EINVAL;
+	}
+
 	/* Check that fcport is still offloaded */
 	if (!test_bit(QEDF_RPORT_SESSION_READY, &fcport->flags)) {
 		QEDF_ERR(NULL, "fcport is no longer offloaded.\n");
@@ -249,6 +263,19 @@ int qedf_send_rrq(struct qedf_ioreq *aborted_io_req)
 	}

 	qedf = fcport->qedf;
+
+	/*
+	 * Sanity check that we can send a RRQ to make sure that refcount isn't
+	 * 0
+	 */
+	refcount = kref_read(&aborted_io_req->refcount);
+	if (refcount != 1) {
+		QEDF_INFO(&qedf->dbg_ctx, QEDF_LOG_ELS,
+			  "refcount for xid=%x io_req=%p refcount=%d is not 1.\n",
+			  aborted_io_req->xid, aborted_io_req, refcount);
+		return -EINVAL;
+	}
+
 	lport = qedf->lport;
 	sid = fcport->sid;
 	r_a_tov = lport->r_a_tov;
diff --git a/drivers/scsi/qedf/qedf_io.c b/drivers/scsi/qedf/qedf_io.c
index 3a8c7f7..044ef63 100644
--- a/drivers/scsi/qedf/qedf_io.c
+++ b/drivers/scsi/qedf/qedf_io.c
@@ -43,8 +43,9 @@ static void qedf_cmd_timeout(struct work_struct *work)
 	switch (io_req->cmd_type) {
 	case QEDF_ABTS:
 		if (qedf == NULL) {
-			QEDF_INFO(NULL, QEDF_LOG_IO, "qedf is NULL for xid=0x%x.\n",
-			    io_req->xid);
+			QEDF_INFO(NULL, QEDF_LOG_IO,
+				  "qedf is NULL for ABTS xid=0x%x.\n",
+				  io_req->xid);
 			return;
 		}

@@ -61,6 +62,9 @@ static void qedf_cmd_timeout(struct work_struct *work)
 		 */
 		kref_put(&io_req->refcount, qedf_release_cmd);

+		/* Clear in abort bit now that we're done with the command */
+		clear_bit(QEDF_CMD_IN_ABORT, &io_req->flags);
+
 		/*
 		 * Now that the original I/O and the ABTS are complete see
 		 * if we need to reconnect to the target.
@@ -68,6 +72,15 @@ static void qedf_cmd_timeout(struct work_struct *work)
 		qedf_restart_rport(fcport);
 		break;
 	case QEDF_ELS:
+		if (!qedf) {
+			QEDF_INFO(NULL, QEDF_LOG_IO,
+				  "qedf is NULL for ELS xid=0x%x.\n",
+				  io_req->xid);
+			return;
+		}
+		/* ELS request no longer outstanding since it timed out */
+		clear_bit(QEDF_CMD_OUTSTANDING, &io_req->flags);
+
 		kref_get(&io_req->refcount);
 		/*
 		 * Don't attempt to clean an ELS timeout as any subseqeunt
@@ -1137,6 +1150,19 @@ void qedf_scsi_completion(struct qedf_ctx *qedf, struct fcoe_cqe *cqe,

 	fcport = io_req->fcport;

+	/*
+	 * When flush is active, let the cmds be completed from the cleanup
+	 * context
+	 */
+	if (test_bit(QEDF_RPORT_IN_TARGET_RESET, &fcport->flags) ||
+	    (test_bit(QEDF_RPORT_IN_LUN_RESET, &fcport->flags) &&
+	     sc_cmd->device->lun == (u64)fcport->lun_reset_lun)) {
+		QEDF_INFO(&qedf->dbg_ctx, QEDF_LOG_IO,
+			  "Dropping good completion xid=0x%x as fcport is flushing",
+			  io_req->xid);
+		return;
+	}
+
 	qedf_parse_fcp_rsp(io_req, fcp_rsp);

 	qedf_unmap_sg_list(qedf, io_req);
@@ -1720,15 +1746,23 @@ int qedf_initiate_abts(struct qedf_ioreq *io_req, bool return_scsi_cmd_on_abts)
 	unsigned long flags;
 	struct fcoe_wqe *sqe;
 	u16 sqe_idx;
+	int refcount = 0;

 	/* Sanity check qedf_rport before dereferencing any pointers */
 	if (!test_bit(QEDF_RPORT_SESSION_READY, &fcport->flags)) {
 		QEDF_ERR(NULL, "tgt not offloaded\n");
 		rc = 1;
-		goto abts_err;
+		goto out;
 	}

 	rdata = fcport->rdata;
+
+	if (!rdata || !kref_get_unless_zero(&rdata->kref)) {
+		QEDF_ERR(&qedf->dbg_ctx, "stale rport\n");
+		rc = 1;
+		goto out;
+	}
+
 	r_a_tov = rdata->r_a_tov;
 	qedf = fcport->qedf;
 	lport = qedf->lport;
@@ -1736,20 +1770,20 @@ int qedf_initiate_abts(struct qedf_ioreq *io_req, bool return_scsi_cmd_on_abts)
 	if (lport->state != LPORT_ST_READY || !(lport->link_up)) {
 		QEDF_ERR(&(qedf->dbg_ctx), "link is not ready\n");
 		rc = 1;
-		goto abts_err;
+		goto out;
 	}

 	if (atomic_read(&qedf->link_down_tmo_valid) > 0) {
 		QEDF_ERR(&(qedf->dbg_ctx), "link_down_tmo active.\n");
 		rc = 1;
-		goto abts_err;
+		goto out;
 	}

 	/* Ensure room on SQ */
 	if (!atomic_read(&fcport->free_sqes)) {
 		QEDF_ERR(&(qedf->dbg_ctx), "No SQ entries available\n");
 		rc = 1;
-		goto abts_err;
+		goto out;
 	}

 	if (test_bit(QEDF_RPORT_UPLOADING_CONNECTION, &fcport->flags)) {
@@ -1774,18 +1808,17 @@ int qedf_initiate_abts(struct qedf_ioreq *io_req, bool return_scsi_cmd_on_abts)
 	qedf->control_requests++;
 	qedf->packet_aborts++;

-	/* Set the return CPU to be the same as the request one */
-	io_req->cpu = smp_processor_id();
-
 	/* Set the command type to abort */
 	io_req->cmd_type = QEDF_ABTS;
 	io_req->return_scsi_cmd_on_abts = return_scsi_cmd_on_abts;

 	set_bit(QEDF_CMD_IN_ABORT, &io_req->flags);
-	QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_SCSI_TM, "ABTS io_req xid = "
-		   "0x%x\n", xid);
+	refcount = kref_read(&io_req->refcount);
+	QEDF_INFO(&qedf->dbg_ctx, QEDF_LOG_SCSI_TM,
+		  "ABTS io_req xid = 0x%x refcount=%d\n",
+		  xid, refcount);

-	qedf_cmd_timer_set(qedf, io_req, QEDF_ABORT_TIMEOUT * HZ);
+	qedf_cmd_timer_set(qedf, io_req, QEDF_ABORT_TIMEOUT);

 	spin_lock_irqsave(&fcport->rport_lock, flags);

@@ -1799,13 +1832,6 @@ int qedf_initiate_abts(struct qedf_ioreq *io_req, bool return_scsi_cmd_on_abts)

 	spin_unlock_irqrestore(&fcport->rport_lock, flags);

-	return rc;
-abts_err:
-	/*
-	 * If the ABTS task fails to queue then we need to cleanup the
-	 * task at the firmware.
-	 */
-	qedf_initiate_cleanup(io_req, return_scsi_cmd_on_abts);
 out:
 	return rc;
 }
@@ -1815,25 +1841,59 @@ void qedf_process_abts_compl(struct qedf_ctx *qedf, struct fcoe_cqe *cqe,
 {
 	uint32_t r_ctl;
 	uint16_t xid;
+	int rc;
+	struct qedf_rport *fcport = io_req->fcport;

 	QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_SCSI_TM, "Entered with xid = "
 		   "0x%x cmd_type = %d\n", io_req->xid, io_req->cmd_type);

-	cancel_delayed_work(&io_req->timeout_work);
-
 	xid = io_req->xid;
 	r_ctl = cqe->cqe_info.abts_info.r_ctl;

+	/* This was added at a point when we were scheduling abts_compl &
+	 * cleanup_compl on different CPUs and there was a possibility of
+	 * the io_req to be freed from the other context before we got here.
+	 */
+	if (!fcport) {
+		QEDF_INFO(&qedf->dbg_ctx, QEDF_LOG_IO,
+			  "Dropping ABTS completion xid=0x%x as fcport is NULL",
+			  io_req->xid);
+		return;
+	}
+
+	/*
+	 * When flush is active, let the cmds be completed from the cleanup
+	 * context
+	 */
+	if (test_bit(QEDF_RPORT_IN_TARGET_RESET, &fcport->flags) ||
+	    test_bit(QEDF_RPORT_IN_LUN_RESET, &fcport->flags)) {
+		QEDF_INFO(&qedf->dbg_ctx, QEDF_LOG_IO,
+			  "Dropping ABTS completion xid=0x%x as fcport is flushing",
+			  io_req->xid);
+		return;
+	}
+
+	if (!cancel_delayed_work(&io_req->timeout_work)) {
+		QEDF_ERR(&qedf->dbg_ctx,
+			 "Wasn't able to cancel abts timeout work.\n");
+	}
+
 	switch (r_ctl) {
 	case FC_RCTL_BA_ACC:
 		QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_SCSI_TM,
 		    "ABTS response - ACC Send RRQ after R_A_TOV\n");
 		io_req->event = QEDF_IOREQ_EV_ABORT_SUCCESS;
+		rc = kref_get_unless_zero(&io_req->refcount);
+		if (!rc) {
+			QEDF_INFO(&qedf->dbg_ctx, QEDF_LOG_SCSI_TM,
+				  "kref is already zero so ABTS was already completed or flushed xid=0x%x.\n",
+				  io_req->xid);
+			return;
+		}
 		/*
 		 * Dont release this cmd yet. It will be relesed
 		 * after we get RRQ response
 		 */
-		kref_get(&io_req->refcount);
 		queue_delayed_work(qedf->dpc_wq, &io_req->rrq_work,
 		    msecs_to_jiffies(qedf->lport->r_a_tov));
 		break;
@@ -2106,6 +2166,7 @@ static int qedf_execute_tmf(struct qedf_rport *fcport, struct scsi_cmnd *sc_cmd,
 	int rc = 0;
 	uint16_t xid;
 	int tmo = 0;
+	int lun = 0;
 	unsigned long flags;
 	struct fcoe_wqe *sqe;
 	u16 sqe_idx;
@@ -2115,6 +2176,7 @@ static int qedf_execute_tmf(struct qedf_rport *fcport, struct scsi_cmnd *sc_cmd,
 		return FAILED;
 	}

+	lun = (int)sc_cmd->device->lun;
 	if (!test_bit(QEDF_RPORT_SESSION_READY, &fcport->flags)) {
 		QEDF_ERR(&(qedf->dbg_ctx), "fcport not offloaded\n");
 		rc = FAILED;
@@ -2141,7 +2203,7 @@ static int qedf_execute_tmf(struct qedf_rport *fcport, struct scsi_cmnd *sc_cmd,
 	io_req->fcport = fcport;
 	io_req->cmd_type = QEDF_TASK_MGMT_CMD;

-	/* Set the return CPU to be the same as the request one */
+	/* Record which cpu this request is associated with */
 	io_req->cpu = smp_processor_id();

 	/* Set TM flags */
@@ -2150,7 +2212,7 @@ static int qedf_execute_tmf(struct qedf_rport *fcport, struct scsi_cmnd *sc_cmd,
 	io_req->tm_flags = tm_flags;

 	/* Default is to return a SCSI command when an error occurs */
-	io_req->return_scsi_cmd_on_abts = true;
+	io_req->return_scsi_cmd_on_abts = false;

 	/* Obtain exchange id */
 	xid = io_req->xid;
@@ -2174,12 +2236,16 @@ static int qedf_execute_tmf(struct qedf_rport *fcport, struct scsi_cmnd *sc_cmd,

 	spin_unlock_irqrestore(&fcport->rport_lock, flags);

+	set_bit(QEDF_CMD_OUTSTANDING, &io_req->flags);
 	tmo = wait_for_completion_timeout(&io_req->tm_done,
 	    QEDF_TM_TIMEOUT * HZ);

 	if (!tmo) {
 		rc = FAILED;
 		QEDF_ERR(&(qedf->dbg_ctx), "wait for tm_cmpl timeout!\n");
+		/* Clear outstanding bit since command timed out */
+		clear_bit(QEDF_CMD_OUTSTANDING, &io_req->flags);
+		io_req->sc_cmd = NULL;
 	} else {
 		/* Check TMF response code */
 		if (io_req->fcp_rsp_code == 0)
@@ -2187,14 +2253,25 @@ static int qedf_execute_tmf(struct qedf_rport *fcport, struct scsi_cmnd *sc_cmd,
 		else
 			rc = FAILED;
 	}
+	/*
+	 * Double check that fcport has not gone into an uploading state before
+	 * executing the command flush for the LUN/target.
+	 */
+	if (test_bit(QEDF_RPORT_UPLOADING_CONNECTION, &fcport->flags)) {
+		QEDF_ERR(&qedf->dbg_ctx,
+			 "fcport is uploading, not executing flush.\n");
+		goto no_flush;
+	}
+	/* We do not need this io_req any more */
+	kref_put(&io_req->refcount, qedf_release_cmd);
+

 	if (tm_flags == FCP_TMF_LUN_RESET)
-		qedf_flush_active_ios(fcport, (int)sc_cmd->device->lun);
+		qedf_flush_active_ios(fcport, lun);
 	else
 		qedf_flush_active_ios(fcport, -1);

-	kref_put(&io_req->refcount, qedf_release_cmd);
-
+no_flush:
 	if (rc != SUCCESS) {
 		QEDF_ERR(&(qedf->dbg_ctx), "task mgmt command failed...\n");
 		rc = FAILED;
@@ -2215,22 +2292,57 @@ int qedf_initiate_tmf(struct scsi_cmnd *sc_cmd, u8 tm_flags)
 	struct fc_lport *lport;
 	int rc = SUCCESS;
 	int rval;
+	struct qedf_ioreq *io_req = NULL;
+	int ref_cnt = 0;
+	struct fc_rport_priv *rdata = fcport->rdata;

-	rval = fc_remote_port_chkready(rport);
+	QEDF_ERR(NULL,
+		 "tm_flags 0x%x sc_cmd %p op = 0x%02x target_id = 0x%x lun=%d\n",
+		 tm_flags, sc_cmd, sc_cmd->cmnd[0], rport->scsi_target_id,
+		 (int)sc_cmd->device->lun);

+	if (!rdata || !kref_get_unless_zero(&rdata->kref)) {
+		QEDF_ERR(NULL, "stale rport\n");
+		return FAILED;
+	}
+
+	QEDF_ERR(NULL, "portid=%06x tm_flags =%s\n", rdata->ids.port_id,
+		 (tm_flags == FCP_TMF_TGT_RESET) ? "TARGET RESET" :
+		 "LUN RESET");
+
+	if (sc_cmd->SCp.ptr) {
+		io_req = (struct qedf_ioreq *)sc_cmd->SCp.ptr;
+		ref_cnt = kref_read(&io_req->refcount);
+		QEDF_ERR(NULL,
+			 "orig io_req = %p xid = 0x%x ref_cnt = %d.\n",
+			 io_req, io_req->xid, ref_cnt);
+	}
+
+	rval = fc_remote_port_chkready(rport);
 	if (rval) {
 		QEDF_ERR(NULL, "device_reset rport not ready\n");
 		rc = FAILED;
 		goto tmf_err;
 	}

-	if (fcport == NULL) {
+	rc = fc_block_scsi_eh(sc_cmd);
+	if (rc)
+		return rc;
+
+	if (!fcport) {
 		QEDF_ERR(NULL, "device_reset: rport is NULL\n");
 		rc = FAILED;
 		goto tmf_err;
 	}

 	qedf = fcport->qedf;
+
+	if (!qedf) {
+		QEDF_ERR(NULL, "qedf is NULL.\n");
+		rc = FAILED;
+		goto tmf_err;
+	}
+
 	lport = qedf->lport;

 	if (test_bit(QEDF_UNLOADING, &qedf->flags) ||
@@ -2245,6 +2357,12 @@ int qedf_initiate_tmf(struct scsi_cmnd *sc_cmd, u8 tm_flags)
 		goto tmf_err;
 	}

+	if (test_bit(QEDF_RPORT_UPLOADING_CONNECTION, &fcport->flags)) {
+		QEDF_ERR(&qedf->dbg_ctx, "fcport is uploading.\n");
+		rc = FAILED;
+		goto tmf_err;
+	}
+
 	rc = qedf_execute_tmf(fcport, sc_cmd, tm_flags);

 tmf_err:
@@ -2256,6 +2374,8 @@ void qedf_process_tmf_compl(struct qedf_ctx *qedf, struct fcoe_cqe *cqe,
 {
 	struct fcoe_cqe_rsp_info *fcp_rsp;

+	clear_bit(QEDF_CMD_OUTSTANDING, &io_req->flags);
+
 	fcp_rsp = &cqe->cqe_info.rsp_info;
 	qedf_parse_fcp_rsp(io_req, fcp_rsp);

diff --git a/drivers/scsi/qedf/qedf_main.c b/drivers/scsi/qedf/qedf_main.c
index 8affe0e..3fd8107 100644
--- a/drivers/scsi/qedf/qedf_main.c
+++ b/drivers/scsi/qedf/qedf_main.c
@@ -615,50 +615,113 @@ static u32 qedf_get_login_failures(void *cookie)
 static int qedf_eh_abort(struct scsi_cmnd *sc_cmd)
 {
 	struct fc_rport *rport = starget_to_rport(scsi_target(sc_cmd->device));
-	struct fc_rport_libfc_priv *rp = rport->dd_data;
-	struct qedf_rport *fcport;
 	struct fc_lport *lport;
 	struct qedf_ctx *qedf;
 	struct qedf_ioreq *io_req;
+	struct fc_rport_libfc_priv *rp = rport->dd_data;
+	struct fc_rport_priv *rdata;
+	struct qedf_rport *fcport = NULL;
 	int rc = FAILED;
+	int wait_count = 100;
+	int refcount = 0;
 	int rval;
-
-	if (fc_remote_port_chkready(rport)) {
-		QEDF_ERR(NULL, "rport not ready\n");
-		goto out;
-	}
+	int got_ref = 0;

 	lport = shost_priv(sc_cmd->device->host);
 	qedf = (struct qedf_ctx *)lport_priv(lport);

-	if ((lport->state != LPORT_ST_READY) || !(lport->link_up)) {
-		QEDF_ERR(&(qedf->dbg_ctx), "link not ready.\n");
+	/* rport and tgt are allocated together, so tgt should be non-NULL */
+	fcport = (struct qedf_rport *)&rp[1];
+	rdata = fcport->rdata;
+	if (!rdata || !kref_get_unless_zero(&rdata->kref)) {
+		QEDF_ERR(&qedf->dbg_ctx, "stale rport, sc_cmd=%p\n", sc_cmd);
+		rc = 1;
 		goto out;
 	}

-	fcport = (struct qedf_rport *)&rp[1];

 	io_req = (struct qedf_ioreq *)sc_cmd->SCp.ptr;
 	if (!io_req) {
-		QEDF_ERR(&(qedf->dbg_ctx), "io_req is NULL.\n");
+		QEDF_ERR(&qedf->dbg_ctx,
+			 "sc_cmd not queued with lld, sc_cmd=%p op=0x%02x, port_id=%06x\n",
+			 sc_cmd, sc_cmd->cmnd[0],
+			 rdata->ids.port_id);
 		rc = SUCCESS;
-		goto out;
+		goto drop_rdata_kref;
 	}

-	QEDF_ERR(&(qedf->dbg_ctx), "Aborting io_req sc_cmd=%p xid=0x%x "
-		  "fp_idx=%d.\n", sc_cmd, io_req->xid, io_req->fp_idx);
+	rval = kref_get_unless_zero(&io_req->refcount);	/* ID: 005 */
+	if (rval)
+		got_ref = 1;
+
+	/* If we got a valid io_req, confirm it belongs to this sc_cmd. */
+	if (!rval || io_req->sc_cmd != sc_cmd) {
+		QEDF_ERR(&qedf->dbg_ctx,
+			 "Freed/Incorrect io_req, io_req->sc_cmd=%p, sc_cmd=%p, port_id=%06x, bailing out.\n",
+			 io_req->sc_cmd, sc_cmd, rdata->ids.port_id);
+
+		goto drop_rdata_kref;
+	}
+
+	if (fc_remote_port_chkready(rport)) {
+		refcount = kref_read(&io_req->refcount);
+		QEDF_ERR(&qedf->dbg_ctx,
+			 "rport not ready, io_req=%p, xid=0x%x sc_cmd=%p op=0x%02x, refcount=%d, port_id=%06x\n",
+			 io_req, io_req->xid, sc_cmd, sc_cmd->cmnd[0],
+			 refcount, rdata->ids.port_id);
+
+		goto drop_rdata_kref;
+	}
+
+	rc = fc_block_scsi_eh(sc_cmd);
+	if (rc)
+		goto drop_rdata_kref;
+
+	if (test_bit(QEDF_RPORT_UPLOADING_CONNECTION, &fcport->flags)) {
+		QEDF_ERR(&qedf->dbg_ctx,
+			 "Connection uploading, xid=0x%x., port_id=%06x\n",
+			 io_req->xid, rdata->ids.port_id);
+		while (io_req->sc_cmd && (wait_count != 0)) {
+			msleep(100);
+			wait_count--;
+		}
+		if (wait_count) {
+			QEDF_ERR(&qedf->dbg_ctx, "ABTS succeeded\n");
+			rc = SUCCESS;
+		} else {
+			QEDF_ERR(&qedf->dbg_ctx, "ABTS failed\n");
+			rc = FAILED;
+		}
+		goto drop_rdata_kref;
+	}
+
+	if (lport->state != LPORT_ST_READY || !(lport->link_up)) {
+		QEDF_ERR(&qedf->dbg_ctx, "link not ready.\n");
+		goto drop_rdata_kref;
+	}
+
+	QEDF_ERR(&qedf->dbg_ctx,
+		 "Aborting io_req=%p sc_cmd=%p xid=0x%x fp_idx=%d, port_id=%06x.\n",
+		 io_req, sc_cmd, io_req->xid, io_req->fp_idx,
+		 rdata->ids.port_id);

 	if (qedf->stop_io_on_error) {
 		qedf_stop_all_io(qedf);
 		rc = SUCCESS;
-		goto out;
+		goto drop_rdata_kref;
 	}

 	init_completion(&io_req->abts_done);
 	rval = qedf_initiate_abts(io_req, true);
 	if (rval) {
 		QEDF_ERR(&(qedf->dbg_ctx), "Failed to queue ABTS.\n");
-		goto out;
+		/*
+		 * If we fail to queue the ABTS then return this command to
+		 * the SCSI layer as it will own and free the xid
+		 */
+		rc = SUCCESS;
+		qedf_scsi_done(qedf, io_req, DID_ERROR);
+		goto drop_rdata_kref;
 	}

 	wait_for_completion(&io_req->abts_done);
@@ -684,19 +747,27 @@ static int qedf_eh_abort(struct scsi_cmnd *sc_cmd)
 		QEDF_ERR(&(qedf->dbg_ctx), "ABTS failed, xid=0x%x.\n",
 		    io_req->xid);

+drop_rdata_kref:
+	kref_put(&rdata->kref, fc_rport_destroy);
 out:
+	if (got_ref)
+		kref_put(&io_req->refcount, qedf_release_cmd);
 	return rc;
 }

 static int qedf_eh_target_reset(struct scsi_cmnd *sc_cmd)
 {
-	QEDF_ERR(NULL, "TARGET RESET Issued...");
+	QEDF_ERR(NULL, "%d:0:%d:%lld: TARGET RESET Issued...",
+		 sc_cmd->device->host->host_no, sc_cmd->device->id,
+		 sc_cmd->device->lun);
 	return qedf_initiate_tmf(sc_cmd, FCP_TMF_TGT_RESET);
 }

 static int qedf_eh_device_reset(struct scsi_cmnd *sc_cmd)
 {
-	QEDF_ERR(NULL, "LUN RESET Issued...\n");
+	QEDF_ERR(NULL, "%d:0:%d:%lld: LUN RESET Issued... ",
+		 sc_cmd->device->host->host_no, sc_cmd->device->id,
+		 sc_cmd->device->lun);
 	return qedf_initiate_tmf(sc_cmd, FCP_TMF_LUN_RESET);
 }

@@ -740,22 +811,6 @@ static int qedf_eh_host_reset(struct scsi_cmnd *sc_cmd)
 {
 	struct fc_lport *lport;
 	struct qedf_ctx *qedf;
-	struct fc_rport *rport = starget_to_rport(scsi_target(sc_cmd->device));
-	struct fc_rport_libfc_priv *rp = rport->dd_data;
-	struct qedf_rport *fcport = (struct qedf_rport *)&rp[1];
-	int rval;
-
-	rval = fc_remote_port_chkready(rport);
-
-	if (rval) {
-		QEDF_ERR(NULL, "device_reset rport not ready\n");
-		return FAILED;
-	}
-
-	if (fcport == NULL) {
-		QEDF_ERR(NULL, "device_reset: rport is NULL\n");
-		return FAILED;
-	}

 	lport = shost_priv(sc_cmd->device->host);
 	qedf = lport_priv(lport);
@@ -3002,6 +3057,7 @@ static int __qedf_probe(struct pci_dev *pdev, int mode)
 	pci_set_drvdata(pdev, qedf);
 	init_completion(&qedf->fipvlan_compl);
 	mutex_init(&qedf->stats_mutex);
+	mutex_init(&qedf->flush_mutex);

 	QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_INFO,
 	   "QLogic FastLinQ FCoE Module qedf %s, "