From patchwork Thu Jul 23 02:34:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Can Guo X-Patchwork-Id: 11679599 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DC7E96C1 for ; Thu, 23 Jul 2020 02:34:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CA03A22CF7 for ; Thu, 23 Jul 2020 02:34:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387424AbgGWCeS (ORCPT ); Wed, 22 Jul 2020 22:34:18 -0400 Received: from labrats.qualcomm.com ([199.106.110.90]:30244 "EHLO labrats.qualcomm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728914AbgGWCeS (ORCPT ); Wed, 22 Jul 2020 22:34:18 -0400 IronPort-SDR: qqjSYGIrY03x/nQGQRrWNlzYExV6CSP62+sFb33/m42pf09y+IaRBD/s7CK4ZtTg6kTADtRmGF D05meV1T/2X/vM8xJoP8gDs0ZMgCL6TYo91ChAUcSDj2OarDFKvhZ32BhZjLUUDW129nbUNOCD C5WWZQwcZbwiedctnQgW8UqeUWcjLh2HJHmLXSoTP97t62ie4RcOeA5S2kdyYY1GmlEY/Y54iy /+Exu+p6iJ7jq7Ldu6pRsR3QSPG4HJpM4efsVJMfdOgVmT0phcP9bkGWW5DCgNWjEjpKAk6D7E mZ4= X-IronPort-AV: E=Sophos;i="5.75,385,1589266800"; d="scan'208";a="47229202" Received: from unknown (HELO ironmsg03-sd.qualcomm.com) ([10.53.140.143]) by labrats.qualcomm.com with ESMTP; 22 Jul 2020 19:34:17 -0700 Received: from pacamara-linux.qualcomm.com ([192.168.140.135]) by ironmsg03-sd.qualcomm.com with ESMTP; 22 Jul 2020 19:34:16 -0700 Received: by pacamara-linux.qualcomm.com (Postfix, from userid 359480) id 7326B22A4D; Wed, 22 Jul 2020 19:34:16 -0700 (PDT) From: Can Guo To: asutoshd@codeaurora.org, nguyenb@codeaurora.org, hongwus@codeaurora.org, rnayak@codeaurora.org, sh425.lee@samsung.com, linux-scsi@vger.kernel.org, kernel-team@android.com, saravanak@google.com, salyzyn@google.com, cang@codeaurora.org Cc: Alim Akhtar , Avri Altman , "James E.J. Bottomley" , "Martin K. Petersen" , Stanley Chu , Bean Huo , Bart Van Assche , linux-kernel@vger.kernel.org (open list) Subject: [PATCH v5 1/9] scsi: ufs: Add checks before setting clk-gating states Date: Wed, 22 Jul 2020 19:34:00 -0700 Message-Id: <1595471649-25675-2-git-send-email-cang@codeaurora.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1595471649-25675-1-git-send-email-cang@codeaurora.org> References: <1595471649-25675-1-git-send-email-cang@codeaurora.org> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org Clock gating features can be turned on/off selectively which means its state information is only important if it is enabled. This change makes sure that we only look at state of clk-gating if it is enabled. Signed-off-by: Can Guo Reviewed-by: Avri Altman Reviewed-by: Hongwu Su --- drivers/scsi/ufs/ufshcd.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index cdff7e5..99bd3e4 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -1839,6 +1839,8 @@ static void ufshcd_init_clk_gating(struct ufs_hba *hba) if (!ufshcd_is_clkgating_allowed(hba)) return; + hba->clk_gating.state = CLKS_ON; + hba->clk_gating.delay_ms = 150; INIT_DELAYED_WORK(&hba->clk_gating.gate_work, ufshcd_gate_work); INIT_WORK(&hba->clk_gating.ungate_work, ufshcd_ungate_work); @@ -2541,7 +2543,8 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd) err = SCSI_MLQUEUE_HOST_BUSY; goto out; } - WARN_ON(hba->clk_gating.state != CLKS_ON); + WARN_ON(ufshcd_is_clkgating_allowed(hba) && + (hba->clk_gating.state != CLKS_ON)); lrbp = &hba->lrb[tag]; @@ -8315,8 +8318,11 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op) /* If link is active, device ref_clk can't be switched off */ __ufshcd_setup_clocks(hba, false, true); - hba->clk_gating.state = CLKS_OFF; - trace_ufshcd_clk_gating(dev_name(hba->dev), hba->clk_gating.state); + if (ufshcd_is_clkgating_allowed(hba)) { + hba->clk_gating.state = CLKS_OFF; + trace_ufshcd_clk_gating(dev_name(hba->dev), + hba->clk_gating.state); + } /* Put the host controller in low power mode if possible */ ufshcd_hba_vreg_set_lpm(hba); @@ -8456,6 +8462,11 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op) if (hba->clk_scaling.is_allowed) ufshcd_suspend_clkscaling(hba); ufshcd_setup_clocks(hba, false); + if (ufshcd_is_clkgating_allowed(hba)) { + hba->clk_gating.state = CLKS_OFF; + trace_ufshcd_clk_gating(dev_name(hba->dev), + hba->clk_gating.state); + } out: hba->pm_op_in_progress = 0; if (ret) From patchwork Thu Jul 23 02:34:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Can Guo X-Patchwork-Id: 11679605 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 07BFF912 for ; Thu, 23 Jul 2020 02:34:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E3F0F20888 for ; Thu, 23 Jul 2020 02:34:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387463AbgGWCeV (ORCPT ); Wed, 22 Jul 2020 22:34:21 -0400 Received: from labrats.qualcomm.com ([199.106.110.90]:30244 "EHLO labrats.qualcomm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728914AbgGWCeV (ORCPT ); Wed, 22 Jul 2020 22:34:21 -0400 IronPort-SDR: AVUnZFwD9dC1PVHfriE7QHX9ART67xtSjRGrrv2KMfFJyWLPQzaiaBBOLKK8Vbzfs0RkbMzyCN I1252hTa4tBn0CvYz6bZEnoWmHG1g3jnvFWm7VFjwgrQ+0HZi8k3hkF6k0MtpsDgdkFj7nJzEX CLwNuJqEEfL2UEM+45q1CFOchjJf4ocvLE9RJsbku+NeRelMLZYj+wqHQ9B5H/T2odZa5dkLSP c69hKKCTHjC6EezqMYPrEKssKahFbiuRzsve5WiPNmNz/6GdxhdXt9z6gN3y5m9CFjI/35NrrK Ck0= X-IronPort-AV: E=Sophos;i="5.75,385,1589266800"; d="scan'208";a="47229203" Received: from unknown (HELO ironmsg02-sd.qualcomm.com) ([10.53.140.142]) by labrats.qualcomm.com with ESMTP; 22 Jul 2020 19:34:20 -0700 Received: from pacamara-linux.qualcomm.com ([192.168.140.135]) by ironmsg02-sd.qualcomm.com with ESMTP; 22 Jul 2020 19:34:19 -0700 Received: by pacamara-linux.qualcomm.com (Postfix, from userid 359480) id B07D422A4D; Wed, 22 Jul 2020 19:34:19 -0700 (PDT) From: Can Guo To: asutoshd@codeaurora.org, nguyenb@codeaurora.org, hongwus@codeaurora.org, rnayak@codeaurora.org, sh425.lee@samsung.com, linux-scsi@vger.kernel.org, kernel-team@android.com, saravanak@google.com, salyzyn@google.com, cang@codeaurora.org Cc: Alim Akhtar , Avri Altman , "James E.J. Bottomley" , "Martin K. Petersen" , Stanley Chu , Bean Huo , Bart Van Assche , linux-kernel@vger.kernel.org (open list) Subject: [PATCH v5 2/9] scsi: ufs: Fix imbalanced scsi_block_reqs_cnt caused by ufshcd_hold() Date: Wed, 22 Jul 2020 19:34:01 -0700 Message-Id: <1595471649-25675-3-git-send-email-cang@codeaurora.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1595471649-25675-1-git-send-email-cang@codeaurora.org> References: <1595471649-25675-1-git-send-email-cang@codeaurora.org> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org The scsi_block_reqs_cnt increased in ufshcd_hold() is supposed to be decreased back in ufshcd_ungate_work() in a paired way. However, if specific ufshcd_hold/release sequences are met, it is possible that scsi_block_reqs_cnt is increased twice but only one ungate work is queued. To make sure scsi_block_reqs_cnt is handled by ufshcd_hold() and ufshcd_ungate_work() in a paired way, increase it only if queue_work() returns true. Signed-off-by: Can Guo Reviewed-by: Hongwu Su --- drivers/scsi/ufs/ufshcd.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 99bd3e4..2907828 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -1611,12 +1611,12 @@ int ufshcd_hold(struct ufs_hba *hba, bool async) */ /* fallthrough */ case CLKS_OFF: - ufshcd_scsi_block_requests(hba); hba->clk_gating.state = REQ_CLKS_ON; trace_ufshcd_clk_gating(dev_name(hba->dev), hba->clk_gating.state); - queue_work(hba->clk_gating.clk_gating_workq, - &hba->clk_gating.ungate_work); + if (queue_work(hba->clk_gating.clk_gating_workq, + &hba->clk_gating.ungate_work)) + ufshcd_scsi_block_requests(hba); /* * fall through to check if we should wait for this * work to be done or not. From patchwork Thu Jul 23 02:34:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Can Guo X-Patchwork-Id: 11679601 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B586C6C1 for ; Thu, 23 Jul 2020 02:34:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A7B50208E4 for ; Thu, 23 Jul 2020 02:34:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387497AbgGWCeY (ORCPT ); Wed, 22 Jul 2020 22:34:24 -0400 Received: from labrats.qualcomm.com ([199.106.110.90]:37589 "EHLO labrats.qualcomm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387483AbgGWCeY (ORCPT ); Wed, 22 Jul 2020 22:34:24 -0400 IronPort-SDR: +CwSSzEVROfjEbwSccSiEnoU/6BReWG4pBdQdpKI4kzYJjDhT7wosgAwN4RpWStNTncQC2t1Gm WVza9yxg+sw8B0TNgJmfbb7j9iQSlqF6k7bnazOFHwlRTjHmoS4gtJVBE9+wZhMGwKtcQmueR9 Qkv68hFPHykTVt7cgqwULajs+OyJpFVebWoSi9QDxziz2Ptev621nYXtfWZ4/CAcMZ80Or0urt bcExaKu0CDOdq8Aahn1dwlSFgxAFaLq2CrXY0nHxfePb7PBxvbg4YrfTqJtQ+utXueGNtkn2mg h1Q= X-IronPort-AV: E=Sophos;i="5.75,385,1589266800"; d="scan'208";a="29047795" Received: from unknown (HELO ironmsg-SD-alpha.qualcomm.com) ([10.53.140.30]) by labrats.qualcomm.com with ESMTP; 22 Jul 2020 19:34:23 -0700 Received: from pacamara-linux.qualcomm.com ([192.168.140.135]) by ironmsg-SD-alpha.qualcomm.com with ESMTP; 22 Jul 2020 19:34:21 -0700 Received: by pacamara-linux.qualcomm.com (Postfix, from userid 359480) id 3482A22A4D; Wed, 22 Jul 2020 19:34:20 -0700 (PDT) From: Can Guo To: asutoshd@codeaurora.org, nguyenb@codeaurora.org, hongwus@codeaurora.org, rnayak@codeaurora.org, sh425.lee@samsung.com, linux-scsi@vger.kernel.org, kernel-team@android.com, saravanak@google.com, salyzyn@google.com, cang@codeaurora.org Cc: Andy Gross , Bjorn Andersson , Alim Akhtar , Avri Altman , "James E.J. Bottomley" , "Martin K. Petersen" , linux-arm-msm@vger.kernel.org (open list:ARM/QUALCOMM SUPPORT), linux-kernel@vger.kernel.org (open list) Subject: [PATCH v5 3/9] ufs: ufs-qcom: Fix race conditions caused by func ufs_qcom_testbus_config Date: Wed, 22 Jul 2020 19:34:02 -0700 Message-Id: <1595471649-25675-4-git-send-email-cang@codeaurora.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1595471649-25675-1-git-send-email-cang@codeaurora.org> References: <1595471649-25675-1-git-send-email-cang@codeaurora.org> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org If ufs_qcom_dump_dbg_regs() calls ufs_qcom_testbus_config() from ufshcd_suspend/resume and/or clk gate/ungate context, pm_runtime_get_sync() and ufshcd_hold() will cause racing problems. Fix this by removing the unnecessary calls of pm_runtime_get_sync() and ufshcd_hold(). Signed-off-by: Can Guo Reviewed-by: Hongwu Su --- drivers/scsi/ufs/ufs-qcom.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c index 2e6ddb5..7da27ee 100644 --- a/drivers/scsi/ufs/ufs-qcom.c +++ b/drivers/scsi/ufs/ufs-qcom.c @@ -1604,9 +1604,6 @@ int ufs_qcom_testbus_config(struct ufs_qcom_host *host) */ } mask <<= offset; - - pm_runtime_get_sync(host->hba->dev); - ufshcd_hold(host->hba, false); ufshcd_rmwl(host->hba, TEST_BUS_SEL, (u32)host->testbus.select_major << 19, REG_UFS_CFG1); @@ -1619,8 +1616,6 @@ int ufs_qcom_testbus_config(struct ufs_qcom_host *host) * committed before returning. */ mb(); - ufshcd_release(host->hba); - pm_runtime_put_sync(host->hba->dev); return 0; } From patchwork Thu Jul 23 02:34:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Can Guo X-Patchwork-Id: 11679611 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 597D1912 for ; Thu, 23 Jul 2020 02:34:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4B24320888 for ; Thu, 23 Jul 2020 02:34:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387577AbgGWCec (ORCPT ); Wed, 22 Jul 2020 22:34:32 -0400 Received: from labrats.qualcomm.com ([199.106.110.90]:37589 "EHLO labrats.qualcomm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387483AbgGWCec (ORCPT ); Wed, 22 Jul 2020 22:34:32 -0400 IronPort-SDR: SEaBUIS7QFVySKDz6MH5gNsusfxBz1WgLsI4esHeAUNXPgVAQSsxyXqstH2TmrKaN/uPuuBFPn fJunmBM/ZbtkiNo50oQhEM8NPiyYBcRpxEsgkmDkd8wibOg4LIcD99aABdaEWzDzRphwCCVGDU I/5bcChh9Rm81GqxZmlKnz4C3823U4dv0pKfD4iZfKQhlgh7Jv/7ptpEjPRY31pbM5UdF3zVuG 6iDTA1szGsst+inAqwc7TevudPpM5SIx3Ydn7JK0q7NnIMfbeu4oIzcgbtSGw5DKcXgmDrmVuQ 2TM= X-IronPort-AV: E=Sophos;i="5.75,385,1589266800"; d="scan'208";a="29047797" Received: from unknown (HELO ironmsg04-sd.qualcomm.com) ([10.53.140.144]) by labrats.qualcomm.com with ESMTP; 22 Jul 2020 19:34:31 -0700 Received: from pacamara-linux.qualcomm.com ([192.168.140.135]) by ironmsg04-sd.qualcomm.com with ESMTP; 22 Jul 2020 19:34:27 -0700 Received: by pacamara-linux.qualcomm.com (Postfix, from userid 359480) id ACBCC22E1F; Wed, 22 Jul 2020 19:34:20 -0700 (PDT) From: Can Guo To: asutoshd@codeaurora.org, nguyenb@codeaurora.org, hongwus@codeaurora.org, rnayak@codeaurora.org, sh425.lee@samsung.com, linux-scsi@vger.kernel.org, kernel-team@android.com, saravanak@google.com, salyzyn@google.com, cang@codeaurora.org Cc: Andy Gross , Bjorn Andersson , Alim Akhtar , Avri Altman , "James E.J. Bottomley" , "Martin K. Petersen" , linux-arm-msm@vger.kernel.org (open list:ARM/QUALCOMM SUPPORT), linux-kernel@vger.kernel.org (open list) Subject: [PATCH v5 4/9] scsi: ufs-qcom: Fix schedule while atomic error in ufs_qcom_dump_dbg_regs Date: Wed, 22 Jul 2020 19:34:03 -0700 Message-Id: <1595471649-25675-5-git-send-email-cang@codeaurora.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1595471649-25675-1-git-send-email-cang@codeaurora.org> References: <1595471649-25675-1-git-send-email-cang@codeaurora.org> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org Dumping testbus registers needs to sleep a bit intermittently as there are too many of them. Skip them for those contexts where sleep is not allowed. Signed-off-by: Can Guo --- drivers/scsi/ufs/ufs-qcom.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c index 7da27ee..7831b2b 100644 --- a/drivers/scsi/ufs/ufs-qcom.c +++ b/drivers/scsi/ufs/ufs-qcom.c @@ -1651,13 +1651,16 @@ static void ufs_qcom_dump_dbg_regs(struct ufs_hba *hba) ufshcd_dump_regs(hba, REG_UFS_SYS1CLK_1US, 16 * 4, "HCI Vendor Specific Registers "); - /* sleep a bit intermittently as we are dumping too much data */ ufs_qcom_print_hw_debug_reg_all(hba, NULL, ufs_qcom_dump_regs_wrapper); - udelay(1000); - ufs_qcom_testbus_read(hba); - udelay(1000); - ufs_qcom_print_unipro_testbus(hba); - udelay(1000); + + if (in_task()) { + /* sleep a bit intermittently as we are dumping too much data */ + usleep_range(1000, 1100); + ufs_qcom_testbus_read(hba); + usleep_range(1000, 1100); + ufs_qcom_print_unipro_testbus(hba); + usleep_range(1000, 1100); + } } /** From patchwork Thu Jul 23 02:34:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Can Guo X-Patchwork-Id: 11679607 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2CEDE1510 for ; Thu, 23 Jul 2020 02:34:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1EB0E208E4 for ; Thu, 23 Jul 2020 02:34:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387539AbgGWCe2 (ORCPT ); Wed, 22 Jul 2020 22:34:28 -0400 Received: from labrats.qualcomm.com ([199.106.110.90]:37589 "EHLO labrats.qualcomm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387483AbgGWCe1 (ORCPT ); Wed, 22 Jul 2020 22:34:27 -0400 IronPort-SDR: JH+ei7YBQ/788tY+baRbOlK3VAlitNkgs2budKTVu0/ai/suVWej6BcIN+Zz6//UVDZslCBFKH +ZqWaH4qAULHQ5nbnOKrZ6L4yyjImmP4GBSVMkPloa1TDJRJ9SSCuBssHf8ITKY/FRn+pcriyH 9RvXdAc/z9h+4bfoBXXPKLXvAD+jHsYimllx2cASMebJg+VbhHCfrJiHAfCYqzPqXhoUnrWrzj gi8pG/qzz89UbvAR/qyP+PPed5weSKjME0PZblOGseMWwhuVFisKERmd5aRPLX9+ZKMicZnGc2 npQ= X-IronPort-AV: E=Sophos;i="5.75,385,1589266800"; d="scan'208";a="29047796" Received: from unknown (HELO ironmsg-SD-alpha.qualcomm.com) ([10.53.140.30]) by labrats.qualcomm.com with ESMTP; 22 Jul 2020 19:34:27 -0700 Received: from pacamara-linux.qualcomm.com ([192.168.140.135]) by ironmsg-SD-alpha.qualcomm.com with ESMTP; 22 Jul 2020 19:34:26 -0700 Received: by pacamara-linux.qualcomm.com (Postfix, from userid 359480) id 84D1522A4D; Wed, 22 Jul 2020 19:34:26 -0700 (PDT) From: Can Guo To: asutoshd@codeaurora.org, nguyenb@codeaurora.org, hongwus@codeaurora.org, rnayak@codeaurora.org, sh425.lee@samsung.com, linux-scsi@vger.kernel.org, kernel-team@android.com, saravanak@google.com, salyzyn@google.com, cang@codeaurora.org Cc: Alim Akhtar , Avri Altman , "James E.J. Bottomley" , "Martin K. Petersen" , Stanley Chu , Bean Huo , Bart Van Assche , Satya Tangirala , linux-kernel@vger.kernel.org (open list) Subject: [PATCH v5 5/9] scsi: ufs: Add some debug infos to ufshcd_print_host_state Date: Wed, 22 Jul 2020 19:34:04 -0700 Message-Id: <1595471649-25675-6-git-send-email-cang@codeaurora.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1595471649-25675-1-git-send-email-cang@codeaurora.org> References: <1595471649-25675-1-git-send-email-cang@codeaurora.org> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org The infos of the last interrupt status and its timestamp are very helpful when debug system stability issues, e.g. IRQ starvation, so add them to ufshcd_print_host_state. Meanwhile, UFS device infos like model name and its FW version also come in handy during debug. In addition, this change makes cleanup to some prints in ufshcd_print_host_regs as similar prints are already available in ufshcd_print_host_state. Signed-off-by: Can Guo --- drivers/scsi/ufs/ufshcd.c | 31 ++++++++++++++++++------------- drivers/scsi/ufs/ufshcd.h | 5 +++++ 2 files changed, 23 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 2907828..a22414b 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -411,15 +411,6 @@ static void ufshcd_print_err_hist(struct ufs_hba *hba, static void ufshcd_print_host_regs(struct ufs_hba *hba) { ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: "); - dev_err(hba->dev, "hba->ufs_version = 0x%x, hba->capabilities = 0x%x\n", - hba->ufs_version, hba->capabilities); - dev_err(hba->dev, - "hba->outstanding_reqs = 0x%x, hba->outstanding_tasks = 0x%x\n", - (u32)hba->outstanding_reqs, (u32)hba->outstanding_tasks); - dev_err(hba->dev, - "last_hibern8_exit_tstamp at %lld us, hibern8_exit_cnt = %d\n", - ktime_to_us(hba->ufs_stats.last_hibern8_exit_tstamp), - hba->ufs_stats.hibern8_exit_cnt); ufshcd_print_err_hist(hba, &hba->ufs_stats.pa_err, "pa_err"); ufshcd_print_err_hist(hba, &hba->ufs_stats.dl_err, "dl_err"); @@ -438,8 +429,6 @@ static void ufshcd_print_host_regs(struct ufs_hba *hba) ufshcd_print_err_hist(hba, &hba->ufs_stats.host_reset, "host_reset"); ufshcd_print_err_hist(hba, &hba->ufs_stats.task_abort, "task_abort"); - ufshcd_print_clk_freqs(hba); - ufshcd_vops_dbg_register_dump(hba); } @@ -499,6 +488,8 @@ static void ufshcd_print_tmrs(struct ufs_hba *hba, unsigned long bitmap) static void ufshcd_print_host_state(struct ufs_hba *hba) { + struct scsi_device *sdev_ufs = hba->sdev_ufs_device; + dev_err(hba->dev, "UFS Host state=%d\n", hba->ufshcd_state); dev_err(hba->dev, "outstanding reqs=0x%lx tasks=0x%lx\n", hba->outstanding_reqs, hba->outstanding_tasks); @@ -511,12 +502,24 @@ static void ufshcd_print_host_state(struct ufs_hba *hba) dev_err(hba->dev, "Auto BKOPS=%d, Host self-block=%d\n", hba->auto_bkops_enabled, hba->host->host_self_blocked); dev_err(hba->dev, "Clk gate=%d\n", hba->clk_gating.state); + dev_err(hba->dev, + "last_hibern8_exit_tstamp at %lld us, hibern8_exit_cnt=%d\n", + ktime_to_us(hba->ufs_stats.last_hibern8_exit_tstamp), + hba->ufs_stats.hibern8_exit_cnt); + dev_err(hba->dev, "last intr at %lld us, last intr status=0x%x\n", + ktime_to_us(hba->ufs_stats.last_intr_ts), + hba->ufs_stats.last_intr_status); dev_err(hba->dev, "error handling flags=0x%x, req. abort count=%d\n", hba->eh_flags, hba->req_abort_count); - dev_err(hba->dev, "Host capabilities=0x%x, caps=0x%x\n", - hba->capabilities, hba->caps); + dev_err(hba->dev, "hba->ufs_version=0x%x, Host capabilities=0x%x, caps=0x%x\n", + hba->ufs_version, hba->capabilities, hba->caps); dev_err(hba->dev, "quirks=0x%x, dev. quirks=0x%x\n", hba->quirks, hba->dev_quirks); + if (sdev_ufs) + dev_err(hba->dev, "UFS dev info: %.8s %.16s rev %.4s\n", + sdev_ufs->vendor, sdev_ufs->model, sdev_ufs->rev); + + ufshcd_print_clk_freqs(hba); } /** @@ -5951,6 +5954,8 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba) spin_lock(hba->host->host_lock); intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS); + hba->ufs_stats.last_intr_status = intr_status; + hba->ufs_stats.last_intr_ts = ktime_get(); /* * There could be max of hba->nutrs reqs in flight and in worst case diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h index 656c069..5b2cdaf 100644 --- a/drivers/scsi/ufs/ufshcd.h +++ b/drivers/scsi/ufs/ufshcd.h @@ -406,6 +406,8 @@ struct ufs_err_reg_hist { /** * struct ufs_stats - keeps usage/err statistics + * @last_intr_status: record the last interrupt status. + * @last_intr_ts: record the last interrupt timestamp. * @hibern8_exit_cnt: Counter to keep track of number of exits, * reset this after link-startup. * @last_hibern8_exit_tstamp: Set time after the hibern8 exit. @@ -425,6 +427,9 @@ struct ufs_err_reg_hist { * @tsk_abort: tracks task abort events */ struct ufs_stats { + u32 last_intr_status; + ktime_t last_intr_ts; + u32 hibern8_exit_cnt; ktime_t last_hibern8_exit_tstamp; From patchwork Thu Jul 23 02:34:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Can Guo X-Patchwork-Id: 11679613 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A67856C1 for ; Thu, 23 Jul 2020 02:34:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 89F2320888 for ; Thu, 23 Jul 2020 02:34:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387611AbgGWCej (ORCPT ); Wed, 22 Jul 2020 22:34:39 -0400 Received: from labrats.qualcomm.com ([199.106.110.90]:62898 "EHLO labrats.qualcomm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387483AbgGWCej (ORCPT ); Wed, 22 Jul 2020 22:34:39 -0400 IronPort-SDR: jfhTkvtMwQwIW97e3pcrnFGe+MnCqYMsBGEUL3nJMSJK13g6Ci952PQGt1PiRIGfIFPGWxdn5z hCZ8GFqvPvG47GK+GMwwX25NnT0fZ+s6g35xqQUBegE8yO8gcHne3aSZNTRkMBbA0xJbIga6UZ YdSCIE/p5udtY+dSbBQJtEOMXY51E1h1BBLZ5cFl4mLEkdZQUWpp/QHkOgLhzCiTgh5uNko3bi bhSNYw2v3OhxriE9RgTMjcHsL8bLYNDhSCbdqTfreme6t0oaIh5CJcQ/FPJIarBObQA3HSDzdg Om4= X-IronPort-AV: E=Sophos;i="5.75,385,1589266800"; d="scan'208";a="47229204" Received: from unknown (HELO ironmsg02-sd.qualcomm.com) ([10.53.140.142]) by labrats.qualcomm.com with ESMTP; 22 Jul 2020 19:34:35 -0700 Received: from pacamara-linux.qualcomm.com ([192.168.140.135]) by ironmsg02-sd.qualcomm.com with ESMTP; 22 Jul 2020 19:34:34 -0700 Received: by pacamara-linux.qualcomm.com (Postfix, from userid 359480) id D33DD22A4D; Wed, 22 Jul 2020 19:34:34 -0700 (PDT) From: Can Guo To: asutoshd@codeaurora.org, nguyenb@codeaurora.org, hongwus@codeaurora.org, rnayak@codeaurora.org, sh425.lee@samsung.com, linux-scsi@vger.kernel.org, kernel-team@android.com, saravanak@google.com, salyzyn@google.com, cang@codeaurora.org Cc: Alim Akhtar , Avri Altman , "James E.J. Bottomley" , "Martin K. Petersen" , Stanley Chu , Bean Huo , Tomas Winkler , Nitin Rawat , Bart Van Assche , Satya Tangirala , linux-kernel@vger.kernel.org (open list) Subject: [PATCH v5 6/9] scsi: ufs: Fix concurrency of error handler and other error recovery paths Date: Wed, 22 Jul 2020 19:34:05 -0700 Message-Id: <1595471649-25675-7-git-send-email-cang@codeaurora.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1595471649-25675-1-git-send-email-cang@codeaurora.org> References: <1595471649-25675-1-git-send-email-cang@codeaurora.org> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org Error recovery can be invoked from multiple paths, including hibern8 enter/exit, some vendor vops, ufshcd_eh_host_reset_handler(), resume and eh_work scheduled from IRQ context. Ultimately, these paths are trying to invoke ufshcd_reset_and_restore(), in either sync or async manner. Having both sync and async manners at the same time has some problems - If link recovery happens during clock scaling work, acquring scaling_lock in ufshcd_exec_dev_cmd() would cause dead lock, because scaling_lock is already held by scaling work before link recovery happens. - If link recovery happens during ungate work, ufshcd_hold() would be called recursively. Although commit 53c12d0ef6fcb ("scsi: ufs: fix error recovery after the hibern8 exit failure") [1] fixed a deadlock due to recursive calls of ufshcd_hold() by adding a check of eh_in_progress into ufshcd_hold, this check allows eh_work to run in parallel when link recovery is running. - Similar concurrency can also happen when error recovery is invoked from eh_host_reset_handler. Although it tries to avoid that from happening by flushing eh_work before start invoking reset and restore, after flush work returns, eh_work can still be scheduled and running in parallel. - Concurrency can even happen between eh_works. eh_work, currently queued on system_wq, is allowed to have multiple instances running in parallel, but we don't have proper protection for that. If any of above concurrency happens, error recovery would fail and lead ufs device and host into bad states. To fix the concurrency problem, this change queues eh_work on a single threaded workqueue and remove link recovery calls from hibern8 enter/exit path. Meanwhile, make use of eh_work in eh_host_reset_handler instead of calling ufshcd_reset_and_restore. This unifies UFS error recovery mechanism. In addition, according to the UFSHCI JEDEC spec, hibern8 enter/exit error occurs when the link is broken. This essentially applies to any power mode change operations (since they all use PACP_PWR cmds in UniPro layer). So, in this change, if a power mode change operation (including AH8 enter/exit) fails, mark link state as UIC_LINK_BROKEN_STATE and schedule the eh_work. In this case, error handler needs to do a full reset and restore to recover the link back to active. Before the link state is recovered to active, ufshcd_uic_pwr_ctrl simply returns -ENOLINK to avoid more errors. Signed-off-by: Can Guo --- drivers/scsi/ufs/ufs-sysfs.c | 1 + drivers/scsi/ufs/ufshcd.c | 268 +++++++++++++++++++++++-------------------- drivers/scsi/ufs/ufshcd.h | 9 ++ 3 files changed, 151 insertions(+), 127 deletions(-) diff --git a/drivers/scsi/ufs/ufs-sysfs.c b/drivers/scsi/ufs/ufs-sysfs.c index 2d71d23..02d379f00 100644 --- a/drivers/scsi/ufs/ufs-sysfs.c +++ b/drivers/scsi/ufs/ufs-sysfs.c @@ -16,6 +16,7 @@ static const char *ufschd_uic_link_state_to_string( case UIC_LINK_OFF_STATE: return "OFF"; case UIC_LINK_ACTIVE_STATE: return "ACTIVE"; case UIC_LINK_HIBERN8_STATE: return "HIBERN8"; + case UIC_LINK_BROKEN_STATE: return "BROKEN"; default: return "UNKNOWN"; } } diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index a22414b..c01743a 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -228,6 +228,7 @@ static int ufshcd_scale_clks(struct ufs_hba *hba, bool scale_up); static irqreturn_t ufshcd_intr(int irq, void *__hba); static int ufshcd_change_power_mode(struct ufs_hba *hba, struct ufs_pa_layer_attr *pwr_mode); +static void ufshcd_schedule_eh_work(struct ufs_hba *hba); static int ufshcd_wb_buf_flush_enable(struct ufs_hba *hba); static int ufshcd_wb_buf_flush_disable(struct ufs_hba *hba); static int ufshcd_wb_ctrl(struct ufs_hba *hba, bool enable); @@ -1571,11 +1572,6 @@ int ufshcd_hold(struct ufs_hba *hba, bool async) spin_lock_irqsave(hba->host->host_lock, flags); hba->clk_gating.active_reqs++; - if (ufshcd_eh_in_progress(hba)) { - spin_unlock_irqrestore(hba->host->host_lock, flags); - return 0; - } - start: switch (hba->clk_gating.state) { case CLKS_ON: @@ -1653,6 +1649,7 @@ static void ufshcd_gate_work(struct work_struct *work) struct ufs_hba *hba = container_of(work, struct ufs_hba, clk_gating.gate_work.work); unsigned long flags; + int ret; spin_lock_irqsave(hba->host->host_lock, flags); /* @@ -1679,8 +1676,11 @@ static void ufshcd_gate_work(struct work_struct *work) /* put the link into hibern8 mode before turning off clocks */ if (ufshcd_can_hibern8_during_gating(hba)) { - if (ufshcd_uic_hibern8_enter(hba)) { + ret = ufshcd_uic_hibern8_enter(hba); + if (ret) { hba->clk_gating.state = CLKS_ON; + dev_err(hba->dev, "%s: hibern8 enter failed %d\n", + __func__, ret); trace_ufshcd_clk_gating(dev_name(hba->dev), hba->clk_gating.state); goto out; @@ -1725,11 +1725,10 @@ static void __ufshcd_release(struct ufs_hba *hba) hba->clk_gating.active_reqs--; - if (hba->clk_gating.active_reqs || hba->clk_gating.is_suspended - || hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL - || ufshcd_any_tag_in_use(hba) || hba->outstanding_tasks - || hba->active_uic_cmd || hba->uic_async_done - || ufshcd_eh_in_progress(hba)) + if (hba->clk_gating.active_reqs || hba->clk_gating.is_suspended || + hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL || + ufshcd_any_tag_in_use(hba) || hba->outstanding_tasks || + hba->active_uic_cmd || hba->uic_async_done) return; hba->clk_gating.state = REQ_CLKS_OFF; @@ -3750,6 +3749,10 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd) ufshcd_add_delay_before_dme_cmd(hba); spin_lock_irqsave(hba->host->host_lock, flags); + if (ufshcd_is_link_broken(hba)) { + ret = -ENOLINK; + goto out_unlock; + } hba->uic_async_done = &uic_async_done; if (ufshcd_readl(hba, REG_INTERRUPT_ENABLE) & UIC_COMMAND_COMPL) { ufshcd_disable_intr(hba, UIC_COMMAND_COMPL); @@ -3797,6 +3800,11 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd) hba->uic_async_done = NULL; if (reenable_intr) ufshcd_enable_intr(hba, UIC_COMMAND_COMPL); + if (ret) { + ufshcd_set_link_broken(hba); + ufshcd_schedule_eh_work(hba); + } +out_unlock: spin_unlock_irqrestore(hba->host->host_lock, flags); mutex_unlock(&hba->uic_cmd_mutex); @@ -3866,7 +3874,7 @@ int ufshcd_link_recovery(struct ufs_hba *hba) } EXPORT_SYMBOL_GPL(ufshcd_link_recovery); -static int __ufshcd_uic_hibern8_enter(struct ufs_hba *hba) +static int ufshcd_uic_hibern8_enter(struct ufs_hba *hba) { int ret; struct uic_command uic_cmd = {0}; @@ -3879,45 +3887,16 @@ static int __ufshcd_uic_hibern8_enter(struct ufs_hba *hba) trace_ufshcd_profile_hibern8(dev_name(hba->dev), "enter", ktime_to_us(ktime_sub(ktime_get(), start)), ret); - if (ret) { - int err; - + if (ret) dev_err(hba->dev, "%s: hibern8 enter failed. ret = %d\n", __func__, ret); - - /* - * If link recovery fails then return error code returned from - * ufshcd_link_recovery(). - * If link recovery succeeds then return -EAGAIN to attempt - * hibern8 enter retry again. - */ - err = ufshcd_link_recovery(hba); - if (err) { - dev_err(hba->dev, "%s: link recovery failed", __func__); - ret = err; - } else { - ret = -EAGAIN; - } - } else + else ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_ENTER, POST_CHANGE); return ret; } -static int ufshcd_uic_hibern8_enter(struct ufs_hba *hba) -{ - int ret = 0, retries; - - for (retries = UIC_HIBERN8_ENTER_RETRIES; retries > 0; retries--) { - ret = __ufshcd_uic_hibern8_enter(hba); - if (!ret) - goto out; - } -out: - return ret; -} - int ufshcd_uic_hibern8_exit(struct ufs_hba *hba) { struct uic_command uic_cmd = {0}; @@ -3934,7 +3913,6 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba *hba) if (ret) { dev_err(hba->dev, "%s: hibern8 exit failed. ret = %d\n", __func__, ret); - ret = ufshcd_link_recovery(hba); } else { ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_EXIT, POST_CHANGE); @@ -5557,6 +5535,24 @@ static bool ufshcd_quirk_dl_nac_errors(struct ufs_hba *hba) return err_handling; } +/* host lock must be held before calling this func */ +static inline bool ufshcd_is_saved_err_fatal(struct ufs_hba *hba) +{ + return ((hba->saved_err & INT_FATAL_ERRORS) || + ((hba->saved_err & UIC_ERROR) && + (hba->saved_uic_err & UFSHCD_UIC_DL_PA_INIT_ERROR))); +} + +/* host lock must be held before calling this func */ +static inline void ufshcd_schedule_eh_work(struct ufs_hba *hba) +{ + /* handle fatal errors only when link is not in error state */ + if (hba->ufshcd_state != UFSHCD_STATE_ERROR) { + hba->ufshcd_state = UFSHCD_STATE_EH_SCHEDULED; + queue_work(hba->eh_wq, &hba->eh_work); + } +} + /** * ufshcd_err_handler - handle UFS errors that require s/w attention * @work: pointer to work structure @@ -5573,15 +5569,22 @@ static void ufshcd_err_handler(struct work_struct *work) hba = container_of(work, struct ufs_hba, eh_work); + spin_lock_irqsave(hba->host->host_lock, flags); + if (hba->ufshcd_state == UFSHCD_STATE_ERROR || + (!(hba->saved_err || hba->saved_uic_err || hba->force_reset) && + !ufshcd_is_link_broken(hba))) { + if (hba->ufshcd_state != UFSHCD_STATE_ERROR) + hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL; + spin_unlock_irqrestore(hba->host->host_lock, flags); + return; + } + ufshcd_set_eh_in_progress(hba); + spin_unlock_irqrestore(hba->host->host_lock, flags); pm_runtime_get_sync(hba->dev); ufshcd_hold(hba, false); spin_lock_irqsave(hba->host->host_lock, flags); - if (hba->ufshcd_state == UFSHCD_STATE_RESET) - goto out; - hba->ufshcd_state = UFSHCD_STATE_RESET; - ufshcd_set_eh_in_progress(hba); /* Complete requests that have door-bell cleared by h/w */ ufshcd_complete_requests(hba); @@ -5593,15 +5596,15 @@ static void ufshcd_err_handler(struct work_struct *work) /* release the lock as ufshcd_quirk_dl_nac_errors() may sleep */ ret = ufshcd_quirk_dl_nac_errors(hba); spin_lock_irqsave(hba->host->host_lock, flags); - if (!ret) + if (!ret && !hba->force_reset && ufshcd_is_link_active(hba)) goto skip_err_handling; } - if ((hba->saved_err & INT_FATAL_ERRORS) || - (hba->saved_err & UFSHCD_UIC_HIBERN8_MASK) || + + if (hba->force_reset || ufshcd_is_link_broken(hba) || + ufshcd_is_saved_err_fatal(hba) || ((hba->saved_err & UIC_ERROR) && - (hba->saved_uic_err & (UFSHCD_UIC_DL_PA_INIT_ERROR | - UFSHCD_UIC_DL_NAC_RECEIVED_ERROR | - UFSHCD_UIC_DL_TCx_REPLAY_ERROR)))) + (hba->saved_uic_err & (UFSHCD_UIC_DL_NAC_RECEIVED_ERROR | + UFSHCD_UIC_DL_TCx_REPLAY_ERROR)))) needs_reset = true; /* @@ -5655,34 +5658,25 @@ static void ufshcd_err_handler(struct work_struct *work) __ufshcd_transfer_req_compl(hba, (1UL << (hba->nutrs - 1))); + hba->force_reset = false; spin_unlock_irqrestore(hba->host->host_lock, flags); err = ufshcd_reset_and_restore(hba); spin_lock_irqsave(hba->host->host_lock, flags); - if (err) { - dev_err(hba->dev, "%s: reset and restore failed\n", - __func__); - hba->ufshcd_state = UFSHCD_STATE_ERROR; - } - /* - * Inform scsi mid-layer that we did reset and allow to handle - * Unit Attention properly. - */ - scsi_report_bus_reset(hba->host, 0); - hba->saved_err = 0; - hba->saved_uic_err = 0; + if (err) + dev_err(hba->dev, "%s: reset and restore failed with err %d\n", + __func__, err); } skip_err_handling: if (!needs_reset) { - hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL; + if (hba->ufshcd_state == UFSHCD_STATE_RESET) + hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL; if (hba->saved_err || hba->saved_uic_err) dev_err_ratelimited(hba->dev, "%s: exit: saved_err 0x%x saved_uic_err 0x%x", __func__, hba->saved_err, hba->saved_uic_err); } ufshcd_clear_eh_in_progress(hba); - -out: spin_unlock_irqrestore(hba->host->host_lock, flags); ufshcd_scsi_unblock_requests(hba); ufshcd_release(hba); @@ -5816,6 +5810,7 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba) hba->errors, ufshcd_get_upmcrs(hba)); ufshcd_update_reg_hist(&hba->ufs_stats.auto_hibern8_err, hba->errors); + ufshcd_set_link_broken(hba); queue_eh_work = true; } @@ -5827,30 +5822,25 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba) hba->saved_err |= hba->errors; hba->saved_uic_err |= hba->uic_error; - /* handle fatal errors only when link is functional */ - if (hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL) { - /* block commands from scsi mid-layer */ - ufshcd_scsi_block_requests(hba); - - hba->ufshcd_state = UFSHCD_STATE_EH_SCHEDULED; + /* block commands from scsi mid-layer */ + ufshcd_scsi_block_requests(hba); - /* dump controller state before resetting */ - if (hba->saved_err & (INT_FATAL_ERRORS | UIC_ERROR)) { - bool pr_prdt = !!(hba->saved_err & - SYSTEM_BUS_FATAL_ERROR); + /* dump controller state before resetting */ + if (hba->saved_err & (INT_FATAL_ERRORS | UIC_ERROR)) { + bool pr_prdt = !!(hba->saved_err & + SYSTEM_BUS_FATAL_ERROR); - dev_err(hba->dev, "%s: saved_err 0x%x saved_uic_err 0x%x\n", + dev_err(hba->dev, "%s: saved_err 0x%x saved_uic_err 0x%x\n", __func__, hba->saved_err, hba->saved_uic_err); - ufshcd_print_host_regs(hba); - ufshcd_print_pwr_info(hba); - ufshcd_print_tmrs(hba, hba->outstanding_tasks); - ufshcd_print_trs(hba, hba->outstanding_reqs, - pr_prdt); - } - schedule_work(&hba->eh_work); + ufshcd_print_host_regs(hba); + ufshcd_print_pwr_info(hba); + ufshcd_print_tmrs(hba, hba->outstanding_tasks); + ufshcd_print_trs(hba, hba->outstanding_reqs, + pr_prdt); } + ufshcd_schedule_eh_work(hba); retval |= IRQ_HANDLED; } /* @@ -6595,8 +6585,6 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba) /* Establish the link again and restore the device */ err = ufshcd_probe_hba(hba, false); - if (!err && (hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL)) - err = -EIO; out: if (err) dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err); @@ -6615,9 +6603,23 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba) */ static int ufshcd_reset_and_restore(struct ufs_hba *hba) { + u32 saved_err; + u32 saved_uic_err; int err = 0; + unsigned long flags; int retries = MAX_HOST_RESET_RETRIES; + /* + * This is a fresh start, cache and clear saved error first, + * in case new error generated during reset and restore. + */ + spin_lock_irqsave(hba->host->host_lock, flags); + saved_err = hba->saved_err; + saved_uic_err = hba->saved_uic_err; + hba->saved_err = 0; + hba->saved_uic_err = 0; + spin_unlock_irqrestore(hba->host->host_lock, flags); + do { /* Reset the attached device */ ufshcd_vops_device_reset(hba); @@ -6625,6 +6627,18 @@ static int ufshcd_reset_and_restore(struct ufs_hba *hba) err = ufshcd_host_reset_and_restore(hba); } while (err && --retries); + spin_lock_irqsave(hba->host->host_lock, flags); + /* + * Inform scsi mid-layer that we did reset and allow to handle + * Unit Attention properly. + */ + scsi_report_bus_reset(hba->host, 0); + if (err) { + hba->saved_err |= saved_err; + hba->saved_uic_err |= saved_uic_err; + } + spin_unlock_irqrestore(hba->host->host_lock, flags); + return err; } @@ -6636,48 +6650,25 @@ static int ufshcd_reset_and_restore(struct ufs_hba *hba) */ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd) { - int err; + int err = SUCCESS; unsigned long flags; struct ufs_hba *hba; hba = shost_priv(cmd->device->host); - ufshcd_hold(hba, false); - /* - * Check if there is any race with fatal error handling. - * If so, wait for it to complete. Even though fatal error - * handling does reset and restore in some cases, don't assume - * anything out of it. We are just avoiding race here. - */ - do { - spin_lock_irqsave(hba->host->host_lock, flags); - if (!(work_pending(&hba->eh_work) || - hba->ufshcd_state == UFSHCD_STATE_RESET || - hba->ufshcd_state == UFSHCD_STATE_EH_SCHEDULED)) - break; - spin_unlock_irqrestore(hba->host->host_lock, flags); - dev_dbg(hba->dev, "%s: reset in progress\n", __func__); - flush_work(&hba->eh_work); - } while (1); - - hba->ufshcd_state = UFSHCD_STATE_RESET; - ufshcd_set_eh_in_progress(hba); + spin_lock_irqsave(hba->host->host_lock, flags); + hba->force_reset = true; + ufshcd_schedule_eh_work(hba); + dev_err(hba->dev, "%s: reset in progress - 1\n", __func__); spin_unlock_irqrestore(hba->host->host_lock, flags); - err = ufshcd_reset_and_restore(hba); + flush_work(&hba->eh_work); spin_lock_irqsave(hba->host->host_lock, flags); - if (!err) { - err = SUCCESS; - hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL; - } else { + if (hba->ufshcd_state == UFSHCD_STATE_ERROR) err = FAILED; - hba->ufshcd_state = UFSHCD_STATE_ERROR; - } - ufshcd_clear_eh_in_progress(hba); spin_unlock_irqrestore(hba->host->host_lock, flags); - ufshcd_release(hba); return err; } @@ -7398,6 +7389,7 @@ static int ufshcd_add_lus(struct ufs_hba *hba) static int ufshcd_probe_hba(struct ufs_hba *hba, bool async) { int ret; + unsigned long flags; ktime_t start = ktime_get(); ret = ufshcd_link_startup(hba); @@ -7462,14 +7454,17 @@ static int ufshcd_probe_hba(struct ufs_hba *hba, bool async) */ ufshcd_set_active_icc_lvl(hba); - /* set the state as operational after switching to desired gear */ - hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL; - ufshcd_wb_config(hba); /* Enable Auto-Hibernate if configured */ ufshcd_auto_hibern8_enable(hba); out: + spin_lock_irqsave(hba->host->host_lock, flags); + if (ret) + hba->ufshcd_state = UFSHCD_STATE_ERROR; + else if (hba->ufshcd_state == UFSHCD_STATE_RESET) + hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL; + spin_unlock_irqrestore(hba->host->host_lock, flags); trace_ufshcd_init(dev_name(hba->dev), ret, ktime_to_us(ktime_sub(ktime_get(), start)), @@ -8076,10 +8071,13 @@ static int ufshcd_link_state_transition(struct ufs_hba *hba, if (req_link_state == UIC_LINK_HIBERN8_STATE) { ret = ufshcd_uic_hibern8_enter(hba); - if (!ret) + if (!ret) { ufshcd_set_link_hibern8(hba); - else + } else { + dev_err(hba->dev, "%s: hibern8 enter failed %d\n", + __func__, ret); goto out; + } } /* * If autobkops is enabled, link can't be turned off because @@ -8095,8 +8093,11 @@ static int ufshcd_link_state_transition(struct ufs_hba *hba, * unipro. But putting the link in hibern8 is much faster. */ ret = ufshcd_uic_hibern8_enter(hba); - if (ret) + if (ret) { + dev_err(hba->dev, "%s: hibern8 enter failed %d\n", + __func__, ret); goto out; + } /* * Change controller state to "reset state" which * should also put the link in off/reset state @@ -8405,10 +8406,13 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op) if (ufshcd_is_link_hibern8(hba)) { ret = ufshcd_uic_hibern8_exit(hba); - if (!ret) + if (!ret) { ufshcd_set_link_active(hba); - else + } else { + dev_err(hba->dev, "%s: hibern8 exit failed %d\n", + __func__, ret); goto vendor_suspend; + } } else if (ufshcd_is_link_off(hba)) { /* * A full initialization of the host and the device is @@ -8782,6 +8786,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq) int err; struct Scsi_Host *host = hba->host; struct device *dev = hba->dev; + char eh_wq_name[sizeof("ufs_eh_wq_00")]; if (!mmio_base) { dev_err(hba->dev, @@ -8843,6 +8848,15 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq) hba->max_pwr_info.is_valid = false; /* Initialize work queues */ + snprintf(eh_wq_name, sizeof(eh_wq_name), "ufs_eh_wq_%d", + hba->host->host_no); + hba->eh_wq = create_singlethread_workqueue(eh_wq_name); + if (!hba->eh_wq) { + dev_err(hba->dev, "%s: failed to create eh workqueue\n", + __func__); + err = -ENOMEM; + goto out_disable; + } INIT_WORK(&hba->eh_work, ufshcd_err_handler); INIT_WORK(&hba->eeh_work, ufshcd_exception_event_handler); diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h index 5b2cdaf..83e5cd9 100644 --- a/drivers/scsi/ufs/ufshcd.h +++ b/drivers/scsi/ufs/ufshcd.h @@ -90,6 +90,7 @@ enum uic_link_state { UIC_LINK_OFF_STATE = 0, /* Link powered down or disabled */ UIC_LINK_ACTIVE_STATE = 1, /* Link is in Fast/Slow/Sleep state */ UIC_LINK_HIBERN8_STATE = 2, /* Link is in Hibernate state */ + UIC_LINK_BROKEN_STATE = 3, /* Link is in broken state */ }; #define ufshcd_is_link_off(hba) ((hba)->uic_link_state == UIC_LINK_OFF_STATE) @@ -97,11 +98,15 @@ enum uic_link_state { UIC_LINK_ACTIVE_STATE) #define ufshcd_is_link_hibern8(hba) ((hba)->uic_link_state == \ UIC_LINK_HIBERN8_STATE) +#define ufshcd_is_link_broken(hba) ((hba)->uic_link_state == \ + UIC_LINK_BROKEN_STATE) #define ufshcd_set_link_off(hba) ((hba)->uic_link_state = UIC_LINK_OFF_STATE) #define ufshcd_set_link_active(hba) ((hba)->uic_link_state = \ UIC_LINK_ACTIVE_STATE) #define ufshcd_set_link_hibern8(hba) ((hba)->uic_link_state = \ UIC_LINK_HIBERN8_STATE) +#define ufshcd_set_link_broken(hba) ((hba)->uic_link_state = \ + UIC_LINK_BROKEN_STATE) #define ufshcd_set_ufs_dev_active(h) \ ((h)->curr_dev_pwr_mode = UFS_ACTIVE_PWR_MODE) @@ -613,12 +618,14 @@ struct ufs_hba_variant_params { * @intr_mask: Interrupt Mask Bits * @ee_ctrl_mask: Exception event control mask * @is_powered: flag to check if HBA is powered + * @eh_wq: Workqueue that eh_work works on * @eh_work: Worker to handle UFS errors that require s/w attention * @eeh_work: Worker to handle exception events * @errors: HBA errors * @uic_error: UFS interconnect layer error status * @saved_err: sticky error mask * @saved_uic_err: sticky UIC error mask + * @force_reset: flag to force eh_work perform a full reset * @silence_err_logs: flag to silence error logs * @dev_cmd: ufs device management command information * @last_dme_cmd_tstamp: time stamp of the last completed DME command @@ -707,6 +714,7 @@ struct ufs_hba { bool is_powered; /* Work Queues */ + struct workqueue_struct *eh_wq; struct work_struct eh_work; struct work_struct eeh_work; @@ -716,6 +724,7 @@ struct ufs_hba { u32 saved_err; u32 saved_uic_err; struct ufs_stats ufs_stats; + bool force_reset; bool silence_err_logs; /* Device management request data */ From patchwork Thu Jul 23 02:34:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Can Guo X-Patchwork-Id: 11679617 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 182F06C1 for ; Thu, 23 Jul 2020 02:34:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 02A082084D for ; Thu, 23 Jul 2020 02:34:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387627AbgGWCeo (ORCPT ); Wed, 22 Jul 2020 22:34:44 -0400 Received: from labrats.qualcomm.com ([199.106.110.90]:62898 "EHLO labrats.qualcomm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1733221AbgGWCeo (ORCPT ); Wed, 22 Jul 2020 22:34:44 -0400 IronPort-SDR: FXeQHEz8IFPPJLfVdMh6zsf76ydA82mn5GsAX017DSdMKZHsvI3hmW+YjwTEuOQGuzzGseu3Kl b0BL0SEfCiR5TzTVOM0J2GZ1fpl2TsSbfnN5Jo/1hIWmQVzIZT50AUCbglqMFq4Wt9mpB/E7C7 qKLr8US9m/9lqo1HZV3YeoR/VFHO6tPgEzAtUoi9lxiAf7xIzCwnCRsbQ4IefWNye5NfmN87Qb 2VoogaaNS8pIWouDWWLiUE3VDisKRcmMrF1TGHsb1CGmEkf6ZAH2z6I3NNJKSKOZpj1es2Sbo2 PFY= X-IronPort-AV: E=Sophos;i="5.75,385,1589266800"; d="scan'208";a="47229205" Received: from unknown (HELO ironmsg02-sd.qualcomm.com) ([10.53.140.142]) by labrats.qualcomm.com with ESMTP; 22 Jul 2020 19:34:43 -0700 Received: from pacamara-linux.qualcomm.com ([192.168.140.135]) by ironmsg02-sd.qualcomm.com with ESMTP; 22 Jul 2020 19:34:41 -0700 Received: by pacamara-linux.qualcomm.com (Postfix, from userid 359480) id A33AA22A4D; Wed, 22 Jul 2020 19:34:40 -0700 (PDT) From: Can Guo To: asutoshd@codeaurora.org, nguyenb@codeaurora.org, hongwus@codeaurora.org, rnayak@codeaurora.org, sh425.lee@samsung.com, linux-scsi@vger.kernel.org, kernel-team@android.com, saravanak@google.com, salyzyn@google.com, cang@codeaurora.org Cc: Alim Akhtar , Avri Altman , "James E.J. Bottomley" , "Martin K. Petersen" , Stanley Chu , Bean Huo , Bart Van Assche , Satya Tangirala , linux-kernel@vger.kernel.org (open list) Subject: [PATCH v5 7/9] scsi: ufs: Recover hba runtime PM error in error handler Date: Wed, 22 Jul 2020 19:34:06 -0700 Message-Id: <1595471649-25675-8-git-send-email-cang@codeaurora.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1595471649-25675-1-git-send-email-cang@codeaurora.org> References: <1595471649-25675-1-git-send-email-cang@codeaurora.org> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org Current error handler cannot work well or recover hba runtime PM error if ufshcd_suspend/resume has failed due to UFS errors, e.g. hibern8 enter/exit error or SSU cmd error. When this happens, error handler may fail doing full reset and restore because error handler always assumes that powers, IRQs and clocks are ready after pm_runtime_get_sync returns, but actually they are not if ufshcd_reusme fails [1]. Besides, if ufschd_suspend/resume fails due to UFS error, runtime PM framework saves the error value to dev.power.runtime_error. After that, hba dev runtime suspend/resume would not be invoked anymore unless runtime_error is cleared [2]. In case of ufshcd_suspend/resume fails due to UFS errors, for scenario [1], error handler cannot assume anything of pm_runtime_get_sync, meaning error handler should explicitly turn ON powers, IRQs and clocks again. To get the hba runtime PM work as regard for scenario [2], error handler can clear the runtime_error by calling pm_runtime_set_active() if full reset and restore succeeds. And, more important, if pm_runtime_set_active() returns no error, which means runtime_error has been cleared, we also need to resume those scsi devices under hba in case any of them has failed to be resumed due to hba runtime resume failure. This is to unblock blk_queue_enter in case there are bios waiting inside it. In addition, if ufshcd_resume errors out, ufshcd_release in ufshcd_resume would be skipped. After hba runtime PM error is recovered in error handler, we should do ufshcd_release once to get clock gating back to work. Signed-off-by: Can Guo --- drivers/scsi/ufs/ufshcd.c | 90 +++++++++++++++++++++++++++++++++++++++++++---- drivers/scsi/ufs/ufshcd.h | 1 + 2 files changed, 85 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index c01743a..68705a1 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "ufshcd.h" #include "ufs_quirks.h" #include "unipro.h" @@ -229,6 +230,10 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba); static int ufshcd_change_power_mode(struct ufs_hba *hba, struct ufs_pa_layer_attr *pwr_mode); static void ufshcd_schedule_eh_work(struct ufs_hba *hba); +static int ufshcd_setup_hba_vreg(struct ufs_hba *hba, bool on); +static int ufshcd_setup_vreg(struct ufs_hba *hba, bool on); +static inline int ufshcd_config_vreg_hpm(struct ufs_hba *hba, + struct ufs_vreg *vreg); static int ufshcd_wb_buf_flush_enable(struct ufs_hba *hba); static int ufshcd_wb_buf_flush_disable(struct ufs_hba *hba); static int ufshcd_wb_ctrl(struct ufs_hba *hba, bool enable); @@ -5553,6 +5558,77 @@ static inline void ufshcd_schedule_eh_work(struct ufs_hba *hba) } } +static void ufshcd_err_handler_prepare(struct ufs_hba *hba) +{ + pm_runtime_get_sync(hba->dev); + /* + * Don't assume anything of pm_runtime_get_sync(), if resume fails, + * irq and clocks can be OFF, and powers can be OFF or in LPM. + */ + ufshcd_setup_vreg(hba, true); + ufshcd_config_vreg_hpm(hba, hba->vreg_info.vccq); + ufshcd_config_vreg_hpm(hba, hba->vreg_info.vccq2); + ufshcd_setup_hba_vreg(hba, true); + ufshcd_enable_irq(hba); + + ufshcd_hold(hba, false); + if (!ufshcd_is_clkgating_allowed(hba)) + ufshcd_setup_clocks(hba, true); + + if (ufshcd_is_clkscaling_supported(hba)) { + cancel_work_sync(&hba->clk_scaling.suspend_work); + cancel_work_sync(&hba->clk_scaling.resume_work); + ufshcd_suspend_clkscaling(hba); + } +} + +static void ufshcd_err_handler_unprepare(struct ufs_hba *hba) +{ + /* If clk_gating is held by pm ops, release it */ + if (pm_runtime_active(hba->dev) && hba->clk_gating.held_by_pm) { + hba->clk_gating.held_by_pm = false; + ufshcd_release(hba); + } + ufshcd_release(hba); + if (ufshcd_is_clkscaling_supported(hba)) + ufshcd_resume_clkscaling(hba); + pm_runtime_put(hba->dev); +} + +#ifdef CONFIG_PM +static void ufshcd_recover_pm_error(struct ufs_hba *hba) +{ + struct Scsi_Host *shost = hba->host; + struct scsi_device *sdev; + struct request_queue *q; + int ret; + + /* + * Set RPM status of hba device to RPM_ACTIVE, + * this also clears its runtime error. + */ + ret = pm_runtime_set_active(hba->dev); + /* + * If hba device had runtime error, we also need to resume those + * scsi devices under hba in case any of them has failed to be + * resumed due to hba runtime resume failure. This is to unblock + * blk_queue_enter in case there are bios waiting inside it. + */ + if (!ret) { + list_for_each_entry(sdev, &shost->__devices, siblings) { + q = sdev->request_queue; + if (q->dev && (q->rpm_status == RPM_SUSPENDED || + q->rpm_status == RPM_SUSPENDING)) + pm_request_resume(q->dev); + } + } +} +#else +static inline void ufshcd_recover_pm_error(struct ufs_hba *hba) +{ +} +#endif + /** * ufshcd_err_handler - handle UFS errors that require s/w attention * @work: pointer to work structure @@ -5580,9 +5656,7 @@ static void ufshcd_err_handler(struct work_struct *work) } ufshcd_set_eh_in_progress(hba); spin_unlock_irqrestore(hba->host->host_lock, flags); - pm_runtime_get_sync(hba->dev); - ufshcd_hold(hba, false); - + ufshcd_err_handler_prepare(hba); spin_lock_irqsave(hba->host->host_lock, flags); hba->ufshcd_state = UFSHCD_STATE_RESET; @@ -5661,10 +5735,12 @@ static void ufshcd_err_handler(struct work_struct *work) hba->force_reset = false; spin_unlock_irqrestore(hba->host->host_lock, flags); err = ufshcd_reset_and_restore(hba); - spin_lock_irqsave(hba->host->host_lock, flags); if (err) dev_err(hba->dev, "%s: reset and restore failed with err %d\n", __func__, err); + else + ufshcd_recover_pm_error(hba); + spin_lock_irqsave(hba->host->host_lock, flags); } skip_err_handling: @@ -5679,8 +5755,7 @@ static void ufshcd_err_handler(struct work_struct *work) ufshcd_clear_eh_in_progress(hba); spin_unlock_irqrestore(hba->host->host_lock, flags); ufshcd_scsi_unblock_requests(hba); - ufshcd_release(hba); - pm_runtime_put_sync(hba->dev); + ufshcd_err_handler_unprepare(hba); } /** @@ -8232,6 +8307,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op) * just gate the clocks. */ ufshcd_hold(hba, false); + hba->clk_gating.held_by_pm = true; hba->clk_gating.is_suspended = true; if (hba->clk_scaling.is_allowed) { @@ -8351,6 +8427,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op) hba->clk_gating.is_suspended = false; hba->dev_info.b_rpm_dev_flush_capable = false; ufshcd_release(hba); + hba->clk_gating.held_by_pm = false; out: if (hba->dev_info.b_rpm_dev_flush_capable) { schedule_delayed_work(&hba->rpm_dev_flush_recheck_work, @@ -8457,6 +8534,7 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op) /* Schedule clock gating in case of no access to UFS device yet */ ufshcd_release(hba); + hba->clk_gating.held_by_pm = false; goto out; diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h index 83e5cd9..585e58b 100644 --- a/drivers/scsi/ufs/ufshcd.h +++ b/drivers/scsi/ufs/ufshcd.h @@ -354,6 +354,7 @@ struct ufs_clk_gating { struct device_attribute delay_attr; struct device_attribute enable_attr; bool is_enabled; + bool held_by_pm; int active_reqs; struct workqueue_struct *clk_gating_workq; }; From patchwork Thu Jul 23 02:34:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Can Guo X-Patchwork-Id: 11679615 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F29096C1 for ; Thu, 23 Jul 2020 02:34:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E3F29208E4 for ; Thu, 23 Jul 2020 02:34:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387646AbgGWCep (ORCPT ); Wed, 22 Jul 2020 22:34:45 -0400 Received: from labrats.qualcomm.com ([199.106.110.90]:62898 "EHLO labrats.qualcomm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387483AbgGWCep (ORCPT ); Wed, 22 Jul 2020 22:34:45 -0400 IronPort-SDR: g6MyfOyK+jFfLqNfuFB7zQkb6MVHf/GVvHBndmAbFCsJD3qHp9+tZT/9K3EK8w7z0QiFE4woUm NDdykMAReMAqETKG66tTH/4i0YJxLYY1tI/SkyyhfUTHCICJ8uY8mzANWqPiYkikqtGYPH0nka KMJYswCJnUQgUnls7/VABBACvGugk155cEhZXOwo2hmUe/JXohDy8+s0WKev9et7UA2yzLgW8y NTrFTmFaEXifYqEEX96dZ5jPpZNmshhIMpAXsVTylPSb0sBpIPqMw59rbnTeiP0CnkEuEJ/hnj jYg= X-IronPort-AV: E=Sophos;i="5.75,385,1589266800"; d="scan'208";a="47229206" Received: from unknown (HELO ironmsg03-sd.qualcomm.com) ([10.53.140.143]) by labrats.qualcomm.com with ESMTP; 22 Jul 2020 19:34:44 -0700 Received: from pacamara-linux.qualcomm.com ([192.168.140.135]) by ironmsg03-sd.qualcomm.com with ESMTP; 22 Jul 2020 19:34:44 -0700 Received: by pacamara-linux.qualcomm.com (Postfix, from userid 359480) id 4C12C22A4D; Wed, 22 Jul 2020 19:34:44 -0700 (PDT) From: Can Guo To: asutoshd@codeaurora.org, nguyenb@codeaurora.org, hongwus@codeaurora.org, rnayak@codeaurora.org, sh425.lee@samsung.com, linux-scsi@vger.kernel.org, kernel-team@android.com, saravanak@google.com, salyzyn@google.com, cang@codeaurora.org Cc: Alim Akhtar , Avri Altman , "James E.J. Bottomley" , "Martin K. Petersen" , Stanley Chu , Bean Huo , Bart Van Assche , linux-kernel@vger.kernel.org (open list) Subject: [PATCH v5 8/9] scsi: ufs: Move dumps in IRQ handler to error handler Date: Wed, 22 Jul 2020 19:34:07 -0700 Message-Id: <1595471649-25675-9-git-send-email-cang@codeaurora.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1595471649-25675-1-git-send-email-cang@codeaurora.org> References: <1595471649-25675-1-git-send-email-cang@codeaurora.org> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org Sometime dumps in IRQ handler are heavy enough to cause system stability issues, move them to error handler. Signed-off-by: Can Guo --- drivers/scsi/ufs/ufshcd.c | 31 +++++++++++++++---------------- 1 file changed, 15 insertions(+), 16 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 68705a1..ae78d5d 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -5681,6 +5681,21 @@ static void ufshcd_err_handler(struct work_struct *work) UFSHCD_UIC_DL_TCx_REPLAY_ERROR)))) needs_reset = true; + if (hba->saved_err & (INT_FATAL_ERRORS | UIC_ERROR | + UFSHCD_UIC_HIBERN8_MASK)) { + bool pr_prdt = !!(hba->saved_err & SYSTEM_BUS_FATAL_ERROR); + + dev_err(hba->dev, "%s: saved_err 0x%x saved_uic_err 0x%x\n", + __func__, hba->saved_err, hba->saved_uic_err); + spin_unlock_irqrestore(hba->host->host_lock, flags); + ufshcd_print_host_state(hba); + ufshcd_print_pwr_info(hba); + ufshcd_print_host_regs(hba); + ufshcd_print_tmrs(hba, hba->outstanding_tasks); + ufshcd_print_trs(hba, hba->outstanding_reqs, pr_prdt); + spin_lock_irqsave(hba->host->host_lock, flags); + } + /* * if host reset is required then skip clearing the pending * transfers forcefully because they will get cleared during @@ -5899,22 +5914,6 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba) /* block commands from scsi mid-layer */ ufshcd_scsi_block_requests(hba); - - /* dump controller state before resetting */ - if (hba->saved_err & (INT_FATAL_ERRORS | UIC_ERROR)) { - bool pr_prdt = !!(hba->saved_err & - SYSTEM_BUS_FATAL_ERROR); - - dev_err(hba->dev, "%s: saved_err 0x%x saved_uic_err 0x%x\n", - __func__, hba->saved_err, - hba->saved_uic_err); - - ufshcd_print_host_regs(hba); - ufshcd_print_pwr_info(hba); - ufshcd_print_tmrs(hba, hba->outstanding_tasks); - ufshcd_print_trs(hba, hba->outstanding_reqs, - pr_prdt); - } ufshcd_schedule_eh_work(hba); retval |= IRQ_HANDLED; } From patchwork Thu Jul 23 02:34:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Can Guo X-Patchwork-Id: 11679619 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9E4AB6C1 for ; Thu, 23 Jul 2020 02:34:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8AF052084D for ; Thu, 23 Jul 2020 02:34:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387667AbgGWCey (ORCPT ); Wed, 22 Jul 2020 22:34:54 -0400 Received: from labrats.qualcomm.com ([199.106.110.90]:1953 "EHLO labrats.qualcomm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731405AbgGWCey (ORCPT ); Wed, 22 Jul 2020 22:34:54 -0400 IronPort-SDR: 4YMtuW61e2WoCBKgqyINgBtiI4yCbohLK63H0C/kmrEdBdrAPmBAbYVuTHVdxEamCPOJyDRjMt A1HX/6UspjFgPpm68KIly3g5J4U3IBY0fH2LesAJCT4FKPb8OaR/fptkzwdjhONYHDyUKQxk/d M2qdReeJWxfnXz8DAJVov96SkN+2VWVfKAJv6uxTqbcWQkZUqiJnIXY/HhX9odJ2+6LP48w25w jJcAy9tzOyBEClJhtkOQnlv4jlm3EVYYLNl29Bms34GEPvx4g/NqWsDirlAaSe2cNwLTkuPWM+ 6xo= X-IronPort-AV: E=Sophos;i="5.75,385,1589266800"; d="scan'208";a="29047798" Received: from unknown (HELO ironmsg04-sd.qualcomm.com) ([10.53.140.144]) by labrats.qualcomm.com with ESMTP; 22 Jul 2020 19:34:53 -0700 Received: from pacamara-linux.qualcomm.com ([192.168.140.135]) by ironmsg04-sd.qualcomm.com with ESMTP; 22 Jul 2020 19:34:52 -0700 Received: by pacamara-linux.qualcomm.com (Postfix, from userid 359480) id 6101C22A4D; Wed, 22 Jul 2020 19:34:52 -0700 (PDT) From: Can Guo To: asutoshd@codeaurora.org, nguyenb@codeaurora.org, hongwus@codeaurora.org, rnayak@codeaurora.org, sh425.lee@samsung.com, linux-scsi@vger.kernel.org, kernel-team@android.com, saravanak@google.com, salyzyn@google.com, cang@codeaurora.org Cc: Alim Akhtar , Avri Altman , "James E.J. Bottomley" , "Martin K. Petersen" , Stanley Chu , Bean Huo , Bart Van Assche , linux-kernel@vger.kernel.org (open list) Subject: [PATCH v5 9/9] scsi: ufs: Fix a racing problem btw error handler and runtime PM ops Date: Wed, 22 Jul 2020 19:34:08 -0700 Message-Id: <1595471649-25675-10-git-send-email-cang@codeaurora.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1595471649-25675-1-git-send-email-cang@codeaurora.org> References: <1595471649-25675-1-git-send-email-cang@codeaurora.org> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org Current IRQ handler blocks scsi requests before scheduling eh_work, when error handler calls pm_runtime_get_sync, if ufshcd_suspend/resume sends a scsi cmd, most likely the SSU cmd, since scsi requests are blocked, pm_runtime_get_sync() will never return because ufshcd_suspend/reusme is blocked by the scsi cmd. Some changes and code re-arrangement can be made to resolve it. o In queuecommand path, hba->ufshcd_state check and ufshcd_send_command should stay into the same spin lock. This is to make sure that no more commands leak into doorbell after hba->ufshcd_state is changed. o Don't block scsi requests before scheduling eh_work, let error handler block scsi requests when it is ready to start error recovery. o Don't let scsi layer keep requeuing the scsi cmds sent from hba runtime PM ops, let them pass or fail them. Let them pass if eh_work is scheduled due to non-fatal errors. Fail them fail if eh_work is scheduled due to fatal errors, otherwise the cmds may eventually time out since UFS is in bad state, which gets error handler blocked for too long. If we fail the scsi cmds sent from hba runtime PM ops, hba runtime PM ops fails too, but it does not hurt since error handler can recover hba runtime PM error. Signed-off-by: Can Guo --- drivers/scsi/ufs/ufshcd.c | 84 +++++++++++++++++++++++++++-------------------- 1 file changed, 49 insertions(+), 35 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index ae78d5d..e9d8c4f 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -126,7 +126,8 @@ enum { UFSHCD_STATE_RESET, UFSHCD_STATE_ERROR, UFSHCD_STATE_OPERATIONAL, - UFSHCD_STATE_EH_SCHEDULED, + UFSHCD_STATE_EH_SCHEDULED_FATAL, + UFSHCD_STATE_EH_SCHEDULED_NON_FATAL, }; /* UFSHCD error handling flags */ @@ -2515,34 +2516,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd) if (!down_read_trylock(&hba->clk_scaling_lock)) return SCSI_MLQUEUE_HOST_BUSY; - spin_lock_irqsave(hba->host->host_lock, flags); - switch (hba->ufshcd_state) { - case UFSHCD_STATE_OPERATIONAL: - break; - case UFSHCD_STATE_EH_SCHEDULED: - case UFSHCD_STATE_RESET: - err = SCSI_MLQUEUE_HOST_BUSY; - goto out_unlock; - case UFSHCD_STATE_ERROR: - set_host_byte(cmd, DID_ERROR); - cmd->scsi_done(cmd); - goto out_unlock; - default: - dev_WARN_ONCE(hba->dev, 1, "%s: invalid state %d\n", - __func__, hba->ufshcd_state); - set_host_byte(cmd, DID_BAD_TARGET); - cmd->scsi_done(cmd); - goto out_unlock; - } - - /* if error handling is in progress, don't issue commands */ - if (ufshcd_eh_in_progress(hba)) { - set_host_byte(cmd, DID_ERROR); - cmd->scsi_done(cmd); - goto out_unlock; - } - spin_unlock_irqrestore(hba->host->host_lock, flags); - hba->req_abort_count = 0; err = ufshcd_hold(hba, true); @@ -2578,11 +2551,50 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd) /* Make sure descriptors are ready before ringing the doorbell */ wmb(); - /* issue command to the controller */ spin_lock_irqsave(hba->host->host_lock, flags); + switch (hba->ufshcd_state) { + case UFSHCD_STATE_OPERATIONAL: + case UFSHCD_STATE_EH_SCHEDULED_NON_FATAL: + break; + case UFSHCD_STATE_EH_SCHEDULED_FATAL: + /* + * If we are here, eh_work is either scheduled or running. + * Before eh_work sets ufshcd_state to STATE_RESET, it flushes + * runtime PM ops by calling pm_runtime_get_sync(). If a scsi + * cmd, e.g. the SSU cmd, is sent by PM ops, it can never be + * finished if we let SCSI layer keep retrying it, which gets + * eh_work stuck forever. Neither can we let it pass, because + * ufs now is not in good status, so the SSU cmd may eventually + * time out, blocking eh_work for too long. So just let it fail. + */ + if (hba->pm_op_in_progress) { + hba->force_reset = true; + set_host_byte(cmd, DID_BAD_TARGET); + goto out_compl_cmd; + } + case UFSHCD_STATE_RESET: + err = SCSI_MLQUEUE_HOST_BUSY; + goto out_compl_cmd; + case UFSHCD_STATE_ERROR: + set_host_byte(cmd, DID_ERROR); + goto out_compl_cmd; + default: + dev_WARN_ONCE(hba->dev, 1, "%s: invalid state %d\n", + __func__, hba->ufshcd_state); + set_host_byte(cmd, DID_BAD_TARGET); + goto out_compl_cmd; + } ufshcd_send_command(hba, tag); -out_unlock: spin_unlock_irqrestore(hba->host->host_lock, flags); + goto out; + +out_compl_cmd: + scsi_dma_unmap(lrbp->cmd); + lrbp->cmd = NULL; + spin_unlock_irqrestore(hba->host->host_lock, flags); + ufshcd_release(hba); + if (!err) + cmd->scsi_done(cmd); out: up_read(&hba->clk_scaling_lock); return err; @@ -5553,7 +5565,11 @@ static inline void ufshcd_schedule_eh_work(struct ufs_hba *hba) { /* handle fatal errors only when link is not in error state */ if (hba->ufshcd_state != UFSHCD_STATE_ERROR) { - hba->ufshcd_state = UFSHCD_STATE_EH_SCHEDULED; + if (hba->force_reset || ufshcd_is_link_broken(hba) || + ufshcd_is_saved_err_fatal(hba)) + hba->ufshcd_state = UFSHCD_STATE_EH_SCHEDULED_FATAL; + else + hba->ufshcd_state = UFSHCD_STATE_EH_SCHEDULED_NON_FATAL; queue_work(hba->eh_wq, &hba->eh_work); } } @@ -5658,6 +5674,7 @@ static void ufshcd_err_handler(struct work_struct *work) spin_unlock_irqrestore(hba->host->host_lock, flags); ufshcd_err_handler_prepare(hba); spin_lock_irqsave(hba->host->host_lock, flags); + ufshcd_scsi_block_requests(hba); hba->ufshcd_state = UFSHCD_STATE_RESET; /* Complete requests that have door-bell cleared by h/w */ @@ -5911,9 +5928,6 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba) */ hba->saved_err |= hba->errors; hba->saved_uic_err |= hba->uic_error; - - /* block commands from scsi mid-layer */ - ufshcd_scsi_block_requests(hba); ufshcd_schedule_eh_work(hba); retval |= IRQ_HANDLED; }