From patchwork Wed Jan 29 12:40:07 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jacek Lawrynowicz X-Patchwork-Id: 13953727 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3D922C0218D for ; Wed, 29 Jan 2025 12:40:26 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B586610E7E6; Wed, 29 Jan 2025 12:40:25 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="QJcmH+7t"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) by gabe.freedesktop.org (Postfix) with ESMTPS id 594A710E7E2 for ; Wed, 29 Jan 2025 12:40:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1738154416; x=1769690416; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ORDKKvPDsTtumPUvXJ34As3hCg4Jf6LPzVj4Imt/ICQ=; b=QJcmH+7tk3idvl/GzUFwasy5OKYYCAS++PF5lyp7sFPLgJWvneOwLtjX TnMCooQCcmEJrZLq8b+HOUIK+hvTxFbypYzp2mUnpOz3ebW3tHwWP8nR1 0VNb7Ugvp3+MmrNeJo0UY91ODw3SK+doXrCsvYhacRFEDFf+wvMT7f3VT F8YGTeJ1LTQJS7oCqG3egowau+jcGnnUPkjY5zML+BzZHYGUo0SrjfaTp qTRjd0f0zr+XED/aurR8r8AXbX7SrClgs0TV3lGDWfXBjTk8qU9YBwmTh pGiRd1ENPI6Z7ox82gz0U1X+In1Yfxp4h9o+kv6wut50utx/Dv0PFg5Zk g==; X-CSE-ConnectionGUID: C5gd+gWoS0im6rIbOT5j7w== X-CSE-MsgGUID: smbS6d10RfCvHpTx+c/UjQ== X-IronPort-AV: E=McAfee;i="6700,10204,11329"; a="49647263" X-IronPort-AV: E=Sophos;i="6.13,243,1732608000"; d="scan'208";a="49647263" Received: from fmviesa003.fm.intel.com ([10.60.135.143]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2025 04:40:16 -0800 X-CSE-ConnectionGUID: Zi/2YPCAQVSWEVsi2qhNiw== X-CSE-MsgGUID: puULkPcaSXO+w6n18m+U7w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,224,1728975600"; d="scan'208";a="113030871" Received: from jlawryno.igk.intel.com ([10.91.220.59]) by fmviesa003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2025 04:40:13 -0800 From: Jacek Lawrynowicz To: dri-devel@lists.freedesktop.org Cc: oded.gabbay@gmail.com, quic_jhugo@quicinc.com, maciej.falkowski@linux.intel.com, Jacek Lawrynowicz , stable@vger.kernel.org, Karol Wachowski Subject: [PATCH 1/3] accel/ivpu: Fix error handling in ivpu_boot() Date: Wed, 29 Jan 2025 13:40:07 +0100 Message-ID: <20250129124009.1039982-2-jacek.lawrynowicz@linux.intel.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: <20250129124009.1039982-1-jacek.lawrynowicz@linux.intel.com> References: <20250129124009.1039982-1-jacek.lawrynowicz@linux.intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Ensure IRQs and IPC are properly disabled if HW sched or DCT initialization fails. Fixes: cc3c72c7e610 ("accel/ivpu: Refactor failure diagnostics during boot") Cc: # v6.13+ Reviewed-by: Karol Wachowski Signed-off-by: Jacek Lawrynowicz Reviewed-by: Jeffrey Hugo --- drivers/accel/ivpu/ivpu_drv.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/accel/ivpu/ivpu_drv.c b/drivers/accel/ivpu/ivpu_drv.c index ca2bf47ce2484..0c4a82271c26d 100644 --- a/drivers/accel/ivpu/ivpu_drv.c +++ b/drivers/accel/ivpu/ivpu_drv.c @@ -397,15 +397,19 @@ int ivpu_boot(struct ivpu_device *vdev) if (ivpu_fw_is_cold_boot(vdev)) { ret = ivpu_pm_dct_init(vdev); if (ret) - goto err_diagnose_failure; + goto err_disable_ipc; ret = ivpu_hw_sched_init(vdev); if (ret) - goto err_diagnose_failure; + goto err_disable_ipc; } return 0; +err_disable_ipc: + ivpu_ipc_disable(vdev); + ivpu_hw_irq_disable(vdev); + disable_irq(vdev->irq); err_diagnose_failure: ivpu_hw_diagnose_failure(vdev); ivpu_mmu_evtq_dump(vdev); From patchwork Wed Jan 29 12:40:08 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jacek Lawrynowicz X-Patchwork-Id: 13953725 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D610BC0218D for ; Wed, 29 Jan 2025 12:40:19 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 58A6B10E7E4; Wed, 29 Jan 2025 12:40:19 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="cwqZcDQI"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) by gabe.freedesktop.org (Postfix) with ESMTPS id 49AEA10E7E4 for ; Wed, 29 Jan 2025 12:40:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1738154418; x=1769690418; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=B2CrWdcsOicH9+yP6JOFhLTScavsFkbHwpJkajJlaJM=; b=cwqZcDQI9gR7C8CXeNKU5y9hmIg/TjzD52uwOj7sLrrN7MERd3FY2CVQ oJ87oqtVnJu9SrTlNMSIddxhedTKH2Tg/cbHuMOKRt5o/l66dqatiRI4Z 0MpoXRnz9/E9B/bn4HbFgoKE4Q5ltwIiOvT2l/Mx3yDac5unYdSgE7vRF ERkt0jzj78zkrDmapVtepz4Pi5z20Zjdq2JRK1izSL2YY6RtyaW7wqoo1 TiBycyVfHjNGYtLbVvN+XDjT1qMDWx2z1frJGBkLNq1f40gPmmpAUbwvv bMTDOVP69mYGEwMEL1cqNl3ogfdGaKDmsTHBufXUZRDNIfgpqLGfnHvjD g==; X-CSE-ConnectionGUID: xZeSZMYQRA+71sKvehRRfg== X-CSE-MsgGUID: nPaPqodjSNKDySH29qMEfw== X-IronPort-AV: E=McAfee;i="6700,10204,11329"; a="49647269" X-IronPort-AV: E=Sophos;i="6.13,243,1732608000"; d="scan'208";a="49647269" Received: from fmviesa003.fm.intel.com ([10.60.135.143]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2025 04:40:17 -0800 X-CSE-ConnectionGUID: RxrTt+IvSJ2xz4kjOzRMsg== X-CSE-MsgGUID: g0R02cOsRemid+OBtV39Lg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,224,1728975600"; d="scan'208";a="113030872" Received: from jlawryno.igk.intel.com ([10.91.220.59]) by fmviesa003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2025 04:40:15 -0800 From: Jacek Lawrynowicz To: dri-devel@lists.freedesktop.org Cc: oded.gabbay@gmail.com, quic_jhugo@quicinc.com, maciej.falkowski@linux.intel.com, Jacek Lawrynowicz , stable@vger.kernel.org Subject: [PATCH 2/3] accel/ivpu: Clear runtime_error after pm_runtime_resume_and_get() fails Date: Wed, 29 Jan 2025 13:40:08 +0100 Message-ID: <20250129124009.1039982-3-jacek.lawrynowicz@linux.intel.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: <20250129124009.1039982-1-jacek.lawrynowicz@linux.intel.com> References: <20250129124009.1039982-1-jacek.lawrynowicz@linux.intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" pm_runtime_resume_and_get() sets dev->power.runtime_error that causes all subsequent pm_runtime_get_sync() calls to fail. Clear the runtime_error using pm_runtime_set_suspended(), so the driver doesn't have to be reloaded to recover when the NPU fails to boot during runtime resume. Fixes: 7d4b4c74432d ("accel/ivpu: Remove suspend_reschedule_counter") Cc: # v6.11+ Reviewed-by: Maciej Falkowski Signed-off-by: Jacek Lawrynowicz Reviewed-by: Jeffrey Hugo --- drivers/accel/ivpu/ivpu_pm.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c index 949f4233946c6..c3774d2221326 100644 --- a/drivers/accel/ivpu/ivpu_pm.c +++ b/drivers/accel/ivpu/ivpu_pm.c @@ -309,7 +309,10 @@ int ivpu_rpm_get(struct ivpu_device *vdev) int ret; ret = pm_runtime_resume_and_get(vdev->drm.dev); - drm_WARN_ON(&vdev->drm, ret < 0); + if (ret < 0) { + ivpu_err(vdev, "Failed to resume NPU: %d\n", ret); + pm_runtime_set_suspended(vdev->drm.dev); + } return ret; } From patchwork Wed Jan 29 12:40:09 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jacek Lawrynowicz X-Patchwork-Id: 13953726 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 822CDC0218D for ; Wed, 29 Jan 2025 12:40:22 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 03F5810E7E5; Wed, 29 Jan 2025 12:40:22 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="j8kY/Kh3"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) by gabe.freedesktop.org (Postfix) with ESMTPS id BB22110E7E5 for ; Wed, 29 Jan 2025 12:40:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1738154420; x=1769690420; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ycYTW3uXyeWMfli1Zkkt9W6LanpaxIjUMBTLTTzIwRI=; b=j8kY/Kh33jKPeFP4NFaVrp/yoCYAHbf0+NzrJlQ5wCJmRpWG7qdAmnR+ AcnI04A/hSm3HsIkrvj0c0h0YAmX2XmelsOP2SQE32c+no6xdMpHnY3WG uDhV+mlB8uFYxqCQVw33ACNg1mDj++NsbgzRCY/1vzyMjWU7UWxIya9PY 1aTPOQAocwbg3ely9BEcAXx190ciL4bdVjZvMsuwSghGQwy4ca7Z5wvok 289WK0i0dwiTL/qmkyFqU5ui0CWx+AJLrnJSJrSMkskkrT6YFCra802mp +fGH7yqjb55uSLH6PDlYCk5FJBFQ1pnAimFm0F+7we5M/9L875npDVYP0 g==; X-CSE-ConnectionGUID: kw29Ku/qQfWGY8ysT4JteQ== X-CSE-MsgGUID: VcuX337JTzu40QGZfL7J4Q== X-IronPort-AV: E=McAfee;i="6700,10204,11329"; a="49647277" X-IronPort-AV: E=Sophos;i="6.13,243,1732608000"; d="scan'208";a="49647277" Received: from fmviesa003.fm.intel.com ([10.60.135.143]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2025 04:40:20 -0800 X-CSE-ConnectionGUID: ejX3+XEfRh6qIee+vuHBWw== X-CSE-MsgGUID: 5Mar7FTlSgCWertrPeAhzA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,224,1728975600"; d="scan'208";a="113030878" Received: from jlawryno.igk.intel.com ([10.91.220.59]) by fmviesa003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2025 04:40:17 -0800 From: Jacek Lawrynowicz To: dri-devel@lists.freedesktop.org Cc: oded.gabbay@gmail.com, quic_jhugo@quicinc.com, maciej.falkowski@linux.intel.com, Jacek Lawrynowicz , stable@vger.kernel.org Subject: [PATCH 3/3] accel/ivpu: Fix error handling in recovery/reset Date: Wed, 29 Jan 2025 13:40:09 +0100 Message-ID: <20250129124009.1039982-4-jacek.lawrynowicz@linux.intel.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: <20250129124009.1039982-1-jacek.lawrynowicz@linux.intel.com> References: <20250129124009.1039982-1-jacek.lawrynowicz@linux.intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Disable runtime PM for the duration of reset/recovery so it is possible to set the correct runtime PM state depending on the outcome of the `ivpu_resume()`. Don’t suspend or reset the HW if the NPU is suspended when the reset/recovery is requested. Also, move common reset/recovery code to separate functions for better code readability. Fixes: 27d19268cf39 ("accel/ivpu: Improve recovery and reset support") Cc: # v6.8+ Reviewed-by: Maciej Falkowski Signed-off-by: Jacek Lawrynowicz Reviewed-by: Jeffrey Hugo --- drivers/accel/ivpu/ivpu_pm.c | 79 ++++++++++++++++++++---------------- 1 file changed, 43 insertions(+), 36 deletions(-) diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c index c3774d2221326..8b2b050cc41a9 100644 --- a/drivers/accel/ivpu/ivpu_pm.c +++ b/drivers/accel/ivpu/ivpu_pm.c @@ -115,41 +115,57 @@ static int ivpu_resume(struct ivpu_device *vdev) return ret; } -static void ivpu_pm_recovery_work(struct work_struct *work) +static void ivpu_pm_reset_begin(struct ivpu_device *vdev) { - struct ivpu_pm_info *pm = container_of(work, struct ivpu_pm_info, recovery_work); - struct ivpu_device *vdev = pm->vdev; - char *evt[2] = {"IVPU_PM_EVENT=IVPU_RECOVER", NULL}; - int ret; - - ivpu_err(vdev, "Recovering the NPU (reset #%d)\n", atomic_read(&vdev->pm->reset_counter)); - - ret = pm_runtime_resume_and_get(vdev->drm.dev); - if (ret) - ivpu_err(vdev, "Failed to resume NPU: %d\n", ret); - - ivpu_jsm_state_dump(vdev); - ivpu_dev_coredump(vdev); + pm_runtime_disable(vdev->drm.dev); atomic_inc(&vdev->pm->reset_counter); atomic_set(&vdev->pm->reset_pending, 1); down_write(&vdev->pm->reset_lock); +} + +static void ivpu_pm_reset_complete(struct ivpu_device *vdev) +{ + int ret; - ivpu_suspend(vdev); ivpu_pm_prepare_cold_boot(vdev); ivpu_jobs_abort_all(vdev); ivpu_ms_cleanup_all(vdev); ret = ivpu_resume(vdev); - if (ret) + if (ret) { ivpu_err(vdev, "Failed to resume NPU: %d\n", ret); + pm_runtime_set_suspended(vdev->drm.dev); + } else { + pm_runtime_set_active(vdev->drm.dev); + } up_write(&vdev->pm->reset_lock); atomic_set(&vdev->pm->reset_pending, 0); - kobject_uevent_env(&vdev->drm.dev->kobj, KOBJ_CHANGE, evt); pm_runtime_mark_last_busy(vdev->drm.dev); - pm_runtime_put_autosuspend(vdev->drm.dev); + pm_runtime_enable(vdev->drm.dev); +} + +static void ivpu_pm_recovery_work(struct work_struct *work) +{ + struct ivpu_pm_info *pm = container_of(work, struct ivpu_pm_info, recovery_work); + struct ivpu_device *vdev = pm->vdev; + char *evt[2] = {"IVPU_PM_EVENT=IVPU_RECOVER", NULL}; + + ivpu_err(vdev, "Recovering the NPU (reset #%d)\n", atomic_read(&vdev->pm->reset_counter)); + + ivpu_pm_reset_begin(vdev); + + if (!pm_runtime_status_suspended(vdev->drm.dev)) { + ivpu_jsm_state_dump(vdev); + ivpu_dev_coredump(vdev); + ivpu_suspend(vdev); + } + + ivpu_pm_reset_complete(vdev); + + kobject_uevent_env(&vdev->drm.dev->kobj, KOBJ_CHANGE, evt); } void ivpu_pm_trigger_recovery(struct ivpu_device *vdev, const char *reason) @@ -328,16 +344,13 @@ void ivpu_pm_reset_prepare_cb(struct pci_dev *pdev) struct ivpu_device *vdev = pci_get_drvdata(pdev); ivpu_dbg(vdev, PM, "Pre-reset..\n"); - atomic_inc(&vdev->pm->reset_counter); - atomic_set(&vdev->pm->reset_pending, 1); - pm_runtime_get_sync(vdev->drm.dev); - down_write(&vdev->pm->reset_lock); - ivpu_prepare_for_reset(vdev); - ivpu_hw_reset(vdev); - ivpu_pm_prepare_cold_boot(vdev); - ivpu_jobs_abort_all(vdev); - ivpu_ms_cleanup_all(vdev); + ivpu_pm_reset_begin(vdev); + + if (!pm_runtime_status_suspended(vdev->drm.dev)) { + ivpu_prepare_for_reset(vdev); + ivpu_hw_reset(vdev); + } ivpu_dbg(vdev, PM, "Pre-reset done.\n"); } @@ -345,18 +358,12 @@ void ivpu_pm_reset_prepare_cb(struct pci_dev *pdev) void ivpu_pm_reset_done_cb(struct pci_dev *pdev) { struct ivpu_device *vdev = pci_get_drvdata(pdev); - int ret; ivpu_dbg(vdev, PM, "Post-reset..\n"); - ret = ivpu_resume(vdev); - if (ret) - ivpu_err(vdev, "Failed to set RESUME state: %d\n", ret); - up_write(&vdev->pm->reset_lock); - atomic_set(&vdev->pm->reset_pending, 0); - ivpu_dbg(vdev, PM, "Post-reset done.\n"); - pm_runtime_mark_last_busy(vdev->drm.dev); - pm_runtime_put_autosuspend(vdev->drm.dev); + ivpu_pm_reset_complete(vdev); + + ivpu_dbg(vdev, PM, "Post-reset done.\n"); } void ivpu_pm_init(struct ivpu_device *vdev)