From patchwork Mon Feb 27 11:13:01 2023
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 13153229
From: Oded Gabbay
To: dri-devel@lists.freedesktop.org
Subject: [PATCH 1/6] habanalabs: add helper function to get vm hash node
Date: Mon, 27 Feb 2023 13:13:01 +0200
Message-Id: <20230227111306.3985896-1-ogabbay@kernel.org>
X-Mailer: git-send-email 2.39.2
List-Id: Direct Rendering Infrastructure - Development
Cc: Tomer Tayar

From: Tomer Tayar

Add a helper function to search the vm hash for a node with a given
virtual address.
As opposed to the current code, this function explicitly returns NULL
when no node is found, instead of relying on the value of the loop
cursor object.

Signed-off-by: Tomer Tayar
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
Reviewed-by: Stanislaw Gruszka
---
 drivers/accel/habanalabs/common/memory.c | 28 ++++++++++++++----------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/accel/habanalabs/common/memory.c b/drivers/accel/habanalabs/common/memory.c
index be0cba3b61ab..88f5178d2df7 100644
--- a/drivers/accel/habanalabs/common/memory.c
+++ b/drivers/accel/habanalabs/common/memory.c
@@ -1266,6 +1266,18 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args, u64 *device
 	return rc;
 }
 
+/* Should be called while the context's mem_hash_lock is taken */
+static struct hl_vm_hash_node *get_vm_hash_node_locked(struct hl_ctx *ctx, u64 vaddr)
+{
+	struct hl_vm_hash_node *hnode;
+
+	hash_for_each_possible(ctx->mem_hash, hnode, node, vaddr)
+		if (vaddr == hnode->vaddr)
+			return hnode;
+
+	return NULL;
+}
+
 /**
  * unmap_device_va() - unmap the given device virtual address.
  * @ctx: pointer to the context structure.
@@ -1281,10 +1293,10 @@ static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 {
 	struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
 	u64 vaddr = args->unmap.device_virt_addr;
-	struct hl_vm_hash_node *hnode = NULL;
 	struct asic_fixed_properties *prop;
 	struct hl_device *hdev = ctx->hdev;
 	struct hl_userptr *userptr = NULL;
+	struct hl_vm_hash_node *hnode;
 	struct hl_va_range *va_range;
 	enum vm_type *vm_type;
 	bool is_userptr;
@@ -1294,15 +1306,10 @@ static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 
 	/* protect from double entrance */
 	mutex_lock(&ctx->mem_hash_lock);
-	hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)vaddr)
-		if (vaddr == hnode->vaddr)
-			break;
-
+	hnode = get_vm_hash_node_locked(ctx, vaddr);
 	if (!hnode) {
 		mutex_unlock(&ctx->mem_hash_lock);
-		dev_err(hdev->dev,
-			"unmap failed, no mem hnode for vaddr 0x%llx\n",
-			vaddr);
+		dev_err(hdev->dev, "unmap failed, no mem hnode for vaddr 0x%llx\n", vaddr);
 		return -EINVAL;
 	}
 
@@ -1782,10 +1789,7 @@ static struct hl_vm_hash_node *memhash_node_export_get(struct hl_ctx *ctx, u64 a
 
 	/* get the memory handle */
 	mutex_lock(&ctx->mem_hash_lock);
-	hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)addr)
-		if (addr == hnode->vaddr)
-			break;
-
+	hnode = get_vm_hash_node_locked(ctx, addr);
 	if (!hnode) {
 		mutex_unlock(&ctx->mem_hash_lock);
 		dev_dbg(hdev->dev, "map address %#llx not found\n", addr);

From patchwork Mon Feb 27 11:13:02 2023
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 13153230
From: Oded Gabbay
To: dri-devel@lists.freedesktop.org
Subject: [PATCH 2/6] habanalabs: add device id to all threads names
Date: Mon, 27 Feb 2023 13:13:02 +0200
Message-Id: <20230227111306.3985896-2-ogabbay@kernel.org>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230227111306.3985896-1-ogabbay@kernel.org>
References: <20230227111306.3985896-1-ogabbay@kernel.org>
List-Id: Direct Rendering Infrastructure - Development
Cc: Sagiv Ozeri

From: Sagiv Ozeri

Compute driver thread names will start with hlX-*, where X is the
device id.
This will help distinguish them from the NIC thread names.

Signed-off-by: Sagiv Ozeri
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
Reviewed-by: Stanislaw Gruszka
---
 drivers/accel/habanalabs/common/device.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index e544d00fe376..7ade32487138 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -840,7 +840,7 @@ static int device_early_init(struct hl_device *hdev)
 	}
 
 	for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++) {
-		snprintf(workq_name, 32, "hl-free-jobs-%u", (u32) i);
+		snprintf(workq_name, 32, "hl%u-free-jobs-%u", hdev->cdev_idx, (u32) i);
 		hdev->cq_wq[i] = create_singlethread_workqueue(workq_name);
 		if (hdev->cq_wq[i] == NULL) {
 			dev_err(hdev->dev, "Failed to allocate CQ workqueue\n");
@@ -849,14 +849,16 @@ static int device_early_init(struct hl_device *hdev)
 		}
 	}
 
-	hdev->eq_wq = create_singlethread_workqueue("hl-events");
+	snprintf(workq_name, 32, "hl%u-events", hdev->cdev_idx);
+	hdev->eq_wq = create_singlethread_workqueue(workq_name);
 	if (hdev->eq_wq == NULL) {
 		dev_err(hdev->dev, "Failed to allocate EQ workqueue\n");
 		rc = -ENOMEM;
 		goto free_cq_wq;
 	}
 
-	hdev->cs_cmplt_wq = alloc_workqueue("hl-cs-completions", WQ_UNBOUND, 0);
+	snprintf(workq_name, 32, "hl%u-cs-completions", hdev->cdev_idx);
+	hdev->cs_cmplt_wq = alloc_workqueue(workq_name, WQ_UNBOUND, 0);
 	if (!hdev->cs_cmplt_wq) {
 		dev_err(hdev->dev,
 			"Failed to allocate CS completions workqueue\n");
@@ -864,7 +866,8 @@ static int device_early_init(struct hl_device *hdev)
 		goto free_eq_wq;
 	}
 
-	hdev->ts_free_obj_wq = alloc_workqueue("hl-ts-free-obj", WQ_UNBOUND, 0);
+	snprintf(workq_name, 32, "hl%u-ts-free-obj", hdev->cdev_idx);
+	hdev->ts_free_obj_wq = alloc_workqueue(workq_name, WQ_UNBOUND, 0);
 	if (!hdev->ts_free_obj_wq) {
 		dev_err(hdev->dev,
 			"Failed to allocate Timestamp registration free workqueue\n");
@@ -872,15 +875,15 @@ static int device_early_init(struct hl_device *hdev)
 		goto free_cs_cmplt_wq;
 	}
 
-	hdev->prefetch_wq = alloc_workqueue("hl-prefetch", WQ_UNBOUND, 0);
+	snprintf(workq_name, 32, "hl%u-prefetch", hdev->cdev_idx);
+	hdev->prefetch_wq = alloc_workqueue(workq_name, WQ_UNBOUND, 0);
 	if (!hdev->prefetch_wq) {
 		dev_err(hdev->dev, "Failed to allocate MMU prefetch workqueue\n");
 		rc = -ENOMEM;
 		goto free_ts_free_wq;
 	}
 
-	hdev->hl_chip_info = kzalloc(sizeof(struct hwmon_chip_info),
-					GFP_KERNEL);
+	hdev->hl_chip_info = kzalloc(sizeof(struct hwmon_chip_info), GFP_KERNEL);
 	if (!hdev->hl_chip_info) {
 		rc = -ENOMEM;
 		goto free_prefetch_wq;
@@ -892,7 +895,8 @@ static int device_early_init(struct hl_device *hdev)
 
 	hl_mem_mgr_init(hdev->dev, &hdev->kernel_mem_mgr);
 
-	hdev->reset_wq = create_singlethread_workqueue("hl_device_reset");
+	snprintf(workq_name, 32, "hl%u_device_reset", hdev->cdev_idx);
+	hdev->reset_wq = create_singlethread_workqueue(workq_name);
 	if (!hdev->reset_wq) {
 		rc = -ENOMEM;
 		dev_err(hdev->dev, "Failed to create device reset WQ\n");

From patchwork Mon Feb 27 11:13:03 2023
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 13153231
From: Oded Gabbay
To: dri-devel@lists.freedesktop.org
Subject: [PATCH 3/6] habanalabs/gaudi2: break is_idle function into per-engine sub-routines
Date: Mon, 27 Feb 2023 13:13:03 +0200
Message-Id: <20230227111306.3985896-3-ogabbay@kernel.org>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230227111306.3985896-1-ogabbay@kernel.org>
References: <20230227111306.3985896-1-ogabbay@kernel.org>
List-Id: Direct Rendering Infrastructure - Development
Cc: Koby Elbaz

From: Koby Elbaz

is_idle() was too long, so break it up for readability.
In addition, the new sub-routines can now be used from other places.

Signed-off-by: Koby Elbaz
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 drivers/accel/habanalabs/gaudi2/gaudi2.c | 212 ++++++++++++++++-------
 1 file changed, 146 insertions(+), 66 deletions(-)

diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c
index 2021ef9d4702..82448edfdfa0 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c
@@ -6651,70 +6651,17 @@ static int gaudi2_compute_reset_late_init(struct hl_device *hdev)
 	return hl_fw_unmask_irq_arr(hdev, gaudi2->hw_events, irq_arr_size);
 }
 
-static void gaudi2_is_tpc_engine_idle(struct hl_device *hdev, int dcore, int inst, u32 offset,
-					struct iterate_module_ctx *ctx)
-{
-	struct gaudi2_tpc_idle_data *idle_data = ctx->data;
-	u32 tpc_cfg_sts, qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts;
-	bool is_eng_idle;
-	int engine_idx;
-
-	if ((dcore == 0) && (inst == (NUM_DCORE0_TPC - 1)))
-		engine_idx = GAUDI2_DCORE0_ENGINE_ID_TPC_6;
-	else
-		engine_idx = GAUDI2_DCORE0_ENGINE_ID_TPC_0 +
-				dcore * GAUDI2_ENGINE_ID_DCORE_OFFSET + inst;
-
-	tpc_cfg_sts = RREG32(mmDCORE0_TPC0_CFG_STATUS + offset);
-	qm_glbl_sts0 = RREG32(mmDCORE0_TPC0_QM_GLBL_STS0 + offset);
-	qm_glbl_sts1 = RREG32(mmDCORE0_TPC0_QM_GLBL_STS1 + offset);
-	qm_cgm_sts = RREG32(mmDCORE0_TPC0_QM_CGM_STS + offset);
-
-	is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) &&
-						IS_TPC_IDLE(tpc_cfg_sts);
-	*(idle_data->is_idle) &= is_eng_idle;
-
-	if (idle_data->mask && !is_eng_idle)
-		set_bit(engine_idx, idle_data->mask);
-
-	if (idle_data->e)
-		hl_engine_data_sprintf(idle_data->e,
-					idle_data->tpc_fmt, dcore, inst,
-					is_eng_idle ? "Y" : "N",
-					qm_glbl_sts0, qm_cgm_sts, tpc_cfg_sts);
-}
-
-static bool gaudi2_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
-		struct engines_data *e)
+static bool gaudi2_get_edma_idle_status(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
+		struct engines_data *e)
 {
-	u32 qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts, dma_core_idle_ind_mask,
-		mme_arch_sts, dec_swreg15, dec_enabled_bit;
+	u32 qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts, dma_core_idle_ind_mask;
 	struct asic_fixed_properties *prop = &hdev->asic_prop;
-	const char *rot_fmt = "%-6d%-5d%-9s%#-14x%#-12x%s\n";
 	unsigned long *mask = (unsigned long *) mask_arr;
 	const char *edma_fmt = "%-6d%-6d%-9s%#-14x%#x\n";
-	const char *mme_fmt = "%-5d%-6s%-9s%#-14x%#x\n";
-	const char *nic_fmt = "%-5d%-9s%#-14x%#-12x\n";
-	const char *pdma_fmt = "%-6d%-9s%#-14x%#x\n";
-	const char *pcie_dec_fmt = "%-10d%-9s%#x\n";
-	const char *dec_fmt = "%-6d%-5d%-9s%#x\n";
 	bool is_idle = true, is_eng_idle;
-	u64 offset;
-
-	struct gaudi2_tpc_idle_data tpc_idle_data = {
-		.tpc_fmt = "%-6d%-5d%-9s%#-14x%#-12x%#x\n",
-		.e = e,
-		.mask = mask,
-		.is_idle = &is_idle,
-	};
-	struct iterate_module_ctx tpc_iter = {
-		.fn = &gaudi2_is_tpc_engine_idle,
-		.data = &tpc_idle_data,
-	};
-
 	int engine_idx, i, j;
+	u64 offset;
 
-	/* EDMA, Two engines per Dcore */
 	if (e)
 		hl_engine_data_sprintf(e,
 			"\nCORE EDMA is_idle QM_GLBL_STS0 DMA_CORE_IDLE_IND_MASK\n"
@@ -6753,7 +6700,19 @@ static bool gaudi2_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask
 		}
 	}
 
-	/* PDMA, Two engines in Full chip */
+	return is_idle;
+}
+
+static bool gaudi2_get_pdma_idle_status(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
+		struct engines_data *e)
+{
+	u32 qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts, dma_core_idle_ind_mask;
+	unsigned long *mask = (unsigned long *) mask_arr;
+	const char *pdma_fmt = "%-6d%-9s%#-14x%#x\n";
+	bool is_idle = true, is_eng_idle;
+	int engine_idx, i;
+	u64 offset;
+
 	if (e)
 		hl_engine_data_sprintf(e,
 			"\nPDMA is_idle QM_GLBL_STS0 DMA_CORE_IDLE_IND_MASK\n"
@@ -6780,6 +6739,19 @@ static bool gaudi2_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask
 					qm_glbl_sts0, dma_core_idle_ind_mask);
 	}
 
+	return is_idle;
+}
+
+static bool gaudi2_get_nic_idle_status(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
+		struct engines_data *e)
+{
+	unsigned long *mask = (unsigned long *) mask_arr;
+	const char *nic_fmt = "%-5d%-9s%#-14x%#-12x\n";
+	u32 qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts;
+	bool is_idle = true, is_eng_idle;
+	int engine_idx, i;
+	u64 offset;
+
 	/* NIC, twelve macros in Full chip */
 	if (e && hdev->nic_ports_mask)
 		hl_engine_data_sprintf(e,
@@ -6813,6 +6785,19 @@ static bool gaudi2_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask
 					qm_glbl_sts0, qm_cgm_sts);
 	}
 
+	return is_idle;
+}
+
+static bool gaudi2_get_mme_idle_status(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
+		struct engines_data *e)
+{
+	u32 qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts, mme_arch_sts;
+	unsigned long *mask = (unsigned long *) mask_arr;
+	const char *mme_fmt = "%-5d%-6s%-9s%#-14x%#x\n";
+	bool is_idle = true, is_eng_idle;
+	int engine_idx, i;
+	u64 offset;
+
 	if (e)
 		hl_engine_data_sprintf(e,
 			"\nMME Stub is_idle QM_GLBL_STS0 MME_ARCH_STATUS\n"
@@ -6843,16 +6828,82 @@ static bool gaudi2_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask
 			set_bit(engine_idx, mask);
 	}
 
-	/*
-	 * TPC
-	 */
+	return is_idle;
+}
+
+static void gaudi2_is_tpc_engine_idle(struct hl_device *hdev, int dcore, int inst, u32 offset,
+					struct iterate_module_ctx *ctx)
+{
+	struct gaudi2_tpc_idle_data *idle_data = ctx->data;
+	u32 tpc_cfg_sts, qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts;
+	bool is_eng_idle;
+	int engine_idx;
+
+	if ((dcore == 0) && (inst == (NUM_DCORE0_TPC - 1)))
+		engine_idx = GAUDI2_DCORE0_ENGINE_ID_TPC_6;
+	else
+		engine_idx = GAUDI2_DCORE0_ENGINE_ID_TPC_0 +
+				dcore * GAUDI2_ENGINE_ID_DCORE_OFFSET + inst;
+
+	tpc_cfg_sts = RREG32(mmDCORE0_TPC0_CFG_STATUS + offset);
+	qm_glbl_sts0 = RREG32(mmDCORE0_TPC0_QM_GLBL_STS0 + offset);
+	qm_glbl_sts1 = RREG32(mmDCORE0_TPC0_QM_GLBL_STS1 + offset);
+	qm_cgm_sts = RREG32(mmDCORE0_TPC0_QM_CGM_STS + offset);
+
+	is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) &&
+						IS_TPC_IDLE(tpc_cfg_sts);
+	*(idle_data->is_idle) &= is_eng_idle;
+
+	if (idle_data->mask && !is_eng_idle)
+		set_bit(engine_idx, idle_data->mask);
+
+	if (idle_data->e)
+		hl_engine_data_sprintf(idle_data->e,
+					idle_data->tpc_fmt, dcore, inst,
+					is_eng_idle ? "Y" : "N",
+					qm_glbl_sts0, qm_cgm_sts, tpc_cfg_sts);
+}
+
+static bool gaudi2_get_tpc_idle_status(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
+		struct engines_data *e)
+{
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
+	unsigned long *mask = (unsigned long *) mask_arr;
+	bool is_idle = true;
+
+	struct gaudi2_tpc_idle_data tpc_idle_data = {
+		.tpc_fmt = "%-6d%-5d%-9s%#-14x%#-12x%#x\n",
+		.e = e,
+		.mask = mask,
+		.is_idle = &is_idle,
+	};
+	struct iterate_module_ctx tpc_iter = {
+		.fn = &gaudi2_is_tpc_engine_idle,
+		.data = &tpc_idle_data,
+	};
+
 	if (e && prop->tpc_enabled_mask)
 		hl_engine_data_sprintf(e,
-			"\nCORE TPC is_idle QM_GLBL_STS0 QM_CGM_STS DMA_CORE_IDLE_IND_MASK\n"
-			"---- --- -------- ------------ ---------- ----------------------\n");
+			"\nCORE TPC is_idle QM_GLBL_STS0 QM_CGM_STS STATUS\n"
+			"---- --- ------- ------------ ---------- ------\n");
 
 	gaudi2_iterate_tpcs(hdev, &tpc_iter);
 
+	return tpc_idle_data.is_idle;
+}
+
+static bool gaudi2_get_decoder_idle_status(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
+		struct engines_data *e)
+{
+	struct asic_fixed_properties *prop = &hdev->asic_prop;
+	unsigned long *mask = (unsigned long *) mask_arr;
+	const char *pcie_dec_fmt = "%-10d%-9s%#x\n";
+	const char *dec_fmt = "%-6d%-5d%-9s%#x\n";
+	bool is_idle = true, is_eng_idle;
+	u32 dec_swreg15, dec_enabled_bit;
+	int engine_idx, i, j;
+	u64 offset;
+
 	/* Decoders, two each Dcore and two shared PCIe decoders */
 	if (e && (prop->decoder_enabled_mask & (~PCIE_DEC_EN_MASK)))
 		hl_engine_data_sprintf(e,
@@ -6907,10 +6958,23 @@ static bool gaudi2_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask
 				is_eng_idle ? "Y" : "N", dec_swreg15);
 	}
 
+	return is_idle;
+}
+
+static bool gaudi2_get_rotator_idle_status(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
+		struct engines_data *e)
+{
+	const char *rot_fmt = "%-6d%-5d%-9s%#-14x%#-14x%#x\n";
+	unsigned long *mask = (unsigned long *) mask_arr;
+	u32 qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts;
+	bool is_idle = true, is_eng_idle;
+	int engine_idx, i;
+	u64 offset;
+
 	if (e)
 		hl_engine_data_sprintf(e,
-			"\nCORE ROT is_idle QM_GLBL_STS0 QM_CGM_STS DMA_CORE_STS0\n"
-			"---- ---- ------- ------------ ---------- -------------\n");
+			"\nCORE ROT is_idle QM_GLBL_STS0 QM_GLBL_STS1 QM_CGM_STS\n"
+			"---- --- ------- ------------ ------------ ----------\n");
 
 	for (i = 0 ; i < NUM_OF_ROT ; i++) {
 		engine_idx = GAUDI2_ENGINE_ID_ROT_0 + i;
@@ -6929,12 +6993,28 @@ static bool gaudi2_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask
 
 		if (e)
 			hl_engine_data_sprintf(e, rot_fmt, i, 0, is_eng_idle ? "Y" : "N",
-					qm_glbl_sts0, qm_cgm_sts, "-");
+					qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts);
 	}
 
 	return is_idle;
 }
 
+bool gaudi2_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
+		struct engines_data *e)
+{
+	bool is_idle = true;
+
+	is_idle &= gaudi2_get_edma_idle_status(hdev, mask_arr, mask_len, e);
+	is_idle &= gaudi2_get_pdma_idle_status(hdev, mask_arr, mask_len, e);
+	is_idle &= gaudi2_get_nic_idle_status(hdev, mask_arr, mask_len, e);
+	is_idle &= gaudi2_get_mme_idle_status(hdev, mask_arr, mask_len, e);
+	is_idle &= gaudi2_get_tpc_idle_status(hdev, mask_arr, mask_len, e);
+	is_idle &= gaudi2_get_decoder_idle_status(hdev, mask_arr, mask_len, e);
+	is_idle &= gaudi2_get_rotator_idle_status(hdev, mask_arr, mask_len, e);
+
+	return is_idle;
+}
+
 static void gaudi2_hw_queues_lock(struct hl_device *hdev)
 	__acquires(&gaudi2->hw_queues_lock)
 {

From patchwork Mon Feb 27 11:13:04 2023
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 13153234
From: Oded Gabbay
To: dri-devel@lists.freedesktop.org
Subject: [PATCH 4/6] habanalabs: assert return value of hw_fini
Date: Mon, 27 Feb 2023 13:13:04 +0200
Message-Id: <20230227111306.3985896-4-ogabbay@kernel.org>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230227111306.3985896-1-ogabbay@kernel.org>
References: <20230227111306.3985896-1-ogabbay@kernel.org>
List-Id: Direct Rendering Infrastructure - Development
Cc: Dafna Hirschfeld

From: Dafna Hirschfeld

Since hw_fini returns an error code to indicate failure, we should
check its return value.
Currently it might fail only upon soft-reset from hl_device_reset.
A later patch will add an hw_fini failure in case of a polling timeout
in hard-reset.
Signed-off-by: Dafna Hirschfeld Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay Reviewed-by: Stanislaw Gruszka --- drivers/accel/habanalabs/common/device.c | 12 +++++++++--- drivers/accel/habanalabs/gaudi/gaudi.c | 7 ++++++- drivers/accel/habanalabs/gaudi2/gaudi2.c | 7 ++++++- drivers/accel/habanalabs/goya/goya.c | 7 ++++++- 4 files changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c index 7ade32487138..99e793dfb126 100644 --- a/drivers/accel/habanalabs/common/device.c +++ b/drivers/accel/habanalabs/common/device.c @@ -1472,7 +1472,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags) schedule_hard_reset = false, delay_reset, from_dev_release, from_watchdog_thread; u64 idle_mask[HL_BUSY_ENGINES_MASK_EXT_SIZE] = {0}; struct hl_ctx *ctx; - int i, rc; + int i, rc, hw_fini_rc; if (!hdev->init_done) { dev_err(hdev->dev, "Can't reset before initialization is done\n"); @@ -1634,7 +1634,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags) } /* Reset the H/W. It will be in idle state after this returns */ - hdev->asic_funcs->hw_fini(hdev, hard_reset, fw_reset); + hw_fini_rc = hdev->asic_funcs->hw_fini(hdev, hard_reset, fw_reset); if (hard_reset) { hdev->fw_loader.fw_comp_loaded = FW_TYPE_NONE; @@ -1661,6 +1661,10 @@ int hl_device_reset(struct hl_device *hdev, u32 flags) hl_ctx_put(ctx); } + if (hw_fini_rc) { + rc = hw_fini_rc; + goto out_err; + } /* Finished tear-down, starting to re-initialize */ if (hard_reset) { @@ -2416,7 +2420,9 @@ void hl_device_fini(struct hl_device *hdev) hl_cb_pool_fini(hdev); /* Reset the H/W. 
It will be in idle state after this returns */ - hdev->asic_funcs->hw_fini(hdev, true, false); + rc = hdev->asic_funcs->hw_fini(hdev, true, false); + if (rc) + dev_err(hdev->dev, "hw_fini failed in device fini while removing device %d\n", rc); hdev->fw_loader.fw_comp_loaded = FW_TYPE_NONE; diff --git a/drivers/accel/habanalabs/gaudi/gaudi.c b/drivers/accel/habanalabs/gaudi/gaudi.c index 26287084a9e0..60146fd4de6b 100644 --- a/drivers/accel/habanalabs/gaudi/gaudi.c +++ b/drivers/accel/habanalabs/gaudi/gaudi.c @@ -868,13 +868,18 @@ static int gaudi_early_init(struct hl_device *hdev) rc = hl_fw_read_preboot_status(hdev); if (rc) { if (hdev->reset_on_preboot_fail) + /* we are already on failure flow, so don't check if hw_fini fails. */ hdev->asic_funcs->hw_fini(hdev, true, false); goto pci_fini; } if (gaudi_get_hw_state(hdev) == HL_DEVICE_HW_STATE_DIRTY) { dev_dbg(hdev->dev, "H/W state is dirty, must reset before initializing\n"); - hdev->asic_funcs->hw_fini(hdev, true, false); + rc = hdev->asic_funcs->hw_fini(hdev, true, false); + if (rc) { + dev_err(hdev->dev, "failed to reset HW in dirty state (%d)\n", rc); + goto pci_fini; + } } return 0; diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index 82448edfdfa0..f01fa4bca381 100644 --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c @@ -2886,13 +2886,18 @@ static int gaudi2_early_init(struct hl_device *hdev) rc = hl_fw_read_preboot_status(hdev); if (rc) { if (hdev->reset_on_preboot_fail) + /* we are already on failure flow, so don't check if hw_fini fails. 
*/ hdev->asic_funcs->hw_fini(hdev, true, false); goto pci_fini; } if (gaudi2_get_hw_state(hdev) == HL_DEVICE_HW_STATE_DIRTY) { dev_dbg(hdev->dev, "H/W state is dirty, must reset before initializing\n"); - hdev->asic_funcs->hw_fini(hdev, true, false); + rc = hdev->asic_funcs->hw_fini(hdev, true, false); + if (rc) { + dev_err(hdev->dev, "failed to reset HW during early init (%d)\n", rc); + goto pci_fini; + } } return 0; diff --git a/drivers/accel/habanalabs/goya/goya.c b/drivers/accel/habanalabs/goya/goya.c index 7a45ab3ca43a..39f9e5de1f4c 100644 --- a/drivers/accel/habanalabs/goya/goya.c +++ b/drivers/accel/habanalabs/goya/goya.c @@ -669,13 +669,18 @@ static int goya_early_init(struct hl_device *hdev) rc = hl_fw_read_preboot_status(hdev); if (rc) { if (hdev->reset_on_preboot_fail) + /* we are already on failure flow, so don't check if hw_fini fails. */ hdev->asic_funcs->hw_fini(hdev, true, false); goto pci_fini; } if (goya_get_hw_state(hdev) == HL_DEVICE_HW_STATE_DIRTY) { dev_dbg(hdev->dev, "H/W state is dirty, must reset before initializing\n"); - hdev->asic_funcs->hw_fini(hdev, true, false); + rc = hdev->asic_funcs->hw_fini(hdev, true, false); + if (rc) { + dev_err(hdev->dev, "failed to reset HW in dirty state (%d)\n", rc); + goto pci_fini; + } } if (!hdev->pldm) { From patchwork Mon Feb 27 11:13:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 13153232 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id ECF7AC64ED6 for ; Mon, 27 Feb 2023 11:13:28 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 
104B310E3C2; Mon, 27 Feb 2023 11:13:28 +0000 (UTC) Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by gabe.freedesktop.org (Postfix) with ESMTPS id BA68D10E3B3 for ; Mon, 27 Feb 2023 11:13:18 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 4054A60DC4; Mon, 27 Feb 2023 11:13:18 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C2802C4339B; Mon, 27 Feb 2023 11:13:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1677496397; bh=uT+fzBYrbET5ZYuU3bEXEmVr1RkgKfTwP1oIHHM5mSY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=WNpaTWA+pbzK6lp4CImKAoyGSpsaL8Gn2AD4VSyUtDgZE5R+cN4SniVUPtAHdhLrT 0x+guraXgULajVKr4uAXkBd7cMWfDJntQXsU5EuTwVQ/qYyMVmlsn5EAknu/WaJDl6 vlMMu7CsXLc2yP5zo9PDAfSD2nfoPuf+snaZsdCYeIo2JNtxy+x+uxlcaqd6LonOZ3 8zLyv0BXjI4fst37o1T52ellFtd1HkZMGr8B/Ja6OhGru7rJVxOV78HaZZ9ciI1pfJ nqWbIp/rdrx9plImmBalhWu4y9cQLzSVrgyEDhXkiQznoXux35M5BNVeAj8gmXYcog mWQpyciHQ9hQQ== From: Oded Gabbay To: dri-devel@lists.freedesktop.org Subject: [PATCH 5/6] habanalabs: use notifications and graceful reset for decoder Date: Mon, 27 Feb 2023 13:13:05 +0200 Message-Id: <20230227111306.3985896-5-ogabbay@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230227111306.3985896-1-ogabbay@kernel.org> References: <20230227111306.3985896-1-ogabbay@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tomer Tayar Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Tomer Tayar Add notifications to user in case of decoder abnormal interrupts, and use the graceful reset mechanism if reset is 
required.

Signed-off-by: Tomer Tayar
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
Reviewed-by: Stanislaw Gruszka
---
 drivers/accel/habanalabs/common/decoder.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/accel/habanalabs/common/decoder.c b/drivers/accel/habanalabs/common/decoder.c
index 2aab14d74b53..69c78c1784b4 100644
--- a/drivers/accel/habanalabs/common/decoder.c
+++ b/drivers/accel/habanalabs/common/decoder.c
@@ -46,7 +46,7 @@ static void dec_print_abnrm_intr_source(struct hl_device *hdev, u32 irq_status)
 static void dec_error_intr_work(struct hl_device *hdev, u32 base_addr, u32 core_id)
 {
 	bool reset_required = false;
-	u32 irq_status;
+	u32 irq_status, event_mask;
 
 	irq_status = RREG32(base_addr + VCMD_IRQ_STATUS_OFFSET);
 
@@ -54,17 +54,27 @@ static void dec_error_intr_work(struct hl_device *hdev, u32 base_addr, u32 core_
 
 	dec_print_abnrm_intr_source(hdev, irq_status);
 
-	if (irq_status & VCMD_IRQ_STATUS_TIMEOUT_MASK)
-		reset_required = true;
-
 	/* Clear the interrupt */
 	WREG32(base_addr + VCMD_IRQ_STATUS_OFFSET, irq_status);
 
 	/* Flush the interrupt clear */
 	RREG32(base_addr + VCMD_IRQ_STATUS_OFFSET);
 
-	if (reset_required)
-		hl_device_reset(hdev, HL_DRV_RESET_HARD);
+	if (irq_status & VCMD_IRQ_STATUS_TIMEOUT_MASK) {
+		reset_required = true;
+		event_mask = HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+	} else if (irq_status & VCMD_IRQ_STATUS_CMDERR_MASK) {
+		event_mask = HL_NOTIFIER_EVENT_UNDEFINED_OPCODE;
+	} else {
+		event_mask = HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+	}
+
+	if (reset_required) {
+		event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
+		hl_device_cond_reset(hdev, 0, event_mask);
+	} else {
+		hl_notifier_event_send_all(hdev, event_mask);
+	}
 }
 
 static void dec_completion_abnrm(struct work_struct *work)

From patchwork Mon Feb 27 11:13:06 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 13153233
From: Oded Gabbay
To: dri-devel@lists.freedesktop.org
Subject: [PATCH 6/6] habanalabs/gaudi2: verify return code after scrubbing ARCs DCCMs
Date: Mon, 27 Feb 2023 13:13:06 +0200
Message-Id: <20230227111306.3985896-6-ogabbay@kernel.org>
In-Reply-To: <20230227111306.3985896-1-ogabbay@kernel.org>
References: <20230227111306.3985896-1-ogabbay@kernel.org>
Cc: Koby Elbaz

From: Koby Elbaz

Currently, if the KDMA fails to scrub the DCCMs (following a soft-reset
upon device release), the driver only prints an error and lets the
reset flow run to its end, rather than escalating the failure into a
hard-reset.

Return the scrubbing error code to the callers so that the failure is
no longer ignored.

Signed-off-by: Koby Elbaz
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 drivers/accel/habanalabs/gaudi2/gaudi2.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c
index f01fa4bca381..2186f8bd547e 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c
@@ -3024,16 +3024,21 @@ static int gaudi2_scrub_arc_dccm(struct hl_device *hdev, u32 cpu_id)
 	return 0;
 }
 
-static void gaudi2_scrub_arcs_dccm(struct hl_device *hdev)
+static int gaudi2_scrub_arcs_dccm(struct hl_device *hdev)
 {
 	u16 arc_id;
+	int rc;
 
 	for (arc_id = CPU_ID_SCHED_ARC0 ; arc_id < CPU_ID_MAX ; arc_id++) {
 		if (!gaudi2_is_arc_enabled(hdev, arc_id))
 			continue;
 
-		gaudi2_scrub_arc_dccm(hdev, arc_id);
+		rc = gaudi2_scrub_arc_dccm(hdev, arc_id);
+		if (rc)
+			return rc;
 	}
+
+	return 0;
 }
 
 static int gaudi2_late_init(struct hl_device *hdev)
@@ -3057,7 +3062,13 @@ static int gaudi2_late_init(struct hl_device *hdev)
 	}
 
 	gaudi2_init_arcs(hdev);
-	gaudi2_scrub_arcs_dccm(hdev);
+
+	rc = gaudi2_scrub_arcs_dccm(hdev);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to scrub arcs DCCM\n");
+		goto disable_pci_access;
+	}
+
 	gaudi2_init_security(hdev);
 
 	return 0;
@@ -6643,12 +6654,19 @@ static int gaudi2_compute_reset_late_init(struct hl_device *hdev)
 {
 	struct gaudi2_device *gaudi2 = hdev->asic_specific;
 	size_t irq_arr_size;
+	int rc;
 
 	/* TODO: missing gaudi2_nic_resume.
 	 * Until implemented nic_hw_cap_initialized will remain zeroed
 	 */
 	gaudi2_init_arcs(hdev);
-	gaudi2_scrub_arcs_dccm(hdev);
+
+	rc = gaudi2_scrub_arcs_dccm(hdev);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to scrub arcs DCCM\n");
+		return rc;
+	}
+
 	gaudi2_init_security(hdev);
 
 	/* Unmask all IRQs since some could have been received during the soft reset */