From patchwork Mon May 27 12:12:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ofir Bitton X-Patchwork-Id: 13675141 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 83274C25B7C for ; Mon, 27 May 2024 12:13:30 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A075B10F4BD; Mon, 27 May 2024 12:13:29 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=habana.ai header.i=@habana.ai header.b="oiOfkJPX"; dkim-atps=neutral Received: from mail02.habana.ai (habanamailrelay02.habana.ai [62.90.112.121]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6E3B410F4BD for ; Mon, 27 May 2024 12:13:07 +0000 (UTC) Received: internal info suppressed DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=habana.ai; s=default; t=1716811982; bh=jEOjfncRnHrY/GOzNXLpivOmaWg+BBdNPnztm7r0mH8=; h=From:To:Cc:Subject:Date:From; b=oiOfkJPX8EvLromWEhc+OBGh0jTo6HhFUurBbH+ItJWBQgulfLoxrTNFWkPllmHzC 4f5nPBaR4/YBIhEb++Z7iaQLecrROgCM2Q5zOsXJgEw1+kjP2EHyzmKz7w6se1/lq+ SJw4kMmWLt/U1UObkRwM3Q4gyzCHXB7aOtkVhXr/7vnssqQySbrH1SCB/s2F3+51Vt xTeG4KJOX7rdjMbooNAWbMDc2mQthgkI8pACSov7e0l8pMymF1C5VMTS6ZmeYfzhS/ B9N1U5T2Edvr9Km/Q8CNoUxdUA9oRTIjuYZCUJYvyC0bse39D5BGJUaEQgn9XlWzMA 30wkqU2xwtH9w== Received: from obitton-vm-u22.habana-labs.com (localhost [127.0.0.1]) by obitton-vm-u22.habana-labs.com (8.15.2/8.15.2/Debian-22ubuntu3) with ESMTP id 44RCCuar1921351; Mon, 27 May 2024 15:12:57 +0300 From: Ofir Bitton To: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: Igal Zeltser Subject: [PATCH 1/8] accel/habanalabs: use msg_header instead of desc_header Date: Mon, 27 May 2024 15:12:47 +0300 Message-Id: <20240527121254.1921306-1-obitton@habana.ai> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Igal Zeltser Struct comms_desc_header is deprecated and replaced by struct comms_msg_header. As a preparation for removing comms_desc_header from FW, all it's usage in code is replaced by comms_msg_header. Signed-off-by: Igal Zeltser Reviewed-by: Ofir Bitton --- drivers/accel/habanalabs/common/firmware_if.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/accel/habanalabs/common/firmware_if.c b/drivers/accel/habanalabs/common/firmware_if.c index d1a1d601bde9..886b3c07503d 100644 --- a/drivers/accel/habanalabs/common/firmware_if.c +++ b/drivers/accel/habanalabs/common/firmware_if.c @@ -2093,7 +2093,7 @@ static int hl_fw_dynamic_validate_descriptor(struct hl_device *hdev, * note that no alignment/stride address issues here as all structures * are 64 bit padded. */ - data_ptr = (u8 *)fw_desc + sizeof(struct comms_desc_header); + data_ptr = (u8 *)fw_desc + sizeof(struct comms_msg_header); data_size = le16_to_cpu(fw_desc->header.size); data_crc32 = hl_fw_compat_crc32(data_ptr, data_size); @@ -2247,11 +2247,11 @@ static int hl_fw_dynamic_read_and_validate_descriptor(struct hl_device *hdev, memcpy_fromio(fw_desc, src, sizeof(struct lkd_fw_comms_desc)); fw_data_size = le16_to_cpu(fw_desc->header.size); - temp_fw_desc = vzalloc(sizeof(struct comms_desc_header) + fw_data_size); + temp_fw_desc = vzalloc(sizeof(struct comms_msg_header) + fw_data_size); if (!temp_fw_desc) return -ENOMEM; - memcpy_fromio(temp_fw_desc, src, sizeof(struct comms_desc_header) + fw_data_size); + memcpy_fromio(temp_fw_desc, src, sizeof(struct comms_msg_header) + fw_data_size); rc = hl_fw_dynamic_validate_descriptor(hdev, fw_loader, (struct lkd_fw_comms_desc *) temp_fw_desc); From patchwork Mon May 27 12:12:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ofir Bitton X-Patchwork-Id: 13675142 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9307AC25B74 for ; Mon, 27 May 2024 12:13:35 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C3FCF10F4E7; Mon, 27 May 2024 12:13:34 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=habana.ai header.i=@habana.ai header.b="LcQCWYKU"; dkim-atps=neutral Received: from mail02.habana.ai (habanamailrelay.habana.ai [213.57.90.13]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4706B10FA1D for ; Mon, 27 May 2024 12:13:08 +0000 (UTC) Received: internal info suppressed DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=habana.ai; s=default; t=1716811993; bh=6fkm+Ekorc42aQ0DgvCRfc8pbcF22Uheaaz/c06hl/o=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=LcQCWYKUKob8FxyRRMksHjTJjtqkXFFJhp0EEZg6JpkjOSenDkGStR7NJby89TSSW TZanMV8UbA1hd9R1uTK620Eodt+tA4htDm3++AatDc1Idwd/T5a9c+ecMIaVvDEQzM 6OUfr9X5A+ubbdKNT5hJNMwaDCWF44tzGAreq4dO9r/fDNVI5YZdfuoOJnZNt2Ezxt xwtjGmr67clZlMO39sCnM+l6YHT4GSFHgonr1XWW6qpKog+M3hcpwVv9fAJK0ryIfo AebwE8OVcR7ZaAto8gIq2gQcS9APkf/mdB/5YdF0tLI1185ouPvO8nYq8t/55TyW7F r2/HVYSqr0qQg== Received: from obitton-vm-u22.habana-labs.com (localhost [127.0.0.1]) by obitton-vm-u22.habana-labs.com (8.15.2/8.15.2/Debian-22ubuntu3) with ESMTP id 44RCCuas1921351; Mon, 27 May 2024 15:12:57 +0300 From: Ofir Bitton To: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: Farah Kassabri Subject: [PATCH 2/8] accel/habanalabs: check for errors after preboot is ready Date: Mon, 27 May 2024 15:12:48 +0300 Message-Id: <20240527121254.1921306-2-obitton@habana.ai> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240527121254.1921306-1-obitton@habana.ai> References: <20240527121254.1921306-1-obitton@habana.ai> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Farah Kassabri Driver should check and report any fatal errors detected by preboot, before it attempts to load the boot fit. Some errors may cause the driver to stop the boot process and mark the device as unusable. This check will allow the driver to fail and print the error reported by preboot and skip the time wasting attempt of trying to load the boot fit, which will fail due to the error. Signed-off-by: Farah Kassabri Reviewed-by: Ofir Bitton --- drivers/accel/habanalabs/common/firmware_if.c | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/accel/habanalabs/common/firmware_if.c b/drivers/accel/habanalabs/common/firmware_if.c index 886b3c07503d..6f0c40b12072 100644 --- a/drivers/accel/habanalabs/common/firmware_if.c +++ b/drivers/accel/habanalabs/common/firmware_if.c @@ -1482,7 +1482,7 @@ int hl_fw_wait_preboot_ready(struct hl_device *hdev) { struct pre_fw_load_props *pre_fw_load = &hdev->fw_loader.pre_fw_load; u32 status = 0, timeout; - int rc, tries = 1; + int rc, tries = 1, fw_err = 0; bool preboot_still_runs; /* Need to check two possible scenarios: @@ -1522,18 +1522,18 @@ int hl_fw_wait_preboot_ready(struct hl_device *hdev) } } - if (rc) { + /* If we read all FF, then something is totally wrong, no point + * of reading specific errors + */ + if (status != -1) + fw_err = fw_read_errors(hdev, pre_fw_load->boot_err0_reg, + pre_fw_load->boot_err1_reg, + pre_fw_load->sts_boot_dev_sts0_reg, + pre_fw_load->sts_boot_dev_sts1_reg); + if (rc || fw_err) { detect_cpu_boot_status(hdev, status); - dev_err(hdev->dev, "CPU boot ready timeout (status = %d)\n", status); - - /* If we read all FF, then something is totally wrong, no point - * of reading specific errors - */ - if (status != -1) - fw_read_errors(hdev, pre_fw_load->boot_err0_reg, - pre_fw_load->boot_err1_reg, - pre_fw_load->sts_boot_dev_sts0_reg, - pre_fw_load->sts_boot_dev_sts1_reg); + dev_err(hdev->dev, "CPU boot %s (status = %d)\n", + fw_err ? "failed due to an error" : "ready timeout", status); return -EIO; } From patchwork Mon May 27 12:12:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ofir Bitton X-Patchwork-Id: 13675143 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 41446C25B74 for ; Mon, 27 May 2024 12:13:40 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5D40010F4F0; Mon, 27 May 2024 12:13:39 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=habana.ai header.i=@habana.ai header.b="WhXVwlkD"; dkim-atps=neutral Received: from mail02.habana.ai (habanamailrelay02.habana.ai [62.90.112.121]) by gabe.freedesktop.org (Postfix) with ESMTPS id 42AA310F4FC for ; Mon, 27 May 2024 12:13:07 +0000 (UTC) Received: internal info suppressed DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=habana.ai; s=default; t=1716811982; bh=w8EooheyDNYzk0b5wQ7asK4PXY81sD5/wVdKuo7m5AE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=WhXVwlkDVTCSkIpOkx+Z8SzJ+JAEf2u2aJDj1mFbvSU1D5jive0u8eGX0Q5ACjW++ GzVj8m/iaeAJPTSAlrZYPWrBwvx22XrdAlL/ZlawD4Uy3OcDDUgB0VnTJAdihq+Gjq qirrK2gVhSBFMjpX6JQMdGDTovm/ii9unK+DLFEzRCdjW6t32H3fbmjZu6Zq/kbWG7 DzWnOihtRLokhWFb4yUoPGPfOj/vowba/x1u12TqMnfQSFVNbq7fHlO3coel7Nf6WH v9hMjeCX8EjIhmpVflVdTmkNRUPW2+TQ5vMNS8Ji78p/M+aDRQsIdT7D7jMXKbuWq1 WmltzRhbSSUBA== Received: from obitton-vm-u22.habana-labs.com (localhost [127.0.0.1]) by obitton-vm-u22.habana-labs.com (8.15.2/8.15.2/Debian-22ubuntu3) with ESMTP id 44RCCuat1921351; Mon, 27 May 2024 15:12:57 +0300 From: Ofir Bitton To: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: Ariel Suller Subject: [PATCH 3/8] accel/habanalabs/gaudi2: align interrupt names to table Date: Mon, 27 May 2024 15:12:49 +0300 Message-Id: <20240527121254.1921306-3-obitton@habana.ai> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240527121254.1921306-1-obitton@habana.ai> References: <20240527121254.1921306-1-obitton@habana.ai> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Ariel Suller when reporting tpc events, the dcore and tpc in dcore should be reported and propagated, and not the generatl tpc number Signed-off-by: Ariel Suller Reviewed-by: Ofir Bitton --- .../gaudi2/gaudi2_async_ids_map_extended.h | 150 +++++++++--------- 1 file changed, 75 insertions(+), 75 deletions(-) diff --git a/drivers/accel/habanalabs/include/gaudi2/gaudi2_async_ids_map_extended.h b/drivers/accel/habanalabs/include/gaudi2/gaudi2_async_ids_map_extended.h index 1db73923de62..82d639990cca 100644 --- a/drivers/accel/habanalabs/include/gaudi2/gaudi2_async_ids_map_extended.h +++ b/drivers/accel/habanalabs/include/gaudi2/gaudi2_async_ids_map_extended.h @@ -856,55 +856,55 @@ static struct gaudi2_async_events_ids_map gaudi2_irq_map_table[] = { { .fc_id = 412, .cpu_id = 84, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_HARD, .name = "PCIE_ADDR_DEC_ERR" }, { .fc_id = 413, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC0_AXI_ERR_RSP" }, + .name = "DCORE0_TPC0_AXI_ERR_RSP" }, { .fc_id = 414, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC1_AXI_ERR_RSP" }, + .name = "DCORE0_TPC1_AXI_ERR_RSP" }, { .fc_id = 415, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC2_AXI_ERR_RSP" }, + .name = "DCORE0_TPC2_AXI_ERR_RSP" }, { .fc_id = 416, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC3_AXI_ERR_RSP" }, + .name = "DCORE0_TPC3_AXI_ERR_RSP" }, { .fc_id = 417, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC4_AXI_ERR_RSP" }, + .name = "DCORE0_TPC4_AXI_ERR_RSP" }, { .fc_id = 418, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC5_AXI_ERR_RSP" }, + .name = "DCORE0_TPC5_AXI_ERR_RSP" }, { .fc_id = 419, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC6_AXI_ERR_RSP" }, + .name = "DCORE1_TPC0_AXI_ERR_RSP" }, { .fc_id = 420, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC7_AXI_ERR_RSP" }, + .name = "DCORE1_TPC1_AXI_ERR_RSP" }, { .fc_id = 421, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC8_AXI_ERR_RSP" }, + .name = "DCORE1_TPC2_AXI_ERR_RSP" }, { .fc_id = 422, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC9_AXI_ERR_RSP" }, + .name = "DCORE1_TPC3_AXI_ERR_RSP" }, { .fc_id = 423, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC10_AXI_ERR_RSP" }, + .name = "DCORE1_TPC4_AXI_ERR_RSP" }, { .fc_id = 424, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC11_AXI_ERR_RSP" }, + .name = "DCORE1_TPC5_AXI_ERR_RSP" }, { .fc_id = 425, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC12_AXI_ERR_RSP" }, + .name = "DCORE2_TPC0_AXI_ERR_RSP" }, { .fc_id = 426, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC13_AXI_ERR_RSP" }, + .name = "DCORE2_TPC1_AXI_ERR_RSP" }, { .fc_id = 427, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC14_AXI_ERR_RSP" }, + .name = "DCORE2_TPC2_AXI_ERR_RSP" }, { .fc_id = 428, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC15_AXI_ERR_RSP" }, + .name = "DCORE2_TPC3_AXI_ERR_RSP" }, { .fc_id = 429, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC16_AXI_ERR_RSP" }, + .name = "DCORE2_TPC4_AXI_ERR_RSP" }, { .fc_id = 430, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC17_AXI_ERR_RSP" }, + .name = "DCORE2_TPC5_AXI_ERR_RSP" }, { .fc_id = 431, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC18_AXI_ERR_RSP" }, + .name = "DCORE3_TPC0_AXI_ERR_RSP" }, { .fc_id = 432, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC19_AXI_ERR_RSP" }, + .name = "DCORE3_TPC1_AXI_ERR_RSP" }, { .fc_id = 433, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC20_AXI_ERR_RSP" }, + .name = "DCORE3_TPC2_AXI_ERR_RSP" }, { .fc_id = 434, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC21_AXI_ERR_RSP" }, + .name = "DCORE3_TPC3_AXI_ERR_RSP" }, { .fc_id = 435, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC22_AXI_ERR_RSP" }, + .name = "DCORE3_TPC4_AXI_ERR_RSP" }, { .fc_id = 436, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC23_AXI_ERR_RSP" }, + .name = "DCORE3_TPC5_AXI_ERR_RSP" }, { .fc_id = 437, .cpu_id = 85, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC24_AXI_ERR_RSP" }, + .name = "DCORE4_TPC0_AXI_ERR_RSP" }, { .fc_id = 438, .cpu_id = 86, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_HARD, .name = "AXI_ECC" }, { .fc_id = 439, .cpu_id = 87, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_HARD, @@ -1298,103 +1298,103 @@ static struct gaudi2_async_events_ids_map gaudi2_irq_map_table[] = { { .fc_id = 633, .cpu_id = 130, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC0_BMON_SPMU" }, { .fc_id = 634, .cpu_id = 131, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC0_KERNEL_ERR" }, + .name = "DCORE0_TPC0_KERNEL_ERR" }, { .fc_id = 635, .cpu_id = 132, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC1_BMON_SPMU" }, { .fc_id = 636, .cpu_id = 133, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC1_KERNEL_ERR" }, + .name = "DCORE0_TPC1_KERNEL_ERR" }, { .fc_id = 637, .cpu_id = 134, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC2_BMON_SPMU" }, { .fc_id = 638, .cpu_id = 135, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC2_KERNEL_ERR" }, + .name = "DCORE0_TPC2_KERNEL_ERR" }, { .fc_id = 639, .cpu_id = 136, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC3_BMON_SPMU" }, { .fc_id = 640, .cpu_id = 137, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC3_KERNEL_ERR" }, + .name = "DCORE0_TPC3_KERNEL_ERR" }, { .fc_id = 641, .cpu_id = 138, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC4_BMON_SPMU" }, { .fc_id = 642, .cpu_id = 139, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC4_KERNEL_ERR" }, + .name = "DCORE0_TPC4_KERNEL_ERR" }, { .fc_id = 643, .cpu_id = 140, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC5_BMON_SPMU" }, { .fc_id = 644, .cpu_id = 141, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC5_KERNEL_ERR" }, + .name = "DCORE0_TPC5_KERNEL_ERR" }, { .fc_id = 645, .cpu_id = 150, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC6_BMON_SPMU" }, { .fc_id = 646, .cpu_id = 151, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC6_KERNEL_ERR" }, + .name = "DCORE1_TPC0_KERNEL_ERR" }, { .fc_id = 647, .cpu_id = 152, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC7_BMON_SPMU" }, { .fc_id = 648, .cpu_id = 153, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC7_KERNEL_ERR" }, + .name = "DCORE1_TPC1_KERNEL_ERR" }, { .fc_id = 649, .cpu_id = 146, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC8_BMON_SPMU" }, { .fc_id = 650, .cpu_id = 147, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC8_KERNEL_ERR" }, + .name = "DCORE1_TPC2_KERNEL_ERR" }, { .fc_id = 651, .cpu_id = 148, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC9_BMON_SPMU" }, { .fc_id = 652, .cpu_id = 149, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC9_KERNEL_ERR" }, + .name = "DCORE1_TPC3_KERNEL_ERR" }, { .fc_id = 653, .cpu_id = 142, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC10_BMON_SPMU" }, { .fc_id = 654, .cpu_id = 143, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC10_KERNEL_ERR" }, + .name = "DCORE1_TPC4_KERNEL_ERR" }, { .fc_id = 655, .cpu_id = 144, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC11_BMON_SPMU" }, { .fc_id = 656, .cpu_id = 145, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC11_KERNEL_ERR" }, + .name = "DCORE1_TPC5_KERNEL_ERR" }, { .fc_id = 657, .cpu_id = 162, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC12_BMON_SPMU" }, { .fc_id = 658, .cpu_id = 163, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC12_KERNEL_ERR" }, + .name = "DCORE2_TPC0_KERNEL_ERR" }, { .fc_id = 659, .cpu_id = 164, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC13_BMON_SPMU" }, { .fc_id = 660, .cpu_id = 165, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC13_KERNEL_ERR" }, + .name = "DCORE2_TPC1_KERNEL_ERR" }, { .fc_id = 661, .cpu_id = 158, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC14_BMON_SPMU" }, { .fc_id = 662, .cpu_id = 159, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC14_KERNEL_ERR" }, + .name = "DCORE2_TPC2_KERNEL_ERR" }, { .fc_id = 663, .cpu_id = 160, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC15_BMON_SPMU" }, { .fc_id = 664, .cpu_id = 161, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC15_KERNEL_ERR" }, + .name = "DCORE2_TPC3_KERNEL_ERR" }, { .fc_id = 665, .cpu_id = 154, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC16_BMON_SPMU" }, { .fc_id = 666, .cpu_id = 155, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC16_KERNEL_ERR" }, + .name = "DCORE2_TPC4_KERNEL_ERR" }, { .fc_id = 667, .cpu_id = 156, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC17_BMON_SPMU" }, { .fc_id = 668, .cpu_id = 157, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC17_KERNEL_ERR" }, + .name = "DCORE2_TPC5_KERNEL_ERR" }, { .fc_id = 669, .cpu_id = 166, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC18_BMON_SPMU" }, { .fc_id = 670, .cpu_id = 167, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC18_KERNEL_ERR" }, + .name = "DCORE3_TPC0_KERNEL_ERR" }, { .fc_id = 671, .cpu_id = 168, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC19_BMON_SPMU" }, { .fc_id = 672, .cpu_id = 169, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC19_KERNEL_ERR" }, + .name = "DCORE3_TPC1_KERNEL_ERR" }, { .fc_id = 673, .cpu_id = 170, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC20_BMON_SPMU" }, { .fc_id = 674, .cpu_id = 171, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC20_KERNEL_ERR" }, + .name = "DCORE3_TPC2_KERNEL_ERR" }, { .fc_id = 675, .cpu_id = 172, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC21_BMON_SPMU" }, { .fc_id = 676, .cpu_id = 173, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC21_KERNEL_ERR" }, + .name = "DCORE3_TPC3_KERNEL_ERR" }, { .fc_id = 677, .cpu_id = 174, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC22_BMON_SPMU" }, { .fc_id = 678, .cpu_id = 175, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC22_KERNEL_ERR" }, + .name = "DCORE3_TPC4_KERNEL_ERR" }, { .fc_id = 679, .cpu_id = 176, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC23_BMON_SPMU" }, { .fc_id = 680, .cpu_id = 177, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC23_KERNEL_ERR" }, + .name = "DCORE3_TPC5_KERNEL_ERR" }, { .fc_id = 681, .cpu_id = 178, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "TPC24_BMON_SPMU" }, { .fc_id = 682, .cpu_id = 179, .valid = 1, .msg = 0, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC24_KERNEL_ERR" }, + .name = "DCORE4_TPC0_KERNEL_ERR" }, { .fc_id = 683, .cpu_id = 180, .valid = 0, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "" }, { .fc_id = 684, .cpu_id = 180, .valid = 0, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, @@ -2442,55 +2442,55 @@ static struct gaudi2_async_events_ids_map gaudi2_irq_map_table[] = { { .fc_id = 1205, .cpu_id = 511, .valid = 0, .msg = 0, .reset = EVENT_RESET_TYPE_NONE, .name = "" }, { .fc_id = 1206, .cpu_id = 512, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC0_QM" }, + .name = "DCORE0_TPC0_QM" }, { .fc_id = 1207, .cpu_id = 513, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC1_QM" }, + .name = "DCORE0_TPC1_QM" }, { .fc_id = 1208, .cpu_id = 514, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC2_QM" }, + .name = "DCORE0_TPC2_QM" }, { .fc_id = 1209, .cpu_id = 515, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC3_QM" }, + .name = "DCORE0_TPC3_QM" }, { .fc_id = 1210, .cpu_id = 516, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC4_QM" }, + .name = "DCORE0_TPC4_QM" }, { .fc_id = 1211, .cpu_id = 517, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC5_QM" }, + .name = "DCORE0_TPC5_QM" }, { .fc_id = 1212, .cpu_id = 518, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC6_QM" }, + .name = "DCORE1_TPC0_QM" }, { .fc_id = 1213, .cpu_id = 519, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC7_QM" }, + .name = "DCORE1_TPC1_QM" }, { .fc_id = 1214, .cpu_id = 520, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC8_QM" }, + .name = "DCORE1_TPC2_QM" }, { .fc_id = 1215, .cpu_id = 521, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC9_QM" }, + .name = "DCORE1_TPC3_QM" }, { .fc_id = 1216, .cpu_id = 522, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC10_QM" }, + .name = "DCORE1_TPC4_QM" }, { .fc_id = 1217, .cpu_id = 523, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC11_QM" }, + .name = "DCORE1_TPC5_QM" }, { .fc_id = 1218, .cpu_id = 524, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC12_QM" }, + .name = "DCORE2_TPC0_QM" }, { .fc_id = 1219, .cpu_id = 525, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC13_QM" }, + .name = "DCORE2_TPC1_QM" }, { .fc_id = 1220, .cpu_id = 526, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC14_QM" }, + .name = "DCORE2_TPC2_QM" }, { .fc_id = 1221, .cpu_id = 527, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC15_QM" }, + .name = "DCORE2_TPC3_QM" }, { .fc_id = 1222, .cpu_id = 528, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC16_QM" }, + .name = "DCORE2_TPC4_QM" }, { .fc_id = 1223, .cpu_id = 529, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC17_QM" }, + .name = "DCORE2_TPC5_QM" }, { .fc_id = 1224, .cpu_id = 530, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC18_QM" }, + .name = "DCORE3_TPC0_QM" }, { .fc_id = 1225, .cpu_id = 531, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC19_QM" }, + .name = "DCORE3_TPC1_QM" }, { .fc_id = 1226, .cpu_id = 532, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC20_QM" }, + .name = "DCORE3_TPC2_QM" }, { .fc_id = 1227, .cpu_id = 533, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC21_QM" }, + .name = "DCORE3_TPC3_QM" }, { .fc_id = 1228, .cpu_id = 534, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC22_QM" }, + .name = "DCORE3_TPC4_QM" }, { .fc_id = 1229, .cpu_id = 535, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC23_QM" }, + .name = "DCORE3_TPC5_QM" }, { .fc_id = 1230, .cpu_id = 536, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, - .name = "TPC24_QM" }, + .name = "DCORE4_TPC0_QM" }, { .fc_id = 1231, .cpu_id = 537, .valid = 0, .msg = 1, .reset = EVENT_RESET_TYPE_NONE, .name = "" }, { .fc_id = 1232, .cpu_id = 538, .valid = 1, .msg = 1, .reset = EVENT_RESET_TYPE_COMPUTE, From patchwork Mon May 27 12:12:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ofir Bitton X-Patchwork-Id: 13675139 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 127B0C25B74 for ; Mon, 27 May 2024 12:13:20 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3A72410E94D; Mon, 27 May 2024 12:13:19 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=habana.ai header.i=@habana.ai header.b="pE2znx8n"; dkim-atps=neutral Received: from mail02.habana.ai (habanamailrelay.habana.ai [213.57.90.13]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2A61110F4E7 for ; Mon, 27 May 2024 12:13:07 +0000 (UTC) Received: internal info suppressed DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=habana.ai; s=default; t=1716811993; bh=3gGM8Mlv1pxxuRoIYA9WhbGp72S3CJF4GKc75niPLk0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=pE2znx8ngj+qRv7DjNTNbVO635BiKxt1o7ArbCDzGHCDGWomD4xgFuJiBOQ3mypjH Naz3Xuw/85QSrSvjLDbbbL85IzoirLfKuc7jUTuVfumrwAOeA34whgcHPWG1o4wpSu vQpYZyVhQ3P7Swpy8K4lyQidDMxtOlg50qyz0VmmK31xJ0izkvtAYwkQ+zSi2N/NlM TIGRZQUVebU90VUISGN3RUQXuZdodFwZC6GkC7jXMMg35GKp2EDMQf2wBcZnqwrc3q HeBzswYyAyHAMFouA12JR1q/FgZpCUdlhQaBXYHnWefYi6qVX/5eOdSHqTxCqUwdSF GbXsMGtiiuc0Q== Received: from obitton-vm-u22.habana-labs.com (localhost [127.0.0.1]) by obitton-vm-u22.habana-labs.com (8.15.2/8.15.2/Debian-22ubuntu3) with ESMTP id 44RCCuau1921351; Mon, 27 May 2024 15:12:57 +0300 From: Ofir Bitton To: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: Tomer Tayar Subject: [PATCH 4/8] accel/habanalabs/gaudi2: revise return value handling in gaudi2_hbm_sei_handle_read_err() Date: Mon, 27 May 2024 15:12:50 +0300 Message-Id: <20240527121254.1921306-4-obitton@habana.ai> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240527121254.1921306-1-obitton@habana.ai> References: <20240527121254.1921306-1-obitton@habana.ai> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Tomer Tayar The return value in gaudi2_hbm_sei_handle_read_err() is boolean and not a bitmask, so there is need for "|= true". In addition, rename the 'rc' variable, as no "return code" is returned here but an indication if a hard reset is required. Signed-off-by: Tomer Tayar Reviewed-by: Ofir Bitton --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index 08276f03c80f..18cc7b773650 100644 --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c @@ -9263,8 +9263,8 @@ static int gaudi2_handle_mmu_spi_sei_err(struct hl_device *hdev, u16 event_type, static bool gaudi2_hbm_sei_handle_read_err(struct hl_device *hdev, struct hl_eq_hbm_sei_read_err_intr_info *rd_err_data, u32 err_cnt) { + bool require_hard_reset = false; u32 addr, beat, beat_shift; - bool rc = false; dev_err_ratelimited(hdev->dev, "READ ERROR count: ECC SERR: %d, ECC DERR: %d, RD_PARITY: %d\n", @@ -9296,7 +9296,7 @@ static bool gaudi2_hbm_sei_handle_read_err(struct hl_device *hdev, beat, le32_to_cpu(rd_err_data->dbg_rd_err_dm), le32_to_cpu(rd_err_data->dbg_rd_err_syndrome)); - rc |= true; + require_hard_reset = true; } beat_shift = beat * HBM_RD_ERR_BEAT_SHIFT; @@ -9309,7 +9309,7 @@ static bool gaudi2_hbm_sei_handle_read_err(struct hl_device *hdev, (le32_to_cpu(rd_err_data->dbg_rd_err_misc) & (HBM_RD_ERR_PAR_DATA_BEAT0_MASK << beat_shift)) >> (HBM_RD_ERR_PAR_DATA_BEAT0_SHIFT + beat_shift)); - rc |= true; + require_hard_reset = true; } dev_err_ratelimited(hdev->dev, "Beat%d DQ data:\n", beat); @@ -9319,7 +9319,7 @@ static bool gaudi2_hbm_sei_handle_read_err(struct hl_device *hdev, le32_to_cpu(rd_err_data->dbg_rd_err_data[beat * 2 + 1])); } - return rc; + return require_hard_reset; } static void gaudi2_hbm_sei_print_wr_par_info(struct hl_device *hdev, From patchwork Mon May 27 12:12:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ofir Bitton X-Patchwork-Id: 13675138 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3CB8CC25B74 for ; Mon, 27 May 2024 12:13:15 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9892E10FA1D; Mon, 27 May 2024 12:13:11 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=habana.ai header.i=@habana.ai header.b="OjZ20Tx9"; dkim-atps=neutral Received: from mail02.habana.ai (habanamailrelay02.habana.ai [62.90.112.121]) by gabe.freedesktop.org (Postfix) with ESMTPS id 766D410E94D for ; Mon, 27 May 2024 12:13:06 +0000 (UTC) Received: internal info suppressed DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=habana.ai; s=default; t=1716811982; bh=Gweql/mYSZp0D9OIhCBaIAviu0sxLWqrmSCFgLSuNRw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=OjZ20Tx90j0hd7nl5hmcrPTat70TO6SQ6WRyUe6FE8V4QbtQ+zPbYRUKkPQw65Csi gcg4qB6ZAwSrP/Alh64/jfqXDyJ4yfvQCjOANI1r+71rAjaIObEutwwHBg19qTFK3n lyTknEjr8h+j0afti7ZrFFKNszw2z4NqwuMfzDQReNNIV19Imtzqia0coAB3NLcwtN y7yCqrz1taeQw+GTKOlalQUWmJmtILAHnSL+2h4DdgnGO8YkewnACP6PaUHRJC+ZQw KbNAuuX17u6nH66jXOSPT10sKHs7yZoKLBUcYJ0vIbxZXdQ6B+c/MAxGG7QxyYhfD5 YIGxQVkExws8A== Received: from obitton-vm-u22.habana-labs.com (localhost [127.0.0.1]) by obitton-vm-u22.habana-labs.com (8.15.2/8.15.2/Debian-22ubuntu3) with ESMTP id 44RCCuav1921351; Mon, 27 May 2024 15:12:57 +0300 From: Ofir Bitton To: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: Tomer Tayar Subject: [PATCH 5/8] accel/habanalabs/gaudi2: assume hard-reset by FW upon MC SEI severe error Date: Mon, 27 May 2024 15:12:51 +0300 Message-Id: <20240527121254.1921306-5-obitton@habana.ai> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240527121254.1921306-1-obitton@habana.ai> References: <20240527121254.1921306-1-obitton@habana.ai> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Tomer Tayar FW initiates a hard reset upon an MC SEI severe error. Align the driver to expect this reset and avoid accessing the device until the reset is done. Signed-off-by: Tomer Tayar Reviewed-by: Ofir Bitton --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index 18cc7b773650..4791582d157c 100644 --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c @@ -10004,6 +10004,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent if (gaudi2_handle_hbm_mc_sei_err(hdev, event_type, &eq_entry->sei_data)) { reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; reset_required = true; + is_critical = eq_entry->sei_data.hdr.is_critical; } error_count++; break; @@ -10235,8 +10236,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent gaudi2_print_event(hdev, event_type, true, "No error cause for H/W event %u", event_type); - if ((gaudi2_irq_map_table[event_type].reset != EVENT_RESET_TYPE_NONE) || - reset_required) { + if ((gaudi2_irq_map_table[event_type].reset != EVENT_RESET_TYPE_NONE) || reset_required) { if (reset_required || (gaudi2_irq_map_table[event_type].reset == EVENT_RESET_TYPE_HARD)) reset_flags |= HL_DRV_RESET_HARD; From patchwork Mon May 27 12:12:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ofir Bitton X-Patchwork-Id: 13675144 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 70002C25B7C for ; Mon, 27 May 2024 12:13:48 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C459F10F4FC; Mon, 27 May 2024 12:13:47 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=habana.ai header.i=@habana.ai header.b="fZHcM75B"; dkim-atps=neutral Received: from mail02.habana.ai (habanamailrelay.habana.ai [213.57.90.13]) by gabe.freedesktop.org (Postfix) with ESMTPS id 479DE10FA35 for ; Mon, 27 May 2024 12:13:07 +0000 (UTC) Received: internal info suppressed DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=habana.ai; s=default; t=1716811993; bh=hikixPhgw118twEP+21boqHR0BCJuA1UyHNIo4eFSHc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=fZHcM75BM/ZPCm5dogM9PnM2+U6VWsKfHNApocPn57lLGXJ8tUPoYjLT1F86ne5cr mEOtlWPuSo6frJzyxu3p8jU4NhvJpASqHThmvzrDpawGeN3riUTZ+YW8P95ddU4kmt VMYYDPnEmluzgrRQPDKLYCNsUe0C8UGa1np/60FHiKRBJYLGLsi0yOJEICtk5m8tYf oWQU7qildiZgUmLj0YfvGe66zMDKLz9crvlzbG4NhUFlZJSXBsqa29lUdW1VKs/VhD zF2tz5ULOMOzoTHx6+xnWogyd5u4mLIfrkz+m8hohlIxI4N7YxC5hw5yCWRd6PYV6L DlXsPryuK/8Uw== Received: from obitton-vm-u22.habana-labs.com (localhost [127.0.0.1]) by obitton-vm-u22.habana-labs.com (8.15.2/8.15.2/Debian-22ubuntu3) with ESMTP id 44RCCuaw1921351; Mon, 27 May 2024 15:12:57 +0300 From: Ofir Bitton To: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: Dani Liberman Subject: [PATCH 6/8] accel/habanalabs: separate nonce from max_size in cpucp_packet struct Date: Mon, 27 May 2024 15:12:52 +0300 Message-Id: <20240527121254.1921306-6-obitton@habana.ai> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240527121254.1921306-1-obitton@habana.ai> References: <20240527121254.1921306-1-obitton@habana.ai> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Dani Liberman In struct cpucp_packet both nonce and data_max_size members are in an union overlapping each other. This is a problem as they both are used in attestation and info_signed packets. The solution here is to move the nonce member to a different union under the same structure. Signed-off-by: Dani Liberman Reviewed-by: Ofir Bitton --- include/linux/habanalabs/cpucp_if.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/habanalabs/cpucp_if.h b/include/linux/habanalabs/cpucp_if.h index 1ac1d68193e3..0913415243e8 100644 --- a/include/linux/habanalabs/cpucp_if.h +++ b/include/linux/habanalabs/cpucp_if.h @@ -859,9 +859,6 @@ struct cpucp_packet { * result cannot be used to hold general purpose data. */ __le32 status_mask; - - /* random, used once number, for security packets */ - __le32 nonce; }; union { @@ -870,6 +867,9 @@ struct cpucp_packet { /* For Generic packet sub index */ __le32 pkt_subidx; + + /* random, used once number, for security packets */ + __le32 nonce; }; }; From patchwork Mon May 27 12:12:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ofir Bitton X-Patchwork-Id: 13675137 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 8EBF4C25B74 for ; Mon, 27 May 2024 12:13:11 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 87EC710E095; Mon, 27 May 2024 12:13:10 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=habana.ai header.i=@habana.ai header.b="vF1NwWg+"; dkim-atps=neutral Received: from mail02.habana.ai (habanamailrelay02.habana.ai [62.90.112.121]) by gabe.freedesktop.org (Postfix) with ESMTPS id 98D7810ED94 for ; Mon, 27 May 2024 12:13:06 +0000 (UTC) Received: internal info suppressed DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=habana.ai; s=default; t=1716811982; bh=U9MPkc2aHrKjb35j4wkT+YPsoCL4v2hDv0CLivsUJDc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=vF1NwWg+FaHiAOT9UF/be/Exbyf0JagRctd5W1240nR69rSZyVqMc+3sajkv0p144 9DI0qn8a7168/9Po2LX/b7xKvJKqc0xR4uZrUO1M+l19XiHyMk6vDNZAeVRxwekDe9 xWO8blujS6AVMwx8h1q328h3SW6nha1bx24PlN1i3tGFrX2FvVHPcE8/oSYwiiM6wb y4knMVBLSJ5LyWOjDFBXUSTT00O/ycaFCUR+xv5/l84TGOdOM6mGTziv9ba6OoF1Gy mrWKgY/6o6WdC4GTNh4DzfSaxc2HptEkiTQxOE89jwsGVgIA/hODjBf+dguAO3i9JQ ghDkY2i84NZiw== Received: from obitton-vm-u22.habana-labs.com (localhost [127.0.0.1]) by obitton-vm-u22.habana-labs.com (8.15.2/8.15.2/Debian-22ubuntu3) with ESMTP id 44RCCuax1921351; Mon, 27 May 2024 15:12:57 +0300 From: Ofir Bitton To: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: Tomer Tayar Subject: [PATCH 7/8] accel/habanalabs: add an EQ size ASIC property Date: Mon, 27 May 2024 15:12:53 +0300 Message-Id: <20240527121254.1921306-7-obitton@habana.ai> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240527121254.1921306-1-obitton@habana.ai> References: <20240527121254.1921306-1-obitton@habana.ai> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Tomer Tayar Future supported ASICs might use the dynamic EQ mechanism with the firmware, and in that case the EQ size won't be equal to the default HL_EQ_SIZE_IN_BYTES value. Add an ASIC property to enable overriding this value. Signed-off-by: Tomer Tayar Reviewed-by: Ofir Bitton --- drivers/accel/habanalabs/common/habanalabs.h | 5 +++++ drivers/accel/habanalabs/common/irq.c | 8 +++++--- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/accel/habanalabs/common/habanalabs.h b/drivers/accel/habanalabs/common/habanalabs.h index 5e9f54ca336a..8d0df685e627 100644 --- a/drivers/accel/habanalabs/common/habanalabs.h +++ b/drivers/accel/habanalabs/common/habanalabs.h @@ -651,6 +651,8 @@ struct hl_hints_range { * @hbw_flush_reg: register to read to generate HBW flush. value of 0 means HBW flush is * not supported. * @reserved_fw_mem_size: size of dram memory reserved for FW. + * @fw_event_queue_size: queue size for events from CPU-CP. + * A value of 0 means using the default HL_EQ_SIZE_IN_BYTES value. * @collective_first_sob: first sync object available for collective use * @collective_first_mon: first monitor available for collective use * @sync_stream_first_sob: first sync object available for sync stream use @@ -782,6 +784,7 @@ struct asic_fixed_properties { u32 glbl_err_max_cause_num; u32 hbw_flush_reg; u32 reserved_fw_mem_size; + u32 fw_event_queue_size; u16 collective_first_sob; u16 collective_first_mon; u16 sync_stream_first_sob; @@ -1229,6 +1232,7 @@ struct hl_user_pending_interrupt { * @hdev: pointer to the device structure * @kernel_address: holds the queue's kernel virtual address * @bus_address: holds the queue's DMA address + * @size: the event queue size * @ci: ci inside the queue * @prev_eqe_index: the index of the previous event queue entry. The index of * the current entry's index must be +1 of the previous one. @@ -1240,6 +1244,7 @@ struct hl_eq { struct hl_device *hdev; void *kernel_address; dma_addr_t bus_address; + u32 size; u32 ci; u32 prev_eqe_index; bool check_eqe_index; diff --git a/drivers/accel/habanalabs/common/irq.c b/drivers/accel/habanalabs/common/irq.c index 978b7f4d5eeb..2caf2df4de08 100644 --- a/drivers/accel/habanalabs/common/irq.c +++ b/drivers/accel/habanalabs/common/irq.c @@ -652,14 +652,16 @@ void hl_cq_reset(struct hl_device *hdev, struct hl_cq *q) */ int hl_eq_init(struct hl_device *hdev, struct hl_eq *q) { + u32 size = hdev->asic_prop.fw_event_queue_size ? : HL_EQ_SIZE_IN_BYTES; void *p; - p = hl_cpu_accessible_dma_pool_alloc(hdev, HL_EQ_SIZE_IN_BYTES, &q->bus_address); + p = hl_cpu_accessible_dma_pool_alloc(hdev, size, &q->bus_address); if (!p) return -ENOMEM; q->hdev = hdev; q->kernel_address = p; + q->size = size; q->ci = 0; q->prev_eqe_index = 0; @@ -678,7 +680,7 @@ void hl_eq_fini(struct hl_device *hdev, struct hl_eq *q) { flush_workqueue(hdev->eq_wq); - hl_cpu_accessible_dma_pool_free(hdev, HL_EQ_SIZE_IN_BYTES, q->kernel_address); + hl_cpu_accessible_dma_pool_free(hdev, q->size, q->kernel_address); } void hl_eq_reset(struct hl_device *hdev, struct hl_eq *q) @@ -693,5 +695,5 @@ void hl_eq_reset(struct hl_device *hdev, struct hl_eq *q) * when the device is operational again */ - memset(q->kernel_address, 0, HL_EQ_SIZE_IN_BYTES); + memset(q->kernel_address, 0, q->size); } From patchwork Mon May 27 12:12:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ofir Bitton X-Patchwork-Id: 13675140 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6EB84C25B7C for ; Mon, 27 May 2024 12:13:21 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 60B9910ED94; Mon, 27 May 2024 12:13:19 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=habana.ai header.i=@habana.ai header.b="hJPjxmUc"; dkim-atps=neutral Received: from mail02.habana.ai (habanamailrelay.habana.ai [213.57.90.13]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3EA1510F4F0 for ; Mon, 27 May 2024 12:13:07 +0000 (UTC) Received: internal info suppressed DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=habana.ai; s=default; t=1716811993; bh=HA3Ew5tpL7ApKKFbeZ+k1CKifd7ompu4unxRVvUSSkU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=hJPjxmUciGCeDmYTn82I0pH8ClJhQRo8Ku/9DqJTCAyoxQzt/DBzC3TAzhh+aq1Ou I4eV1MaPXENiqYOJWp7wJpkbtalz5Sy02v0dXXOnvePNY7Sfp4jMVtwMHYT7dWxAOf 1t/Z/tINg6YYTXNeZMIOX4dx87+VOnlCQl89Z1eusBxC7Qvn0jhivLGtzFPQo5uh0X b4uhNP9O2OSZadK8AVqoQs5Pm6cWRcMICojKTKM/ZE81NrO6DDMl+cA9m6X149aIxA Kc+L+uWd2rwrfwAL+BE7MqkB8p3Zz9EYF5BrT8XiTq2iPNhHGrupEvQuK0EcYKcKPB TINOv+umpz6nw== Received: from obitton-vm-u22.habana-labs.com (localhost [127.0.0.1]) by obitton-vm-u22.habana-labs.com (8.15.2/8.15.2/Debian-22ubuntu3) with ESMTP id 44RCCub01921351; Mon, 27 May 2024 15:12:57 +0300 From: Ofir Bitton To: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: Tomer Tayar Subject: [PATCH 8/8] accel/habanalabs: move hl_eq_heartbeat_event_handle() to common code Date: Mon, 27 May 2024 15:12:54 +0300 Message-Id: <20240527121254.1921306-8-obitton@habana.ai> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240527121254.1921306-1-obitton@habana.ai> References: <20240527121254.1921306-1-obitton@habana.ai> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Tomer Tayar hl_eq_heartbeat_event_handle() doesn't have ASIC specific code, and therefore can be moved from Gaudi2-only code to common code, and possibly used for other ASICs. Signed-off-by: Tomer Tayar Reviewed-by: Ofir Bitton --- drivers/accel/habanalabs/common/device.c | 6 ++++++ drivers/accel/habanalabs/common/habanalabs.h | 1 + drivers/accel/habanalabs/gaudi2/gaudi2.c | 6 ------ 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c index 35502e938b5d..f9b8601c4396 100644 --- a/drivers/accel/habanalabs/common/device.c +++ b/drivers/accel/habanalabs/common/device.c @@ -2850,3 +2850,9 @@ void hl_set_irq_affinity(struct hl_device *hdev, int irq) if (irq_set_affinity_and_hint(irq, &hdev->irq_affinity_mask)) dev_err(hdev->dev, "Failed setting irq %d affinity\n", irq); } + +void hl_eq_heartbeat_event_handle(struct hl_device *hdev) +{ + hdev->heartbeat_debug_info.heartbeat_event_counter++; + hdev->eq_heartbeat_received = true; +} diff --git a/drivers/accel/habanalabs/common/habanalabs.h b/drivers/accel/habanalabs/common/habanalabs.h index 8d0df685e627..057087dc8592 100644 --- a/drivers/accel/habanalabs/common/habanalabs.h +++ b/drivers/accel/habanalabs/common/habanalabs.h @@ -4059,6 +4059,7 @@ void hl_capture_engine_err(struct hl_device *hdev, u16 engine_id, u16 error_coun void hl_enable_err_info_capture(struct hl_error_info *captured_err_info); void hl_init_cpu_for_irq(struct hl_device *hdev); void hl_set_irq_affinity(struct hl_device *hdev, int irq); +void hl_eq_heartbeat_event_handle(struct hl_device *hdev); #ifdef CONFIG_DEBUG_FS diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index 4791582d157c..a38b88baadf2 100644 --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c @@ -9777,12 +9777,6 @@ static u16 event_id_to_engine_id(struct hl_device *hdev, u16 event_type) return U16_MAX; } -static void hl_eq_heartbeat_event_handle(struct hl_device *hdev) -{ - hdev->heartbeat_debug_info.heartbeat_event_counter++; - hdev->eq_heartbeat_received = true; -} - static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry) { struct gaudi2_device *gaudi2 = hdev->asic_specific;