From patchwork Thu Mar 16 11:36:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 13177431 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id CEDDEC6FD1F for ; Thu, 16 Mar 2023 11:36:48 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id F228C10E1C9; Thu, 16 Mar 2023 11:36:47 +0000 (UTC) Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by gabe.freedesktop.org (Postfix) with ESMTPS id CB08610E0A8 for ; Thu, 16 Mar 2023 11:36:45 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 4688F61FBC for ; Thu, 16 Mar 2023 11:36:45 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1422EC4339B for ; Thu, 16 Mar 2023 11:36:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1678966604; bh=yKOfGLaQwNENaIDhKHuYkS4Z3LK3UG8gtYieveeD/mg=; h=From:To:Subject:Date:From; b=lR5sN83zWLugzs5Dw+kIQGy+ZpKLfmAIwVz6c/pktbNoQnOExAMVz6MEsetbJJzJJ qdAwVu//QdjzJu5otgXYJBo7RQZOwv0BqWUmUHfxvlOQS6U9ECnuNsLLAIZRTlRik1 z1Wug2H0eKpZKwNa5YsHK/RN77KdCuJKt89MkJOxoOIxIgnWtwI/uxeyFmnRrMuDZi cmJ8TgQq4ri5xemZnMXCxG/DB0Gz06RwmU4dsktLrEYJyXv61g8u3eS5IIdUWuhHuT CHyRUj+BFCaGhtpI7KGQu2BrteyY6lacnsc6wbTqJhPFxIi/BH0vlj61yfyV9KQX6z YXNZn7qWWk8Ow== From: Oded Gabbay To: dri-devel@lists.freedesktop.org Subject: [PATCH 01/10] accel/habanalabs: align to latest firmware specs Date: Thu, 16 Mar 2023 13:36:31 +0200 Message-Id: <20230316113640.499267-1-ogabbay@kernel.org> X-Mailer: git-send-email 2.40.0 MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Copy the most up-to-date interface files to the firmware. Signed-off-by: Oded Gabbay Reviewed-by: Ofir Bitton --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 2 +- .../habanalabs/include/common/cpucp_if.h | 51 ++++++++++++++++++- .../habanalabs/include/common/hl_boot_if.h | 47 +++++------------ .../include/gaudi2/gaudi2_async_events.h | 4 +- .../habanalabs/include/gaudi2/gaudi2_fw_if.h | 5 +- 5 files changed, 69 insertions(+), 40 deletions(-) diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index 8943dc9872da..21cf7180fe9f 100644 --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c @@ -9784,7 +9784,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent break; case GAUDI2_EVENT_CPU_FP32_NOT_SUPPORTED: - case GAUDI2_EVENT_DEV_RESET_REQ: + case GAUDI2_EVENT_CPU_DEV_RESET_REQ: event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; error_count = GAUDI2_NA_EVENT_CAUSE; is_critical = true; diff --git a/drivers/accel/habanalabs/include/common/cpucp_if.h b/drivers/accel/habanalabs/include/common/cpucp_if.h index d713252a4f13..bb65b9e2b424 100644 --- a/drivers/accel/habanalabs/include/common/cpucp_if.h +++ b/drivers/accel/habanalabs/include/common/cpucp_if.h @@ -33,6 +33,10 @@ #define PLL_MAP_MAX_BITS 128 #define PLL_MAP_LEN (PLL_MAP_MAX_BITS / 8) +enum eq_event_id { + EQ_EVENT_NIC_STS_REQUEST = 0, +}; + /* * info of the pkt queue pointers in the first async occurrence */ @@ -354,9 +358,48 @@ struct hl_eq_addr_dec_intr_data { __u8 pad[7]; }; +enum hl_mme_acc_err_type { + MME_ACC_WBC_ERR_RESP_LEGACY, + MME_ACC_WBC_ERR_RESP_SET0_CH0, + MME_ACC_WBC_ERR_RESP_SET0_CH1, + MME_ACC_WBC_ERR_RESP_SET1_CH0, + MME_ACC_WBC_ERR_RESP_SET1_CH1, + MME_ACC_WBC_BUSER_NUMERICAL_INF_ERR_SET0_CH0, + MME_ACC_WBC_BUSER_NUMERICAL_INF_ERR_SET0_CH1, + MME_ACC_WBC_BUSER_NUMERICAL_NINF_ERR_SET0_CH0, + MME_ACC_WBC_BUSER_NUMERICAL_NINF_ERR_SET0_CH1, + MME_ACC_WBC_BUSER_NUMERICAL_NAN_ERR_SET0_CH0, + MME_ACC_WBC_BUSER_NUMERICAL_NAN_ERR_SET0_CH1, + MME_ACC_WBC_BUSER_RR_DBG_ERR_SET0_CH0, + MME_ACC_WBC_BUSER_RR_DBG_ERR_SET0_CH1, + MME_ACC_WBC_BUSER_NUMERICAL_INF_ERR_SET1_CH0, + MME_ACC_WBC_BUSER_NUMERICAL_INF_ERR_SET1_CH1, + MME_ACC_WBC_BUSER_NUMERICAL_NINF_ERR_SET1_CH0, + MME_ACC_WBC_BUSER_NUMERICAL_NINF_ERR_SET1_CH1, + MME_ACC_WBC_BUSER_NUMERICAL_NAN_ERR_SET1_CH0, + MME_ACC_WBC_BUSER_NUMERICAL_NAN_ERR_SET1_CH1, + MME_ACC_WBC_BUSER_RR_DBG_ERR_SET1_CH0, + MME_ACC_WBC_BUSER_RR_DBG_ERR_SET1_CH1, + MME_ACC_AP_STS_SRC_DNRM, + MME_ACC_AP_STS_SRC_INF, + MME_ACC_AP_STS_SRC_NINF, + MME_ACC_AP_STS_SRC_NAN, + MME_ACC_AP_STS_RES_INF, + MME_ACC_AP_STS_RES_NINF, + MME_ACC_AP_STS_RES_NAN +}; + +struct hl_eq_mme_acc_data { + __u8 mme_id; + __u8 err_type; /* enum hl_mme_acc_err_type */ + __le16 ctx_id; + __u8 pad[4]; +}; + struct hl_eq_entry { struct hl_eq_header hdr; union { + __le64 data_placeholder; struct hl_eq_ecc_data ecc_data; struct hl_eq_hbm_ecc_data hbm_ecc_data; /* Gaudi1 HBM */ struct hl_eq_sm_sei_data sm_sei_data; @@ -661,6 +704,9 @@ enum pq_init_status { * CPUCP_PACKET_ACTIVE_STATUS_SET - * LKD sends FW indication whether device is free or in use, this indication is reported * also to the BMC. + * + * CPUCP_PACKET_REGISTER_INTERRUPTS - + * Packet to register interrupts indicating LKD is ready to receive events from FW. */ enum cpucp_packet_id { @@ -725,6 +771,8 @@ enum cpucp_packet_id { CPUCP_PACKET_RESERVED9, /* not used */ CPUCP_PACKET_RESERVED10, /* not used */ CPUCP_PACKET_RESERVED11, /* not used */ + CPUCP_PACKET_RESERVED12, /* internal */ + CPUCP_PACKET_REGISTER_INTERRUPTS, /* internal */ CPUCP_PACKET_ID_MAX /* must be last */ }; @@ -1127,6 +1175,7 @@ struct cpucp_security_info { * (0 = functional 1 = binned) * @interposer_version: Interposer version programmed in eFuse * @substrate_version: Substrate version programmed in eFuse + * @fw_hbm_region_size: Size in bytes of FW reserved region in HBM. * @fw_os_version: Firmware OS Version */ struct cpucp_info { @@ -1154,7 +1203,7 @@ struct cpucp_info { __u8 substrate_version; __u8 reserved2; struct cpucp_security_info sec_info; - __le32 reserved3; + __le32 fw_hbm_region_size; __u8 pll_map[PLL_MAP_LEN]; __le64 mme_binning_mask; __u8 fw_os_version[VERSION_MAX_LEN]; diff --git a/drivers/accel/habanalabs/include/common/hl_boot_if.h b/drivers/accel/habanalabs/include/common/hl_boot_if.h index 2256add057c5..c58d76a2705c 100644 --- a/drivers/accel/habanalabs/include/common/hl_boot_if.h +++ b/drivers/accel/habanalabs/include/common/hl_boot_if.h @@ -770,15 +770,23 @@ enum hl_components { HL_COMPONENTS_ARMCP, HL_COMPONENTS_CPLD, HL_COMPONENTS_UBOOT, + HL_COMPONENTS_FUSE, HL_COMPONENTS_MAX_NUM = 16 }; +#define NAME_MAX_LEN 32 /* bytes */ +struct hl_module_data { + __u8 name[NAME_MAX_LEN]; + __u8 version[VERSION_MAX_LEN]; +}; + /** * struct hl_component_versions - versions associated with hl component. * @struct_size: size of all the struct (including dynamic size of modules). * @modules_offset: offset of the modules field in this struct. * @component: version of the component itself. * @fw_os: Firmware OS Version. + * @comp_name: Name of the component. * @modules_mask: i'th bit (from LSB) is a flag - on if module i in enum * hl_modules is used. * @modules_counter: number of set bits in modules_mask. @@ -791,45 +799,14 @@ struct hl_component_versions { __le16 modules_offset; __u8 component[VERSION_MAX_LEN]; __u8 fw_os[VERSION_MAX_LEN]; + __u8 comp_name[NAME_MAX_LEN]; __le16 modules_mask; __u8 modules_counter; __u8 reserved[1]; - __u8 modules[][VERSION_MAX_LEN]; -}; - -/** - * struct hl_fw_versions - all versions (fuse, cpucp's components with their - * modules) - * @struct_size: size of all the struct (including dynamic size of components). - * @components_offset: offset of the components field in this struct. - * @fuse: silicon production FUSE information. - * @components_mask: i'th bit (from LSB) is a flag - on if component i in enum - * hl_components is used. - * @components_counter: number of set bits in components_mask. - * @reserved: reserved for future use. - * @components: versions of hl components. Index i corresponds to the i'th bit - * that is *on* in components_mask. For example, if - * components_mask=0b101, then *components represents arcpid and - * *(hl_component_versions*)((char*)components + 1') represents - * preboot, where 1' = components[0].struct_size. - */ -struct hl_fw_versions { - __le16 struct_size; - __le16 components_offset; - __u8 fuse[VERSION_MAX_LEN]; - __le16 components_mask; - __u8 components_counter; - __u8 reserved[1]; - struct hl_component_versions components[]; + struct hl_module_data modules[]; }; -/* Max size of struct hl_component_versions */ -#define HL_COMPONENT_VERSIONS_MAX_SIZE \ - (sizeof(struct hl_component_versions) + HL_MODULES_MAX_NUM * \ - VERSION_MAX_LEN) - -/* Max size of struct hl_fw_versions */ -#define HL_FW_VERSIONS_MAX_SIZE (sizeof(struct hl_fw_versions) + \ - HL_COMPONENTS_MAX_NUM * HL_COMPONENT_VERSIONS_MAX_SIZE) +/* Max size of fit size */ +#define HL_FW_VERSIONS_FIT_SIZE 4096 #endif /* HL_BOOT_IF_H */ diff --git a/drivers/accel/habanalabs/include/gaudi2/gaudi2_async_events.h b/drivers/accel/habanalabs/include/gaudi2/gaudi2_async_events.h index 50852cc80373..f661068d0c5f 100644 --- a/drivers/accel/habanalabs/include/gaudi2/gaudi2_async_events.h +++ b/drivers/accel/habanalabs/include/gaudi2/gaudi2_async_events.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0 * - * Copyright 2018-2021 HabanaLabs, Ltd. + * Copyright 2018-2022 HabanaLabs, Ltd. * All Rights Reserved. * */ @@ -958,7 +958,7 @@ enum gaudi2_async_event_id { GAUDI2_EVENT_CPU11_STATUS_NIC11_ENG1 = 1318, GAUDI2_EVENT_ARC_DCCM_FULL = 1319, GAUDI2_EVENT_CPU_FP32_NOT_SUPPORTED = 1320, - GAUDI2_EVENT_DEV_RESET_REQ = 1321, + GAUDI2_EVENT_CPU_DEV_RESET_REQ = 1321, GAUDI2_EVENT_SIZE, }; diff --git a/drivers/accel/habanalabs/include/gaudi2/gaudi2_fw_if.h b/drivers/accel/habanalabs/include/gaudi2/gaudi2_fw_if.h index 82f3ca2a3966..8522f24deac0 100644 --- a/drivers/accel/habanalabs/include/gaudi2/gaudi2_fw_if.h +++ b/drivers/accel/habanalabs/include/gaudi2/gaudi2_fw_if.h @@ -63,7 +63,10 @@ struct gaudi2_cold_rst_data { u32 fake_sig_validation_en : 1; u32 bist_skip_enable : 1; u32 bist_need_iatu_config : 1; - u32 reserved : 24; + u32 fake_bis_compliant : 1; + u32 wd_rst_cause_arm : 1; + u32 wd_rst_cause_arcpid : 1; + u32 reserved : 21; }; __le32 data; }; From patchwork Thu Mar 16 11:36:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 13177433 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 057AAC6FD1F for ; Thu, 16 Mar 2023 11:36:56 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 331F710ECA8; Thu, 16 Mar 2023 11:36:55 +0000 (UTC) Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by gabe.freedesktop.org (Postfix) with ESMTPS id 08E9110ECA8 for ; Thu, 16 Mar 2023 11:36:47 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 8641F61FBC; Thu, 16 Mar 2023 11:36:47 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 15923C433D2; Thu, 16 Mar 2023 11:36:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1678966607; bh=UmGWWvaPm9DtDkjuHYKHBSZzC1ujogR++GWFYr95Suk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=so3UqfmkAtjO9SS8d6pMzu2h6iZHzFwqIj2FyIw4CEergSPCIL0g1WmxipFIq58yF KRjGpwoDnIwrGQi7IvlqQmg2mrsO+2CzspBfke85iEYear/1cW/geDOzepVU+xNQZ9 rLBHDCV286RJTAgECKSh6R4Jje1uVdrPYEKABLsonV9gakwBjF31FPxXWTJSYr4+6b mADDgp4tnV+eJL227UR5k4Au4QAVKGppBCY0pAUUey60/YhHX6D6akkvwxpf1EXYAA KCKTNLrQnbcQp7FdvH165Tnlc1MbJyhrxGyoSUCzCwq3WZN7cbzlp4dZrcDgsU9UvZ XoaDfP6WTazkQ== From: Oded Gabbay To: dri-devel@lists.freedesktop.org Subject: [PATCH 02/10] accel/habanalabs: do not verify engine modes after being changed Date: Thu, 16 Mar 2023 13:36:32 +0200 Message-Id: <20230316113640.499267-2-ogabbay@kernel.org> X-Mailer: git-send-email 2.40.0 In-Reply-To: <20230316113640.499267-1-ogabbay@kernel.org> References: <20230316113640.499267-1-ogabbay@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Koby Elbaz Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Koby Elbaz Engines idle state can't always be verified between changes of engine modes (e.g., stall/halt). For example, if a CS is inflight when altering engine's mode, idle state will return NOT idle, always. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 35 +----------------------- 1 file changed, 1 insertion(+), 34 deletions(-) diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index 21cf7180fe9f..cb679365240e 100644 --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c @@ -4545,36 +4545,9 @@ static int gaudi2_set_engine_modes(struct hl_device *hdev, return 0; } -static int gaudi2_verify_engine_modes(struct hl_device *hdev, u32 *engine_ids, - u32 num_engines, u32 engine_command) -{ - bool is_engine_idle = true; - u64 mask_arr = 0; - int i; - - gaudi2_get_tpc_idle_status(hdev, &mask_arr, 8 * sizeof(mask_arr), NULL); - gaudi2_get_mme_idle_status(hdev, &mask_arr, 8 * sizeof(mask_arr), NULL); - gaudi2_get_edma_idle_status(hdev, &mask_arr, 8 * sizeof(mask_arr), NULL); - - for (i = 0 ; i < num_engines ; ++i) { - is_engine_idle = !(mask_arr & BIT_ULL(engine_ids[i])); - if ((engine_command == HL_ENGINE_RESUME) && !is_engine_idle) { - dev_err(hdev->dev, "Engine ID %u remained NOT idle!\n", engine_ids[i]); - return -EBUSY; - } else if ((engine_command == HL_ENGINE_STALL) && is_engine_idle) { - dev_err(hdev->dev, "Engine ID %u remained idle!\n", engine_ids[i]); - return -EBUSY; - } - } - - return 0; -} - static int gaudi2_set_engines(struct hl_device *hdev, u32 *engine_ids, u32 num_engines, u32 engine_command) { - int rc; - switch (engine_command) { case HL_ENGINE_CORE_HALT: case HL_ENGINE_CORE_RUN: @@ -4582,18 +4555,12 @@ static int gaudi2_set_engines(struct hl_device *hdev, u32 *engine_ids, case HL_ENGINE_STALL: case HL_ENGINE_RESUME: - rc = gaudi2_set_engine_modes(hdev, engine_ids, num_engines, engine_command); - if (rc) - return rc; - - return gaudi2_verify_engine_modes(hdev, engine_ids, num_engines, engine_command); + return gaudi2_set_engine_modes(hdev, engine_ids, num_engines, engine_command); default: dev_err(hdev->dev, "failed to execute command id %u\n", engine_command); return -EINVAL; } - - return 0; } static void gaudi2_halt_engines(struct hl_device *hdev, bool hard_reset, bool fw_reset) From patchwork Thu Mar 16 11:36:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 13177432 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B3239C6FD1F for ; Thu, 16 Mar 2023 11:36:52 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id CFF2110E0A8; Thu, 16 Mar 2023 11:36:51 +0000 (UTC) Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6E36110E0A8 for ; Thu, 16 Mar 2023 11:36:49 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id ED28161FE9; Thu, 16 Mar 2023 11:36:48 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7C410C433EF; Thu, 16 Mar 2023 11:36:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1678966608; bh=6pHbpnMw9YGDjM+m2mdoSxiaBfTRG6jNg88olV+KRfw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=XgMVG3g2KN8+sL0loKcBdSGB/tcNgTG2qa1ypRzx1+31k1x/9UCYlyMD6SnpVVhHB VxvOc3HF+YwQBZ0rUweOCXOeNgwkVk5CuII3QywTX9rL35xkMVf4vIb8CYeWazv3HU qr5E1x7zOKOHsYxoqnZXqYbKSEP/B2m2A/cNl3SLAgRvfM5hMcZ9NgEwB1KpmK95uv PLtTXH5D9mOskdaLv94xasL7ZZNyI7J2Do0L01w6/2Tzt/N8Gw/ijKq825AemJAzL3 XWUp6njucQbV0hQ8jdQ9xAs/AdafwozMKumDWJPbVPTTUae29C4x84NSXc2XdhlPz9 YqwLS/1Sm7pOA== From: Oded Gabbay To: dri-devel@lists.freedesktop.org Subject: [PATCH 03/10] accel/habanalabs: increase reset poll timeout Date: Thu, 16 Mar 2023 13:36:33 +0200 Message-Id: <20230316113640.499267-3-ogabbay@kernel.org> X-Mailer: git-send-email 2.40.0 In-Reply-To: <20230316113640.499267-1-ogabbay@kernel.org> References: <20230316113640.499267-1-ogabbay@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ofir Bitton Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Ofir Bitton Due to a firmware bug we need to increase reset poll timeout or else we will timeout in secured environments. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index cb679365240e..652f12a058c7 100644 --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c @@ -23,7 +23,8 @@ #define GAUDI2_DMA_POOL_BLK_SIZE SZ_256 /* 256 bytes */ #define GAUDI2_RESET_TIMEOUT_MSEC 2000 /* 2000ms */ -#define GAUDI2_RESET_POLL_TIMEOUT_USEC 50000 /* 50ms */ + +#define GAUDI2_RESET_POLL_TIMEOUT_USEC 500000 /* 500ms */ #define GAUDI2_PLDM_HRESET_TIMEOUT_MSEC 25000 /* 25s */ #define GAUDI2_PLDM_SRESET_TIMEOUT_MSEC 25000 /* 25s */ #define GAUDI2_PLDM_RESET_POLL_TIMEOUT_USEC 3000000 /* 3s */ From patchwork Thu Mar 16 11:36:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 13177434 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 872EEC7618A for ; Thu, 16 Mar 2023 11:36:57 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 99AD610ECAD; Thu, 16 Mar 2023 11:36:55 +0000 (UTC) Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by gabe.freedesktop.org (Postfix) with ESMTPS id CB5BD10E0A8 for ; Thu, 16 Mar 2023 11:36:50 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 5A5D961FDF; Thu, 16 Mar 2023 11:36:50 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id DEE63C433D2; Thu, 16 Mar 2023 11:36:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1678966609; bh=HM2+yJx8qRBCNqa+qY+UBfgaPfnado1JrJfh1o9ix4I=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Ya7G8Q5hXLomFM4KMzOPS1yfoGKCO9iPUT4FPGwzCJfjF4uESfw0pd1jCdTmHy6ET B/hMXZpKV8PEDirvY/q0heo0B887OzZUgX/Cewb6Lu1FBDVyylk3Tyrhf1TOcbRBvf I6kj110UNeQ9/HJdFiU4Nlz9nR5rob6QZduUVR8LoAqVQDPoERaGx3YxhYwS0CdrAH yGEhaqIe4H+lSqmykW/QklEpZEerW0z29Hfw/GszmVQcwTNTJ4SxhlMCadZucmGWjb SrCwj3zHWl04zgOWrJpzD07mAUEkuF5J2IxjQLSO/4vaZQ+OwFs3MIyAFK5VbAfg8p lkBfwYSGODu3g== From: Oded Gabbay To: dri-devel@lists.freedesktop.org Subject: [PATCH 04/10] accel/habanalabs: in hw_fini return error code if polling timed-out Date: Thu, 16 Mar 2023 13:36:34 +0200 Message-Id: <20230316113640.499267-4-ogabbay@kernel.org> X-Mailer: git-send-email 2.40.0 In-Reply-To: <20230316113640.499267-1-ogabbay@kernel.org> References: <20230316113640.499267-1-ogabbay@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Dafna Hirschfeld Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Dafna Hirschfeld In hw_fini callback, we use either the cpucp packet method or polling a register. Currently we return error only in the case of cpucp packet failure. In this patch we also return error if polling timed out. Signed-off-by: Dafna Hirschfeld Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi/gaudi.c | 8 ++++---- drivers/accel/habanalabs/gaudi2/gaudi2.c | 10 ++++++---- drivers/accel/habanalabs/goya/goya.c | 8 ++++---- 3 files changed, 14 insertions(+), 12 deletions(-) diff --git a/drivers/accel/habanalabs/gaudi/gaudi.c b/drivers/accel/habanalabs/gaudi/gaudi.c index 004846bc086e..a9d84675b407 100644 --- a/drivers/accel/habanalabs/gaudi/gaudi.c +++ b/drivers/accel/habanalabs/gaudi/gaudi.c @@ -4206,10 +4206,10 @@ static int gaudi_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset) msleep(reset_timeout_ms); status = RREG32(mmPSOC_GLOBAL_CONF_BTM_FSM); - if (status & PSOC_GLOBAL_CONF_BTM_FSM_STATE_MASK) - dev_err(hdev->dev, - "Timeout while waiting for device to reset 0x%x\n", - status); + if (status & PSOC_GLOBAL_CONF_BTM_FSM_STATE_MASK) { + dev_err(hdev->dev, "Timeout while waiting for device to reset 0x%x\n", status); + return -ETIMEDOUT; + } if (gaudi) { gaudi->hw_cap_initialized &= ~(HW_CAP_CPU | HW_CAP_CPU_Q | HW_CAP_HBM | diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index 652f12a058c7..15a06beea065 100644 --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c @@ -6036,7 +6036,7 @@ static void gaudi2_execute_hard_reset(struct hl_device *hdev, u32 reset_sleep_ms WREG32(mmPSOC_RESET_CONF_SW_ALL_RST, 1); } -static void gaudi2_get_soft_rst_done_indication(struct hl_device *hdev, u32 poll_timeout_us) +static int gaudi2_get_soft_rst_done_indication(struct hl_device *hdev, u32 poll_timeout_us) { int i, rc = 0; u32 reg_val; @@ -6053,6 +6053,7 @@ static void gaudi2_get_soft_rst_done_indication(struct hl_device *hdev, u32 poll if (rc) dev_err(hdev->dev, "Timeout while waiting for FW to complete soft reset (0x%x)\n", reg_val); + return rc; } /** @@ -6065,7 +6066,7 @@ static void gaudi2_get_soft_rst_done_indication(struct hl_device *hdev, u32 poll * * This function executes soft reset based on if driver/FW should do the reset */ -static void gaudi2_execute_soft_reset(struct hl_device *hdev, u32 reset_sleep_ms, +static int gaudi2_execute_soft_reset(struct hl_device *hdev, u32 reset_sleep_ms, bool driver_performs_reset, u32 poll_timeout_us) { struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs; @@ -6079,8 +6080,8 @@ static void gaudi2_execute_soft_reset(struct hl_device *hdev, u32 reset_sleep_ms WREG32(le32_to_cpu(dyn_regs->gic_host_soft_rst_irq), gaudi2_irq_map_table[GAUDI2_EVENT_CPU_SOFT_RESET].cpu_id); - gaudi2_get_soft_rst_done_indication(hdev, poll_timeout_us); - return; + + return gaudi2_get_soft_rst_done_indication(hdev, poll_timeout_us); } /* Block access to engines, QMANs and SM during reset, these @@ -6095,6 +6096,7 @@ static void gaudi2_execute_soft_reset(struct hl_device *hdev, u32 reset_sleep_ms mmPCIE_VDEC1_MSTR_IF_RR_SHRD_HBW_BASE + HL_BLOCK_SIZE); WREG32(mmPSOC_RESET_CONF_SOFT_RST, 1); + return 0; } static void gaudi2_poll_btm_indication(struct hl_device *hdev, u32 reset_sleep_ms, diff --git a/drivers/accel/habanalabs/goya/goya.c b/drivers/accel/habanalabs/goya/goya.c index e02de936b7b5..07d67878eac5 100644 --- a/drivers/accel/habanalabs/goya/goya.c +++ b/drivers/accel/habanalabs/goya/goya.c @@ -2834,10 +2834,10 @@ static int goya_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset) msleep(reset_timeout_ms); status = RREG32(mmPSOC_GLOBAL_CONF_BTM_FSM); - if (status & PSOC_GLOBAL_CONF_BTM_FSM_STATE_MASK) - dev_err(hdev->dev, - "Timeout while waiting for device to reset 0x%x\n", - status); + if (status & PSOC_GLOBAL_CONF_BTM_FSM_STATE_MASK) { + dev_err(hdev->dev, "Timeout while waiting for device to reset 0x%x\n", status); + return -ETIMEDOUT; + } if (!hard_reset && goya) { goya->hw_cap_initialized &= ~(HW_CAP_DMA | HW_CAP_MME | From patchwork Thu Mar 16 11:36:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 13177437 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 66A59C7618A for ; Thu, 16 Mar 2023 11:37:05 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 09D6C10ECB0; Thu, 16 Mar 2023 11:37:03 +0000 (UTC) Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by gabe.freedesktop.org (Postfix) with ESMTPS id 431A610ECA8 for ; Thu, 16 Mar 2023 11:36:52 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id C6BD161FBC; Thu, 16 Mar 2023 11:36:51 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 57B19C4339C; Thu, 16 Mar 2023 11:36:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1678966611; bh=svAWj3QMFrUxzrTRWMGtf8lk+Xcl2JB3q7z9gbEYMJA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=k2AtBnLHLb/5mqEDRtbvQ7xV1OomSYl6iYjUgZTvkPPbcHj0YcWYFOu9BcmbIJXKH WiieblBk3dlzdBSCRXaW4+1KyCINIeqBIk4QgNOntgzpxcsgyT/iupmp4gXHIAwR2K 27up612DMn3uOjbhlpdUAp7ZUybOnGTVNTHStvA18eY2ih5sm0CZmk5vdifTuRmbVr a41TQhX4s5ZgcQiB2Idt+YBljN1o28E3/YwPCHvMRky+DSWLl6IbjpAt9CYqz5Z5jq +l+qxOnXXm3lUw8QTR0jFMaqfHytG8JHTXJPbLYLn+dSbI15yS24O6vITJMsKx1f2o bfCQwY4L/ASoA== From: Oded Gabbay To: dri-devel@lists.freedesktop.org Subject: [PATCH 05/10] accel/habanalabs: fix use of var reset_sleep_ms Date: Thu, 16 Mar 2023 13:36:35 +0200 Message-Id: <20230316113640.499267-5-ogabbay@kernel.org> X-Mailer: git-send-email 2.40.0 In-Reply-To: <20230316113640.499267-1-ogabbay@kernel.org> References: <20230316113640.499267-1-ogabbay@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Dafna Hirschfeld Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Dafna Hirschfeld - remove reset_sleep_ms arg from functions that don't use it. - move the call msleep(reset_sleep_ms) from btm poll to gaudi2_hw_fini as it is called from there already for other flow. Signed-off-by: Dafna Hirschfeld Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 34 +++++++++++------------- 1 file changed, 16 insertions(+), 18 deletions(-) diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index 15a06beea065..224eaafe953f 100644 --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c @@ -6014,11 +6014,10 @@ static void gaudi2_send_hard_reset_cmd(struct hl_device *hdev) * gaudi2_execute_hard_reset - execute hard reset by driver/FW * * @hdev: pointer to the habanalabs device structure - * @reset_sleep_ms: sleep time in msec after reset * * This function executes hard reset based on if driver/FW should do the reset */ -static void gaudi2_execute_hard_reset(struct hl_device *hdev, u32 reset_sleep_ms) +static void gaudi2_execute_hard_reset(struct hl_device *hdev) { if (hdev->asic_prop.hard_reset_done_by_fw) { gaudi2_send_hard_reset_cmd(hdev); @@ -6060,14 +6059,13 @@ static int gaudi2_get_soft_rst_done_indication(struct hl_device *hdev, u32 poll_ * gaudi2_execute_soft_reset - execute soft reset by driver/FW * * @hdev: pointer to the habanalabs device structure - * @reset_sleep_ms: sleep time in msec after reset * @driver_performs_reset: true if driver should perform reset instead of f/w. * @poll_timeout_us: time to wait for response from f/w. * * This function executes soft reset based on if driver/FW should do the reset */ -static int gaudi2_execute_soft_reset(struct hl_device *hdev, u32 reset_sleep_ms, - bool driver_performs_reset, u32 poll_timeout_us) +static int gaudi2_execute_soft_reset(struct hl_device *hdev, bool driver_performs_reset, + u32 poll_timeout_us) { struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs; @@ -6099,15 +6097,11 @@ static int gaudi2_execute_soft_reset(struct hl_device *hdev, u32 reset_sleep_ms, return 0; } -static void gaudi2_poll_btm_indication(struct hl_device *hdev, u32 reset_sleep_ms, - u32 poll_timeout_us) +static void gaudi2_poll_btm_indication(struct hl_device *hdev, u32 poll_timeout_us) { int i, rc = 0; u32 reg_val; - /* without this sleep reset will not work */ - msleep(reset_sleep_ms); - /* We poll the BTM done indication multiple times after reset due to * a HW errata 'GAUDI2_0300' */ @@ -6129,6 +6123,7 @@ static int gaudi2_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset struct gaudi2_device *gaudi2 = hdev->asic_specific; u32 poll_timeout_us, reset_sleep_ms; bool driver_performs_reset = false; + int rc; if (hdev->pldm) { reset_sleep_ms = hard_reset ? GAUDI2_PLDM_HRESET_TIMEOUT_MSEC : @@ -6146,7 +6141,7 @@ static int gaudi2_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset if (hard_reset) { driver_performs_reset = !hdev->asic_prop.hard_reset_done_by_fw; - gaudi2_execute_hard_reset(hdev, reset_sleep_ms); + gaudi2_execute_hard_reset(hdev); } else { /* * As we have to support also work with preboot only (which does not supports @@ -6156,8 +6151,9 @@ static int gaudi2_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset */ driver_performs_reset = (hdev->fw_components == FW_TYPE_PREBOOT_CPU && !hdev->asic_prop.fw_security_enabled); - gaudi2_execute_soft_reset(hdev, reset_sleep_ms, driver_performs_reset, - poll_timeout_us); + rc = gaudi2_execute_soft_reset(hdev, driver_performs_reset, poll_timeout_us); + if (rc) + return rc; } skip_reset: @@ -6181,12 +6177,14 @@ static int gaudi2_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset * communicate with FW that is during reset. * to overcome this we will always wait to preboot ready indication */ - if ((hdev->fw_components & FW_TYPE_PREBOOT_CPU)) { - msleep(reset_sleep_ms); + + /* without this sleep reset will not work */ + msleep(reset_sleep_ms); + + if (hdev->fw_components & FW_TYPE_PREBOOT_CPU) hl_fw_wait_preboot_ready(hdev); - } else { - gaudi2_poll_btm_indication(hdev, reset_sleep_ms, poll_timeout_us); - } + else + gaudi2_poll_btm_indication(hdev, poll_timeout_us); } if (!gaudi2) From patchwork Thu Mar 16 11:36:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 13177438 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D2B28C6FD1F for ; Thu, 16 Mar 2023 11:37:08 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3222C10ECAF; Thu, 16 Mar 2023 11:37:08 +0000 (UTC) Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4501D10ECAB for ; Thu, 16 Mar 2023 11:36:55 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id DC21BB820ED; Thu, 16 Mar 2023 11:36:53 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id BA998C4339E; Thu, 16 Mar 2023 11:36:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1678966612; bh=KBp9khRfIFEnAh+0IX1vp3rN20E9xNLpYGp92OwePug=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=QIUmGg2eGiwAruKzSAlakJrmc6ByD1rs54OUvFDG++Pzm4s+qFs8Uens0+sropZLm lMe/NOZNxoiQ9+vqBzu1beg8uyyNWWPFKF0g+Mcb/Zy52oc/+zjI+VQhPcPMXLHZx0 W7ecnDcZ7B3+Bunc5yCvNs9+6s4OILhzrs/vax7VCaQKOKyy9UjDjpYU1qaVNlewFM OcZ7dGnJ19wWP9xpgXgdYHm2z+OnhuKt5PTCC+sIvaf6B0TbQ99UwauH/FYHGb1SCs GpPEDeBzu1fUVljEXbtWN71e+gLFpWFGZolrQTbcMNlHX/xI9NvOMaBYSGAMGf5KHM bf9tbSuuU0AFQ== From: Oded Gabbay To: dri-devel@lists.freedesktop.org Subject: [PATCH 06/10] accel/habanalabs: in {e/p}dma_core events read the err cause reg Date: Thu, 16 Mar 2023 13:36:36 +0200 Message-Id: <20230316113640.499267-6-ogabbay@kernel.org> X-Mailer: git-send-email 2.40.0 In-Reply-To: <20230316113640.499267-1-ogabbay@kernel.org> References: <20230316113640.499267-1-ogabbay@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Dafna Hirschfeld Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Dafna Hirschfeld Since the err_cause register is unprivileged, we should read it from the driver instead of using the param that came from the FW. Signed-off-by: Dafna Hirschfeld Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 40 +++++++++++++++++++----- 1 file changed, 33 insertions(+), 7 deletions(-) diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index 224eaafe953f..94d53cd1b0ff 100644 --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c @@ -8689,14 +8689,13 @@ static int gaudi2_handle_kdma_core_event(struct hl_device *hdev, u16 event_type, return error_count; } -static int gaudi2_handle_dma_core_event(struct hl_device *hdev, u16 event_type, - u64 intr_cause_data) +static int gaudi2_handle_dma_core_event(struct hl_device *hdev, u16 event_type, int sts_addr) { - u32 error_count = 0; + u32 error_count = 0, sts_val = RREG32(sts_addr); int i; for (i = 0 ; i < GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE ; i++) - if (intr_cause_data & BIT(i)) { + if (sts_val & BIT(i)) { gaudi2_print_event(hdev, event_type, true, "err cause: %s", gaudi2_dma_core_interrupts_cause[i]); error_count++; @@ -8707,6 +8706,27 @@ static int gaudi2_handle_dma_core_event(struct hl_device *hdev, u16 event_type, return error_count; } +static int gaudi2_handle_pdma_core_event(struct hl_device *hdev, u16 event_type, int pdma_idx) +{ + u32 sts_addr; + + sts_addr = mmPDMA0_CORE_ERR_CAUSE + pdma_idx * PDMA_OFFSET; + return gaudi2_handle_dma_core_event(hdev, event_type, sts_addr); +} + +static int gaudi2_handle_edma_core_event(struct hl_device *hdev, u16 event_type, int edma_idx) +{ + static const int edma_event_index_map[] = {2, 3, 0, 1, 6, 7, 4, 5}; + u32 sts_addr, index; + + index = edma_event_index_map[edma_idx]; + + sts_addr = mmDCORE0_EDMA0_CORE_ERR_CAUSE + + DCORE_OFFSET * (index / NUM_OF_EDMA_PER_DCORE) + + DCORE_EDMA_OFFSET * (index % NUM_OF_EDMA_PER_DCORE); + return gaudi2_handle_dma_core_event(hdev, event_type, sts_addr); +} + static void gaudi2_print_pcie_mstr_rr_mstr_if_razwi_info(struct hl_device *hdev, u64 *event_mask) { u32 mstr_if_base_addr = mmPCIE_MSTR_RR_MSTR_IF_RR_SHRD_HBW_BASE, razwi_happened_addr; @@ -9524,9 +9544,15 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; - case GAUDI2_EVENT_HDMA2_CORE ... GAUDI2_EVENT_PDMA1_CORE: - error_count = gaudi2_handle_dma_core_event(hdev, event_type, - le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); + case GAUDI2_EVENT_HDMA2_CORE ... GAUDI2_EVENT_HDMA5_CORE: + index = event_type - GAUDI2_EVENT_HDMA2_CORE; + error_count = gaudi2_handle_edma_core_event(hdev, event_type, index); + event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; + break; + + case GAUDI2_EVENT_PDMA0_CORE ... GAUDI2_EVENT_PDMA1_CORE: + index = event_type - GAUDI2_EVENT_PDMA0_CORE; + error_count = gaudi2_handle_pdma_core_event(hdev, event_type, index); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; From patchwork Thu Mar 16 11:36:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 13177439 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 923EAC7618A for ; Thu, 16 Mar 2023 11:37:09 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9FC0610ECB1; Thu, 16 Mar 2023 11:37:08 +0000 (UTC) Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5D4C510ECAB for ; Thu, 16 Mar 2023 11:36:56 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 05236B8210D for ; Thu, 16 Mar 2023 11:36:55 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 28D50C4339C for ; Thu, 16 Mar 2023 11:36:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1678966613; bh=2hHVaV7ZugDJdJqgB3IINpgpQYOcZYyHRjE813vgofI=; h=From:To:Subject:Date:In-Reply-To:References:From; b=cP3ViW/GanR2oqC4nExnjvik+J7XvCHY84+teVDCraelAWuzTxJN7X/3HR926pG3y inFWaEVFIwbMoC8zSe3YGK056lS/go9oxexKoLQGcljppoM511iE8s5r2KwMPjVLyj H7VI10TapE/LLq7tVmx0mwXdUn982X/buyE5mHNB+YgJKIQdkB/3L6lTwIzvo/m2Qb hVsI1InvTxFe73XspkCz9iTrPriZF/ihMrDYlRlo6AV8xh+b0JkmQ5/6VFAMvwQF7t B9wmHYp9mI/gmlg0QZxxusbUKz7dR/goNjfudFpbYatzuoRVfYroZbsxT8cq2D5zxp Huy+qYMqXnkIg== From: Oded Gabbay To: dri-devel@lists.freedesktop.org Subject: [PATCH 07/10] accel/habanalabs: fix field names in hl_info_hw_ip_info Date: Thu, 16 Mar 2023 13:36:37 +0200 Message-Id: <20230316113640.499267-7-ogabbay@kernel.org> X-Mailer: git-send-email 2.40.0 In-Reply-To: <20230316113640.499267-1-ogabbay@kernel.org> References: <20230316113640.499267-1-ogabbay@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Don't use padX for actual reservedX fields. Signed-off-by: Oded Gabbay Reviewed-by: Ofir Bitton --- include/uapi/drm/habanalabs_accel.h | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/include/uapi/drm/habanalabs_accel.h b/include/uapi/drm/habanalabs_accel.h index 7ca0ef802fd1..7d457eb4da74 100644 --- a/include/uapi/drm/habanalabs_accel.h +++ b/include/uapi/drm/habanalabs_accel.h @@ -915,17 +915,18 @@ struct hl_info_hw_ip_info { __u64 dram_page_size; __u32 edma_enabled_mask; __u16 number_of_user_interrupts; - __u16 pad2; - __u64 reserved4; + __u8 reserved1; + __u8 reserved2; + __u64 reserved3; __u64 device_mem_alloc_default_page_size; + __u64 reserved4; __u64 reserved5; - __u64 reserved6; - __u32 reserved7; - __u8 reserved8; + __u32 reserved6; + __u8 reserved7; __u8 revision_id; __u16 tpc_interrupt_id; + __u32 reserved8; __u32 reserved9; - __u8 pad3[4]; __u64 engine_core_interrupt_reg_addr; }; From patchwork Thu Mar 16 11:36:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 13177436 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 85FC7C6FD19 for ; Thu, 16 Mar 2023 11:37:03 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 168F910ECAE; Thu, 16 Mar 2023 11:37:02 +0000 (UTC) Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by gabe.freedesktop.org (Postfix) with ESMTPS id D747510ECAE for ; Thu, 16 Mar 2023 11:36:55 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 602AB61FBC; Thu, 16 Mar 2023 11:36:55 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 54302C433D2; Thu, 16 Mar 2023 11:36:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1678966615; bh=geRMtjFAl7EjrdJwcdx+aKlpK0t1faQNxfjd2ke2sIw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=GKkMj390joJhf7U60PhAO914pRrSxStwY/sTXZUDCL+/uJVPI+4b2nZr61itgfUH8 2DwNP+xQvIdS0Km3F3bIWbTQxFtDxYjBOsA4s9OI9srD7z1Mj3gqU6NEipf3nmf7Fw qgPHSMzPEY7caGET7SQukBifJamZjTMRoXTDSpq/Pak4l5R4LfXkWG3wQurqKNll78 FBpIjGqWZvSW0ELBoYd7CFoDtXNiJoktrJOKZdk1zj5NIqbB3/j5WGMQdhCz0srjTV IugGeSnl2jlo1Urn4pX/iFXXd97AMhF3nca/x7zXQojrqOdxba9hUXNE4xd1ew/Zcj k/o29bylVTt6w== From: Oded Gabbay To: dri-devel@lists.freedesktop.org Subject: [PATCH 08/10] accel/habanalabs: return tlb inv error code upon failure Date: Thu, 16 Mar 2023 13:36:38 +0200 Message-Id: <20230316113640.499267-8-ogabbay@kernel.org> X-Mailer: git-send-email 2.40.0 In-Reply-To: <20230316113640.499267-1-ogabbay@kernel.org> References: <20230316113640.499267-1-ogabbay@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Koby Elbaz Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Koby Elbaz Now that CQ-completion based jobs do not trigger a reset upon failure, failure of such jobs (e.g., MMU cache invalidation) should be handled by the caller itself depending on the error code returned to it. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi/gaudi.c | 24 +++++++++------ drivers/accel/habanalabs/gaudi2/gaudi2.c | 37 ++++++++++++++++++------ 2 files changed, 43 insertions(+), 18 deletions(-) diff --git a/drivers/accel/habanalabs/gaudi/gaudi.c b/drivers/accel/habanalabs/gaudi/gaudi.c index a9d84675b407..08a4b1cf2b42 100644 --- a/drivers/accel/habanalabs/gaudi/gaudi.c +++ b/drivers/accel/habanalabs/gaudi/gaudi.c @@ -3725,7 +3725,7 @@ static int gaudi_mmu_init(struct hl_device *hdev) if (rc) { dev_err(hdev->dev, "failed to set hop0 addr for asid %d\n", i); - goto err; + return rc; } } @@ -3736,7 +3736,9 @@ static int gaudi_mmu_init(struct hl_device *hdev) /* mem cache invalidation */ WREG32(mmSTLB_MEM_CACHE_INVALIDATION, 1); - hl_mmu_invalidate_cache(hdev, true, 0); + rc = hl_mmu_invalidate_cache(hdev, true, 0); + if (rc) + return rc; WREG32(mmMMU_UP_MMU_ENABLE, 1); WREG32(mmMMU_UP_SPI_MASK, 0xF); @@ -3752,9 +3754,6 @@ static int gaudi_mmu_init(struct hl_device *hdev) gaudi->hw_cap_initialized |= HW_CAP_MMU; return 0; - -err: - return rc; } static int gaudi_load_firmware_to_device(struct hl_device *hdev) @@ -8420,19 +8419,26 @@ static int gaudi_internal_cb_pool_init(struct hl_device *hdev, } mutex_lock(&hdev->mmu_lock); + rc = hl_mmu_map_contiguous(ctx, hdev->internal_cb_va_base, hdev->internal_cb_pool_dma_addr, HOST_SPACE_INTERNAL_CB_SZ); - - hl_mmu_invalidate_cache(hdev, false, MMU_OP_USERPTR); - mutex_unlock(&hdev->mmu_lock); - if (rc) goto unreserve_internal_cb_pool; + rc = hl_mmu_invalidate_cache(hdev, false, MMU_OP_USERPTR); + if (rc) + goto unmap_internal_cb_pool; + + mutex_unlock(&hdev->mmu_lock); + return 0; +unmap_internal_cb_pool: + hl_mmu_unmap_contiguous(ctx, hdev->internal_cb_va_base, + HOST_SPACE_INTERNAL_CB_SZ); unreserve_internal_cb_pool: + mutex_unlock(&hdev->mmu_lock); hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base, HOST_SPACE_INTERNAL_CB_SZ); destroy_internal_cb_pool: diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index 94d53cd1b0ff..247a6e2c2e91 100644 --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c @@ -10239,16 +10239,23 @@ static int gaudi2_debugfs_read_dma(struct hl_device *hdev, u64 addr, u32 size, v /* Create mapping on asic side */ mutex_lock(&hdev->mmu_lock); + rc = hl_mmu_map_contiguous(ctx, reserved_va_base, host_mem_dma_addr, SZ_2M); - hl_mmu_invalidate_cache_range(hdev, false, + if (rc) { + dev_err(hdev->dev, "Failed to create mapping on asic mmu\n"); + goto unreserve_va; + } + + rc = hl_mmu_invalidate_cache_range(hdev, false, MMU_OP_USERPTR | MMU_OP_SKIP_LOW_CACHE_INV, ctx->asid, reserved_va_base, SZ_2M); - mutex_unlock(&hdev->mmu_lock); if (rc) { - dev_err(hdev->dev, "Failed to create mapping on asic mmu\n"); + hl_mmu_unmap_contiguous(ctx, reserved_va_base, SZ_2M); goto unreserve_va; } + mutex_unlock(&hdev->mmu_lock); + /* Enable MMU on KDMA */ gaudi2_kdma_set_mmbp_asid(hdev, false, ctx->asid); @@ -10277,11 +10284,16 @@ static int gaudi2_debugfs_read_dma(struct hl_device *hdev, u64 addr, u32 size, v gaudi2_kdma_set_mmbp_asid(hdev, true, HL_KERNEL_ASID_ID); mutex_lock(&hdev->mmu_lock); - hl_mmu_unmap_contiguous(ctx, reserved_va_base, SZ_2M); - hl_mmu_invalidate_cache_range(hdev, false, MMU_OP_USERPTR, + + rc = hl_mmu_unmap_contiguous(ctx, reserved_va_base, SZ_2M); + if (rc) + goto unreserve_va; + + rc = hl_mmu_invalidate_cache_range(hdev, false, MMU_OP_USERPTR, ctx->asid, reserved_va_base, SZ_2M); - mutex_unlock(&hdev->mmu_lock); + unreserve_va: + mutex_unlock(&hdev->mmu_lock); hl_unreserve_va_block(hdev, ctx, reserved_va_base, SZ_2M); free_data_buffer: hl_asic_dma_free_coherent(hdev, SZ_2M, host_mem_virtual_addr, host_mem_dma_addr); @@ -10334,17 +10346,24 @@ static int gaudi2_internal_cb_pool_init(struct hl_device *hdev, struct hl_ctx *c } mutex_lock(&hdev->mmu_lock); + rc = hl_mmu_map_contiguous(ctx, hdev->internal_cb_va_base, hdev->internal_cb_pool_dma_addr, HOST_SPACE_INTERNAL_CB_SZ); - hl_mmu_invalidate_cache(hdev, false, MMU_OP_USERPTR); - mutex_unlock(&hdev->mmu_lock); - if (rc) goto unreserve_internal_cb_pool; + rc = hl_mmu_invalidate_cache(hdev, false, MMU_OP_USERPTR); + if (rc) + goto unmap_internal_cb_pool; + + mutex_unlock(&hdev->mmu_lock); + return 0; +unmap_internal_cb_pool: + hl_mmu_unmap_contiguous(ctx, hdev->internal_cb_va_base, HOST_SPACE_INTERNAL_CB_SZ); unreserve_internal_cb_pool: + mutex_unlock(&hdev->mmu_lock); hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base, HOST_SPACE_INTERNAL_CB_SZ); destroy_internal_cb_pool: gen_pool_destroy(hdev->internal_cb_pool); From patchwork Thu Mar 16 11:36:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 13177435 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 56A09C6FD1F for ; Thu, 16 Mar 2023 11:37:02 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 8891210ECAB; Thu, 16 Mar 2023 11:37:01 +0000 (UTC) Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by gabe.freedesktop.org (Postfix) with ESMTPS id AE03910ECAB for ; Thu, 16 Mar 2023 11:36:57 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 360F861FDF; Thu, 16 Mar 2023 11:36:57 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B6FC5C433EF; Thu, 16 Mar 2023 11:36:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1678966616; bh=viLl54Uy0wuPZRG1trDxvPd8IJCSu5zoT8+cLeZfoQY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NhqUIvSQhQUPh7XiIkQ9c85U7HGZpF5WN21r5IxMptEI64fpOnZ1Vk4tgNX2SNvVO mJu4RHZ76TKtSy5zmX2cz07ezQ3m96PIR43NIKi7xAhwPvhBe8JhXFKT0phjRcHnB4 WqrxYs1j1R7/nrU0hzrwNyYBf4A+4mcDjt/972/E5MdGFntaLCIbGhGDGNTuk/3TqD 9FjN/T8X6dV8vFNTxwVOJwfEZCfCac7pWCUd9speVTqqflfUDUfsfWvY7TIzxm7LIA IRG1tkf3whJ+cOtkyiXtMQlqCSHCYnb50MCn9IjXXYp8FVy6Z8CQL2tUu+dKK9QQEB 3QUlSWghwk0tg== From: Oded Gabbay To: dri-devel@lists.freedesktop.org Subject: [PATCH 09/10] accel/habanalabs: remove '\n' when passing strings to gaudi2_print_event() Date: Thu, 16 Mar 2023 13:36:39 +0200 Message-Id: <20230316113640.499267-9-ogabbay@kernel.org> X-Mailer: git-send-email 2.40.0 In-Reply-To: <20230316113640.499267-1-ogabbay@kernel.org> References: <20230316113640.499267-1-ogabbay@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tomer Tayar Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Tomer Tayar Remove all '\n' from strings which are passed as arguments to gaudi2_print_event(), because the newline character is added internally in this function. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/gaudi2/gaudi2.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c index 247a6e2c2e91..57c94f9a6042 100644 --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c @@ -7634,7 +7634,7 @@ static bool gaudi2_handle_ecc_event(struct hl_device *hdev, u16 event_type, memory_wrapper_idx = ecc_data->memory_wrapper_idx; gaudi2_print_event(hdev, event_type, !ecc_data->is_critical, - "ECC error detected. address: %#llx. Syndrom: %#llx. block id %u. critical %u.\n", + "ECC error detected. address: %#llx. Syndrom: %#llx. block id %u. critical %u.", ecc_address, ecc_syndrom, memory_wrapper_idx, ecc_data->is_critical); return !!ecc_data->is_critical; @@ -8925,7 +8925,7 @@ static int gaudi2_handle_sm_err(struct hl_device *hdev, u16 event_type, u8 sm_in continue; gaudi2_print_event(hdev, event_type, true, - "err cause: %s. %s: 0x%X\n", + "err cause: %s. %s: 0x%X", gaudi2_sm_sei_cause[i].cause_name, gaudi2_sm_sei_cause[i].log_name, sei_cause_log); @@ -9131,12 +9131,12 @@ static bool gaudi2_handle_hbm_mc_sei_err(struct hl_device *hdev, u16 event_type, if (cause_idx > GAUDI2_NUM_OF_HBM_SEI_CAUSE - 1) { gaudi2_print_event(hdev, event_type, true, "err cause: %s", - "Invalid HBM SEI event cause (%d) provided by FW\n", cause_idx); + "Invalid HBM SEI event cause (%d) provided by FW", cause_idx); return true; } gaudi2_print_event(hdev, event_type, !sei_data->hdr.is_critical, - "System %s Error Interrupt - HBM(%u) MC(%u) MC_CH(%u) MC_PC(%u). Error cause: %s\n", + "System %s Error Interrupt - HBM(%u) MC(%u) MC_CH(%u) MC_PC(%u). Error cause: %s", sei_data->hdr.is_critical ? "Critical" : "Non-critical", hbm_id, mc_id, sei_data->hdr.mc_channel, sei_data->hdr.mc_pseudo_channel, hbm_mc_sei_cause[cause_idx]); @@ -9260,7 +9260,7 @@ static void gaudi2_print_out_of_sync_info(struct hl_device *hdev, u16 event_type struct hl_hw_queue *q = &hdev->kernel_queues[GAUDI2_QUEUE_ID_CPU_PQ]; gaudi2_print_event(hdev, event_type, false, - "FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n", + "FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d", le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci)); } @@ -9274,7 +9274,7 @@ static int gaudi2_handle_pcie_p2p_msix(struct hl_device *hdev, u16 event_type) if (p2p_intr) { gaudi2_print_event(hdev, event_type, true, - "pcie p2p transaction terminated due to security, req_id(0x%x)\n", + "pcie p2p transaction terminated due to security, req_id(0x%x)", RREG32(mmPCIE_WRAP_P2P_REQ_ID)); WREG32(mmPCIE_WRAP_P2P_INTR, 0x1); @@ -9283,7 +9283,7 @@ static int gaudi2_handle_pcie_p2p_msix(struct hl_device *hdev, u16 event_type) if (msix_gw_intr) { gaudi2_print_event(hdev, event_type, true, - "pcie msi-x gen denied due to vector num check failure, vec(0x%X)\n", + "pcie msi-x gen denied due to vector num check failure, vec(0x%X)", RREG32(mmPCIE_WRAP_MSIX_GW_VEC)); WREG32(mmPCIE_WRAP_MSIX_GW_INTR, 0x1); @@ -9345,7 +9345,7 @@ static void gaudi2_print_cpu_pkt_failure_info(struct hl_device *hdev, u16 event_ struct hl_hw_queue *q = &hdev->kernel_queues[GAUDI2_QUEUE_ID_CPU_PQ]; gaudi2_print_event(hdev, event_type, false, - "FW reported sanity check failure, FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n", + "FW reported sanity check failure, FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d", le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci)); } @@ -9365,11 +9365,11 @@ static int hl_arc_event_handle(struct hl_device *hdev, u16 event_type, q = (struct hl_engine_arc_dccm_queue_full_irq *) &payload; gaudi2_print_event(hdev, event_type, true, - "ARC DCCM Full event: EngId: %u, Intr_type: %u, Qidx: %u\n", + "ARC DCCM Full event: EngId: %u, Intr_type: %u, Qidx: %u", engine_id, intr_type, q->queue_index); return 1; default: - gaudi2_print_event(hdev, event_type, true, "Unknown ARC event type\n"); + gaudi2_print_event(hdev, event_type, true, "Unknown ARC event type"); return 0; } } @@ -9800,7 +9800,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent gaudi2_print_event(hdev, event_type, true, "%d", event_type); else if (error_count == 0) gaudi2_print_event(hdev, event_type, true, - "No error cause for H/W event %u\n", event_type); + "No error cause for H/W event %u", event_type); if ((gaudi2_irq_map_table[event_type].reset != EVENT_RESET_TYPE_NONE) || reset_required) { From patchwork Thu Mar 16 11:36:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 13177440 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4E1B1C6FD1F for ; Thu, 16 Mar 2023 11:37:11 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C438010ECB3; Thu, 16 Mar 2023 11:37:08 +0000 (UTC) Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by gabe.freedesktop.org (Postfix) with ESMTPS id A301210ECAB for ; Thu, 16 Mar 2023 11:36:58 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 3290E61FEF; Thu, 16 Mar 2023 11:36:58 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 264CEC433D2; Thu, 16 Mar 2023 11:36:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1678966618; bh=7aOdl6PJN+BMBBYZ+zVEBlFYgBGsl35q4qFXRtfR2Q0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=L+gQA4M1b0slm08L1yVtOFO+G3dfAtuY35lokeR56Rcov0ht1jjVvOgEF9P3A+a/B VmbLuPkGLlq5vLPhf1XYl2hCL7EsMgG6HaxRkVWA5kLtCILjpA8nVyWW1qKAh7G0ox TZhOgnVt6k1e28f2kIuKiye3dkZI1aM/3iodUbClgadh1FHASVxYjRjqvWJjAIugSL oS1RP7TjWdRSoFz2shtt4jzRWjuLzv8tiCmyV1MEOt9QhQlJvWYNuhBHkvkTTu2XnP FNnS2YlVM9TPOObr7buICiDPV6oVIAcj4VN2X1HwyDp+q6EnItAC3kt2wHKVQnWEVu Yidw7e6dl6H/A== From: Oded Gabbay To: dri-devel@lists.freedesktop.org Subject: [PATCH 10/10] accel/habanalabs: expose dram reserved size by kmd Date: Thu, 16 Mar 2023 13:36:40 +0200 Message-Id: <20230316113640.499267-10-ogabbay@kernel.org> X-Mailer: git-send-email 2.40.0 In-Reply-To: <20230316113640.499267-1-ogabbay@kernel.org> References: <20230316113640.499267-1-ogabbay@kernel.org> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ofir Bitton Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Ofir Bitton We expose this in order for user applications to know how much dram is reserved for internal use. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/accel/habanalabs/common/habanalabs_ioctl.c | 1 + include/uapi/drm/habanalabs_accel.h | 2 ++ 2 files changed, 3 insertions(+) diff --git a/drivers/accel/habanalabs/common/habanalabs_ioctl.c b/drivers/accel/habanalabs/common/habanalabs_ioctl.c index 0997ede359d7..81e026066f96 100644 --- a/drivers/accel/habanalabs/common/habanalabs_ioctl.c +++ b/drivers/accel/habanalabs/common/habanalabs_ioctl.c @@ -109,6 +109,7 @@ static int hw_ip_info(struct hl_device *hdev, struct hl_info_args *args) hw_ip.security_enabled = prop->fw_security_enabled; hw_ip.revision_id = hdev->pdev->revision; hw_ip.engine_core_interrupt_reg_addr = prop->engine_core_interrupt_reg_addr; + hw_ip.reserved_dram_size = dram_kmd_size; return copy_to_user(out, &hw_ip, min((size_t) size, sizeof(hw_ip))) ? -EFAULT : 0; diff --git a/include/uapi/drm/habanalabs_accel.h b/include/uapi/drm/habanalabs_accel.h index 7d457eb4da74..e43688d30e96 100644 --- a/include/uapi/drm/habanalabs_accel.h +++ b/include/uapi/drm/habanalabs_accel.h @@ -888,6 +888,7 @@ enum hl_server_type { * @tpc_interrupt_id: interrupt id for TPC to use in order to raise events towards the host. * @engine_core_interrupt_reg_addr: interrupt register address for engine core to use * in order to raise events toward FW. + * @reserved_dram_size: DRAM size reserved for driver and firmware. */ struct hl_info_hw_ip_info { __u64 sram_base_address; @@ -928,6 +929,7 @@ struct hl_info_hw_ip_info { __u32 reserved8; __u32 reserved9; __u64 engine_core_interrupt_reg_addr; + __u64 reserved_dram_size; }; struct hl_info_dram_usage {