From patchwork Tue Aug 13 13:05:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: bugzilla-daemon@freedesktop.org X-Patchwork-Id: 11092187 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 396CD112C for ; Tue, 13 Aug 2019 13:05:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 22B142868C for ; Tue, 13 Aug 2019 13:05:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 169D328675; Tue, 13 Aug 2019 13:05:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,HTML_MESSAGE, MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 89E3E28675 for ; Tue, 13 Aug 2019 13:05:19 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D06006E126; Tue, 13 Aug 2019 13:05:18 +0000 (UTC) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id D733B6E12B for ; Tue, 13 Aug 2019 13:05:17 +0000 (UTC) Received: by culpepper.freedesktop.org (Postfix, from userid 33) id D3E717215A; Tue, 13 Aug 2019 13:05:17 +0000 (UTC) From: bugzilla-daemon@freedesktop.org To: dri-devel@lists.freedesktop.org Subject: [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII Date: Tue, 13 Aug 2019 13:05:17 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: DRI X-Bugzilla-Component: DRM/AMDgpu X-Bugzilla-Version: unspecified X-Bugzilla-Keywords: X-Bugzilla-Severity: major X-Bugzilla-Who: tom@r.je X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: medium X-Bugzilla-Assigned-To: dri-devel@lists.freedesktop.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" X-Virus-Scanned: ClamAV using ClamSMTP https://bugs.freedesktop.org/show_bug.cgi?id=110674 --- Comment #94 from Tom B --- Reverting d1a3e239a6016f2bb42a91696056e223982e8538 didn't fix it for me. But that commit may give some insight because it is related to uclk which is the first error we get. I also tried globally increasing usec_timeout as it's used in a few places (patch below). This makes the PC take about a minute to boot up, so clearly the GPU is in an invalid state before these timeouts are hit and then each subsequent call to smum_send_msg_to_smc_with_parameter causes a delay because each call times out. Whatever happens, puts the card into a state that it can't recover from. The next step is to try to find where vega20_set_uclk_to_highest_dpm_level is called from and see what happens just before the call to this function. diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index f4ac632a87b2..9b878c74b17e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2418,7 +2418,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, adev->pdev = pdev; adev->flags = flags; adev->asic_type = flags & AMD_ASIC_MASK; - adev->usec_timeout = AMDGPU_MAX_USEC_TIMEOUT; + adev->usec_timeout = AMDGPU_MAX_USEC_TIMEOUT*10; if (amdgpu_emu_mode == 1) adev->usec_timeout *= 2; adev->gmc.gart_size = 512 * 1024 * 1024; diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c index a7e8340baf90..a6b2bc4277ef 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c @@ -84,7 +84,7 @@ int hwmgr_early_init(struct pp_hwmgr *hwmgr) if (!hwmgr) return -EINVAL; - hwmgr->usec_timeout = AMD_MAX_USEC_TIMEOUT; + hwmgr->usec_timeout = AMD_MAX_USEC_TIMEOUT*10; hwmgr->pp_table_version = PP_TABLE_V1; hwmgr->dpm_level = AMD_DPM_FORCED_LEVEL_AUTO; hwmgr->request_dpm_level = AMD_DPM_FORCED_LEVEL_AUTO;