[Bug,191281] New: [drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on ring 12 (-110)

Message ID bug-191281-2300@https.bugzilla.kernel.org/
State New

Commit Message

bugzilla-daemon@bugzilla.kernel.org Dec. 27, 2016, 9:33 p.m. UTC
https://bugzilla.kernel.org/show_bug.cgi?id=191281

            Bug ID: 191281
           Summary: [drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed
                    testing IB on ring 12 (-110)
           Product: Drivers
           Version: 2.5
    Kernel Version: 4.10-rc1
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: johannes.hirte@datenkhaos.de
        Regression: No

With kernel 4.10-rc1 I get the following error on Carrizo:

[    5.414764] Console: switching to colour frame buffer device 240x67
[    5.419628] amdgpu 0000:00:01.0: fb0: amdgpudrmfb frame buffer device
[    5.426001] [drm] ib test on ring 0 succeeded
[    5.426315] [drm] ib test on ring 1 succeeded
[    5.426384] [drm] ib test on ring 2 succeeded
[    5.426426] [drm] ib test on ring 3 succeeded
[    5.426464] [drm] ib test on ring 4 succeeded
[    5.426506] [drm] ib test on ring 5 succeeded
[    5.426545] [drm] ib test on ring 6 succeeded
[    5.426583] [drm] ib test on ring 7 succeeded
[    5.426623] [drm] ib test on ring 8 succeeded
[    5.426657] [drm] ib test on ring 9 succeeded
[    5.426688] [drm] ib test on ring 10 succeeded
[    6.453373] [drm] ib test on ring 11 succeeded
[    7.688045] [drm:amdgpu_vce_ring_test_ib] *ERROR* amdgpu: IB test timed out.
[    7.688088] [drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on ring 12 (-110).
[    7.688122] [drm:amdgpu_device_init] *ERROR* ib ring test failed (-110).
[    7.688268] [ powerplay ] min_core_set_clock not set
[    8.397417] [drm] Initialized amdgpu 3.9.0 20150101 for 0000:00:01.0 on minor 0

Bisecting was a pain in the ass this time because of three other bugs, but I was
able to track this down to:

commit ecc2cf7cc8baa1fdb73a7bb9495f6befbcac8cd8
Author: Maruthi Srinivas Bayyavarapu <Maruthi.Bayyavarapu@amd.com>
Date:   Thu Nov 17 17:29:50 2016 +0530

    drm/amdgpu: enable VCE clockgating in Polaris-10/11

    VCE clocks are set to be disabled, when not in use.

    Signed-off-by: Maruthi Bayyavarapu <maruthi.bayyavarapu@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Simply reverting this wasn't possible due to conflicts, but reverting only this
part

                data &= ~0xef0000;

made my system boot as before.

Comments

bugzilla-daemon@bugzilla.kernel.org Dec. 27, 2016, 9:35 p.m. UTC | #1
https://bugzilla.kernel.org/show_bug.cgi?id=191281

--- Comment #1 from Johannes Hirte <johannes.hirte@datenkhaos.de> ---
Created attachment 248741
  --> https://bugzilla.kernel.org/attachment.cgi?id=248741&action=edit
full dmesg output
bugzilla-daemon@bugzilla.kernel.org Jan. 7, 2017, 1:15 p.m. UTC | #2
https://bugzilla.kernel.org/show_bug.cgi?id=191281

--- Comment #2 from fin4478@hotmail.com ---
Created attachment 250711
  --> https://bugzilla.kernel.org/attachment.cgi?id=250711&action=edit
dmesg output with RX460

I have the same errors with a Gigabyte RX460 and the
~agd5f/linux/log/drivers/gpu/drm/amd?h=drm-next-4.10-wip kernel that I cloned
today. The computer seems to work normally, but booting is 3 seconds slower
because of this and CPU firmware bug traces.
bugzilla-daemon@bugzilla.kernel.org Jan. 8, 2017, 9:29 p.m. UTC | #3
https://bugzilla.kernel.org/show_bug.cgi?id=191281

--- Comment #3 from Johannes Hirte <johannes.hirte@datenkhaos.de> ---
With amdgpu.dpm=0 this doesn't occur. I also tested with amdgpu.powerplay=0, but
it didn't help. I don't know the meaning of the values applied in
vce_v3_0_set_vce_sw_clock_gating(), but just inverting the "if (gated)" looks
wrong to me.
bugzilla-daemon@bugzilla.kernel.org Jan. 10, 2017, 7:14 p.m. UTC | #4
https://bugzilla.kernel.org/show_bug.cgi?id=191281

--- Comment #4 from Johannes Hirte <johannes.hirte@datenkhaos.de> ---
I can confirm that
https://lists.freedesktop.org/archives/amd-gfx/2017-January/004537.html fixes
boot for me. Tested on top of linux-4.10.0-rc3-00029-gbd5d7428f5e5.
bugzilla-daemon@bugzilla.kernel.org June 21, 2017, 6:57 p.m. UTC | #5
https://bugzilla.kernel.org/show_bug.cgi?id=191281

Johannes Hirte (johannes.hirte@datenkhaos.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX

--- Comment #5 from Johannes Hirte (johannes.hirte@datenkhaos.de) ---
fixed -> closing

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index 39f03f137a56..6b3293a1c7b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -134,7 +134,7 @@ static void vce_v3_0_set_vce_sw_clock_gating(struct amdgpu_device *adev,
           accessible but the firmware will throttle the clocks on the
           fly as necessary.
        */
-       if (gated) {
+       if (!gated) {
                data = RREG32(mmVCE_CLOCK_GATING_B);
                data |= 0x1ff;