[Bug,102646] Screen flickering under amdgpu-experimental [buggy auto power profile]

Message ID bug-102646-502-TdQ7haMv0q@http.bugs.freedesktop.org/

Commit Message

bugzilla-daemon@freedesktop.org Aug. 5, 2019, 10:11 p.m. UTC
https://bugs.freedesktop.org/show_bug.cgi?id=102646

Ahzo@tutanota.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Ahzo@tutanota.com

--- Comment #97 from Ahzo@tutanota.com ---
Created attachment 144950
  --> https://bugs.freedesktop.org/attachment.cgi?id=144950&action=edit
Patch to fix the problem

TLDR: A script to reproduce and a patch to fix this problem are attached.

The problem occurs when switching between high and low GPU memory frequencies
at specific time intervals. It can be reproduced with the attached script,
which optionally accepts a time parameter, defaulting to 1 ms.
With a 75 Hz display mode, screen corruption occurs rather reliably when the
time parameter (in seconds) is in one of the following ranges:
0.000-0.002, 0.011-0.015, 0.024-0.028, 0.038-0.042, 0.051-0.055, 0.064-0.068,
0.078-0.082, 0.091-0.095, 0.104-0.108

However, using sleep times between these intervals, e.g. 0.1, does not produce
any screen corruption.
For a refresh rate of 75 Hz the frame time is T = 1000 ms / 75 = 13.3 ms, and
the screen corruption happens for sleep times of:
 S = n * T ± 2 ms
Here n is a non-negative integer, i.e. 0, 1, 2, 3, and so on.
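
As a rough illustration of the relation above, the following sketch checks
whether a given sleep time lies within the tolerance of a whole multiple of
the frame time. It is not part of the attached script or patch; the function
name, the microsecond units and the integer rounding are assumptions made for
this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the attached script or patch):
 * returns true if sleep_us lies within tolerance_us of n * T,
 * where T is the frame time for refresh_hz and n = 0, 1, 2, ...
 * All times are in microseconds; T is truncated to an integer.
 */
bool sleep_time_in_corruption_window(uint32_t sleep_us,
                                     uint32_t refresh_hz,
                                     uint32_t tolerance_us)
{
    assert(refresh_hz > 0);

    uint32_t frame_us = 1000000u / refresh_hz;   /* 13333 for 75 Hz */
    uint32_t rem = sleep_us % frame_us;          /* offset past the last multiple */
    uint32_t dist = rem < frame_us - rem ? rem : frame_us - rem;

    return dist <= tolerance_us;
}
```

With 75 Hz and a 2000 us tolerance, 1000 us (the script default) and 13000 us
fall inside a window, while 100000 us (the 0.1 s case mentioned above) does
not.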

Linux 4.14 is not affected by this problem, as is noted in comment 93. However,
that version only works by accident: When the display mode is not yet known,
default parameters, in particular 60 Hz, are used to calculate frame_time_x2 as
(1000000 / 60) * 2 / 100 = 333, which is then used to set VBITimeout. Later,
when the refresh rate of 75 Hz is known, frame_time_x2 gets updated to 266, but
VBITimeout is never actually set to that value via smu7_notify_smc_display.
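
The arithmetic in this paragraph can be reproduced with a few lines of C. This
is only a sketch of the integer calculation quoted above, not the driver code
itself; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the formula quoted in the comment:
 *   frame_time_x2 = (1000000 / refresh_rate) * 2 / 100
 * Integer division is used throughout, as in the quoted calculation.
 */
uint32_t frame_time_x2_from_refresh(uint32_t refresh_hz)
{
    assert(refresh_hz > 0);

    uint32_t frame_time_us = 1000000u / refresh_hz; /* 16666 for 60 Hz */
    return frame_time_us * 2u / 100u;
}
```

For 60 Hz this yields 333 and for 75 Hz it yields 266, matching the two values
discussed above.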

Linux 4.15 included the DC patches, and when using DC (e.g. by using the boot
argument amdgpu.dc=1), VBITimeout is never set to the default 333, but directly
to 266, which triggers the screen corruption and flickering problems described
in this bug.

With Linux 4.17 the problem got more widespread, because the default was
accidentally switched to enable DC by erroneously removing the 'return
amdgpu_dc > 0;' line with:
commit 367e66870e9cc20b867b11c4484ae83336efcb67
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Thu Jan 25 16:53:25 2018 -0500

    drm/amdgpu: remove DC special casing for KB/ML

    It seems to be working now.

    Bug: https://bugs.freedesktop.org/show_bug.cgi?id=102372
    Reviewed-by: Mike Lothian <mike@fireburn.co.uk>
    Reviewed-by: Harry Wentland <harry.wentland@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

        case CHIP_POLARIS11:
@@ -1714,9 +1716,6 @@ bool amdgpu_device_asic_has_dc_support(enum amd_asic_type asic_type)
 #if defined(CONFIG_DRM_AMD_DC_PRE_VEGA)
                return amdgpu_dc != 0;
 #endif
-       case CHIP_KABINI:
-       case CHIP_MULLINS:
-               return amdgpu_dc > 0;
        case CHIP_VEGA10:
 #if defined(CONFIG_DRM_AMD_DC_DCN1_0)
        case CHIP_RAVEN:


Linux 4.18 aligns the Non-DC case more closely with the DC case, so
VBITimeout actually gets set to the updated frame_time_x2 via
smu7_notify_smc_display. The Non-DC case has therefore also been affected by
this bug since:
commit 555fd70c59bc7f7acd8bc429d92bd59a66a7b83b
Author: Rex Zhu <Rex.Zhu@amd.com>
Date:   Tue Mar 27 13:32:02 2018 +0800

    drm/amd/pp: Not call cgs interface to get display info

    DC/Non DC all will update display configuration
    when the display state changed
    No need to get display info through cgs interface

    Reviewed-by: Evan Quan <evan.quan@amd.com>
    Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Linux 4.20 contains a commit trying to fix flickering issues:
commit ec2e082a79b5d46addf2e7b83a13fb015fca6149
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Thu Aug 9 14:24:08 2018 -0500

    drm/amdgpu/powerplay: check vrefresh when when changing displays

    Compare the current vrefresh in addition to the number of displays
    when determining whether or not the smu needs updates when changing
    modes. The SMU needs to be updated if the vbi timeout changes due
    to a different refresh rate.  Fixes flickering around mode changes
    in some cases on polaris parts.

    Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
    Reviewed-by: Huang Rui <ray.huang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

But that doesn't fix the screen corruption described in this bug, because the
problem is not that VBITimeout isn't updated enough, but rather the opposite,
i.e. that it gets set to the frame_time_x2 value calculated from the correct,
high refresh rate instead of the default value of 333.

At least for 75 Hz, this problem can be fixed by preventing frame_time_x2, and
thus VBITimeout, from being smaller than 280, as in the attached patch. Setting
VBITimeout to values higher than the calculated frame_time_x2 does not seem to
cause any problems.
It would be great if someone could test this patch with higher refresh rates
as well.
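
For readers without access to the attachment, the idea of the lower bound can
be sketched as follows. This is not the attached patch: the macro and function
names are hypothetical, and only the threshold of 280 and the frame_time_x2
formula are taken from the comment above.

```c
#include <assert.h>
#include <stdint.h>

/* Lower bound quoted in the comment; only verified there for 75 Hz. */
#define MIN_FRAME_TIME_X2 280u

/*
 * Hypothetical sketch: compute frame_time_x2 from the refresh rate and
 * never let it drop below MIN_FRAME_TIME_X2 before it would be used
 * for VBITimeout.
 */
uint32_t clamped_frame_time_x2(uint32_t refresh_hz)
{
    assert(refresh_hz > 0);

    uint32_t frame_time_x2 = (1000000u / refresh_hz) * 2u / 100u;

    return frame_time_x2 < MIN_FRAME_TIME_X2 ? MIN_FRAME_TIME_X2
                                             : frame_time_x2;
}
```

Under this sketch a 60 Hz mode keeps its value of 333, while 75 Hz and any
higher refresh rate are raised to 280. Whether 280 is sufficient for those
higher rates is exactly the open question raised above.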

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 309977ef5b51..2ad9de42b65b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1704,6 +1704,8 @@ bool amdgpu_device_asic_has_dc_support(enum amd_asic_type asic_type)
        case CHIP_BONAIRE:
        case CHIP_HAWAII:
        case CHIP_KAVERI:
+       case CHIP_KABINI:
+       case CHIP_MULLINS:
        case CHIP_CARRIZO:
        case CHIP_STONEY: