amdgpu freezes kernel after kernel 5.7.6 changes
diff mbox series

Message ID a99ef2e90d61552e7a81e4983e4dcff601f725f9.camel@gmail.com
State New
Headers show
Series
  • amdgpu freezes kernel after kernel 5.7.6 changes
Related show

Commit Message

roland.rucky@gmail.com June 27, 2020, 4:03 p.m. UTC
Not sure if I am contacting the correct person,

Since I updated to kernel 5.7.6, my system started to freeze randomly.
After a couple of freezes, I noticed, that they always happen when
playing games, or during videoplayback in e.g. firefox.

I reverted to the previous kernel 5.7.5, and all issues are gone. Next
I started to revert and test single commits between the two kernel
versions, which affect amdgpu. If I revert the changes listed below,
the kernel does not freeze any more.

Sadly I can`t get any crash reports / logs. Even the magic sysrq key
does not work, when the system is frozen.

I will also attach a patch, which includes all reverted commits.


List of changes I reverted:
-----------------------------------------------------------------------

commit 6674508ba1a2ea6caca5de2bcb25bc00a050fd0a
Author: Harry Wentland <harry.wentland@amd.com>
Date:   Thu May 28 09:44:44 2020 -0400

    Revert "drm/amd/display: disable dcn20 abm feature for bring up"

    commit 14ed1c908a7a623cc0cbf0203f8201d1b7d31d16 upstream.

    This reverts commit 96cb7cf13d8530099c256c053648ad576588c387.

    This change was used for DCN2 bringup and is no longer desired.
    In fact it breaks backlight on DCN2 systems.

    Cc: Alexander Monakov <amonakov@ispras.ru>
    Cc: Hersen Wu <hersenxs.wu@amd.com>
    Cc: Anthony Koo <Anthony.Koo@amd.com>
    Cc: Michael Chiu <Michael.Chiu@amd.com>
    Signed-off-by: Harry Wentland <harry.wentland@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
    Reported-and-tested-by: Alexander Monakov <amonakov@ispras.ru>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 	commit_planes_for_stream(
 				dc,
 				srf_updates,

Details:

* Kernel: 5.7.6
* GPU: radeon 5700XT
* CPU: ryzen 3800X
* running on swaywm(wayland)

Comments

roland.rucky@gmail.com June 28, 2020, 10:27 a.m. UTC | #1
I narrowed it down even more. When reverting 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b5232e2ee8df85891514c73472cac09921e5d51d everything is stable. I have an uptime of 4+ hours while actively
trying to reproduce the bug (gaming).

Here is a bug report: 
https://gitlab.freedesktop.org/drm/amd/-/issues/1191

Added Nicholas Kazlauskas, author of the mentioned commit.

Am Samstag, den 27.06.2020, 18:03 +0200 schrieb roland.rucky@gmail.com:
> Not sure if I am contacting the correct person,
> 
> Since I updated to kernel 5.7.6, my system started to freeze
> randomly.
> After a couple of freezes, I noticed, that they always happen when
> playing games, or during videoplayback in e.g. firefox.
> 
> I reverted to the previous kernel 5.7.5, and all issues are gone.
> Next
> I started to revert and test single commits between the two kernel
> versions, which affect amdgpu. If I revert the changes listed below,
> the kernel does not freeze any more.
> 
> Sadly I can`t get any crash reports / logs. Even the magic sysrq key
> does not work, when the system is frozen.
> 
> I will also attach a patch, which includes all reverted commits.
> 
> 
> List of changes I reverted:
> -------------------------------------------------------------------
> ----
> 
> commit 6674508ba1a2ea6caca5de2bcb25bc00a050fd0a
> Author: Harry Wentland <harry.wentland@amd.com>
> Date:   Thu May 28 09:44:44 2020 -0400
> 
>     Revert "drm/amd/display: disable dcn20 abm feature for bring up"
> 
>     commit 14ed1c908a7a623cc0cbf0203f8201d1b7d31d16 upstream.
> 
>     This reverts commit 96cb7cf13d8530099c256c053648ad576588c387.
> 
>     This change was used for DCN2 bringup and is no longer desired.
>     In fact it breaks backlight on DCN2 systems.
> 
>     Cc: Alexander Monakov <amonakov@ispras.ru>
>     Cc: Hersen Wu <hersenxs.wu@amd.com>
>     Cc: Anthony Koo <Anthony.Koo@amd.com>
>     Cc: Michael Chiu <Michael.Chiu@amd.com>
>     Signed-off-by: Harry Wentland <harry.wentland@amd.com>
>     Acked-by: Alex Deucher <alexander.deucher@amd.com>
>     Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>     Reported-and-tested-by: Alexander Monakov <amonakov@ispras.ru>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>     Cc: stable@vger.kernel.org
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 7fc15b82fe48..f9f02e08054b 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -1334,7 +1334,7 @@ static int dm_late_init(void *handle)
>  	unsigned int linear_lut[16];
>  	int i;
>  	struct dmcu *dmcu = adev->dm.dc->res_pool->dmcu;
> -	bool ret = false;
> +	bool ret;
> 
>  	for (i = 0; i < 16; i++)
>  		linear_lut[i] = 0xFFFF * i / 15;
> @@ -1350,13 +1350,10 @@ static int dm_late_init(void *handle)
>  	 */
>  	params.min_abm_backlight = 0x28F;
> 
> -	/* todo will enable for navi10 */
> -	if (adev->asic_type <= CHIP_RAVEN) {
> -		ret = dmcu_load_iram(dmcu, params);
> +	ret = dmcu_load_iram(dmcu, params);
> 
> -		if (!ret)
> -			return -EINVAL;
> -	}
> +	if (!ret)
> +		return -EINVAL;
> 
>  	return detect_mst_link_for_all_connectors(adev->ddev);
>  }
> 
> commit fba8f9ef7e1405ee6f422beb874791e8a5eb489c
> Author: Alex Deucher <alexander.deucher@amd.com>
> Date:   Tue Jun 2 17:22:48 2020 -0400
> 
>     drm/amdgpu/display: use blanked rather than plane state for sync
> groups
> 
>     commit b7f839d292948142eaab77cedd031aad0bfec872 upstream.
> 
>     We may end up with no planes set yet, depending on the ordering,
> but we
>     should have the proper blanking state which is either handled by
> either
>     DPG or TG depending on the hardware generation.  Check both to
> determine
>     the proper blanked state.
> 
>     Bug: https://gitlab.freedesktop.org/drm/amd/issues/781
>     Fixes: 5fc0cbfad45648 ("drm/amd/display: determine if a pipe is
> synced by plane state")
>     Cc: nicholas.kazlauskas@amd.com
>     Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>     Cc: stable@vger.kernel.org
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c
> b/drivers/gpu/drm/amd/display/dc/core/dc.c
> index 4a619328101c..4acaf4be8a81 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
> @@ -1011,9 +1011,17 @@ static void program_timing_sync(
>  			}
>  		}
> 
> -		/* set first pipe with plane as master */
> +		/* set first unblanked pipe as master */
>  		for (j = 0; j < group_size; j++) {
> -			if (pipe_set[j]->plane_state) {
> +			bool is_blanked;
> +
> +			if (pipe_set[j]->stream_res.opp->funcs-
> > dpg_is_blanked)
> +				is_blanked =
> +					pipe_set[j]->stream_res.opp-
> > funcs->dpg_is_blanked(pipe_set[j]->stream_res.opp);
> +			else
> +				is_blanked =
> +					pipe_set[j]->stream_res.tg-
> > funcs->is_blanked(pipe_set[j]->stream_res.tg);
> +			if (!is_blanked) {
>  				if (j == 0)
>  					break;
> 
> @@ -1034,9 +1042,17 @@ static void program_timing_sync(
>  				status->timing_sync_info.master =
> false;
> 
>  		}
> -		/* remove any other pipes with plane as they have
> already been synced */
> +		/* remove any other unblanked pipes as they have
> already been synced */
>  		for (j = j + 1; j < group_size; j++) {
> -			if (pipe_set[j]->plane_state) {
> +			bool is_blanked;
> +
> +			if (pipe_set[j]->stream_res.opp->funcs-
> > dpg_is_blanked)
> +				is_blanked =
> +					pipe_set[j]->stream_res.opp-
> > funcs->dpg_is_blanked(pipe_set[j]->stream_res.opp);
> +			else
> +				is_blanked =
> +					pipe_set[j]->stream_res.tg-
> > funcs->is_blanked(pipe_set[j]->stream_res.tg);
> +			if (!is_blanked) {
>  				group_size--;
>  				pipe_set[j] = pipe_set[group_size];
>  				j--;
> 
> commit b5232e2ee8df85891514c73472cac09921e5d51d
> Author: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> Date:   Tue Jun 2 20:42:33 2020 -0400
> 
>     drm/amd/display: Revalidate bandwidth before commiting DC updates
> 
>     [ Upstream commit a24eaa5c51255b344d5a321f1eeb3205f2775498 ]
> 
>     [Why]
>     Whenever we switch between tiled formats without also switching
> pixel
>     formats or doing anything else that recreates the DC plane state
> we
>     can run into underflow or hangs since we're not updating the
>     DML parameters before committing to the hardware.
> 
>     [How]
>     If the update type is FULL then call validate_bandwidth again to
> update
>     the DML parmeters before committing the state.
> 
>     This is basically just a workaround and protective measure
> against
>     update types being added DC where we could run into this issue in
>     the future.
> 
>     We can only fully validate the state in advance before applying
> it
> to
>     the hardware if we recreate all the plane and stream states since
>     we can't modify what's currently in use.
> 
>     The next step is to update DM to ensure that we're creating the
> plane
>     and stream states for whatever could potentially be a full update
> in
>     DC to pre-emptively recreate the state for DC global validation.
> 
>     The workaround can stay until this has been fixed in DM.
> 
>     Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>     Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>     Signed-off-by: Sasha Levin <sashal@kernel.org>
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c
> b/drivers/gpu/drm/amd/display/dc/core/dc.c
> index 47431ca6986d..4a619328101c 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
> @@ -2517,6 +2517,12 @@ void dc_commit_updates_for_stream(struct dc
> *dc,
> 
>  	copy_stream_update_to_stream(dc, context, stream,
> stream_update);
> 
> +	if (!dc->res_pool->funcs->validate_bandwidth(dc, context,
> false)) {
> +		DC_ERROR("Mode validation failed for stream
> update!\n");
> +		dc_release_state(context);
> +		return;
> +	}
> +
>  	commit_planes_for_stream(
>  				dc,
>  				srf_updates,
> 
> Details:
> 
> * Kernel: 5.7.6
> * GPU: radeon 5700XT
> * CPU: ryzen 3800X
> * running on swaywm(wayland)

Patch
diff mbox series

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index f9f02e08054b..7fc15b82fe48 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1334,7 +1334,7 @@  static int dm_late_init(void *handle)
 	unsigned int linear_lut[16];
 	int i;
 	struct dmcu *dmcu = adev->dm.dc->res_pool->dmcu;
-	bool ret;
+	bool ret = false;
 
 	for (i = 0; i < 16; i++)
 		linear_lut[i] = 0xFFFF * i / 15;
@@ -1350,10 +1350,13 @@  static int dm_late_init(void *handle)
 	 */
 	params.min_abm_backlight = 0x28F;
 
-	ret = dmcu_load_iram(dmcu, params);
+	/* todo will enable for navi10 */
+	if (adev->asic_type <= CHIP_RAVEN) {
+		ret = dmcu_load_iram(dmcu, params);
 
-	if (!ret)
-		return -EINVAL;
+		if (!ret)
+			return -EINVAL;
+	}
 
 	return detect_mst_link_for_all_connectors(adev->ddev);
 }
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 4acaf4be8a81..47431ca6986d 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1011,17 +1011,9 @@  static void program_timing_sync(
 			}
 		}
 
-		/* set first unblanked pipe as master */
+		/* set first pipe with plane as master */
 		for (j = 0; j < group_size; j++) {
-			bool is_blanked;
-
-			if (pipe_set[j]->stream_res.opp->funcs->dpg_is_blanked)
-				is_blanked =
-					pipe_set[j]->stream_res.opp->funcs->dpg_is_blanked(pipe_set[j]->stream_res.opp);
-			else
-				is_blanked =
-					pipe_set[j]->stream_res.tg->funcs->is_blanked(pipe_set[j]->stream_res.tg);
-			if (!is_blanked) {
+			if (pipe_set[j]->plane_state) {
 				if (j == 0)
 					break;
 
@@ -1042,17 +1034,9 @@  static void program_timing_sync(
 				status->timing_sync_info.master = false;
 
 		}
-		/* remove any other unblanked pipes as they have already been synced */
+		/* remove any other pipes with plane as they have already been synced */
 		for (j = j + 1; j < group_size; j++) {
-			bool is_blanked;
-
-			if (pipe_set[j]->stream_res.opp->funcs->dpg_is_blanked)
-				is_blanked =
-					pipe_set[j]->stream_res.opp->funcs->dpg_is_blanked(pipe_set[j]->stream_res.opp);
-			else
-				is_blanked =
-					pipe_set[j]->stream_res.tg->funcs->is_blanked(pipe_set[j]->stream_res.tg);
-			if (!is_blanked) {
+			if (pipe_set[j]->plane_state) {
 				group_size--;
 				pipe_set[j] = pipe_set[group_size];
 				j--;
@@ -2533,12 +2517,6 @@  void dc_commit_updates_for_stream(struct dc *dc,
 
 	copy_stream_update_to_stream(dc, context, stream, stream_update);
 
-	if (!dc->res_pool->funcs->validate_bandwidth(dc, context, false)) {
-		DC_ERROR("Mode validation failed for stream update!\n");
-		dc_release_state(context);
-		return;
-	}
-
 	commit_planes_for_stream(
 				dc,
 				srf_updates,