mbox series

[0/3] drm/amd/display: fixes for kernel crashes since cursor overlay mode

Message ID 20241217205029.39850-1-mwen@igalia.com (mailing list archive)
Headers show
Series drm/amd/display: fixes for kernel crashes since cursor overlay mode | expand

Message

Melissa Wen Dec. 17, 2024, 8:45 p.m. UTC
Hi,

Some issues have been found by Cosmic users of AMD display since the
introduction of cursor overlay mode: page fault and divide errors
causing interface freezes. Both are 100% reproducible and affects
multiple HW versions.

Patch 1 addresses the page fault error by resolving the definition
mismatch around the number of surfaces supported by the hw, where two
different values (MAX_SURFACES and MAX_SURFACE_NUM) would be taken
through the DC surface updates flow. The regular flow take MAX_SURFACES
== 3 into account and commit_minimal_transition_state uses
MAX_SURFACE_NUM == 6. I noticed that Leo Li has proposed this change in
a previous discussion [1], so I added a Suggested-by tag.

Patch 2 expands the maximum number of surfaces to four, since it's
supported by the hw. Also, this amount accomodates current needs,
avoiding `dc_state_add_plane` complaints of not enough resource. Note
that it somehow reverts the change proposed by [2].

Related AMD issues:
- https://gitlab.freedesktop.org/drm/amd/-/issues/3693
- https://gitlab.freedesktop.org/drm/amd/-/issues/3594

Patch 3 fixes a kernel oops due to division by zero error by checking if
the destination scale size is zero, avoiding calculation and just
setting the out-scale size to zero, similar to what is  done by
drm_calc_scale(). Even though the missing check in dm_get_plane_scale()
wasn't introduced by cursor overlay mode, AFAIU the cursor mode
assessment happens before plane state checks, so
amdgpu_dm_plane_helper_check_state() can't prevent
dm_crtc_get_cursor_mode() taking an invisible plane into account.

Related AMD issue:
- https://gitlab.freedesktop.org/drm/amd/-/issues/3729

Other previous discussions can be found at:
- https://lore.kernel.org/amd-gfx/20241114143741.627128-1-zaeem.mohamed@amd.com/
- https://lore.kernel.org/amd-gfx/20240925154324.348774-1-mwen@igalia.com/

Thanks in advance for any feedback.

Melissa

[1] https://lore.kernel.org/amd-gfx/20241025193727.765195-2-zaeem.mohamed@amd.com/
[2] https://gitlab.freedesktop.org/agd5f/linux/-/commit/3cfd03b79425c

Melissa Wen (3):
  drm/amd/display: fix page fault due to max surface definition mismatch
  drm/amd/display: increase MAX_SURFACES to the value supported by hw
  drm/amd/display: fix divide error in DM plane scale calcs

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c       | 4 ++--
 drivers/gpu/drm/amd/display/dc/core/dc.c                | 2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_state.c          | 8 ++++----
 drivers/gpu/drm/amd/display/dc/dc.h                     | 4 ++--
 drivers/gpu/drm/amd/display/dc/dc_stream.h              | 2 +-
 drivers/gpu/drm/amd/display/dc/dc_types.h               | 1 -
 drivers/gpu/drm/amd/display/dc/dml2/dml2_mall_phantom.c | 2 +-
 7 files changed, 11 insertions(+), 12 deletions(-)

Comments

Rodrigo Siqueira Jordao Dec. 19, 2024, 9:23 p.m. UTC | #1
On 12/17/24 1:45 PM, Melissa Wen wrote:
> Hi,
> 
> Some issues have been found by Cosmic users of AMD display since the
> introduction of cursor overlay mode: page fault and divide errors
> causing interface freezes. Both are 100% reproducible and affects
> multiple HW versions.
> 
> Patch 1 addresses the page fault error by resolving the definition
> mismatch around the number of surfaces supported by the hw, where two
> different values (MAX_SURFACES and MAX_SURFACE_NUM) would be taken
> through the DC surface updates flow. The regular flow take MAX_SURFACES
> == 3 into account and commit_minimal_transition_state uses
> MAX_SURFACE_NUM == 6. I noticed that Leo Li has proposed this change in
> a previous discussion [1], so I added a Suggested-by tag.
> 
> Patch 2 expands the maximum number of surfaces to four, since it's
> supported by the hw. Also, this amount accomodates current needs,
> avoiding `dc_state_add_plane` complaints of not enough resource. Note
> that it somehow reverts the change proposed by [2].
> 
> Related AMD issues:
> - https://gitlab.freedesktop.org/drm/amd/-/issues/3693
> - https://gitlab.freedesktop.org/drm/amd/-/issues/3594
> 
> Patch 3 fixes a kernel oops due to division by zero error by checking if
> the destination scale size is zero, avoiding calculation and just
> setting the out-scale size to zero, similar to what is  done by
> drm_calc_scale(). Even though the missing check in dm_get_plane_scale()
> wasn't introduced by cursor overlay mode, AFAIU the cursor mode
> assessment happens before plane state checks, so
> amdgpu_dm_plane_helper_check_state() can't prevent
> dm_crtc_get_cursor_mode() taking an invisible plane into account.
> 
> Related AMD issue:
> - https://gitlab.freedesktop.org/drm/amd/-/issues/3729
> 
> Other previous discussions can be found at:
> - https://lore.kernel.org/amd-gfx/20241114143741.627128-1-zaeem.mohamed@amd.com/
> - https://lore.kernel.org/amd-gfx/20240925154324.348774-1-mwen@igalia.com/
> 
> Thanks in advance for any feedback.
> 
> Melissa
> 
> [1] https://lore.kernel.org/amd-gfx/20241025193727.765195-2-zaeem.mohamed@amd.com/
> [2] https://gitlab.freedesktop.org/agd5f/linux/-/commit/3cfd03b79425c
> 
> Melissa Wen (3):
>    drm/amd/display: fix page fault due to max surface definition mismatch
>    drm/amd/display: increase MAX_SURFACES to the value supported by hw
>    drm/amd/display: fix divide error in DM plane scale calcs
> 
>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c       | 4 ++--
>   drivers/gpu/drm/amd/display/dc/core/dc.c                | 2 +-
>   drivers/gpu/drm/amd/display/dc/core/dc_state.c          | 8 ++++----
>   drivers/gpu/drm/amd/display/dc/dc.h                     | 4 ++--
>   drivers/gpu/drm/amd/display/dc/dc_stream.h              | 2 +-
>   drivers/gpu/drm/amd/display/dc/dc_types.h               | 1 -
>   drivers/gpu/drm/amd/display/dc/dml2/dml2_mall_phantom.c | 2 +-
>   7 files changed, 11 insertions(+), 12 deletions(-)
> 

Hi Melissa,

Thanks a lot for your series. I tested it on a couple of hardware 
devices, and I think everything is alright.

Series is
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>

and your series was merged into the amd-staging-drm-next.

Thanks
Siqueira