[RFC,0/2] drm/amd/display: address page fault caused by max surface mismatch

Message ID 20241204210929.1994522-1-mwen@igalia.com (mailing list archive)

Melissa Wen Dec. 4, 2024, 8:43 p.m. UTC
[Resending due to temporary mailing list server error on
gabe.freedesktop.org - trying again to reach dri-devel and amd-gfx] 

Hi,

This is another attempt to address the page fault faced by Cosmic users
of AMD display hw that exposes two overlay planes. It was first reported
as an interface freeze caused by an array-index-out-of-bounds access,
where the number of active planes was greater than the maximum number of
surfaces reported. The number of active planes can exceed that maximum
since the introduction of cursor overlay mode, which made a setup with
one primary, two overlays and one cursor overlay == 4 planes possible.
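
To make the arithmetic concrete, here is a minimal sketch. It is not
driver code: MAX_SURFACES == 3 is the value cited in this letter, while
struct plane_config, its field names and both helpers are hypothetical
and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limit used by the regular flow, as cited in this cover letter. */
#define MAX_SURFACES 3

/* Hypothetical description of the planes a compositor enables at once. */
struct plane_config {
	int primary;
	int overlay;
	int cursor_overlay; /* cursor handled as its own plane in overlay mode */
};

/* Hypothetical helper: total number of active planes. */
static int active_plane_count(const struct plane_config *cfg)
{
	assert(cfg != NULL);
	return cfg->primary + cfg->overlay + cfg->cursor_overlay;
}

/* Hypothetical helper: do all active planes fit in MAX_SURFACES slots? */
static bool fits_max_surfaces(const struct plane_config *cfg)
{
	return active_plane_count(cfg) <= MAX_SURFACES;
}
```

With one primary, two overlays and one cursor overlay,
active_plane_count() returns 4 and fits_max_surfaces() returns false.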

After further investigation, I noticed there was a definition mismatch
around the number of surfaces supported by the hw: two different values
(MAX_SURFACES and MAX_SURFACE_NUM) are used along the DC surface update
flow. Also, the main cause of the interface freeze seems to be a page
fault, where the regular flow takes MAX_SURFACES == 3 into account and
commit_minimal_transition_state uses MAX_SURFACE_NUM == 6.
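
The hazard of such a mismatch can be sketched in isolation. The two
macro values are the ones cited in this letter; struct surface_table and
the three helpers are hypothetical stand-ins and do not reproduce the
actual DC structures or what commit_minimal_transition_state does
internally.

```c
#include <assert.h>
#include <stddef.h>

/* Values cited in this cover letter; everything else is hypothetical. */
#define MAX_SURFACES    3 /* used by the regular flow */
#define MAX_SURFACE_NUM 6 /* used by commit_minimal_transition_state */

/* Hypothetical stand-in for a table sized by the regular flow. */
struct surface_table {
	int plane_id[MAX_SURFACES];
};

/* Number of slots the table really has. */
static size_t table_capacity(void)
{
	return sizeof(((struct surface_table *)0)->plane_id) / sizeof(int);
}

/*
 * How many iterations of a loop bounded by loop_bound would land past
 * the end of the table. Any non-zero result means out-of-bounds access.
 */
static size_t slots_past_end(size_t loop_bound)
{
	size_t cap = table_capacity();

	return loop_bound > cap ? loop_bound - cap : 0;
}

/* Bounded fill: reject counts that the table cannot hold. */
static int table_fill(struct surface_table *t, const int *ids, size_t count)
{
	assert(t != NULL && ids != NULL);
	if (count > table_capacity())
		return -1;
	for (size_t i = 0; i < count; i++)
		t->plane_id[i] = ids[i];
	return 0;
}
```

In this sketch slots_past_end(MAX_SURFACE_NUM) is 3, while a loop that
shares the bound used to size the table gives 0, which is the reason for
keeping a single definition.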

AFAIU, four is the maximum number of surfaces supported by the hw and
this amount accommodates current needs, which is why this proposal
aligns with that number. However, it may still not be the right value,
since a commit in the driver branch states that 6 is "the max surfaces
supported asics can have" [1]. Misleading change?

Previous discussions can be found at:
- https://lore.kernel.org/amd-gfx/20241114143741.627128-1-zaeem.mohamed@amd.com/
- https://lore.kernel.org/amd-gfx/20241025193727.765195-2-zaeem.mohamed@amd.com/
- https://lore.kernel.org/amd-gfx/20240925154324.348774-1-mwen@igalia.com/

Reported issues (and more discussions) related to this series are in the
AMD issue tracker:
- https://gitlab.freedesktop.org/drm/amd/-/issues/3693
- https://gitlab.freedesktop.org/drm/amd/-/issues/3594

Please let me know your thoughts.

Melissa

[1] https://gitlab.freedesktop.org/agd5f/linux/-/commit/3cfd03b79425c 

Melissa Wen (2):
  drm/amd/display: fix page fault due to max surface definition mismatch
  drm/amd/display: increase MAX_SURFACES to the value supported by hw

 drivers/gpu/drm/amd/display/dc/core/dc.c                | 2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_state.c          | 8 ++++----
 drivers/gpu/drm/amd/display/dc/dc.h                     | 4 ++--
 drivers/gpu/drm/amd/display/dc/dc_stream.h              | 2 +-
 drivers/gpu/drm/amd/display/dc/dc_types.h               | 1 -
 drivers/gpu/drm/amd/display/dc/dml2/dml2_mall_phantom.c | 2 +-
 6 files changed, 9 insertions(+), 10 deletions(-)