
[v3,0/1] Maintenence of devcoredump <-> GuC-Err-Capture plumbing

Message ID 20241203174732.3232351-1-alan.previn.teres.alexis@intel.com
Series Maintenence of devcoredump <-> GuC-Err-Capture plumbing

Message

Alan Previn Dec. 3, 2024, 5:47 p.m. UTC
The GuC-Error-Capture is currently reaching into the xe_devcoredump
structure to store its own placeholder snapshot, as a workaround for
the race between the G2H Error-Capture notification and the DRM
scheduler triggering a GuC-submission exec-queue timeout/kill.
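To picture the race, here is a minimal userspace sketch. It assumes (hypothetically) that GuC capture nodes are keyed by a GuC exec-queue ID, and that the timeout path either detaches a matching node delivered by a G2H notification or, if the notification has not arrived yet, falls back to reading registers directly. All names and fields are hypothetical and heavily simplified; this is not the xe driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified capture node (not the real xe structure). */
struct capture_node {
	uint32_t guc_id;          /* exec queue the registers belong to */
	uint32_t reg_value;       /* stand-in for a captured register list */
	bool is_manual;           /* true if read by the driver, not the GuC */
	struct capture_node *next;
};

struct capture_list {
	struct capture_node *head; /* nodes delivered by G2H notifications */
};

/* G2H path: an Error-Capture notification queues a node. */
static void capture_on_g2h(struct capture_list *list, uint32_t guc_id,
			   uint32_t reg_value)
{
	struct capture_node *n = calloc(1, sizeof(*n));

	if (!n)
		return;
	n->guc_id = guc_id;
	n->reg_value = reg_value;
	n->is_manual = false;
	n->next = list->head;
	list->head = n;
}

/*
 * Timeout/kill path: take ownership of a matching GuC node if one has
 * already arrived; otherwise assume the notification lost the race and
 * build a node from a direct ("manual") register read instead.
 */
static struct capture_node *capture_for_timeout(struct capture_list *list,
						uint32_t guc_id,
						uint32_t live_reg_value)
{
	struct capture_node **pp = &list->head;

	while (*pp) {
		if ((*pp)->guc_id == guc_id) {
			struct capture_node *found = *pp;

			*pp = found->next;   /* unlink: caller now owns it */
			found->next = NULL;
			return found;
		}
		pp = &(*pp)->next;
	}

	struct capture_node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->guc_id = guc_id;
	n->reg_value = live_reg_value;
	n->is_manual = true;
	return n;
}
```

In this sketch the caller owns whatever node capture_for_timeout() returns; where that ownership should live (devcoredump versus engine snapshot) is exactly the layering question the series addresses.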

Part of that race-workaround design had GuC-Error-Capture take on
some of the front-end functions for xe_hw_engine_snapshot generation,
because of the orthogonal debugfs path for raw dumps of engine
registers without any job association. We want that path to be
handled by GuC-Error-Capture as well, even if indirectly, since there
is a lot to manage when it comes to reading and printing the register
lists.
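The two entry points can be sketched as one engine-snapshot front end that serves both the job-associated devcoredump path and the job-less debugfs raw dump, while delegating all register-list reading and printing to a single capture-layer helper. This is a hypothetical userspace illustration: the job type, register names, offsets and the fake MMIO read are placeholders, not the xe driver's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical job type; a NULL pointer means "no job association". */
struct job {
	int id;
};

/* Illustrative register list: names and offsets are placeholders. */
struct reg_desc {
	const char *name;
	uint32_t offset;
};

static const struct reg_desc engine_regs[] = {
	{ "REG_A", 0x30 },
	{ "REG_B", 0x34 },
};

#define NUM_ENGINE_REGS (sizeof(engine_regs) / sizeof(engine_regs[0]))

/* Stand-in for an MMIO read so the sketch runs in userspace. */
static uint32_t fake_mmio_read(uint32_t offset)
{
	return offset * 2u;
}

/*
 * Capture layer: the single place that walks and prints the register
 * list. Returns bytes written, or -1 if the buffer is too small.
 */
static int capture_print_regs(char *buf, size_t len, bool has_job)
{
	size_t used;
	int n = snprintf(buf, len, "source: %s\n",
			 has_job ? "job (devcoredump)" : "raw (debugfs)");

	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;

	for (size_t i = 0; i < NUM_ENGINE_REGS; i++) {
		n = snprintf(buf + used, len - used, "%s: 0x%08x\n",
			     engine_regs[i].name,
			     (unsigned int)fake_mmio_read(engine_regs[i].offset));
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}

/* Engine-snapshot front end: devcoredump and debugfs callers both land here. */
static int hw_engine_snapshot_print(const struct job *job, char *buf, size_t len)
{
	return capture_print_regs(buf, len, job != NULL);
}
```

Keeping one helper for the register walk means a new register list only has to be taught to the capture layer once, regardless of which caller asked for the dump.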

However, logically speaking, GuC-Error-Capture node management,
despite making up the majority of the engine-snapshot work, is still
a subset of xe_hw_engine_snapshot.

This series intends to redesign the plumbing for future maintenance
and scalability, restoring the layering to what it should be:
xe_devcoredump_snapshot owns xe_hw_engine_snapshot, which in turn
owns xe_guc_capture_snapshot.
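The intended ownership chain can be pictured as plain C ownership, where each level allocates and frees only the level directly beneath it. This is a minimal userspace sketch: the struct names echo the ones above, but every field and helper is hypothetical and simplified, and it does not reflect the contents of xe_guc_capture_snapshot_types.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified types; the real structs carry far more state. */
struct guc_capture_snapshot {
	unsigned int num_regs;
	unsigned int *values;                  /* captured register values */
};

struct hw_engine_snapshot {
	const char *name;
	struct guc_capture_snapshot *capture;  /* owned by the engine snapshot */
};

struct devcoredump_snapshot {
	unsigned int num_engines;
	struct hw_engine_snapshot **engines;   /* owned by the coredump */
};

static struct guc_capture_snapshot *guc_capture_snapshot_alloc(unsigned int num_regs)
{
	struct guc_capture_snapshot *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->values = calloc(num_regs, sizeof(*c->values));
	if (!c->values) {
		free(c);
		return NULL;
	}
	c->num_regs = num_regs;
	return c;
}

static void guc_capture_snapshot_free(struct guc_capture_snapshot *c)
{
	if (!c)
		return;
	free(c->values);
	free(c);
}

/* The engine snapshot creates its capture child; nobody else does. */
static struct hw_engine_snapshot *hw_engine_snapshot_alloc(const char *name,
							   unsigned int num_regs)
{
	struct hw_engine_snapshot *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	e->name = name;
	e->capture = guc_capture_snapshot_alloc(num_regs);
	if (!e->capture) {
		free(e);
		return NULL;
	}
	return e;
}

static void hw_engine_snapshot_free(struct hw_engine_snapshot *e)
{
	if (!e)
		return;
	guc_capture_snapshot_free(e->capture); /* child freed by its owner */
	free(e);
}

/* The coredump frees only its direct children; they free theirs. */
static void devcoredump_snapshot_free(struct devcoredump_snapshot *d)
{
	assert(d != NULL);
	for (unsigned int i = 0; i < d->num_engines; i++)
		hw_engine_snapshot_free(d->engines[i]);
	free(d->engines);
	d->engines = NULL;
	d->num_engines = 0;
}
```

With this shape, the coredump never touches capture internals directly; it only knows about engine snapshots, which is the layering the series restores.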

Alan Previn (1):
  drm/xe/guc/capture: Maintenence of devcoredump <-> GuC-Err-Capture
    plumbing

 drivers/gpu/drm/xe/xe_devcoredump.c           |   3 -
 drivers/gpu/drm/xe/xe_devcoredump_types.h     |   6 -
 drivers/gpu/drm/xe/xe_guc_capture.c           | 406 ++++++++----------
 drivers/gpu/drm/xe/xe_guc_capture.h           |  10 +-
 .../drm/xe/xe_guc_capture_snapshot_types.h    |  68 +++
 drivers/gpu/drm/xe/xe_guc_submit.c            |  21 +-
 drivers/gpu/drm/xe/xe_hw_engine.c             | 117 +++--
 drivers/gpu/drm/xe/xe_hw_engine.h             |   4 +-
 drivers/gpu/drm/xe/xe_hw_engine_types.h       |  13 +-
 9 files changed, 359 insertions(+), 289 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_guc_capture_snapshot_types.h


base-commit: 906c4b306e9340f6ffd6d44904ebc86e62e63627

Comments

Dong, Zhanjun Dec. 4, 2024, 10:28 p.m. UTC | #1
LGTM

Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com>
