[v4,0/1] Maintenence of devcoredump <-> GuC-Err-Capture plumbing

Message ID 20250121190935.1984508-1-alan.previn.teres.alexis@intel.com (mailing list archive)

Message

Alan Previn Jan. 21, 2025, 7:09 p.m. UTC
The GuC-Error-Capture is currently reaching into the xe_devcoredump
structure to store its own placeholder snapshot to work around
the race between G2H-Error-Capture-Notification vs Drm-Scheduler
triggering GuC-Submission-exec-queue-timeout/kill.

Part of that race workaround design included GuC-Error-Capture taking
on some of the front-end functions for xe_hw_engine_snapshot
generation because of the orthogonal debugfs for raw dumps of engine
registers without any job association. We want this to also be handled,
even if indirectly, by GuC-Error-Capture since there is a lot to manage
when it comes to reading and printing the register lists.

However, logically speaking, GuC-Error-Capture node management,
despite being the majority of the engine-snapshot work, is still
a subset of xe_hw_engine_snapshot.

This series intends to re-design the plumbing for future
maintenance and scalability, rearranging the layering
back to what it should be (xe_devcoredump_snapshot owns
xe_hw_engine_snapshot, which owns xe_guc_capture_snapshot).
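
As a rough illustration of that ownership chain, here is a minimal
user-space sketch. It is not the driver code: every struct, field and
function name carrying a _sketch suffix is hypothetical, and the real
xe structures hold far more state. It only shows each layer allocating
and freeing the layer directly below it, so that the capture snapshot
is reachable only through the engine snapshot.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the real xe structures. */
struct guc_capture_snapshot_sketch {
	unsigned int num_regs;
	unsigned int *regs;
};

struct hw_engine_snapshot_sketch {
	const char *name;
	/* Owned by the engine snapshot, not by the devcoredump layer. */
	struct guc_capture_snapshot_sketch *capture;
};

#define MAX_ENGINES_SKETCH 4

struct devcoredump_snapshot_sketch {
	/* Owned by the devcoredump snapshot; unused slots stay NULL. */
	struct hw_engine_snapshot_sketch *hwe[MAX_ENGINES_SKETCH];
};

static struct guc_capture_snapshot_sketch *
capture_sketch_alloc(unsigned int num_regs)
{
	struct guc_capture_snapshot_sketch *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	/* Allocate at least one slot so a zero-length list is still valid. */
	c->regs = calloc(num_regs ? num_regs : 1, sizeof(*c->regs));
	if (!c->regs) {
		free(c);
		return NULL;
	}
	c->num_regs = num_regs;
	return c;
}

static void capture_sketch_free(struct guc_capture_snapshot_sketch *c)
{
	if (!c)
		return;
	free(c->regs);
	free(c);
}

/* The engine snapshot creates the capture snapshot it owns. */
static struct hw_engine_snapshot_sketch *
hw_engine_sketch_alloc(const char *name, unsigned int num_regs)
{
	struct hw_engine_snapshot_sketch *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->name = name;
	s->capture = capture_sketch_alloc(num_regs);
	if (!s->capture) {
		free(s);
		return NULL;
	}
	return s;
}

/* Freeing an engine snapshot also frees its capture snapshot. */
static void hw_engine_sketch_free(struct hw_engine_snapshot_sketch *s)
{
	if (!s)
		return;
	capture_sketch_free(s->capture);
	free(s);
}

/* The devcoredump layer only ever touches engine snapshots. */
static void devcoredump_sketch_free(struct devcoredump_snapshot_sketch *ss)
{
	int i;

	for (i = 0; i < MAX_ENGINES_SKETCH; i++) {
		hw_engine_sketch_free(ss->hwe[i]);
		ss->hwe[i] = NULL;
	}
}
```

In this sketch a caller would fill ss.hwe[i] with
hw_engine_sketch_alloc() and later call devcoredump_sketch_free(&ss)
once; nothing at the devcoredump level needs to know the capture
snapshot exists.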

Alan Previn (1):
  drm/xe/guc/capture: Maintenence of devcoredump <-> GuC-Err-Capture
    plumbing

 drivers/gpu/drm/xe/xe_devcoredump.c           |   3 -
 drivers/gpu/drm/xe/xe_devcoredump_types.h     |   6 -
 drivers/gpu/drm/xe/xe_guc_capture.c           | 406 ++++++++----------
 drivers/gpu/drm/xe/xe_guc_capture.h           |  10 +-
 .../drm/xe/xe_guc_capture_snapshot_types.h    |  68 +++
 drivers/gpu/drm/xe/xe_guc_submit.c            |  21 +-
 drivers/gpu/drm/xe/xe_hw_engine.c             | 117 +++--
 drivers/gpu/drm/xe/xe_hw_engine.h             |   4 +-
 drivers/gpu/drm/xe/xe_hw_engine_types.h       |  13 +-
 9 files changed, 359 insertions(+), 289 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_guc_capture_snapshot_types.h


base-commit: cfa9d40db8c30d894171010fe765d96e9bc6a47e

Comments

Rodrigo Vivi Jan. 21, 2025, 11:15 p.m. UTC | #1
On Tue, Jan 21, 2025 at 11:09:34AM -0800, Alan Previn wrote:
> The GuC-Error-Capture is currently reaching into the xe_devcoredump
> structure to store its own placeholder snapshot to work around
> the race between G2H-Error-Capture-Notification vs Drm-Scheduler
> triggering GuC-Submission-exec-queue-timeout/kill.
> 
> Part of that race workaround design included GuC-Error-Capture taking
> on some of the front-end functions for xe_hw_engine_snapshot
> generation because of the orthogonal debugfs for raw dumps of engine
> registers without any job association. We want this to also be handled,
> even if indirectly, by GuC-Error-Capture since there is a lot to manage
> when it comes to reading and printing the register lists.
> 
> However, logically speaking, GuC-Error-Capture node management,
> despite being the majority of the engine-snapshot work, is still
> a subset of xe_hw_engine_snapshot.
> 
> This series intends to re-design the plumbing for future

A 'series' of 1 patch is not a series. A cover letter is not needed.

However, this patch is the size of a series and it should be
split. I'm really surprised that someone went over and
*really* reviewed it.

Even the subject of the patch doesn't make a lot of sense to me.
I don't even know what to write in the pull-request with a patch
like this.

Please break this into small patches.

> maintenance and scalability, rearranging the layering
> back to what it should be (xe_devcoredump_snapshot owns
> xe_hw_engine_snapshot, which owns xe_guc_capture_snapshot).
> 
> Alan Previn (1):
>   drm/xe/guc/capture: Maintenence of devcoredump <-> GuC-Err-Capture
>     plumbing
> 
>  drivers/gpu/drm/xe/xe_devcoredump.c           |   3 -
>  drivers/gpu/drm/xe/xe_devcoredump_types.h     |   6 -
>  drivers/gpu/drm/xe/xe_guc_capture.c           | 406 ++++++++----------
>  drivers/gpu/drm/xe/xe_guc_capture.h           |  10 +-
>  .../drm/xe/xe_guc_capture_snapshot_types.h    |  68 +++
>  drivers/gpu/drm/xe/xe_guc_submit.c            |  21 +-
>  drivers/gpu/drm/xe/xe_hw_engine.c             | 117 +++--
>  drivers/gpu/drm/xe/xe_hw_engine.h             |   4 +-
>  drivers/gpu/drm/xe/xe_hw_engine_types.h       |  13 +-
>  9 files changed, 359 insertions(+), 289 deletions(-)
>  create mode 100644 drivers/gpu/drm/xe/xe_guc_capture_snapshot_types.h
> 
> 
> base-commit: cfa9d40db8c30d894171010fe765d96e9bc6a47e
> -- 
> 2.34.1
>