Message ID | 20250214203757.27895-1-jonathan.cavitt@intel.com (mailing list archive) |
---|---|
Series | drm/xe/xe_drm_client: Add per drm client reset stats |
Hi Jonathan,

On 14/02/2025 17:37, Jonathan Cavitt wrote:
> Add additional information to drm client so it can report the last 50
> exec queues to have been banned on it, as well as the last pagefault
> seen when said exec queues were banned. Since we cannot reasonably
> associate a pagefault to a specific exec queue, we currently report the
> last seen pagefault on the associated hw engine instead.
>
> The last pagefault seen per exec queue is saved to the hw engine, and the
> pagefault is updated during the pagefault handling process in
> xe_gt_pagefault. The last seen pagefault is reset when the engine is
> reset because any future exec queue bans likely were not caused by said
> pagefault after the reset.
>
> Also add a tracker that counts the number of times the drm client has
> experienced an engine reset.

What's the use case for this? How will userspace consume this information?

>
> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>
> Jonathan Cavitt (4):
>   drm/xe/xe_exec_queue: Add ID param to exec queue struct
>   drm/xe/xe_gt_pagefault: Migrate pagefault struct to header
>   FIXME: drm/xe/xe_drm_client: Add per drm client pagefault info
>   drm/xe/xe_drm_client: Add per drm client reset stats
>
>  drivers/gpu/drm/xe/xe_drm_client.c       | 130 +++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_drm_client.h       |  38 +++++++
>  drivers/gpu/drm/xe/xe_exec_queue.c       |   8 ++
>  drivers/gpu/drm/xe/xe_exec_queue_types.h |   2 +
>  drivers/gpu/drm/xe/xe_gt_pagefault.c     |  46 ++++----
>  drivers/gpu/drm/xe/xe_gt_pagefault.h     |  51 +++++++++
>  drivers/gpu/drm/xe/xe_guc_submit.c       |  19 ++++
>  drivers/gpu/drm/xe/xe_hw_engine.c        |   4 +
>  drivers/gpu/drm/xe/xe_hw_engine_types.h  |   8 ++
>  9 files changed, 279 insertions(+), 27 deletions(-)
>
Add additional information to drm client so it can report the last 50
exec queues to have been banned on it, as well as the last pagefault
seen when said exec queues were banned. Since we cannot reasonably
associate a pagefault to a specific exec queue, we currently report the
last seen pagefault on the associated hw engine instead.

The last pagefault seen per exec queue is saved to the hw engine, and the
pagefault is updated during the pagefault handling process in
xe_gt_pagefault. The last seen pagefault is reset when the engine is
reset because any future exec queue bans likely were not caused by said
pagefault after the reset.

Also add a tracker that counts the number of times the drm client has
experienced an engine reset.

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>

Jonathan Cavitt (4):
  drm/xe/xe_exec_queue: Add ID param to exec queue struct
  drm/xe/xe_gt_pagefault: Migrate pagefault struct to header
  FIXME: drm/xe/xe_drm_client: Add per drm client pagefault info
  drm/xe/xe_drm_client: Add per drm client reset stats

 drivers/gpu/drm/xe/xe_drm_client.c       | 130 +++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_drm_client.h       |  38 +++++++
 drivers/gpu/drm/xe/xe_exec_queue.c       |   8 ++
 drivers/gpu/drm/xe/xe_exec_queue_types.h |   2 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c     |  46 ++++----
 drivers/gpu/drm/xe/xe_gt_pagefault.h     |  51 +++++++++
 drivers/gpu/drm/xe/xe_guc_submit.c       |  19 ++++
 drivers/gpu/drm/xe/xe_hw_engine.c        |   4 +
 drivers/gpu/drm/xe/xe_hw_engine_types.h  |   8 ++
 9 files changed, 279 insertions(+), 27 deletions(-)
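
For readers who want a concrete picture of the bookkeeping described in the cover letter, here is a minimal userspace sketch. It is not taken from the patches: every struct, field, and function name below is hypothetical, and locking is omitted entirely. It assumes the "last 50" banned exec queues are kept in a fixed-size ring that overwrites the oldest entry, that each ban copies whatever pagefault was last seen on the hw engine, and that an engine reset both clears that pagefault and bumps the client's reset counter.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_BANS 50 /* "last 50 exec queues" from the cover letter */

/* Hypothetical pagefault snapshot; the real struct has other fields. */
struct pf_info {
	bool valid;
	uint64_t addr;
};

/* Hypothetical hw engine: remembers only the last pagefault seen on it. */
struct hw_engine {
	struct pf_info last_pf;
};

/* One banned exec queue plus the engine's last pagefault at ban time. */
struct ban_entry {
	uint32_t queue_id;
	struct pf_info pf;
};

/* Hypothetical per-client stats: ring of recent bans and a reset counter. */
struct client_stats {
	struct ban_entry bans[MAX_BANS];
	unsigned int head;  /* next slot to write; oldest entry once full */
	unsigned int count; /* number of valid entries, capped at MAX_BANS */
	unsigned int reset_count;
};

/* Pagefault handling path: overwrite the engine's last seen pagefault. */
void engine_record_pagefault(struct hw_engine *hwe, uint64_t addr)
{
	hwe->last_pf.valid = true;
	hwe->last_pf.addr = addr;
}

/*
 * Engine reset: forget the last pagefault, since later bans were likely
 * not caused by it, and count the reset against the affected client.
 */
void engine_reset(struct hw_engine *hwe, struct client_stats *c)
{
	memset(&hwe->last_pf, 0, sizeof(hwe->last_pf));
	c->reset_count++;
}

/* Exec queue ban: store the queue ID and a copy of the engine's pagefault. */
void client_record_ban(struct client_stats *c, const struct hw_engine *hwe,
		       uint32_t queue_id)
{
	struct ban_entry *e = &c->bans[c->head];

	e->queue_id = queue_id;
	e->pf = hwe->last_pf;
	c->head = (c->head + 1) % MAX_BANS;
	if (c->count < MAX_BANS)
		c->count++;
}
```

In this model a ban recorded after an engine reset carries an invalid (cleared) pagefault, and once more than 50 bans have been recorded the slot at `head` holds the oldest surviving entry.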