diff mbox

drm/i915: Track the proc which created the context

Message ID 1377739860-32047-1-git-send-email-benjamin.widawsky@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ben Widawsky Aug. 29, 2013, 1:31 a.m. UTC
We've tried more than once in the past to print the guilty process.
Prior to Mika's recent work however, we never had a good way to actually
assign guilt. With that, this trivial patch should get print the guilty
process and pid in dmesg.

Until we merge the full PPGTT support (which have per fd contexts), we
don't have a good way to name the default context used by X, and other
clients not opting in to contexts. As such, they will get "swapper" as
their guilty process. Once the full PPGTT support is merged, we should
be able to properly track the names.

NOTE: The string is limited to 16 characters as defined by the ASCII
string kept in the task struct. One could theoretically pull out the
full from the command args as done for cmdline in proc, but this is
terribly ugly, and a homework assignment for another day.

Example courtesy of Eric:
[drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1b5a000 ctx 1 [ext_framebuffer pid=5762]) at 0x1b5a034

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Tested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h         | 4 ++++
 drivers/gpu/drm/i915/i915_gem.c         | 4 +++-
 drivers/gpu/drm/i915/i915_gem_context.c | 7 +++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

Comments

Daniel Vetter Aug. 29, 2013, 6:36 a.m. UTC | #1
On Thu, Aug 29, 2013 at 3:31 AM, Ben Widawsky
<benjamin.widawsky@intel.com> wrote:
> We've tried more than once in the past to print the guilty process.
> Prior to Mika's recent work however, we never had a good way to actually
> assign guilt. With that, this trivial patch should get print the guilty
> process and pid in dmesg.
>
> Until we merge the full PPGTT support (which have per fd contexts), we
> don't have a good way to name the default context used by X, and other
> clients not opting in to contexts. As such, they will get "swapper" as
> their guilty process. Once the full PPGTT support is merged, we should
> be able to properly track the names.
>
> NOTE: The string is limited to 16 characters as defined by the ASCII
> string kept in the task struct. One could theoretically pull out the
> full from the command args as done for cmdline in proc, but this is
> terribly ugly, and a homework assignment for another day.
>
> Example courtesy of Eric:
> [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1b5a000 ctx 1 [ext_framebuffer pid=5762]) at 0x1b5a034
>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Tested-by: Eric Anholt <eric@anholt.net>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

This is neat. But while at it I think we should add the same
information to the error state, too. Care to quickly update your
patch?

Thanks, Daniel
Chris Wilson Aug. 29, 2013, 9:33 a.m. UTC | #2
On Thu, Aug 29, 2013 at 08:36:59AM +0200, Daniel Vetter wrote:
> On Thu, Aug 29, 2013 at 3:31 AM, Ben Widawsky
> <benjamin.widawsky@intel.com> wrote:
> > We've tried more than once in the past to print the guilty process.
> > Prior to Mika's recent work however, we never had a good way to actually
> > assign guilt. With that, this trivial patch should get print the guilty
> > process and pid in dmesg.
> >
> > Until we merge the full PPGTT support (which have per fd contexts), we
> > don't have a good way to name the default context used by X, and other
> > clients not opting in to contexts. As such, they will get "swapper" as
> > their guilty process. Once the full PPGTT support is merged, we should
> > be able to properly track the names.
> >
> > NOTE: The string is limited to 16 characters as defined by the ASCII
> > string kept in the task struct. One could theoretically pull out the
> > full from the command args as done for cmdline in proc, but this is
> > terribly ugly, and a homework assignment for another day.
> >
> > Example courtesy of Eric:
> > [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1b5a000 ctx 1 [ext_framebuffer pid=5762]) at 0x1b5a034
> >
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Tested-by: Eric Anholt <eric@anholt.net>
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> 
> This is neat. But while at it I think we should add the same
> information to the error state, too. Care to quickly update your
> patch?

We also have request->file, admittedly only valid for live clients, but
could be used for accurate naming. If we want to keep the info around, I
think we want a refcounted struct i915_comm that could be used
independently of contexts (because we have machines too old for
contexts)...
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 05de1ca..742e248 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -594,6 +594,10 @@  struct i915_hw_context {
 	struct intel_ring_buffer *ring;
 	struct drm_i915_gem_object *obj;
 	struct i915_ctx_hang_stats hang_stats;
+
+	/* Process which created this context */
+	char comm[TASK_COMM_LEN];
+	pid_t pid;
 };
 
 struct i915_fbc {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c65650c..4845c7a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2205,11 +2205,13 @@  static void i915_set_reset_status(struct intel_ring_buffer *ring,
 
 	if (ring->hangcheck.action != HANGCHECK_WAIT &&
 	    i915_request_guilty(request, acthd, &inside)) {
-		DRM_ERROR("%s hung %s bo (0x%lx ctx %d) at 0x%x\n",
+		DRM_ERROR("%s hung %s bo (0x%lx ctx %d [%s pid=%d]) at 0x%x\n",
 			  ring->name,
 			  inside ? "inside" : "flushing",
 			  offset,
 			  request->ctx ? request->ctx->id : 0,
+			  request->ctx ? request->ctx->comm : "???",
+			  request->ctx ? request->ctx->pid : -1,
 			  acthd);
 
 		guilty = true;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 403309c..eeee324 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -85,6 +85,7 @@ 
  *
  */
 
+#include <linux/init_task.h>
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
@@ -226,6 +227,9 @@  static int create_default_context(struct drm_i915_private *dev_priv)
 		goto err_unpin;
 	}
 
+	memcpy(ctx->comm, INIT_TASK_COMM, TASK_COMM_LEN);
+	ctx->pid = 0;
+
 	DRM_DEBUG_DRIVER("Default HW context loaded\n");
 	return 0;
 
@@ -539,6 +543,9 @@  int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
+	memcpy(ctx->comm, current->comm, TASK_COMM_LEN);
+	ctx->pid = task_pid_nr(current);
+
 	args->ctx_id = ctx->id;
 	DRM_DEBUG_DRIVER("HW context %d created\n", args->ctx_id);