Message ID | 1377739860-32047-1-git-send-email-benjamin.widawsky@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Aug 29, 2013 at 3:31 AM, Ben Widawsky <benjamin.widawsky@intel.com> wrote: > We've tried more than once in the past to print the guilty process. > Prior to Mika's recent work however, we never had a good way to actually > assign guilt. With that, this trivial patch should get print the guilty > process and pid in dmesg. > > Until we merge the full PPGTT support (which have per fd contexts), we > don't have a good way to name the default context used by X, and other > clients not opting in to contexts. As such, they will get "swapper" as > their guilty process. Once the full PPGTT support is merged, we should > be able to properly track the names. > > NOTE: The string is limited to 16 characters as defined by the ASCII > string kept in the task struct. One could theoretically pull out the > full from the command args as done for cmdline in proc, but this is > terribly ugly, and a homework assignment for another day. > > Example courtesy of Eric: > [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1b5a000 ctx 1 [ext_framebuffer pid=5762]) at 0x1b5a034 > > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > Tested-by: Eric Anholt <eric@anholt.net> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> This is neat. But while at it I think we should add the same information to the error state, too. Care to quickly update your patch? Thanks, Daniel
On Thu, Aug 29, 2013 at 08:36:59AM +0200, Daniel Vetter wrote: > On Thu, Aug 29, 2013 at 3:31 AM, Ben Widawsky > <benjamin.widawsky@intel.com> wrote: > > We've tried more than once in the past to print the guilty process. > > Prior to Mika's recent work however, we never had a good way to actually > > assign guilt. With that, this trivial patch should get print the guilty > > process and pid in dmesg. > > > > Until we merge the full PPGTT support (which have per fd contexts), we > > don't have a good way to name the default context used by X, and other > > clients not opting in to contexts. As such, they will get "swapper" as > > their guilty process. Once the full PPGTT support is merged, we should > > be able to properly track the names. > > > > NOTE: The string is limited to 16 characters as defined by the ASCII > > string kept in the task struct. One could theoretically pull out the > > full from the command args as done for cmdline in proc, but this is > > terribly ugly, and a homework assignment for another day. > > > > Example courtesy of Eric: > > [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1b5a000 ctx 1 [ext_framebuffer pid=5762]) at 0x1b5a034 > > > > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > > Tested-by: Eric Anholt <eric@anholt.net> > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > This is neat. But while at it I think we should add the same > information to the error state, too. Care to quickly update your > patch? We also have request->file, admittedly only valid for live clients, but could be used for accurate naming. If we want to keep the info around, I think we want a refcounted struct i915_comm that could be used independently of contexts (because we have machines too old for contexts)... -Chris
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 05de1ca..742e248 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -594,6 +594,10 @@ struct i915_hw_context { struct intel_ring_buffer *ring; struct drm_i915_gem_object *obj; struct i915_ctx_hang_stats hang_stats; + + /* Process which created this context */ + char comm[TASK_COMM_LEN]; + pid_t pid; }; struct i915_fbc { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index c65650c..4845c7a 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2205,11 +2205,13 @@ static void i915_set_reset_status(struct intel_ring_buffer *ring, if (ring->hangcheck.action != HANGCHECK_WAIT && i915_request_guilty(request, acthd, &inside)) { - DRM_ERROR("%s hung %s bo (0x%lx ctx %d) at 0x%x\n", + DRM_ERROR("%s hung %s bo (0x%lx ctx %d [%s pid=%d]) at 0x%x\n", ring->name, inside ? "inside" : "flushing", offset, request->ctx ? request->ctx->id : 0, + request->ctx ? request->ctx->comm : "???", + request->ctx ? request->ctx->pid : -1, acthd); guilty = true; diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 403309c..eeee324 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -85,6 +85,7 @@ * */ +#include <linux/init_task.h> #include <drm/drmP.h> #include <drm/i915_drm.h> #include "i915_drv.h" @@ -226,6 +227,9 @@ static int create_default_context(struct drm_i915_private *dev_priv) goto err_unpin; } + memcpy(ctx->comm, INIT_TASK_COMM, TASK_COMM_LEN); + ctx->pid = 0; + DRM_DEBUG_DRIVER("Default HW context loaded\n"); return 0; @@ -539,6 +543,9 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data, if (IS_ERR(ctx)) return PTR_ERR(ctx); + memcpy(ctx->comm, current->comm, TASK_COMM_LEN); + ctx->pid = task_pid_nr(current); + args->ctx_id = ctx->id; DRM_DEBUG_DRIVER("HW context %d created\n", args->ctx_id);