From patchwork Fri Aug 5 09:05:52 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 9264971 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 789E860760 for ; Fri, 5 Aug 2016 09:06:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6775127F46 for ; Fri, 5 Aug 2016 09:06:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5C03428422; Fri, 5 Aug 2016 09:06:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.1 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_MED,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8833F27F46 for ; Fri, 5 Aug 2016 09:06:16 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B85566EB1A; Fri, 5 Aug 2016 09:06:15 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mail-wm0-x244.google.com (mail-wm0-x244.google.com [IPv6:2a00:1450:400c:c09::244]) by gabe.freedesktop.org (Postfix) with ESMTPS id 14E7F6EB19 for ; Fri, 5 Aug 2016 09:06:14 +0000 (UTC) Received: by mail-wm0-x244.google.com with SMTP id x83so2861737wma.3 for ; Fri, 05 Aug 2016 02:06:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=KP7CUZkwRg/8l4XYk2eGt+SVUtvnSUk8NnRnY6Dw/g0=; b=UO/mSI8WU74JeHqyR9nEQzWdPUczxOGTYe2LoOZ3c12xqGnuwzQpFTKG79G1obivyh drwjGf3vD0k+YP/yC8ZOZuxdG9CQIrhUYVNJtMq6Ip67ZsZLiLGSn1WQedOXp5+qk5dP UelGSGpO2r6GlE6Vly2IgQNQLiFQoGNpW6tTZkyJYpqOlRB78IZvMWQ6QVGuZLPIIcQp 0nUHJPyBMEUosZz7RUh0ienGJ5h8t3/DlKmkusalOETjJf9CVDjnA4b141/qhF3k3v7n RsotZE6Q8M7U5IBKePQnD5Bq/uCH6X4zFSCEx4twsFqzogTTcGUxNOwby7yGONmiy5Xf XRPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=KP7CUZkwRg/8l4XYk2eGt+SVUtvnSUk8NnRnY6Dw/g0=; b=Gwy9ozfu8pHCTJUqoEnykbRQrOJ9QBKmuFzEoVuWtYU7mzJwcwcDCUCp1KxC54Wt4Y Av52/OfzM2gkiKPKg7AxdiOsu209gcstHtAxS5XVcsJsih3FY3CqPEs17WcB6rNeCfar JYKaZQE/clwf/zy5Amy1qw6Gr3EW47AUZ/lCipKmysiYZj/dWGOR5B/lUFDYY1vDlQsi m/P8KpR4tgpYXlToS2G2E19rnZqEeZI2csbdr/BugK1sPA8IOy02RXAQRDj6EAvSAqGk 1oRbRI7G5ipXNzHaNPt6DZWPVmX6iiRkCYKitWonsH+TbHILJrqB3AhJEg5+XgaRb/R9 fIVQ== X-Gm-Message-State: AEkoouvQsSkEDTtAiyNYvqy0lfCXAc0Cs4un+Uqr2NKFHwIFcKHl1euMOiN2b3H2aFQx7Q== X-Received: by 10.28.51.210 with SMTP id z201mr2325723wmz.98.1470387972346; Fri, 05 Aug 2016 02:06:12 -0700 (PDT) Received: from haswell.alporthouse.com ([78.156.65.138]) by smtp.gmail.com with ESMTPSA id m81sm7623439wmf.1.2016.08.05.02.06.10 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 05 Aug 2016 02:06:10 -0700 (PDT) From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Fri, 5 Aug 2016 10:05:52 +0100 Message-Id: <1470387964-3363-2-git-send-email-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.8.1 In-Reply-To: <1470387964-3363-1-git-send-email-chris@chris-wilson.co.uk> References: <1470387964-3363-1-git-send-email-chris@chris-wilson.co.uk> Subject: [Intel-gfx] [PATCH 01/13] drm/i915: Reduce amount of duplicate buffer information captured on error X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP When capturing the error state, we do not need to know about every address space - just those that are related to the error. We know which context is active at the time, therefore we know which VM are implicated in the error. We can then restrict the VM which we report to the relevant subset. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h | 9 +- drivers/gpu/drm/i915/i915_gpu_error.c | 202 ++++++++++++++-------------------- 2 files changed, 89 insertions(+), 122 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index feec00f769e1..7054baa92061 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -517,6 +517,7 @@ struct drm_i915_error_state { int num_waiters; int hangcheck_score; enum intel_engine_hangcheck_action hangcheck_action; + struct i915_address_space *vm; int num_requests; /* our own tracking of ring head and tail */ @@ -586,17 +587,15 @@ struct drm_i915_error_state { u32 read_domains; u32 write_domain; s32 fence_reg:I915_MAX_NUM_FENCE_BITS; - s32 pinned:2; u32 tiling:2; u32 dirty:1; u32 purgeable:1; u32 userptr:1; s32 engine:4; u32 cache_level:3; - } **active_bo, **pinned_bo; - - u32 *active_bo_count, *pinned_bo_count; - u32 vm_count; + } *active_bo[I915_NUM_ENGINES], *pinned_bo; + u32 active_bo_count[I915_NUM_ENGINES], pinned_bo_count; + struct i915_address_space *active_vm[I915_NUM_ENGINES]; }; struct intel_connector; diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index eecb87063c88..ced296983caa 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -42,16 +42,6 @@ static const char *engine_str(int engine) } } -static const char *pin_flag(int pinned) -{ - if (pinned > 0) - return " P"; - else if (pinned < 0) - return " p"; - else - return ""; -} - static const char *tiling_flag(int tiling) { switch (tiling) { @@ -189,7 +179,7 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m, { int i; - err_printf(m, " %s [%d]:\n", name, count); + err_printf(m, "%s [%d]:\n", name, count); while (count--) { err_printf(m, " %08x_%08x %8u %02x %02x [ ", @@ -202,7 +192,6 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m, err_printf(m, "%02x ", err->rseqno[i]); err_printf(m, "] %02x", err->wseqno); - err_puts(m, pin_flag(err->pinned)); err_puts(m, tiling_flag(err->tiling)); err_puts(m, dirty_flag(err->dirty)); err_puts(m, purgeable_flag(err->purgeable)); @@ -414,18 +403,25 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m, error_print_engine(m, &error->engine[i]); } - for (i = 0; i < error->vm_count; i++) { - err_printf(m, "vm[%d]\n", i); + for (i = 0; i < I915_NUM_ENGINES; i++) { + if (!error->active_vm[i]) + break; - print_error_buffers(m, "Active", + err_printf(m, "Active vm[%d]\n", i); + for (j = 0; j < I915_NUM_ENGINES; j++) { + if (error->engine[j].vm == error->active_vm[i]) + err_printf(m, " %s\n", + dev_priv->engine[j].name); + } + print_error_buffers(m, " Buffers", error->active_bo[i], error->active_bo_count[i]); - - print_error_buffers(m, "Pinned", - error->pinned_bo[i], - error->pinned_bo_count[i]); } + print_error_buffers(m, "Pinned (global)", + error->pinned_bo, + error->pinned_bo_count); + for (i = 0; i < ARRAY_SIZE(error->engine); i++) { struct drm_i915_error_engine *ee = &error->engine[i]; @@ -626,13 +622,10 @@ static void i915_error_state_free(struct kref *error_ref) i915_error_object_free(error->semaphore_obj); - for (i = 0; i < error->vm_count; i++) + for (i = 0; i < ARRAY_SIZE(error->active_bo); i++) kfree(error->active_bo[i]); - - kfree(error->active_bo); - kfree(error->active_bo_count); kfree(error->pinned_bo); - kfree(error->pinned_bo_count); + kfree(error->overlay); kfree(error->display); kfree(error); @@ -778,9 +771,6 @@ static void capture_bo(struct drm_i915_error_buffer *err, err->read_domains = obj->base.read_domains; err->write_domain = obj->base.write_domain; err->fence_reg = obj->fence_reg; - err->pinned = 0; - if (i915_gem_obj_is_pinned(obj)) - err->pinned = 1; err->tiling = i915_gem_object_get_tiling(obj); err->dirty = obj->dirty; err->purgeable = obj->madv != I915_MADV_WILLNEED; @@ -788,13 +778,17 @@ static void capture_bo(struct drm_i915_error_buffer *err, err->cache_level = obj->cache_level; } -static u32 capture_active_bo(struct drm_i915_error_buffer *err, - int count, struct list_head *head) +static u32 capture_error_bo(struct drm_i915_error_buffer *err, + int count, struct list_head *head, + bool pinned_only) { struct i915_vma *vma; int i = 0; list_for_each_entry(vma, head, vm_link) { + if (pinned_only && !i915_vma_is_pinned(vma)) + continue; + capture_bo(err++, vma); if (++i == count) break; @@ -803,28 +797,6 @@ static u32 capture_active_bo(struct drm_i915_error_buffer *err, return i; } -static u32 capture_pinned_bo(struct drm_i915_error_buffer *err, - int count, struct list_head *head, - struct i915_address_space *vm) -{ - struct drm_i915_gem_object *obj; - struct drm_i915_error_buffer * const first = err; - struct drm_i915_error_buffer * const last = err + count; - - list_for_each_entry(obj, head, global_list) { - struct i915_vma *vma; - - if (err == last) - break; - - list_for_each_entry(vma, &obj->vma_list, obj_link) - if (vma->vm == vm && i915_vma_is_pinned(vma)) - capture_bo(err++, vma); - } - - return err - first; -} - /* Generate a semi-unique error code. The code is not meant to have meaning, The * code's only purpose is to try to prevent false duplicated bug reports by * grossly estimating a GPU error state. @@ -1062,7 +1034,6 @@ static void error_record_engine_registers(struct drm_i915_error_state *error, } } - static void i915_gem_record_active_context(struct intel_engine_cs *engine, struct drm_i915_error_state *error, struct drm_i915_error_engine *ee) @@ -1115,10 +1086,9 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv, request = i915_gem_find_active_request(engine); if (request) { - struct i915_address_space *vm; struct intel_ring *ring; - vm = request->ctx->ppgtt ? + ee->vm = request->ctx->ppgtt ? &request->ctx->ppgtt->base : &ggtt->base; /* We need to copy these to an anonymous buffer @@ -1128,7 +1098,7 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv, ee->batchbuffer = i915_error_object_create(dev_priv, request->batch_obj, - vm); + ee->vm); if (HAS_BROKEN_CS_TLB(dev_priv)) ee->wa_batchbuffer = @@ -1210,89 +1180,85 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv, } } -/* FIXME: Since pin count/bound list is global, we duplicate what we capture per - * VM. - */ static void i915_gem_capture_vm(struct drm_i915_private *dev_priv, struct drm_i915_error_state *error, struct i915_address_space *vm, const int ndx) { - struct drm_i915_error_buffer *active_bo = NULL, *pinned_bo = NULL; - struct drm_i915_gem_object *obj; + struct drm_i915_error_buffer *active_bo; struct i915_vma *vma; int i; i = 0; list_for_each_entry(vma, &vm->active_list, vm_link) i++; - error->active_bo_count[ndx] = i; - - list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) { - list_for_each_entry(vma, &obj->vma_list, obj_link) - if (vma->vm == vm && i915_vma_is_pinned(vma)) - i++; - } - error->pinned_bo_count[ndx] = i - error->active_bo_count[ndx]; - if (i) { + active_bo = NULL; + if (i) active_bo = kcalloc(i, sizeof(*active_bo), GFP_ATOMIC); - if (active_bo) - pinned_bo = active_bo + error->active_bo_count[ndx]; - } - if (active_bo) - error->active_bo_count[ndx] = - capture_active_bo(active_bo, - error->active_bo_count[ndx], - &vm->active_list); - - if (pinned_bo) - error->pinned_bo_count[ndx] = - capture_pinned_bo(pinned_bo, - error->pinned_bo_count[ndx], - &dev_priv->mm.bound_list, vm); + i = capture_error_bo(active_bo, i, &vm->active_list, false); + else + i = 0; + error->active_bo[ndx] = active_bo; - error->pinned_bo[ndx] = pinned_bo; + error->active_bo_count[ndx] = i; + error->active_vm[ndx] = vm; } -static void i915_gem_capture_buffers(struct drm_i915_private *dev_priv, - struct drm_i915_error_state *error) +static void i915_capture_active_buffers(struct drm_i915_private *dev_priv, + struct drm_i915_error_state *error) { - struct i915_address_space *vm; - int cnt = 0, i = 0; - - list_for_each_entry(vm, &dev_priv->vm_list, global_link) - cnt++; - - error->active_bo = kcalloc(cnt, sizeof(*error->active_bo), GFP_ATOMIC); - error->pinned_bo = kcalloc(cnt, sizeof(*error->pinned_bo), GFP_ATOMIC); - error->active_bo_count = kcalloc(cnt, sizeof(*error->active_bo_count), - GFP_ATOMIC); - error->pinned_bo_count = kcalloc(cnt, sizeof(*error->pinned_bo_count), - GFP_ATOMIC); - - if (error->active_bo == NULL || - error->pinned_bo == NULL || - error->active_bo_count == NULL || - error->pinned_bo_count == NULL) { - kfree(error->active_bo); - kfree(error->active_bo_count); - kfree(error->pinned_bo); - kfree(error->pinned_bo_count); - - error->active_bo = NULL; - error->active_bo_count = NULL; - error->pinned_bo = NULL; - error->pinned_bo_count = NULL; - } else { - list_for_each_entry(vm, &dev_priv->vm_list, global_link) - i915_gem_capture_vm(dev_priv, error, vm, i++); + int cnt = 0, i, j; + + BUILD_BUG_ON(ARRAY_SIZE(error->engine) > ARRAY_SIZE(error->active_bo)); + BUILD_BUG_ON(ARRAY_SIZE(error->active_bo) != ARRAY_SIZE(error->active_vm)); + BUILD_BUG_ON(ARRAY_SIZE(error->active_bo) != ARRAY_SIZE(error->active_bo_count)); - error->vm_count = cnt; + for (i = 0; i < I915_NUM_ENGINES; i++) { + struct drm_i915_error_engine *ee = &error->engine[i]; + + if (!ee->vm) + continue; + + for (j = 0; j < i; j++) + if (error->engine[j].vm == ee->vm) + break; + if (j != i) + continue; + + i915_gem_capture_vm(dev_priv, error, ee->vm, cnt++); } } +static void i915_capture_pinned_buffers(struct drm_i915_private *dev_priv, + struct drm_i915_error_state *error) +{ + struct i915_address_space *vm = &dev_priv->ggtt.base; + struct drm_i915_error_buffer *bo; + struct i915_vma *vma; + int i, j; + + i = 0; + list_for_each_entry(vma, &vm->active_list, vm_link) + i++; + + j = 0; + list_for_each_entry(vma, &vm->inactive_list, vm_link) + j++; + + bo = NULL; + if (i + j) + bo = kcalloc(i + j, sizeof(*bo), GFP_ATOMIC); + if (!bo) + return; + + i = capture_error_bo(bo, i, &vm->active_list, true); + j = capture_error_bo(bo + i, j, &vm->inactive_list, true); + error->pinned_bo_count = i + j; + error->pinned_bo = bo; +} + /* Capture all registers which don't fit into another category. */ static void i915_capture_reg_state(struct drm_i915_private *dev_priv, struct drm_i915_error_state *error) @@ -1436,10 +1402,12 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv, i915_capture_gen_state(dev_priv, error); i915_capture_reg_state(dev_priv, error); - i915_gem_capture_buffers(dev_priv, error); i915_gem_record_fences(dev_priv, error); i915_gem_record_rings(dev_priv, error); + i915_capture_active_buffers(dev_priv, error); + i915_capture_pinned_buffers(dev_priv, error); + do_gettimeofday(&error->time); error->overlay = intel_overlay_capture_error_state(dev_priv);