From patchwork Mon May 25 07:53:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11568249 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E554739 for ; Mon, 25 May 2020 07:54:09 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 66E7C2071A for ; Mon, 25 May 2020 07:54:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 66E7C2071A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9960D899F3; Mon, 25 May 2020 07:54:02 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8ED15899BB for ; Mon, 25 May 2020 07:53:59 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 21284196-1500050 for multiple; Mon, 25 May 2020 08:53:45 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 25 May 2020 08:53:36 +0100 Message-Id: <20200525075347.582-1-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 01/12] drm/i915/gt: Stop cross-polluting PIN_GLOBAL with PIN_USER with no-ppgtt X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" In order to keep userptr distinct from ggtt mmaps in the eyes of lockdep, we need to avoid marking those userptr vma as PIN_GLOBAL. (So long as we comply with only using them as local PIN_USER!) References: https://gitlab.freedesktop.org/drm/intel/-/issues/1880 Signed-off-by: Chris Wilson Acked-by: Mika Kuoppala --- drivers/gpu/drm/i915/gt/intel_ggtt.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 66165b10256e..8c275f8588c3 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -424,22 +424,17 @@ static int ggtt_bind_vma(struct i915_vma *vma, struct drm_i915_gem_object *obj = vma->obj; u32 pte_flags; + if (i915_vma_is_bound(vma, ~flags & I915_VMA_BIND_MASK)) + return 0; + /* Applicable to VLV (gen8+ do not support RO in the GGTT) */ pte_flags = 0; if (i915_gem_object_is_readonly(obj)) pte_flags |= PTE_READ_ONLY; vma->vm->insert_entries(vma->vm, vma, cache_level, pte_flags); - vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; - /* - * Without aliasing PPGTT there's no difference between - * GLOBAL/LOCAL_BIND, it's all the same ptes. 
Hence unconditionally - * upgrade to both bound if we bind either to avoid double-binding. - */ - atomic_or(I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND, &vma->flags); - return 0; } From patchwork Mon May 25 07:53:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11568241 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D1B0D739 for ; Mon, 25 May 2020 07:54:04 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BA64F2071A for ; Mon, 25 May 2020 07:54:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BA64F2071A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D070F899D4; Mon, 25 May 2020 07:54:01 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8F547899BE for ; Mon, 25 May 2020 07:53:59 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 21284197-1500050 for multiple; Mon, 25 May 2020 08:53:45 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 25 May 2020 08:53:37 +0100 Message-Id: <20200525075347.582-2-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200525075347.582-1-chris@chris-wilson.co.uk> References: <20200525075347.582-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 02/12] drm/i915/gt: Cancel the flush worker more thoroughly X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Since the worker may rearm, we currently are only guaranteed to flush the work if we cancel the timer. If the work was running at the time we try and cancel it, we will wait for it to complete, but it may leave items in the pool and requeue the work. If we rearrange the immediate discard of the pool then cancel the work, we know that the work cannot rearm and so our flush will be final. <0> [314.146044] i915_mod-1321 2.... 
299799443us : intel_gt_fini_buffer_pool: intel_gt_fini_buffer_pool:227 GEM_BUG_ON(!list_empty(&pool->cache_list[n])) Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c index 1495054a4305..418ae184cecf 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c @@ -212,8 +212,9 @@ void intel_gt_flush_buffer_pool(struct intel_gt *gt) { struct intel_gt_buffer_pool *pool = >->buffer_pool; - if (cancel_delayed_work_sync(&pool->work)) + do { pool_free_imm(pool); + } while (cancel_delayed_work_sync(&pool->work)); } void intel_gt_fini_buffer_pool(struct intel_gt *gt) From patchwork Mon May 25 07:53:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11568245 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5F1E314B7 for ; Mon, 25 May 2020 07:54:07 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 480512071A for ; Mon, 25 May 2020 07:54:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 480512071A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id BDF4E89B00; Mon, 25 May 2020 07:54:02 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id A170589B67 for ; Mon, 25 May 2020 07:53:59 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 21284198-1500050 for multiple; Mon, 25 May 2020 08:53:45 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 25 May 2020 08:53:38 +0100 Message-Id: <20200525075347.582-3-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200525075347.582-1-chris@chris-wilson.co.uk> References: <20200525075347.582-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 03/12] drm/i915/gem: Suppress some random warnings X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Leave the error propagation in place, but limit the warnings to only show up in CI if the unlikely errors are hit. 
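For reference, the debug-only assertions this patch switches to keep the condition in the control flow in all builds; only the backtrace is gated behind CONFIG_DRM_I915_DEBUG_GEM, so the splat shows up in CI but not in production. A rough sketch of that pattern (in the spirit of GEM_WARN_ON(), not the exact i915 definition, with a hypothetical caller):

#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
/* Debug/CI builds: evaluate the condition and emit the backtrace. */
#define GEM_WARN_ON(expr) WARN_ON(expr)
#else
/* Production builds: still evaluate the condition, but stay silent. */
#define GEM_WARN_ON(expr) ({ unlikely(!!(expr)); })
#endif

/* Hypothetical caller: the error is propagated in both configurations. */
static int check_bit17_swizzle(struct drm_i915_gem_object *obj)
{
	if (GEM_WARN_ON(i915_gem_object_needs_bit17_swizzle(obj)))
		return -EINVAL;

	return 0;
}
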
Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 3 +-- drivers/gpu/drm/i915/gem/i915_gem_phys.c | 3 +-- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 3 +-- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 2 +- 4 files changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index e4fb6c372537..219a36995b96 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1626,8 +1626,7 @@ eb_relocate_entry(struct i915_execbuffer *eb, err = i915_vma_bind(target->vma, target->vma->obj->cache_level, PIN_GLOBAL, NULL); - if (drm_WARN_ONCE(&i915->drm, err, - "Unexpected failure to bind target VMA!")) + if (err) return err; } } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/i915_gem_phys.c index 4c1c7232b024..12245a47e5fb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c @@ -27,8 +27,7 @@ static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj) void *dst; int i; - if (drm_WARN_ON(obj->base.dev, - i915_gem_object_needs_bit17_swizzle(obj))) + if (GEM_WARN_ON(i915_gem_object_needs_bit17_swizzle(obj))) return -EINVAL; /* diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 7aff3514d97a..7cf8548ff708 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -147,8 +147,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj) last_pfn = page_to_pfn(page); /* Check that the i965g/gm workaround works. */ - drm_WARN_ON(&i915->drm, - (gfp & __GFP_DMA32) && (last_pfn >= 0x00100000UL)); + GEM_BUG_ON(gfp & __GFP_DMA32 && last_pfn >= 0x00100000UL); } if (sg) { /* loop terminated early; short sg table */ sg_page_sizes |= sg->length; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 8b0708708671..2226146b01c9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -235,7 +235,7 @@ i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj, if (flags & I915_USERPTR_UNSYNCHRONIZED) return capable(CAP_SYS_ADMIN) ? 
0 : -EPERM; - if (drm_WARN_ON(obj->base.dev, obj->userptr.mm == NULL)) + if (GEM_WARN_ON(!obj->userptr.mm)) return -EINVAL; mn = i915_mmu_notifier_find(obj->userptr.mm); From patchwork Mon May 25 07:53:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11568255 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 15DBF14B7 for ; Mon, 25 May 2020 07:54:12 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id F28EC2071A for ; Mon, 25 May 2020 07:54:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F28EC2071A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2B10B89B67; Mon, 25 May 2020 07:54:03 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9AA3489B00 for ; Mon, 25 May 2020 07:53:59 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 21284199-1500050 for multiple; Mon, 25 May 2020 08:53:46 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 25 May 2020 08:53:39 +0100 Message-Id: <20200525075347.582-4-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200525075347.582-1-chris@chris-wilson.co.uk> References: <20200525075347.582-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 04/12] drm/i915/execlists: Shortcircuit queue_prio() for no internal levels X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" If there are no internal levels and the user priority-shift is zero, we can help the compiler eliminate some dead code: Function old new delta start_timeslice 169 154 -15 __execlists_submission_tasklet 4696 4659 -37 Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/gt/intel_lrc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index de5be57ed6d2..3214a4ecc31a 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -446,6 +446,9 @@ static int queue_prio(const struct intel_engine_execlists *execlists) * we have to flip the index value to become priority. 
*/ p = to_priolist(rb); + if (!I915_USER_PRIORITY_SHIFT) + return p->priority; + return ((p->priority + 1) << I915_USER_PRIORITY_SHIFT) - ffs(p->used); } From patchwork Mon May 25 07:53:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11568257 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CBBB6739 for ; Mon, 25 May 2020 07:54:12 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B4D042071A for ; Mon, 25 May 2020 07:54:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B4D042071A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B80196E03C; Mon, 25 May 2020 07:54:03 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 92911899F3 for ; Mon, 25 May 2020 07:53:59 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 21284200-1500050 for multiple; Mon, 25 May 2020 08:53:46 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 25 May 2020 08:53:40 +0100 Message-Id: <20200525075347.582-5-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200525075347.582-1-chris@chris-wilson.co.uk> References: <20200525075347.582-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 05/12] drm/i915: Improve execute_cb struct packing X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Reduce the irq_work llist for attaching the callbacks to the signal for both smaller structs (two fewer pointers!) and simpler [debug] code: Function old new delta irq_execute_cb 35 34 -1 __igt_breadcrumbs_smoketest 1684 1682 -2 i915_request_retire 2003 1996 -7 __i915_request_create 1047 1040 -7 __notify_execute_cb 135 126 -9 __i915_request_ctor 188 178 -10 __await_execution.part.constprop 451 440 -11 igt_wait_request 924 714 -210 One minor artifact is that the order of cb exection is reversed. No current use cases are affected by that change. 
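For context, the shrink comes from dropping the dedicated list_head and linking the callbacks through the llist_node already embedded in struct irq_work. A simplified sketch of the layout change (members as in this patch, illustration only):

/* Before: a doubly-linked link plus the irq_work, per callback. */
struct execute_cb {
	struct list_head link;		/* two pointers: prev + next */
	struct irq_work work;		/* already embeds an llist_node */
	struct i915_sw_fence *fence;
	void (*hook)(struct i915_request *rq, struct dma_fence *signal);
};

/* After: work.llnode doubles as the link, two pointers smaller. */
struct execute_cb {
	struct irq_work work;
	struct i915_sw_fence *fence;
	void (*hook)(struct i915_request *rq, struct dma_fence *signal);
};

/* Attachment becomes a singly-linked push onto the request's llist_head,
 * which is why the execution order of the callbacks is reversed. */
cb->work.llnode.next = signal->execute_cb.first;
signal->execute_cb.first = &cb->work.llnode;
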
Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/i915_request.c | 18 +++++++++--------- drivers/gpu/drm/i915/i915_request.h | 2 +- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index c282719ad3ac..22df5b229aed 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -42,7 +42,6 @@ #include "intel_pm.h" struct execute_cb { - struct list_head link; struct irq_work work; struct i915_sw_fence *fence; void (*hook)(struct i915_request *rq, struct dma_fence *signal); @@ -189,14 +188,14 @@ static void irq_execute_cb_hook(struct irq_work *wrk) static void __notify_execute_cb(struct i915_request *rq) { - struct execute_cb *cb; + struct execute_cb *cb, *cn; lockdep_assert_held(&rq->lock); - if (list_empty(&rq->execute_cb)) + if (llist_empty(&rq->execute_cb)) return; - list_for_each_entry(cb, &rq->execute_cb, link) + llist_for_each_entry_safe(cb, cn, rq->execute_cb.first, work.llnode) irq_work_queue(&cb->work); /* @@ -209,7 +208,7 @@ static void __notify_execute_cb(struct i915_request *rq) * preempt-to-idle cycle on the target engine, all the while the * master execute_cb may refire. */ - INIT_LIST_HEAD(&rq->execute_cb); + rq->execute_cb.first = NULL; } static inline void @@ -327,7 +326,7 @@ bool i915_request_retire(struct i915_request *rq) set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags); __notify_execute_cb(rq); } - GEM_BUG_ON(!list_empty(&rq->execute_cb)); + GEM_BUG_ON(!llist_empty(&rq->execute_cb)); spin_unlock_irq(&rq->lock); remove_from_client(rq); @@ -395,7 +394,8 @@ __await_execution(struct i915_request *rq, i915_sw_fence_complete(cb->fence); kmem_cache_free(global.slab_execute_cbs, cb); } else { - list_add_tail(&cb->link, &signal->execute_cb); + cb->work.llnode.next = signal->execute_cb.first; + signal->execute_cb.first = &cb->work.llnode; } spin_unlock_irq(&signal->lock); @@ -704,7 +704,7 @@ static void __i915_request_ctor(void *arg) rq->file_priv = NULL; rq->capture_list = NULL; - INIT_LIST_HEAD(&rq->execute_cb); + init_llist_head(&rq->execute_cb); } struct i915_request * @@ -794,7 +794,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp) rq->batch = NULL; GEM_BUG_ON(rq->file_priv); GEM_BUG_ON(rq->capture_list); - GEM_BUG_ON(!list_empty(&rq->execute_cb)); + GEM_BUG_ON(!llist_empty(&rq->execute_cb)); /* * Reserve space in the ring buffer for all the commands required to diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 8ec7ee4dbadc..5d4709a3dace 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -214,7 +214,7 @@ struct i915_request { ktime_t emitted; } duration; }; - struct list_head execute_cb; + struct llist_head execute_cb; struct i915_sw_fence semaphore; /* From patchwork Mon May 25 07:53:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11568261 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5AF22739 for ; Mon, 25 May 2020 07:54:30 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4346820776 for ; Mon, 25 May 2020 07:54:30 +0000 
(UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4346820776 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B153889BDB; Mon, 25 May 2020 07:54:29 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id CBDF989BDB for ; Mon, 25 May 2020 07:54:27 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 21284201-1500050 for multiple; Mon, 25 May 2020 08:53:46 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 25 May 2020 08:53:41 +0100 Message-Id: <20200525075347.582-6-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200525075347.582-1-chris@chris-wilson.co.uk> References: <20200525075347.582-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 06/12] dma-buf: Proxy fence, an unsignaled fence placeholder X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Often we need to create a fence for a future event that has not yet been associated with a fence. We can store a proxy fence, a placeholder, in the timeline and replace it later when the real fence is known. Any listeners that attach to the proxy fence will automatically be signaled when the real fence completes, and any future listeners will instead be attach directly to the real fence avoiding any indirection overhead. 
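A minimal usage sketch of the interface added below (hypothetical caller and slot, error handling trimmed): publish the placeholder first, then swap in the real fence once it exists; dma_fence_replace_proxy() hands back the old occupant for the caller to release.

#include <linux/dma-fence-proxy.h>

static struct dma_fence __rcu *timeline_slot;	/* hypothetical slot */

static struct dma_fence *publish_placeholder(void)
{
	struct dma_fence *proxy = dma_fence_create_proxy();

	if (!proxy)
		return NULL;

	/* Listeners may attach to the proxy before the real fence exists. */
	RCU_INIT_POINTER(timeline_slot, proxy);
	return proxy;
}

static void resolve_placeholder(struct dma_fence *real)
{
	struct dma_fence *old;

	/*
	 * Replace the placeholder: existing listeners on the proxy are
	 * signalled when @real completes, later lookups see @real directly.
	 */
	old = dma_fence_replace_proxy(&timeline_slot, real);
	dma_fence_put(old);
}
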
Signed-off-by: Chris Wilson Cc: Lionel Landwerlin Reported-by: kbuild test robot --- drivers/dma-buf/Makefile | 13 +- drivers/dma-buf/dma-fence-private.h | 20 + drivers/dma-buf/dma-fence-proxy.c | 295 +++++++++++ drivers/dma-buf/dma-fence.c | 4 +- drivers/dma-buf/selftests.h | 1 + drivers/dma-buf/st-dma-fence-proxy.c | 752 +++++++++++++++++++++++++++ include/linux/dma-fence-proxy.h | 38 ++ 7 files changed, 1119 insertions(+), 4 deletions(-) create mode 100644 drivers/dma-buf/dma-fence-private.h create mode 100644 drivers/dma-buf/dma-fence-proxy.c create mode 100644 drivers/dma-buf/st-dma-fence-proxy.c create mode 100644 include/linux/dma-fence-proxy.h diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile index 995e05f609ff..afaf6dadd9a3 100644 --- a/drivers/dma-buf/Makefile +++ b/drivers/dma-buf/Makefile @@ -1,6 +1,12 @@ # SPDX-License-Identifier: GPL-2.0-only -obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \ - dma-resv.o seqno-fence.o +obj-y := \ + dma-buf.o \ + dma-fence.o \ + dma-fence-array.o \ + dma-fence-chain.o \ + dma-fence-proxy.o \ + dma-resv.o \ + seqno-fence.o obj-$(CONFIG_DMABUF_HEAPS) += dma-heap.o obj-$(CONFIG_DMABUF_HEAPS) += heaps/ obj-$(CONFIG_SYNC_FILE) += sync_file.o @@ -10,6 +16,7 @@ obj-$(CONFIG_UDMABUF) += udmabuf.o dmabuf_selftests-y := \ selftest.o \ st-dma-fence.o \ - st-dma-fence-chain.o + st-dma-fence-chain.o \ + st-dma-fence-proxy.o obj-$(CONFIG_DMABUF_SELFTESTS) += dmabuf_selftests.o diff --git a/drivers/dma-buf/dma-fence-private.h b/drivers/dma-buf/dma-fence-private.h new file mode 100644 index 000000000000..6924d28af0fa --- /dev/null +++ b/drivers/dma-buf/dma-fence-private.h @@ -0,0 +1,20 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Fence mechanism for dma-buf and to allow for asynchronous dma access + * + * Copyright (C) 2012 Canonical Ltd + * Copyright (C) 2012 Texas Instruments + * + * Authors: + * Rob Clark + * Maarten Lankhorst + */ + +#ifndef DMA_FENCE_PRIVATE_H +#define DMA_FENCE_PRIAVTE_H + +struct dma_fence; + +bool __dma_fence_enable_signaling(struct dma_fence *fence); + +#endif /* DMA_FENCE_PRIAVTE_H */ diff --git a/drivers/dma-buf/dma-fence-proxy.c b/drivers/dma-buf/dma-fence-proxy.c new file mode 100644 index 000000000000..6689cc89c6ad --- /dev/null +++ b/drivers/dma-buf/dma-fence-proxy.c @@ -0,0 +1,295 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * dma-fence-proxy: placeholder unsignaled fence + * + * Copyright (C) 2017-2019 Intel Corporation + */ + +#include +#include +#include +#include +#include + +#include "dma-fence-private.h" + +struct dma_fence_proxy { + struct dma_fence base; + + struct dma_fence *real; + struct dma_fence_cb cb; + struct irq_work work; + + wait_queue_head_t wq; +}; + +#ifdef CONFIG_DEBUG_LOCK_ALLOC +#define same_lockclass(A, B) (A)->dep_map.key == (B)->dep_map.key +#else +#define same_lockclass(A, B) 0 +#endif + +static const char *proxy_get_driver_name(struct dma_fence *fence) +{ + struct dma_fence_proxy *p = container_of(fence, typeof(*p), base); + struct dma_fence *real = READ_ONCE(p->real); + + return real ? real->ops->get_driver_name(real) : "proxy"; +} + +static const char *proxy_get_timeline_name(struct dma_fence *fence) +{ + struct dma_fence_proxy *p = container_of(fence, typeof(*p), base); + struct dma_fence *real = READ_ONCE(p->real); + + return real ? 
real->ops->get_timeline_name(real) : "unset"; +} + +static void proxy_irq_work(struct irq_work *work) +{ + struct dma_fence_proxy *p = container_of(work, typeof(*p), work); + + dma_fence_signal(&p->base); + dma_fence_put(&p->base); +} + +static void proxy_callback(struct dma_fence *real, struct dma_fence_cb *cb) +{ + struct dma_fence_proxy *p = container_of(cb, typeof(*p), cb); + + if (real->error) + dma_fence_set_error(&p->base, real->error); + + /* Lower the height of the proxy chain -> single stack frame */ + irq_work_queue(&p->work); +} + +static bool proxy_enable_signaling(struct dma_fence *fence) +{ + struct dma_fence_proxy *p = container_of(fence, typeof(*p), base); + struct dma_fence *real = READ_ONCE(p->real); + bool ret = true; + + if (real) { + spin_lock_nested(real->lock, + same_lockclass(&p->wq.lock, real->lock)); + ret = __dma_fence_enable_signaling(real); + spin_unlock(real->lock); + } + + return ret; +} + +static void proxy_release(struct dma_fence *fence) +{ + struct dma_fence_proxy *p = container_of(fence, typeof(*p), base); + + dma_fence_put(p->real); + dma_fence_free(&p->base); +} + +const struct dma_fence_ops dma_fence_proxy_ops = { + .get_driver_name = proxy_get_driver_name, + .get_timeline_name = proxy_get_timeline_name, + .enable_signaling = proxy_enable_signaling, + .wait = dma_fence_default_wait, + .release = proxy_release, +}; +EXPORT_SYMBOL_GPL(dma_fence_proxy_ops); + +/** + * __dma_fence_create_proxy - Create an unset dma-fence + * @context: context number to use for proxy fence + * @seqno: sequence number to use for proxy fence + * + * __dma_fence_create_proxy() creates a new dma_fence stub that is initially + * unsignaled and may later be replaced with a real fence. Any listeners + * to the proxy fence will be signaled when the target fence signals its + * completion. + */ +struct dma_fence *__dma_fence_create_proxy(u64 context, u64 seqno) +{ + struct dma_fence_proxy *p; + + p = kzalloc(sizeof(*p), GFP_KERNEL); + if (!p) + return NULL; + + init_waitqueue_head(&p->wq); + dma_fence_init(&p->base, &dma_fence_proxy_ops, &p->wq.lock, + context, seqno); + init_irq_work(&p->work, proxy_irq_work); + + return &p->base; +} +EXPORT_SYMBOL(__dma_fence_create_proxy); + +/** + * dma_fence_create_proxy - Create an unset dma-fence + * + * Wraps __dma_fence_create_proxy() to create a new proxy fence with the + * next available (unique) context id. 
+ */ +struct dma_fence *dma_fence_create_proxy(void) +{ + return __dma_fence_create_proxy(dma_fence_context_alloc(1), 0); +} +EXPORT_SYMBOL(dma_fence_create_proxy); + +static void __wake_up_listeners(struct dma_fence_proxy *p) +{ + struct wait_queue_entry *wait, *next; + + list_for_each_entry_safe(wait, next, &p->wq.head, entry) { + INIT_LIST_HEAD(&wait->entry); + wait->func(wait, TASK_NORMAL, 0, p->real); + } +} + +static void proxy_assign(struct dma_fence *fence, struct dma_fence *real) +{ + struct dma_fence_proxy *p = container_of(fence, typeof(*p), base); + unsigned long flags; + + if (WARN_ON(fence == real)) + return; + + if (WARN_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))) + return; + + if (WARN_ON(p->real)) + return; + + spin_lock_irqsave(&p->wq.lock, flags); + + if (unlikely(!real)) { + dma_fence_signal_locked(&p->base); + goto unlock; + } + + p->real = dma_fence_get(real); + + dma_fence_get(&p->base); + spin_lock_nested(real->lock, same_lockclass(&p->wq.lock, real->lock)); + if (dma_fence_is_signaled_locked(real)) { + proxy_callback(real, &p->cb); + } else if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, + &p->base.flags) && + !__dma_fence_enable_signaling(real)) { + proxy_callback(real, &p->cb); + } else { + p->cb.func = proxy_callback; + list_add_tail(&p->cb.node, &real->cb_list); + } + spin_unlock(real->lock); + +unlock: + __wake_up_listeners(p); + spin_unlock_irqrestore(&p->wq.lock, flags); +} + +/** + * dma_fence_replace_proxy - Replace the proxy fence with the real target + * @slot: pointer to location of fence to update + * @fence: the new fence to store in @slot + * + * Once the real dma_fence is known, we can replace the proxy fence holder + * with a pointer to the real dma fence. Future listeners will attach to + * the real fence, avoiding any indirection overhead. Previous listeners + * will remain attached to the proxy fence, and be signaled in turn when + * the target fence completes. + */ +struct dma_fence * +dma_fence_replace_proxy(struct dma_fence __rcu **slot, struct dma_fence *fence) +{ + struct dma_fence *old; + + if (fence) + dma_fence_get(fence); + + old = rcu_replace_pointer(*slot, fence, true); + if (old && dma_fence_is_proxy(old)) + proxy_assign(old, fence); + + return old; +} +EXPORT_SYMBOL(dma_fence_replace_proxy); + +/** + * dma_fence_proxy_set_real - Set the target of a proxy fence + * @fence: the proxy fence + * @real: the target fence. + * + */ +void dma_fence_proxy_set_real(struct dma_fence *fence, struct dma_fence *real) +{ + if (dma_fence_is_proxy(fence)) + proxy_assign(fence, real); +} +EXPORT_SYMBOL(dma_fence_proxy_set_real); + +/** + * dma_fence_proxy_get_real - Query the target of a proxy fence + * @fence: the proxy fence + * + * Unpeel the proxy fence to see if it has been replaced with a real fence. 
+ */ +struct dma_fence *dma_fence_proxy_get_real(struct dma_fence *fence) +{ + if (dma_fence_is_proxy(fence)) { + struct dma_fence_proxy *p = + container_of(fence, typeof(*p), base); + + if (p->real) + fence = p->real; + } + + return fence; +} +EXPORT_SYMBOL(dma_fence_proxy_get_real); + +void dma_fence_add_proxy_listener(struct dma_fence *fence, + struct wait_queue_entry *wait) +{ + if (dma_fence_is_proxy(fence)) { + struct dma_fence_proxy *p = + container_of(fence, typeof(*p), base); + unsigned long flags; + + spin_lock_irqsave(&p->wq.lock, flags); + if (!p->real) { + list_add_tail(&wait->entry, &p->wq.head); + wait = NULL; + } + fence = p->real; + spin_unlock_irqrestore(&p->wq.lock, flags); + } + + if (wait) { + INIT_LIST_HEAD(&wait->entry); + wait->func(wait, TASK_NORMAL, 0, fence); + } +} +EXPORT_SYMBOL(dma_fence_add_proxy_listener); + +bool dma_fence_remove_proxy_listener(struct dma_fence *fence, + struct wait_queue_entry *wait) +{ + bool ret = false; + + if (dma_fence_is_proxy(fence)) { + struct dma_fence_proxy *p = + container_of(fence, typeof(*p), base); + unsigned long flags; + + spin_lock_irqsave(&p->wq.lock, flags); + if (!list_empty(&wait->entry)) { + list_del_init(&wait->entry); + ret = true; + } + spin_unlock_irqrestore(&p->wq.lock, flags); + } + + return ret; +} +EXPORT_SYMBOL(dma_fence_remove_proxy_listener); diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 656e9ac2d028..329bd033059f 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -19,6 +19,8 @@ #define CREATE_TRACE_POINTS #include +#include "dma-fence-private.h" + EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit); EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal); EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled); @@ -275,7 +277,7 @@ void dma_fence_free(struct dma_fence *fence) } EXPORT_SYMBOL(dma_fence_free); -static bool __dma_fence_enable_signaling(struct dma_fence *fence) +bool __dma_fence_enable_signaling(struct dma_fence *fence) { bool was_set; diff --git a/drivers/dma-buf/selftests.h b/drivers/dma-buf/selftests.h index 55918ef9adab..616eca70e2d8 100644 --- a/drivers/dma-buf/selftests.h +++ b/drivers/dma-buf/selftests.h @@ -12,3 +12,4 @@ selftest(sanitycheck, __sanitycheck__) /* keep first (igt selfcheck) */ selftest(dma_fence, dma_fence) selftest(dma_fence_chain, dma_fence_chain) +selftest(dma_fence_proxy, dma_fence_proxy) diff --git a/drivers/dma-buf/st-dma-fence-proxy.c b/drivers/dma-buf/st-dma-fence-proxy.c new file mode 100644 index 000000000000..c3f210bc4e60 --- /dev/null +++ b/drivers/dma-buf/st-dma-fence-proxy.c @@ -0,0 +1,752 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2019 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "selftest.h" + +static struct kmem_cache *slab_fences; + +static struct mock_fence { + struct dma_fence base; + spinlock_t lock; +} *to_mock_fence(struct dma_fence *f) { + return container_of(f, struct mock_fence, base); +} + +static const char *mock_name(struct dma_fence *f) +{ + return "mock"; +} + +static void mock_fence_release(struct dma_fence *f) +{ + kmem_cache_free(slab_fences, to_mock_fence(f)); +} + +static const struct dma_fence_ops mock_ops = { + .get_driver_name = mock_name, + .get_timeline_name = mock_name, + .release = mock_fence_release, +}; + +static struct dma_fence *mock_fence(void) +{ + struct mock_fence *f; + + f = kmem_cache_alloc(slab_fences, GFP_KERNEL); + if (!f) + return NULL; + + spin_lock_init(&f->lock); + dma_fence_init(&f->base, &mock_ops, &f->lock, 
0, 0); + + return &f->base; +} + +static int sanitycheck(void *arg) +{ + struct dma_fence *f; + + f = dma_fence_create_proxy(); + if (!f) + return -ENOMEM; + + dma_fence_signal(f); + dma_fence_put(f); + + return 0; +} + +struct fences { + struct dma_fence *real; + struct dma_fence *proxy; + struct dma_fence __rcu *slot; +}; + +static int create_fences(struct fences *f, bool attach) +{ + f->proxy = dma_fence_create_proxy(); + if (!f->proxy) + return -ENOMEM; + + RCU_INIT_POINTER(f->slot, f->proxy); + + f->real = mock_fence(); + if (!f->real) { + dma_fence_put(f->proxy); + return -ENOMEM; + } + + if (attach) + dma_fence_replace_proxy(&f->slot, f->real); + + return 0; +} + +static void free_fences(struct fences *f) +{ + dma_fence_put(dma_fence_replace_proxy(&f->slot, NULL)); + + dma_fence_signal(f->real); + dma_fence_put(f->real); + + dma_fence_signal(f->proxy); + dma_fence_put(f->proxy); +} + +static int wrap_target(void *arg) +{ + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, false)) + return -ENOMEM; + + if (dma_fence_proxy_get_real(f.proxy) != f.proxy) { + pr_err("Unwrapped proxy fenced reported a target fence!\n"); + goto err_free; + } + + dma_fence_proxy_set_real(f.proxy, f.real); + rcu_assign_pointer(f.slot, dma_fence_get(f.real)); /* free_fences() */ + + if (dma_fence_proxy_get_real(f.proxy) != f.real) { + pr_err("Wrapped proxy fenced did not report the target fence!\n"); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +static int wrap_proxy(void *arg) +{ + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, true)) + return -ENOMEM; + + if (dma_fence_proxy_get_real(f.proxy) != f.real) { + pr_err("Wrapped proxy fenced did not report the target fence!\n"); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +static int wrap_signaling(void *arg) +{ + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, true)) + return -ENOMEM; + + if (dma_fence_is_signaled(f.proxy)) { + pr_err("Fence unexpectedly signaled on creation\n"); + goto err_free; + } + + if (dma_fence_signal(f.real)) { + pr_err("Fence reported being already signaled\n"); + goto err_free; + } + + if (!dma_fence_is_signaled(f.proxy)) { + pr_err("Fence not reporting signaled\n"); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +static int wrap_signaling_recurse(void *arg) +{ + struct fences f; + struct dma_fence *chain; + int err = -EINVAL; + + if (create_fences(&f, false)) + return -ENOMEM; + + chain = dma_fence_create_proxy(); + if (!chain) { + err = -ENOMEM; + goto err_free; + } + + dma_fence_replace_proxy(&f.slot, chain); + dma_fence_put(dma_fence_replace_proxy(&f.slot, f.real)); + dma_fence_put(chain); + + /* f.real <- chain <- f.proxy */ + + if (dma_fence_is_signaled(f.proxy)) { + pr_err("Fence unexpectedly signaled on creation\n"); + goto err_free; + } + + if (dma_fence_signal(f.real)) { + pr_err("Fence reported being already signaled\n"); + goto err_free; + } + + if (!dma_fence_is_signaled(f.proxy)) { + pr_err("Fence not reporting signaled\n"); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +struct simple_cb { + struct dma_fence_cb cb; + bool seen; +}; + +static void simple_callback(struct dma_fence *f, struct dma_fence_cb *cb) +{ + /* Ensure the callback marker is visible, no excuses for missing it! 
*/ + smp_store_mb(container_of(cb, struct simple_cb, cb)->seen, true); +} + +static int wrap_add_callback(void *arg) +{ + struct simple_cb cb = {}; + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, true)) + return -ENOMEM; + + if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) { + pr_err("Failed to add callback, fence already signaled!\n"); + goto err_free; + } + + dma_fence_signal(f.real); + if (!cb.seen) { + pr_err("Callback failed!\n"); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +static int wrap_add_callback_recurse(void *arg) +{ + struct simple_cb cb = {}; + struct dma_fence *chain; + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, false)) + return -ENOMEM; + + chain = dma_fence_create_proxy(); + if (!chain) { + err = -ENOMEM; + goto err_free; + } + + dma_fence_replace_proxy(&f.slot, chain); + dma_fence_put(dma_fence_replace_proxy(&f.slot, f.real)); + dma_fence_put(chain); + + /* f.real <- chain <- f.proxy */ + + if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) { + pr_err("Failed to add callback, fence already signaled!\n"); + goto err_free; + } + + dma_fence_signal(f.real); + if (!cb.seen) { + pr_err("Callback failed!\n"); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +static int wrap_late_add_callback(void *arg) +{ + struct simple_cb cb = {}; + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, true)) + return -ENOMEM; + + dma_fence_signal(f.real); + + if (!dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) { + pr_err("Added callback, but fence was already signaled!\n"); + goto err_free; + } + + dma_fence_signal(f.real); + if (cb.seen) { + pr_err("Callback called after failed attachment!\n"); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +static int wrap_early_add_callback(void *arg) +{ + struct simple_cb cb = {}; + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, false)) + return -ENOMEM; + + if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) { + pr_err("Failed to add callback, fence already signaled!\n"); + goto err_free; + } + + dma_fence_replace_proxy(&f.slot, f.real); + dma_fence_signal(f.real); + if (!cb.seen) { + pr_err("Callback failed!\n"); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +static int wrap_early_add_callback_late(void *arg) +{ + struct simple_cb cb = {}; + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, false)) + return -ENOMEM; + + dma_fence_signal(f.real); + + if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) { + pr_err("Failed to add callback, fence already signaled!\n"); + goto err_free; + } + + dma_fence_replace_proxy(&f.slot, f.real); + dma_fence_signal(f.real); + if (!cb.seen) { + pr_err("Callback failed!\n"); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +static int wrap_early_add_callback_early(void *arg) +{ + struct simple_cb cb = {}; + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, false)) + return -ENOMEM; + + if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) { + pr_err("Failed to add callback, fence already signaled!\n"); + goto err_free; + } + + dma_fence_replace_proxy(&f.slot, f.real); + dma_fence_signal(f.real); + if (!cb.seen) { + pr_err("Callback failed!\n"); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +static int wrap_rm_callback(void *arg) +{ + struct simple_cb cb 
= {}; + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, true)) + return -ENOMEM; + + if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) { + pr_err("Failed to add callback, fence already signaled!\n"); + goto err_free; + } + + if (!dma_fence_remove_callback(f.proxy, &cb.cb)) { + pr_err("Failed to remove callback!\n"); + goto err_free; + } + + dma_fence_signal(f.real); + if (cb.seen) { + pr_err("Callback still signaled after removal!\n"); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +static int wrap_late_rm_callback(void *arg) +{ + struct simple_cb cb = {}; + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, true)) + return -ENOMEM; + + if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) { + pr_err("Failed to add callback, fence already signaled!\n"); + goto err_free; + } + + dma_fence_signal(f.real); + if (!cb.seen) { + pr_err("Callback failed!\n"); + goto err_free; + } + + if (dma_fence_remove_callback(f.proxy, &cb.cb)) { + pr_err("Callback removal succeed after being executed!\n"); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +static int wrap_status(void *arg) +{ + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, true)) + return -ENOMEM; + + if (dma_fence_get_status(f.proxy)) { + pr_err("Fence unexpectedly has signaled status on creation\n"); + goto err_free; + } + + dma_fence_signal(f.real); + if (!dma_fence_get_status(f.proxy)) { + pr_err("Fence not reporting signaled status\n"); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +static int wrap_error(void *arg) +{ + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, true)) + return -ENOMEM; + + dma_fence_set_error(f.real, -EIO); + + if (dma_fence_get_status(f.proxy)) { + pr_err("Fence unexpectedly has error status before signal\n"); + goto err_free; + } + + dma_fence_signal(f.real); + if (dma_fence_get_status(f.proxy) != -EIO) { + pr_err("Fence not reporting error status, got %d\n", + dma_fence_get_status(f.proxy)); + goto err_free; + } + + err = 0; +err_free: + free_fences(&f); + return err; +} + +static int wrap_wait(void *arg) +{ + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, true)) + return -ENOMEM; + + if (dma_fence_wait_timeout(f.proxy, false, 0) != 0) { + pr_err("Wait reported complete before being signaled\n"); + goto err_free; + } + + dma_fence_signal(f.real); + + if (dma_fence_wait_timeout(f.proxy, false, 0) == 0) { + pr_err("Wait reported incomplete after being signaled\n"); + goto err_free; + } + + err = 0; +err_free: + dma_fence_signal(f.real); + free_fences(&f); + return err; +} + +struct wait_timer { + struct timer_list timer; + struct fences f; +}; + +static void wait_timer(struct timer_list *timer) +{ + struct wait_timer *wt = from_timer(wt, timer, timer); + + dma_fence_signal(wt->f.real); +} + +static int wrap_wait_timeout(void *arg) +{ + struct wait_timer wt; + int err = -EINVAL; + + if (create_fences(&wt.f, true)) + return -ENOMEM; + + timer_setup_on_stack(&wt.timer, wait_timer, 0); + + if (dma_fence_wait_timeout(wt.f.proxy, false, 1) != 0) { + pr_err("Wait reported complete before being signaled\n"); + goto err_free; + } + + mod_timer(&wt.timer, jiffies + 1); + + if (dma_fence_wait_timeout(wt.f.proxy, false, 2) != 0) { + if (timer_pending(&wt.timer)) { + pr_notice("Timer did not fire within the jiffie!\n"); + err = 0; /* not our fault! 
*/ + } else { + pr_err("Wait reported incomplete after timeout\n"); + } + goto err_free; + } + + err = 0; +err_free: + del_timer_sync(&wt.timer); + destroy_timer_on_stack(&wt.timer); + dma_fence_signal(wt.f.real); + free_fences(&wt.f); + return err; +} + +struct proxy_wait { + struct wait_queue_entry base; + struct dma_fence *fence; + bool seen; +}; + +static int proxy_wait_cb(struct wait_queue_entry *entry, + unsigned int mode, int flags, void *key) +{ + struct proxy_wait *p = container_of(entry, typeof(*p), base); + + p->fence = key; + p->seen = true; + + return 0; +} + +static int wrap_listen_early(void *arg) +{ + struct proxy_wait wait = { .base.func = proxy_wait_cb }; + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, false)) + return -ENOMEM; + + dma_fence_replace_proxy(&f.slot, f.real); + dma_fence_add_proxy_listener(f.proxy, &wait.base); + + if (!wait.seen) { + pr_err("Proxy listener was not called after replace!\n"); + err = -EINVAL; + goto err_free; + } + + if (wait.fence != f.real) { + pr_err("Proxy listener was not passed the real fence!\n"); + err = -EINVAL; + goto err_free; + } + + err = 0; +err_free: + dma_fence_signal(f.real); + free_fences(&f); + return err; +} + +static int wrap_listen_late(void *arg) +{ + struct proxy_wait wait = { .base.func = proxy_wait_cb }; + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, false)) + return -ENOMEM; + + dma_fence_add_proxy_listener(f.proxy, &wait.base); + dma_fence_replace_proxy(&f.slot, f.real); + + if (!wait.seen) { + pr_err("Proxy listener was not called on replace!\n"); + err = -EINVAL; + goto err_free; + } + + if (wait.fence != f.real) { + pr_err("Proxy listener was not passed the real fence!\n"); + err = -EINVAL; + goto err_free; + } + + err = 0; +err_free: + dma_fence_signal(f.real); + free_fences(&f); + return err; +} + +static int wrap_listen_cancel(void *arg) +{ + struct proxy_wait wait = { .base.func = proxy_wait_cb }; + struct fences f; + int err = -EINVAL; + + if (create_fences(&f, false)) + return -ENOMEM; + + dma_fence_add_proxy_listener(f.proxy, &wait.base); + if (!dma_fence_remove_proxy_listener(f.proxy, &wait.base)) { + pr_err("Cancelling listener, already detached?\n"); + err = -EINVAL; + goto err_free; + } + dma_fence_replace_proxy(&f.slot, f.real); + + if (wait.seen) { + pr_err("Proxy listener was called after being removed!\n"); + err = -EINVAL; + goto err_free; + } + + if (dma_fence_remove_proxy_listener(f.proxy, &wait.base)) { + pr_err("Double listener cancellation!\n"); + err = -EINVAL; + goto err_free; + } + + err = 0; +err_free: + dma_fence_signal(f.real); + free_fences(&f); + return err; +} + +int dma_fence_proxy(void) +{ + static const struct subtest tests[] = { + SUBTEST(sanitycheck), + SUBTEST(wrap_target), + SUBTEST(wrap_proxy), + SUBTEST(wrap_signaling), + SUBTEST(wrap_signaling_recurse), + SUBTEST(wrap_add_callback), + SUBTEST(wrap_add_callback_recurse), + SUBTEST(wrap_late_add_callback), + SUBTEST(wrap_early_add_callback), + SUBTEST(wrap_early_add_callback_late), + SUBTEST(wrap_early_add_callback_early), + SUBTEST(wrap_rm_callback), + SUBTEST(wrap_late_rm_callback), + SUBTEST(wrap_status), + SUBTEST(wrap_error), + SUBTEST(wrap_wait), + SUBTEST(wrap_wait_timeout), + SUBTEST(wrap_listen_early), + SUBTEST(wrap_listen_late), + SUBTEST(wrap_listen_cancel), + }; + int ret; + + slab_fences = KMEM_CACHE(mock_fence, + SLAB_TYPESAFE_BY_RCU | + SLAB_HWCACHE_ALIGN); + if (!slab_fences) + return -ENOMEM; + + ret = subtests(tests, NULL); + + kmem_cache_destroy(slab_fences); + + 
return ret; +} diff --git a/include/linux/dma-fence-proxy.h b/include/linux/dma-fence-proxy.h new file mode 100644 index 000000000000..6a986b5bb009 --- /dev/null +++ b/include/linux/dma-fence-proxy.h @@ -0,0 +1,38 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * dma-fence-proxy: allows waiting upon unset and future fences + * + * Copyright (C) 2017 Intel Corporation + */ + +#ifndef __LINUX_DMA_FENCE_PROXY_H +#define __LINUX_DMA_FENCE_PROXY_H + +#include +#include + +struct wait_queue_entry; + +extern const struct dma_fence_ops dma_fence_proxy_ops; + +struct dma_fence *__dma_fence_create_proxy(u64 context, u64 seqno); +struct dma_fence *dma_fence_create_proxy(void); + +static inline bool dma_fence_is_proxy(struct dma_fence *fence) +{ + return fence->ops == &dma_fence_proxy_ops; +} + +void dma_fence_proxy_set_real(struct dma_fence *fence, struct dma_fence *real); +struct dma_fence *dma_fence_proxy_get_real(struct dma_fence *fence); + +struct dma_fence * +dma_fence_replace_proxy(struct dma_fence __rcu **slot, + struct dma_fence *fence); + +void dma_fence_add_proxy_listener(struct dma_fence *fence, + struct wait_queue_entry *wait); +bool dma_fence_remove_proxy_listener(struct dma_fence *fence, + struct wait_queue_entry *wait); + +#endif /* __LINUX_DMA_FENCE_PROXY_H */ From patchwork Mon May 25 07:53:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11568253 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3691914B7 for ; Mon, 25 May 2020 07:54:11 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1F9022071A for ; Mon, 25 May 2020 07:54:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1F9022071A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7C95E89CA4; Mon, 25 May 2020 07:54:03 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 90964899D5 for ; Mon, 25 May 2020 07:53:59 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 21284202-1500050 for multiple; Mon, 25 May 2020 08:53:46 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 25 May 2020 08:53:42 +0100 Message-Id: <20200525075347.582-7-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200525075347.582-1-chris@chris-wilson.co.uk> References: <20200525075347.582-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 07/12] drm/i915: Unpeel awaits on a proxy fence X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" If the real target for a proxy fence is known at the time we are attaching our awaits, use the real target in preference to hooking up to the proxy. If use the real target instead, we can optimize the awaits, e.g. if it along the same engine, we can order the submission and avoid the wait-for-completion. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_request.c | 157 ++++++++++++++++++++++++++ drivers/gpu/drm/i915/i915_scheduler.c | 41 +++++++ drivers/gpu/drm/i915/i915_scheduler.h | 3 + 3 files changed, 201 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 22df5b229aed..945f47b216f7 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -24,6 +24,7 @@ #include #include +#include #include #include #include @@ -408,6 +409,7 @@ static bool fatal_error(int error) case 0: /* not an error! */ case -EAGAIN: /* innocent victim of a GT reset (__i915_request_reset) */ case -ETIMEDOUT: /* waiting for Godot (timer_i915_sw_fence_wake) */ + case -EDEADLK: /* cyclic fence lockup (await_proxy) */ return false; default: return true; @@ -1135,6 +1137,156 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence) return err; } +struct await_proxy { + struct wait_queue_entry base; + struct i915_request *request; + struct dma_fence *fence; + struct timer_list timer; + struct work_struct work; + int (*attach)(struct await_proxy *ap); + void *data; +}; + +static void await_proxy_work(struct work_struct *work) +{ + struct await_proxy *ap = container_of(work, typeof(*ap), work); + struct i915_request *rq = ap->request; + + del_timer_sync(&ap->timer); + + if (ap->fence) { + int err = 0; + + /* + * If the fence is external, we impose a 10s timeout. + * However, if the fence is internal, we skip a timeout in + * the belief that all fences are in-order (DAG, no cycles) + * and we can enforce forward progress by reset the GPU if + * necessary. A future fence, provided userspace, can trivially + * generate a cycle in the dependency graph, and so cause + * that entire cycle to become deadlocked and for no forward + * progress to either be made, and the driver being kept + * eternally awake. 
+ */ + if (dma_fence_is_i915(ap->fence) && + !i915_sched_node_verify_dag(&rq->sched, + &to_request(ap->fence)->sched)) + err = -EDEADLK; + + if (!err) { + mutex_lock(&rq->context->timeline->mutex); + err = ap->attach(ap); + mutex_unlock(&rq->context->timeline->mutex); + } + + /* Don't flag an error for co-dependent scheduling */ + if (err == -EDEADLK) { + struct i915_sched_node *waiter = + &to_request(ap->fence)->sched; + struct i915_dependency *p; + + list_for_each_entry_lockless(p, + &rq->sched.waiters_list, + wait_link) { + if (p->waiter == waiter && + p->flags & I915_DEPENDENCY_WEAK) { + err = 0; + break; + } + } + } + + if (err < 0) + i915_sw_fence_set_error_once(&rq->submit, err); + } + + i915_sw_fence_complete(&rq->submit); + + dma_fence_put(ap->fence); + kfree(ap); +} + +static int +await_proxy_wake(struct wait_queue_entry *entry, + unsigned int mode, + int flags, + void *fence) +{ + struct await_proxy *ap = container_of(entry, typeof(*ap), base); + + ap->fence = dma_fence_get(fence); + schedule_work(&ap->work); + + return 0; +} + +static void +await_proxy_timer(struct timer_list *t) +{ + struct await_proxy *ap = container_of(t, typeof(*ap), timer); + + if (dma_fence_remove_proxy_listener(ap->base.private, &ap->base)) { + struct i915_request *rq = ap->request; + + pr_notice("Asynchronous wait on unset proxy fence by %s:%s:%llx timed out\n", + rq->fence.ops->get_driver_name(&rq->fence), + rq->fence.ops->get_timeline_name(&rq->fence), + rq->fence.seqno); + i915_sw_fence_set_error_once(&rq->submit, -ETIMEDOUT); + + schedule_work(&ap->work); + } +} + +static int +__i915_request_await_proxy(struct i915_request *rq, + struct dma_fence *fence, + unsigned long timeout, + int (*attach)(struct await_proxy *ap), + void *data) +{ + struct await_proxy *ap; + + ap = kzalloc(sizeof(*ap), I915_FENCE_GFP); + if (!ap) + return -ENOMEM; + + i915_sw_fence_await(&rq->submit); + mark_external(rq); + + ap->base.private = fence; + ap->base.func = await_proxy_wake; + ap->request = rq; + INIT_WORK(&ap->work, await_proxy_work); + ap->attach = attach; + ap->data = data; + + timer_setup(&ap->timer, await_proxy_timer, 0); + if (timeout) + mod_timer(&ap->timer, round_jiffies_up(jiffies + timeout)); + + dma_fence_add_proxy_listener(fence, &ap->base); + return 0; +} + +static int await_proxy(struct await_proxy *ap) +{ + return i915_request_await_dma_fence(ap->request, ap->fence); +} + +static int +i915_request_await_proxy(struct i915_request *rq, struct dma_fence *fence) +{ + /* + * Wait until we know the real fence so that can optimise the + * inter-fence synchronisation. + */ + return __i915_request_await_proxy(rq, fence, + i915_fence_context_timeout(rq->i915, + fence->context), + await_proxy, NULL); +} + int i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence) { @@ -1142,6 +1294,9 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence) unsigned int nchild = 1; int ret; + /* Unpeel the proxy fence if the real target is already known */ + fence = dma_fence_proxy_get_real(fence); + /* * Note that if the fence-array was created in signal-on-any mode, * we should *not* decompose it into its individual fences. 
However, @@ -1181,6 +1336,8 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence) if (dma_fence_is_i915(fence)) ret = i915_request_await_request(rq, to_request(fence)); + else if (dma_fence_is_proxy(fence)) + ret = i915_request_await_proxy(rq, fence); else ret = i915_request_await_external(rq, fence); if (ret < 0) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index cbb880b10c65..250832768279 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -469,6 +469,47 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node, return 0; } +bool i915_sched_node_verify_dag(struct i915_sched_node *waiter, + struct i915_sched_node *signaler) +{ + struct i915_dependency *dep, *p; + struct i915_dependency stack; + bool result = false; + LIST_HEAD(dfs); + + if (list_empty(&waiter->waiters_list)) + return true; + + spin_lock_irq(&schedule_lock); + + stack.signaler = signaler; + list_add(&stack.dfs_link, &dfs); + + list_for_each_entry(dep, &dfs, dfs_link) { + struct i915_sched_node *node = dep->signaler; + + if (node_signaled(node)) + continue; + + list_for_each_entry(p, &node->signalers_list, signal_link) { + if (p->signaler == waiter) + goto out; + + if (list_empty(&p->dfs_link)) + list_add_tail(&p->dfs_link, &dfs); + } + } + + result = true; +out: + list_for_each_entry_safe(dep, p, &dfs, dfs_link) + INIT_LIST_HEAD(&dep->dfs_link); + + spin_unlock_irq(&schedule_lock); + + return result; +} + void i915_sched_node_fini(struct i915_sched_node *node) { struct i915_dependency *dep, *tmp; diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 6f0bf00fc569..13432add8929 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -28,6 +28,9 @@ void i915_sched_node_init(struct i915_sched_node *node); void i915_sched_node_reinit(struct i915_sched_node *node); +bool i915_sched_node_verify_dag(struct i915_sched_node *waiter, + struct i915_sched_node *signal); + bool __i915_sched_node_add_dependency(struct i915_sched_node *node, struct i915_sched_node *signal, struct i915_dependency *dep, From patchwork Mon May 25 07:53:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11568239 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E68FA739 for ; Mon, 25 May 2020 07:54:02 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B2C742071A for ; Mon, 25 May 2020 07:54:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B2C742071A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7C6CE899AB; Mon, 25 May 2020 07:54:01 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8E2AB899AB 
for ; Mon, 25 May 2020 07:53:59 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 21284203-1500050 for multiple; Mon, 25 May 2020 08:53:46 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 25 May 2020 08:53:43 +0100 Message-Id: <20200525075347.582-8-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200525075347.582-1-chris@chris-wilson.co.uk> References: <20200525075347.582-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 08/12] drm/i915: Add list_for_each_entry_safe_continue_reverse X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" One more list iterator variant, for when we want to unwind from inside one list iterator with the intention of restarting from the current entry as the new head of the list. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_utils.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h index 03a73d2bd50d..6ebccdd12d4c 100644 --- a/drivers/gpu/drm/i915/i915_utils.h +++ b/drivers/gpu/drm/i915/i915_utils.h @@ -266,6 +266,12 @@ static inline int list_is_last_rcu(const struct list_head *list, return READ_ONCE(list->next) == head; } +#define list_for_each_entry_safe_continue_reverse(pos, n, head, member) \ + for (pos = list_prev_entry(pos, member), \ + n = list_prev_entry(pos, member); \ + &pos->member != (head); \ + pos = n, n = list_prev_entry(n, member)) + /* * Wait until the work is finally complete, even if it tries to postpone * by requeueing itself. 
Note, that if the worker never cancels itself, From patchwork Mon May 25 07:53:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11568251 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6C950739 for ; Mon, 25 May 2020 07:54:10 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 553F02071A for ; Mon, 25 May 2020 07:54:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 553F02071A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9908D899BE; Mon, 25 May 2020 07:54:02 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9218A899DE for ; Mon, 25 May 2020 07:54:00 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 21284204-1500050 for multiple; Mon, 25 May 2020 08:53:47 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 25 May 2020 08:53:44 +0100 Message-Id: <20200525075347.582-9-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200525075347.582-1-chris@chris-wilson.co.uk> References: <20200525075347.582-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 09/12] drm/i915/gem: Async GPU relocations only X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Reduce the 3 relocation paths down to the single path that accommodates all. The primary motivation for this is to guard the relocations with a natural fence (derived from the i915_request used to write the relocation from the GPU). The tradeoff in using async gpu relocations is that it increases latency over using direct CPU relocations, for the cases where the target is idle and accessible by the CPU. The benefit is greatly reduced lock contention and improved concurrency by pipelining. Note that forcing the async gpu relocations does reveal a few issues they have. Firstly, they are visible as writes to gem_busy, causing us to mark some buffers as being written to by the GPU even though userspace only reads. Secondly, in combination with the cmdparser, they can cause priority inversions: the work is put into a common workqueue, losing our priority information, and so is executed in FIFO order from the worker, denying us the opportunity to reorder the requests afterwards. 
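To make the new path concrete: instead of kmap/iomap plus a CPU store, a relocation is now patched by emitting a store-immediate command into a small batch that the GPU executes, so the write is naturally ordered by the fence of the request that carries it. A rough, self-contained C sketch of that emission follows; the opcode is a placeholder (not the real MI_STORE_DWORD_IMM encoding) and all names are invented for the sketch.

/* Toy model of GPU-side relocation: append "store qword at GTT offset"
 * commands to a command-stream buffer instead of writing with the CPU. */
#include <stdint.h>
#include <stdio.h>

#define FAKE_STORE_QWORD_IMM 0x20000000u	/* placeholder, not a real opcode */

static uint32_t *emit_reloc64(uint32_t *cs, uint64_t gtt_offset, uint64_t value)
{
	*cs++ = FAKE_STORE_QWORD_IMM;
	*cs++ = (uint32_t)gtt_offset;		/* destination, low dword */
	*cs++ = (uint32_t)(gtt_offset >> 32);	/* destination, high dword */
	*cs++ = (uint32_t)value;		/* patched address, low dword */
	*cs++ = (uint32_t)(value >> 32);	/* patched address, high dword */
	return cs;
}

int main(void)
{
	uint32_t batch[16], *cs = batch;

	/* patch the qword at offset 0x40 to point at 0x0000000100002000 */
	cs = emit_reloc64(cs, 0x40, 0x100002000ull);

	for (uint32_t *p = batch; p < cs; p++)
		printf("0x%08x\n", *p);

	return 0;
}

The point of the sketch is not the encoding but the ordering: because the store only happens when the GPU runs this batch, anything that waits on the relocation request's fence is guaranteed to see the patched value.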
Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 295 ++---------------- .../i915/gem/selftests/i915_gem_execbuffer.c | 21 +- 2 files changed, 27 insertions(+), 289 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 219a36995b96..e840b2a85284 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -45,13 +45,6 @@ struct eb_vma_array { struct eb_vma vma[]; }; -enum { - FORCE_CPU_RELOC = 1, - FORCE_GTT_RELOC, - FORCE_GPU_RELOC, -#define DBG_FORCE_RELOC 0 /* choose one of the above! */ -}; - #define __EXEC_OBJECT_HAS_PIN BIT(31) #define __EXEC_OBJECT_HAS_FENCE BIT(30) #define __EXEC_OBJECT_NEEDS_MAP BIT(29) @@ -260,8 +253,6 @@ struct i915_execbuffer { */ struct reloc_cache { struct drm_mm_node node; /** temporary GTT binding */ - unsigned long vaddr; /** Current kmap address */ - unsigned long page; /** Currently mapped page index */ unsigned int gen; /** Cached value of INTEL_GEN */ bool use_64bit_reloc : 1; bool has_llc : 1; @@ -605,23 +596,6 @@ eb_add_vma(struct i915_execbuffer *eb, } } -static inline int use_cpu_reloc(const struct reloc_cache *cache, - const struct drm_i915_gem_object *obj) -{ - if (!i915_gem_object_has_struct_page(obj)) - return false; - - if (DBG_FORCE_RELOC == FORCE_CPU_RELOC) - return true; - - if (DBG_FORCE_RELOC == FORCE_GTT_RELOC) - return false; - - return (cache->has_llc || - obj->cache_dirty || - obj->cache_level != I915_CACHE_NONE); -} - static int eb_reserve_vma(const struct i915_execbuffer *eb, struct eb_vma *ev, u64 pin_flags) @@ -945,8 +919,6 @@ relocation_target(const struct drm_i915_gem_relocation_entry *reloc, static void reloc_cache_init(struct reloc_cache *cache, struct drm_i915_private *i915) { - cache->page = -1; - cache->vaddr = 0; /* Must be a variable in the struct to allow GCC to unroll. 
*/ cache->gen = INTEL_GEN(i915); cache->has_llc = HAS_LLC(i915); @@ -1089,181 +1061,6 @@ static int reloc_gpu_flush(struct reloc_cache *cache) return err; } -static void reloc_cache_reset(struct reloc_cache *cache) -{ - void *vaddr; - - if (!cache->vaddr) - return; - - vaddr = unmask_page(cache->vaddr); - if (cache->vaddr & KMAP) { - if (cache->vaddr & CLFLUSH_AFTER) - mb(); - - kunmap_atomic(vaddr); - i915_gem_object_finish_access((struct drm_i915_gem_object *)cache->node.mm); - } else { - struct i915_ggtt *ggtt = cache_to_ggtt(cache); - - intel_gt_flush_ggtt_writes(ggtt->vm.gt); - io_mapping_unmap_atomic((void __iomem *)vaddr); - - if (drm_mm_node_allocated(&cache->node)) { - ggtt->vm.clear_range(&ggtt->vm, - cache->node.start, - cache->node.size); - mutex_lock(&ggtt->vm.mutex); - drm_mm_remove_node(&cache->node); - mutex_unlock(&ggtt->vm.mutex); - } else { - i915_vma_unpin((struct i915_vma *)cache->node.mm); - } - } - - cache->vaddr = 0; - cache->page = -1; -} - -static void *reloc_kmap(struct drm_i915_gem_object *obj, - struct reloc_cache *cache, - unsigned long page) -{ - void *vaddr; - - if (cache->vaddr) { - kunmap_atomic(unmask_page(cache->vaddr)); - } else { - unsigned int flushes; - int err; - - err = i915_gem_object_prepare_write(obj, &flushes); - if (err) - return ERR_PTR(err); - - BUILD_BUG_ON(KMAP & CLFLUSH_FLAGS); - BUILD_BUG_ON((KMAP | CLFLUSH_FLAGS) & PAGE_MASK); - - cache->vaddr = flushes | KMAP; - cache->node.mm = (void *)obj; - if (flushes) - mb(); - } - - vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj, page)); - cache->vaddr = unmask_flags(cache->vaddr) | (unsigned long)vaddr; - cache->page = page; - - return vaddr; -} - -static void *reloc_iomap(struct drm_i915_gem_object *obj, - struct reloc_cache *cache, - unsigned long page) -{ - struct i915_ggtt *ggtt = cache_to_ggtt(cache); - unsigned long offset; - void *vaddr; - - if (cache->vaddr) { - intel_gt_flush_ggtt_writes(ggtt->vm.gt); - io_mapping_unmap_atomic((void __force __iomem *) unmask_page(cache->vaddr)); - } else { - struct i915_vma *vma; - int err; - - if (i915_gem_object_is_tiled(obj)) - return ERR_PTR(-EINVAL); - - if (use_cpu_reloc(cache, obj)) - return NULL; - - i915_gem_object_lock(obj); - err = i915_gem_object_set_to_gtt_domain(obj, true); - i915_gem_object_unlock(obj); - if (err) - return ERR_PTR(err); - - vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, - PIN_MAPPABLE | - PIN_NONBLOCK /* NOWARN */ | - PIN_NOEVICT); - if (IS_ERR(vma)) { - memset(&cache->node, 0, sizeof(cache->node)); - mutex_lock(&ggtt->vm.mutex); - err = drm_mm_insert_node_in_range - (&ggtt->vm.mm, &cache->node, - PAGE_SIZE, 0, I915_COLOR_UNEVICTABLE, - 0, ggtt->mappable_end, - DRM_MM_INSERT_LOW); - mutex_unlock(&ggtt->vm.mutex); - if (err) /* no inactive aperture space, use cpu reloc */ - return NULL; - } else { - cache->node.start = vma->node.start; - cache->node.mm = (void *)vma; - } - } - - offset = cache->node.start; - if (drm_mm_node_allocated(&cache->node)) { - ggtt->vm.insert_page(&ggtt->vm, - i915_gem_object_get_dma_address(obj, page), - offset, I915_CACHE_NONE, 0); - } else { - offset += page << PAGE_SHIFT; - } - - vaddr = (void __force *)io_mapping_map_atomic_wc(&ggtt->iomap, - offset); - cache->page = page; - cache->vaddr = (unsigned long)vaddr; - - return vaddr; -} - -static void *reloc_vaddr(struct drm_i915_gem_object *obj, - struct reloc_cache *cache, - unsigned long page) -{ - void *vaddr; - - if (cache->page == page) { - vaddr = unmask_page(cache->vaddr); - } else { - vaddr = NULL; - if ((cache->vaddr & KMAP) == 
0) - vaddr = reloc_iomap(obj, cache, page); - if (!vaddr) - vaddr = reloc_kmap(obj, cache, page); - } - - return vaddr; -} - -static void clflush_write32(u32 *addr, u32 value, unsigned int flushes) -{ - if (unlikely(flushes & (CLFLUSH_BEFORE | CLFLUSH_AFTER))) { - if (flushes & CLFLUSH_BEFORE) { - clflushopt(addr); - mb(); - } - - *addr = value; - - /* - * Writes to the same cacheline are serialised by the CPU - * (including clflush). On the write path, we only require - * that it hits memory in an orderly fashion and place - * mb barriers at the start and end of the relocation phase - * to ensure ordering of clflush wrt to the system. - */ - if (flushes & CLFLUSH_AFTER) - clflushopt(addr); - } else - *addr = value; -} - static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma) { struct drm_i915_gem_object *obj = vma->obj; @@ -1429,17 +1226,6 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb, return cmd; } -static inline bool use_reloc_gpu(struct i915_vma *vma) -{ - if (DBG_FORCE_RELOC == FORCE_GPU_RELOC) - return true; - - if (DBG_FORCE_RELOC) - return false; - - return !dma_resv_test_signaled_rcu(vma->resv, true); -} - static unsigned long vma_phys_addr(struct i915_vma *vma, u32 offset) { struct page *page; @@ -1454,10 +1240,10 @@ static unsigned long vma_phys_addr(struct i915_vma *vma, u32 offset) return addr + offset_in_page(offset); } -static bool __reloc_entry_gpu(struct i915_execbuffer *eb, - struct i915_vma *vma, - u64 offset, - u64 target_addr) +static int __reloc_entry_gpu(struct i915_execbuffer *eb, + struct i915_vma *vma, + u64 offset, + u64 target_addr) { const unsigned int gen = eb->reloc_cache.gen; unsigned int len; @@ -1473,7 +1259,7 @@ static bool __reloc_entry_gpu(struct i915_execbuffer *eb, batch = reloc_gpu(eb, vma, len); if (IS_ERR(batch)) - return false; + return PTR_ERR(batch); addr = gen8_canonical_addr(vma->node.start + offset); if (gen >= 8) { @@ -1522,55 +1308,21 @@ static bool __reloc_entry_gpu(struct i915_execbuffer *eb, *batch++ = target_addr; } - return true; -} - -static bool reloc_entry_gpu(struct i915_execbuffer *eb, - struct i915_vma *vma, - u64 offset, - u64 target_addr) -{ - if (eb->reloc_cache.vaddr) - return false; - - if (!use_reloc_gpu(vma)) - return false; - - return __reloc_entry_gpu(eb, vma, offset, target_addr); + return 0; } static u64 -relocate_entry(struct i915_vma *vma, +relocate_entry(struct i915_execbuffer *eb, + struct i915_vma *vma, const struct drm_i915_gem_relocation_entry *reloc, - struct i915_execbuffer *eb, const struct i915_vma *target) { u64 target_addr = relocation_target(reloc, target); - u64 offset = reloc->offset; - - if (!reloc_entry_gpu(eb, vma, offset, target_addr)) { - bool wide = eb->reloc_cache.use_64bit_reloc; - void *vaddr; - -repeat: - vaddr = reloc_vaddr(vma->obj, - &eb->reloc_cache, - offset >> PAGE_SHIFT); - if (IS_ERR(vaddr)) - return PTR_ERR(vaddr); - - GEM_BUG_ON(!IS_ALIGNED(offset, sizeof(u32))); - clflush_write32(vaddr + offset_in_page(offset), - lower_32_bits(target_addr), - eb->reloc_cache.vaddr); - - if (wide) { - offset += sizeof(u32); - target_addr >>= 32; - wide = false; - goto repeat; - } - } + int err; + + err = __reloc_entry_gpu(eb, vma, reloc->offset, target_addr); + if (err) + return err; return target->node.start | UPDATE; } @@ -1635,8 +1387,7 @@ eb_relocate_entry(struct i915_execbuffer *eb, * If the relocation already has the right value in it, no * more work needs to be done. 
*/ - if (!DBG_FORCE_RELOC && - gen8_canonical_addr(target->vma->node.start) == reloc->presumed_offset) + if (gen8_canonical_addr(target->vma->node.start) == reloc->presumed_offset) return 0; /* Check that the relocation address is valid... */ @@ -1668,7 +1419,7 @@ eb_relocate_entry(struct i915_execbuffer *eb, ev->flags &= ~EXEC_OBJECT_ASYNC; /* and update the user's relocation entry */ - return relocate_entry(ev->vma, reloc, eb, target->vma); + return relocate_entry(eb, ev->vma, reloc, target->vma); } static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev) @@ -1706,10 +1457,8 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev) * this is bad and so lockdep complains vehemently. */ copied = __copy_from_user(r, urelocs, count * sizeof(r[0])); - if (unlikely(copied)) { - remain = -EFAULT; - goto out; - } + if (unlikely(copied)) + return -EFAULT; remain -= count; do { @@ -1717,8 +1466,7 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev) if (likely(offset == 0)) { } else if ((s64)offset < 0) { - remain = (int)offset; - goto out; + return (int)offset; } else { /* * Note that reporting an error now @@ -1748,9 +1496,8 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev) } while (r++, --count); urelocs += ARRAY_SIZE(stack); } while (remain); -out: - reloc_cache_reset(&eb->reloc_cache); - return remain; + + return 0; } static int eb_relocate(struct i915_execbuffer *eb) @@ -2618,7 +2365,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, eb.i915 = i915; eb.file = file; eb.args = args; - if (DBG_FORCE_RELOC || !(args->flags & I915_EXEC_NO_RELOC)) + if (!(args->flags & I915_EXEC_NO_RELOC)) args->flags |= __EXEC_HAS_RELOC; eb.exec = exec; diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c index a49016f8ee0d..57c14d3340cd 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c @@ -37,20 +37,14 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb, return err; /* 8-Byte aligned */ - if (!__reloc_entry_gpu(eb, vma, - offsets[0] * sizeof(u32), - 0)) { - err = -EIO; + err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0); + if (err) goto unpin_vma; - } /* !8-Byte aligned */ - if (!__reloc_entry_gpu(eb, vma, - offsets[1] * sizeof(u32), - 1)) { - err = -EIO; + err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1); + if (err) goto unpin_vma; - } /* Skip to the end of the cmd page */ i = PAGE_SIZE / sizeof(u32) - RELOC_TAIL - 1; @@ -60,12 +54,9 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb, eb->reloc_cache.rq_size += i; /* Force batch chaining */ - if (!__reloc_entry_gpu(eb, vma, - offsets[2] * sizeof(u32), - 2)) { - err = -EIO; + err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2); + if (err) goto unpin_vma; - } GEM_BUG_ON(!eb->reloc_cache.rq); rq = i915_request_get(eb->reloc_cache.rq); From patchwork Mon May 25 07:53:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11568247 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7F53C14B7 for ; Mon, 25 May 2020 07:54:08 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 
bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 66A8E2071A for ; Mon, 25 May 2020 07:54:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 66A8E2071A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 417F7899DB; Mon, 25 May 2020 07:54:02 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 91A12899DB for ; Mon, 25 May 2020 07:53:59 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 21284205-1500050 for multiple; Mon, 25 May 2020 08:53:47 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 25 May 2020 08:53:45 +0100 Message-Id: <20200525075347.582-10-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200525075347.582-1-chris@chris-wilson.co.uk> References: <20200525075347.582-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 10/12] drm/i915/gem: Lift GPU relocation allocation X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Since we have reduced the relocation paths down to just the async GPU path, we can lift the request allocation to the start of the relocations. Knowing that we use one request for all relocations will simplify tracking the relocation fence. 
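In outline, the allocation moves from "on first use inside the relocation loop" to a single call before the loop, so one request (and hence one fence) covers the whole pass. A tiny stand-alone C sketch of that shape, with every name invented for illustration:

#include <stdio.h>

struct pass { int rq; };	/* stands in for the reloc_cache request */

static int alloc_request(struct pass *p) { p->rq = 1; return 0; }
static int emit_one(struct pass *p, int i) { printf("reloc %d via rq %d\n", i, p->rq); return 0; }
static int flush_request(struct pass *p) { p->rq = 0; return 0; }

static int process_relocs(struct pass *p, int count)
{
	int err = alloc_request(p);	/* lifted out of the loop */
	int flush;

	if (err)
		return err;

	for (int i = 0; i < count && !err; i++)
		err = emit_one(p, i);	/* no allocation hidden in here */

	flush = flush_request(p);	/* always submit, never leak the request */
	return err ? err : flush;
}

int main(void)
{
	struct pass p;

	return process_relocs(&p, 3);
}

Allocating up front is what lets the following patches treat "the relocation fence" as a single, already-known object rather than something that may or may not appear mid-loop.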
Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 53 +++++++++---------- .../i915/gem/selftests/i915_gem_execbuffer.c | 4 ++ 2 files changed, 29 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index e840b2a85284..903805455af1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -900,8 +900,6 @@ eb_get_vma(const struct i915_execbuffer *eb, unsigned long handle) static void eb_destroy(const struct i915_execbuffer *eb) { - GEM_BUG_ON(eb->reloc_cache.rq); - if (eb->array) eb_vma_array_put(eb->array); @@ -926,7 +924,6 @@ static void reloc_cache_init(struct reloc_cache *cache, cache->has_fence = cache->gen < 4; cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment; cache->node.flags = 0; - cache->rq = NULL; cache->target = NULL; } @@ -1081,9 +1078,8 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma) return err; } -static int __reloc_gpu_alloc(struct i915_execbuffer *eb, - struct intel_engine_cs *engine, - unsigned int len) +static int +__reloc_gpu_alloc(struct i915_execbuffer *eb, struct intel_engine_cs *engine) { struct reloc_cache *cache = &eb->reloc_cache; struct intel_gt_buffer_pool_node *pool; @@ -1173,11 +1169,6 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb, return err; } -static bool reloc_can_use_engine(const struct intel_engine_cs *engine) -{ - return engine->class != VIDEO_DECODE_CLASS || !IS_GEN(engine->i915, 6); -} - static u32 *reloc_gpu(struct i915_execbuffer *eb, struct i915_vma *vma, unsigned int len) @@ -1186,20 +1177,6 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb, u32 *cmd; int err; - if (unlikely(!cache->rq)) { - struct intel_engine_cs *engine = eb->engine; - - if (!reloc_can_use_engine(engine)) { - engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0]; - if (!engine) - return ERR_PTR(-ENODEV); - } - - err = __reloc_gpu_alloc(eb, engine, len); - if (unlikely(err)) - return ERR_PTR(err); - } - if (vma != cache->target) { err = reloc_move_to_gpu(cache->rq, vma); if (unlikely(err)) { @@ -1500,6 +1477,24 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev) return 0; } +static bool reloc_can_use_engine(const struct intel_engine_cs *engine) +{ + return engine->class != VIDEO_DECODE_CLASS || !IS_GEN(engine->i915, 6); +} + +static int reloc_gpu_alloc(struct i915_execbuffer *eb) +{ + struct intel_engine_cs *engine = eb->engine; + + if (!reloc_can_use_engine(engine)) { + engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0]; + if (!engine) + return -ENODEV; + } + + return __reloc_gpu_alloc(eb, engine); +} + static int eb_relocate(struct i915_execbuffer *eb) { int err; @@ -1519,6 +1514,11 @@ static int eb_relocate(struct i915_execbuffer *eb) struct eb_vma *ev; int flush; + err = reloc_gpu_alloc(eb); + if (err) + return err; + GEM_BUG_ON(!eb->reloc_cache.rq); + list_for_each_entry(ev, &eb->relocs, reloc_link) { err = eb_relocate_vma(eb, ev); if (err) @@ -2490,9 +2490,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, batch = vma; } - /* All GPU relocation batches must be submitted prior to the user rq */ - GEM_BUG_ON(eb.reloc_cache.rq); - /* Allocate a request for this batch buffer nice and early. 
*/ eb.request = i915_request_create(eb.context); if (IS_ERR(eb.request)) { diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c index 57c14d3340cd..f0ec23632a78 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c @@ -36,6 +36,10 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb, if (err) return err; + err = reloc_gpu_alloc(eb); + if (err) + goto unpin_vma; + /* 8-Byte aligned */ err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0); if (err) From patchwork Mon May 25 07:53:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11568243 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1A7B3739 for ; Mon, 25 May 2020 07:54:06 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 035882071A for ; Mon, 25 May 2020 07:54:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 035882071A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E1D37899D5; Mon, 25 May 2020 07:54:01 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8FE20899D4 for ; Mon, 25 May 2020 07:53:59 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 21284206-1500050 for multiple; Mon, 25 May 2020 08:53:47 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 25 May 2020 08:53:46 +0100 Message-Id: <20200525075347.582-11-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200525075347.582-1-chris@chris-wilson.co.uk> References: <20200525075347.582-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 11/12] drm/i915/gem: Add all GPU reloc awaits/signals en masse X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Asynchronous waits and signaling form a traditional semaphore with all the usual ordering problems with taking multiple locks. If we want to add more than one wait on a shared resource by the GPU, we must ensure that all the associated timelines are advanced atomically, ergo we must lock all the timelines en masse. 
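The backoff in lock_relocs() is the standard ww_mutex pattern: take every reservation lock under one acquire context and, if any of them returns -EDEADLK, drop everything already held, wait for the contended lock with the _slow variant, and retry the whole pass. Below is a schematic kernel-style sketch of that pattern over a plain array; it is not the driver code (the patch additionally reorders the eb->relocs list so the contended vma becomes the new head).

#include <linux/ww_mutex.h>

static DEFINE_WW_CLASS(demo_ww_class);

/*
 * Lock every mutex in @locks under @ctx. The caller does
 * ww_acquire_init() beforehand and, once it has finished using (and
 * unlocking) the objects, ww_acquire_fini() afterwards.
 */
static int lock_all(struct ww_mutex **locks, int n, struct ww_acquire_ctx *ctx)
{
	int contended = -1;
	int i, err;

retry:
	for (i = 0; i < n; i++) {
		if (i == contended)	/* already held via the slow path */
			continue;

		err = ww_mutex_lock_interruptible(locks[i], ctx);
		if (err) {
			int failed = i;

			/* back off: drop every lock taken so far */
			while (i--)
				ww_mutex_unlock(locks[i]);
			if (contended > failed)
				ww_mutex_unlock(locks[contended]);
			contended = -1;

			if (err == -EDEADLK) {
				/* sleep until the winner drops the lock, then retry */
				ww_mutex_lock_slow(locks[failed], ctx);
				contended = failed;
				goto retry;
			}

			return err;	/* e.g. -EINTR */
		}
	}

	ww_acquire_done(ctx);	/* no further locks in this context */
	return 0;
}

Note that the -EDEADLK here is the lock-ordering backoff of the reservation ww_class; it is unrelated to the -EDEADLK that the proxy-fence code earlier in the series uses to flag a cyclic fence dependency.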
Testcase: igt/gem_exec_reloc/basic-concurrent16 Fixes: 0e97fbb08055 ("drm/i915/gem: Use a single chained reloc batches for a single execbuf") References: https://gitlab.freedesktop.org/drm/intel/-/issues/1889 Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 106 ++++++++++++------ .../i915/gem/selftests/i915_gem_execbuffer.c | 24 ++-- 2 files changed, 86 insertions(+), 44 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 903805455af1..be6fb2ebf8cc 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -259,7 +259,6 @@ struct i915_execbuffer { bool has_fence : 1; bool needs_unfenced : 1; - struct i915_vma *target; struct i915_request *rq; struct i915_vma *rq_vma; u32 *rq_cmd; @@ -924,7 +923,6 @@ static void reloc_cache_init(struct reloc_cache *cache, cache->has_fence = cache->gen < 4; cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment; cache->node.flags = 0; - cache->target = NULL; } static inline void *unmask_page(unsigned long p) @@ -1058,26 +1056,6 @@ static int reloc_gpu_flush(struct reloc_cache *cache) return err; } -static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma) -{ - struct drm_i915_gem_object *obj = vma->obj; - int err; - - i915_vma_lock(vma); - - if (obj->cache_dirty & ~obj->cache_coherent) - i915_gem_clflush_object(obj, 0); - obj->write_domain = 0; - - err = i915_request_await_object(rq, vma->obj, true); - if (err == 0) - err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE); - - i915_vma_unlock(vma); - - return err; -} - static int __reloc_gpu_alloc(struct i915_execbuffer *eb, struct intel_engine_cs *engine) { @@ -1177,16 +1155,6 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb, u32 *cmd; int err; - if (vma != cache->target) { - err = reloc_move_to_gpu(cache->rq, vma); - if (unlikely(err)) { - i915_request_set_error_once(cache->rq, err); - return ERR_PTR(err); - } - - cache->target = vma; - } - if (unlikely(cache->rq_size + len > PAGE_SIZE / sizeof(u32) - RELOC_TAIL)) { err = reloc_gpu_chain(cache); @@ -1477,6 +1445,74 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev) return 0; } +static int reloc_move_to_gpu(struct reloc_cache *cache, struct eb_vma *ev) +{ + struct i915_request *rq = cache->rq; + struct i915_vma *vma = ev->vma; + struct drm_i915_gem_object *obj = vma->obj; + int err; + + if (obj->cache_dirty & ~obj->cache_coherent) + i915_gem_clflush_object(obj, 0); + + obj->write_domain = I915_GEM_DOMAIN_RENDER; + obj->read_domains = I915_GEM_DOMAIN_RENDER; + + err = i915_request_await_object(rq, obj, true); + if (err == 0) { + dma_resv_add_excl_fence(vma->resv, &rq->fence); + err = __i915_vma_move_to_active(vma, rq); + } + + return err; +} + +static int +lock_relocs(struct i915_execbuffer *eb) +{ + struct ww_acquire_ctx acquire; + struct eb_vma *ev; + int err = 0; + + ww_acquire_init(&acquire, &reservation_ww_class); + + list_for_each_entry(ev, &eb->relocs, reloc_link) { + struct i915_vma *vma = ev->vma; + + err = ww_mutex_lock_interruptible(&vma->resv->lock, &acquire); + if (err == -EDEADLK) { + struct eb_vma *unlock = ev, *en; + + list_for_each_entry_safe_continue_reverse(unlock, en, + &eb->relocs, + reloc_link) { + ww_mutex_unlock(&unlock->vma->resv->lock); + list_move_tail(&unlock->reloc_link, + &eb->relocs); + } + + GEM_BUG_ON(!list_is_first(&ev->reloc_link, + &eb->relocs)); + err = ww_mutex_lock_slow_interruptible(&vma->resv->lock, 
+ &acquire); + } + if (err) + break; + } + + ww_acquire_done(&acquire); + + list_for_each_entry_continue_reverse(ev, &eb->relocs, reloc_link) { + if (err == 0) + err = reloc_move_to_gpu(&eb->reloc_cache, ev); + ww_mutex_unlock(&ev->vma->resv->lock); + } + + ww_acquire_fini(&acquire); + + return err; +} + static bool reloc_can_use_engine(const struct intel_engine_cs *engine) { return engine->class != VIDEO_DECODE_CLASS || !IS_GEN(engine->i915, 6); @@ -1519,10 +1555,10 @@ static int eb_relocate(struct i915_execbuffer *eb) return err; GEM_BUG_ON(!eb->reloc_cache.rq); + err = lock_relocs(eb); list_for_each_entry(ev, &eb->relocs, reloc_link) { - err = eb_relocate_vma(eb, ev); - if (err) - break; + if (err == 0) + err = eb_relocate_vma(eb, ev); } flush = reloc_gpu_flush(&eb->reloc_cache); diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c index f0ec23632a78..8b06609e260e 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c @@ -24,15 +24,15 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb, GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0); const u32 *map = page_mask_bits(obj->mm.mapping); struct i915_request *rq; - struct i915_vma *vma; + struct eb_vma ev; int err; int i; - vma = i915_vma_instance(obj, eb->context->vm, NULL); - if (IS_ERR(vma)) - return PTR_ERR(vma); + ev.vma = i915_vma_instance(obj, eb->context->vm, NULL); + if (IS_ERR(ev.vma)) + return PTR_ERR(ev.vma); - err = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_HIGH); + err = i915_vma_pin(ev.vma, 0, 0, PIN_USER | PIN_HIGH); if (err) return err; @@ -40,13 +40,19 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb, if (err) goto unpin_vma; + i915_vma_lock(ev.vma); + err = reloc_move_to_gpu(&eb->reloc_cache, &ev); + i915_vma_unlock(ev.vma); + if (err) + goto unpin_vma; + /* 8-Byte aligned */ - err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0); + err = __reloc_entry_gpu(eb, ev.vma, offsets[0] * sizeof(u32), 0); if (err) goto unpin_vma; /* !8-Byte aligned */ - err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1); + err = __reloc_entry_gpu(eb, ev.vma, offsets[1] * sizeof(u32), 1); if (err) goto unpin_vma; @@ -58,7 +64,7 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb, eb->reloc_cache.rq_size += i; /* Force batch chaining */ - err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2); + err = __reloc_entry_gpu(eb, ev.vma, offsets[2] * sizeof(u32), 2); if (err) goto unpin_vma; @@ -96,7 +102,7 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb, put_rq: i915_request_put(rq); unpin_vma: - i915_vma_unpin(vma); + i915_vma_unpin(ev.vma); return err; } From patchwork Mon May 25 07:53:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11568259 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8088A14B7 for ; Mon, 25 May 2020 07:54:13 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 698E32071A for ; Mon, 25 May 2020 07:54:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 698E32071A Authentication-Results: mail.kernel.org; 
dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C54896E03E; Mon, 25 May 2020 07:54:03 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id A7D2689B95 for ; Mon, 25 May 2020 07:54:00 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 21284207-1500050 for multiple; Mon, 25 May 2020 08:53:47 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 25 May 2020 08:53:47 +0100 Message-Id: <20200525075347.582-12-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200525075347.582-1-chris@chris-wilson.co.uk> References: <20200525075347.582-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 12/12] drm/i915/gem: Make relocations atomic within execbuf X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Although we may chide userspace for reusing the same batches concurrently from multiple threads, at the same time we must be very careful to only execute the batch and its relocations as supplied by the user. If we are not careful, we may allow another thread to rewrite the current batch with its own relocations. We must order the relocations and their batch such that they are an atomic pair on the GPU, and that the ioctl itself appears atomic to userspace. The order of execution may be undetermined, but it will not be subverted. We could do this by moving the relocations into the main request, if it were not for the situation where we need a second engine to perform the relocations for us. Instead, we use the dependency tracking to only publish the write fence on the main request and not on the relocation request, so that concurrent updates are queued after the batch has consumed its relocations. 
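The ordering trick leans on the proxy fence introduced earlier in the series (not mainline): the batch's write fence is published as a placeholder, and only once the user's request exists is the placeholder pointed at its fence (dma_fence_proxy_set_real() in the diff below), so later waiters that unpeel the proxy (dma_fence_proxy_get_real()) end up ordered after the batch rather than after the bare relocation request. As a toy user-space analogue of that unpeeling, with every name invented for the sketch:

/* Toy model: a placeholder object that is later resolved to the real
 * target; consumers that look it up after resolution see the target. */
#include <stdio.h>

struct fence { const char *name; };
struct proxy { struct fence self; struct fence *real; };

static struct fence *proxy_get_real(struct proxy *p)
{
	return p->real ? p->real : &p->self;	/* unpeel once resolved */
}

int main(void)
{
	struct fence batch_rq = { "batch-request" };
	struct proxy p = { { "proxy" }, NULL };

	/* before resolution, waiters can only hook onto the placeholder */
	printf("await: %s\n", proxy_get_real(&p)->name);

	/* the real request is now known: resolve the proxy */
	p.real = &batch_rq;

	/* later waiters are redirected to the real fence */
	printf("await: %s\n", proxy_get_real(&p)->name);

	return 0;
}

The real code of course also has to handle signalling and reference counting; the sketch only shows the redirection.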
Testcase: igt/gem_exec_reloc/basic-concurrent Fixes: ef398881d27d ("drm/i915/gem: Limit struct_mutex to eb_reserve") Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 103 ++++++++++++------ .../i915/gem/selftests/i915_gem_execbuffer.c | 11 +- 2 files changed, 74 insertions(+), 40 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index be6fb2ebf8cc..854edc67a82b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -5,6 +5,7 @@ */ #include +#include #include #include #include @@ -259,6 +260,8 @@ struct i915_execbuffer { bool has_fence : 1; bool needs_unfenced : 1; + struct dma_fence *fence; + struct i915_request *rq; struct i915_vma *rq_vma; u32 *rq_cmd; @@ -555,16 +558,6 @@ eb_add_vma(struct i915_execbuffer *eb, ev->exec = entry; ev->flags = entry->flags; - if (eb->lut_size > 0) { - ev->handle = entry->handle; - hlist_add_head(&ev->node, - &eb->buckets[hash_32(entry->handle, - eb->lut_size)]); - } - - if (entry->relocation_count) - list_add_tail(&ev->reloc_link, &eb->relocs); - /* * SNA is doing fancy tricks with compressing batch buffers, which leads * to negative relocation deltas. Usually that works out ok since the @@ -581,9 +574,21 @@ eb_add_vma(struct i915_execbuffer *eb, if (eb->reloc_cache.has_fence) ev->flags |= EXEC_OBJECT_NEEDS_FENCE; + INIT_LIST_HEAD(&ev->reloc_link); + eb->batch = ev; } + if (entry->relocation_count) + list_add_tail(&ev->reloc_link, &eb->relocs); + + if (eb->lut_size > 0) { + ev->handle = entry->handle; + hlist_add_head(&ev->node, + &eb->buckets[hash_32(entry->handle, + eb->lut_size)]); + } + if (eb_pin_vma(eb, entry, ev)) { if (entry->offset != vma->node.start) { entry->offset = vma->node.start | UPDATE; @@ -923,6 +928,7 @@ static void reloc_cache_init(struct reloc_cache *cache, cache->has_fence = cache->gen < 4; cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment; cache->node.flags = 0; + cache->fence = NULL; } static inline void *unmask_page(unsigned long p) @@ -1021,13 +1027,9 @@ static unsigned int reloc_bb_flags(const struct reloc_cache *cache) static int reloc_gpu_flush(struct reloc_cache *cache) { - struct i915_request *rq; + struct i915_request *rq = cache->rq; int err; - rq = fetch_and_zero(&cache->rq); - if (!rq) - return 0; - if (cache->rq_vma) { struct drm_i915_gem_object *obj = cache->rq_vma->obj; @@ -1051,6 +1053,7 @@ static int reloc_gpu_flush(struct reloc_cache *cache) i915_request_set_error_once(rq, err); intel_gt_chipset_flush(rq->engine->gt); + i915_request_get(rq); i915_request_add(rq); return err; @@ -1353,16 +1356,6 @@ eb_relocate_entry(struct i915_execbuffer *eb, return -EINVAL; } - /* - * If we write into the object, we need to force the synchronisation - * barrier, either with an asynchronous clflush or if we executed the - * patching using the GPU (though that should be serialised by the - * timeline). To be completely sure, and since we are required to - * do relocations we are already stalling, disable the user's opt - * out of our synchronisation. 
- */ - ev->flags &= ~EXEC_OBJECT_ASYNC; - /* and update the user's relocation entry */ return relocate_entry(eb, ev->vma, reloc, target->vma); } @@ -1457,10 +1450,11 @@ static int reloc_move_to_gpu(struct reloc_cache *cache, struct eb_vma *ev) obj->write_domain = I915_GEM_DOMAIN_RENDER; obj->read_domains = I915_GEM_DOMAIN_RENDER; + ev->flags |= EXEC_OBJECT_ASYNC; err = i915_request_await_object(rq, obj, true); if (err == 0) { - dma_resv_add_excl_fence(vma->resv, &rq->fence); + dma_resv_add_excl_fence(vma->resv, cache->fence); err = __i915_vma_move_to_active(vma, rq); } @@ -1531,7 +1525,15 @@ static int reloc_gpu_alloc(struct i915_execbuffer *eb) return __reloc_gpu_alloc(eb, engine); } -static int eb_relocate(struct i915_execbuffer *eb) +static void free_reloc_fence(struct i915_execbuffer *eb) +{ + struct dma_fence *f = fetch_and_zero(&eb->reloc_cache.fence); + + dma_fence_signal(f); + dma_fence_put(f); +} + +static int eb_reloc(struct i915_execbuffer *eb) { int err; @@ -1550,9 +1552,15 @@ static int eb_relocate(struct i915_execbuffer *eb) struct eb_vma *ev; int flush; + eb->reloc_cache.fence = __dma_fence_create_proxy(0, 0); + if (!eb->reloc_cache.fence) + return -ENOMEM; + err = reloc_gpu_alloc(eb); - if (err) + if (err) { + free_reloc_fence(eb); return err; + } GEM_BUG_ON(!eb->reloc_cache.rq); err = lock_relocs(eb); @@ -1560,6 +1568,7 @@ static int eb_relocate(struct i915_execbuffer *eb) if (err == 0) err = eb_relocate_vma(eb, ev); } + GEM_BUG_ON(dma_fence_is_signaled(eb->reloc_cache.fence)); flush = reloc_gpu_flush(&eb->reloc_cache); if (!err) @@ -1569,6 +1578,15 @@ static int eb_relocate(struct i915_execbuffer *eb) return err; } +static void eb_reloc_signal(struct i915_execbuffer *eb, struct i915_request *rq) +{ + dma_fence_proxy_set_real(eb->reloc_cache.fence, &rq->fence); + i915_request_put(eb->reloc_cache.rq); + + dma_fence_put(eb->reloc_cache.fence); + eb->reloc_cache.fence = NULL; +} + static int eb_move_to_gpu(struct i915_execbuffer *eb) { const unsigned int count = eb->buffer_count; @@ -1812,10 +1830,15 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb, if (err) goto err_batch_unlock; - /* Wait for all writes (and relocs) into the batch to complete */ - err = i915_sw_fence_await_reservation(&pw->base.chain, - pw->batch->resv, NULL, false, - 0, I915_FENCE_GFP); + /* Wait for all writes (or relocs) into the batch to complete */ + if (!eb->reloc_cache.fence || list_empty(&eb->batch->reloc_link)) + err = i915_sw_fence_await_reservation(&pw->base.chain, + pw->batch->resv, NULL, + false, 0, I915_FENCE_GFP); + else + err = i915_sw_fence_await_dma_fence(&pw->base.chain, + &eb->reloc_cache.rq->fence, + 0, I915_FENCE_GFP); if (err < 0) goto err_batch_unlock; @@ -1940,6 +1963,15 @@ static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch) { int err; + if (eb->reloc_cache.fence) { + err = i915_request_await_dma_fence(eb->request, + &eb->reloc_cache.rq->fence); + if (err) + return err; + + eb_reloc_signal(eb, eb->request); + } + err = eb_move_to_gpu(eb); if (err) return err; @@ -2464,7 +2496,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, if (unlikely(err)) goto err_context; - err = eb_relocate(&eb); + err = eb_reloc(&eb); if (err) { /* * If the user expects the execobject.offset and @@ -2497,7 +2529,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, err = eb_parse(&eb); if (err) - goto err_vma; + goto err_reloc; /* * snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure @@ -2598,6 +2630,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, err_parse: 
if (batch->private) intel_gt_buffer_pool_put(batch->private); +err_reloc: + if (eb.reloc_cache.fence) + eb_reloc_signal(&eb, eb.reloc_cache.rq); err_vma: if (eb.trampoline) i915_vma_unpin(eb.trampoline); diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c index 8b06609e260e..e0900846de38 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c @@ -23,7 +23,6 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb, const u64 mask = GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0); const u32 *map = page_mask_bits(obj->mm.mapping); - struct i915_request *rq; struct eb_vma ev; int err; int i; @@ -40,6 +39,8 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb, if (err) goto unpin_vma; + eb->reloc_cache.fence = &eb->reloc_cache.rq->fence; + i915_vma_lock(ev.vma); err = reloc_move_to_gpu(&eb->reloc_cache, &ev); i915_vma_unlock(ev.vma); @@ -68,12 +69,9 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb, if (err) goto unpin_vma; - GEM_BUG_ON(!eb->reloc_cache.rq); - rq = i915_request_get(eb->reloc_cache.rq); err = reloc_gpu_flush(&eb->reloc_cache); if (err) goto put_rq; - GEM_BUG_ON(eb->reloc_cache.rq); err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2); if (err) { @@ -81,7 +79,7 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb, goto put_rq; } - if (!i915_request_completed(rq)) { + if (!i915_request_completed(eb->reloc_cache.rq)) { pr_err("%s: did not wait for relocations!\n", eb->engine->name); err = -EINVAL; goto put_rq; @@ -100,7 +98,8 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb, igt_hexdump(map, 4096); put_rq: - i915_request_put(rq); + i915_request_put(eb->reloc_cache.rq); + eb->reloc_cache.rq = NULL; unpin_vma: i915_vma_unpin(ev.vma); return err;