From patchwork Sat Feb 20 11:08:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 12096801 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94D75C433E0 for ; Sat, 20 Feb 2021 11:09:10 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CEBEA64ED6 for ; Sat, 20 Feb 2021 11:09:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CEBEA64ED6 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 72A286E0C6; Sat, 20 Feb 2021 11:09:09 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6F1B46E0C6 for ; Sat, 20 Feb 2021 11:09:06 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.69.177; Received: from build.alporthouse.com (unverified [78.156.69.177]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23919950-1500050 for multiple; Sat, 20 Feb 2021 11:08:50 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Sat, 20 Feb 2021 11:08:48 +0000 Message-Id: <20210220110848.24898-1-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210215155616.26330-3-chris@chris-wilson.co.uk> References: <20210215155616.26330-3-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH] drm/i915: Refine VT-d scanout workaround X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Imre Deak , Matthew Auld , Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" VT-d may cause overfetch of the scanout PTE, both before and after the vma (depending on the scanout orientation). bspec recommends that we provide a tile-row in either directions, and suggests using 168 PTE, warning that the accesses will wrap around the ends of the GGTT. Currently, we fill the entire GGTT with scratch pages when using VT-d to always ensure there are valid entries around every vma, including scanout. However, writing every PTE is slow as on recent devices we perform 8MiB of uncached writes, incurring an extra 100ms during resume. If instead we focus on only putting guard pages around scanout, we can avoid touching the whole GGTT. To avoid having to introduce extra nodes around each scanout vma, we adjust the scanout drm_mm_node to be smaller than the allocated space, and fixup the extra PTE during dma binding. v2: Move the guard from modifying drm_mm_node.start which is still used by the drm_mm itself, into an adjustment of node.start at the point of use. v3: Pass the requested guard padding from the caller, so we can drop the VT-d w/a knowledge from the i915_vma allocator. v4: Bump minimum padding to 168 PTE and cautiously ensure that a full tile row around the vma is included with the guard. Signed-off-by: Chris Wilson Cc: Ville Syrjälä Cc: Matthew Auld Cc: Imre Deak Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 13 +++++++++++ drivers/gpu/drm/i915/gt/intel_ggtt.c | 25 +--------------------- drivers/gpu/drm/i915/i915_gem_gtt.h | 1 + drivers/gpu/drm/i915/i915_vma.c | 8 +++++++ 4 files changed, 23 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 0478b069c202..32b13af0d3df 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -16,6 +16,8 @@ #include "i915_gem_lmem.h" #include "i915_gem_mman.h" +#define VTD_GUARD (168u * I915_GTT_PAGE_SIZE) /* 168 or tile-row PTE padding */ + static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj) { return !(obj->cache_level == I915_CACHE_NONE || @@ -345,6 +347,17 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, if (ret) goto err; + /* VT-d may overfetch before/after the vma, so pad with scratch */ + if (intel_scanout_needs_vtd_wa(i915)) { + unsigned int guard = VTD_GUARD; + + if (i915_gem_object_is_tiled(obj)) + guard = max(guard, + i915_gem_object_get_tile_row_size(obj)); + + flags |= PIN_OFFSET_GUARD | guard; + } + /* * As the user may map the buffer once pinned in the display plane * (e.g. libkms for the bootup splash), we have to ensure that we diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 6b326138e765..251b50884d1c 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -319,27 +319,6 @@ static void nop_clear_range(struct i915_address_space *vm, { } -static void gen8_ggtt_clear_range(struct i915_address_space *vm, - u64 start, u64 length) -{ - struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm); - unsigned int first_entry = start / I915_GTT_PAGE_SIZE; - unsigned int num_entries = length / I915_GTT_PAGE_SIZE; - const gen8_pte_t scratch_pte = vm->scratch[0]->encode; - gen8_pte_t __iomem *gtt_base = - (gen8_pte_t __iomem *)ggtt->gsm + first_entry; - const int max_entries = ggtt_total_entries(ggtt) - first_entry; - int i; - - if (WARN(num_entries > max_entries, - "First entry = %d; Num entries = %d (max=%d)\n", - first_entry, num_entries, max_entries)) - num_entries = max_entries; - - for (i = 0; i < num_entries; i++) - gen8_set_pte(>t_base[i], scratch_pte); -} - static void bxt_vtd_ggtt_wa(struct i915_address_space *vm) { /* @@ -907,8 +886,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt) ggtt->vm.cleanup = gen6_gmch_remove; ggtt->vm.insert_page = gen8_ggtt_insert_page; ggtt->vm.clear_range = nop_clear_range; - if (intel_scanout_needs_vtd_wa(i915)) - ggtt->vm.clear_range = gen8_ggtt_clear_range; ggtt->vm.insert_entries = gen8_ggtt_insert_entries; @@ -1054,7 +1031,7 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt) ggtt->vm.alloc_pt_dma = alloc_pt_dma; ggtt->vm.clear_range = nop_clear_range; - if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915)) + if (!HAS_FULL_PPGTT(i915)) ggtt->vm.clear_range = gen6_ggtt_clear_range; ggtt->vm.insert_page = gen6_ggtt_insert_page; ggtt->vm.insert_entries = gen6_ggtt_insert_entries; diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index c9b0ee5e1d23..f3ae9afdee15 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -41,6 +41,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm, #define PIN_HIGH BIT_ULL(5) #define PIN_OFFSET_BIAS BIT_ULL(6) #define PIN_OFFSET_FIXED BIT_ULL(7) +#define PIN_OFFSET_GUARD BIT_ULL(8) #define PIN_GLOBAL BIT_ULL(10) /* I915_VMA_GLOBAL_BIND */ #define PIN_USER BIT_ULL(11) /* I915_VMA_LOCAL_BIND */ diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 01e62edf4704..030b31262e66 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -552,6 +552,9 @@ bool i915_vma_misplaced(const struct i915_vma *vma, i915_vma_offset(vma) != (flags & PIN_OFFSET_MASK)) return true; + if (flags & PIN_OFFSET_GUARD && vma->guard < (flags & PIN_OFFSET_MASK)) + return true; + return false; } @@ -629,6 +632,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) GEM_BUG_ON(i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND)); GEM_BUG_ON(drm_mm_node_allocated(&vma->node)); + GEM_BUG_ON(hweight64(flags & (PIN_OFFSET_GUARD | PIN_OFFSET_FIXED | PIN_OFFSET_BIAS)) > 1); size = max(size, vma->size); alignment = max_t(typeof(alignment), alignment, vma->display_alignment); @@ -643,6 +647,10 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) GEM_BUG_ON(!is_power_of_2(alignment)); guard = vma->guard; /* retain guard across rebinds */ + if (flags & PIN_OFFSET_GUARD) { + GEM_BUG_ON(overflows_type(flags & PIN_OFFSET_MASK, u32)); + guard = max_t(u32, guard, flags & PIN_OFFSET_MASK); + } guard = ALIGN(guard, alignment); start = flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;