From patchwork Thu Aug 4 07:37:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauro Carvalho Chehab X-Patchwork-Id: 12936167 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DCEFAC19F2C for ; Thu, 4 Aug 2022 07:38:08 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 27CF311BEC3; Thu, 4 Aug 2022 07:37:44 +0000 (UTC) Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by gabe.freedesktop.org (Postfix) with ESMTPS id 1850614BF2C; Thu, 4 Aug 2022 07:37:31 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 728C3B824A7; Thu, 4 Aug 2022 07:37:30 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 21972C433B5; Thu, 4 Aug 2022 07:37:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1659598649; bh=/u32o8/KhC8mWHjp/YPgOef2++CtnNOIZCAj3beykhU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=q2PNOEbOtnqWyJKxqsZ8mPMlhHa9u7TL3dsAmt5gUEEA3+5tuLeCCPeU0Qw3dj86B crzF7qU8SCzFq5nlTEHCfF62Ws6Ba7c0Vi84Ai9NdrEjEqzPWbqbtTSVvmnubcJfO2 cA7jfEes7ZFpuqR/Hf9/g3G9rbt8ydKGulY3iVw8K9U6ahp3Pl0vYDL+Dt5fNKcxj4 Pk0ANd4aNWHuYhL4wNUB2+e14/5It3Z9RpXI/w61EGshbAh3xcYQJ9Fo2o/SSiglAU 2ffhmVfwUmMc/QUuHZlEf1ZKkXvh2b0fCL6ikvF5WJIQbgpTweF6+z3iQdSkq5oFoK U6LMQSX+jI17w== Received: from mchehab by mail.kernel.org with local (Exim 4.95) (envelope-from ) id 1oJVQ6-0017HO-NR; Thu, 04 Aug 2022 09:37:26 +0200 From: Mauro Carvalho Chehab To: Subject: [PATCH v3 1/3] drm/i915: pass a pointer for tlb seqno at vma_invalidate_tlb() Date: Thu, 4 Aug 2022 09:37:22 +0200 Message-Id: X-Mailer: git-send-email 2.37.1 In-Reply-To: References: MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tvrtko Ursulin , Michael Cheng , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , David Airlie , dri-devel@lists.freedesktop.org, Lucas De Marchi , linux-kernel@vger.kernel.org, Chris Wilson , Rodrigo Vivi , Andi Shyti , Niranjana Vishwanathapura , Mauro Carvalho Chehab , intel-gfx@lists.freedesktop.org, Matthew Auld Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" WRITE_ONCE() should happen at the original var, not on a local copy of it. Fixes: 5d36acb7198b ("drm/i915/gt: Batch TLB invalidations") Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Andi Shyti --- To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover. See [PATCH v3 0/3] at: https://lore.kernel.org/all/cover.1659598090.git.mchehab@kernel.org/ drivers/gpu/drm/i915/gt/intel_ppgtt.c | 2 +- drivers/gpu/drm/i915/i915_vma.c | 6 +++--- drivers/gpu/drm/i915/i915_vma.h | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 2da6c82a8bd2..6ee8d1127016 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -211,7 +211,7 @@ void ppgtt_unbind_vma(struct i915_address_space *vm, vm->clear_range(vm, vma_res->start, vma_res->vma_size); if (vma_res->tlb) - vma_invalidate_tlb(vm, *vma_res->tlb); + vma_invalidate_tlb(vm, vma_res->tlb); } static unsigned long pd_count(u64 size, int shift) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 84a9ccbc5fc5..260371716490 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -1308,7 +1308,7 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma) return err; } -void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb) +void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb) { /* * Before we release the pages that were bound by this vma, we @@ -1318,7 +1318,7 @@ void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb) * the most recent TLB invalidation seqno, and if we have not yet * flushed the TLBs upon release, perform a full invalidation. */ - WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt)); + WRITE_ONCE(*tlb, intel_gt_next_invalidate_tlb_full(vm->gt)); } static void __vma_put_pages(struct i915_vma *vma, unsigned int count) @@ -1971,7 +1971,7 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async) dma_fence_put(unbind_fence); unbind_fence = NULL; } - vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb); + vma_invalidate_tlb(vma->vm, &vma->obj->mm.tlb); } /* diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 5048eed536da..33a58f605d75 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -213,7 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma, u64 size, u64 alignment, u64 flags); void __i915_vma_set_map_and_fenceable(struct i915_vma *vma); void i915_vma_revoke_mmap(struct i915_vma *vma); -void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb); +void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb); struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async); int __i915_vma_unbind(struct i915_vma *vma); int __must_check i915_vma_unbind(struct i915_vma *vma); From patchwork Thu Aug 4 07:37:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Mauro Carvalho Chehab X-Patchwork-Id: 12936166 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D51F3C19F2C for ; Thu, 4 Aug 2022 07:38:03 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 325108FD7E; Thu, 4 Aug 2022 07:37:43 +0000 (UTC) Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5B2BF8FD4D; Thu, 4 Aug 2022 07:37:31 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 512DF61313; Thu, 4 Aug 2022 07:37:30 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1D342C433D7; Thu, 4 Aug 2022 07:37:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1659598649; bh=VNyU3AlNQlE2SfpLXOJv/BvqzM9g1cMbRwW53/ahtVY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Fle/bnGt3TypXSTziXKef9mwCRCUFPGzbEpoWhLH9bZbU+ea2OSSk+JpbEMLQnbOg tmu+8aHWKLoQsG9a6343ej9cROULFqP5vArVd25vFcc+ow3tNZ2Rhr8aOBQ+vGEGhz e7Ll7KGFlLturUkeUk0G1zgjpbfmzFp6eqxRqLekzFasSsEwUQGOwSm+4qhN0Rc9CK vX7mf/OnDcMbcerO+Nd8FXMZ0COCZ9WxjLA/qXEBSE6ROFSzYT+qVb/7HMMik+pzpL LabWVj4VN8LIa1fI4dw3HB/xwAtjoVT+C9V5GTEbT6kzfCCDrg9DbL5bcQLlKC9jE1 wxuBe7uRlaPTg== Received: from mchehab by mail.kernel.org with local (Exim 4.95) (envelope-from ) id 1oJVQ6-0017HS-Pm; Thu, 04 Aug 2022 09:37:26 +0200 From: Mauro Carvalho Chehab To: Subject: [PATCH v3 2/3] drm/i915/gt: Move TLB invalidation to its own file Date: Thu, 4 Aug 2022 09:37:23 +0200 Message-Id: <8fdf5df8464c7d5d2629c7f6fde0b82b2be4ad26.1659598090.git.mchehab@kernel.org> X-Mailer: git-send-email 2.37.1 In-Reply-To: References: MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: David Airlie , dri-devel@lists.freedesktop.org, Daniele Ceraolo Spurio , Fei Yang , Chris Wilson , Matthew Auld , Andi Shyti , Niranjana Vishwanathapura , =?utf-8?q?Thomas_Hellstr=C3=B6m?= , Casey Bowman , Lucas De Marchi , intel-gfx@lists.freedesktop.org, Rodrigo Vivi , Mauro Carvalho Chehab , Tomas Winkler , Tvrtko Ursulin , linux-kernel@vger.kernel.org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Chris Wilson Prepare for supporting more TLB invalidation scenarios by moving the current MMIO invalidation to its own file. It also: - Renames intel_gt_invalidate_tlb() to intel_gt_invalidate_tlb_full() - Add intel_gt_init_tlb() and intel_gt_fini_tlb() abstracts. Signed-off-by: Chris Wilson Reviewed-by: Niranjana Vishwanathapura Reviewed-by: Andi Shyti Cc: Fei Yang Signed-off-by: Mauro Carvalho Chehab --- To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover. See [PATCH v3 0/3] at: https://lore.kernel.org/all/cover.1659598090.git.mchehab@kernel.org/ drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 +- drivers/gpu/drm/i915/gt/intel_gt.c | 168 +------------------- drivers/gpu/drm/i915/gt/intel_gt.h | 12 -- drivers/gpu/drm/i915/gt/intel_tlb.c | 183 ++++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_tlb.h | 29 ++++ drivers/gpu/drm/i915/i915_vma.c | 1 + 7 files changed, 219 insertions(+), 179 deletions(-) create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 522ef9b4aff3..d3df9832d1f7 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -126,6 +126,7 @@ gt-y += \ gt/intel_sseu.o \ gt/intel_sseu_debugfs.o \ gt/intel_timeline.o \ + gt/intel_tlb.o \ gt/intel_workarounds.o \ gt/shmem_utils.o \ gt/sysfs_engines.o diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 8357dbdcab5c..1cd76cc5d9f3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -7,7 +7,7 @@ #include #include "gt/intel_gt.h" -#include "gt/intel_gt_pm.h" +#include "gt/intel_tlb.h" #include "i915_drv.h" #include "i915_gem_object.h" @@ -199,7 +199,7 @@ static void flush_tlb_invalidate(struct drm_i915_gem_object *obj) if (!obj->mm.tlb) return; - intel_gt_invalidate_tlb(gt, obj->mm.tlb); + intel_gt_invalidate_tlb_full(gt, obj->mm.tlb); obj->mm.tlb = 0; } diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index f435e06125aa..18d82cd620bd 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -11,9 +11,7 @@ #include "pxp/intel_pxp.h" #include "i915_drv.h" -#include "i915_perf_oa_regs.h" #include "intel_context.h" -#include "intel_engine_pm.h" #include "intel_engine_regs.h" #include "intel_ggtt_gmch.h" #include "intel_gt.h" @@ -31,6 +29,7 @@ #include "intel_renderstate.h" #include "intel_rps.h" #include "intel_gt_sysfs.h" +#include "intel_tlb.h" #include "intel_uncore.h" #include "shmem_utils.h" @@ -48,8 +47,7 @@ static void __intel_gt_init_early(struct intel_gt *gt) intel_gt_init_reset(gt); intel_gt_init_requests(gt); intel_gt_init_timelines(gt); - mutex_init(>->tlb.invalidate_lock); - seqcount_mutex_init(>->tlb.seqno, >->tlb.invalidate_lock); + intel_gt_init_tlb(gt); intel_gt_pm_init_early(gt); intel_uc_init_early(>->uc); @@ -770,7 +768,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915) intel_gt_fini_requests(gt); intel_gt_fini_reset(gt); intel_gt_fini_timelines(gt); - mutex_destroy(>->tlb.invalidate_lock); + intel_gt_fini_tlb(gt); intel_engines_free(gt); } } @@ -881,163 +879,3 @@ void intel_gt_info_print(const struct intel_gt_info *info, intel_sseu_dump(&info->sseu, p); } - -struct reg_and_bit { - i915_reg_t reg; - u32 bit; -}; - -static struct reg_and_bit -get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8, - const i915_reg_t *regs, const unsigned int num) -{ - const unsigned int class = engine->class; - struct reg_and_bit rb = { }; - - if (drm_WARN_ON_ONCE(&engine->i915->drm, - class >= num || !regs[class].reg)) - return rb; - - rb.reg = regs[class]; - if (gen8 && class == VIDEO_DECODE_CLASS) - rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */ - else - rb.bit = engine->instance; - - rb.bit = BIT(rb.bit); - - return rb; -} - -static void mmio_invalidate_full(struct intel_gt *gt) -{ - static const i915_reg_t gen8_regs[] = { - [RENDER_CLASS] = GEN8_RTCR, - [VIDEO_DECODE_CLASS] = GEN8_M1TCR, /* , GEN8_M2TCR */ - [VIDEO_ENHANCEMENT_CLASS] = GEN8_VTCR, - [COPY_ENGINE_CLASS] = GEN8_BTCR, - }; - static const i915_reg_t gen12_regs[] = { - [RENDER_CLASS] = GEN12_GFX_TLB_INV_CR, - [VIDEO_DECODE_CLASS] = GEN12_VD_TLB_INV_CR, - [VIDEO_ENHANCEMENT_CLASS] = GEN12_VE_TLB_INV_CR, - [COPY_ENGINE_CLASS] = GEN12_BLT_TLB_INV_CR, - [COMPUTE_CLASS] = GEN12_COMPCTX_TLB_INV_CR, - }; - struct drm_i915_private *i915 = gt->i915; - struct intel_uncore *uncore = gt->uncore; - struct intel_engine_cs *engine; - intel_engine_mask_t awake, tmp; - enum intel_engine_id id; - const i915_reg_t *regs; - unsigned int num = 0; - - if (GRAPHICS_VER(i915) == 12) { - regs = gen12_regs; - num = ARRAY_SIZE(gen12_regs); - } else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) { - regs = gen8_regs; - num = ARRAY_SIZE(gen8_regs); - } else if (GRAPHICS_VER(i915) < 8) { - return; - } - - if (drm_WARN_ONCE(&i915->drm, !num, - "Platform does not implement TLB invalidation!")) - return; - - intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL); - - spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */ - - awake = 0; - for_each_engine(engine, gt, id) { - struct reg_and_bit rb; - - if (!intel_engine_pm_is_awake(engine)) - continue; - - rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); - if (!i915_mmio_reg_offset(rb.reg)) - continue; - - intel_uncore_write_fw(uncore, rb.reg, rb.bit); - awake |= engine->mask; - } - - GT_TRACE(gt, "invalidated engines %08x\n", awake); - - /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */ - if (awake && - (IS_TIGERLAKE(i915) || - IS_DG1(i915) || - IS_ROCKETLAKE(i915) || - IS_ALDERLAKE_S(i915) || - IS_ALDERLAKE_P(i915))) - intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1); - - spin_unlock_irq(&uncore->lock); - - for_each_engine_masked(engine, gt, awake, tmp) { - struct reg_and_bit rb; - - /* - * HW architecture suggest typical invalidation time at 40us, - * with pessimistic cases up to 100us and a recommendation to - * cap at 1ms. We go a bit higher just in case. - */ - const unsigned int timeout_us = 100; - const unsigned int timeout_ms = 4; - - rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); - if (__intel_wait_for_register_fw(uncore, - rb.reg, rb.bit, 0, - timeout_us, timeout_ms, - NULL)) - drm_err_ratelimited(>->i915->drm, - "%s TLB invalidation did not complete in %ums!\n", - engine->name, timeout_ms); - } - - /* - * Use delayed put since a) we mostly expect a flurry of TLB - * invalidations so it is good to avoid paying the forcewake cost and - * b) it works around a bug in Icelake which cannot cope with too rapid - * transitions. - */ - intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL); -} - -static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno) -{ - u32 cur = intel_gt_tlb_seqno(gt); - - /* Only skip if a *full* TLB invalidate barrier has passed */ - return (s32)(cur - ALIGN(seqno, 2)) > 0; -} - -void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno) -{ - intel_wakeref_t wakeref; - - if (I915_SELFTEST_ONLY(gt->awake == -ENODEV)) - return; - - if (intel_gt_is_wedged(gt)) - return; - - if (tlb_seqno_passed(gt, seqno)) - return; - - with_intel_gt_pm_if_awake(gt, wakeref) { - mutex_lock(>->tlb.invalidate_lock); - if (tlb_seqno_passed(gt, seqno)) - goto unlock; - - mmio_invalidate_full(gt); - - write_seqcount_invalidate(>->tlb.seqno); -unlock: - mutex_unlock(>->tlb.invalidate_lock); - } -} diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index 40b06adf509a..b4bba16cdb53 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -101,16 +101,4 @@ void intel_gt_info_print(const struct intel_gt_info *info, void intel_gt_watchdog_work(struct work_struct *work); -static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) -{ - return seqprop_sequence(>->tlb.seqno); -} - -static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt) -{ - return intel_gt_tlb_seqno(gt) | 1; -} - -void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno); - #endif /* __INTEL_GT_H__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c new file mode 100644 index 000000000000..af8cae979489 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_tlb.c @@ -0,0 +1,183 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2022 Intel Corporation + */ + +#include "i915_drv.h" +#include "i915_perf_oa_regs.h" +#include "intel_engine_pm.h" +#include "intel_gt.h" +#include "intel_gt_pm.h" +#include "intel_gt_regs.h" +#include "intel_tlb.h" + +struct reg_and_bit { + i915_reg_t reg; + u32 bit; +}; + +static struct reg_and_bit +get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8, + const i915_reg_t *regs, const unsigned int num) +{ + const unsigned int class = engine->class; + struct reg_and_bit rb = { }; + + if (drm_WARN_ON_ONCE(&engine->i915->drm, + class >= num || !regs[class].reg)) + return rb; + + rb.reg = regs[class]; + if (gen8 && class == VIDEO_DECODE_CLASS) + rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */ + else + rb.bit = engine->instance; + + rb.bit = BIT(rb.bit); + + return rb; +} + +static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno) +{ + u32 cur = intel_gt_tlb_seqno(gt); + + /* Only skip if a *full* TLB invalidate barrier has passed */ + return (s32)(cur - ALIGN(seqno, 2)) > 0; +} + +static void mmio_invalidate_full(struct intel_gt *gt) +{ + static const i915_reg_t gen8_regs[] = { + [RENDER_CLASS] = GEN8_RTCR, + [VIDEO_DECODE_CLASS] = GEN8_M1TCR, /* , GEN8_M2TCR */ + [VIDEO_ENHANCEMENT_CLASS] = GEN8_VTCR, + [COPY_ENGINE_CLASS] = GEN8_BTCR, + }; + static const i915_reg_t gen12_regs[] = { + [RENDER_CLASS] = GEN12_GFX_TLB_INV_CR, + [VIDEO_DECODE_CLASS] = GEN12_VD_TLB_INV_CR, + [VIDEO_ENHANCEMENT_CLASS] = GEN12_VE_TLB_INV_CR, + [COPY_ENGINE_CLASS] = GEN12_BLT_TLB_INV_CR, + [COMPUTE_CLASS] = GEN12_COMPCTX_TLB_INV_CR, + }; + struct drm_i915_private *i915 = gt->i915; + struct intel_uncore *uncore = gt->uncore; + struct intel_engine_cs *engine; + intel_engine_mask_t awake, tmp; + enum intel_engine_id id; + const i915_reg_t *regs; + unsigned int num = 0; + + if (GRAPHICS_VER(i915) == 12) { + regs = gen12_regs; + num = ARRAY_SIZE(gen12_regs); + } else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) { + regs = gen8_regs; + num = ARRAY_SIZE(gen8_regs); + } else if (GRAPHICS_VER(i915) < 8) { + return; + } + + if (drm_WARN_ONCE(&i915->drm, !num, + "Platform does not implement TLB invalidation!")) + return; + + intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL); + + spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */ + + awake = 0; + for_each_engine(engine, gt, id) { + struct reg_and_bit rb; + + if (!intel_engine_pm_is_awake(engine)) + continue; + + rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); + if (!i915_mmio_reg_offset(rb.reg)) + continue; + + intel_uncore_write_fw(uncore, rb.reg, rb.bit); + awake |= engine->mask; + } + + GT_TRACE(gt, "invalidated engines %08x\n", awake); + + /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */ + if (awake && + (IS_TIGERLAKE(i915) || + IS_DG1(i915) || + IS_ROCKETLAKE(i915) || + IS_ALDERLAKE_S(i915) || + IS_ALDERLAKE_P(i915))) + intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1); + + spin_unlock_irq(&uncore->lock); + + for_each_engine_masked(engine, gt, awake, tmp) { + struct reg_and_bit rb; + + /* + * HW architecture suggest typical invalidation time at 40us, + * with pessimistic cases up to 100us and a recommendation to + * cap at 1ms. We go a bit higher just in case. + */ + const unsigned int timeout_us = 100; + const unsigned int timeout_ms = 4; + + rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); + if (__intel_wait_for_register_fw(uncore, + rb.reg, rb.bit, 0, + timeout_us, timeout_ms, + NULL)) + drm_err_ratelimited(>->i915->drm, + "%s TLB invalidation did not complete in %ums!\n", + engine->name, timeout_ms); + } + + /* + * Use delayed put since a) we mostly expect a flurry of TLB + * invalidations so it is good to avoid paying the forcewake cost and + * b) it works around a bug in Icelake which cannot cope with too rapid + * transitions. + */ + intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL); +} + +void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) +{ + intel_wakeref_t wakeref; + + if (I915_SELFTEST_ONLY(gt->awake == -ENODEV)) + return; + + if (intel_gt_is_wedged(gt)) + return; + + if (tlb_seqno_passed(gt, seqno)) + return; + + with_intel_gt_pm_if_awake(gt, wakeref) { + mutex_lock(>->tlb.invalidate_lock); + if (tlb_seqno_passed(gt, seqno)) + goto unlock; + + mmio_invalidate_full(gt); + + write_seqcount_invalidate(>->tlb.seqno); +unlock: + mutex_unlock(>->tlb.invalidate_lock); + } +} + +void intel_gt_init_tlb(struct intel_gt *gt) +{ + mutex_init(>->tlb.invalidate_lock); + seqcount_mutex_init(>->tlb.seqno, >->tlb.invalidate_lock); +} + +void intel_gt_fini_tlb(struct intel_gt *gt) +{ + mutex_destroy(>->tlb.invalidate_lock); +} diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h new file mode 100644 index 000000000000..46ce25bf5afe --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_tlb.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2022 Intel Corporation + */ + +#ifndef INTEL_TLB_H +#define INTEL_TLB_H + +#include +#include + +#include "intel_gt_types.h" + +void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno); + +void intel_gt_init_tlb(struct intel_gt *gt); +void intel_gt_fini_tlb(struct intel_gt *gt); + +static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) +{ + return seqprop_sequence(>->tlb.seqno); +} + +static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt) +{ + return intel_gt_tlb_seqno(gt) | 1; +} + +#endif /* INTEL_TLB_H */ diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 260371716490..21c6883dbeb2 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -33,6 +33,7 @@ #include "gt/intel_engine_heartbeat.h" #include "gt/intel_gt.h" #include "gt/intel_gt_requests.h" +#include "gt/intel_tlb.h" #include "i915_drv.h" #include "i915_gem_evict.h" From patchwork Thu Aug 4 07:37:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauro Carvalho Chehab X-Patchwork-Id: 12936168 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9949FC19F2C for ; Thu, 4 Aug 2022 07:38:13 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5512910E129; Thu, 4 Aug 2022 07:37:44 +0000 (UTC) Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by gabe.freedesktop.org (Postfix) with ESMTPS id 976678FD5A; Thu, 4 Aug 2022 07:37:32 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 0A699B824AF; Thu, 4 Aug 2022 07:37:31 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1F57FC433C1; Thu, 4 Aug 2022 07:37:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1659598649; bh=6zDp0MLufE9tBDIHbX6CwEmGunZs9vLu0xAHolrBsfI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=WnQiVfeQo8D/qGN2bjU+IduF6MQMlvWvT3y4bSCBnw4XxweyAzuQl5ViQ3HTBDEZv BhQhcnMOBqCCA0YJTJUdyc2QXeKya4StB0WPrLyIOKM5YCOX0bDdJPUJfpnQUWfM48 ziErv65mEqNLMix89TV1V+XiVuE9jUG/dceZKJf37wmi8YRhrpDp2oWBxQ7p4Niq2e 380ncgBl/MHUmLse3RCNIFZBzePkXJZV26elPFgZ07xqBbJK4u4Oi2lQT6V82ZYNpj BiChdEnBGTxImfXZBsl1xeYm8lMIH1XhV0m1O7mxGkp35Tpsb6aT091+87gluFQuxP E7YEE8ARmUXjg== Received: from mchehab by mail.kernel.org with local (Exim 4.95) (envelope-from ) id 1oJVQ6-0017HW-RZ; Thu, 04 Aug 2022 09:37:26 +0200 From: Mauro Carvalho Chehab To: Subject: [PATCH v3 3/3] drm/i915/gt: document TLB cache invalidation functions Date: Thu, 4 Aug 2022 09:37:24 +0200 Message-Id: X-Mailer: git-send-email 2.37.1 In-Reply-To: References: MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tvrtko Ursulin , Andi Shyti , Jonathan Corbet , David Airlie , Thomas Zimmermann , dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, Chris Wilson , linux-doc@vger.kernel.org, Rodrigo Vivi , Niranjana Vishwanathapura , Mauro Carvalho Chehab , intel-gfx@lists.freedesktop.org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Add a description for the TLB cache invalidation algorithm and for the related kAPI functions. Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Andi Shyti --- To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover. See [PATCH v3 0/3] at: https://lore.kernel.org/all/cover.1659598090.git.mchehab@kernel.org/ Documentation/gpu/i915.rst | 7 ++ drivers/gpu/drm/i915/gt/intel_tlb.c | 25 ++++++++ drivers/gpu/drm/i915/gt/intel_tlb.h | 99 +++++++++++++++++++++++++++++ 3 files changed, 131 insertions(+) diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst index 4e59db1cfb00..46911fdd79e8 100644 --- a/Documentation/gpu/i915.rst +++ b/Documentation/gpu/i915.rst @@ -58,6 +58,13 @@ Intel GVT-g Host Support(vGPU device model) .. kernel-doc:: drivers/gpu/drm/i915/intel_gvt.c :internal: +TLB cache invalidation +---------------------- + +.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.h + +.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_tlb.c + Workarounds ----------- diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c index af8cae979489..16b918ffe824 100644 --- a/drivers/gpu/drm/i915/gt/intel_tlb.c +++ b/drivers/gpu/drm/i915/gt/intel_tlb.c @@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt) intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL); } +/** + * intel_gt_invalidate_tlb_full - do full TLB cache invalidation + * @gt: GT structure + * @seqno: sequence number + * + * Do a full TLB cache invalidation if the @seqno is bigger than the last + * full TLB cache invalidation. + * + * Note: + * The TLB cache invalidation logic depends on GEN-specific registers. + * It currently supports MMIO-based TLB flush for GEN8 to GEN12. + */ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) { intel_wakeref_t wakeref; @@ -171,12 +183,25 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) } } +/** + * intel_gt_init_tlb - initialize TLB-specific vars + * @gt: GT structure + * + * TLB cache invalidation logic internally uses some resources that require + * initialization. Should be called before doing any TLB cache invalidation. + */ void intel_gt_init_tlb(struct intel_gt *gt) { mutex_init(>->tlb.invalidate_lock); seqcount_mutex_init(>->tlb.seqno, >->tlb.invalidate_lock); } +/** + * intel_gt_fini_tlb - free TLB-specific vars + * @gt: GT structure + * + * Frees any resources needed by TLB cache invalidation logic. + */ void intel_gt_fini_tlb(struct intel_gt *gt) { mutex_destroy(>->tlb.invalidate_lock); diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h index 46ce25bf5afe..2838c051f872 100644 --- a/drivers/gpu/drm/i915/gt/intel_tlb.h +++ b/drivers/gpu/drm/i915/gt/intel_tlb.h @@ -11,16 +11,115 @@ #include "intel_gt_types.h" +/** + * DOC: TLB cache invalidation logic + * + * The way the current algorithm works is that a struct drm_i915_gem_object can + * be created on any order. At unbind/evict time, the object is warranted that + * it won't be used anymore. So, a sequence number provided by + * intel_gt_next_invalidate_tlb_full() is stored on it. This can happen either + * at __vma_put_pages() - for VMA sync unbind, or at ppgtt_unbind_vma() - for + * VMA async VMA bind. + * + * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb_full() is called, + * where it checks if the sequence number of the object was already invalidated + * or not. If not, it flushes the TLB and increments the sequence number:: + * + * void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno) + * { + * ... + * with_intel_gt_pm_if_awake(gt, wakeref) { + * mutex_lock(>->tlb.invalidate_lock); + * if (tlb_seqno_passed(gt, seqno)) + * goto unlock; + * + * // Some code to do TLB invalidation + * ... + * + * write_seqcount_invalidate(>->tlb.seqno); // increment seqno + * mutex_lock(>->tlb.invalidate_lock); + * } + * + * So, let's say the current seqno is 2 and 3 new objects were created, + * on this order:: + * + * obj1 + * obj2 + * obj3 + * + * They can be unbind/evict on a different order. At unbind/evict time, + * the mm.tlb will be stamped with the sequence number, using the number + * from the last TLB flush, plus 1. + * + * Different threads may be used on unbind/evict and/or unset pages. + * As the logic at intel_gt_invalidate_tlb_full() is protected by a mutex, + * for simplicity, let's consider just two threads: + * + * +-------------------+-------------------------+---------------------------------+ + * | sequence number | Thread 0 | Thread 1 + + * +===================+=========================+=================================+ + * | seqno=2 | | | + * | +-------------------------+---------------------------------+ + * | | unbind/evict obj3. | | + * | | | | + * | | obj3.mm.tlb = seqno | 1 | | + * | | // obj3.mm.tlb = 3 | | + * | +-------------------------+---------------------------------+ + * | | unbind/evict obj1. | | + * | | | | + * | | obj1.mm.tlb = seqno | 1 | | + * | | // obj1.mm.tlb = 3 | | + * | +-------------------------+---------------------------------+ + * | | | __i915_gem_object_unset_pages() | + * | | | called for obj3 => TLB flush | + * | | | invalidating both obj1 and obj2.| + * | | | | + * | | | seqno += 2 | + * +-------------------+-------------------------+---------------------------------+ + * | seqno=4 | | | + * | +-------------------------+---------------------------------+ + * | | unbind/evict obj2. | | + * | | | | + * | | obj2.mm.tlb = seqno | 1 | | + * | | // obj2.mm.tlb = 5 | | + * | +-------------------------+---------------------------------+ + * | | | __i915_gem_object_unset_pages() | + * | | | called for obj1, don't flush | + * | | | as past flush invalidated obj1. | + * | +-------------------------+---------------------------------+ + * | | | __i915_gem_object_unset_pages() | + * | | | called for obj2 => TLB flush. | + * | | | invalidating obj2. | + * | | | | + * | | | seqno += 2 | + * +-------------------+-------------------------+---------------------------------+ + * | seqno=6 | | | + * +-------------------+-------------------------+---------------------------------+ + */ + void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno); void intel_gt_init_tlb(struct intel_gt *gt); void intel_gt_fini_tlb(struct intel_gt *gt); +/** + * intel_gt_tlb_seqno - Returns the current TLB invlidation sequence number + * @gt: GT structure + * + * There's no need to lock while calling it, as seqprop_sequence is thread-safe + */ static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) { return seqprop_sequence(>->tlb.seqno); } +/** + * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation + * sequence number + * @gt: GT structure + * + * There's no need to lock while calling it, as seqprop_sequence is thread-safe + */ static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt) { return intel_gt_tlb_seqno(gt) | 1;