From patchwork Mon Apr 24 18:29:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yang, Fei" X-Patchwork-Id: 13222469 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 11DE0C77B61 for ; Mon, 24 Apr 2023 18:28:03 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1B24D10E1C4; Mon, 24 Apr 2023 18:27:58 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3830310E1C6; Mon, 24 Apr 2023 18:27:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1682360876; x=1713896876; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=iwC96/grpHbaoPYjKXIPv1w6PFpaTXyNWp+fjyuJC94=; b=DTVKNlLaUV4HpucboDUesCmvgQoa8f6MPSm30oPSV5TKo0J8Lzy7omAT JhTImg3qo/aq9OXvzGnedQc5b4x9AStEiyGgm9QcUTv+ArsLnoouCdH5p e7hVxT/WLUZ7v21ocBtxpevb0b6pJ7CMKv7TafbbRw1B8ltWZXTce/zYl KrzwiPu9fMHYor+wg+DkBjrDDomsC62BS+z3Dsk2lPRqJ+Zzo1niIBsgp 6I7ZjTjXPTX1u5HjJzK4c+8dJyTMwEpum/p+/vFSHFu6LNp6rFA1cMiUz lAzIqvYW3X2gwlBU92X8ELn7aEjtBKnsHd0Og517tFpgddXFVG26pNXVO A==; X-IronPort-AV: E=McAfee;i="6600,9927,10690"; a="411802209" X-IronPort-AV: E=Sophos;i="5.99,223,1677571200"; d="scan'208";a="411802209" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2023 11:27:54 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10690"; a="762539847" X-IronPort-AV: E=Sophos;i="5.99,223,1677571200"; d="scan'208";a="762539847" Received: 
from fyang16-desk.jf.intel.com ([10.24.96.243]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2023 11:27:54 -0700 From: fei.yang@intel.com To: intel-gfx@lists.freedesktop.org Subject: [PATCH v2 1/2] drm/i915/mtl: Add PTE encode function Date: Mon, 24 Apr 2023 11:29:01 -0700 Message-Id: <20230424182902.3663500-2-fei.yang@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230424182902.3663500-1-fei.yang@intel.com> References: <20230424182902.3663500-1-fei.yang@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrzej Hajda , Nirmoy Das , Fei Yang , dri-devel@lists.freedesktop.org, Andi Shyti Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Fei Yang PTE encode functions are platform dependent. This patch implements PTE functions for MTL, and ensures the correct PTE encode function is used by calling pte_encode function pointer instead of the hardcoded gen8 version of PTE encode. 
Fixes: b76c0deef627 ("drm/i915/mtl: Define MOCS and PAT tables for MTL") Signed-off-by: Fei Yang Reviewed-by: Andrzej Hajda Reviewed-by: Andi Shyti Acked-by: Nirmoy Das --- drivers/gpu/drm/i915/display/intel_dpt.c | 2 +- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 45 ++++++++++++++++++++---- drivers/gpu/drm/i915/gt/intel_ggtt.c | 36 +++++++++++++++++-- drivers/gpu/drm/i915/gt/intel_gtt.h | 12 +++++-- 4 files changed, 82 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c index b8027392144d..c5eacfdba1a5 100644 --- a/drivers/gpu/drm/i915/display/intel_dpt.c +++ b/drivers/gpu/drm/i915/display/intel_dpt.c @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb) vm->vma_ops.bind_vma = dpt_bind_vma; vm->vma_ops.unbind_vma = dpt_unbind_vma; - vm->pte_encode = gen8_ggtt_pte_encode; + vm->pte_encode = vm->gt->ggtt->vm.pte_encode; dpt->obj = dpt_obj; dpt->obj->is_dpt = true; diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index 4daaa6f55668..4c9a2f2db908 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr, return pte; } +static u64 mtl_pte_encode(dma_addr_t addr, + enum i915_cache_level level, + u32 flags) +{ + gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW; + + if (unlikely(flags & PTE_READ_ONLY)) + pte &= ~GEN8_PAGE_RW; + + if (flags & PTE_LM) + pte |= GEN12_PPGTT_PTE_LM; + + switch (level) { + case I915_CACHE_NONE: + pte |= GEN12_PPGTT_PTE_PAT1; + break; + case I915_CACHE_LLC: + case I915_CACHE_L3_LLC: + pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1; + break; + case I915_CACHE_WT: + pte |= GEN12_PPGTT_PTE_PAT0; + break; + } + + return pte; +} + static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create) { struct drm_i915_private *i915 = ppgtt->vm.i915; @@ -427,7 +455,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, u32 
flags) { struct i915_page_directory *pd; - const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags); + const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, cache_level, flags); gen8_pte_t *vaddr; pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2)); @@ -580,7 +608,7 @@ static void gen8_ppgtt_insert_huge(struct i915_address_space *vm, enum i915_cache_level cache_level, u32 flags) { - const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags); + const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags); unsigned int rem = sg_dma_len(iter->sg); u64 start = vma_res->start; @@ -743,7 +771,7 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm, GEM_BUG_ON(pt->is_compact); vaddr = px_vaddr(pt); - vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags); + vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags); drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr)); } @@ -773,7 +801,7 @@ static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm, } vaddr = px_vaddr(pt); - vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, flags); + vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, level, flags); } static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm, @@ -820,8 +848,8 @@ static int gen8_init_scratch(struct i915_address_space *vm) pte_flags |= PTE_LM; vm->scratch[0]->encode = - gen8_pte_encode(px_dma(vm->scratch[0]), - I915_CACHE_NONE, pte_flags); + vm->pte_encode(px_dma(vm->scratch[0]), + I915_CACHE_NONE, pte_flags); for (i = 1; i <= vm->top; i++) { struct drm_i915_gem_object *obj; @@ -963,7 +991,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt, */ ppgtt->vm.alloc_scratch_dma = alloc_pt_dma; - ppgtt->vm.pte_encode = gen8_pte_encode; + if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70)) + ppgtt->vm.pte_encode = mtl_pte_encode; + else + ppgtt->vm.pte_encode = gen8_pte_encode; ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND; 
ppgtt->vm.insert_entries = gen8_ppgtt_insert; diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 3c7f1ed92f5b..20915edc8bd9 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -220,6 +220,33 @@ static void guc_ggtt_invalidate(struct i915_ggtt *ggtt) } } +static u64 mtl_ggtt_pte_encode(dma_addr_t addr, + enum i915_cache_level level, + u32 flags) +{ + gen8_pte_t pte = addr | GEN8_PAGE_PRESENT; + + WARN_ON_ONCE(addr & ~GEN12_GGTT_PTE_ADDR_MASK); + + if (flags & PTE_LM) + pte |= GEN12_GGTT_PTE_LM; + + switch (level) { + case I915_CACHE_NONE: + pte |= MTL_GGTT_PTE_PAT1; + break; + case I915_CACHE_LLC: + case I915_CACHE_L3_LLC: + pte |= MTL_GGTT_PTE_PAT0 | MTL_GGTT_PTE_PAT1; + break; + case I915_CACHE_WT: + pte |= MTL_GGTT_PTE_PAT0; + break; + } + + return pte; +} + u64 gen8_ggtt_pte_encode(dma_addr_t addr, enum i915_cache_level level, u32 flags) @@ -247,7 +274,7 @@ static void gen8_ggtt_insert_page(struct i915_address_space *vm, gen8_pte_t __iomem *pte = (gen8_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE; - gen8_set_pte(pte, gen8_ggtt_pte_encode(addr, level, flags)); + gen8_set_pte(pte, ggtt->vm.pte_encode(addr, level, flags)); ggtt->invalidate(ggtt); } @@ -257,8 +284,8 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm, enum i915_cache_level level, u32 flags) { - const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level, flags); struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm); + const gen8_pte_t pte_encode = ggtt->vm.pte_encode(0, level, flags); gen8_pte_t __iomem *gte; gen8_pte_t __iomem *end; struct sgt_iter iter; @@ -981,7 +1008,10 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt) ggtt->vm.vma_ops.bind_vma = intel_ggtt_bind_vma; ggtt->vm.vma_ops.unbind_vma = intel_ggtt_unbind_vma; - ggtt->vm.pte_encode = gen8_ggtt_pte_encode; + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)) + ggtt->vm.pte_encode = mtl_ggtt_pte_encode; + else + ggtt->vm.pte_encode 
= gen8_ggtt_pte_encode; return ggtt_probe_common(ggtt, size); } diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index ea17849e7a5c..1910683f03b4 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -88,9 +88,17 @@ typedef u64 gen8_pte_t; #define BYT_PTE_SNOOPED_BY_CPU_CACHES REG_BIT(2) #define BYT_PTE_WRITEABLE REG_BIT(1) +#define MTL_PPGTT_PTE_PAT3 BIT_ULL(62) #define GEN12_PPGTT_PTE_LM BIT_ULL(11) - -#define GEN12_GGTT_PTE_LM BIT_ULL(1) +#define GEN12_PPGTT_PTE_PAT2 BIT_ULL(7) +#define GEN12_PPGTT_PTE_PAT1 BIT_ULL(4) +#define GEN12_PPGTT_PTE_PAT0 BIT_ULL(3) + +#define GEN12_GGTT_PTE_LM BIT_ULL(1) +#define MTL_GGTT_PTE_PAT0 BIT_ULL(52) +#define MTL_GGTT_PTE_PAT1 BIT_ULL(53) +#define GEN12_GGTT_PTE_ADDR_MASK GENMASK_ULL(45, 12) +#define MTL_GGTT_PTE_PAT_MASK GENMASK_ULL(53, 52) #define GEN12_PDE_64K BIT(6) #define GEN12_PTE_PS64 BIT(8) From patchwork Mon Apr 24 18:29:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Yang, Fei" X-Patchwork-Id: 13222471 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E1238C7618E for ; Mon, 24 Apr 2023 18:28:09 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1705C10E5CC; Mon, 24 Apr 2023 18:28:00 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 75ECB10E1C6; Mon, 24 Apr 2023 18:27:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1682360877; x=1713896877; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dwYVxqPeH0CmQAMXEPNFMj1AbFDbIPfLSM3z4LPBfXw=; b=Ba4bUUHg4I+7bymGuJyjEepOYxhZDnXodHb/+f4/j4JaGbGPde1/Vtly lRgEIhTxxv/o7oURuhFnpKeloFUVPePxeKrf5rKm+Pa5AbYm+fl4mITTo 41VjatT87rWopSwUMDY5sainWPW4G08ClEYEIKah67W/H2cVJGlKWlbM+ j8H/vyuviTM0aTRnf/cW2+/+QyKa6YDuaQnbs4B0+N8LkztGohUFOv/Yr y6ZuPNO9gSn0q1D3BmVBiz8lTV4cLLLGawiZjLUSqy5m1MpBCFnT6rdAI IeqfYlSY+R8KxTL1t+Esag1T28U4cLkHqVXuuxek54Y4AI/j+zvAmQhjM w==; X-IronPort-AV: E=McAfee;i="6600,9927,10690"; a="411802210" X-IronPort-AV: E=Sophos;i="5.99,223,1677571200"; d="scan'208";a="411802210" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2023 11:27:54 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10690"; a="762539850" X-IronPort-AV: E=Sophos;i="5.99,223,1677571200"; d="scan'208";a="762539850" Received: from fyang16-desk.jf.intel.com ([10.24.96.243]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2023 11:27:54 -0700 From: fei.yang@intel.com To: intel-gfx@lists.freedesktop.org Subject: [PATCH v2 2/2] drm/i915/mtl: workaround coherency issue for Media Date: Mon, 24 Apr 2023 11:29:02 -0700 Message-Id: <20230424182902.3663500-3-fei.yang@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230424182902.3663500-1-fei.yang@intel.com> References: <20230424182902.3663500-1-fei.yang@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Fei Yang , dri-devel@lists.freedesktop.org, Andrzej Hajda , Andi Shyti , Matt Roper , Nirmoy Das Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Fei Yang This patch implements Wa_22016122933. 
In MTL, memory writes initiated by the Media tile update the whole cache line, even for partial writes. This creates a coherency problem for cacheable memory if both CPU and GPU are writing data to different locations within a single cache line. This patch circumvents the issue by making CPU/GPU shared memory uncacheable (WC on CPU side, and PAT index 2 for GPU). Additionally, it ensures that CPU writes are visible to the GPU with an intel_guc_write_barrier(). While fixing the CTB issue, we noticed some random GSC firmware loading failures because the shared buffers are cacheable (WB) on CPU side but uncached on GPU side. To fix these issues we need to map such shared buffers as WC on CPU side. Since such allocations are not all done through the GuC allocator, to avoid too many code changes, i915_coherent_map_type() is now hard-coded to return WC for MTL. v2: Simplify the commit message (Matt). BSpec: 45101 Signed-off-by: Fei Yang Reviewed-by: Andi Shyti Acked-by: Nirmoy Das Reviewed-by: Andrzej Hajda Reviewed-by: Matt Roper Signed-off-by: Nirmoy Das --- drivers/gpu/drm/i915/gem/i915_gem_pages.c | 5 ++++- drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++ drivers/gpu/drm/i915/gt/uc/intel_guc.c | 7 +++++++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 6 ++++++ 4 files changed, 30 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index ecd86130b74f..89fc8ea6bcfc 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915, struct drm_i915_gem_object *obj, bool always_coherent) { - if (i915_gem_object_is_lmem(obj)) + /* + * Wa_22016122933: always return I915_MAP_WC for MTL + */ + if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915)) return I915_MAP_WC; if (HAS_LLC(i915) || always_coherent) return I915_MAP_WB; diff --git 
a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c index 1d9fdfb11268..236673c02f9a 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc) if (obj->base.size < gsc->fw.size) return -ENOSPC; + /* + * Wa_22016122933: For MTL the shared memory needs to be mapped + * as WC on CPU side and UC (PAT index 2) on GPU side + */ + if (IS_METEORLAKE(i915)) + i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE); + dst = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(i915, obj, true)); if (IS_ERR(dst)) @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc) memset(dst, 0, obj->base.size); memcpy(dst, src, gsc->fw.size); + /* + * Wa_22016122933: Making sure the data in dst is + * visible to GSC right away + */ + intel_guc_write_barrier(&gt->uc.guc); + i915_gem_object_unpin_map(gsc->fw.obj); i915_gem_object_unpin_map(obj); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index e89f16ecf1ae..c9f20385f6a0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size) if (IS_ERR(obj)) return ERR_CAST(obj); + /* + * Wa_22016122933: For MTL the shared memory needs to be mapped + * as WC on CPU side and UC (PAT index 2) on GPU side + */ + if (IS_METEORLAKE(gt->i915)) + i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE); + vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL); if (IS_ERR(vma)) goto err; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 1803a633ed64..99a0a89091e7 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg 
**msg) /* now update descriptor */ WRITE_ONCE(desc->head, head); + /* + * Wa_22016122933: Making sure the head update is + * visible to GuC right away + */ + intel_guc_write_barrier(ct_to_guc(ct)); + return available - len; corrupted: