[4/4] drm/i915/dg1: WA GPU hang at RCC

Message ID 20210303010728.3605269-4-lucas.demarchi@intel.com (mailing list archive)
State New, archived
Series [1/4] drm/i915/gen12: Add recommended hardware tuning value

Commit Message

Lucas De Marchi March 3, 2021, 1:07 a.m. UTC
From: Zhen Han <zhen.han@intel.com>

GPU hangs at RCC. According to Wa_14012131227 we shouldn't have a hang
due to RHWO, but that is what we are observing, even without media
compressible render target. Feedback from HW engineers is to leave RHWO
disabled.

Cc: Jianjun Liu <Jianjun.liu@intel.com>
Cc: Chuansheng Liu <chuansheng.liu@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Zhen Han <zhen.han@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Han, Zhen March 3, 2021, 3:26 a.m. UTC | #1
Yes, that's the case.
There are RCC-related silicon issues in gen12-lp.
The following are two consecutive GPU hangs we found on SG1 and DG1 under Linux, neither of which uses a media compressible render target.
1. HSD-1508524297<https://hsdes.intel.com/appstore/article/>  [SG1][DG1] GPU hang in PIPECONTROL in running 120 ways of Android container with running pocket story HD apk.
--> The solution is to disable the RHWO optimization by default.
2. hsd-1508734716<https://hsdes.intel.com/appstore/article/> [DG1][Linux] GPU hang in PIPECONTROL(IPEHR:0x7a000004) with (PSS, RCPFE, RCC, WMFE) not done in running Monkey test
--> The solution is keeping RHWO optimization in when Render Target Resolve type is PARTIAL or FULL. The change will be in mesa code.

The SV and RCC design teams studied this further and explained the root cause in bug-eco HSD 1508744258 - Hang due to deadlock created by RHWO scenario with RHWO optimization enabled<https://hsdes.intel.com/appstore/article/>.

BTW, the Windows team recently found similar GPU hangs on a customer's TGL platform and needs "disable RHWO" as the WA.  So it is a general issue in Gen12 (TGL and DG1).
*       14012336472 - [HP-TDC_IEC/HarryPotter]SIO1880260 Simple Solitaire UI show garbage when playing the game by finger.<https://hsdes.intel.com/appstore/article/>
*       18014955083 - [TGL] Sporadic pixel shader hang when alpha blending is enabled <https://hsdes.intel.com/appstore/article/>  (SV sighting)

Thanks
Han Zhen

Matt Roper March 3, 2021, 3:37 a.m. UTC | #2
On Tue, Mar 02, 2021 at 05:07:28PM -0800, Lucas De Marchi wrote:
> From: Zhen Han <zhen.han@intel.com>
> 
> GPU hangs at RCC. According to Wa_14012131227 we shouldn't have a hang
> due to RHWO, but that is what we are observing, even without media
> compressible render target. Feedback from HW engineers is to leave RHWO
> disabled.

"14012131227" isn't the correct workaround number; that's a
platform-specific identifier.  This should be referred to by its lineage
number 22011054531 which is common across all affected platforms.
From a quick scan, it looks like this isn't just a DG1 workaround, but
also applies to at least TGL and ADL-S as well (and is pending for RKL).

I'm not sure we actually need this workaround in the kernel though.
We're already whitelisting this register for userspace to allow UMD's to
apply workarounds to it directly (and UMD's are already doing their own
programming of the register for Wa_1808121037).  So it may be best to
leave the handling of this additional bit to them as well, especially if
the desired handling doesn't quite match the officially documented
workaround text.


Matt

Han, Zhen March 3, 2021, 3:47 a.m. UTC | #3
Dear Matt,

Yes, the WA is needed on TGL.  I am not sure about ADL-S and RKL.
The issue is different from 1808121037.
Previously, while studying the Alibaba issue on SG1, we had not found the exact usage condition that requires disabling RHWO, so we made this change in the kernel.
Should we move the "disable RHWO" setting to the mesa default setting path?

Thanks
Han Zhen

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index e678fa8d2ab9..5235fb70a69a 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -739,6 +739,17 @@  static void dg1_ctx_workarounds_init(struct intel_engine_cs *engine,
 	       FF_MODE2,
 	       FF_MODE2_GS_TIMER_MASK,
 	       FF_MODE2_GS_TIMER_224, 0);
+
+	/*
+	 * Wa_14012131227
+	 *
+	 * Although the WA is described as causing corruption when using media
+	 * compressible render target, leaving RHWO enabled is also causing
+	 * gpu hang when using multiple concurrent render and media workloads.
+	 * Disable it completely for now.
+	 */
+	wa_masked_en(wal, GEN7_COMMON_SLICE_CHICKEN1,
+		     GEN9_RHWO_OPTIMIZATION_DISABLE);
 }
 
 static void
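
For readers unfamiliar with masked registers, here is a minimal standalone sketch of what a masked enable such as the wa_masked_en() call in the patch does to a register value. It is a model for illustration, not i915 code: the helper names are hypothetical, and the bit position used for the RHWO disable bit is an assumption that should be checked against i915_reg.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position, for illustration only; see i915_reg.h for the real value. */
#define EXAMPLE_RHWO_OPTIMIZATION_DISABLE (1u << 14)

/*
 * Masked-register encoding: bits 31:16 of the written dword select which
 * of bits 15:0 are actually updated. Bits whose mask bit is clear keep
 * their current value.
 */
static uint32_t example_masked_bit_enable(uint32_t bit)
{
	return (bit << 16) | bit;
}

static uint32_t example_masked_bit_disable(uint32_t bit)
{
	return bit << 16;
}

/* Software model of how the hardware applies a masked write. */
static uint16_t example_apply_masked_write(uint16_t current, uint32_t write)
{
	uint16_t mask = (uint16_t)(write >> 16);
	uint16_t value = (uint16_t)(write & 0xffffu);

	return (uint16_t)((current & (uint16_t)~mask) | (value & mask));
}

/* Walk through enabling and then clearing the chicken bit. */
static void example_rhwo_demo(void)
{
	uint16_t reg = 0x0001; /* an unrelated bit that is already set */

	/* Setting the disable bit leaves the unrelated bit untouched. */
	reg = example_apply_masked_write(reg,
		example_masked_bit_enable(EXAMPLE_RHWO_OPTIMIZATION_DISABLE));
	assert(reg == 0x4001);

	/* A write with an empty mask changes nothing. */
	reg = example_apply_masked_write(reg, 0x00000000u);
	assert(reg == 0x4001);

	/* Clearing the bit again restores the previous value. */
	reg = example_apply_masked_write(reg,
		example_masked_bit_disable(EXAMPLE_RHWO_OPTIMIZATION_DISABLE));
	assert(reg == 0x0001);
}
```

The same encoding applies whether the kernel or a userspace driver programs the bit, which is why the thread can discuss moving the write to mesa without changing the value that is written.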