[v3,2/3] drm/i915/bxt: Fix inadvertent CPU snooping due to incorrect MOCS config

Message ID 1467380406-11954-3-git-send-email-imre.deak@intel.com (mailing list archive)
State New, archived

Commit Message

Imre Deak July 1, 2016, 1:40 p.m. UTC
Setting a write-back cache policy in the MOCS entry definition also
implies snooping, which has a considerable overhead. This is
unexpected for a few reasons:
- User space did not ask for a coherent surface (it didn't mark the
  buffer as such via the set caching IOCTL).
- There is a separate MOCS entry field for snooping (which we never
  set).
- This MOCS table is about caching in (e)LLC and there is no (e)LLC on
  BXT. There is a separate table for L3 cache control.

Considering the above, the current snooping behavior looks like an
unintentional side effect of the WB setting. Changing the entry to
LLC-UC gets rid of the snooping without any ill effects. For a coherent
surface the application would use a separate MOCS entry at index 1 and
call the set caching IOCTL to set up the PTEs for the corresponding
buffer to be snooped. In the future we could also add a new MOCS entry
for coherent surfaces.

This resulted in a 70% improvement in synthetic texturing benchmarks.

Kudos to Valtteri Rantala, Eero Tamminen, Michael T Frederick and
Ville, who helped to narrow down the source of the problem to the
kernel and to the snooping behaviour in particular.

With a follow-up change to adjust the 3rd entry value,
igt/gem_mocs_settings passes after this change.

v2:
- Rebase on v2 of patch 1/2.
v3:
- Set the entry as LLC uncached instead of PTE-passthrough. This way
  we still keep snooping disabled, but we also make the cacheability/
  coherency setting independent of the PTE, which is managed by the
  kernel. (Chris)

CC: Rong R Yang <rong.r.yang@intel.com>
CC: Yakui Zhao <yakui.zhao@intel.com>
CC: Valtteri Rantala <valtteri.rantala@intel.com>
CC: Eero Tamminen <eero.t.tamminen@intel.com>
CC: Michael T Frederick <michael.t.frederick@intel.com>
CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/intel_mocs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
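
The commit message refers to the set caching IOCTL as the way user space
requests a coherent (snooped) buffer. A minimal sketch of such a call
follows, assuming the Linux uapi header <drm/i915_drm.h> is installed and
that the caller already has an open i915 DRM file descriptor and a GEM
buffer handle. The helper name mark_buffer_snooped() is hypothetical and
is not part of this patch.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/*
 * Hypothetical helper (not from the patch): ask the kernel to make the
 * GEM buffer identified by 'handle' CPU-coherent via the set caching
 * IOCTL. Returns 0 on success or a negative errno value on failure.
 */
static int mark_buffer_snooped(int drm_fd, uint32_t handle)
{
	struct drm_i915_gem_caching arg = {
		.handle = handle,
		/* Cached/coherent mode; per the commit message this is what
		 * sets up the PTEs for the buffer to be snooped. */
		.caching = I915_CACHING_CACHED,
	};

	if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_SET_CACHING, &arg) == -1)
		return -errno;

	return 0;
}
```

A buffer that is never passed to such a call keeps its default caching
mode, which is why snooping on it was unexpected from user space's point
of view.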

Comments

Zhao, Yakui July 13, 2016, 2:32 a.m. UTC | #1
On 07/01/2016 09:40 PM, Deak, Imre wrote:
> Setting a write-back cache policy in the MOCS entry definition also
> implies snooping, which has a considerable overhead. This is
> unexpected for a few reasons:
> - From user-space's point of view since it didn't want a coherent
>    surface (it didn't set the buffer as such via the set caching IOCTL).
> - There is a separate MOCS entry field for snooping (which we never
>    set).
> - This MOCS table is about caching in (e)LLC and there is no (e)LLC on
>    BXT. There is a separate table for L3 cache control.
>
> Considering the above the current behavior of snooping looks like an
> unintentional side-effect of the WB setting. Changing it to be LLC-UC
> gets rid of the snooping without any ill-effects. For a coherent
> surface the application would use a separate MOCS entry at index 1 and
> call the set caching IOCTL to setup the PTE entries for the
> corresponding buffer to be snooped. In the future we could also add a
> new MOCS entry for coherent surfaces.
>
> This resulted in 70% improvement in synthetic texturing benchmarks.
>
> Kudos to Valtteri Rantala, Eero Tamminen and Michael T Frederick and
> Ville who helped to narrow the source of problem to the kernel and to
> the snooping behaviour in particular.
>
> With a follow-up change to adjust the 3rd entry value
> igt/gem_mocs_settings is passing after this change.
>
> v2:
> - Rebase on v2 of patch 1/2.
> v3:
> - Set the entry as LLC uncached instead of PTE-passthrough. This way
>    we also keep snooping disabled, but we also make the cacheability/
>    coherency setting independent of the PTE which is managed by the
>    kernel. (Chris)
>
> CC: Rong R Yang <rong.r.yang@intel.com>
> CC: Yakui Zhao <yakui.zhao@intel.com>
> CC: Valtteri Rantala <valtteri.rantala@intel.com>
> CC: Eero Tamminen <eero.t.tamminen@intel.com>
> CC: Michael T Frederick <michael.t.frederick@intel.com>
> CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
> CC: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Imre Deak <imre.deak@intel.com>

As BXT has no LLC, setting the WB policy only adds extra overhead.
Given that, the patch looks reasonable for BXT.

Add: Acked-by: Zhao Yakui <yakui.zhao@intel.com>

> ---
>   drivers/gpu/drm/i915/intel_mocs.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
> index d36e609..927825f 100644
> --- a/drivers/gpu/drm/i915/intel_mocs.c
> +++ b/drivers/gpu/drm/i915/intel_mocs.c
> @@ -149,8 +149,8 @@ static const struct drm_i915_mocs_entry broxton_mocs_table[] = {
>   	  .l3cc_value =    L3_ESC(0) | L3_SCC(0) | L3_CACHEABILITY(L3_WB),
>   	},
>   	{
> -	  /* 0x0000003b */
> -	  .control_value = LE_CACHEABILITY(LE_WB) |
> +	  /* 0x00000039 */
> +	  .control_value = LE_CACHEABILITY(LE_UC) |
>   			   LE_TGT_CACHE(LE_TC_LLC_ELLC) |
>   			   LE_LRUM(3) | LE_AOM(0) | LE_RSC(0) | LE_SCC(0) |
>   			   LE_PFM(0) | LE_SCF(0),
Yang, Rong R July 14, 2016, 8:33 a.m. UTC | #2
> -----Original Message-----
> From: Deak, Imre
> Sent: Friday, July 1, 2016 21:40
> To: intel-gfx@lists.freedesktop.org
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>; Chris Wilson
> <chris@chris-wilson.co.uk>; Yang, Rong R <rong.r.yang@intel.com>; Zhao, Yakui
> <yakui.zhao@intel.com>; Tamminen, Eero T <eero.t.tamminen@intel.com>
> Subject: [PATCH v3 2/3] drm/i915/bxt: Fix inadvertent CPU snooping due to
> incorrect MOCS config
>
> Setting a write-back cache policy in the MOCS entry definition also implies
> snooping, which has a considerable overhead. This is unexpected for a few
> reasons:
> - From user-space's point of view since it didn't want a coherent
>   surface (it didn't set the buffer as such via the set caching IOCTL).
> - There is a separate MOCS entry field for snooping (which we never
>   set).
> - This MOCS table is about caching in (e)LLC and there is no (e)LLC on
>   BXT. There is a separate table for L3 cache control.
>
> Considering the above the current behavior of snooping looks like an
> unintentional side-effect of the WB setting. Changing it to be LLC-UC gets rid
> of the snooping without any ill-effects. For a coherent surface the application
> would use a separate MOCS entry at index 1 and call the set caching IOCTL to
> setup the PTE entries for the corresponding buffer to be snooped. In the
> future we could also add a new MOCS entry for coherent surfaces.
>
> This resulted in 70% improvement in synthetic texturing benchmarks.
>
> Kudos to Valtteri Rantala, Eero Tamminen and Michael T Frederick and Ville
> who helped to narrow the source of problem to the kernel and to the
> snooping behaviour in particular.
>
> With a follow-up change to adjust the 3rd entry value
> igt/gem_mocs_settings is passing after this change.
>
> v2:
> - Rebase on v2 of patch 1/2.
> v3:
> - Set the entry as LLC uncached instead of PTE-passthrough. This way
>   we also keep snooping disabled, but we also make the cacheability/
>   coherency setting independent of the PTE which is managed by the
>   kernel. (Chris)

About 20% improvement in the OpenCL benchmark LuxMark.
Add: Tested-by: Rong R Yang <rong.r.yang@intel.com>

> CC: Rong R Yang <rong.r.yang@intel.com>
> CC: Yakui Zhao <yakui.zhao@intel.com>
> CC: Valtteri Rantala <valtteri.rantala@intel.com>
> CC: Eero Tamminen <eero.t.tamminen@intel.com>
> CC: Michael T Frederick <michael.t.frederick@intel.com>
> CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
> CC: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Imre Deak <imre.deak@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_mocs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
> index d36e609..927825f 100644
> --- a/drivers/gpu/drm/i915/intel_mocs.c
> +++ b/drivers/gpu/drm/i915/intel_mocs.c
> @@ -149,8 +149,8 @@ static const struct drm_i915_mocs_entry broxton_mocs_table[] = {
>  	  .l3cc_value =    L3_ESC(0) | L3_SCC(0) | L3_CACHEABILITY(L3_WB),
>  	},
>  	{
> -	  /* 0x0000003b */
> -	  .control_value = LE_CACHEABILITY(LE_WB) |
> +	  /* 0x00000039 */
> +	  .control_value = LE_CACHEABILITY(LE_UC) |
>  			   LE_TGT_CACHE(LE_TC_LLC_ELLC) |
>  			   LE_LRUM(3) | LE_AOM(0) | LE_RSC(0) | LE_SCC(0) |
>  			   LE_PFM(0) | LE_SCF(0),
> --
> 2.5.0

Patch

diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
index d36e609..927825f 100644
--- a/drivers/gpu/drm/i915/intel_mocs.c
+++ b/drivers/gpu/drm/i915/intel_mocs.c
@@ -149,8 +149,8 @@  static const struct drm_i915_mocs_entry broxton_mocs_table[] = {
 	  .l3cc_value =    L3_ESC(0) | L3_SCC(0) | L3_CACHEABILITY(L3_WB),
 	},
 	{
-	  /* 0x0000003b */
-	  .control_value = LE_CACHEABILITY(LE_WB) |
+	  /* 0x00000039 */
+	  .control_value = LE_CACHEABILITY(LE_UC) |
 			   LE_TGT_CACHE(LE_TC_LLC_ELLC) |
 			   LE_LRUM(3) | LE_AOM(0) | LE_RSC(0) | LE_SCC(0) |
 			   LE_PFM(0) | LE_SCF(0),
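
The hex comments in the hunk above (0x0000003b before, 0x00000039 after)
can be cross-checked with a little arithmetic. The sketch below is not
the driver's code: the shift positions and enum values are assumptions,
chosen so that they reproduce both comments. Only the three non-zero
fields are modelled, because the fields passed 0 (LE_AOM, LE_RSC,
LE_SCC, LE_PFM, LE_SCF) contribute nothing wherever they sit in the
register. The helper name bxt_changed_entry_control() is hypothetical.

```c
#include <stdint.h>

/* Assumed field positions: cacheability in bits 1:0, target cache in
 * bits 3:2, LRU management in bits 5:4. Not taken from the patch. */
#define LE_CACHEABILITY(v)	((uint32_t)(v) << 0)
#define LE_TGT_CACHE(v)		((uint32_t)(v) << 2)
#define LE_LRUM(v)		((uint32_t)(v) << 4)

/* Assumed encodings of the symbolic values used in the hunk. */
#define LE_UC			1
#define LE_WB			3
#define LE_TC_LLC_ELLC		2

/* Hypothetical helper: control_value of the changed entry for a given
 * cacheability setting, with the other fields as in the hunk. */
static uint32_t bxt_changed_entry_control(uint32_t cacheability)
{
	return LE_CACHEABILITY(cacheability) |
	       LE_TGT_CACHE(LE_TC_LLC_ELLC) |
	       LE_LRUM(3);
}
```

Under these assumptions LE_WB gives 3 + 8 + 48 = 0x3b and LE_UC gives
1 + 8 + 48 = 0x39, so the patch changes only the two cacheability bits
and leaves the target cache and LRU fields as they were.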