diff mbox

[2/2] drm/i915/bxt: Fix inadvertent CPU snooping due to incorrect MOCS config

Message ID 1461587001-2463-2-git-send-email-imre.deak@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Imre Deak April 25, 2016, 12:23 p.m. UTC
Setting a write-back cache policy in the MOCS entry definition also
implies snooping, which has a considerable overhead. This is
unexpected for a few reasons:
- From user-space's point of view since it didn't want a coherent
  surface (it didn't set the buffer as such via the set caching IOCTL).
- There is a separate MOCS entry field for snooping (which we never
  set).
- This MOCS table is about caching in (e)LLC and there is no (e)LLC on
  BXT. There is a separate table for L3 cache control.

Considering the above the current behavior of snooping looks like an
unintentional side-effect of the WB setting. Changing it to be PTE
based cacheability gets rid of the snooping without any ill-effects.
For a coherent surface the application would use a separate MOCS entry
(at index 1) and call the set caching IOCTL to setup the PTE entries
for the corresponding buffer to be snooped.

This resulted in 70% improvement in synthetic texturing benchmarks.

Kudos to Valtteri Rantala, Eero Tamminen and Michael T Frederick and
Ville who helped to narrow the source of problem to the kernel and to
the snooping behaviour in particular.

CC: Valtteri Rantala <valtteri.rantala@intel.com>
CC: Eero Tamminen <eero.t.tamminen@intel.com>
CC: Michael T Frederick <michael.t.frederick@intel.com>
CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/intel_mocs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Chris Wilson April 25, 2016, 12:37 p.m. UTC | #1
On Mon, Apr 25, 2016 at 03:23:21PM +0300, Imre Deak wrote:
> Setting a write-back cache policy in the MOCS entry definition also
> implies snooping, which has a considerable overhead. This is
> unexpected for a few reasons:
> - From user-space's point of view since it didn't want a coherent
>   surface (it didn't set the buffer as such via the set caching IOCTL).
> - There is a separate MOCS entry field for snooping (which we never
>   set).
> - This MOCS table is about caching in (e)LLC and there is no (e)LLC on
>   BXT. There is a separate table for L3 cache control.
> 
> Considering the above the current behavior of snooping looks like an
> unintentional side-effect of the WB setting. Changing it to be PTE
> based cacheability gets rid of the snooping without any ill-effects.
> For a coherent surface the application would use a separate MOCS entry
> (at index 1) and call the set caching IOCTL to setup the PTE entries
> for the corresponding buffer to be snooped.
> 
> This resulted in 70% improvement in synthetic texturing benchmarks.
> 
> Kudos to Valtteri Rantala, Eero Tamminen and Michael T Frederick and
> Ville who helped to narrow the source of problem to the kernel and to
> the snooping behaviour in particular.
> 
> CC: Valtteri Rantala <valtteri.rantala@intel.com>
> CC: Eero Tamminen <eero.t.tamminen@intel.com>
> CC: Michael T Frederick <michael.t.frederick@intel.com>
> CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Imre Deak <imre.deak@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_mocs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
> index 5006a92..23c7dd1 100644
> --- a/drivers/gpu/drm/i915/intel_mocs.c
> +++ b/drivers/gpu/drm/i915/intel_mocs.c
> @@ -142,7 +142,7 @@ static const struct drm_i915_mocs_entry broxton_mocs_table[] = {
>  	  .l3cc_value =    L3_ESC(0) | L3_SCC(0) | L3_CACHEABILITY(L3_WB),
>  	},
>  	{
> -	  .control_value = LE_CACHEABILITY(LE_WB) |
> +	  .control_value = LE_CACHEABILITY(LE_PAGETABLE) |
>  			   LE_TGT_CACHE(LE_TC_LLC_ELLC) |
>  			   LE_LRUM(3) | LE_AOM(0) | LE_RSC(0) | LE_SCC(0) |
>  			   LE_PFM(0) | LE_SCF(0),

This is index 2? This should *be* the snooping entry?

Index 0: uncached
Index 1: follow pte
Index 2: snoop

Aim I missing something? Why isn't this a userspce bug for requesting a
mocs setting it didn't wnat? In my kernel this makes mocs_table[1] ==
mocs_table[2].
-Chris
Imre Deak April 25, 2016, 12:39 p.m. UTC | #2
On ma, 2016-04-25 at 13:37 +0100, Chris Wilson wrote:
> On Mon, Apr 25, 2016 at 03:23:21PM +0300, Imre Deak wrote:
> > Setting a write-back cache policy in the MOCS entry definition also
> > implies snooping, which has a considerable overhead. This is
> > unexpected for a few reasons:
> > - From user-space's point of view since it didn't want a coherent
> >   surface (it didn't set the buffer as such via the set caching
> > IOCTL).
> > - There is a separate MOCS entry field for snooping (which we never
> >   set).
> > - This MOCS table is about caching in (e)LLC and there is no (e)LLC
> > on
> >   BXT. There is a separate table for L3 cache control.
> > 
> > Considering the above the current behavior of snooping looks like
> > an
> > unintentional side-effect of the WB setting. Changing it to be PTE
> > based cacheability gets rid of the snooping without any ill-
> > effects.
> > For a coherent surface the application would use a separate MOCS
> > entry
> > (at index 1) and call the set caching IOCTL to setup the PTE
> > entries
> > for the corresponding buffer to be snooped.
> > 
> > This resulted in 70% improvement in synthetic texturing benchmarks.
> > 
> > Kudos to Valtteri Rantala, Eero Tamminen and Michael T Frederick
> > and
> > Ville who helped to narrow the source of problem to the kernel and
> > to
> > the snooping behaviour in particular.
> > 
> > CC: Valtteri Rantala <valtteri.rantala@intel.com>
> > CC: Eero Tamminen <eero.t.tamminen@intel.com>
> > CC: Michael T Frederick <michael.t.frederick@intel.com>
> > CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_mocs.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_mocs.c
> > b/drivers/gpu/drm/i915/intel_mocs.c
> > index 5006a92..23c7dd1 100644
> > --- a/drivers/gpu/drm/i915/intel_mocs.c
> > +++ b/drivers/gpu/drm/i915/intel_mocs.c
> > @@ -142,7 +142,7 @@ static const struct drm_i915_mocs_entry
> > broxton_mocs_table[] = {
> >  	  .l3cc_value =    L3_ESC(0) | L3_SCC(0) |
> > L3_CACHEABILITY(L3_WB),
> >  	},
> >  	{
> > -	  .control_value = LE_CACHEABILITY(LE_WB) |
> > +	  .control_value = LE_CACHEABILITY(LE_PAGETABLE) |
> >  			   LE_TGT_CACHE(LE_TC_LLC_ELLC) |
> >  			   LE_LRUM(3) | LE_AOM(0) | LE_RSC(0) |
> > LE_SCC(0) |
> >  			   LE_PFM(0) | LE_SCF(0),
> 
> This is index 2? This should *be* the snooping entry?
> 
> Index 0: uncached
> Index 1: follow pte
> Index 2: snoop
> 
> Aim I missing something? Why isn't this a userspce bug for requesting
> a
> mocs setting it didn't wnat? In my kernel this makes mocs_table[1] ==
> mocs_table[2].

I don't think there is or can be any snooping entry. On CHV for example
we can only setup snooping via the PTE, so there we necessarily have to
use entry 1 + PTE setup (set caching IOCTL). This is also what both
Windows and Mesa assumes apparently.

--Imre
Chris Wilson April 25, 2016, 12:49 p.m. UTC | #3
On Mon, Apr 25, 2016 at 03:39:54PM +0300, Imre Deak wrote:
> On ma, 2016-04-25 at 13:37 +0100, Chris Wilson wrote:
> > On Mon, Apr 25, 2016 at 03:23:21PM +0300, Imre Deak wrote:
> > > Setting a write-back cache policy in the MOCS entry definition also
> > > implies snooping, which has a considerable overhead. This is
> > > unexpected for a few reasons:
> > > - From user-space's point of view since it didn't want a coherent
> > >   surface (it didn't set the buffer as such via the set caching
> > > IOCTL).
> > > - There is a separate MOCS entry field for snooping (which we never
> > >   set).
> > > - This MOCS table is about caching in (e)LLC and there is no (e)LLC
> > > on
> > >   BXT. There is a separate table for L3 cache control.
> > > 
> > > Considering the above the current behavior of snooping looks like
> > > an
> > > unintentional side-effect of the WB setting. Changing it to be PTE
> > > based cacheability gets rid of the snooping without any ill-
> > > effects.
> > > For a coherent surface the application would use a separate MOCS
> > > entry
> > > (at index 1) and call the set caching IOCTL to setup the PTE
> > > entries
> > > for the corresponding buffer to be snooped.
> > > 
> > > This resulted in 70% improvement in synthetic texturing benchmarks.
> > > 
> > > Kudos to Valtteri Rantala, Eero Tamminen and Michael T Frederick
> > > and
> > > Ville who helped to narrow the source of problem to the kernel and
> > > to
> > > the snooping behaviour in particular.
> > > 
> > > CC: Valtteri Rantala <valtteri.rantala@intel.com>
> > > CC: Eero Tamminen <eero.t.tamminen@intel.com>
> > > CC: Michael T Frederick <michael.t.frederick@intel.com>
> > > CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_mocs.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_mocs.c
> > > b/drivers/gpu/drm/i915/intel_mocs.c
> > > index 5006a92..23c7dd1 100644
> > > --- a/drivers/gpu/drm/i915/intel_mocs.c
> > > +++ b/drivers/gpu/drm/i915/intel_mocs.c
> > > @@ -142,7 +142,7 @@ static const struct drm_i915_mocs_entry
> > > broxton_mocs_table[] = {
> > >  	  .l3cc_value =    L3_ESC(0) | L3_SCC(0) |
> > > L3_CACHEABILITY(L3_WB),
> > >  	},
> > >  	{
> > > -	  .control_value = LE_CACHEABILITY(LE_WB) |
> > > +	  .control_value = LE_CACHEABILITY(LE_PAGETABLE) |
> > >  			   LE_TGT_CACHE(LE_TC_LLC_ELLC) |
> > >  			   LE_LRUM(3) | LE_AOM(0) | LE_RSC(0) |
> > > LE_SCC(0) |
> > >  			   LE_PFM(0) | LE_SCF(0),
> > 
> > This is index 2? This should *be* the snooping entry?
> > 
> > Index 0: uncached
> > Index 1: follow pte
> > Index 2: snoop
> > 
> > Aim I missing something? Why isn't this a userspce bug for requesting
> > a
> > mocs setting it didn't wnat? In my kernel this makes mocs_table[1] ==
> > mocs_table[2].
> 
> I don't think there is or can be any snooping entry. On CHV for example
> we can only setup snooping via the PTE, so there we necessarily have to
> use entry 1 + PTE setup (set caching IOCTL). This is also what both
> Windows and Mesa assumes apparently.

Then the issue is not we've enabled snooping via mocs, but that the mocs
entry is completely bogus. Please update your commit message :)
-Chris
Imre Deak April 25, 2016, 1:01 p.m. UTC | #4
On ma, 2016-04-25 at 13:49 +0100, Chris Wilson wrote:
> On Mon, Apr 25, 2016 at 03:39:54PM +0300, Imre Deak wrote:
> > On ma, 2016-04-25 at 13:37 +0100, Chris Wilson wrote:
> > > On Mon, Apr 25, 2016 at 03:23:21PM +0300, Imre Deak wrote:
> > > > Setting a write-back cache policy in the MOCS entry definition
> > > > also
> > > > implies snooping, which has a considerable overhead. This is
> > > > unexpected for a few reasons:
> > > > - From user-space's point of view since it didn't want a
> > > > coherent
> > > >   surface (it didn't set the buffer as such via the set caching
> > > > IOCTL).
> > > > - There is a separate MOCS entry field for snooping (which we
> > > > never
> > > >   set).
> > > > - This MOCS table is about caching in (e)LLC and there is no
> > > > (e)LLC
> > > > on
> > > >   BXT. There is a separate table for L3 cache control.
> > > > 
> > > > Considering the above the current behavior of snooping looks
> > > > like
> > > > an
> > > > unintentional side-effect of the WB setting. Changing it to be
> > > > PTE
> > > > based cacheability gets rid of the snooping without any ill-
> > > > effects.
> > > > For a coherent surface the application would use a separate
> > > > MOCS
> > > > entry
> > > > (at index 1) and call the set caching IOCTL to setup the PTE
> > > > entries
> > > > for the corresponding buffer to be snooped.
> > > > 
> > > > This resulted in 70% improvement in synthetic texturing
> > > > benchmarks.
> > > > 
> > > > Kudos to Valtteri Rantala, Eero Tamminen and Michael T
> > > > Frederick
> > > > and
> > > > Ville who helped to narrow the source of problem to the kernel
> > > > and
> > > > to
> > > > the snooping behaviour in particular.
> > > > 
> > > > CC: Valtteri Rantala <valtteri.rantala@intel.com>
> > > > CC: Eero Tamminen <eero.t.tamminen@intel.com>
> > > > CC: Michael T Frederick <michael.t.frederick@intel.com>
> > > > CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_mocs.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_mocs.c
> > > > b/drivers/gpu/drm/i915/intel_mocs.c
> > > > index 5006a92..23c7dd1 100644
> > > > --- a/drivers/gpu/drm/i915/intel_mocs.c
> > > > +++ b/drivers/gpu/drm/i915/intel_mocs.c
> > > > @@ -142,7 +142,7 @@ static const struct drm_i915_mocs_entry
> > > > broxton_mocs_table[] = {
> > > >  	  .l3cc_value =    L3_ESC(0) | L3_SCC(0) |
> > > > L3_CACHEABILITY(L3_WB),
> > > >  	},
> > > >  	{
> > > > -	  .control_value = LE_CACHEABILITY(LE_WB) |
> > > > +	  .control_value = LE_CACHEABILITY(LE_PAGETABLE) |
> > > >  			   LE_TGT_CACHE(LE_TC_LLC_ELLC) |
> > > >  			   LE_LRUM(3) | LE_AOM(0) | LE_RSC(0)
> > > > |
> > > > LE_SCC(0) |
> > > >  			   LE_PFM(0) | LE_SCF(0),
> > > 
> > > This is index 2? This should *be* the snooping entry?
> > > 
> > > Index 0: uncached
> > > Index 1: follow pte
> > > Index 2: snoop
> > > 
> > > Aim I missing something? Why isn't this a userspce bug for
> > > requesting
> > > a
> > > mocs setting it didn't wnat? In my kernel this makes
> > > mocs_table[1] ==
> > > mocs_table[2].
> > 
> > I don't think there is or can be any snooping entry. On CHV for
> > example
> > we can only setup snooping via the PTE, so there we necessarily
> > have to
> > use entry 1 + PTE setup (set caching IOCTL). This is also what both
> > Windows and Mesa assumes apparently.
> 
> Then the issue is not we've enabled snooping via mocs, but that the
> mocs
> entry is completely bogus. Please update your commit message :)

Yes it is bogus, that is what I meant by "incorrect MOCS config" in the
subject. Tbh, the specification is really unclear what these bits
control, it doesn't mention at all for example that the WB setting
would affect snooping behavior. It doesn't either have much sense to me
to set any of these MOCS entry fields other than their default on BXT,
since they all (should) control only e(LLC) cacheability which is
irrelevant on BXT. I will ask for a clarification for this in BSpec.

--Imre
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
index 5006a92..23c7dd1 100644
--- a/drivers/gpu/drm/i915/intel_mocs.c
+++ b/drivers/gpu/drm/i915/intel_mocs.c
@@ -142,7 +142,7 @@  static const struct drm_i915_mocs_entry broxton_mocs_table[] = {
 	  .l3cc_value =    L3_ESC(0) | L3_SCC(0) | L3_CACHEABILITY(L3_WB),
 	},
 	{
-	  .control_value = LE_CACHEABILITY(LE_WB) |
+	  .control_value = LE_CACHEABILITY(LE_PAGETABLE) |
 			   LE_TGT_CACHE(LE_TC_LLC_ELLC) |
 			   LE_LRUM(3) | LE_AOM(0) | LE_RSC(0) | LE_SCC(0) |
 			   LE_PFM(0) | LE_SCF(0),