Message ID | 1467380406-11954-3-git-send-email-imre.deak@intel.com |
---|---|
State | New, archived |
On 07/01/2016 09:40 PM, Deak, Imre wrote:
> Setting a write-back cache policy in the MOCS entry definition also
> implies snooping, which has a considerable overhead.
> [...]
> Signed-off-by: Imre Deak <imre.deak@intel.com>

As BXT has no LLC, setting the WB policy adds extra overhead. In that case the patch looks reasonable for BXT.

Add: Acked-by: Zhao Yakui <yakui.zhao@intel.com>

> [...]
> -----Original Message-----
> From: Deak, Imre
> Sent: Friday, July 1, 2016 21:40
> To: intel-gfx@lists.freedesktop.org
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>; Chris Wilson <chris@chris-wilson.co.uk>; Yang, Rong R <rong.r.yang@intel.com>; Zhao, Yakui <yakui.zhao@intel.com>; Tamminen, Eero T <eero.t.tamminen@intel.com>
> Subject: [PATCH v3 2/3] drm/i915/bxt: Fix inadvertent CPU snooping due to incorrect MOCS config
>
> [...]
> v3:
> - Set the entry as LLC uncached instead of PTE-passthrough. This way
>   we keep snooping disabled, and we also make the cacheability/
>   coherency setting independent of the PTE, which is managed by the
>   kernel. (Chris)

About a 20% improvement in the OpenCL benchmark LuxMark.

Add: Tested-by: Rong R Yang <rong.r.yang@intel.com>

> [...]
```diff
diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
index d36e609..927825f 100644
--- a/drivers/gpu/drm/i915/intel_mocs.c
+++ b/drivers/gpu/drm/i915/intel_mocs.c
@@ -149,8 +149,8 @@ static const struct drm_i915_mocs_entry broxton_mocs_table[] = {
 		.l3cc_value = L3_ESC(0) | L3_SCC(0) | L3_CACHEABILITY(L3_WB),
 	},
 	{
-		/* 0x0000003b */
-		.control_value = LE_CACHEABILITY(LE_WB) |
+		/* 0x00000039 */
+		.control_value = LE_CACHEABILITY(LE_UC) |
 				 LE_TGT_CACHE(LE_TC_LLC_ELLC) |
 				 LE_LRUM(3) | LE_AOM(0) | LE_RSC(0) | LE_SCC(0) |
 				 LE_PFM(0) | LE_SCF(0),
```
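The hex comments in the hunk are simply the OR of the LE_* fields. The sketch below re-derives them at compile time. The LE_* shifts and encodings are not part of this patch, so they are a reconstruction assumed to mirror the helpers in intel_mocs.c: cacheability in bits 1:0 (PAGETABLE = 0, UC = 1, WT = 2, WB = 3), target cache in bits 3:2 (LLC/eLLC = 2), LRU management in bits 5:4. The fields above bit 5 are all zero in this entry, so their exact positions do not affect the result. `PATCHED_ENTRY_CONTROL` and `patched_entry_control` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout of an (e)LLC MOCS control value. */
#define LE_CACHEABILITY(value)	((value) << 0)
#define LE_TGT_CACHE(value)	((value) << 2)
#define LE_LRUM(value)		((value) << 4)
#define LE_AOM(value)		((value) << 6)
#define LE_RSC(value)		((value) << 7)
#define LE_SCC(value)		((value) << 8)
#define LE_PFM(value)		((value) << 11)
#define LE_SCF(value)		((value) << 14)

/* Assumed (e)LLC cacheability encodings. */
#define LE_PAGETABLE	0	/* defer to the PTE */
#define LE_UC		1	/* LLC uncached */
#define LE_WT		2	/* LLC write-through */
#define LE_WB		3	/* LLC write-back */

/* Assumed target-cache encoding used by the patched entry. */
#define LE_TC_LLC_ELLC	2

/* Hypothetical: the patched entry's control value for a given cacheability. */
#define PATCHED_ENTRY_CONTROL(cacheability)			\
	(LE_CACHEABILITY(cacheability) |			\
	 LE_TGT_CACHE(LE_TC_LLC_ELLC) |				\
	 LE_LRUM(3) | LE_AOM(0) | LE_RSC(0) | LE_SCC(0) |	\
	 LE_PFM(0) | LE_SCF(0))

/* The comments in the diff must match the packed values. */
static_assert(PATCHED_ENTRY_CONTROL(LE_WB) == 0x3b, "old entry: LLC write-back");
static_assert(PATCHED_ENTRY_CONTROL(LE_UC) == 0x39, "new entry: LLC uncached");

/* Runtime wrapper so other code can compute the value for any cacheability. */
uint32_t patched_entry_control(uint32_t cacheability)
{
	return PATCHED_ENTRY_CONTROL(cacheability);
}
```

Only the two low cacheability bits differ between the old and new values: going from WB (3) to UC (1) clears bit 1, which turns 0x3b into 0x39 while leaving the target-cache and LRU fields untouched.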
Setting a write-back cache policy in the MOCS entry definition also implies snooping, which has a considerable overhead. This is unexpected for a few reasons:
- From user space's point of view, since it didn't request a coherent surface (it didn't mark the buffer as such via the set caching IOCTL).
- There is a separate MOCS entry field for snooping (which we never set).
- This MOCS table is about caching in (e)LLC, and there is no (e)LLC on BXT. There is a separate table for L3 cache control.

Considering the above, the current snooping behavior looks like an unintentional side effect of the WB setting. Changing the entry to LLC-UC gets rid of the snooping without any ill effects. For a coherent surface, the application would use a separate MOCS entry at index 1 and call the set caching IOCTL to set up the PTE entries for the corresponding buffer to be snooped. In the future we could also add a new MOCS entry for coherent surfaces.

This resulted in a 70% improvement in synthetic texturing benchmarks.

Kudos to Valtteri Rantala, Eero Tamminen, Michael T Frederick and Ville, who helped narrow the source of the problem down to the kernel, and to the snooping behaviour in particular.

With a follow-up change to adjust the 3rd entry value, igt/gem_mocs_settings passes after this change.

v2:
- Rebase on v2 of patch 1/2.
v3:
- Set the entry as LLC uncached instead of PTE-passthrough. This way we keep snooping disabled, and we also make the cacheability/coherency setting independent of the PTE, which is managed by the kernel. (Chris)

CC: Rong R Yang <rong.r.yang@intel.com>
CC: Yakui Zhao <yakui.zhao@intel.com>
CC: Valtteri Rantala <valtteri.rantala@intel.com>
CC: Eero Tamminen <eero.t.tamminen@intel.com>
CC: Michael T Frederick <michael.t.frederick@intel.com>
CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>

---
 drivers/gpu/drm/i915/intel_mocs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
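The v3 note contrasts three encodings of the same entry: the original write-back value, a PTE-passthrough variant (v2), and LLC uncached (v3). A minimal decoding sketch, assuming cacheability sits in bits 1:0 (PAGETABLE = 0, UC = 1, WT = 2, WB = 3), target cache in bits 3:2 and LRU management in bits 5:4. The 0x38 value for v2 is an assumption (the v2 patch is not shown here; this presumes only the cacheability field changed), and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed (e)LLC cacheability encodings (bits 1:0 of the control value). */
#define LE_PAGETABLE	0u	/* cacheability comes from the kernel-managed PTE */
#define LE_UC		1u	/* LLC uncached */
#define LE_WT		2u	/* LLC write-through */
#define LE_WB		3u	/* LLC write-back; per the commit message, implies snooping */

/* Extract the cacheability field of a control value. */
#define LE_CACHEABILITY_OF(control)	((control) & 0x3u)

static_assert(LE_CACHEABILITY_OF(0x3bu) == LE_WB, "original entry: write-back");
static_assert(LE_CACHEABILITY_OF(0x38u) == LE_PAGETABLE, "v2 (assumed): PTE passthrough");
static_assert(LE_CACHEABILITY_OF(0x39u) == LE_UC, "v3: LLC uncached");

/* Hypothetical: does the entry request LLC write-back? */
bool mocs_is_llc_wb(uint32_t control)
{
	return LE_CACHEABILITY_OF(control) == LE_WB;
}

/* Hypothetical: does the entry defer cacheability to the PTE? */
bool mocs_follows_pte(uint32_t control)
{
	return LE_CACHEABILITY_OF(control) == LE_PAGETABLE;
}

/* Hypothetical: target-cache field (bits 3:2). */
unsigned int mocs_target_cache(uint32_t control)
{
	return (control >> 2) & 0x3u;
}

/* Hypothetical: LRU-management field (bits 5:4). */
unsigned int mocs_lru_management(uint32_t control)
{
	return (control >> 4) & 0x3u;
}
```

Under these assumptions, the v3 value neither requests write-back nor defers to the PTE, which matches the stated goal of keeping snooping off while making the setting independent of the kernel-managed PTE.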