Message ID | 20220429100414.647857-1-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/2] drm/i915: Enable THP on Icelake and beyond |
On 29/04/2022 11:04, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> We have a statement from HW designers that the GPU read regression when
> using 2M pages was fixed from Icelake onwards, which was also confirmed
> by benchmarking Eero did last year:
>
> """
> When IOMMU is disabled, enabling THP causes following perf changes on
> TGL-H (GT1):
>
>   10-15% SynMark Batch[0-3]
>   5-10% MemBW GPU texture, SynMark ShMapVsm
>   3-5% SynMark TerrainFly* + Geom* + Fill* + CSCloth + Batch4
>   1-3% GpuTest Triangle, SynMark TexMem* + DeferredAA + Batch[5-7]
>        + few others
>   -7% MemBW GPU blend
>
> In the above 3D benchmark names, * means all the variants of tests with
> the same prefix. For example "SynMark TexMem*", means both TexMem128 &
> TexMem512 tests in the synthetic (Intel internal) SynMark test suite.
>
> In the (public, but proprietary) GfxBench & GLB(enchmark) test suites,
> there are both onscreen and offscreen variants of each test. Unless
> explicitly stated otherwise, numbers are for both variants.
>
> All tests are run with FullHD monitor. All tests are fullscreen except
> for GLB and GpuTest ones, which are run in 1/2 screen window (GpuTest
> triangle is run both in fullscreen and 1/2 screen window).
> """
>
> Since the only regression is MemBW GPU blend, against many more gains,
> it sounds it is time to enable THP on Gen11+.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/430
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>

fwiw, for the series,
Reviewed-by: Matthew Auld <matthew.auld@intel.com>

> ---
>  drivers/gpu/drm/i915/gem/i915_gemfs.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.c b/drivers/gpu/drm/i915/gem/i915_gemfs.c
> index ee87874e59dc..c5a6bbc842fc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gemfs.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gemfs.c
> @@ -28,12 +28,14 @@ int i915_gemfs_init(struct drm_i915_private *i915)
>  	 *
>  	 * One example, although it is probably better with a per-file
>  	 * control, is selecting huge page allocations ("huge=within_size").
> -	 * However, we only do so to offset the overhead of iommu lookups
> -	 * due to bandwidth issues (slow reads) on Broadwell+.
> +	 * However, we only do so on platforms which benefit from it, or to
> +	 * offset the overhead of iommu lookups, where with latter it is a net
> +	 * win even on platforms which would otherwise see some performance
> +	 * regressions such a slow reads issue on Broadwell and Skylake.
>  	 */
>
>  	opts = NULL;
> -	if (i915_vtd_active(i915)) {
> +	if (GRAPHICS_VER(i915) >= 11 || i915_vtd_active(i915)) {
>  		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
>  			opts = huge_opt;
>  			drm_info(&i915->drm,
> @@ -41,7 +43,10 @@ int i915_gemfs_init(struct drm_i915_private *i915)
>  				 opts);
>  		} else {
>  			drm_notice(&i915->drm,
> -				   "Transparent Hugepage support is recommended for optimal performance when IOMMU is enabled!\n");
> +				   "Transparent Hugepage support is recommended for optimal performance%s\n",
> +				   GRAPHICS_VER(i915) >= 11 ?
> +				   " on this platform!" :
> +				   " when IOMMU is enabled!");
>  		}
>  	}
>
On 09/05/2022 11:49, Matthew Auld wrote:
> fwiw, for the series,
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>

Thanks! With a statement from hw arch, benchmark results from Eero and an
r-b from you, I think it is justified to push this, so I have. Let's see if
someone notices an improvement.

Regards,

Tvrtko
diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.c b/drivers/gpu/drm/i915/gem/i915_gemfs.c
index ee87874e59dc..c5a6bbc842fc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gemfs.c
+++ b/drivers/gpu/drm/i915/gem/i915_gemfs.c
@@ -28,12 +28,14 @@ int i915_gemfs_init(struct drm_i915_private *i915)
 	 *
 	 * One example, although it is probably better with a per-file
 	 * control, is selecting huge page allocations ("huge=within_size").
-	 * However, we only do so to offset the overhead of iommu lookups
-	 * due to bandwidth issues (slow reads) on Broadwell+.
+	 * However, we only do so on platforms which benefit from it, or to
+	 * offset the overhead of iommu lookups, where with latter it is a net
+	 * win even on platforms which would otherwise see some performance
+	 * regressions such a slow reads issue on Broadwell and Skylake.
 	 */
 
 	opts = NULL;
-	if (i915_vtd_active(i915)) {
+	if (GRAPHICS_VER(i915) >= 11 || i915_vtd_active(i915)) {
 		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
 			opts = huge_opt;
 			drm_info(&i915->drm,
@@ -41,7 +43,10 @@ int i915_gemfs_init(struct drm_i915_private *i915)
 				 opts);
 		} else {
 			drm_notice(&i915->drm,
-				   "Transparent Hugepage support is recommended for optimal performance when IOMMU is enabled!\n");
+				   "Transparent Hugepage support is recommended for optimal performance%s\n",
+				   GRAPHICS_VER(i915) >= 11 ?
+				   " on this platform!" :
+				   " when IOMMU is enabled!");
 		}
 	}
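
For readers who want to check which platforms end up with THP after this
patch, the decision in the patch can be modelled outside the kernel. The
following is only a userspace sketch, not driver code: struct thp_inputs,
struct thp_decision and thp_decide() are hypothetical names invented for
illustration, and the three input fields stand in for GRAPHICS_VER(i915),
i915_vtd_active(i915) and IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for what i915_gemfs_init() consults. */
struct thp_inputs {
	int graphics_ver;   /* models GRAPHICS_VER(i915) */
	bool vtd_active;    /* models i915_vtd_active(i915) */
	bool thp_built_in;  /* models IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) */
};

/* Hypothetical result: what the patched code would mount with or log. */
struct thp_decision {
	const char *mount_opts;    /* "huge=within_size", or NULL for defaults */
	const char *notice_suffix; /* tail of the drm_notice() text, or NULL */
};

struct thp_decision thp_decide(const struct thp_inputs *in)
{
	struct thp_decision d = { NULL, NULL };

	assert(in != NULL);

	/* Pre-Gen11 without VT-d: default mount options, nothing logged. */
	if (in->graphics_ver < 11 && !in->vtd_active)
		return d;

	if (in->thp_built_in) {
		/* THP is available, so request huge pages for the gemfs mount. */
		d.mount_opts = "huge=within_size";
	} else {
		/* THP is compiled out; only the recommendation text differs. */
		d.notice_suffix = in->graphics_ver >= 11 ?
			" on this platform!" :
			" when IOMMU is enabled!";
	}

	return d;
}
```

Under this model, a Gen12 part such as TGL with THP built in gets
"huge=within_size" regardless of VT-d, while a Gen9 part such as Skylake
keeps the default mount options unless VT-d is active.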