Message ID | 20201127120718.454037-125-matthew.auld@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | DG1 + LMEM enabling | expand |
Quoting Matthew Auld (2020-11-27 12:06:40)
> From: Michel Thierry <michel.thierry@intel.com>
Rationale goes here.
Is this wise? HWSP is very frequently read by the CPU, and expected to
be cached on the CPU.
What do the performance profiles indicate?
-Chris
On 27/11/2020 13:55, Chris Wilson wrote: > Quoting Matthew Auld (2020-11-27 12:06:40) >> From: Michel Thierry <michel.thierry@intel.com> > > Rationale goes here. > > Is this wise? HWSP is very frequently read by the CPU, and expected to > be cached on the CPU. > > What do the performance profiles indicate? Do you have a recommendation for an existing selftest or IGT to help measure this? Also are you suggesting moving this to system memory, or just using a different mapping type, if it's placed in local memory? Or maybe try both? Although I'm pretty sceptical about !wc for local memory. > -Chris >
Quoting Matthew Auld (2020-11-30 17:17:16) > On 27/11/2020 13:55, Chris Wilson wrote: > > Quoting Matthew Auld (2020-11-27 12:06:40) > >> From: Michel Thierry <michel.thierry@intel.com> > > > > Rationale goes here. > > > > Is this wise? HWSP is very frequently read by the CPU, and expected to > > be cached on the CPU. > > > > What do the performance profiles indicate? > > Do you have a recommendation for an existing selftest or IGT to help > measure this? > > Also are you suggesting moving this to system memory, or just using a > different mapping type, if it's placed in local memory? Or maybe try > both? Although I'm pretty sceptical about !wc for local memory. A lot of worries go out of the window if this can be in system memory and snooped. For measuring, I suspect there is a lot of chaff that needs to be removed before individual microbenchmarks like perf/request discern any difference; although that would be a starting point. We do a lot of completion checking during execlists interrupt processing, and there we (cpu profiles at least) are sensitive to uncached reads. We can trivially construct a benchmark that only shows the impact of the WC reads; but the point where I think we would first notice from userspace is client wakeup latency scaling: benchmarks/gem_latency, which was once a point of major concern. Nowadays, we can couple that with a second concern about inducing system latency from interrupt processing time. -Chris
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 0ba020346566..9e0394b06f38 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -25,6 +25,7 @@ #include <drm/drm_print.h> #include "gem/i915_gem_context.h" +#include "gem/i915_gem_lmem.h" #include "i915_drv.h" @@ -657,7 +658,14 @@ static int init_status_page(struct intel_engine_cs *engine) * in GFP_DMA32 for i965, and no earlier physical address users had * access to more than 4G. */ - obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE); + if (HAS_LMEM(engine->i915)) { + obj = i915_gem_object_create_lmem(engine->i915, + PAGE_SIZE, + I915_BO_ALLOC_CONTIGUOUS | + I915_BO_ALLOC_VOLATILE); + } else { + obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE); + } if (IS_ERR(obj)) { drm_err(&engine->i915->drm, "Failed to allocate status page\n"); diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index 065943781586..589559b526eb 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -6,6 +6,7 @@ #include "i915_drv.h" +#include "gem/i915_gem_lmem.h" #include "i915_active.h" #include "i915_syncmap.h" #include "intel_gt.h" @@ -34,7 +35,12 @@ static int __hwsp_alloc(struct intel_gt *gt, struct intel_timeline_hwsp *hwsp) int type; int ret; - obj = i915_gem_object_create_internal(i915, PAGE_SIZE); + if (HAS_LMEM(i915)) + obj = i915_gem_object_create_lmem(i915, PAGE_SIZE, + I915_BO_ALLOC_CONTIGUOUS | + I915_BO_ALLOC_VOLATILE); + else + obj = i915_gem_object_create_internal(i915, PAGE_SIZE); if (IS_ERR(obj)) return PTR_ERR(obj);