
[RFC,124/162] drm/i915/lmem: allocate HWSP in lmem

Message ID: 20201127120718.454037-125-matthew.auld@intel.com (mailing list archive)
State: New, archived
Series: DG1 + LMEM enabling

Commit Message

Matthew Auld Nov. 27, 2020, 12:06 p.m. UTC
From: Michel Thierry <michel.thierry@intel.com>

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 10 +++++++++-
 drivers/gpu/drm/i915/gt/intel_timeline.c  |  8 +++++++-
 2 files changed, 16 insertions(+), 2 deletions(-)

Comments

Chris Wilson Nov. 27, 2020, 1:55 p.m. UTC | #1
Quoting Matthew Auld (2020-11-27 12:06:40)
> From: Michel Thierry <michel.thierry@intel.com>

Rationale goes here.

Is this wise? HWSP is very frequently read by the CPU, and expected to
be cached on the CPU.

What do the performance profiles indicate?
-Chris
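
For context only (this is not part of the thread): the CPU-side reads Chris is referring to go through the status-page mapping pinned by init_status_page(), via the existing helper in intel_engine.h, which looks roughly like this:

	static inline u32
	intel_read_status_page(const struct intel_engine_cs *engine, int reg)
	{
		/* The HWSP is polled with a plain READ_ONCE() of the CPU mapping. */
		return READ_ONCE(engine->status_page.addr[reg]);
	}

With the status page placed in lmem and mapped write-combined, each such read becomes an uncached access across the PCIe BAR instead of a cacheable system-memory load, which is the cost being questioned.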
Matthew Auld Nov. 30, 2020, 5:17 p.m. UTC | #2
On 27/11/2020 13:55, Chris Wilson wrote:
> Quoting Matthew Auld (2020-11-27 12:06:40)
>> From: Michel Thierry <michel.thierry@intel.com>
> 
> Rationale goes here.
> 
> Is this wise? HWSP is very frequently read by the CPU, and expected to
> be cached on the CPU.
> 
> What do the performance profiles indicate?

Do you have a recommendation for an existing selftest or IGT to help 
measure this?

Also are you suggesting moving this to system memory, or just using a 
different mapping type, if it's placed in local memory? Or maybe try 
both? Although I'm pretty sceptical about !wc for local memory.

> -Chris
>
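
A minimal sketch (not from the thread or the patch) of what "just using a different mapping type" could look like: pick the I915_MAP_* flag based on where the object actually ended up, keeping WB for system memory and WC for local memory. The exact call site in __hwsp_alloc()/init_status_page() is an assumption:

	enum i915_map_type type;
	void *vaddr;

	/* Cached (WB) mapping for system memory, write-combined for lmem. */
	type = i915_gem_object_is_lmem(obj) ? I915_MAP_WC : I915_MAP_WB;

	vaddr = i915_gem_object_pin_map(obj, type);
	if (IS_ERR(vaddr))
		return PTR_ERR(vaddr);

This only changes the CPU mapping type; it does not address the concern that reads of lmem are uncached either way.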
Chris Wilson Nov. 30, 2020, 5:35 p.m. UTC | #3
Quoting Matthew Auld (2020-11-30 17:17:16)
> On 27/11/2020 13:55, Chris Wilson wrote:
> > Quoting Matthew Auld (2020-11-27 12:06:40)
> >> From: Michel Thierry <michel.thierry@intel.com>
> > 
> > Rationale goes here.
> > 
> > Is this wise? HWSP is very frequently read by the CPU, and expected to
> > be cached on the CPU.
> > 
> > What do the performance profiles indicate?
> 
> Do you have a recommendation for an existing selftest or IGT to help 
> measure this?
> 
> Also are you suggesting moving this to system memory, or just using a 
> different mapping type, if it's placed in local memory? Or maybe try 
> both? Although I'm pretty sceptical about !wc for local memory.

A lot of worries go out of the window if this can be in system memory
and snooped.

For measuring, I suspect there is a lot of chaff that needs to be
removed before individual microbenchmarks like perf/request discern any
difference, although that would be a starting point. We do a lot of
completion checking during execlists interrupt processing, and there (in
CPU profiles at least) we are sensitive to uncached reads.

We can trivially construct a benchmark that only shows the impact of the
WC reads; but the point where I think we would first notice from userspace
is client wakeup latency scaling: benchmarks/gem_latency, which was once
a point of major concern. Nowadays, we can couple that with a second
concern about inducing system latency from interrupt processing time.
-Chris
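
A sketch (not from the thread) of the kind of microbenchmark being described: time a tight loop of status-page reads through intel_read_status_page() and compare the system-memory (WB) and lmem (WC) placements. The function name, the use of I915_GEM_HWS_SEQNO as the offset, and where this would live (e.g. a selftest) are assumptions:

	static u64 time_hwsp_reads(struct intel_engine_cs *engine,
				   unsigned long loops)
	{
		ktime_t start = ktime_get();
		unsigned long n;
		u32 seqno = 0;

		for (n = 0; n < loops; n++)
			seqno += intel_read_status_page(engine, I915_GEM_HWS_SEQNO);

		/* Keep the compiler from eliding the read loop. */
		barrier_data(&seqno);

		return ktime_to_ns(ktime_sub(ktime_get(), start));
	}

As Chris notes, a raw loop like this exaggerates the effect; the user-visible signal is more likely to show up in wakeup-latency benchmarks such as benchmarks/gem_latency and in execlists interrupt processing time.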

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 0ba020346566..9e0394b06f38 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -25,6 +25,7 @@ 
 #include <drm/drm_print.h>
 
 #include "gem/i915_gem_context.h"
+#include "gem/i915_gem_lmem.h"
 
 #include "i915_drv.h"
 
@@ -657,7 +658,14 @@  static int init_status_page(struct intel_engine_cs *engine)
 	 * in GFP_DMA32 for i965, and no earlier physical address users had
 	 * access to more than 4G.
 	 */
-	obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
+	if (HAS_LMEM(engine->i915)) {
+		obj = i915_gem_object_create_lmem(engine->i915,
+						  PAGE_SIZE,
+						  I915_BO_ALLOC_CONTIGUOUS |
+						  I915_BO_ALLOC_VOLATILE);
+	} else {
+		obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
+	}
 	if (IS_ERR(obj)) {
 		drm_err(&engine->i915->drm,
 			"Failed to allocate status page\n");
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index 065943781586..589559b526eb 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -6,6 +6,7 @@ 
 
 #include "i915_drv.h"
 
+#include "gem/i915_gem_lmem.h"
 #include "i915_active.h"
 #include "i915_syncmap.h"
 #include "intel_gt.h"
@@ -34,7 +35,12 @@  static int __hwsp_alloc(struct intel_gt *gt, struct intel_timeline_hwsp *hwsp)
 	int type;
 	int ret;
 
-	obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
+	if (HAS_LMEM(i915))
+		obj = i915_gem_object_create_lmem(i915, PAGE_SIZE,
+						  I915_BO_ALLOC_CONTIGUOUS |
+						  I915_BO_ALLOC_VOLATILE);
+	else
+		obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
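
For illustration only (not part of the patch): init_status_page() and __hwsp_alloc() now repeat the same HAS_LMEM() branch, so the allocation could be factored into a small helper along these lines; the name and placement are hypothetical:

	static struct drm_i915_gem_object *
	hwsp_object_create(struct drm_i915_private *i915)
	{
		/* Keep the HWSP in local memory when the device has it. */
		if (HAS_LMEM(i915))
			return i915_gem_object_create_lmem(i915, PAGE_SIZE,
							   I915_BO_ALLOC_CONTIGUOUS |
							   I915_BO_ALLOC_VOLATILE);

		return i915_gem_object_create_internal(i915, PAGE_SIZE);
	}

Both call sites would then only need the existing IS_ERR() check on the returned object.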