diff mbox

[v3,13/14] drm/i915: Enable userspace to opt-out of implicit fencing

Message ID 20161114085703.16540-13-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson Nov. 14, 2016, 8:57 a.m. UTC
Userspace is faced with a dilemma. The kernel requires implicit fencing
to manage resource usage (we always must wait for the GPU to finish
before releasing its PTE) and for third parties. However, userspace may
wish to avoid this serialisation if it is either using explicit fencing
between parties and wants more fine-grained access to buffers (e.g. it
may partition the buffer between uses and track fences on ranges rather
than the implicit fences tracking the whole object). It follows that
userspace needs a mechanism to avoid the kernel's serialisation on its
implicit fences before execbuf execution.

The next question is whether this is an object, execbuf or context flag.
Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on
shared winsys buffers, but implicit fencing on internal surfaces)
require a per-object level flag. Given that this flag need to be only
set once for the lifetime of the object, this reduces the convenience of
having an execbuf or context level flag (and avoids having multiple
pieces of uABI controlling the same feature).

Incorrect use of this flag will result in rendering corruption and GPU
hangs - but will not result in use-after-free or similar resource
tracking issues.

Serious caveat: write ordering is not strictly correct after setting
this flag on a render target on multiple engines. This affects all
subsequent GEM operations (execbuf, set-domain, pread) and shared
dma-buf operations. A fix is possible - but costly (both in terms of
further ABI changes and runtime overhead).

Testcase: igt/gem_exec_async
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
 include/uapi/drm/i915_drm.h                | 29 ++++++++++++++++++++++++++++-
 3 files changed, 32 insertions(+), 1 deletion(-)

Comments

Chad Versace Jan. 25, 2017, 8:38 p.m. UTC | #1
On Mon 14 Nov 2016, Chris Wilson wrote:
> Userspace is faced with a dilemma. The kernel requires implicit fencing
> to manage resource usage (we always must wait for the GPU to finish
> before releasing its PTE) and for third parties. However, userspace may
> wish to avoid this serialisation if it is either using explicit fencing
> between parties and wants more fine-grained access to buffers (e.g. it
> may partition the buffer between uses and track fences on ranges rather
> than the implicit fences tracking the whole object). It follows that
> userspace needs a mechanism to avoid the kernel's serialisation on its
> implicit fences before execbuf execution.
> 
> The next question is whether this is an object, execbuf or context flag.
> Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on
> shared winsys buffers, but implicit fencing on internal surfaces)
> require a per-object level flag. Given that this flag need to be only
> set once for the lifetime of the object, this reduces the convenience of
> having an execbuf or context level flag (and avoids having multiple
> pieces of uABI controlling the same feature).
> 
> Incorrect use of this flag will result in rendering corruption and GPU
> hangs - but will not result in use-after-free or similar resource
> tracking issues.
> 
> Serious caveat: write ordering is not strictly correct after setting
> this flag on a render target on multiple engines. This affects all
> subsequent GEM operations (execbuf, set-domain, pread) and shared
> dma-buf operations. A fix is possible - but costly (both in terms of
> further ABI changes and runtime overhead).
> 
> Testcase: igt/gem_exec_async
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c            |  1 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
>  include/uapi/drm/i915_drm.h                | 29 ++++++++++++++++++++++++++++-
>  3 files changed, 32 insertions(+), 1 deletion(-)

I'm neutral about this patch. I believe patch 14/14 is useful with or
without this patch, and I want to see patch 14 land regardless of what
happens with this one.

I'm not opposed to this patch. It's just that I don't yet understand
exactly if Mesa's EGL/GL code could effectively use this feature for
Android winsys buffers. The amount of information loss between the
EGL/GL apis and the eventual execbuffer submission may prevent Mesa from
annotating the Android winsys buffers with this.  I'm unsure.  I'm still
thinking about it.

But, if Chris, or anyone, already has plans to use this somehow, perhaps
in the DDX, then don't let my hesitation block the patch.
Chris Wilson Jan. 26, 2017, 10:32 a.m. UTC | #2
On Wed, Jan 25, 2017 at 12:38:32PM -0800, Chad Versace wrote:
> On Mon 14 Nov 2016, Chris Wilson wrote:
> > Userspace is faced with a dilemma. The kernel requires implicit fencing
> > to manage resource usage (we always must wait for the GPU to finish
> > before releasing its PTE) and for third parties. However, userspace may
> > wish to avoid this serialisation if it is either using explicit fencing
> > between parties and wants more fine-grained access to buffers (e.g. it
> > may partition the buffer between uses and track fences on ranges rather
> > than the implicit fences tracking the whole object). It follows that
> > userspace needs a mechanism to avoid the kernel's serialisation on its
> > implicit fences before execbuf execution.
> > 
> > The next question is whether this is an object, execbuf or context flag.
> > Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on
> > shared winsys buffers, but implicit fencing on internal surfaces)
> > require a per-object level flag. Given that this flag need to be only
> > set once for the lifetime of the object, this reduces the convenience of
> > having an execbuf or context level flag (and avoids having multiple
> > pieces of uABI controlling the same feature).
> > 
> > Incorrect use of this flag will result in rendering corruption and GPU
> > hangs - but will not result in use-after-free or similar resource
> > tracking issues.
> > 
> > Serious caveat: write ordering is not strictly correct after setting
> > this flag on a render target on multiple engines. This affects all
> > subsequent GEM operations (execbuf, set-domain, pread) and shared
> > dma-buf operations. A fix is possible - but costly (both in terms of
> > further ABI changes and runtime overhead).
> > 
> > Testcase: igt/gem_exec_async
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.c            |  1 +
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
> >  include/uapi/drm/i915_drm.h                | 29 ++++++++++++++++++++++++++++-
> >  3 files changed, 32 insertions(+), 1 deletion(-)
> 
> I'm neutral about this patch. I believe patch 14/14 is useful with or
> without this patch, and I want to see patch 14 land regardless of what
> happens with this one.

I don't like the patch, it opens up a big wart in the GEM api (incorrect
write tracking on GEM/dma-buf across multiple timelines). Otoh, being
able to discard the implicit fence tracking seems to be an important
feature request - if we go forward witout it, we will then be lacking a
feature that is common across other drivers and in particular seems to
be commonplace in the Android ecosystem.

Daniel, what's your feeling? One problem will be that the
synchronisation issue may be hard to track down in future (proving that
the cause of a stall is an avoidable implicit fence).
 
> I'm not opposed to this patch. It's just that I don't yet understand
> exactly if Mesa's EGL/GL code could effectively use this feature for
> Android winsys buffers. The amount of information loss between the
> EGL/GL apis and the eventual execbuffer submission may prevent Mesa from
> annotating the Android winsys buffers with this.  I'm unsure.  I'm still
> thinking about it.
> 
> But, if Chris, or anyone, already has plans to use this somehow, perhaps
> in the DDX, then don't let my hesitation block the patch.

Actually, the example I have would be for mesa. It can use this on its
own scratch buffers to just discard writes and prevent ordering on
a single scratch shared between contexts, and for its fence tracking using
a single page for multiple rings.
-Chris
Chad Versace Jan. 27, 2017, 12:07 a.m. UTC | #3
On Thu 26 Jan 2017, Chris Wilson wrote:
> On Wed, Jan 25, 2017 at 12:38:32PM -0800, Chad Versace wrote:
> > On Mon 14 Nov 2016, Chris Wilson wrote:
> > > Userspace is faced with a dilemma. The kernel requires implicit fencing
> > > to manage resource usage (we always must wait for the GPU to finish
> > > before releasing its PTE) and for third parties. However, userspace may
> > > wish to avoid this serialisation if it is either using explicit fencing
> > > between parties and wants more fine-grained access to buffers (e.g. it
> > > may partition the buffer between uses and track fences on ranges rather
> > > than the implicit fences tracking the whole object). It follows that
> > > userspace needs a mechanism to avoid the kernel's serialisation on its
> > > implicit fences before execbuf execution.
> > > 
> > > The next question is whether this is an object, execbuf or context flag.
> > > Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on
> > > shared winsys buffers, but implicit fencing on internal surfaces)
> > > require a per-object level flag. Given that this flag need to be only
> > > set once for the lifetime of the object, this reduces the convenience of
> > > having an execbuf or context level flag (and avoids having multiple
> > > pieces of uABI controlling the same feature).
> > > 
> > > Incorrect use of this flag will result in rendering corruption and GPU
> > > hangs - but will not result in use-after-free or similar resource
> > > tracking issues.
> > > 
> > > Serious caveat: write ordering is not strictly correct after setting
> > > this flag on a render target on multiple engines. This affects all
> > > subsequent GEM operations (execbuf, set-domain, pread) and shared
> > > dma-buf operations. A fix is possible - but costly (both in terms of
> > > further ABI changes and runtime overhead).
> > > 
> > > Testcase: igt/gem_exec_async
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_drv.c            |  1 +
> > >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
> > >  include/uapi/drm/i915_drm.h                | 29 ++++++++++++++++++++++++++++-
> > >  3 files changed, 32 insertions(+), 1 deletion(-)
> > 
> > I'm neutral about this patch. I believe patch 14/14 is useful with or
> > without this patch, and I want to see patch 14 land regardless of what
> > happens with this one.
> 
> I don't like the patch, it opens up a big wart in the GEM api (incorrect
> write tracking on GEM/dma-buf across multiple timelines). Otoh, being
> able to discard the implicit fence tracking seems to be an important
> feature request - if we go forward witout it, we will then be lacking a
> feature that is common across other drivers and in particular seems to
> be commonplace in the Android ecosystem.

I agree. The explicit fence fds provide more benefit (that is, less
blocking and, in general, more *explicitness*) when implicit fencing is
disabled. Userspace should have some API to disable the implicit
fencing, and this patch seems like an ok approach. I certainly can think
of nothing better.

> Daniel, what's your feeling? One problem will be that the
> synchronisation issue may be hard to track down in future (proving that
> the cause of a stall is an avoidable implicit fence).
>  
> > I'm not opposed to this patch. It's just that I don't yet understand
> > exactly if Mesa's EGL/GL code could effectively use this feature for
> > Android winsys buffers. The amount of information loss between the
> > EGL/GL apis and the eventual execbuffer submission may prevent Mesa from
> > annotating the Android winsys buffers with this.  I'm unsure.  I'm still
> > thinking about it.
> > 
> > But, if Chris, or anyone, already has plans to use this somehow, perhaps
> > in the DDX, then don't let my hesitation block the patch.
> 
> Actually, the example I have would be for mesa. It can use this on its
> own scratch buffers to just discard writes and prevent ordering on
> a single scratch shared between contexts, and for its fence tracking using
> a single page for multiple rings.

Those use cases sound good to me. This patch is
Acked-by: Chad Versace <chadversary@chromium.org>
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 99fc075100b1..657c465a8d50 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -343,6 +343,7 @@  static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
 	case I915_PARAM_HAS_COHERENT_PHYS_GTT:
 	case I915_PARAM_HAS_EXEC_SOFTPIN:
+	case I915_PARAM_HAS_EXEC_ASYNC:
 		/* For the time being all of these are always true;
 		 * if some supported hardware does not have one of these
 		 * features this value needs to be provided from
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e804cb2fa57e..781b5559f86e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1111,6 +1111,9 @@  i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	list_for_each_entry(vma, vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
 
+		if (vma->exec_entry->flags & EXEC_OBJECT_ASYNC)
+			continue;
+
 		ret = i915_gem_request_await_object
 			(req, obj, obj->base.pending_write_domain);
 		if (ret)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 47901a8ad682..4bd83c0b07db 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -396,6 +396,12 @@  typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_HAS_SCHEDULER	 41
 
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports the ability to opt-out of
+ * synchronisation with implicit fencing on individual objects.
+ */
+#define I915_PARAM_HAS_EXEC_ASYNC	 42
+
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
@@ -736,8 +742,29 @@  struct drm_i915_gem_exec_object2 {
 #define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
 #define EXEC_OBJECT_PINNED		 (1<<4)
 #define EXEC_OBJECT_PAD_TO_SIZE		 (1<<5)
+/* The kernel implicitly tracks GPU activity on all GEM objects, and
+ * synchronises operations with outstanding rendering. This includes
+ * rendering on other devices if exported via dma-buf. However, sometimes
+ * this tracking is too coarse and the user knows better. For example,
+ * if the object is split into non-overlapping ranges shared between different
+ * clients or engines (i.e. suballocating objects), the implicit tracking
+ * by kernel assumes that each operation affects the whole object rather
+ * than an individual range, causing needless synchronisation between clients.
+ * The kernel will also forgo any CPU cache flushes prior to rendering from
+ * the object as the client is expected to be also handling such domain
+ * tracking.
+ *
+ * The kernel maintains the implicit tracking in order to manage resources
+ * used by the GPU - this flag only disables the synchronisation prior to
+ * rendering with this object in this execbuf.
+ *
+ * Opting out of implicit synhronisation requires the user to do its own
+ * explicit tracking to avoid rendering corruption. See, for example,
+ * I915_PARAM_HAS_EXEC_FENCE to order execbufs and execute them asynchronously.
+ */
+#define EXEC_OBJECT_ASYNC		(1<<6)
 /* All remaining bits are MBZ and RESERVED FOR FUTURE USE */
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_PAD_TO_SIZE<<1)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_ASYNC<<1)
 	__u64 flags;
 
 	union {