[RFC,2/2] drm/i915: Select engines via class and instance in execbuffer2

Message ID 20170418165615.27666-3-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Tvrtko Ursulin April 18, 2017, 4:56 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Building on top of the previous patch which exported the concept
of engine classes and instances, we can also use this instead of
the current awkward engine selection uAPI.

This is primarily interesting for the VCS engine selection, where
a) selection is currently done via a disjoint set of flags, and
b) the current I915_EXEC_BSD flag has different semantics depending
on the underlying hardware, which is bad.

The proposed idea is to reserve 16 bits of flags to pass in
the engine class and instance (8 bits each), plus a new flag
named I915_EXEC_CLASS_INSTANCE to tell the kernel this new engine
selection API is in use.

The new uAPI also removes access to the weak VCS engine
balancing that currently exists in the driver.

Example usage to send a command to VCS0:

  eb.flags = i915_execbuffer2_engine(DRM_I915_ENGINE_CLASS_VIDEO_DECODE, 0);

Or to send a command to VCS1:

  eb.flags = i915_execbuffer2_engine(DRM_I915_ENGINE_CLASS_VIDEO_DECODE, 1);

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Charles <daniel.charles@intel.com>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: intel-vaapi-media@lists.01.org
Cc: mesa-dev@lists.freedesktop.org
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 36 ++++++++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h                |  7 +++++-
 2 files changed, 42 insertions(+), 1 deletion(-)
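
Note: the i915_execbuffer2_engine() helper used in the usage examples is
not defined in the hunks shown here. The following is only a minimal
sketch of how such a helper could pack the flags, assuming the layout
from the uAPI hunk of this patch (opt-in flag at bit 18, 16-bit field
starting at bit 19) and the unpacking order used by
eb_select_engine_class_instance() (class in the high byte, instance in
the low byte). The example_* names and the class value are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout taken from the uAPI hunk; ULL so the field fits in u64 flags. */
#define I915_EXEC_CLASS_INSTANCE        (1ULL << 18)
#define I915_EXEC_CLASS_INSTANCE_SHIFT  19
#define I915_EXEC_CLASS_INSTANCE_MASK   (0xffffULL << I915_EXEC_CLASS_INSTANCE_SHIFT)

/* ASSUMPTION: placeholder value, the real constant comes from patch 1/2. */
#define EXAMPLE_CLASS_VIDEO_DECODE 2

/* Hypothetical helper: class in the high byte, instance in the low byte. */
static inline uint64_t example_execbuffer2_engine(uint8_t engine_class,
						  uint8_t instance)
{
	uint64_t field = ((uint64_t)engine_class << 8) | instance;

	return I915_EXEC_CLASS_INSTANCE |
	       (field << I915_EXEC_CLASS_INSTANCE_SHIFT);
}

/* Mirrors the unpacking done by eb_select_engine_class_instance(). */
static inline void example_decode_engine(uint64_t flags,
					 uint8_t *engine_class,
					 uint8_t *instance)
{
	uint16_t field = (flags & I915_EXEC_CLASS_INSTANCE_MASK) >>
			 I915_EXEC_CLASS_INSTANCE_SHIFT;

	*engine_class = field >> 8;
	*instance = field & 0xff;
}

/* Usage: target the second video decode engine and check the round trip. */
static inline void example_usage(void)
{
	uint8_t engine_class, instance;
	uint64_t flags = example_execbuffer2_engine(EXAMPLE_CLASS_VIDEO_DECODE, 1);

	example_decode_engine(flags, &engine_class, &instance);
	assert(engine_class == EXAMPLE_CLASS_VIDEO_DECODE && instance == 1);
}
```

The helper leaves the legacy ring selector bits at zero, which is what
the new check in i915_gem_check_execbuffer() requires whenever
I915_EXEC_CLASS_INSTANCE is set.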

Comments

Chris Wilson April 18, 2017, 9:10 p.m. UTC | #1
On Tue, Apr 18, 2017 at 05:56:15PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Building on top of the previous patch which exported the concept
> of engine classes and instances, we can also use this instead of
> the current awkward engine selection uAPI.
> 
> This is primarily interesting for the VCS engine selection which
> is a) currently done via disjoint set of flags, and b) the
> current I915_EXEC_BSD flags has different semantics depending on
> the underlying hardware which is bad.
> 
> Proposed idea here is to reserve 16-bits of flags, to pass in
> the engine class and instance (8 bits each), and a new flag
> named I915_EXEC_CLASS_INSTANCE to tell the kernel this new engine
> selection API is in use.
> 
> The new uAPI also removes access to the weak VCS engine
> balancing as currently existing in the driver.
> 
> Example usage to send a command to VCS0:
> 
>   eb.flags = i915_execbuffer2_engine(DRM_I915_ENGINE_CLASS_VIDEO_DECODE, 0);
> 
> Or to send a command to VCS1:
> 
>   eb.flags = i915_execbuffer2_engine(DRM_I915_ENGINE_CLASS_VIDEO_DECODE, 1);

To save a bit of space, we can use the ring selector as a class selector
if bit 18 is set, with bits 19-27 as the instance. That limits us to 64
classes - hopefully not a problem for the near future. At least I might
have sold you on a flexible execbuf3 by then.

(As a digression, some cryptic notes for an implementation I did over Easter:
/*
 * Execbuf3!
 *
 * ringbuffer 
 *  - per context
 *  - per engine
 *  - PAGE_SIZE ctl [ro head, rw tail] + user pot
 *  - kthread [i915/$ctx-$engine] (optional?)
 *  - assumes NO_RELOC-esque awareness
 *
 * SYNC flags [wait/signal], handle [semaphore/fence]
 *
 * BIND handle, offset [user provided]
 * ALLOC[32,64] handle, flags, *offset [kernel provided, need RELOC]
 * RELOC[32,64] handle, target_handle, offset, delta
 * CLEAR flags, handle
 * UNBIND handle
 *
 * BATCH flags, handle, offset
 * [or SVM flags, address]
 *   PIN flags (MAY_RELOC), count, handle[count]
 *   FENCE flags, count, handle[count]
 * SUBMIT handle [fence/NULL with error]
 */

At the moment it is just trying to do execbuf2, but more compactly and
with fewer ioctls. But one of the main selling points is that we can
extend the information passed around more freely than execbuf2.)
-Chris
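
For illustration only, a sketch of the tighter packing suggested in this
reply. It assumes the ring selector occupies bits 0-5 (which is what
"64 classes" implies), bit 18 is the opt-in flag, and bits 19-27 hold
the instance. All EXAMPLE_* and example_* names are hypothetical and
nothing below is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: 6-bit selector at bits 0-5, reused as the engine class. */
#define EXAMPLE_EXEC_CLASS_MASK      0x3fULL
/* Opt-in flag telling the kernel to read the selector as a class. */
#define EXAMPLE_EXEC_CLASS_INSTANCE  (1ULL << 18)
/* ASSUMPTION: 9-bit instance field in bits 19-27. */
#define EXAMPLE_EXEC_INSTANCE_SHIFT  19
#define EXAMPLE_EXEC_INSTANCE_MASK   (0x1ffULL << EXAMPLE_EXEC_INSTANCE_SHIFT)

static inline uint64_t example_pack_engine(unsigned int engine_class,
					   unsigned int instance)
{
	return ((uint64_t)engine_class & EXAMPLE_EXEC_CLASS_MASK) |
	       EXAMPLE_EXEC_CLASS_INSTANCE |
	       (((uint64_t)instance << EXAMPLE_EXEC_INSTANCE_SHIFT) &
		EXAMPLE_EXEC_INSTANCE_MASK);
}

/* Returns false when the flag is clear, i.e. legacy ring selection. */
static inline bool example_unpack_engine(uint64_t flags,
					 unsigned int *engine_class,
					 unsigned int *instance)
{
	if (!(flags & EXAMPLE_EXEC_CLASS_INSTANCE))
		return false;

	*engine_class = flags & EXAMPLE_EXEC_CLASS_MASK;
	*instance = (flags & EXAMPLE_EXEC_INSTANCE_MASK) >>
		    EXAMPLE_EXEC_INSTANCE_SHIFT;
	return true;
}
```

With this layout the highest bit consumed is 27, compared with bit 34
for a separate 16-bit field starting at bit 19. The sketch silently
truncates out-of-range values; a real implementation would more likely
reject them.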
Tvrtko Ursulin April 24, 2017, 8:36 a.m. UTC | #2
On 18/04/2017 22:10, Chris Wilson wrote:
> On Tue, Apr 18, 2017 at 05:56:15PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Building on top of the previous patch which exported the concept
>> of engine classes and instances, we can also use this instead of
>> the current awkward engine selection uAPI.
>>
>> This is primarily interesting for the VCS engine selection which
>> is a) currently done via disjoint set of flags, and b) the
>> current I915_EXEC_BSD flags has different semantics depending on
>> the underlying hardware which is bad.
>>
>> Proposed idea here is to reserve 16-bits of flags, to pass in
>> the engine class and instance (8 bits each), and a new flag
>> named I915_EXEC_CLASS_INSTANCE to tell the kernel this new engine
>> selection API is in use.
>>
>> The new uAPI also removes access to the weak VCS engine
>> balancing as currently existing in the driver.
>>
>> Example usage to send a command to VCS0:
>>
>>   eb.flags = i915_execbuffer2_engine(DRM_I915_ENGINE_CLASS_VIDEO_DECODE, 0);
>>
>> Or to send a command to VCS1:
>>
>>   eb.flags = i915_execbuffer2_engine(DRM_I915_ENGINE_CLASS_VIDEO_DECODE, 1);
>
> To save a bit of space, we can use the ring selector as a class selector
> if bit18 is set, with 19-27 as instance. That limits us to 64 classes -
> hopefully not a problem for near future. At least I might have sold
> you on a flexible execbuf3 by then.

I was considering re-using those bits, yes. I was weighing the benefit
of keeping it completely separate, but I suppose there is not much value
in that. So I can re-use the ring selector just as well and leave more
flag bits free.

> (As a digression, some cryptic notes for an implementation I did over Easter:
> /*
>  * Execbuf3!
>  *
>  * ringbuffer
>  *  - per context
>  *  - per engine

We have this already so I am missing something I guess.

>  *  - PAGE_SIZE ctl [ro head, rw tail] + user pot
>  *  - kthread [i915/$ctx-$engine] (optional?)

No idea what these two are. :)

>  *  - assumes NO_RELOC-esque awareness

Ok ok NO_RELOC. :)

>  *
>  * SYNC flags [wait/signal], handle [semaphore/fence]

Sync fence in/out just as today, but probably more?

>  *
>  * BIND handle, offset [user provided]
>  * ALLOC[32,64] handle, flags, *offset [kernel provided, need RELOC]
>  * RELOC[32,64] handle, target_handle, offset, delta
>  * CLEAR flags, handle
>  * UNBIND handle

Explicit VMA management? Separate ioctl maybe would be better?

>  *
>  * BATCH flags, handle, offset
>  * [or SVM flags, address]
>  *   PIN flags (MAY_RELOC), count, handle[count]
>  *   FENCE flags, count, handle[count]
>  * SUBMIT handle [fence/NULL with error]
>  */

No idea again. :)

> At the moment it is just trying to do execbuf2, but more compactly and
> with fewer ioctls. But one of the main selling points is that we can
> extend the information passed around more freely than execbuf2.)

I have nothing against a better eb, since I trust you know much better
whether it is needed and when. But I don't know how long it will take to
get there. This class/instance idea could be implemented quickly to solve
the sore point of VCS/VCS2 engine selection. But yeah, it is another uABI
to keep in that case.

Regards,

Tvrtko

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index af1965774e7b..7fc92ec839a1 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1153,6 +1153,10 @@  i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 *exec)
 	if (exec->flags & __I915_EXEC_UNKNOWN_FLAGS)
 		return false;
 
+	if ((exec->flags & I915_EXEC_CLASS_INSTANCE) &&
+	    (exec->flags & I915_EXEC_RING_MASK))
+		return false;
+
 	/* Kernel clipping was a DRI1 misfeature */
 	if (exec->num_cliprects || exec->cliprects_ptr)
 		return false;
@@ -1492,6 +1496,35 @@  gen8_dispatch_bsd_engine(struct drm_i915_private *dev_priv,
 	return file_priv->bsd_engine;
 }
 
+extern u8 user_class_map[DRM_I915_ENGINE_CLASS_MAX];
+
+static struct intel_engine_cs *
+eb_select_engine_class_instance(struct drm_i915_private *i915,
+				struct drm_i915_gem_execbuffer2 *args)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	u16 class_instance;
+	u8 user_class, class, instance;
+
+	class_instance = (args->flags & I915_EXEC_CLASS_INSTANCE_MASK) >>
+			 I915_EXEC_CLASS_INSTANCE_SHIFT;
+
+	user_class = class_instance >> 8;
+	instance = class_instance & 0xff;
+
+	if (user_class >= DRM_I915_ENGINE_CLASS_MAX)
+		return NULL;
+	class = user_class_map[user_class];
+
+	for_each_engine(engine, i915, id) {
+		if (engine->class == class && engine->instance == instance)
+			return engine;
+	}
+
+	return NULL;
+}
+
 #define I915_USER_RINGS (4)
 
 static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
@@ -1510,6 +1543,9 @@  eb_select_engine(struct drm_i915_private *dev_priv,
 	unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
 	struct intel_engine_cs *engine;
 
+	if (args->flags & I915_EXEC_CLASS_INSTANCE)
+		return eb_select_engine_class_instance(dev_priv, args);
+
 	if (user_ring_id > I915_USER_RINGS) {
 		DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
 		return NULL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 6058596a9f33..727a6dc4b029 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -906,7 +906,12 @@  struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_FENCE_OUT		(1<<17)
 
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_OUT<<1))
+#define I915_EXEC_CLASS_INSTANCE	(1<<18)
+
+#define I915_EXEC_CLASS_INSTANCE_SHIFT	(19)
+#define I915_EXEC_CLASS_INSTANCE_MASK	(0xffffULL << I915_EXEC_CLASS_INSTANCE_SHIFT)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-(1ULL << 35))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \