diff mbox

[13/15] drm/i915: Allow execbuffer to use the first object as the batch

Message ID 20170316132006.7976-14-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson March 16, 2017, 1:20 p.m. UTC
Currently, the last object in the execlist is the always the batch.
However, when building the batch buffer we often know the batch object
first and if we can use the first slot in the execlist we can emit
relocation instructions relative to it immediately and avoid a separate
pass to adjust the relocations to point to the last execlist slot.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c            | 1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 ++++-
 include/uapi/drm/i915_drm.h                | 8 +++++++-
 3 files changed, 12 insertions(+), 2 deletions(-)

Comments

Joonas Lahtinen March 17, 2017, 11:15 a.m. UTC | #1
On to, 2017-03-16 at 13:20 +0000, Chris Wilson wrote:
> Currently, the last object in the execlist is the always the batch.
> However, when building the batch buffer we often know the batch object
> first and if we can use the first slot in the execlist we can emit
> relocation instructions relative to it immediately and avoid a separate
> pass to adjust the relocations to point to the last execlist slot.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
Daniel Vetter July 7, 2017, 10:17 a.m. UTC | #2
On Fri, Mar 17, 2017 at 12:15 PM, Joonas Lahtinen
<joonas.lahtinen@linux.intel.com> wrote:
> On to, 2017-03-16 at 13:20 +0000, Chris Wilson wrote:
>> Currently, the last object in the execlist is the always the batch.
>> However, when building the batch buffer we often know the batch object
>> first and if we can use the first slot in the execlist we can emit
>> relocation instructions relative to it immediately and avoid a separate
>> pass to adjust the relocations to point to the last execlist slot.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

This patch was reviewed/pushed full month before the mesa patch was
fully reviewed and ready for merging. That's not how uapi is done.

I've fixed this up now by at least reviewing the mesa patch, but for
next time around: If you review uapi, and you don't make sure the
userspace side is in good shape too, then you've not reviewed the
patch properly.

Same goes for reviewing and not making sure there's tests, but that's
another rant.

Ken, pls make sure we don't end up with another case like resource
streamer (or tell me I should revert this).
-Daniel
Chris Wilson July 7, 2017, 11:58 a.m. UTC | #3
Quoting Daniel Vetter (2017-07-07 11:17:22)
> On Fri, Mar 17, 2017 at 12:15 PM, Joonas Lahtinen
> <joonas.lahtinen@linux.intel.com> wrote:
> > On to, 2017-03-16 at 13:20 +0000, Chris Wilson wrote:
> >> Currently, the last object in the execlist is the always the batch.
> >> However, when building the batch buffer we often know the batch object
> >> first and if we can use the first slot in the execlist we can emit
> >> relocation instructions relative to it immediately and avoid a separate
> >> pass to adjust the relocations to point to the last execlist slot.
> >>
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> 
> This patch was reviewed/pushed full month before the mesa patch was
> fully reviewed and ready for merging. That's not how uapi is done.
> 
> I've fixed this up now by at least reviewing the mesa patch, but for
> next time around: If you review uapi, and you don't make sure the
> userspace side is in good shape too, then you've not reviewed the
> patch properly.

The userspace was in good shape and had been requested. Now explain the
complication in the API that merits a rant?

You've had well over 2 years to review the patches that first used it.
-Chris
Kenneth Graunke July 8, 2017, 3:54 a.m. UTC | #4
On Friday, July 7, 2017 3:17:22 AM PDT Daniel Vetter wrote:
> On Fri, Mar 17, 2017 at 12:15 PM, Joonas Lahtinen
> <joonas.lahtinen@linux.intel.com> wrote:
> > On to, 2017-03-16 at 13:20 +0000, Chris Wilson wrote:
> >> Currently, the last object in the execlist is the always the batch.
> >> However, when building the batch buffer we often know the batch object
> >> first and if we can use the first slot in the execlist we can emit
> >> relocation instructions relative to it immediately and avoid a separate
> >> pass to adjust the relocations to point to the last execlist slot.
> >>
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> 
> This patch was reviewed/pushed full month before the mesa patch was
> fully reviewed and ready for merging. That's not how uapi is done.
> 
> I've fixed this up now by at least reviewing the mesa patch, but for
> next time around: If you review uapi, and you don't make sure the
> userspace side is in good shape too, then you've not reviewed the
> patch properly.
> 
> Same goes for reviewing and not making sure there's tests, but that's
> another rant.
> 
> Ken, pls make sure we don't end up with another case like resource
> streamer (or tell me I should revert this).
> -Daniel

Sorry, that's partly my bad - I had mentioned to Chris that I wanted this
feature, and planned to use it, but then got distracted with other work and
didn't get around to actually shipping a patch to do so.  Both Chris and I
wrote patches, and IIRC I was benchmarking things when I got distracted.

Basically, I915_EXEC_HANDLE_LUT appeared to be a performance loss in the
CPU bound benchmarks I looked at, because we had to walk over one of the
lists and patch up references to the batch (-1 => actual index).

BATCH_FIRST eliminates that overhead, making HANDLE_LUT actually useful
(although only a small amount).

--Ken
Daniel Vetter July 10, 2017, 5:55 a.m. UTC | #5
On Fri, Jul 7, 2017 at 1:58 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Quoting Daniel Vetter (2017-07-07 11:17:22)
>> On Fri, Mar 17, 2017 at 12:15 PM, Joonas Lahtinen
>> <joonas.lahtinen@linux.intel.com> wrote:
>> > On to, 2017-03-16 at 13:20 +0000, Chris Wilson wrote:
>> >> Currently, the last object in the execlist is the always the batch.
>> >> However, when building the batch buffer we often know the batch object
>> >> first and if we can use the first slot in the execlist we can emit
>> >> relocation instructions relative to it immediately and avoid a separate
>> >> pass to adjust the relocations to point to the last execlist slot.
>> >>
>> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> >
>> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>
>> This patch was reviewed/pushed full month before the mesa patch was
>> fully reviewed and ready for merging. That's not how uapi is done.
>>
>> I've fixed this up now by at least reviewing the mesa patch, but for
>> next time around: If you review uapi, and you don't make sure the
>> userspace side is in good shape too, then you've not reviewed the
>> patch properly.
>
> The userspace was in good shape and had been requested. Now explain the
> complication in the API that merits a rant?
>
> You've had well over 2 years to review the patches that first used it.

uapi merge rules:
1. get everything reviewed, tested and everything ready for merging
2. merge kernel side
3. merge userpsace side

This here very much looked liked the userspace side wasn't ready yet,
and the kernel side was pushed already. This is the
reviewers&committers job to ensure, not mine, so no idea why I've had
over 2 years to review things ...

Anyway, should we add some checks for uapi additions to dim? Just
checking for a Testcase: or References: line for the userspace might
be useful, but then many uapi additions don't touch include/uapi ...
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index fc51094189c0..e63fe3f704d6 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -351,6 +351,7 @@  static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_ASYNC:
 	case I915_PARAM_HAS_EXEC_FENCE:
 	case I915_PARAM_HAS_EXEC_CAPTURE:
+	case I915_PARAM_HAS_EXEC_BATCH_FIRST:
 		/* For the time being all of these are always true;
 		 * if some supported hardware does not have one of these
 		 * features this value needs to be provided from
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 0c4fa7fc94aa..d39ea2ba20ff 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -559,7 +559,10 @@  ht_head(const struct i915_gem_context *ctx, u32 handle)
 
 static int eb_batch_index(const struct i915_execbuffer *eb)
 {
-	return eb->buffer_count - 1;
+	if (eb->args->flags & I915_EXEC_BATCH_FIRST)
+		return 0;
+	else
+		return eb->buffer_count - 1;
 }
 
 static int eb_select_context(struct i915_execbuffer *eb)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 176c5a70300b..a3ec2f355c09 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -418,6 +418,11 @@  typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_HAS_EXEC_CAPTURE	 45
 
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports supplying the batch buffer
+ * as the first execobject as opposed to the last. See I915_EXEC_BATCH_FIRST.
+ */
+#define I915_PARAM_HAS_EXEC_BATCH_FIRST	 46
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
@@ -902,7 +907,8 @@  struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_FENCE_OUT		(1<<17)
 
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_OUT<<1))
+#define I915_EXEC_BATCH_FIRST		(1<<18)
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_BATCH_FIRST<<1))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \