| Message ID | 20190627080339.9178-1-lionel.g.landwerlin@intel.com (mailing list archive) |
|---|---|
| Series | drm/i915: timeline semaphore support |
Quoting Lionel Landwerlin (2019-06-27 09:03:37)
> Hi,
>
> This revision gets rid of the synchronous wait. We now implement the
> synchronous wait as part of the userspace driver. A thread is spawned
> for each engine and waits for availability of the syncobjs before
> calling into execbuffer.

Why would you do that? It's not like the kernel already has the ability
to serialise execution asynchronously...
-Chris
On 27/06/2019 13:21, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-06-27 09:03:37)
>> Hi,
>>
>> This revision gets rid of the synchronous wait. We now implement the
>> synchronous wait as part of the userspace driver. A thread is spawned
>> for each engine and waits for availability of the syncobjs before
>> calling into execbuffer.
> Why would you do that? It's not like the kernel already has the ability
> to serialise execution asynchronously...
> -Chris
>

Maybe I didn't express myself well.

There is a requirement in the Vulkan spec that we be able to queue a
workload depending on fences that haven't materialized yet.

The last revision implemented that in i915 by blocking in the
execbuffer until the input fences had all materialized.

We moved that into the userspace driver. That makes the i915 execbuffer
path non-blocking (which is one thing you mentioned was a behavior
change in rev1). It also means Anv does not block in QueueSubmit(),
because it hands the waiting over to a thread dedicated to the queue.

-Lionel
Quoting Lionel Landwerlin (2019-06-27 11:39:24)
> On 27/06/2019 13:21, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2019-06-27 09:03:37)
> >> Hi,
> >>
> >> This revision gets rid of the synchronous wait. We now implement the
> >> synchronous wait as part of the userspace driver. A thread is spawned
> >> for each engine and waits for availability of the syncobjs before
> >> calling into execbuffer.
> > Why would you do that? It's not like the kernel already has the ability
> > to serialise execution asynchronously...
> > -Chris
> >
> Maybe I didn't express myself well.
>
> There is a requirement in the Vulkan spec that we be able to queue a
> workload depending on fences that haven't materialized yet.
>
> The last revision implemented that in i915 by blocking in the
> execbuffer until the input fences had all materialized.

My point was that the syncobj seqno itself could be expressed as a proxy
fence that is replaced with the real fence later. (And we previously had
code to do exactly that :( ) i915 would listen to the proxy fence for its
scheduling and so be signaled by the right fence once it was known. That
would be a few orders of magnitude lower latency; more if we can optimise
away the irq by knowing the proxy fence ahead of time.
-Chris
On 27/06/2019 13:43, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-06-27 11:39:24)
>> On 27/06/2019 13:21, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2019-06-27 09:03:37)
>>>> Hi,
>>>>
>>>> This revision gets rid of the synchronous wait. We now implement the
>>>> synchronous wait as part of the userspace driver. A thread is spawned
>>>> for each engine and waits for availability of the syncobjs before
>>>> calling into execbuffer.
>>> Why would you do that? It's not like the kernel already has the ability
>>> to serialise execution asynchronously...
>>> -Chris
>>>
>> Maybe I didn't express myself well.
>>
>> There is a requirement in the Vulkan spec that we be able to queue a
>> workload depending on fences that haven't materialized yet.
>>
>> The last revision implemented that in i915 by blocking in the
>> execbuffer until the input fences had all materialized.
> My point was that the syncobj seqno itself could be expressed as a proxy
> fence that is replaced with the real fence later. (And we previously had
> code to do exactly that :( ) i915 would listen to the proxy fence for its
> scheduling and so be signaled by the right fence once it was known. That
> would be a few orders of magnitude lower latency; more if we can optimise
> away the irq by knowing the proxy fence ahead of time.
> -Chris
>

Agreed, but there is a problem here with the dma_fence_chain
implementation.

With timeline semaphores, you could be waiting on point 6 while nobody
ever signals that exact point; instead point 8 gets signaled, and point 6
must then be considered signaled as well. I don't think the current
dma_fence_chain implementation allows that at the moment. It would need
to be more complex, with likely more locking.

I have thought a little about what you're proposing and found some
issues with other features we need to support (like import/export).

You could also end up building an arbitrarily long chain of proxy fences
firing one another in cascade, blowing up the stack of the kernel task
that signals one of them.

-Lionel