| Message ID | 20190627080339.9178-1-lionel.g.landwerlin@intel.com (mailing list archive) |
|---|---|
| Series | drm/i915: timeline semaphore support |
Quoting Lionel Landwerlin (2019-06-27 09:03:37)
> Hi,
>
> This revision gets rid of the synchronous wait. We now implement the
> synchronous wait as part of the userspace driver. A thread is spawned
> for each engine and waits for availability of the syncobjs before
> calling into execbuffer.

Why would you do that? It's not like the kernel already has the ability
to serialise execution asynchronously...
-Chris
On 27/06/2019 13:21, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-06-27 09:03:37)
>> Hi,
>>
>> This revision gets rid of the synchronous wait. We now implement the
>> synchronous wait as part of the userspace driver. A thread is spawned
>> for each engine and waits for availability of the syncobjs before
>> calling into execbuffer.
> Why would you do that? It's not like the kernel already has the ability
> to serialise execution asynchronously...
> -Chris
>

Maybe I didn't express myself well.

There is a requirement in the Vulkan spec that we be able to queue a
workload depending on fences that haven't materialized yet.

The last revision implemented that in i915 by blocking in the
execbuffer until the input fences had all materialized.

We moved that into the userspace driver. That makes the i915 execbuffer
path non-blocking (which is one thing you mentioned was a behavior
change in rev1). It also means Anv does not block in QueueSubmit(),
because it hands the waiting over to a thread dedicated to the queue.

-Lionel
Quoting Lionel Landwerlin (2019-06-27 11:39:24)
> On 27/06/2019 13:21, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2019-06-27 09:03:37)
> >> Hi,
> >>
> >> This revision gets rid of the synchronous wait. We now implement the
> >> synchronous wait as part of the userspace driver. A thread is spawned
> >> for each engine and waits for availability of the syncobjs before
> >> calling into execbuffer.
> > Why would you do that? It's not like the kernel already has the ability
> > to serialise execution asynchronously...
> > -Chris
> >
> Maybe I didn't express myself well.
>
> There is a requirement in the Vulkan spec that we be able to queue a
> workload depending on fences that haven't materialized yet.
>
> The last revision implemented that in i915 by blocking in the
> execbuffer until the input fences had all materialized.

My point was that the syncobj seqno itself could be expressed as a proxy
fence that is replaced with the real fence later. (And we previously had
code to do exactly that :( ) i915 would listen to the proxy fence for its
scheduling and so be signaled by the right fence once it was known. That
would be a few orders of magnitude lower latency; more if we can optimise
away the irq by knowing the proxy fence ahead of time.
-Chris
On 27/06/2019 13:43, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-06-27 11:39:24)
>> On 27/06/2019 13:21, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2019-06-27 09:03:37)
>>>> Hi,
>>>>
>>>> This revision gets rid of the synchronous wait. We now implement the
>>>> synchronous wait as part of the userspace driver. A thread is spawned
>>>> for each engine and waits for availability of the syncobjs before
>>>> calling into execbuffer.
>>> Why would you do that? It's not like the kernel already has the ability
>>> to serialise execution asynchronously...
>>> -Chris
>>>
>> Maybe I didn't express myself well.
>>
>> There is a requirement in the Vulkan spec that we be able to queue a
>> workload depending on fences that haven't materialized yet.
>>
>> The last revision implemented that in i915 by blocking in the
>> execbuffer until the input fences had all materialized.
> My point was that the syncobj seqno itself could be expressed as a proxy
> fence that is replaced with the real fence later. (And we previously had
> code to do exactly that :( ) i915 would listen to the proxy fence for its
> scheduling and so be signaled by the right fence once it was known. That
> would be a few orders of magnitude lower latency; more if we can optimise
> away the irq by knowing the proxy fence ahead of time.
> -Chris
>

Agreed, but there is a problem here with the dma_fence_chain
implementation.

With timeline semaphores, you could be waiting on point 6 while nobody
ever signals that exact point; instead point 8 gets signaled, and point 6
must then be considered signaled as well. I don't think the current
dma_fence_chain implementation allows that at the moment. It would need
to be more complex, with likely more locking.

I have thought a little about what you're proposing and found some
issues with other features we need to support (like import/export).

You could also end up building an arbitrarily long chain of proxy fences
firing one another in cascade, blowing up the stack of the kernel task
that signals one of them.

-Lionel