Message ID: 1426873717-10176-4-git-send-email-John.C.Harrison@Intel.com (mailing list archive)
State: New, archived
On Fri, Mar 20, 2015 at 05:48:36PM +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not be a
> need for a 'signaled' function to be called every time the status is
> queried. Instead, 'something' should be done to enable a signal callback
> from the hardware which will update the state directly. In the case of
> requests, this is the seqno update interrupt. The idea is that this
> callback will only be enabled on demand when something actually tries to
> wait on the fence.
>
> This change removes the polling test and replaces it with the callback
> scheme. To avoid race conditions where signals can be sent before anyone
> is waiting for them, it does not implement the callback on demand feature.
> When the GPU scheduler arrives, it will need to know about the completion
> of every single request anyway. So it is far simpler not to put in complex
> and messy anti-race code in the first place, given that it will not be
> needed in the future.
>
> Instead, each fence is added to a 'please poke me' list at the start of
> i915_add_request(). This happens before the commands to generate the seqno
> interrupt are added to the ring and is thus guaranteed to be race free.
> The interrupt handler then scans through the 'poke me' list when a new
> seqno pops out and signals any matching fence/request. The fence is then
> removed from the list so the entire request stack does not need to be
> scanned every time.

No. Please let's not go back to the bad old days of generating an interrupt
per batch, and doing a lot more work inside the interrupt handler.
-Chris
On Fri, Mar 20, 2015 at 09:11:35PM +0000, Chris Wilson wrote:
> On Fri, Mar 20, 2015 at 05:48:36PM +0000, John.C.Harrison@Intel.com wrote:
> > [snip: commit message quoted in full above]
>
> No. Please let's not go back to the bad old days of generating an interrupt
> per batch, and doing a lot more work inside the interrupt handler.

Yeah, enable_signalling should be the place where we grab the interrupt
reference. Also, we shouldn't call it unconditionally; that pretty much
defeats the point of that fastpath optimization.

Another complication is missed interrupts. If we detect those and someone
calls enable_signalling then we need to fire up a timer to wake up once per
jiffy and save stuck fences. To avoid duplication with the threaded wait
code we could remove the fallback wakeups from there and just rely on that
timer everywhere.
-Daniel
On 23/03/2015 09:22, Daniel Vetter wrote:
> On Fri, Mar 20, 2015 at 09:11:35PM +0000, Chris Wilson wrote:
>> [snip]
>> No. Please let's not go back to the bad old days of generating an
>> interrupt per batch, and doing a lot more work inside the interrupt
>> handler.
> Yeah, enable_signalling should be the place where we grab the interrupt
> reference. Also that we shouldn't call this unconditionally, that pretty
> much defeats the point of that fastpath optimization.
>
> [snip]
> -Daniel

As has been discussed many times in many forums, the scheduler requires
notification of each batch buffer's completion. It needs to know so that it
can submit new work, keep dependencies of outstanding work up to date, etc.

Android is similar. With the native sync API, Android wants to be signaled
about the completion of everything. Every single batch buffer submission
comes with a request for a sync point that will be poked when that buffer
completes. The kernel has no way of knowing which buffers are actually going
to be waited on. There is no driver call anymore. User land simply waits on
a file descriptor.

I don't see how we can get away without generating an interrupt per batch.
On Mon, Mar 23, 2015 at 12:13:56PM +0000, John Harrison wrote:
> [snip]
>
> As has been discussed many times in many forums, the scheduler requires
> notification of each batch buffer's completion. It needs to know so that
> it can submit new work, keep dependencies of outstanding work up to date,
> etc.
>
> I don't see how we can get away without generating an interrupt per batch.

I've explained this a bit offline in a meeting, but here's finally the mail
version for the record. The reason we want to enable interrupts only when
needed is that interrupts don't scale. Looking around, high-throughput
peripherals all try to avoid interrupts like the plague: netdev has netpoll,
block devices just gained the same because of ridiculously fast SSDs
connected to PCIe. And there are lots of people talking about insanely
tightly coupled GPU compute workloads (maybe not yet on Intel GPUs, but
it'll come).

Now I fully agree that unfortunately the execlist hw design isn't awesome
and there's no way around receiving and processing an interrupt per batch.
But the hw folks are working on fixing these overheads again (or at least
attempting to, using the GuC; I haven't seen the new numbers yet), and old
hw without the scheduler works perfectly fine with interrupts mostly
disabled. So just because we currently have a suboptimal hw design is imo
not a good reason to throw all the on-demand interrupt enabling and handling
overboard. I fully expect that we'll need it again. And I think it's easier
to keep it working than to first kick it out and then rebuild it again.

That's in a nutshell why I think we should keep all that machinery, even
though it won't be terribly useful for execlist (with or without the
scheduler).

Thanks, Daniel
On 03/26/2015 06:22 AM, Daniel Vetter wrote:
> On Mon, Mar 23, 2015 at 12:13:56PM +0000, John Harrison wrote:
>> [snip]
>> I don't see how we can get away without generating an interrupt per
>> batch.
>
> [snip]
>
> That's in a nutshell why I think we should keep all that machinery, even
> though it won't be terribly useful for execlist (with or without the
> scheduler).

What is our interrupt frequency these days anyway, for an interrupt per
batch completion, for a somewhat real set of workloads? There's probably
more to shave off of our interrupt handling overhead, which ought to help
universally, but especially with execlists and sync point usages. I think
Chris was looking at that a while back and removed some MMIO and such and
got the overhead down, but I don't know where we stand today...

None of this means that there isn't room for polling and interrupt
disabling etc., even in the context of scheduling and execlists of course.

Thanks,
Jesse
On Thu, Mar 26, 2015 at 10:27:25AM -0700, Jesse Barnes wrote:
> [snip]
>
> What is our interrupt frequency these days anyway, for an interrupt per
> batch completion, for a somewhat real set of workloads? There's probably
> more to shave off of our interrupt handling overhead, which ought to help
> universally, but especially with execlists and sync point usages. I think
> Chris was looking at that a while back and removed some MMIO and such and
> got the overhead down, but I don't know where we stand today...

I guess you're referring to the pile of patches to reorder the reads/writes
for subordinate irq sources to only happen when they need to? I.e. read only
when we have a bit indicating so (unfortunately not available for all of
them) and write only if there's something to clear. On a quick scan those
patches all landed.

The other bit is making the mmio debug stuff faster. That one hasn't
converged yet to a version which both reduces the overhead and preserves the
usefulness of the debug functionality itself - unclaimed mmio has helped a
lot in chasing down runtime pm and power domain bugs in our driver. So I
really want to keep it around in some form by default, if at all possible.
Maybe check out Chris' latest patch and see whether you have a good idea?
I've run out of ideas on them a bit.
-Daniel
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 28b3c3c..ff662c9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2054,6 +2054,9 @@ struct drm_i915_gem_request {
 	 * re-ordering, pre-emption, etc., there is no guarantee at all
 	 * about the validity or sequentialiaty of the fence's seqno! */
 	struct fence fence;
+	struct list_head signal_list;
+	struct list_head unsignal_list;
+	bool cancelled;

 	/** On Which ring this request was generated */
 	struct intel_engine_cs *ring;
@@ -2132,6 +2135,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);

+void i915_gem_request_notify(struct intel_engine_cs *ring);
+
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b1cde7d..27b8893 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2364,6 +2364,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 */
 	request->postfix = intel_ring_get_tail(ringbuf);

+	/*
+	 * Add the fence to the pending list before emitting the commands to
+	 * generate a seqno notification interrupt.
+	 */
+	fence_enable_sw_signaling(&request->fence);
+
 	if (i915.enable_execlists)
 		ret = ring->emit_request(request);
 	else
@@ -2492,6 +2498,10 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
 	put_pid(request->pid);

+	/* In case the request is still in the signal pending list */
+	if (!list_empty(&request->signal_list))
+		request->cancelled = true;
+
 	i915_gem_request_unreference(request);
 }
@@ -2532,28 +2542,62 @@ static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
 static bool i915_gem_request_enable_signaling(struct fence *req_fence)
 {
-	WARN(true, "Is this required?");
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	bool was_empty;
+
+	was_empty = list_empty(&req->ring->fence_signal_list);
+	if (was_empty)
+		WARN_ON(!req->ring->irq_get(req->ring));
+
+	i915_gem_request_reference(req);
+	list_add_tail(&req->signal_list, &req->ring->fence_signal_list);
+
+	/*
+	 * Note that signalling is always enabled for every request before
+	 * that request is submitted to the hardware. Therefore there is
+	 * no race condition whereby the signal could pop out before the
+	 * request has been added to the list. Hence no need to check
+	 * for completion and undo to the list add and return false.
+	 */
+	return true;
 }

-static bool i915_gem_request_is_completed(struct fence *req_fence)
+void i915_gem_request_notify(struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_request *req = container_of(req_fence,
-						 typeof(*req), fence);
+	struct drm_i915_gem_request *req, *req_next;
+	unsigned long flags;
 	u32 seqno;

-	BUG_ON(req == NULL);
+	if (list_empty(&ring->fence_signal_list))
+		return;
+
+	seqno = ring->get_seqno(ring, false);
+
+	spin_lock_irqsave(&ring->fence_lock, flags);
+	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_list) {
+		if (!req->cancelled) {
+			if (!i915_seqno_passed(seqno, req->seqno))
+				continue;

-	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+			fence_signal_locked(&req->fence);
+		}
+
+		list_del(&req->signal_list);
+		INIT_LIST_HEAD(&req->signal_list);
+		if (list_empty(&req->ring->fence_signal_list))
+			req->ring->irq_put(req->ring);

-	return i915_seqno_passed(seqno, req->seqno);
+		list_add_tail(&req->unsignal_list, &req->ring->fence_unsignal_list);
+	}
+	spin_unlock_irqrestore(&ring->fence_lock, flags);
 }

 static const struct fence_ops i915_gem_request_fops = {
 	.get_driver_name	= i915_gem_request_get_driver_name,
 	.get_timeline_name	= i915_gem_request_get_timeline_name,
 	.enable_signaling	= i915_gem_request_enable_signaling,
-	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
 	.release		= i915_gem_request_free,
 };
@@ -2596,6 +2640,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		return ret;
 	}

+	INIT_LIST_HEAD(&request->signal_list);
 	fence_init(&request->fence, &i915_gem_request_fops, &ring->fence_lock,
 		   ring->fence_context, request->seqno);

 	/*
@@ -2714,6 +2759,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 		i915_gem_free_request(request);
 	}

+	/*
+	 * Make sure any requests that were on the signal pending list get
+	 * cleaned up.
+	 */
+	i915_gem_request_notify(ring);
+	i915_gem_retire_requests_ring(ring);
 }

 void i915_gem_restore_fences(struct drm_device *dev)
@@ -2816,6 +2868,20 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}

+	while (!list_empty(&ring->fence_unsignal_list)) {
+		struct drm_i915_gem_request *request;
+		unsigned long flags;
+
+		spin_lock_irqsave(&ring->fence_lock, flags);
+		request = list_first_entry(&ring->fence_unsignal_list,
+					   struct drm_i915_gem_request,
+					   unsignal_list);
+		list_del(&request->unsignal_list);
+		spin_unlock_irqrestore(&ring->fence_lock, flags);
+
+		i915_gem_request_unreference(request);
+	}
+
 	WARN_ON(i915_verify_lists(ring->dev));
 }
@@ -5049,6 +5115,8 @@ init_ring_lists(struct intel_engine_cs *ring)
 {
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 }

 void i915_init_vm(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index cc2796b..d1cf226 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -994,6 +994,8 @@ static void notify_ring(struct drm_device *dev,
 	trace_i915_gem_request_notify(ring);

+	i915_gem_request_notify(ring);
+
 	wake_up_all(&ring->irq_queue);
 }
@@ -2959,6 +2961,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 			DRM_INFO("%s on %s\n",
 				 stuck[i] ? "stuck" : "no progress",
 				 ring->name);
+			trace_printk("%s:%d> \x1B[31;1m<%s> Borked: %s @ %d!\x1B[0m\n", __func__, __LINE__, ring->name, stuck[i] ? "stuck" : "no progress", ring->hangcheck.seqno);
 			rings_hung++;
 		}
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c1072b1..d87126e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1337,6 +1337,8 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 	spin_lock_init(&ring->fence_lock);
 	init_waitqueue_head(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index fd65c0d..9d7ad51 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1981,6 +1981,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 	spin_lock_init(&ring->fence_lock);

 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->ring = ring;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index a0ce08e..7412fe4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -311,6 +311,8 @@ struct intel_engine_cs {

 	unsigned fence_context;
 	spinlock_t fence_lock;
+	struct list_head fence_signal_list;
+	struct list_head fence_unsignal_list;
 };

 bool intel_ring_initialized(struct intel_engine_cs *ring);