Message ID | 07e05518d9f6620d20cc1101ec1849203fe973f9.1657289332.git.karolina.drobnik@intel.com
---|---
State | New, archived
Series | drm/i915: Apply waitboosting before fence wait
On Fri, Jul 08, 2022 at 04:20:11PM +0200, Karolina Drobnik wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
>
> We employ a "waitboost" heuristic to detect when userspace is stalled
> waiting for results from earlier execution. Under latency sensitive work
> mixed between the gpu/cpu, the GPU is typically under-utilised and so
> RPS sees that low utilisation as a reason to downclock the frequency,
> causing longer stalls and lower throughput. The user left waiting for
> the results is not impressed.
>
> On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv
> workaround") it was observed that deinterlacing h264 on Haswell
> performance dropped by 2-5x. The reason being that the natural workload
> was not intense enough to trigger RPS (using HW evaluation intervals) to
> upclock, and so it was depending on waitboosting for the throughput.
>
> Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
> changes the composition of dma-resv from keeping a single write fence +
> multiple read fences, to a single array of multiple write and read
> fences (a maximum of one pair of write/read fences per context). The
> iteration order was also changed implicitly from all-read fences then
> the single write fence, to a mix of write fences followed by read
> fences. It is that ordering change that belied the fragility of
> waitboosting.
>
> Currently, a waitboost is inspected at the point of waiting on an
> outstanding fence. If the GPU is backlogged such that we haven't yet
> started the request we need to wait on, we force the GPU to upclock until
> the completion of that request. By changing the order in which we waited
> upon requests, we ended up waiting on those requests in sequence and as
> such we saw that each request was already started and so not a suitable
> candidate for waitboosting.
>
> Instead of asking whether to boost each fence in turn, we can look at
> whether boosting is required for the dma-resv ensemble prior to waiting
> on any fence, making the heuristic more robust to the order in which
> fences are stored in the dma-resv.
>
> Reported-by: Thomas Voegtle <tv@lio96.de>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
> Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
> Tested-by: Thomas Voegtle <tv@lio96.de>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_wait.c | 34 ++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> index 319936f91ac5..e6e01c2a74a6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> @@ -9,6 +9,7 @@
>  #include <linux/jiffies.h>
>  
>  #include "gt/intel_engine.h"
> +#include "gt/intel_rps.h"
>  
>  #include "i915_gem_ioctls.h"
>  #include "i915_gem_object.h"
> @@ -31,6 +32,37 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
>  			   timeout);
>  }
>  
> +static void
> +i915_gem_object_boost(struct dma_resv *resv, unsigned int flags)
> +{
> +	struct dma_resv_iter cursor;
> +	struct dma_fence *fence;
> +
> +	/*
> +	 * Prescan all fences for potential boosting before we begin waiting.
> +	 *
> +	 * When we wait, we wait on outstanding fences serially. If the
> +	 * dma-resv contains a sequence such as 1:1, 1:2 instead of a reduced
> +	 * form 1:2, then as we look at each wait in turn we see that each
> +	 * request is currently executing and not worthy of boosting. But if
> +	 * we only happen to look at the final fence in the sequence (because
> +	 * of request coalescing or splitting between read/write arrays by
> +	 * the iterator), then we would boost. As such our decision to boost
> +	 * or not is delicately balanced on the order we wait on fences.
> +	 *
> +	 * So instead of looking for boosts sequentially, look for all boosts
> +	 * upfront and then wait on the outstanding fences.
> +	 */
> +
> +	dma_resv_iter_begin(&cursor, resv,
> +			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
> +	dma_resv_for_each_fence_unlocked(&cursor, fence)
> +		if (dma_fence_is_i915(fence) &&
> +		    !i915_request_started(to_request(fence)))
> +			intel_rps_boost(to_request(fence));
> +	dma_resv_iter_end(&cursor);
> +}
> +
>  static long
>  i915_gem_object_wait_reservation(struct dma_resv *resv,
>  				 unsigned int flags,
> @@ -40,6 +72,8 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
>  	struct dma_fence *fence;
>  	long ret = timeout ?: 1;
>  
> +	i915_gem_object_boost(resv, flags);
> +
>  	dma_resv_iter_begin(&cursor, resv,
>  			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
>  	dma_resv_for_each_fence_unlocked(&cursor, fence) {
> --
> 2.25.1
>
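To make the ordering hazard described in the commit message concrete, here is a minimal user-space C simulation, not kernel code: the sim_fence struct, its started flag, and the wait model are all illustrative assumptions. With the 1:1, 1:2 sequence, the serial per-fence check never boosts, because waiting on 1:1 lets 1:2 start before it is inspected, while the upfront prescan still sees 1:2 as not yet started:

/* Illustrative simulation of the boost decision; all names are hypothetical. */
#include <stdbool.h>
#include <stdio.h>

struct sim_fence {
	const char *name;
	bool started;	/* has the GPU begun executing this request? */
};

/* Model: while we block on fence i, every later request gets to start. */
static void sim_wait(struct sim_fence *f, int n, int i)
{
	for (int j = i; j < n; j++)
		f[j].started = true;
}

/* Old scheme: decide per fence, at the moment we wait on it. */
static int serial_boosts(struct sim_fence *f, int n)
{
	int boosts = 0;

	for (int i = 0; i < n; i++) {
		if (!f[i].started)
			boosts++;	/* here the kernel would upclock */
		sim_wait(f, n, i);	/* later fences are "started" by now */
	}
	return boosts;
}

/* New scheme: prescan the whole ensemble before any wait. */
static int prescan_boosts(const struct sim_fence *f, int n)
{
	int boosts = 0;

	for (int i = 0; i < n; i++)
		if (!f[i].started)
			boosts++;
	return boosts;
}

int main(void)
{
	/* 1:1 is already executing; 1:2 is still queued behind it. */
	struct sim_fence a[] = { { "1:1", true }, { "1:2", false } };
	struct sim_fence b[] = { { "1:1", true }, { "1:2", false } };

	printf("serial:  %d boost(s)\n", serial_boosts(a, 2));	/* prints 0 */
	printf("prescan: %d boost(s)\n", prescan_boosts(b, 2));	/* prints 1 */
	return 0;
}

The simulation reproduces the regression: the serial walk finds nothing to boost, whereas the prescan finds the queued request regardless of where it sits in the fence array.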
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index 319936f91ac5..e6e01c2a74a6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -9,6 +9,7 @@
 #include <linux/jiffies.h>
 
 #include "gt/intel_engine.h"
+#include "gt/intel_rps.h"
 
 #include "i915_gem_ioctls.h"
 #include "i915_gem_object.h"
@@ -31,6 +32,37 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
 			   timeout);
 }
 
+static void
+i915_gem_object_boost(struct dma_resv *resv, unsigned int flags)
+{
+	struct dma_resv_iter cursor;
+	struct dma_fence *fence;
+
+	/*
+	 * Prescan all fences for potential boosting before we begin waiting.
+	 *
+	 * When we wait, we wait on outstanding fences serially. If the
+	 * dma-resv contains a sequence such as 1:1, 1:2 instead of a reduced
+	 * form 1:2, then as we look at each wait in turn we see that each
+	 * request is currently executing and not worthy of boosting. But if
+	 * we only happen to look at the final fence in the sequence (because
+	 * of request coalescing or splitting between read/write arrays by
+	 * the iterator), then we would boost. As such our decision to boost
+	 * or not is delicately balanced on the order we wait on fences.
+	 *
+	 * So instead of looking for boosts sequentially, look for all boosts
+	 * upfront and then wait on the outstanding fences.
+	 */
+
+	dma_resv_iter_begin(&cursor, resv,
+			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
+	dma_resv_for_each_fence_unlocked(&cursor, fence)
+		if (dma_fence_is_i915(fence) &&
+		    !i915_request_started(to_request(fence)))
+			intel_rps_boost(to_request(fence));
+	dma_resv_iter_end(&cursor);
+}
+
 static long
 i915_gem_object_wait_reservation(struct dma_resv *resv,
 				 unsigned int flags,
@@ -40,6 +72,8 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
 	struct dma_fence *fence;
 	long ret = timeout ?: 1;
 
+	i915_gem_object_boost(resv, flags);
+
 	dma_resv_iter_begin(&cursor, resv,
 			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
 	dma_resv_for_each_fence_unlocked(&cursor, fence) {
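For reference, one path into i915_gem_object_wait_reservation(), and hence into the new prescan, is the GEM_WAIT ioctl. Below is a hedged user-space sketch using the uapi struct drm_i915_gem_wait; it assumes an already-open DRM fd and a valid GEM handle, the header path may differ between distributions, and error handling is trimmed:

/* Sketch only: fd and handle are assumed to exist and be valid. */
#include <stdint.h>
#include <sys/ioctl.h>
#include <libdrm/i915_drm.h>	/* header path is an assumption */

static int gem_wait(int fd, uint32_t handle, int64_t timeout_ns)
{
	struct drm_i915_gem_wait wait = {
		.bo_handle = handle,
		.timeout_ns = timeout_ns,	/* -1 waits indefinitely */
	};

	/* 0 once the object is idle; fails with ETIME if the timeout expires. */
	return ioctl(fd, DRM_IOCTL_I915_GEM_WAIT, &wait);
}

With the patch applied, a wait issued this way boosts every still-queued i915 request in the object's dma-resv up front, so the decision no longer depends on the order in which fences happen to be stored.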