Message ID | 1461240353-29576-3-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Apr 21, 2016 at 01:05:53PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> i915_gem_obj_to_vma is one of the most expensive functions in
> our profiles. Could avoiding some branching by replacing it
> with arithmetic be beneficial? Some benchmarks suggest it
> slightly might.

We can do much better by changing the algorithm here (here and higher up
the call chain).
-Chris
On 21/04/16 13:05, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> i915_gem_obj_to_vma is one of the most expensive functions in
> our profiles. Could avoiding some branching by replacing it
> with arithmetic be beneficial? Some benchmarks suggest it
> slightly might.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0549dea683e1..243bfb922eb3 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4642,11 +4642,21 @@ struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
>  				     struct i915_address_space *vm)
>  {
>  	struct i915_vma *vma;
> +
> +	BUILD_BUG_ON(I915_GGTT_VIEW_NORMAL != 0);
> +
>  	list_for_each_entry(vma, &obj->vma_list, obj_link) {
> -		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL &&
> -		    vma->vm == vm)
> +		/*
> +		 * Below is just a branching avoiding way of saying:
> +		 * vma_ggtt_view.type == I915_GGTT_VIEW_NORMAL && vma->vm == vm,
> +		 * which relies on the fact I915_GGTT_VIEW_NORMAL has to be
> +		 * zero.
> +		 */
> +		if (!((unsigned long)vma->ggtt_view.type |
> +		      ((unsigned long)vma->vm ^ (unsigned long)vm)))
>  			return vma;
>  	}
> +
>  	return NULL;
>  }

Other alternatives might include splitting the vma_list, so that we
have one list for the most-frequently searched-for entries (GGTT
view NORMAL) and for everything else, so the above would just need a
single test for equality.

Or, slightly less effectively, add GGTT/NORMAL entries at the head
of the list and others at the tail (and search backwards if you
*don't* want a GGTT/NORMAL entry). That would still need the
comparisons, but would likely hit an early match.

.Dave.
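To make the head/tail idea above concrete, here is a minimal userspace sketch. It is not driver code: `struct list_node`, `struct vma`, `struct object`, `object_add_vma()` and `object_find_normal()` are all hypothetical stand-ins, and the address space is reduced to an opaque pointer. It only illustrates that, with NORMAL views inserted at the head, a forward search still does both comparisons but tends to match on the first entry.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

/* Link 'node' between two neighbouring nodes. */
static void list_insert(struct list_node *node,
			struct list_node *prev, struct list_node *next)
{
	node->prev = prev;
	node->next = next;
	prev->next = node;
	next->prev = node;
}

/* Hypothetical view types; only the NORMAL/other distinction matters here. */
enum view_type { VIEW_NORMAL = 0, VIEW_ROTATED, VIEW_PARTIAL };

struct vma {
	struct list_node link;	/* kept first so a node pointer can be cast to a vma */
	enum view_type type;
	const void *vm;		/* opaque address space identity */
};

struct object {
	struct list_node vma_list;
};

/* NORMAL views go to the head of the list, everything else to the tail. */
void object_add_vma(struct object *obj, struct vma *vma)
{
	if (vma->type == VIEW_NORMAL)
		list_insert(&vma->link, &obj->vma_list, obj->vma_list.next);
	else
		list_insert(&vma->link, obj->vma_list.prev, &obj->vma_list);
}

/* Forward search with the same two comparisons; 'steps' counts visited entries. */
struct vma *object_find_normal(struct object *obj, const void *vm, int *steps)
{
	struct list_node *pos;

	*steps = 0;
	for (pos = obj->vma_list.next; pos != &obj->vma_list; pos = pos->next) {
		struct vma *vma = (struct vma *)pos;

		++*steps;
		if (vma->type == VIEW_NORMAL && vma->vm == vm)
			return vma;
	}
	return NULL;
}

/* Adds two non-NORMAL views, then a NORMAL one; returns entries visited to find it. */
int demo_steps_to_normal(void)
{
	static int ggtt;	/* dummy address space, identified by its address */
	struct object obj;
	struct vma rotated = { .type = VIEW_ROTATED, .vm = &ggtt };
	struct vma partial = { .type = VIEW_PARTIAL, .vm = &ggtt };
	struct vma normal = { .type = VIEW_NORMAL, .vm = &ggtt };
	int steps = 0;

	list_init(&obj.vma_list);
	object_add_vma(&obj, &rotated);
	object_add_vma(&obj, &partial);
	object_add_vma(&obj, &normal);	/* added last, but lands at the head */

	if (object_find_normal(&obj, &ggtt, &steps) != &normal)
		return -1;
	return steps;
}
```

In this sketch the NORMAL entry is found on the first visited node even though it was added last. A search for a non-NORMAL view would walk from the tail instead, as suggested above.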
On Tue, Apr 26, 2016 at 11:35:53AM +0100, Dave Gordon wrote:
> On 21/04/16 13:05, Tvrtko Ursulin wrote:
> >From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >
> >i915_gem_obj_to_vma is one of the most expensive functions in
> >our profiles. Could avoiding some branching by replacing it
> >with arithmetic be beneficial? Some benchmarks suggest it
> >slightly might.
> >
> >Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >---
> > drivers/gpu/drm/i915/i915_gem.c | 14 ++++++++++++--
> > 1 file changed, 12 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index 0549dea683e1..243bfb922eb3 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -4642,11 +4642,21 @@ struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
> > 				     struct i915_address_space *vm)
> > {
> > 	struct i915_vma *vma;
> >+
> >+	BUILD_BUG_ON(I915_GGTT_VIEW_NORMAL != 0);
> >+
> > 	list_for_each_entry(vma, &obj->vma_list, obj_link) {
> >-		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL &&
> >-		    vma->vm == vm)
> >+		/*
> >+		 * Below is just a branching avoiding way of saying:
> >+		 * vma_ggtt_view.type == I915_GGTT_VIEW_NORMAL && vma->vm == vm,
> >+		 * which relies on the fact I915_GGTT_VIEW_NORMAL has to be
> >+		 * zero.
> >+		 */
> >+		if (!((unsigned long)vma->ggtt_view.type |
> >+		      ((unsigned long)vma->vm ^ (unsigned long)vm)))
> > 			return vma;
> > 	}
> >+
> > 	return NULL;
> > }
>
> Other alternatives might include splitting the vma_list, so that we
> have one list for the most-frequently searched-for entries (GGTT
> view NORMAL) and for everything else, so the above would just need a
> single test for equality.
>
> Or, slightly less effectively, add GGTT/NORMAL entries at the head
> of the list and others at the tail (and search backwards if you
> *don't* want a GGTT/NORMAL entry). That would still need the
> comparisons, but would likely hit an early match.

We want one list for convenience elsewhere, but can keep a rht in
parallel. This is not as effective/important as keeping a hashtable to
translate from handle to vma, but is still useful for some stress cases.
-Chris
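As a rough illustration of keeping a hash table in parallel with the single list, here is a userspace sketch. It deliberately does not use the kernel's rhashtable API; the fixed-size bucket array, the hash function, and every name (`struct vma`, `struct object`, `obj_track_vma()`, `obj_lookup_vma()`) are assumptions made for the example, and removal and resizing are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VMA_HASH_BITS 4
#define VMA_HASH_SIZE (1u << VMA_HASH_BITS)

/* Hypothetical view types, standing in for the driver's GGTT view enum. */
enum view_type { VIEW_NORMAL = 0, VIEW_ROTATED, VIEW_PARTIAL };

struct vma {
	enum view_type type;
	const void *vm;		/* opaque address space identity */
	struct vma *list_next;	/* the one list other users can still walk */
	struct vma *hash_next;	/* chain within a single hash bucket */
};

struct object {
	struct vma *vma_list;
	struct vma *vma_hash[VMA_HASH_SIZE];
};

/* Illustrative hash of the (vm, view type) pair; not a tuned function. */
static unsigned int vma_hash_key(const void *vm, enum view_type type)
{
	uintptr_t h = (uintptr_t)vm >> 4;	/* drop low alignment bits */

	h ^= (uintptr_t)type * 0x9e3779b1u;
	h ^= h >> 16;
	return (unsigned int)(h & (VMA_HASH_SIZE - 1));
}

/* Every vma goes on the list and into the hash table at the same time. */
void obj_track_vma(struct object *obj, struct vma *vma)
{
	unsigned int bucket = vma_hash_key(vma->vm, vma->type);

	vma->list_next = obj->vma_list;
	obj->vma_list = vma;

	vma->hash_next = obj->vma_hash[bucket];
	obj->vma_hash[bucket] = vma;
}

/* Lookup only walks the entries that share a bucket with the key. */
struct vma *obj_lookup_vma(const struct object *obj, const void *vm,
			   enum view_type type)
{
	struct vma *vma;

	for (vma = obj->vma_hash[vma_hash_key(vm, type)]; vma; vma = vma->hash_next)
		if (vma->vm == vm && vma->type == type)
			return vma;
	return NULL;
}

/* Returns 0 if lookups find exactly the vmas that were tracked. */
int demo_hash_lookup(void)
{
	static int ggtt, ppgtt;	/* dummy address spaces, identified by address */
	struct object obj = { 0 };
	struct vma a = { .type = VIEW_NORMAL, .vm = &ggtt };
	struct vma b = { .type = VIEW_ROTATED, .vm = &ggtt };
	struct vma c = { .type = VIEW_NORMAL, .vm = &ppgtt };

	obj_track_vma(&obj, &a);
	obj_track_vma(&obj, &b);
	obj_track_vma(&obj, &c);

	if (obj_lookup_vma(&obj, &ggtt, VIEW_NORMAL) != &a)
		return 1;
	if (obj_lookup_vma(&obj, &ggtt, VIEW_ROTATED) != &b)
		return 2;
	if (obj_lookup_vma(&obj, &ppgtt, VIEW_NORMAL) != &c)
		return 3;
	if (obj_lookup_vma(&obj, &ppgtt, VIEW_PARTIAL) != NULL)
		return 4;
	return 0;
}
```

The list stays available for code that wants to iterate over every vma, while lookups by address space and view type no longer scale with the full list length.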
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0549dea683e1..243bfb922eb3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4642,11 +4642,21 @@ struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 				     struct i915_address_space *vm)
 {
 	struct i915_vma *vma;
+
+	BUILD_BUG_ON(I915_GGTT_VIEW_NORMAL != 0);
+
 	list_for_each_entry(vma, &obj->vma_list, obj_link) {
-		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL &&
-		    vma->vm == vm)
+		/*
+		 * Below is just a branching avoiding way of saying:
+		 * vma_ggtt_view.type == I915_GGTT_VIEW_NORMAL && vma->vm == vm,
+		 * which relies on the fact I915_GGTT_VIEW_NORMAL has to be
+		 * zero.
+		 */
+		if (!((unsigned long)vma->ggtt_view.type |
+		      ((unsigned long)vma->vm ^ (unsigned long)vm)))
 			return vma;
 	}
+
 	return NULL;
 }
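
The comment in the patch states that the arithmetic test is equivalent to the two comparisons as long as I915_GGTT_VIEW_NORMAL is zero. The userspace sketch below checks that claim on a small set of inputs. The types and names (`enum view_type`, `struct vm`, `struct vma`, `match_branchy()`, `match_arith()`, `count_mismatches()`) are hypothetical stand-ins, not the i915 definitions, `uintptr_t` replaces the kernel's `unsigned long` casts, and `_Static_assert` plays the role of `BUILD_BUG_ON`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver types. */
enum view_type { VIEW_NORMAL = 0, VIEW_ROTATED = 1, VIEW_PARTIAL = 2 };

struct vm {
	int id;
};

struct vma {
	enum view_type type;
	const struct vm *vm;
};

/* The arithmetic form is only valid while the NORMAL type is zero. */
_Static_assert(VIEW_NORMAL == 0, "arithmetic match needs VIEW_NORMAL == 0");

/* Original form: two comparisons joined by a short-circuit &&. */
static bool match_branchy(const struct vma *vma, const struct vm *vm)
{
	return vma->type == VIEW_NORMAL && vma->vm == vm;
}

/*
 * Patched form: the XOR is zero only when the pointers are equal, and the
 * OR is zero only when the type is also zero, so one test covers both.
 */
static bool match_arith(const struct vma *vma, const struct vm *vm)
{
	return !((uintptr_t)vma->type | ((uintptr_t)vma->vm ^ (uintptr_t)vm));
}

/* Compares both forms over every type/address-space combination below. */
int count_mismatches(void)
{
	static const struct vm spaces[2] = { { 0 }, { 1 } };
	const enum view_type types[3] = { VIEW_NORMAL, VIEW_ROTATED, VIEW_PARTIAL };
	int mismatches = 0;

	for (size_t t = 0; t < 3; t++) {
		for (size_t a = 0; a < 2; a++) {
			for (size_t b = 0; b < 2; b++) {
				struct vma vma = { types[t], &spaces[a] };

				if (match_branchy(&vma, &spaces[b]) !=
				    match_arith(&vma, &spaces[b]))
					mismatches++;
			}
		}
	}
	return mismatches;
}
```

This only demonstrates logical equivalence. Whether the compiler emits fewer branches, and whether that shows up in a profile, depends on the target and optimisation level and is not something the sketch measures.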