
[RFC,3/3] drm/i915: Micro-optimize i915_gem_obj_to_vma

Message ID 1461240353-29576-3-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Tvrtko Ursulin April 21, 2016, 12:05 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

i915_gem_obj_to_vma is one of the most expensive functions in
our profiles. Could replacing some of its branching with
arithmetic be beneficial? Some benchmarks suggest it might,
slightly.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Chris Wilson April 21, 2016, 12:17 p.m. UTC | #1
On Thu, Apr 21, 2016 at 01:05:53PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> i915_gem_obj_to_vma is one of the most expensive functions in
> our profiles. Could replacing some of its branching with
> arithmetic be beneficial? Some benchmarks suggest it might,
> slightly.

We can do much better by changing the algorithm, both here and higher
up the call chain.
-Chris
Dave Gordon April 26, 2016, 10:35 a.m. UTC | #2
On 21/04/16 13:05, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> i915_gem_obj_to_vma is one of the most expensive functions in
> our profiles. Could replacing some of its branching with
> arithmetic be beneficial? Some benchmarks suggest it might,
> slightly.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 14 ++++++++++++--
>   1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0549dea683e1..243bfb922eb3 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4642,11 +4642,21 @@ struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
>   				     struct i915_address_space *vm)
>   {
>   	struct i915_vma *vma;
> +
> +	BUILD_BUG_ON(I915_GGTT_VIEW_NORMAL != 0);
> +
>   	list_for_each_entry(vma, &obj->vma_list, obj_link) {
> -		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL &&
> -		    vma->vm == vm)
> +		/*
> +		 * Below is a branch-avoiding way of expressing:
> +		 * vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL && vma->vm == vm,
> +		 * which relies on the fact that I915_GGTT_VIEW_NORMAL is
> +		 * zero.
> +		 */
> +		if (!((unsigned long)vma->ggtt_view.type |
> +		    ((unsigned long)vma->vm ^ (unsigned long)vm)))
>   			return vma;
>   	}
> +
>   	return NULL;
>   }

Other alternatives might include splitting the vma_list so that we have 
one list for the most frequently searched-for entries (GGTT view NORMAL) 
and another for everything else; the above would then need just a single 
test for equality.
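
A minimal user-space sketch of the split-list idea, not i915 code: `struct obj`, `obj_add_vma` and `obj_find_normal_vma` are hypothetical names, and a plain singly linked list stands in for the kernel's `list_head`. Because the NORMAL list holds nothing else, the lookup compares only `vm`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the i915 types. */
enum view_type { VIEW_NORMAL = 0, VIEW_ROTATED, VIEW_PARTIAL };
struct addr_space { int id; };

struct vma {
	enum view_type type;
	struct addr_space *vm;
	struct vma *next;	/* link within whichever list holds this vma */
};

/* Hypothetical object with its vma list split in two. */
struct obj {
	struct vma *normal;	/* only VIEW_NORMAL vmas */
	struct vma *other;	/* every other view type */
};

/* Push the vma onto the list matching its view type. */
static void obj_add_vma(struct obj *o, struct vma *v)
{
	struct vma **head = v->type == VIEW_NORMAL ? &o->normal : &o->other;

	v->next = *head;
	*head = v;
}

/* Every entry on o->normal is already VIEW_NORMAL: one comparison per step. */
static struct vma *obj_find_normal_vma(struct obj *o,
				       const struct addr_space *vm)
{
	for (struct vma *v = o->normal; v; v = v->next)
		if (v->vm == vm)
			return v;
	return NULL;
}
```

The cost moves to insertion, which must pick the right list; lookups of non-NORMAL views would walk `other` instead.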

Or, slightly less effectively, add GGTT/NORMAL entries at the head of 
the list and others at the tail (and search backwards if you *don't* 
want a GGTT/NORMAL entry). That would still need the comparisons, but 
would likely hit an early match.
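
The head/tail ordering can be sketched in user space with a minimal circular doubly linked list modelled on the kernel's `list_head`. All names here (`struct link`, `add_vma`, `find_normal`, `find_view`) are hypothetical; both searches still perform the full comparison, as noted above, but start from the end where matches are clustered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the i915 types. */
enum view_type { VIEW_NORMAL = 0, VIEW_ROTATED, VIEW_PARTIAL };
struct addr_space { int id; };

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct link { struct link *prev, *next; };

struct vma {
	struct link link;	/* first member, so a struct link * converts back */
	enum view_type type;
	struct addr_space *vm;
};

static void list_init(struct link *head) { head->prev = head->next = head; }

static void link_between(struct link *n, struct link *prev, struct link *next)
{
	n->prev = prev;
	n->next = next;
	prev->next = n;
	next->prev = n;
}

/* NORMAL views go to the head, everything else to the tail. */
static void add_vma(struct link *head, struct vma *v)
{
	if (v->type == VIEW_NORMAL)
		link_between(&v->link, head, head->next);
	else
		link_between(&v->link, head->prev, head);
}

/* Forward search: NORMAL entries are clustered at the head. */
static struct vma *find_normal(struct link *head, const struct addr_space *vm)
{
	for (struct link *l = head->next; l != head; l = l->next) {
		struct vma *v = (struct vma *)l;
		if (v->type == VIEW_NORMAL && v->vm == vm)
			return v;
	}
	return NULL;
}

/* Backward search: non-NORMAL entries are clustered at the tail. */
static struct vma *find_view(struct link *head, const struct addr_space *vm,
			     enum view_type type)
{
	for (struct link *l = head->prev; l != head; l = l->prev) {
		struct vma *v = (struct vma *)l;
		if (v->type == type && v->vm == vm)
			return v;
	}
	return NULL;
}
```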

.Dave.
Chris Wilson April 26, 2016, 10:45 a.m. UTC | #3
On Tue, Apr 26, 2016 at 11:35:53AM +0100, Dave Gordon wrote:
> On 21/04/16 13:05, Tvrtko Ursulin wrote:
> >From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >
> >i915_gem_obj_to_vma is one of the most expensive functions in
> >our profiles. Could replacing some of its branching with
> >arithmetic be beneficial? Some benchmarks suggest it might,
> >slightly.
> >
> >Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >---
> >  drivers/gpu/drm/i915/i915_gem.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index 0549dea683e1..243bfb922eb3 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -4642,11 +4642,21 @@ struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
> >  				     struct i915_address_space *vm)
> >  {
> >  	struct i915_vma *vma;
> >+
> >+	BUILD_BUG_ON(I915_GGTT_VIEW_NORMAL != 0);
> >+
> >  	list_for_each_entry(vma, &obj->vma_list, obj_link) {
> >-		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL &&
> >-		    vma->vm == vm)
> >+		/*
> >+		 * Below is a branch-avoiding way of expressing:
> >+		 * vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL && vma->vm == vm,
> >+		 * which relies on the fact that I915_GGTT_VIEW_NORMAL is
> >+		 * zero.
> >+		 */
> >+		if (!((unsigned long)vma->ggtt_view.type |
> >+		    ((unsigned long)vma->vm ^ (unsigned long)vm)))
> >  			return vma;
> >  	}
> >+
> >  	return NULL;
> >  }
> 
> Other alternatives might include splitting the vma_list so that we
> have one list for the most frequently searched-for entries (GGTT
> view NORMAL) and another for everything else; the above would then
> need just a single test for equality.
> 
> Or, slightly less effectively, add GGTT/NORMAL entries at the head
> of the list and others at the tail (and search backwards if you
> *don't* want a GGTT/NORMAL entry). That would still need the
> comparisons, but would likely hit an early match.

We want one list for convenience elsewhere, but can keep an rhashtable
(rht) in parallel. This is not as effective/important as keeping a
hashtable to translate from handle to vma, but is still useful for some
stress cases.
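
The kernel's rhashtable is a resizable, RCU-aware hash table with its own API, which this sketch does not use. As a user-space stand-in, a fixed-size chained table kept in parallel with the single list shows the shape of the idea; `VMA_HT_BUCKETS`, `vm_hash`, `obj_add_normal_vma` and `obj_lookup_vma` are hypothetical, and only NORMAL-view vmas are assumed to be indexed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VMA_HT_BUCKETS 64	/* hypothetical fixed size; rhashtable resizes itself */

struct addr_space { int id; };

struct vma {
	struct addr_space *vm;
	struct vma *list_next;	/* the single list kept for convenience */
	struct vma *hash_next;	/* chain within a hash bucket */
};

struct obj {
	struct vma *vma_list;
	struct vma *vma_ht[VMA_HT_BUCKETS];	/* NORMAL-view vmas keyed by vm */
};

/* Hash the vm pointer: drop low alignment bits, fold into the bucket range. */
static size_t vm_hash(const struct addr_space *vm)
{
	uintptr_t p = (uintptr_t)vm;

	return (size_t)((p >> 4) ^ (p >> 12)) % VMA_HT_BUCKETS;
}

static void obj_add_normal_vma(struct obj *o, struct vma *v)
{
	size_t b = vm_hash(v->vm);

	v->list_next = o->vma_list;	/* keep the existing list intact */
	o->vma_list = v;
	v->hash_next = o->vma_ht[b];	/* and index the vma in parallel */
	o->vma_ht[b] = v;
}

/* Only the bucket for vm is walked instead of the whole vma list. */
static struct vma *obj_lookup_vma(struct obj *o, const struct addr_space *vm)
{
	for (struct vma *v = o->vma_ht[vm_hash(vm)]; v; v = v->hash_next)
		if (v->vm == vm)
			return v;
	return NULL;
}
```

Every insertion (and, in a full version, every removal) has to update both structures; the payoff is that lookups no longer scale with the length of the vma list.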
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0549dea683e1..243bfb922eb3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4642,11 +4642,21 @@  struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 				     struct i915_address_space *vm)
 {
 	struct i915_vma *vma;
+
+	BUILD_BUG_ON(I915_GGTT_VIEW_NORMAL != 0);
+
 	list_for_each_entry(vma, &obj->vma_list, obj_link) {
-		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL &&
-		    vma->vm == vm)
+		/*
+		 * Below is a branch-avoiding way of expressing:
+		 * vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL && vma->vm == vm,
+		 * which relies on the fact that I915_GGTT_VIEW_NORMAL is
+		 * zero.
+		 */
+		if (!((unsigned long)vma->ggtt_view.type |
+		    ((unsigned long)vma->vm ^ (unsigned long)vm)))
 			return vma;
 	}
+
 	return NULL;
 }
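
The equivalence the patch relies on can be checked outside the kernel. This is a user-space sketch, not i915 code: `enum view_type`, `struct addr_space` and `struct vma` are hypothetical stand-ins carrying only the fields the predicate reads, and `_Static_assert` plays the role of `BUILD_BUG_ON`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the i915 types. */
enum view_type { VIEW_NORMAL = 0, VIEW_ROTATED, VIEW_PARTIAL };
struct addr_space { int id; };
struct vma {
	enum view_type type;
	struct addr_space *vm;
};

/* User-space equivalent of BUILD_BUG_ON(I915_GGTT_VIEW_NORMAL != 0). */
_Static_assert(VIEW_NORMAL == 0, "branch-free test needs VIEW_NORMAL == 0");

/* Original predicate: two comparisons joined by a short-circuit &&. */
static bool match_branchy(const struct vma *v, const struct addr_space *vm)
{
	return v->type == VIEW_NORMAL && v->vm == vm;
}

/*
 * Patch's predicate: type | (vma->vm XOR vm) is zero only when both
 * operands are zero, i.e. type == VIEW_NORMAL and the pointers match.
 * The patch casts to unsigned long, which the kernel treats as
 * pointer-sized; uintptr_t is the portable user-space equivalent.
 */
static bool match_arith(const struct vma *v, const struct addr_space *vm)
{
	return !((uintptr_t)v->type | ((uintptr_t)v->vm ^ (uintptr_t)vm));
}
```

Whether the arithmetic form is actually faster depends on the compiler and CPU: the surrounding `if` still needs a conditional branch, so the change at most replaces the two tests of the `&&` with a single one.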