diff mbox

[CI] drm/i915: Skip waking the device to service pwrite

Message ID 20171019063733.31620-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson Oct. 19, 2017, 6:37 a.m. UTC
If the device is in runtime suspend, resuming takes time and reduces our
powersaving. If this was for a small write into an object, that resume
will take longer than any savings in using the indirect GGTT access to
avoid the cpu cache.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

Comments

Daniel Vetter Oct. 19, 2017, 7:26 a.m. UTC | #1
On Thu, Oct 19, 2017 at 07:37:33AM +0100, Chris Wilson wrote:
> If the device is in runtime suspend, resuming takes time and reduces our
> powersaving. If this was for a small write into an object, that resume
> will take longer than any savings in using the indirect GGTT access to
> avoid the cpu cache.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Still sounds like a reasonable idea.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d699ea3ab80b..026cb52ece0b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1240,7 +1240,23 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
>  	if (ret)
>  		return ret;
>  
> -	intel_runtime_pm_get(i915);
> +	if (i915_gem_object_has_struct_page(obj)) {
> +		/*
> +		 * Avoid waking the device up if we can fallback, as
> +		 * waking/resuming is very slow (worst-case 10-100 ms
> +		 * depending on PCI sleeps and our own resume time).
> +		 * This easily dwarfs any performance advantage from
> +		 * using the cache bypass of indirect GGTT access.
> +		 */
> +		if (!intel_runtime_pm_get_if_in_use(i915)) {
> +			ret = -EFAULT;
> +			goto out_unlock;
> +		}
> +	} else {
> +		/* No backing pages, no fallback, we must force GGTT access */
> +		intel_runtime_pm_get(i915);
> +	}
> +
>  	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
>  				       PIN_MAPPABLE |
>  				       PIN_NONFAULT |
> @@ -1257,7 +1273,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
>  	if (IS_ERR(vma)) {
>  		ret = insert_mappable_node(ggtt, &node, PAGE_SIZE);
>  		if (ret)
> -			goto out_unlock;
> +			goto out_rpm;
>  		GEM_BUG_ON(!node.allocated);
>  	}
>  
> @@ -1320,8 +1336,9 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
>  	} else {
>  		i915_vma_unpin(vma);
>  	}
> -out_unlock:
> +out_rpm:
>  	intel_runtime_pm_put(i915);
> +out_unlock:
>  	mutex_unlock(&i915->drm.struct_mutex);
>  	return ret;
>  }
> -- 
> 2.15.0.rc1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Tvrtko Ursulin Oct. 19, 2017, 10:02 a.m. UTC | #2
On 19/10/2017 07:37, Chris Wilson wrote:
> If the device is in runtime suspend, resuming takes time and reduces our
> powersaving. If this was for a small write into an object, that resume
> will take longer than any savings in using the indirect GGTT access to
> avoid the cpu cache.

Commit talks about small writes but the patch takes no notice to size of 
requested writes. Is that intended?

Regards,

Tvrtko

> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 23 ++++++++++++++++++++---
>   1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d699ea3ab80b..026cb52ece0b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1240,7 +1240,23 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
>   	if (ret)
>   		return ret;
>   
> -	intel_runtime_pm_get(i915);
> +	if (i915_gem_object_has_struct_page(obj)) {
> +		/*
> +		 * Avoid waking the device up if we can fallback, as
> +		 * waking/resuming is very slow (worst-case 10-100 ms
> +		 * depending on PCI sleeps and our own resume time).
> +		 * This easily dwarfs any performance advantage from
> +		 * using the cache bypass of indirect GGTT access.
> +		 */
> +		if (!intel_runtime_pm_get_if_in_use(i915)) {
> +			ret = -EFAULT;
> +			goto out_unlock;
> +		}
> +	} else {
> +		/* No backing pages, no fallback, we must force GGTT access */
> +		intel_runtime_pm_get(i915);
> +	}
> +
>   	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
>   				       PIN_MAPPABLE |
>   				       PIN_NONFAULT |
> @@ -1257,7 +1273,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
>   	if (IS_ERR(vma)) {
>   		ret = insert_mappable_node(ggtt, &node, PAGE_SIZE);
>   		if (ret)
> -			goto out_unlock;
> +			goto out_rpm;
>   		GEM_BUG_ON(!node.allocated);
>   	}
>   
> @@ -1320,8 +1336,9 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
>   	} else {
>   		i915_vma_unpin(vma);
>   	}
> -out_unlock:
> +out_rpm:
>   	intel_runtime_pm_put(i915);
> +out_unlock:
>   	mutex_unlock(&i915->drm.struct_mutex);
>   	return ret;
>   }
>
Chris Wilson Oct. 19, 2017, 10:19 a.m. UTC | #3
Quoting Tvrtko Ursulin (2017-10-19 11:02:05)
> 
> On 19/10/2017 07:37, Chris Wilson wrote:
> > If the device is in runtime suspend, resuming takes time and reduces our
> > powersaving. If this was for a small write into an object, that resume
> > will take longer than any savings in using the indirect GGTT access to
> > avoid the cpu cache.
> 
> Commit talks about small writes but the patch takes no notice to size of 
> requested writes. Is that intended?

We are talking gigabytes before the difference in paths outweigh the
worstcase costs, even before we start measuring power. pwrite is a tricky
one, for anything big and tiled or for anything where you may repeat the
read/write/mmap, you don't want to use pwrite but the direct mmap. pwrite
should only wins for small one-off uses, so that's what I have in mind
as the typical user -- and I encourage them to re-evaluate their use.

So why bother? Because it made a big difference to igt pwrite runtime
when we didn't have a display connector. Is the addition in complexity
worth it for the few real users? Should we be more aware of the wider
implications (both power and performance) of certain operations within
the driver? Do we need more mostly idle and/or headless testing?
-Chris
Tvrtko Ursulin Oct. 19, 2017, 11:08 a.m. UTC | #4
On 19/10/2017 11:19, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-10-19 11:02:05)
>>
>> On 19/10/2017 07:37, Chris Wilson wrote:
>>> If the device is in runtime suspend, resuming takes time and reduces our
>>> powersaving. If this was for a small write into an object, that resume
>>> will take longer than any savings in using the indirect GGTT access to
>>> avoid the cpu cache.
>>
>> Commit talks about small writes but the patch takes no notice to size of
>> requested writes. Is that intended?
> 
> We are talking gigabytes before the difference in paths outweigh the
> worstcase costs, even before we start measuring power. pwrite is a tricky

No real complaints, just read the commit and expected to see something 
about the size in the patch itself.

> one, for anything big and tiled or for anything where you may repeat the
> read/write/mmap, you don't want to use pwrite but the direct mmap. pwrite
> should only wins for small one-off uses, so that's what I have in mind
> as the typical user -- and I encourage them to re-evaluate their use.
> 
> So why bother? Because it made a big difference to igt pwrite runtime
> when we didn't have a display connector. Is the addition in complexity
> worth it for the few real users? Should we be more aware of the wider
> implications (both power and performance) of certain operations within
> the driver? Do we need more mostly idle and/or headless testing?

Regards,

Tvrtko
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d699ea3ab80b..026cb52ece0b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1240,7 +1240,23 @@  i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	intel_runtime_pm_get(i915);
+	if (i915_gem_object_has_struct_page(obj)) {
+		/*
+		 * Avoid waking the device up if we can fallback, as
+		 * waking/resuming is very slow (worst-case 10-100 ms
+		 * depending on PCI sleeps and our own resume time).
+		 * This easily dwarfs any performance advantage from
+		 * using the cache bypass of indirect GGTT access.
+		 */
+		if (!intel_runtime_pm_get_if_in_use(i915)) {
+			ret = -EFAULT;
+			goto out_unlock;
+		}
+	} else {
+		/* No backing pages, no fallback, we must force GGTT access */
+		intel_runtime_pm_get(i915);
+	}
+
 	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
 				       PIN_MAPPABLE |
 				       PIN_NONFAULT |
@@ -1257,7 +1273,7 @@  i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
 	if (IS_ERR(vma)) {
 		ret = insert_mappable_node(ggtt, &node, PAGE_SIZE);
 		if (ret)
-			goto out_unlock;
+			goto out_rpm;
 		GEM_BUG_ON(!node.allocated);
 	}
 
@@ -1320,8 +1336,9 @@  i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
 	} else {
 		i915_vma_unpin(vma);
 	}
-out_unlock:
+out_rpm:
 	intel_runtime_pm_put(i915);
+out_unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 	return ret;
 }