diff mbox

drm/i915: Don't wait for vblank for sprite plane flips

Message ID 1372428931-24144-1-git-send-email-vijay.a.purushothaman@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Vijay Purushothaman June 28, 2013, 2:15 p.m. UTC
Since the sprite planes are using synchronized MMIO based flip, no need
to wait for vblank. Removing this wait allows us to get a nice
performance boost to both 3D & media workloads based on sprite (~60 fps
from ~20 fps)

Signed-off-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
Signed-off-by: Gary Smith <gary.k.smith@intel.com>
---
 drivers/gpu/drm/i915/intel_sprite.c |   14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

Comments

Kumar, Shobhit June 28, 2013, 2:22 p.m. UTC | #1
> -----Original Message-----
> From: intel-gfx-bounces+shobhit.kumar=intel.com@lists.freedesktop.org
> [mailto:intel-gfx-bounces+shobhit.kumar=intel.com@lists.freedesktop.org]
> On Behalf Of Vijay Purushothaman
> Sent: Friday, June 28, 2013 7:46 PM
> To: Intel Graphics
> Subject: [Intel-gfx] [PATCH] drm/i915: Don't wait for vblank for sprite plane
> flips
> 
> Since the sprite planes are using synchronized MMIO based flip, no need to
> wait for vblank. Removing this wait allows us to get a nice performance boost
> to both 3D & media workloads based on sprite (~60 fps from ~20 fps)
> 
> Signed-off-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
> Signed-off-by: Gary Smith <gary.k.smith@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_sprite.c |   14 +-------------
>  1 file changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_sprite.c
> b/drivers/gpu/drm/i915/intel_sprite.c
> index 1fa5612..1d14fc0 100644
> --- a/drivers/gpu/drm/i915/intel_sprite.c
> +++ b/drivers/gpu/drm/i915/intel_sprite.c
> @@ -828,20 +828,8 @@ intel_update_plane(struct drm_plane *plane, struct
> drm_crtc *crtc,
>  		intel_disable_primary(crtc);
> 
>  	/* Unpin old obj after new one is active to avoid ugliness */
> -	if (old_obj) {
> -		/*
> -		 * It's fairly common to simply update the position of
> -		 * an existing object.  In that case, we don't need to
> -		 * wait for vblank to avoid ugliness, we only need to
> -		 * do the pin & ref bookkeeping.
> -		 */
> -		if (old_obj != obj) {
> -			mutex_unlock(&dev->struct_mutex);
> -			intel_wait_for_vblank(dev, to_intel_crtc(crtc)-
> >pipe);
> -			mutex_lock(&dev->struct_mutex);
> -		}
> +	if (old_obj)
>  		intel_unpin_fb_obj(old_obj);
> -	}
> 
>  out_unlock:
>  	mutex_unlock(&dev->struct_mutex);
> --
> 1.7.9.5

Tested on VLV. Works fine
Tested-by: Shobhit Kumar <Shobhit.kumar@intel.com>
Ville Syrjälä June 28, 2013, 2:24 p.m. UTC | #2
On Fri, Jun 28, 2013 at 07:45:31PM +0530, Vijay Purushothaman wrote:
> Since the sprite planes are using synchronized MMIO based flip, no need
> to wait for vblank. Removing this wait allows us to get a nice
> performance boost to both 3D & media workloads based on sprite (~60 fps
> from ~20 fps)

Nak. We can't unpin the buffer until the hardware has finished reading
from it.

The proper fix is to do the unpin asynchronously after the flip has
completed. That's one part of the bigger atomic pageflip story.

> 
> Signed-off-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
> Signed-off-by: Gary Smith <gary.k.smith@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_sprite.c |   14 +-------------
>  1 file changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
> index 1fa5612..1d14fc0 100644
> --- a/drivers/gpu/drm/i915/intel_sprite.c
> +++ b/drivers/gpu/drm/i915/intel_sprite.c
> @@ -828,20 +828,8 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
>  		intel_disable_primary(crtc);
>  
>  	/* Unpin old obj after new one is active to avoid ugliness */
> -	if (old_obj) {
> -		/*
> -		 * It's fairly common to simply update the position of
> -		 * an existing object.  In that case, we don't need to
> -		 * wait for vblank to avoid ugliness, we only need to
> -		 * do the pin & ref bookkeeping.
> -		 */
> -		if (old_obj != obj) {
> -			mutex_unlock(&dev->struct_mutex);
> -			intel_wait_for_vblank(dev, to_intel_crtc(crtc)->pipe);
> -			mutex_lock(&dev->struct_mutex);
> -		}
> +	if (old_obj)
>  		intel_unpin_fb_obj(old_obj);
> -	}
>  
>  out_unlock:
>  	mutex_unlock(&dev->struct_mutex);
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson June 28, 2013, 4:05 p.m. UTC | #3
On Fri, Jun 28, 2013 at 05:24:50PM +0300, Ville Syrjälä wrote:
> On Fri, Jun 28, 2013 at 07:45:31PM +0530, Vijay Purushothaman wrote:
> > Since the sprite planes are using synchronized MMIO based flip, no need
> > to wait for vblank. Removing this wait allows us to get a nice
> > performance boost to both 3D & media workloads based on sprite (~60 fps
> > from ~20 fps)
> 
> Nak. We can't unpin the buffer until the hardware has finished reading
> from it.
> 
> The proper fix is to do the unpin asynchronously after the flip has
> completed. That's one part of the bigger atomic pageflip story.

The interested reader is invited to review the patches to do async
unpinning here and in set-base sent many, many moons ago.
-Chris
Vijay Purushothaman July 1, 2013, 5:26 a.m. UTC | #4
On 6/28/2013 7:54 PM, Ville Syrjälä wrote:
> On Fri, Jun 28, 2013 at 07:45:31PM +0530, Vijay Purushothaman wrote:
>> Since the sprite planes are using synchronized MMIO based flip, no need
>> to wait for vblank. Removing this wait allows us to get a nice
>> performance boost to both 3D & media workloads based on sprite (~60 fps
>> from ~20 fps)
>
> Nak. We can't unpin the buffer until the hardware has finished reading
> from it.

Thanks much for the review feedback. We have removed this check in our 
android production branch so far we have not seen any side effect or 
artifact. Is this a conservative check or do we have any use case which 
will fail without this piece of code?

Apparently the windows driver team have been using MMIO flips for Sprite 
and the windows driver also didn't wait for vblank for such flips all along.

Could you please help me understand this a bit better? This wait reduces 
the perf to ~20 fps and thus prevent us from using sprite for any OGL 
layer mapping in hwcomposer and we are losing significant amount of 
power. For video content playback the problem is not that bad.

>
> The proper fix is to do the unpin asynchronously after the flip has
> completed. That's one part of the bigger atomic pageflip story.
>

Any idea when are we planning to merge this atomic flip in d-i-n-q?

Thanks,
Vijay


>>
>> Signed-off-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
>> Signed-off-by: Gary Smith <gary.k.smith@intel.com>
>> ---
>>   drivers/gpu/drm/i915/intel_sprite.c |   14 +-------------
>>   1 file changed, 1 insertion(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
>> index 1fa5612..1d14fc0 100644
>> --- a/drivers/gpu/drm/i915/intel_sprite.c
>> +++ b/drivers/gpu/drm/i915/intel_sprite.c
>> @@ -828,20 +828,8 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
>>   		intel_disable_primary(crtc);
>>
>>   	/* Unpin old obj after new one is active to avoid ugliness */
>> -	if (old_obj) {
>> -		/*
>> -		 * It's fairly common to simply update the position of
>> -		 * an existing object.  In that case, we don't need to
>> -		 * wait for vblank to avoid ugliness, we only need to
>> -		 * do the pin & ref bookkeeping.
>> -		 */
>> -		if (old_obj != obj) {
>> -			mutex_unlock(&dev->struct_mutex);
>> -			intel_wait_for_vblank(dev, to_intel_crtc(crtc)->pipe);
>> -			mutex_lock(&dev->struct_mutex);
>> -		}
>> +	if (old_obj)
>>   		intel_unpin_fb_obj(old_obj);
>> -	}
>>
>>   out_unlock:
>>   	mutex_unlock(&dev->struct_mutex);
>> --
>> 1.7.9.5
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
Vijay Purushothaman July 1, 2013, 5:28 a.m. UTC | #5
On 6/28/2013 9:35 PM, Chris Wilson wrote:
> On Fri, Jun 28, 2013 at 05:24:50PM +0300, Ville Syrjälä wrote:
>> On Fri, Jun 28, 2013 at 07:45:31PM +0530, Vijay Purushothaman wrote:
>>> Since the sprite planes are using synchronized MMIO based flip, no need
>>> to wait for vblank. Removing this wait allows us to get a nice
>>> performance boost to both 3D & media workloads based on sprite (~60 fps
>>> from ~20 fps)
>>
>> Nak. We can't unpin the buffer until the hardware has finished reading
>> from it.
>>
>> The proper fix is to do the unpin asynchronously after the flip has
>> completed. That's one part of the bigger atomic pageflip story.
>
> The interested reader is invited to review the patches to do async
> unpinning here and in set-base sent many, many moons ago.
> -Chris
>
Thanks Chris. I will try to dig for those patches. In case if you have 
the pointers handy, would you mind pointing me to the patch series?

Thanks,
Vijay
Ville Syrjälä July 1, 2013, 8:20 a.m. UTC | #6
On Mon, Jul 01, 2013 at 10:56:06AM +0530, Vijay Purushothaman wrote:
> On 6/28/2013 7:54 PM, Ville Syrjälä wrote:
> > On Fri, Jun 28, 2013 at 07:45:31PM +0530, Vijay Purushothaman wrote:
> >> Since the sprite planes are using synchronized MMIO based flip, no need
> >> to wait for vblank. Removing this wait allows us to get a nice
> >> performance boost to both 3D & media workloads based on sprite (~60 fps
> >> from ~20 fps)
> >
> > Nak. We can't unpin the buffer until the hardware has finished reading
> > from it.
> 
> Thanks much for the review feedback. We have removed this check in our 
> android production branch so far we have not seen any side effect or 
> artifact. Is this a conservative check or do we have any use case which 
> will fail without this piece of code?
> 
> Apparently the windows driver team have been using MMIO flips for Sprite 
> and the windows driver also didn't wait for vblank for such flips all along.
> 
> Could you please help me understand this a bit better? This wait reduces 
> the perf to ~20 fps and thus prevent us from using sprite for any OGL 
> layer mapping in hwcomposer and we are losing significant amount of 
> power. For video content playback the problem is not that bad.

I'd say trying to implement hwcomposer w/o atomic pageflip is anyway a
doomed idea. You will see ugly glitches if/when the sprite turns on/off
too late/soon. I actually tried it myself a few years ago, and then
realized that it just won't cut it, which is why I started writing
some of the early atomic page flip code.

> > The proper fix is to do the unpin asynchronously after the flip has
> > completed. That's one part of the bigger atomic pageflip story.
> >
> 
> Any idea when are we planning to merge this atomic flip in d-i-n-q?

After someone has time to rewrite a bunch of the code.

My rough plan:
- Fix watermarks. I've been working on it, and I hope to have something
  reasonable ready before my summer vacation starts after this week.
  Then there's still the pre-pch watermark code to fix.
- Implement drm_plane support for primary planes. I posted some gen2-4
  sprite C code a while back which should serve as a starting point. I
  still haven't quite figured out how we're going to handle compatibility
  w/ old userland though. We may not want to expose these planes unless
  we know userland can handle them. I want to do this before we merge
  any new atomic page flip ioctl, so that we don't have to carry the
  legacy baggage in the new API.
- Cook up some plane_config type of stuff to pre-compute the plane state
- Plug in the real atomic stuff

Another big issues is the locking. We really want to get fine grained
locking for the atomic page flip too, so someone has to figure out how
to do it. Daniel's idea of using the ww_mutex stuff for this sounds like
a reasonable approach to me, but I haven't really thought about it that
much yet. Currently even the non-atomic plane ioctls take the big kms
lock, which plainly sucks.

There are a bunch of API issues left as well, like defining new
properties, figuring out the event stuff (I know some people would
prefer a single even for the whole crtc, which could be done, except
we have to carefully define how it behaves in certain special cases),
and probably some other things I forgot already.

One other interesting idea would be having some swap interval or swap
target timestamp stuff in the kernel, so that userspace could even
schedule multiple atomic updates in advance. This may turn out rather
problematic when performing operations on multiple crtcs, since there
could be several different states in the pipeline for one crtc, so the
state of another crtc would possibly need to be checked against all of
them, so that we can be sure whether the operation will succeed
regardless of which order they end up being pushed to the hardware.
Maybe we'd just limit such heavy pipelining to simple page flip
scenarios, which could neatly avoid some of the problems. But work on
this stuff can easily wait until the basic atomic stuff is in.
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
index 1fa5612..1d14fc0 100644
--- a/drivers/gpu/drm/i915/intel_sprite.c
+++ b/drivers/gpu/drm/i915/intel_sprite.c
@@ -828,20 +828,8 @@  intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
 		intel_disable_primary(crtc);
 
 	/* Unpin old obj after new one is active to avoid ugliness */
-	if (old_obj) {
-		/*
-		 * It's fairly common to simply update the position of
-		 * an existing object.  In that case, we don't need to
-		 * wait for vblank to avoid ugliness, we only need to
-		 * do the pin & ref bookkeeping.
-		 */
-		if (old_obj != obj) {
-			mutex_unlock(&dev->struct_mutex);
-			intel_wait_for_vblank(dev, to_intel_crtc(crtc)->pipe);
-			mutex_lock(&dev->struct_mutex);
-		}
+	if (old_obj)
 		intel_unpin_fb_obj(old_obj);
-	}
 
 out_unlock:
 	mutex_unlock(&dev->struct_mutex);