[PATCHv2] drm/radeon: Move pageflip request from vblank IRQ to ioctl

Message ID 1309975448-24373-1-git-send-email-simon.farnsworth@onelan.co.uk (mailing list archive)
State New, archived

Commit Message

Simon Farnsworth July 6, 2011, 6:04 p.m. UTC
The radeon pageflip ioctl handler delayed submitting the pageflip to
hardware until the vblank IRQ handler. On AMD Fusion (PALM GPU, G-T56N
CPU), when using a reduced blanking CVT mode, a pageflip submitted to
hardware in the IRQ handler failed to complete before the end of the
vblank, resulting in a guaranteed halving of frame rate despite having
plenty of spare CPU and GPU resource.

Fix this by moving the pageflip request to hardware into the pageflip
ioctl, waiting until we are outside a vblank period so that pageflip
timing is still accurate.

This doubles my frame rate in reduced blanking modes, and does not
have an impact on CPU usage in normal blanking modes.

Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
---
Changes from v1:

 * Replace the fence with a radeon_bo_wait on the new
   frontbuffer. Discussions on #radeon suggest that this should be
   good enough to wait for rendering to complete, while not stalling
   other GPU users.

 * Change from msleep(), which can take 20ms on HZ=100 systems, to
   usleep_range, and choose limits that are reasonable for 50Hz or
   higher displays.
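
The 20ms figure follows from how msleep() rounds its timeout to whole
jiffies. A minimal userspace sketch of that arithmetic, assuming msleep()
computes its timeout as msecs_to_jiffies(msecs) + 1; the helper name is
hypothetical and not a kernel function:

```c
#include <assert.h>

/* Hypothetical helper, not a kernel API: upper bound in milliseconds on how
 * long msleep(msecs) may sleep, assuming the timeout is
 * msecs_to_jiffies(msecs) + 1 jiffies and one jiffy lasts 1000/HZ ms. */
static unsigned int msleep_worst_case_ms(unsigned int msecs, unsigned int hz)
{
	unsigned int ticks;

	/* Keep the integer division below meaningful. */
	assert(hz > 0 && hz <= 1000);

	/* Round the request up to whole jiffies. */
	ticks = (msecs * hz + 999) / 1000;

	/* One extra jiffy so the sleep is never shorter than requested. */
	return (ticks + 1) * (1000 / hz);
}
```

Under these assumptions msleep_worst_case_ms(1, 100) evaluates to 20,
while the same request with HZ=1000 evaluates to 2.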

 drivers/gpu/drm/radeon/radeon.h         |    2 -
 drivers/gpu/drm/radeon/radeon_display.c |   74 +++++++------------------------
 drivers/gpu/drm/radeon/radeon_mode.h    |    1 -
 3 files changed, 16 insertions(+), 61 deletions(-)

Comments

Paul Menzel July 6, 2011, 8:54 p.m. UTC | #1
On Wednesday, 06.07.2011 at 19:04 +0100, Simon Farnsworth wrote:
> The radeon pageflip ioctl handler delayed submitting the pageflip to
> hardware until the vblank IRQ handler. On AMD Fusion (PALM GPU, G-T56N
> CPU), when using a reduced blanking CVT mode, a pageflip submitted to
> hardware in the IRQ handler failed to complete before the end of the
> vblank, resulting in a guaranteed halving of frame rate despite having
> plenty of spare CPU and GPU resource.
> 
> Fix this by moving the pageflip request to hardware into the pageflip
> ioctl, waiting until we are outside a vblank period so that pageflip
> timing is still accurate.
> 
> This doubles my frame rate in reduced blanking modes, and does not
> have an impact on CPU usage in normal blanking modes.
> 
> Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>

Please mention the bug number the commit closes/addresses in the commit
message.

[…]


Thanks,

Paul

Michel Dänzer July 7, 2011, 7:46 a.m. UTC | #2
On Wed, 2011-07-06 at 19:04 +0100, Simon Farnsworth wrote:
> The radeon pageflip ioctl handler delayed submitting the pageflip to
> hardware until the vblank IRQ handler. On AMD Fusion (PALM GPU, G-T56N
> CPU), when using a reduced blanking CVT mode, a pageflip submitted to
> hardware in the IRQ handler failed to complete before the end of the
> vblank, resulting in a guaranteed halving of frame rate despite having
> plenty of spare CPU and GPU resource.
> 
> Fix this by moving the pageflip request to hardware into the pageflip
> ioctl, waiting until we are outside a vblank period so that pageflip
> timing is still accurate.
> 
> This doubles my frame rate in reduced blanking modes, and does not
> have an impact on CPU usage in normal blanking modes.
> 
> Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
> ---
> Changes from v1:
> 
>  * Replace the fence with a radeon_bo_wait on the new
>    frontbuffer. Discussions on #radeon suggest that this should be
>    good enough to wait for rendering to complete, while not stalling
>    other GPU users.
> 
>  * Change from msleep(), which can take 20ms on HZ=100 systems, to
>    usleep_range, and choose limits that are reasonable for 50Hz or
>    higher displays.
> 
>  drivers/gpu/drm/radeon/radeon.h         |    2 -
>  drivers/gpu/drm/radeon/radeon_display.c |   74 +++++++------------------------
>  drivers/gpu/drm/radeon/radeon_mode.h    |    1 -
>  3 files changed, 16 insertions(+), 61 deletions(-)

Basically looks great (less code that works better, good deal :), just
some nits.


> diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
> index 292f73f..1749239 100644
> --- a/drivers/gpu/drm/radeon/radeon_display.c
> +++ b/drivers/gpu/drm/radeon/radeon_display.c
> @@ -348,27 +313,21 @@ static int radeon_crtc_page_flip(struct drm_crtc *crtc,
>  	struct radeon_framebuffer *new_radeon_fb;
>  	struct drm_gem_object *obj;
>  	struct radeon_bo *rbo;
> -	struct radeon_fence *fence;
>  	struct radeon_unpin_work *work;
> +
>  	unsigned long flags;

Drop this blank line.


> @@ -461,19 +415,24 @@ static int radeon_crtc_page_flip(struct drm_crtc *crtc,
>  		goto pflip_cleanup1;
>  	}
>  
> -	/* 32 ought to cover us */
> -	r = radeon_ring_lock(rdev, 32);
> +	r = radeon_bo_wait(rbo, NULL, false);
>  	if (r) {
> -		DRM_ERROR("failed to lock the ring before flip\n");
> +		DRM_ERROR("failed to wait for rendering to complete before flip\n");
>  		goto pflip_cleanup2;
>  	}

Not sure we should bail on radeon_bo_wait failure here. In the worst
case, an incomplete frame would be visible intermittently?

(Have you tested how userspace copes with bailing here? It actually
looks like the error paths may return 0 to userspace anyway despite not
actually flipping)


> +	/* Wait until we are out of vblank - for normal blanking, this
> +	 * takes a worst case of 0.7ms at 50Hz */
> +	/* Enhancement note: you could calculate how long to sleep
> +	 * for, based on vpos, and use this as the lower bound for
> +	 * usleep_range */
> +	while(radeon_get_crtc_scanoutpos(dev, radeon_crtc->crtc_id, &vpos, &hpos) & DRM_SCANOUTPOS_INVBL)
> +		usleep_range(100, 1000);

If vblank lasts up to on the order of 1ms, this could result in several
wakeups on a regular basis? How about something like usleep_range(1000,
4000), which should normally be longer than vblank but shorter than a
frame?
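
To put rough numbers on the wakeup concern, and on the enhancement note in
the patch comment, here is a userspace sketch. It assumes the time left in
vblank can be estimated as the number of scanlines still to go multiplied
by the line time (htotal divided by the pixel clock), and that each sleep
lasts only the lower bound passed to usleep_range. Both function names are
hypothetical; this is not driver code.

```c
#include <assert.h>

/* Hypothetical helper: microseconds left in vblank, given the number of
 * scanlines still to go, the total pixels per line (htotal) and the pixel
 * clock in kHz. Line time in us = htotal * 1000 / clock_khz. */
static unsigned int vblank_remaining_us(unsigned int lines_left,
					unsigned int htotal,
					unsigned int clock_khz)
{
	/* 64-bit intermediate so large modes cannot overflow. */
	unsigned long long num =
		(unsigned long long)lines_left * htotal * 1000ULL;

	assert(clock_khz > 0);

	/* Round up so the caller never wakes before vblank has ended. */
	return (unsigned int)((num + clock_khz - 1) / clock_khz);
}

/* Hypothetical helper: worst-case number of wakeups when polling with a
 * sleep of at least min_sleep_us until vblank_us have elapsed. */
static unsigned int worst_case_wakeups(unsigned int vblank_us,
				       unsigned int min_sleep_us)
{
	assert(min_sleep_us > 0);

	return (vblank_us + min_sleep_us - 1) / min_sleep_us;
}
```

With the 0.7ms worst case quoted in the patch comment,
worst_case_wakeups(700, 100) gives 7 and worst_case_wakeups(700, 1000)
gives 1, which is the argument for a larger lower bound. The first helper
shows one way the lower bound could instead be derived from the scanout
position.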

Patch

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index ef0e0e0..1bd13aa 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -397,10 +397,8 @@  struct radeon_unpin_work {
 	struct work_struct work;
 	struct radeon_device *rdev;
 	int crtc_id;
-	struct radeon_fence *fence;
 	struct drm_pending_vblank_event *event;
 	struct radeon_bo *old_rbo;
-	u64 new_crtc_base;
 };
 
 struct r500_irq_stat_regs {
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 292f73f..1749239 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -276,47 +276,13 @@  void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id)
 	struct drm_pending_vblank_event *e;
 	struct timeval now;
 	unsigned long flags;
-	u32 update_pending;
-	int vpos, hpos;
 
 	spin_lock_irqsave(&rdev->ddev->event_lock, flags);
 	work = radeon_crtc->unpin_work;
-	if (work == NULL ||
-	    !radeon_fence_signaled(work->fence)) {
+	if (work == NULL) {
 		spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
 		return;
 	}
-	/* New pageflip, or just completion of a previous one? */
-	if (!radeon_crtc->deferred_flip_completion) {
-		/* do the flip (mmio) */
-		update_pending = radeon_page_flip(rdev, crtc_id, work->new_crtc_base);
-	} else {
-		/* This is just a completion of a flip queued in crtc
-		 * at last invocation. Make sure we go directly to
-		 * completion routine.
-		 */
-		update_pending = 0;
-		radeon_crtc->deferred_flip_completion = 0;
-	}
-
-	/* Has the pageflip already completed in crtc, or is it certain
-	 * to complete in this vblank?
-	 */
-	if (update_pending &&
-	    (DRM_SCANOUTPOS_VALID & radeon_get_crtc_scanoutpos(rdev->ddev, crtc_id,
-							       &vpos, &hpos)) &&
-	    (vpos >=0) &&
-	    (vpos < (99 * rdev->mode_info.crtcs[crtc_id]->base.hwmode.crtc_vdisplay)/100)) {
-		/* crtc didn't flip in this target vblank interval,
-		 * but flip is pending in crtc. It will complete it
-		 * in next vblank interval, so complete the flip at
-		 * next vblank irq.
-		 */
-		radeon_crtc->deferred_flip_completion = 1;
-		spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
-		return;
-	}
-
 	/* Pageflip (will be) certainly completed in this vblank. Clean up. */
 	radeon_crtc->unpin_work = NULL;
 
@@ -332,7 +298,6 @@  void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id)
 	spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
 
 	drm_vblank_put(rdev->ddev, radeon_crtc->crtc_id);
-	radeon_fence_unref(&work->fence);
 	radeon_post_page_flip(work->rdev, work->crtc_id);
 	schedule_work(&work->work);
 }
@@ -348,27 +313,21 @@  static int radeon_crtc_page_flip(struct drm_crtc *crtc,
 	struct radeon_framebuffer *new_radeon_fb;
 	struct drm_gem_object *obj;
 	struct radeon_bo *rbo;
-	struct radeon_fence *fence;
 	struct radeon_unpin_work *work;
+
 	unsigned long flags;
 	u32 tiling_flags, pitch_pixels;
 	u64 base;
+	int vpos, hpos;
 	int r;
 
 	work = kzalloc(sizeof *work, GFP_KERNEL);
 	if (work == NULL)
 		return -ENOMEM;
 
-	r = radeon_fence_create(rdev, &fence);
-	if (unlikely(r != 0)) {
-		kfree(work);
-		DRM_ERROR("flip queue: failed to create fence.\n");
-		return -ENOMEM;
-	}
 	work->event = event;
 	work->rdev = rdev;
 	work->crtc_id = radeon_crtc->crtc_id;
-	work->fence = radeon_fence_ref(fence);
 	old_radeon_fb = to_radeon_framebuffer(crtc->fb);
 	new_radeon_fb = to_radeon_framebuffer(fb);
 	/* schedule unpin of the old buffer */
@@ -387,7 +346,6 @@  static int radeon_crtc_page_flip(struct drm_crtc *crtc,
 		goto unlock_free;
 	}
 	radeon_crtc->unpin_work = work;
-	radeon_crtc->deferred_flip_completion = 0;
 	spin_unlock_irqrestore(&dev->event_lock, flags);
 
 	/* pin the new buffer */
@@ -448,10 +406,6 @@  static int radeon_crtc_page_flip(struct drm_crtc *crtc,
 		base &= ~7;
 	}
 
-	spin_lock_irqsave(&dev->event_lock, flags);
-	work->new_crtc_base = base;
-	spin_unlock_irqrestore(&dev->event_lock, flags);
-
 	/* update crtc fb */
 	crtc->fb = fb;
 
@@ -461,19 +415,24 @@  static int radeon_crtc_page_flip(struct drm_crtc *crtc,
 		goto pflip_cleanup1;
 	}
 
-	/* 32 ought to cover us */
-	r = radeon_ring_lock(rdev, 32);
+	r = radeon_bo_wait(rbo, NULL, false);
 	if (r) {
-		DRM_ERROR("failed to lock the ring before flip\n");
+		DRM_ERROR("failed to wait for rendering to complete before flip\n");
 		goto pflip_cleanup2;
 	}
 
-	/* emit the fence */
-	radeon_fence_emit(rdev, fence);
-	/* set the proper interrupt */
 	radeon_pre_page_flip(rdev, radeon_crtc->crtc_id);
-	/* fire the ring */
-	radeon_ring_unlock_commit(rdev);
+
+	/* Wait until we are out of vblank - for normal blanking, this
+	 * takes a worst case of 0.7ms at 50Hz */
+	/* Enhancement note: you could calculate how long to sleep
+	 * for, based on vpos, and use this as the lower bound for
+	 * usleep_range */
+	while(radeon_get_crtc_scanoutpos(dev, radeon_crtc->crtc_id, &vpos, &hpos) & DRM_SCANOUTPOS_INVBL)
+		usleep_range(100, 1000);
+
+	/* Request the flip */
+	radeon_page_flip(rdev, radeon_crtc->crtc_id, base);
 
 	return 0;
 
@@ -501,7 +460,6 @@  pflip_cleanup:
 unlock_free:
 	drm_gem_object_unreference_unlocked(old_radeon_fb->obj);
 	spin_unlock_irqrestore(&dev->event_lock, flags);
-	radeon_fence_unref(&fence);
 	kfree(work);
 
 	return r;
diff --git a/drivers/gpu/drm/radeon/radeon_mode.h b/drivers/gpu/drm/radeon/radeon_mode.h
index 6df4e3c..89d5a4c 100644
--- a/drivers/gpu/drm/radeon/radeon_mode.h
+++ b/drivers/gpu/drm/radeon/radeon_mode.h
@@ -282,7 +282,6 @@  struct radeon_crtc {
 	int pll_id;
 	/* page flipping */
 	struct radeon_unpin_work *unpin_work;
-	int deferred_flip_completion;
 };
 
 struct radeon_encoder_primary_dac {