[4/5] drm/radeon: rework page flip handling

Message ID 1398781773-2319-4-git-send-email-deathsimple@vodafone.de (mailing list archive)
State New, archived

Commit Message

Christian König April 29, 2014, 2:29 p.m. UTC
From: Christian König <christian.koenig@amd.com>

Instead of trying to flip inside the vblank period when
the buffer is idle, offload blocking for idle to a kernel
thread and program the flip directly into the hardware.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h         |  16 ++-
 drivers/gpu/drm/radeon/radeon_display.c | 231 +++++++++++++++++---------------
 drivers/gpu/drm/radeon/radeon_mode.h    |   4 +-
 3 files changed, 134 insertions(+), 117 deletions(-)

Comments

Michel Dänzer May 2, 2014, 7:25 a.m. UTC | #1
On 29.04.2014 23:29, Christian König wrote:
> From: Christian König <christian.koenig@amd.com>
> 
> Instead of trying to flip inside the vblank period when
> the buffer is idle, offload blocking for idle to a kernel
> thread and program the flip directly into the hardware.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
[...]
> +/**
> + * radeon_flip_work_func - page flip framebuffer
> + *
> + * @work - kernel work item
> + *
> + * Wait for the buffer object to become idle and do the actual page flip
> + */
> +static void radeon_flip_work_func(struct work_struct *__work)
>  {
[...]
> +	if (radeon_crtc->flip_work) {
> +		DRM_DEBUG_DRIVER("flip queue: crtc already busy\n");
> +		spin_unlock_irqrestore(&crtc->dev->event_lock, flags);
> +		goto pflip_cleanup1;
> +	}

I'm a little worried about this case. AFAICT this would drop the flip if
a previous one is still pending? I'm not sure current userspace can
actually hit this, but sooner or later we'll probably want to support
things like triple buffering, flips replacing previous ones still
pending, and asynchronous flips (not synchronized to vertical blank).
Those will probably require changes to the kernel/user interface, but it
might be good to at least keep them in mind already for the infrastructure.


Also, in patch 5, we could stop calling drm_vblank_get/put() when we're
not using the vblank interrupt for flipping?


The series looks good to me other than that.
Christian König May 2, 2014, 1:29 p.m. UTC | #2
> I'm a little worried about this case. AFAICT this would drop the flip if
> a previous one is still pending? I'm not sure current userspace can
> actually hit this
Yeah, that concerned me as well. The old code dropped the new flip as 
well, so I'm pretty sure that the new handling is right and userspace 
won't hit that.

The only difference to the old code is that I've offloaded it to a 
separate thread and so can't return -EBUSY any more.

> but sooner or later we'll probably want to support
> things like triple buffering,
Triple buffering should still work as expected (at least if userspace 
makes sure not to submit more than one flip at a time).

> flips replacing previous ones still
> pending,
Flips replacing previous ones are tricky to implement without causing a 
race condition, so I would rather avoid doing that for now.

> and asynchronous flips (not synchronized to vertical blank).
Actually that's one of the things I've implemented in 3.15 by using the 
pflip interrupt. All you need to do is set the appropriate bits in 
the hardware to sync the flip to VBLANK, HBLANK or not at all. For the 
pflip interrupt that doesn't matter; it should just fire as soon as the 
new base address is used by the hardware.
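
For illustration, a rough sketch of that idea; radeon_set_flip_sync() and
the enum below are made-up placeholders for the per-ASIC register writes,
not existing radeon functions:

    enum radeon_flip_sync { FLIP_SYNC_VBLANK, FLIP_SYNC_HBLANK, FLIP_SYNC_NONE };

    /* hypothetical helper: program the flip and select the update point */
    static void radeon_program_flip(struct radeon_device *rdev, int crtc_id,
                                    u64 base, enum radeon_flip_sync sync)
    {
            /* select when the hardware latches the new base address */
            radeon_set_flip_sync(rdev, crtc_id, sync); /* made-up name */

            /* write the new base address (radeon_page_flip() exists in the patch) */
            radeon_page_flip(rdev, crtc_id, base);

            /* the pflip interrupt fires once the new base address is
             * actually being scanned out, whatever sync mode was chosen */
    }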

> Also, in patch 5, we could stop calling drm_vblank_get/put() when we're
> not using the vblank interrupt for flipping?
Mhm, good point. But I think the question is rather why we needed 
that in the first place?

The VBLANK interrupt is turned on by radeon_irq_kms_pflip_irq_get 
anyway, so that call seems to be superfluous even on the old system.

Thanks for the review, going to work on v2 now,
Christian.

Michel Dänzer May 5, 2014, 9:27 a.m. UTC | #3
On 02.05.2014 22:29, Christian König wrote:
> Am 02.05.2014 09:25, schrieb Michel Dänzer:
>> On 29.04.2014 23:29, Christian König wrote:
>>>
>>> +static void radeon_flip_work_func(struct work_struct *__work)
>>>   {
>> [...]
>>> +    if (radeon_crtc->flip_work) {
>>> +        DRM_DEBUG_DRIVER("flip queue: crtc already busy\n");
>>> +        spin_unlock_irqrestore(&crtc->dev->event_lock, flags);
>>> +        goto pflip_cleanup1;
>>> +    }
>> I'm a little worried about this case. AFAICT this would drop the flip if
>> a previous one is still pending? I'm not sure current userspace can
>> actually hit this
> Yeah, that concerned me as well. The old code dropped the new flip
> as well, so I'm pretty sure that the new handling is right and userspace
> won't hit that.
> 
> The only difference to the old code is that I've offloaded it to a
> separate thread and so can't return -EBUSY any more.

That's an important difference though: Userspace can react to the -EBUSY
appropriately, but if flips get dropped silently, bad things will
happen, such as the wrong buffer being scanned out.

Keep in mind that we don't control all userspace, e.g. I expect there
will be an increasing number of Wayland compositors using this
functionality.


>> but sooner or later we'll probably want to support
>> things like triple buffering,
> Triple buffering should still work as expected (at least if userspace
> makes sure not to submit more than one flip at a time).
> 
>> flips replacing previous ones still
>> pending,
> Flips replacing previous ones are tricky to implement without causing a
> race condition, so I would rather avoid doing that for now.

It may be tricky, but it's important. Otherwise it's not possible to
achieve framerates higher than the refresh rate without tearing and with
low latency (using triple buffering).


>> and asynchronous flips (not synchronized to vertical blank).
> Actually that's one of the things I've implemented in 3.15 by using the
> pflip interrupt. All you need to do is set the appropriate bits in
> the hardware to sync the flip to VBLANK, HBLANK or not at all. For the
> pflip interrupt that doesn't matter; it should just fire as soon as the
> new base address is used by the hardware.

Sounds great. :) Then we just need to expose that to userspace somehow.


>> Also, in patch 5, we could stop calling drm_vblank_get/put() when we're
>> not using the vblank interrupt for flipping?
> Mhm, good point. But I think the question is rather why we needed
> that in the first place?
> 
> The VBLANK interrupt is turned on by radeon_irq_kms_pflip_irq_get
> anyway, so that call seems to be superfluous even on the old system.

Indeed, looks like it.
Daniel Vetter May 5, 2014, 2:23 p.m. UTC | #4
On Mon, May 05, 2014 at 06:27:17PM +0900, Michel Dänzer wrote:
> On 02.05.2014 22:29, Christian König wrote:
> > Am 02.05.2014 09:25, schrieb Michel Dänzer:
> >> On 29.04.2014 23:29, Christian König wrote:
> >>>
> >>> +static void radeon_flip_work_func(struct work_struct *__work)
> >>>   {
> >> [...]
> >>> +    if (radeon_crtc->flip_work) {
> >>> +        DRM_DEBUG_DRIVER("flip queue: crtc already busy\n");
> >>> +        spin_unlock_irqrestore(&crtc->dev->event_lock, flags);
> >>> +        goto pflip_cleanup1;
> >>> +    }
> >> I'm a little worried about this case. AFAICT this would drop the flip if
> >> a previous one is still pending? I'm not sure current userspace can
> >> actually hit this
> > Yeah, that concerned me as well. The old code dropped the new flip
> > as well, so I'm pretty sure that the new handling is right and userspace
> > won't hit that.
> > 
> > The only difference to the old code is that I've offloaded it to a
> > separate thread and so can't return -EBUSY any more.
> 
> That's an important difference though: Userspace can react to the -EBUSY
> appropriately, but if flips get dropped silently, bad things will
> happen, such as the wrong buffer being scanned out.
> 
> Keep in mind that we don't control all userspace, e.g. I expect there
> will be an increasing number of Wayland compositors using this
> functionality.

Rules on i915 are that if the pageflip confirmation event hasn't been sent
out yet, the driver returns -EBUSY. The locking for that is done using
irq-save spinlocks, so it should be doable to check this synchronously
before latching some kind of async worker.

I didn't read one line of radeon code to check this though ;-)

Cheers, Daniel
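
For illustration, a minimal sketch (untested, reusing the names from this
patch, with "flags" declared locally) of what such a synchronous check in
radeon_crtc_page_flip() could look like before handing off to the worker:

    /* sketch: reject a new flip while an earlier one is still pending */
    spin_lock_irqsave(&dev->event_lock, flags);
    if (radeon_crtc->flip_work) {
            spin_unlock_irqrestore(&dev->event_lock, flags);
            drm_gem_object_unreference_unlocked(&work->old_rbo->gem_base);
            radeon_fence_unref(&work->fence);
            kfree(work);
            return -EBUSY;
    }
    radeon_crtc->flip_work = work;
    spin_unlock_irqrestore(&dev->event_lock, flags);

    queue_work(radeon_crtc->flip_queue, &work->flip_work);
    return 0;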
Michel Dänzer May 14, 2014, 8:02 a.m. UTC | #5
On 05.05.2014 18:27, Michel Dänzer wrote:
> On 02.05.2014 22:29, Christian König wrote:
>> Am 02.05.2014 09:25, schrieb Michel Dänzer:
>>>
>>> and asynchronous flips (not synchronized to vertical blank).
>> Actually that's one of the things I've implemented in 3.15 by using the
>> pflip interrupt. All you need to do is set the appropriate bits in
>> the hardware to sync the flip to VBLANK, HBLANK or not at all. For the
>> pflip interrupt that doesn't matter; it should just fire as soon as the
>> new base address is used by the hardware.
> 
> Sounds great. :) Then we just need to expose that to userspace somehow.

Actually, looks like that should be quite simple:

git grep -i async_page_flip
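
For illustration, a rough sketch of what that could amount to on the
radeon side (untested; work->async would be a new field, everything else
is the generic DRM support):

    /* at init time, e.g. in radeon_modeset_init(): advertise async flips */
    rdev->ddev->mode_config.async_page_flip = true;

    /* in radeon_crtc_page_flip(): remember what userspace asked for */
    work->async = (page_flip_flags & DRM_MODE_PAGE_FLIP_ASYNC) != 0;

    /* ... and in the flip worker, program the "no sync" update mode
     * instead of syncing to VBLANK when work->async is set */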

Patch

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index fb7c499..1666eac 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -676,14 +676,16 @@  void radeon_doorbell_free(struct radeon_device *rdev, u32 doorbell);
  * IRQS.
  */
 
-struct radeon_unpin_work {
-	struct work_struct work;
-	struct radeon_device *rdev;
-	int crtc_id;
-	struct radeon_fence *fence;
+struct radeon_flip_work {
+	struct work_struct		flip_work;
+	struct work_struct		unpin_work;
+	struct radeon_device		*rdev;
+	int				crtc_id;
+	struct drm_framebuffer		*fb;
 	struct drm_pending_vblank_event *event;
-	struct radeon_bo *old_rbo;
-	u64 new_crtc_base;
+	struct radeon_bo		*old_rbo;
+	struct radeon_bo		*new_rbo;
+	struct radeon_fence		*fence;
 };
 
 struct r500_irq_stat_regs {
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index af6e5ae..b721261 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -249,16 +249,21 @@  static void radeon_crtc_destroy(struct drm_crtc *crtc)
 	struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc);
 
 	drm_crtc_cleanup(crtc);
+	destroy_workqueue(radeon_crtc->flip_queue);
 	kfree(radeon_crtc);
 }
 
-/*
- * Handle unpin events outside the interrupt handler proper.
+/**
+ * radeon_unpin_work_func - unpin old buffer object
+ *
+ * @__work - kernel work item
+ *
+ * Unpin the old frame buffer object outside of the interrupt handler
  */
 static void radeon_unpin_work_func(struct work_struct *__work)
 {
-	struct radeon_unpin_work *work =
-		container_of(__work, struct radeon_unpin_work, work);
+	struct radeon_flip_work *work =
+		container_of(__work, struct radeon_flip_work, unpin_work);
 	int r;
 
 	/* unpin of the old buffer */
@@ -279,7 +284,7 @@  static void radeon_unpin_work_func(struct work_struct *__work)
 void radeon_crtc_handle_vblank(struct radeon_device *rdev, int crtc_id)
 {
 	struct radeon_crtc *radeon_crtc = rdev->mode_info.crtcs[crtc_id];
-	struct radeon_unpin_work *work;
+	struct radeon_flip_work *work;
 	unsigned long flags;
 	u32 update_pending;
 	int vpos, hpos;
@@ -289,24 +294,13 @@  void radeon_crtc_handle_vblank(struct radeon_device *rdev, int crtc_id)
 		return;
 
 	spin_lock_irqsave(&rdev->ddev->event_lock, flags);
-	work = radeon_crtc->unpin_work;
-	if (work == NULL ||
-	    (work->fence && !radeon_fence_signaled(work->fence))) {
+	work = radeon_crtc->flip_work;
+	if (work == NULL) {
 		spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
 		return;
 	}
-	/* New pageflip, or just completion of a previous one? */
-	if (!radeon_crtc->deferred_flip_completion) {
-		/* do the flip (mmio) */
-		radeon_page_flip(rdev, crtc_id, work->new_crtc_base);
-		update_pending = radeon_page_flip_pending(rdev, crtc_id);
-	} else {
-		/* This is just a completion of a flip queued in crtc
-		 * at last invocation. Make sure we go directly to
-		 * completion routine.
-		 */
-		update_pending = 0;
-	}
+
+	update_pending = radeon_page_flip_pending(rdev, crtc_id);
 
 	/* Has the pageflip already completed in crtc, or is it certain
 	 * to complete in this vblank?
@@ -324,19 +318,9 @@  void radeon_crtc_handle_vblank(struct radeon_device *rdev, int crtc_id)
 		 */
 		update_pending = 0;
 	}
-	if (update_pending) {
-		/* crtc didn't flip in this target vblank interval,
-		 * but flip is pending in crtc. It will complete it
-		 * in next vblank interval, so complete the flip at
-		 * next vblank irq.
-		 */
-		radeon_crtc->deferred_flip_completion = 1;
-		spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
-		return;
-	} else {
-		spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
+	spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
+	if (!update_pending)
 		radeon_crtc_handle_flip(rdev, crtc_id);
-	}
 }
 
 /**
@@ -350,7 +334,7 @@  void radeon_crtc_handle_vblank(struct radeon_device *rdev, int crtc_id)
 void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id)
 {
 	struct radeon_crtc *radeon_crtc = rdev->mode_info.crtcs[crtc_id];
-	struct radeon_unpin_work *work;
+	struct radeon_flip_work *work;
 	unsigned long flags;
 
 	/* this can happen at init */
@@ -358,15 +342,14 @@  void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id)
 		return;
 
 	spin_lock_irqsave(&rdev->ddev->event_lock, flags);
-	work = radeon_crtc->unpin_work;
+	work = radeon_crtc->flip_work;
 	if (work == NULL) {
 		spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
 		return;
 	}
 
-	/* Pageflip (will be) certainly completed in this vblank. Clean up. */
-	radeon_crtc->unpin_work = NULL;
-	radeon_crtc->deferred_flip_completion = 0;
+	/* Pageflip completed. Clean up. */
+	radeon_crtc->flip_work = NULL;
 
 	/* wakeup userspace */
 	if (work->event)
@@ -377,83 +360,59 @@  void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id)
 	drm_vblank_put(rdev->ddev, radeon_crtc->crtc_id);
 	radeon_fence_unref(&work->fence);
 	radeon_irq_kms_pflip_irq_get(rdev, work->crtc_id);
-	schedule_work(&work->work);
+	queue_work(radeon_crtc->flip_queue, &work->unpin_work);
 }
 
-static int radeon_crtc_page_flip(struct drm_crtc *crtc,
-				 struct drm_framebuffer *fb,
-				 struct drm_pending_vblank_event *event,
-				 uint32_t page_flip_flags)
+/**
+ * radeon_flip_work_func - page flip framebuffer
+ *
+ * @work - kernel work item
+ *
+ * Wait for the buffer object to become idle and do the actual page flip
+ */
+static void radeon_flip_work_func(struct work_struct *__work)
 {
-	struct drm_device *dev = crtc->dev;
-	struct radeon_device *rdev = dev->dev_private;
-	struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc);
-	struct radeon_framebuffer *old_radeon_fb;
-	struct radeon_framebuffer *new_radeon_fb;
-	struct drm_gem_object *obj;
-	struct radeon_bo *rbo;
-	struct radeon_unpin_work *work;
-	unsigned long flags;
-	u32 tiling_flags, pitch_pixels;
-	u64 base;
-	int r;
+	struct radeon_flip_work *work =
+		container_of(__work, struct radeon_flip_work, flip_work);
+	struct radeon_device *rdev = work->rdev;
+	struct radeon_crtc *radeon_crtc = rdev->mode_info.crtcs[work->crtc_id];
 
-	work = kzalloc(sizeof *work, GFP_KERNEL);
-	if (work == NULL)
-		return -ENOMEM;
+	struct drm_crtc *crtc = &radeon_crtc->base;
+	struct drm_framebuffer *fb = work->fb;
 
-	work->event = event;
-	work->rdev = rdev;
-	work->crtc_id = radeon_crtc->crtc_id;
-	old_radeon_fb = to_radeon_framebuffer(crtc->primary->fb);
-	new_radeon_fb = to_radeon_framebuffer(fb);
-	/* schedule unpin of the old buffer */
-	obj = old_radeon_fb->obj;
-	/* take a reference to the old object */
-	drm_gem_object_reference(obj);
-	rbo = gem_to_radeon_bo(obj);
-	work->old_rbo = rbo;
-	obj = new_radeon_fb->obj;
-	rbo = gem_to_radeon_bo(obj);
+	uint32_t tiling_flags, pitch_pixels;
+	uint64_t base;
 
-	spin_lock(&rbo->tbo.bdev->fence_lock);
-	if (rbo->tbo.sync_obj)
-		work->fence = radeon_fence_ref(rbo->tbo.sync_obj);
-	spin_unlock(&rbo->tbo.bdev->fence_lock);
+	unsigned long flags;
+	int r;
 
-	INIT_WORK(&work->work, radeon_unpin_work_func);
+	if (work->fence) {
+		r = radeon_fence_wait(work->fence, false);
+		//TODO error handling
 
-	/* We borrow the event spin lock for protecting unpin_work */
-	spin_lock_irqsave(&dev->event_lock, flags);
-	if (radeon_crtc->unpin_work) {
-		DRM_DEBUG_DRIVER("flip queue: crtc already busy\n");
-		r = -EBUSY;
-		goto unlock_free;
+		radeon_fence_unref(&work->fence);
 	}
-	radeon_crtc->unpin_work = work;
-	radeon_crtc->deferred_flip_completion = 0;
-	spin_unlock_irqrestore(&dev->event_lock, flags);
 
 	/* pin the new buffer */
 	DRM_DEBUG_DRIVER("flip-ioctl() cur_fbo = %p, cur_bbo = %p\n",
-			 work->old_rbo, rbo);
+			 work->old_rbo, work->new_rbo);
 
-	r = radeon_bo_reserve(rbo, false);
+	r = radeon_bo_reserve(work->new_rbo, false);
 	if (unlikely(r != 0)) {
 		DRM_ERROR("failed to reserve new rbo buffer before flip\n");
 		goto pflip_cleanup;
 	}
 	/* Only 27 bit offset for legacy CRTC */
-	r = radeon_bo_pin_restricted(rbo, RADEON_GEM_DOMAIN_VRAM,
+	r = radeon_bo_pin_restricted(work->new_rbo, RADEON_GEM_DOMAIN_VRAM,
 				     ASIC_IS_AVIVO(rdev) ? 0 : 1 << 27, &base);
 	if (unlikely(r != 0)) {
-		radeon_bo_unreserve(rbo);
+		radeon_bo_unreserve(work->new_rbo);
 		r = -EINVAL;
 		DRM_ERROR("failed to pin new rbo buffer before flip\n");
 		goto pflip_cleanup;
 	}
-	radeon_bo_get_tiling_flags(rbo, &tiling_flags, NULL);
-	radeon_bo_unreserve(rbo);
+	radeon_bo_get_tiling_flags(work->new_rbo, &tiling_flags, NULL);
+	radeon_bo_unreserve(work->new_rbo);
 
 	if (!ASIC_IS_AVIVO(rdev)) {
 		/* crtc offset is from display base addr not FB location */
@@ -491,44 +450,99 @@  static int radeon_crtc_page_flip(struct drm_crtc *crtc,
 		base &= ~7;
 	}
 
-	spin_lock_irqsave(&dev->event_lock, flags);
-	work->new_crtc_base = base;
-	spin_unlock_irqrestore(&dev->event_lock, flags);
-
-	/* update crtc fb */
-	crtc->primary->fb = fb;
-
-	r = drm_vblank_get(dev, radeon_crtc->crtc_id);
+	r = drm_vblank_get(crtc->dev, radeon_crtc->crtc_id);
 	if (r) {
 		DRM_ERROR("failed to get vblank before flip\n");
 		goto pflip_cleanup1;
 	}
 
+	/* We borrow the event spin lock for protecting flip_work */
+	spin_lock_irqsave(&crtc->dev->event_lock, flags);
+
+	if (radeon_crtc->flip_work) {
+		DRM_DEBUG_DRIVER("flip queue: crtc already busy\n");
+		spin_unlock_irqrestore(&crtc->dev->event_lock, flags);
+		goto pflip_cleanup1;
+	}
+	radeon_crtc->flip_work = work;
+
 	/* set the proper interrupt */
 	radeon_irq_kms_pflip_irq_get(rdev, radeon_crtc->crtc_id);
 
-	return 0;
+	/* do the flip (mmio) */
+	radeon_page_flip(rdev, radeon_crtc->crtc_id, base);
+
+	spin_unlock_irqrestore(&crtc->dev->event_lock, flags);
+
+	return;
 
 pflip_cleanup1:
-	if (unlikely(radeon_bo_reserve(rbo, false) != 0)) {
+	if (unlikely(radeon_bo_reserve(work->new_rbo, false) != 0)) {
 		DRM_ERROR("failed to reserve new rbo in error path\n");
 		goto pflip_cleanup;
 	}
-	if (unlikely(radeon_bo_unpin(rbo) != 0)) {
+	if (unlikely(radeon_bo_unpin(work->new_rbo) != 0)) {
 		DRM_ERROR("failed to unpin new rbo in error path\n");
 	}
-	radeon_bo_unreserve(rbo);
+	radeon_bo_unreserve(work->new_rbo);
 
 pflip_cleanup:
-	spin_lock_irqsave(&dev->event_lock, flags);
-	radeon_crtc->unpin_work = NULL;
-unlock_free:
-	spin_unlock_irqrestore(&dev->event_lock, flags);
-	drm_gem_object_unreference_unlocked(old_radeon_fb->obj);
+	drm_gem_object_unreference_unlocked(&work->old_rbo->gem_base);
 	radeon_fence_unref(&work->fence);
 	kfree(work);
+}
 
-	return r;
+static int radeon_crtc_page_flip(struct drm_crtc *crtc,
+				 struct drm_framebuffer *fb,
+				 struct drm_pending_vblank_event *event,
+				 uint32_t page_flip_flags)
+{
+	struct drm_device *dev = crtc->dev;
+	struct radeon_device *rdev = dev->dev_private;
+	struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc);
+	struct radeon_framebuffer *old_radeon_fb;
+	struct radeon_framebuffer *new_radeon_fb;
+	struct drm_gem_object *obj;
+	struct radeon_flip_work *work;
+
+	work = kzalloc(sizeof *work, GFP_KERNEL);
+	if (work == NULL)
+		return -ENOMEM;
+
+	INIT_WORK(&work->flip_work, radeon_flip_work_func);
+	INIT_WORK(&work->unpin_work, radeon_unpin_work_func);
+
+	work->rdev = rdev;
+	work->crtc_id = radeon_crtc->crtc_id;
+	work->fb = fb;
+	work->event = event;
+
+	/* schedule unpin of the old buffer */
+	old_radeon_fb = to_radeon_framebuffer(crtc->primary->fb);
+	obj = old_radeon_fb->obj;
+
+	/* take a reference to the old object */
+	drm_gem_object_reference(obj);
+	work->old_rbo = gem_to_radeon_bo(obj);
+
+	new_radeon_fb = to_radeon_framebuffer(fb);
+	obj = new_radeon_fb->obj;
+	work->new_rbo = gem_to_radeon_bo(obj);
+
+	spin_lock(&work->new_rbo->tbo.bdev->fence_lock);
+	if (work->new_rbo->tbo.sync_obj)
+		work->fence = radeon_fence_ref(work->new_rbo->tbo.sync_obj);
+	spin_unlock(&work->new_rbo->tbo.bdev->fence_lock);
+
+	/* update crtc fb */
+	crtc->primary->fb = fb;
+
+	if (work->fence)
+		queue_work(radeon_crtc->flip_queue, &work->flip_work);
+	else
+		radeon_flip_work_func(&work->flip_work);
+
+	return 0;
 }
 
 static int
@@ -598,6 +612,7 @@  static void radeon_crtc_init(struct drm_device *dev, int index)
 
 	drm_mode_crtc_set_gamma_size(&radeon_crtc->base, 256);
 	radeon_crtc->crtc_id = index;
+	radeon_crtc->flip_queue = create_singlethread_workqueue("radeon-crtc");
 	rdev->mode_info.crtcs[index] = radeon_crtc;
 
 	if (rdev->family >= CHIP_BONAIRE) {
diff --git a/drivers/gpu/drm/radeon/radeon_mode.h b/drivers/gpu/drm/radeon/radeon_mode.h
index eda6367..b59096f 100644
--- a/drivers/gpu/drm/radeon/radeon_mode.h
+++ b/drivers/gpu/drm/radeon/radeon_mode.h
@@ -324,8 +324,8 @@  struct radeon_crtc {
 	struct drm_display_mode native_mode;
 	int pll_id;
 	/* page flipping */
-	struct radeon_unpin_work *unpin_work;
-	int deferred_flip_completion;
+	struct workqueue_struct *flip_queue;
+	struct radeon_flip_work *flip_work;
 	/* pll sharing */
 	struct radeon_atom_ss ss;
 	bool ss_enabled;