[43/49] drm/i915/bdw: Handle context switch events

Message ID 1395943218-7708-44-git-send-email-oscar.mateo@intel.com (mailing list archive)
State New, archived

Commit Message

oscar.mateo@intel.com March 27, 2014, 6 p.m. UTC
From: Thomas Daniel <thomas.daniel@intel.com>

Handle all context status events in the context status buffer on every
context switch interrupt. We only remove work from the execlist queue
after the context status buffer reports that it has completed, and we
only attempt to schedule new contexts on interrupt when a previously
submitted context completes (unless no contexts are queued, which means
the GPU is free).

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v2: Unreferencing the context when we are freeing the request might free
the backing bo, which requires the struct_mutex to be grabbed, so defer
unreferencing and freeing to a bottom half.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |   3 +
 drivers/gpu/drm/i915/i915_irq.c         |  28 ++++++---
 drivers/gpu/drm/i915/i915_lrc.c         | 101 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
 5 files changed, 123 insertions(+), 11 deletions(-)
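
For illustration, the context status buffer (CSB) handled by this patch is a
six-entry circular buffer per engine: the write pointer comes from the
CONTEXT_STATUS_PTR register, while the read pointer is tracked in software in
ring->next_context_status_buffer. The following is a minimal standalone
sketch of just the wrap-around arithmetic the patch uses (a simplified
stand-in for illustration, not the driver's actual API):

#include <stdio.h>

#define CSB_ENTRIES 6	/* six status entries per engine */

/* Consume all CSB slots between a software read pointer and a hardware
 * write pointer; when the write pointer has wrapped past the read
 * pointer, bias it by the buffer size, exactly as the patch does. */
static void consume_csb(unsigned int read_pointer, unsigned int write_pointer)
{
	if (read_pointer > write_pointer)
		write_pointer += CSB_ENTRIES;

	while (read_pointer < write_pointer) {
		read_pointer++;
		/* each slot holds two dwords: status and status_id */
		printf("consume CSB slot %u\n", read_pointer % CSB_ENTRIES);
	}
}

int main(void)
{
	/* read pointer at 5, write pointer wrapped around to 1:
	 * slots 0 and 1 get consumed, in that order */
	consume_csb(5, 1);
	return 0;
}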

Comments

Lespiau, Damien April 3, 2014, 2:24 p.m. UTC | #1
On Thu, Mar 27, 2014 at 06:00:12PM +0000, oscar.mateo@intel.com wrote:
> +void gen8_handle_context_events(struct intel_engine *ring)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	u32 status_pointer;
> +	u8 read_pointer;
> +	u8 write_pointer;
> +	u32 status;
> +	u32 status_id;
> +	u32 submit_contexts = 0;
> +
> +	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
> +
> +	read_pointer = ring->next_context_status_buffer;
> +	write_pointer = status_pointer & 0x07;
> +	if (read_pointer > write_pointer)
> +		write_pointer += 6;
> +
> +	spin_lock(&ring->execlist_lock);
> +
> +	while (read_pointer < write_pointer) {
> +		read_pointer++;
> +		status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> +				(read_pointer % 6) * 8);
> +		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> +				(read_pointer % 6) * 8 + 4);
> +
> +		if (status & GEN8_CTX_STATUS_ELEMENT_SWITCH) {
> +			if (check_remove_request(ring, status_id))
> +				submit_contexts++;
> +		} else if (status & GEN8_CTX_STATUS_COMPLETE) {
> +			if (check_remove_request(ring, status_id))
> +				submit_contexts++;
> +		}
> +	}
> +
> +	if (submit_contexts != 0)
> +		gen8_switch_context_unqueue(ring);
> +
> +	spin_unlock(&ring->execlist_lock);
> +
> +	WARN(submit_contexts > 2, "More than two context complete events?\n");
> +	ring->next_context_status_buffer = write_pointer % 6;
> +}

I'm a bit surprised that we never update the read pointer in the
CONTEXT_STATUS_PTR when we consume entries from CONTEXT_STATUS_BUF.

Are we sure this field isn't used by hw at all to figure out if the
circular buffer has some free space?
oscar.mateo@intel.com April 9, 2014, 8:15 a.m. UTC | #2
It seems to be completely managed by SW, for SW (or, at least, it does not seem to have any visible effect in the HW). But you are right, it is probably worth updating.

-- Oscar
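
One way to act on this would be to write the software read pointer back to
the register at the end of gen8_handle_context_events(). A sketch only,
assuming the read pointer occupies bits 10:8 of CONTEXT_STATUS_PTR and that
the register follows the usual masked-write convention (mask in the upper 16
bits) -- both assumptions would need checking against the BDW documentation:

	ring->next_context_status_buffer = write_pointer % 6;
	/* Assumption: the read pointer field lives in bits 10:8 and the
	 * register takes masked writes (mask shifted into bits 31:16). */
	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
		   (0x07 << (8 + 16)) |
		   (ring->next_context_status_buffer << 8));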

> -----Original Message-----
> From: Lespiau, Damien
> Sent: Thursday, April 03, 2014 3:25 PM
> To: Mateo Lozano, Oscar
> Cc: intel-gfx@lists.freedesktop.org; Daniel, Thomas
> Subject: Re: [Intel-gfx] [PATCH 43/49] drm/i915/bdw: Handle context switch
> events
> 
> On Thu, Mar 27, 2014 at 06:00:12PM +0000, oscar.mateo@intel.com wrote:
> > +void gen8_handle_context_events(struct intel_engine *ring)
> > +{
> > +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> > +	u32 status_pointer;
> > +	u8 read_pointer;
> > +	u8 write_pointer;
> > +	u32 status;
> > +	u32 status_id;
> > +	u32 submit_contexts = 0;
> > +
> > +	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
> > +
> > +	read_pointer = ring->next_context_status_buffer;
> > +	write_pointer = status_pointer & 0x07;
> > +	if (read_pointer > write_pointer)
> > +		write_pointer += 6;
> > +
> > +	spin_lock(&ring->execlist_lock);
> > +
> > +	while (read_pointer < write_pointer) {
> > +		read_pointer++;
> > +		status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> > +				(read_pointer % 6) * 8);
> > +		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> > +				(read_pointer % 6) * 8 + 4);
> > +
> > +		if (status & GEN8_CTX_STATUS_ELEMENT_SWITCH) {
> > +			if (check_remove_request(ring, status_id))
> > +				submit_contexts++;
> > +		} else if (status & GEN8_CTX_STATUS_COMPLETE) {
> > +			if (check_remove_request(ring, status_id))
> > +				submit_contexts++;
> > +		}
> > +	}
> > +
> > +	if (submit_contexts != 0)
> > +		gen8_switch_context_unqueue(ring);
> > +
> > +	spin_unlock(&ring->execlist_lock);
> > +
> > +	WARN(submit_contexts > 2, "More than two context complete events?\n");
> > +	ring->next_context_status_buffer = write_pointer % 6;
> > +}
> 
> I'm a bit surprised that we never update the read pointer in the
> CONTEXT_STATUS_PTR when we consume entries from
> CONTEXT_STATUS_BUF.
> 
> Are we sure this field isn't used by hw at all to figure out if the circular buffer
> has some free space?
> 
> --
> Damien
robert.beckett@intel.com April 26, 2014, 12:53 a.m. UTC | #3
On 27/03/2014 18:00, oscar.mateo@intel.com wrote:
> From: Thomas Daniel <thomas.daniel@intel.com>
>
> Handle all context status events in the context status buffer on every
> context switch interrupt. We only remove work from the execlist queue
> after the context status buffer reports that it has completed, and we
> only attempt to schedule new contexts on interrupt when a previously
> submitted context completes (unless no contexts are queued, which means
> the GPU is free).
>
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
>
> v2: Unreferencing the context when we are freeing the request might free
> the backing bo, which requires the struct_mutex to be grabbed, so defer
> unreferencing and freeing to a bottom half.
>
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h         |   3 +
>   drivers/gpu/drm/i915/i915_irq.c         |  28 ++++++---
>   drivers/gpu/drm/i915/i915_lrc.c         | 101 +++++++++++++++++++++++++++++++-
>   drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
>   5 files changed, 123 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2607664..4c8cf52 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1679,6 +1679,8 @@ struct drm_i915_gem_request {
>
>   	/** execlist queue entry for this request */
>   	struct list_head execlist_link;
> +	/** Struct to handle this request in the bottom half of an interrupt */
> +	struct work_struct work;
>   };
>
>   struct drm_i915_file_private {
> @@ -2344,6 +2346,7 @@ void gen8_gem_context_free(struct i915_hw_context *ctx);
>   int gen8_switch_context_queue(struct intel_engine *ring,
>   			      struct i915_hw_context *to,
>   			      u32 tail);
> +void gen8_handle_context_events(struct intel_engine *ring);
>
>   /* i915_gem_evict.c */
>   int __must_check i915_gem_evict_something(struct drm_device *dev,
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 56657b5..6e0f456 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1334,6 +1334,7 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>   				       struct drm_i915_private *dev_priv,
>   				       u32 master_ctl)
>   {
> +	struct intel_engine *ring;
>   	u32 rcs, bcs, vcs, vecs;
>   	uint32_t tmp = 0;
>   	irqreturn_t ret = IRQ_NONE;
> @@ -1342,14 +1343,21 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>   		tmp = I915_READ(GEN8_GT_IIR(0));
>   		if (tmp) {
>   			ret = IRQ_HANDLED;
> +
>   			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
> -			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
> +			ring = &dev_priv->ring[RCS];
>   			if (rcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[RCS]);
> +				notify_ring(dev, ring);
> +			if (rcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				gen8_handle_context_events(ring);

Handling the context events here can generate a new execlist submission
which, if the workload is small enough, can finish and generate a new
context event interrupt before we ack this interrupt.

When we ack this interrupt, we clear the new one too, losing an interrupt.

Moving the

I915_WRITE(GEN8_GT_IIR(0), tmp);

to just inside the if (tmp) { conditional (or anywhere before this call) 
fixes this issue. There is no harm in acking the interrupt immediately 
as we have the read stored in tmp.
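
Concretely, the reordering described above would look roughly like this in
gen8_gt_irq_handler() (an untested sketch, using only identifiers already
present in the patch):

	tmp = I915_READ(GEN8_GT_IIR(0));
	if (tmp) {
		/* Ack first: the bits are already latched in tmp, and
		 * gen8_handle_context_events() below may trigger a new
		 * submission whose interrupt would otherwise be cleared
		 * here without ever being handled. */
		I915_WRITE(GEN8_GT_IIR(0), tmp);
		ret = IRQ_HANDLED;

		rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
		ring = &dev_priv->ring[RCS];
		if (rcs & GT_RENDER_USER_INTERRUPT)
			notify_ring(dev, ring);
		if (rcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
			gen8_handle_context_events(ring);
		/* ...likewise for BCS, with no IIR write at the end */
	}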


> +
> +			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
> +			ring = &dev_priv->ring[BCS];
>   			if (bcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[BCS]);
> -			if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> -			     DRM_DEBUG_DRIVER("TODO: Context switch\n");
> +				notify_ring(dev, ring);
> +			if (bcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> +				gen8_handle_context_events(ring);
> +
>   			I915_WRITE(GEN8_GT_IIR(0), tmp);
>   		} else
>   			DRM_ERROR("The master control interrupt lied (GT0)!\n");
> @@ -1360,10 +1368,11 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>   		if (tmp) {
>   			ret = IRQ_HANDLED;
>   			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
> +			ring = &dev_priv->ring[VCS];
>   			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[VCS]);
> +				notify_ring(dev, ring);
>   			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> -			     DRM_DEBUG_DRIVER("TODO: Context switch\n");
> +				gen8_handle_context_events(ring);
>   			I915_WRITE(GEN8_GT_IIR(1), tmp);
>   		} else
>   			DRM_ERROR("The master control interrupt lied (GT1)!\n");
> @@ -1374,10 +1383,11 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>   		if (tmp) {
>   			ret = IRQ_HANDLED;
>   			vecs = tmp >> GEN8_VECS_IRQ_SHIFT;
> +			ring = &dev_priv->ring[VECS];
>   			if (vecs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, &dev_priv->ring[VECS]);
> +				notify_ring(dev, ring);
>   			if (vecs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> -			     DRM_DEBUG_DRIVER("TODO: Context switch\n");
> +				gen8_handle_context_events(ring);
>   			I915_WRITE(GEN8_GT_IIR(3), tmp);
>   		} else
>   			DRM_ERROR("The master control interrupt lied (GT3)!\n");
> diff --git a/drivers/gpu/drm/i915/i915_lrc.c b/drivers/gpu/drm/i915/i915_lrc.c
> index 4cacabb..440da11 100644
> --- a/drivers/gpu/drm/i915/i915_lrc.c
> +++ b/drivers/gpu/drm/i915/i915_lrc.c
> @@ -46,7 +46,24 @@
>   #define GEN8_LR_CONTEXT_SIZE (21 * PAGE_SIZE)
>
>   #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
> +#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
>   #define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
> +#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
> +#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
> +
> +#define RING_EXECLIST_QFULL		(1 << 0x2)
> +#define RING_EXECLIST1_VALID		(1 << 0x3)
> +#define RING_EXECLIST0_VALID		(1 << 0x4)
> +#define RING_EXECLIST_ACTIVE_STATUS	(3 << 0xE)
> +#define RING_EXECLIST1_ACTIVE		(1 << 0x11)
> +#define RING_EXECLIST0_ACTIVE		(1 << 0x12)
> +
> +#define GEN8_CTX_STATUS_IDLE_ACTIVE	(1 << 0)
> +#define GEN8_CTX_STATUS_PREEMPTED	(1 << 1)
> +#define GEN8_CTX_STATUS_ELEMENT_SWITCH	(1 << 2)
> +#define GEN8_CTX_STATUS_ACTIVE_IDLE	(1 << 3)
> +#define GEN8_CTX_STATUS_COMPLETE	(1 << 4)
> +#define GEN8_CTX_STATUS_LITE_RESTORE	(1 << 15)
>
>   #define CTX_LRI_HEADER_0		0x01
>   #define CTX_CONTEXT_CONTROL		0x02
> @@ -237,6 +254,9 @@ static void gen8_switch_context_unqueue(struct intel_engine *ring)
>   {
>   	struct drm_i915_gem_request *req0 = NULL, *req1 = NULL;
>   	struct drm_i915_gem_request *cursor = NULL, *tmp = NULL;
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +
> +	assert_spin_locked(&ring->execlist_lock);
>
>   	if (list_empty(&ring->execlist_queue))
>   		return;
> @@ -249,8 +269,7 @@ static void gen8_switch_context_unqueue(struct intel_engine *ring)
>   			/* Same ID: ignore first request, as second request
>   			 * will update tail past first request's workload */
>   			list_del(&req0->execlist_link);
> -			i915_gem_context_unreference(req0->ctx);
> -			kfree(req0);
> +			queue_work(dev_priv->wq, &req0->work);
>   			req0 = cursor;
>   		} else {
>   			req1 = cursor;
> @@ -262,6 +281,83 @@ static void gen8_switch_context_unqueue(struct intel_engine *ring)
>   			req1? req1->ctx : NULL, req1? req1->tail : 0));
>   }
>
> +static bool check_remove_request(struct intel_engine *ring, u32 request_id)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct drm_i915_gem_request *head_req;
> +
> +	assert_spin_locked(&ring->execlist_lock);
> +
> +	head_req = list_first_entry_or_null(&ring->execlist_queue,
> +			struct drm_i915_gem_request, execlist_link);
> +	if (head_req != NULL) {
> +		if (get_submission_id(head_req->ctx) == request_id) {
> +			list_del(&head_req->execlist_link);
> +			queue_work(dev_priv->wq, &head_req->work);
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
> +void gen8_handle_context_events(struct intel_engine *ring)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	u32 status_pointer;
> +	u8 read_pointer;
> +	u8 write_pointer;
> +	u32 status;
> +	u32 status_id;
> +	u32 submit_contexts = 0;
> +
> +	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
> +
> +	read_pointer = ring->next_context_status_buffer;
> +	write_pointer = status_pointer & 0x07;
> +	if (read_pointer > write_pointer)
> +		write_pointer += 6;
> +
> +	spin_lock(&ring->execlist_lock);
> +
> +	while (read_pointer < write_pointer) {
> +		read_pointer++;
> +		status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> +				(read_pointer % 6) * 8);
> +		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
> +				(read_pointer % 6) * 8 + 4);
> +
> +		if (status & GEN8_CTX_STATUS_ELEMENT_SWITCH) {
> +			if (check_remove_request(ring, status_id))
> +				submit_contexts++;
> +		} else if (status & GEN8_CTX_STATUS_COMPLETE) {
> +			if (check_remove_request(ring, status_id))
> +				submit_contexts++;
> +		}
> +	}
> +
> +	if (submit_contexts != 0)
> +		gen8_switch_context_unqueue(ring);
> +
> +	spin_unlock(&ring->execlist_lock);
> +
> +	WARN(submit_contexts > 2, "More than two context complete events?\n");
> +	ring->next_context_status_buffer = write_pointer % 6;
> +}
> +
> +static void free_request_task(struct work_struct *work)
> +{
> +	struct drm_i915_gem_request *req =
> +			container_of(work, struct drm_i915_gem_request, work);
> +	struct drm_device *dev = req->ring->dev;
> +
> +	mutex_lock(&dev->struct_mutex);
> +	i915_gem_context_unreference(req->ctx);
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	kfree(req);
> +}
> +
>   int gen8_switch_context_queue(struct intel_engine *ring,
>   			      struct i915_hw_context *to,
>   			      u32 tail)
> @@ -276,6 +372,7 @@ int gen8_switch_context_queue(struct intel_engine *ring,
>   	req->ctx = to;
>   	i915_gem_context_reference(req->ctx);
>   	req->tail = tail;
> +	INIT_WORK(&req->work, free_request_task);
>
>   	spin_lock_irqsave(&ring->execlist_lock, flags);
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index a92bede..ee5a220 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1464,6 +1464,7 @@ static int intel_init_ring(struct drm_device *dev,
>   		if (ring->status_page.page_addr == NULL)
>   			return -ENOMEM;
>   		ring->status_page.obj = obj;
> +		ring->next_context_status_buffer = 0;
>   	} else if (I915_NEED_GFX_HWS(dev)) {
>   		ret = init_status_page(ring);
>   		if (ret)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 5f4fd3c..daca04e 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -173,6 +173,7 @@ struct intel_engine {
>
>   	struct i915_hw_context *default_context;
>   	struct i915_hw_context *last_context;
> +	u8 next_context_status_buffer;
>
>   	struct intel_ring_hangcheck hangcheck;
>
>
oscar.mateo@intel.com April 28, 2014, 2:43 p.m. UTC | #4
> > >   		tmp = I915_READ(GEN8_GT_IIR(0));
> > >   		if (tmp) {
> > >   			ret = IRQ_HANDLED;
> > > +
> > >   			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
> > > -			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
> > > +			ring = &dev_priv->ring[RCS];
> > >   			if (rcs & GT_RENDER_USER_INTERRUPT)
> > > -				notify_ring(dev, &dev_priv->ring[RCS]);
> > > +				notify_ring(dev, ring);
> > > +			if (rcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
> > > +				gen8_handle_context_events(ring);
> > 
> > Handling the context events here can generate a new execlist submission
> > which, if the workload is small enough, can finish and generate a new
> > context event interrupt before we ack this interrupt.
> > 
> > When we ack this interrupt, we clear the new one too, losing an interrupt.
> > 
> > Moving the
> > 
> > I915_WRITE(GEN8_GT_IIR(0), tmp);
> > 
> > to just inside the if (tmp) { conditional (or anywhere before this call) fixes this
> > issue. There is no harm in acking the interrupt immediately as we have the
> > read stored in tmp.
> > 
> -----Original Message-----
> From: Daniel, Thomas
> Sent: Monday, April 28, 2014 10:58 AM
> To: Beckett, Robert; Mateo Lozano, Oscar; Barbalho, Rafael; Ewins, Jon
> Subject: RE: Re: [Intel-gfx] [PATCH 43/49] drm/i915/bdw: Handle context switch
> events
> 
> Hi Bob,
> 
> Looks like a good catch, and a sensible fix.
> 
> Thomas.

I agree with Thomas. Will add to the next revision of the series.

Thanks!
Oscar

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2607664..4c8cf52 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1679,6 +1679,8 @@  struct drm_i915_gem_request {
 
 	/** execlist queue entry for this request */
 	struct list_head execlist_link;
+	/** Struct to handle this request in the bottom half of an interrupt */
+	struct work_struct work;
 };
 
 struct drm_i915_file_private {
@@ -2344,6 +2346,7 @@  void gen8_gem_context_free(struct i915_hw_context *ctx);
 int gen8_switch_context_queue(struct intel_engine *ring,
 			      struct i915_hw_context *to,
 			      u32 tail);
+void gen8_handle_context_events(struct intel_engine *ring);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 56657b5..6e0f456 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1334,6 +1334,7 @@  static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 				       struct drm_i915_private *dev_priv,
 				       u32 master_ctl)
 {
+	struct intel_engine *ring;
 	u32 rcs, bcs, vcs, vecs;
 	uint32_t tmp = 0;
 	irqreturn_t ret = IRQ_NONE;
@@ -1342,14 +1343,21 @@  static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 		tmp = I915_READ(GEN8_GT_IIR(0));
 		if (tmp) {
 			ret = IRQ_HANDLED;
+
 			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
-			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
+			ring = &dev_priv->ring[RCS];
 			if (rcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[RCS]);
+				notify_ring(dev, ring);
+			if (rcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				gen8_handle_context_events(ring);
+
+			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
+			ring = &dev_priv->ring[BCS];
 			if (bcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[BCS]);
-			if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-			     DRM_DEBUG_DRIVER("TODO: Context switch\n");
+				notify_ring(dev, ring);
+			if (bcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+				gen8_handle_context_events(ring);
+
 			I915_WRITE(GEN8_GT_IIR(0), tmp);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT0)!\n");
@@ -1360,10 +1368,11 @@  static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 		if (tmp) {
 			ret = IRQ_HANDLED;
 			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
+			ring = &dev_priv->ring[VCS];
 			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[VCS]);
+				notify_ring(dev, ring);
 			if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-			     DRM_DEBUG_DRIVER("TODO: Context switch\n");
+				gen8_handle_context_events(ring);
 			I915_WRITE(GEN8_GT_IIR(1), tmp);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT1)!\n");
@@ -1374,10 +1383,11 @@  static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 		if (tmp) {
 			ret = IRQ_HANDLED;
 			vecs = tmp >> GEN8_VECS_IRQ_SHIFT;
+			ring = &dev_priv->ring[VECS];
 			if (vecs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, &dev_priv->ring[VECS]);
+				notify_ring(dev, ring);
 			if (vecs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-			     DRM_DEBUG_DRIVER("TODO: Context switch\n");
+				gen8_handle_context_events(ring);
 			I915_WRITE(GEN8_GT_IIR(3), tmp);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT3)!\n");
diff --git a/drivers/gpu/drm/i915/i915_lrc.c b/drivers/gpu/drm/i915/i915_lrc.c
index 4cacabb..440da11 100644
--- a/drivers/gpu/drm/i915/i915_lrc.c
+++ b/drivers/gpu/drm/i915/i915_lrc.c
@@ -46,7 +46,24 @@ 
 #define GEN8_LR_CONTEXT_SIZE (21 * PAGE_SIZE)
 
 #define RING_ELSP(ring)			((ring)->mmio_base+0x230)
+#define RING_EXECLIST_STATUS(ring)	((ring)->mmio_base+0x234)
 #define RING_CONTEXT_CONTROL(ring)	((ring)->mmio_base+0x244)
+#define RING_CONTEXT_STATUS_BUF(ring)	((ring)->mmio_base+0x370)
+#define RING_CONTEXT_STATUS_PTR(ring)	((ring)->mmio_base+0x3a0)
+
+#define RING_EXECLIST_QFULL		(1 << 0x2)
+#define RING_EXECLIST1_VALID		(1 << 0x3)
+#define RING_EXECLIST0_VALID		(1 << 0x4)
+#define RING_EXECLIST_ACTIVE_STATUS	(3 << 0xE)
+#define RING_EXECLIST1_ACTIVE		(1 << 0x11)
+#define RING_EXECLIST0_ACTIVE		(1 << 0x12)
+
+#define GEN8_CTX_STATUS_IDLE_ACTIVE	(1 << 0)
+#define GEN8_CTX_STATUS_PREEMPTED	(1 << 1)
+#define GEN8_CTX_STATUS_ELEMENT_SWITCH	(1 << 2)
+#define GEN8_CTX_STATUS_ACTIVE_IDLE	(1 << 3)
+#define GEN8_CTX_STATUS_COMPLETE	(1 << 4)
+#define GEN8_CTX_STATUS_LITE_RESTORE	(1 << 15)
 
 #define CTX_LRI_HEADER_0		0x01
 #define CTX_CONTEXT_CONTROL		0x02
@@ -237,6 +254,9 @@  static void gen8_switch_context_unqueue(struct intel_engine *ring)
 {
 	struct drm_i915_gem_request *req0 = NULL, *req1 = NULL;
 	struct drm_i915_gem_request *cursor = NULL, *tmp = NULL;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	assert_spin_locked(&ring->execlist_lock);
 
 	if (list_empty(&ring->execlist_queue))
 		return;
@@ -249,8 +269,7 @@  static void gen8_switch_context_unqueue(struct intel_engine *ring)
 			/* Same ID: ignore first request, as second request
 			 * will update tail past first request's workload */
 			list_del(&req0->execlist_link);
-			i915_gem_context_unreference(req0->ctx);
-			kfree(req0);
+			queue_work(dev_priv->wq, &req0->work);
 			req0 = cursor;
 		} else {
 			req1 = cursor;
@@ -262,6 +281,83 @@  static void gen8_switch_context_unqueue(struct intel_engine *ring)
 			req1? req1->ctx : NULL, req1? req1->tail : 0));
 }
 
+static bool check_remove_request(struct intel_engine *ring, u32 request_id)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct drm_i915_gem_request *head_req;
+
+	assert_spin_locked(&ring->execlist_lock);
+
+	head_req = list_first_entry_or_null(&ring->execlist_queue,
+			struct drm_i915_gem_request, execlist_link);
+	if (head_req != NULL) {
+		if (get_submission_id(head_req->ctx) == request_id) {
+			list_del(&head_req->execlist_link);
+			queue_work(dev_priv->wq, &head_req->work);
+			return true;
+		}
+	}
+
+	return false;
+}
+
+void gen8_handle_context_events(struct intel_engine *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	u32 status_pointer;
+	u8 read_pointer;
+	u8 write_pointer;
+	u32 status;
+	u32 status_id;
+	u32 submit_contexts = 0;
+
+	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
+
+	read_pointer = ring->next_context_status_buffer;
+	write_pointer = status_pointer & 0x07;
+	if (read_pointer > write_pointer)
+		write_pointer += 6;
+
+	spin_lock(&ring->execlist_lock);
+
+	while (read_pointer < write_pointer) {
+		read_pointer++;
+		status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
+				(read_pointer % 6) * 8);
+		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
+				(read_pointer % 6) * 8 + 4);
+
+		if (status & GEN8_CTX_STATUS_ELEMENT_SWITCH) {
+			if (check_remove_request(ring, status_id))
+				submit_contexts++;
+		} else if (status & GEN8_CTX_STATUS_COMPLETE) {
+			if (check_remove_request(ring, status_id))
+				submit_contexts++;
+		}
+	}
+
+	if (submit_contexts != 0)
+		gen8_switch_context_unqueue(ring);
+
+	spin_unlock(&ring->execlist_lock);
+
+	WARN(submit_contexts > 2, "More than two context complete events?\n");
+	ring->next_context_status_buffer = write_pointer % 6;
+}
+
+static void free_request_task(struct work_struct *work)
+{
+	struct drm_i915_gem_request *req =
+			container_of(work, struct drm_i915_gem_request, work);
+	struct drm_device *dev = req->ring->dev;
+
+	mutex_lock(&dev->struct_mutex);
+	i915_gem_context_unreference(req->ctx);
+	mutex_unlock(&dev->struct_mutex);
+
+	kfree(req);
+}
+
 int gen8_switch_context_queue(struct intel_engine *ring,
 			      struct i915_hw_context *to,
 			      u32 tail)
@@ -276,6 +372,7 @@  int gen8_switch_context_queue(struct intel_engine *ring,
 	req->ctx = to;
 	i915_gem_context_reference(req->ctx);
 	req->tail = tail;
+	INIT_WORK(&req->work, free_request_task);
 
 	spin_lock_irqsave(&ring->execlist_lock, flags);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a92bede..ee5a220 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1464,6 +1464,7 @@  static int intel_init_ring(struct drm_device *dev,
 		if (ring->status_page.page_addr == NULL)
 			return -ENOMEM;
 		ring->status_page.obj = obj;
+		ring->next_context_status_buffer = 0;
 	} else if (I915_NEED_GFX_HWS(dev)) {
 		ret = init_status_page(ring);
 		if (ret)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 5f4fd3c..daca04e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -173,6 +173,7 @@  struct intel_engine {
 
 	struct i915_hw_context *default_context;
 	struct i915_hw_context *last_context;
+	u8 next_context_status_buffer;
 
 	struct intel_ring_hangcheck hangcheck;