diff mbox

[RFC,2/8] drm/i915: Skip CSB processing on invalid CSB tail

Message ID 20180316183105.16027-3-jeff.mcgee@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

jeff.mcgee@intel.com March 16, 2018, 6:30 p.m. UTC
From: Jeff McGee <jeff.mcgee@intel.com>

Engine reset is fast. A context switch interrupt may be generated just
prior to the reset such that the top half handler is racing with reset
post-processing. The handler may set the irq_posted bit again after
the reset code has cleared it to start fresh. Then the re-enabled
tasklet will read the CSB head and tail from MMIO, which will be at
the hardware reset values of 0 and 7 respectively, given that no
actual CSB event has occurred since the reset. Mayhem then ensues as
the tasklet starts processing invalid CSB entries.

We can handle this corner case without adding any new synchronization
between the irq top half and the reset work item. The tasklet can
just skip CSB processing if the tail is not sane.

This patch is required to support the force preemption feature.

Test: Run IGT gem_exec_fpreempt repeatedly.
Change-Id: Ic7eb00600480bd62331c397dd92b747d946241e4
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Chris Wilson March 16, 2018, 8:30 p.m. UTC | #1
Quoting jeff.mcgee@intel.com (2018-03-16 18:30:59)
> From: Jeff McGee <jeff.mcgee@intel.com>
> 
> Engine reset is fast. A context switch interrupt may be generated just
> prior to the reset such that the top half handler is racing with reset
> post-processing. The handler may set the irq_posted bit again after
> the reset code has cleared it to start fresh. Then the re-enabled
> tasklet will read the CSB head and tail from MMIO, which will be at
> the hardware reset values of 0 and 7 respectively, given that no
> actual CSB event has occurred since the reset. Mayhem then ensues as
> the tasklet starts processing invalid CSB entries.
> 
> We can handle this corner case without adding any new synchronization
> between the irq top half and the reset work item. The tasklet can
> just skip CSB processing if the tail is not sane.
> 
> This patch is required to support the force preemption feature.

Please note how this is handled currently, because this is not a new
problem.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7d93fcd56d34..5f63d1d6a2d6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -851,6 +851,14 @@  static void intel_lrc_irq_handler(unsigned long data)
 			head = readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
 			tail = GEN8_CSB_WRITE_PTR(head);
 			head = GEN8_CSB_READ_PTR(head);
+
+			/* The MMIO read CSB tail may be at the reset value of
+			 * 0x7 if there hasn't been a valid CSB event since
+			 * the engine reset. Skip on to dequeue if so.
+			 */
+			if (tail >= GEN8_CSB_ENTRIES)
+				break;
+
 			execlists->csb_head = head;
 		} else {
 			const int write_idx =
@@ -859,6 +867,7 @@  static void intel_lrc_irq_handler(unsigned long data)
 
 			head = execlists->csb_head;
 			tail = READ_ONCE(buf[write_idx]);
+			GEM_BUG_ON(tail >= GEN8_CSB_ENTRIES);
 		}
 
 		while (head != tail) {