diff mbox series

drm/i915/icl: Forcibly evict stale csb entries

Message ID 20181205134612.24822-1-mika.kuoppala@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/icl: Forcibly evict stale csb entries | expand

Commit Message

Mika Kuoppala Dec. 5, 2018, 1:46 p.m. UTC
Gen11 fails to deliver wrt global observation point on
tail/entry updates and we sometimes see old entry.

Use clflush to forcibly evict our possibly stale copy
of the cacheline in hopes that we get fresh one from gpu.
Obviously there is something amiss in the coherency protocol so
this can be consired as a workaround until real cause
is found.

The working hardware will do the evict without our cue anyways,
so the cost in there should be ameliorated by that fact.

v2: for next pass, s/flush/evict, add reset (Chris)

References: https://bugzilla.freedesktop.org/show_bug.cgi?id=108315
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Comments

Chris Wilson Dec. 5, 2018, 1:58 p.m. UTC | #1
Quoting Mika Kuoppala (2018-12-05 13:46:12)
>  static void nop_submission_tasklet(unsigned long data)
> @@ -1015,6 +1025,19 @@ static void process_csb(struct intel_engine_cs *engine)
>         } while (head != tail);
>  
>         execlists->csb_head = head;
> +
> +       /*
> +        * Gen11 has proven to fail wrt global observation point between
> +        * entry and tail update, failing on the ordering and thus
> +        * we see an old entry in the context status buffer.
> +        *
> +        * Forcibly evict out entries for the next gpu csb update,
> +        * to increase the odds that we get a fresh entries with non
> +        * working hardware. The cost for doing so comes out mostly with
> +        * the wash as hardware, working or not, will need to do the
> +        * invalidation before.
> +        */
> +       invalidate_csb_entries(&buf[0], &buf[GEN8_CSB_ENTRIES - 1]);

If it works, this is a stroke of genius.

If we hypothesize that the GPU did write the CSB entries before the head
pointer and inserted a Global Observation point beforehand, then we
theorize that they merely forgot the cc protocol, the writes to system memory is
correctly, but unordered into the cpu cache.

By using the clflush to evict our used cacheline, on the next pass we
will pull in that CSB entry cacheline back in from memory (ordered by
the rmb used for the ringbuffer) and so, if the HW engineer's
insistence that they did remember their wmb, the CSB entries will be
coherent with the head pointer.

So we remove one piece of the puzzle at what should be negligible cost,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Mika Kuoppala Dec. 7, 2018, 12:37 p.m. UTC | #2
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-12-05 13:46:12)
>>  static void nop_submission_tasklet(unsigned long data)
>> @@ -1015,6 +1025,19 @@ static void process_csb(struct intel_engine_cs *engine)
>>         } while (head != tail);
>>  
>>         execlists->csb_head = head;
>> +
>> +       /*
>> +        * Gen11 has proven to fail wrt global observation point between
>> +        * entry and tail update, failing on the ordering and thus
>> +        * we see an old entry in the context status buffer.
>> +        *
>> +        * Forcibly evict out entries for the next gpu csb update,
>> +        * to increase the odds that we get a fresh entries with non
>> +        * working hardware. The cost for doing so comes out mostly with
>> +        * the wash as hardware, working or not, will need to do the
>> +        * invalidation before.
>> +        */
>> +       invalidate_csb_entries(&buf[0], &buf[GEN8_CSB_ENTRIES - 1]);
>
> If it works, this is a stroke of genius.
>
> If we hypothesize that the GPU did write the CSB entries before the head
> pointer and inserted a Global Observation point beforehand, then we
> theorize that they merely forgot the cc protocol, the writes to system memory is
> correctly, but unordered into the cpu cache.
>
> By using the clflush to evict our used cacheline, on the next pass we
> will pull in that CSB entry cacheline back in from memory (ordered by
> the rmb used for the ringbuffer) and so, if the HW engineer's
> insistence that they did remember their wmb, the CSB entries will be
> coherent with the head pointer.
>
> So we remove one piece of the puzzle at what should be negligible cost,
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Thank you for review and kind words, pushed.
-Mika
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d7fa301b5ec7..2fe920751d94 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -765,6 +765,13 @@  execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 	execlists_clear_all_active(execlists);
 }
 
+static inline void
+invalidate_csb_entries(const u32 *first, const u32 *last)
+{
+	clflush((void *)first);
+	clflush((void *)last);
+}
+
 static void reset_csb_pointers(struct intel_engine_execlists *execlists)
 {
 	const unsigned int reset_value = GEN8_CSB_ENTRIES - 1;
@@ -780,6 +787,9 @@  static void reset_csb_pointers(struct intel_engine_execlists *execlists)
 	 */
 	execlists->csb_head = reset_value;
 	WRITE_ONCE(*execlists->csb_write, reset_value);
+
+	invalidate_csb_entries(&execlists->csb_status[0],
+			       &execlists->csb_status[GEN8_CSB_ENTRIES - 1]);
 }
 
 static void nop_submission_tasklet(unsigned long data)
@@ -1015,6 +1025,19 @@  static void process_csb(struct intel_engine_cs *engine)
 	} while (head != tail);
 
 	execlists->csb_head = head;
+
+	/*
+	 * Gen11 has proven to fail wrt global observation point between
+	 * entry and tail update, failing on the ordering and thus
+	 * we see an old entry in the context status buffer.
+	 *
+	 * Forcibly evict out entries for the next gpu csb update,
+	 * to increase the odds that we get a fresh entries with non
+	 * working hardware. The cost for doing so comes out mostly with
+	 * the wash as hardware, working or not, will need to do the
+	 * invalidation before.
+	 */
+	invalidate_csb_entries(&buf[0], &buf[GEN8_CSB_ENTRIES - 1]);
 }
 
 static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)