
[i-g-t,1/2] tests/gem_exec_await: Relax the busy spinner

Message ID 20181119152229.8390-1-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived
Series [i-g-t,1/2] tests/gem_exec_await: Relax the busy spinner

Commit Message

Tvrtko Ursulin Nov. 19, 2018, 3:22 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Add some nop instructions between recursive batch buffer start calls to
give the system some breathing room. Without them, especially when
coupled with memory pressure, false GPU hangs can be observed, caused by
the inability of the chip to cope.
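
For illustration, a minimal sketch of the relaxed spinner (gen8+
addressing assumed; emit_spinner/stop_spinner are illustrative helpers,
not code from the test):

#include <stdint.h>

#define MI_NOOP			0x00000000
#define MI_BATCH_BUFFER_START	(0x31 << 23)
#define MI_BATCH_BUFFER_END	(0xa << 23)

static void emit_spinner(uint32_t *cmd, unsigned int num_nops, uint64_t addr)
{
	unsigned int i = num_nops;	/* cmd[0..num_nops-1] remain MI_NOOP */

	cmd[i] = MI_BATCH_BUFFER_START | 1 << 8 | 1; /* gen8+: 3 dwords, PPGTT */
	cmd[++i] = addr;		/* batch address, low 32 bits */
	cmd[++i] = addr >> 32;		/* batch address, high 32 bits */
}

static void stop_spinner(uint32_t *cmd, unsigned int num_nops)
{
	/* Overwrite the self-referencing jump to let the batch terminate. */
	cmd[num_nops] = MI_BATCH_BUFFER_END;
}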

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106680
---
 tests/i915/gem_exec_await.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Chris Wilson Nov. 19, 2018, 3:28 p.m. UTC | #1
Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Add some nop instructions between recursive batch buffer start calls to
> give the system some breathing room. Without them, especially when
> coupled with memory pressure, false GPU hangs can be observed, caused by
> the inability of the chip to cope.

Doesn't seem to be required. And the machines most susceptible to timer
errors due to busyspin have not shown the issue.
-Chris
Tvrtko Ursulin Nov. 19, 2018, 3:33 p.m. UTC | #2
On 19/11/2018 15:28, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Add some nop instructions between recursive batch buffer start calls to
>> give the system some breathing room. Without them, especially when
>> coupled with memory pressure, false GPU hangs can be observed, caused by
>> the inability of the chip to cope.
> 
> Doesn't seem to be required. And the machines most susceptible to timer
> errors due to busyspin have not shown the issue.

With the memory pressure subtest, the second patch in this series,
having the nops was make or break. Without them it was GPU hangs all
around; with them, so far all clean.

Regards,

Tvrtko
Chris Wilson Nov. 19, 2018, 4:18 p.m. UTC | #3
Quoting Tvrtko Ursulin (2018-11-19 15:33:56)
> 
> On 19/11/2018 15:28, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Add some nop instructions between recursive batch buffer start calls to
> >> give the system some breathing room. Without them, especially when
> >> coupled with memory pressure, false GPU hangs can be observed, caused by
> >> the inability of the chip to cope.
> > 
> > Doesn't seem to be required. And the machines most susceptible to timer
> > errors due to busyspin have not shown the issue.
> 
> With the memory pressure subtest, the second patch in this series,
> having the nops was make or break. Without them it was GPU hangs all
> around; with them, so far all clean.

First machine, bsw, with just patch 2/2 applied:

IGT-Version: 1.23-gb6b8d829 (x86_64) (Linux: 4.20.0-rc2+ x86_64)
Using Execlists submission
Ring size: 131 batches
Starting subtest: wide-all
wide: 420 cycles: 24121.034us
Subtest wide-all: SUCCESS (47.060s)
Starting subtest: wide-contexts
wide: 340 cycles: 24893.896us
Subtest wide-contexts: SUCCESS (22.265s)
Starting subtest: wide-contexts-mempressure
wide: 232 cycles: 25153.899us
Subtest wide-contexts-mempressure: SUCCESS (23.141s)

:|
-Chris
Tvrtko Ursulin Nov. 19, 2018, 7:18 p.m. UTC | #4
On 19/11/2018 16:18, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-11-19 15:33:56)
>>
>> On 19/11/2018 15:28, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> Add some nop instructions between recursive batch buffer start calls to
>>>> give the system some breathing room. Without them, especially when
>>>> coupled with memory pressure, false GPU hangs can be observed, caused by
>>>> the inability of the chip to cope.
>>>
>>> Doesn't seem to be required. And the machines most susceptible to timer
>>> errors due to busyspin have not shown the issue.
>>
>> With the memory pressure subtest, the second patch in this series,
>> having the nops was make or break. Without them it was GPU hangs all
>> around; with them, so far all clean.
> 
> First machine, bsw, with just patch 2/2 applied:
> 
> IGT-Version: 1.23-gb6b8d829 (x86_64) (Linux: 4.20.0-rc2+ x86_64)
> Using Execlists submission
> Ring size: 131 batches
> Starting subtest: wide-all
> wide: 420 cycles: 24121.034us
> Subtest wide-all: SUCCESS (47.060s)
> Starting subtest: wide-contexts
> wide: 340 cycles: 24893.896us
> Subtest wide-contexts: SUCCESS (22.265s)
> Starting subtest: wide-contexts-mempressure
> wide: 232 cycles: 25153.899us
> Subtest wide-contexts-mempressure: SUCCESS (23.141s)
> 
> :|

Yes, I think what I had before I cleaned up the test case was more of a
copy&paste of the memory pressure thread from gem_syslatency - including
the rtprio and the multi-threading. So I was possibly starving the
tasklets, and who knows what else, as well as applying memory pressure.
However, the fact remains that adding nops to the spinner made even that
monster pass repeatedly. I'll play with it more tomorrow.
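
Roughly this kind of thing (a hypothetical sketch of that thread, not
the actual code - the name mempressure, the priority and the 64MiB chunk
size are all made up):

#include <pthread.h>
#include <sched.h>
#include <string.h>
#include <sys/mman.h>

#define CHUNK (64ul << 20)	/* 64MiB per allocation, arbitrary */

static void *mempressure(void *arg)
{
	struct sched_param param = { .sched_priority = 1 };

	/* rtprio, so it can starve tasklets/softirqs as described above */
	pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);

	for (;;) {
		void *ptr = mmap(NULL, CHUNK, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (ptr == MAP_FAILED)
			continue;

		memset(ptr, 0x5a, CHUNK);	/* fault in every page */
		munmap(ptr, CHUNK);
	}

	return NULL;
}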

Regards,

Tvrtko
Chris Wilson Nov. 19, 2018, 7:34 p.m. UTC | #5
Quoting Tvrtko Ursulin (2018-11-19 19:18:52)
> 
> On 19/11/2018 16:18, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-11-19 15:33:56)
> >>
> >> On 19/11/2018 15:28, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
> >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>> Add some nop instructions between recursive batch buffer start calls to
> >>>> give the system some breathing room. Without them, especially when
> >>>> coupled with memory pressure, false GPU hangs can be observed, caused by
> >>>> the inability of the chip to cope.
> >>>
> >>> Doesn't seem to be required. And the machines most susceptible to timer
> >>> errors due to busyspin have not shown the issue.
> >>
> >> With the memory pressure subtest, the second patch in this series,
> >> having the nops was make or break. Without them it was GPU hangs all
> >> around; with them, so far all clean.
> > 
> > First machine, bsw, with just patch 2/2 applied:
> > 
> > IGT-Version: 1.23-gb6b8d829 (x86_64) (Linux: 4.20.0-rc2+ x86_64)
> > Using Execlists submission
> > Ring size: 131 batches
> > Starting subtest: wide-all
> > wide: 420 cycles: 24121.034us
> > Subtest wide-all: SUCCESS (47.060s)
> > Starting subtest: wide-contexts
> > wide: 340 cycles: 24893.896us
> > Subtest wide-contexts: SUCCESS (22.265s)
> > Starting subtest: wide-contexts-mempressure
> > wide: 232 cycles: 25153.899us
> > Subtest wide-contexts-mempressure: SUCCESS (23.141s)
> > 
> > :|
> 
> Yes, I think what I had before I cleaned up the test case was more of a
> copy&paste of the memory pressure thread from gem_syslatency - including
> the rtprio and the multi-threading. So I was possibly starving the
> tasklets, and who knows what else, as well as applying memory pressure.
> However, the fact remains that adding nops to the spinner made even that
> monster pass repeatedly. I'll play with it more tomorrow.

I am a bit nervous about using the nops to avoid the issue, as I presume
that there is a more realistic workload that could generate similar
system latencies, i.e. that there exists a pathological case that users
will hit, causing similar stalls (the gem shrinker, perchance).
-Chris

Patch

diff --git a/tests/i915/gem_exec_await.c b/tests/i915/gem_exec_await.c
index 5cfeb8ec8bfa..3ea5b5903c6b 100644
--- a/tests/i915/gem_exec_await.c
+++ b/tests/i915/gem_exec_await.c
@@ -63,6 +63,7 @@  static void xchg_obj(void *array, unsigned i, unsigned j)
 static void wide(int fd, int ring_size, int timeout, unsigned int flags)
 {
 	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	const unsigned int num_nops = 1000;
 	const int gen = intel_gen(intel_get_drm_devid(fd));
 	struct {
 		struct drm_i915_gem_exec_object2 *obj;
@@ -123,7 +124,7 @@  static void wide(int fd, int ring_size, int timeout, unsigned int flags)
 		exec[e].execbuf.buffer_count = 2;
 
 		exec[e].reloc.target_handle = 1; /* recurse */
-		exec[e].reloc.offset = sizeof(uint32_t);
+		exec[e].reloc.offset = (1 + num_nops) * sizeof(uint32_t);
 		exec[e].reloc.read_domains = I915_GEM_DOMAIN_COMMAND;
 		if (gen < 4)
 			exec[e].reloc.delta = 1;
@@ -162,7 +163,7 @@  static void wide(int fd, int ring_size, int timeout, unsigned int flags)
 			gem_set_domain(fd, exec[e].exec[1].handle,
 				       I915_GEM_DOMAIN_WC, I915_GEM_DOMAIN_WC);
 
-			i = 0;
+			i = num_nops;
 			exec[e].cmd[i] = MI_BATCH_BUFFER_START;
 			if (gen >= 8) {
 				exec[e].cmd[i] |= 1 << 8 | 1;
@@ -200,7 +201,7 @@  static void wide(int fd, int ring_size, int timeout, unsigned int flags)
 		count += nengine;
 
 		for (unsigned e = 0; e < nengine; e++)
-			exec[e].cmd[0] = MI_BATCH_BUFFER_END;
+			exec[e].cmd[num_nops] = MI_BATCH_BUFFER_END;
 		__sync_synchronize();
 	}