Message ID | 20181119152229.8390-1-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [i-g-t,1/2] tests/gem_exec_await: Relax the busy spinner |
Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Add some nop instructions between recursive batch buffer start calls to
> give the system some breathing room. Without these, especially when
> coupled with memory pressure, false GPU hangs can be observed, caused
> by the inability of the chip to cope.

Doesn't seem to be required. And the machines most susceptible to timer
errors due to busyspin have not shown the issue.
-Chris
On 19/11/2018 15:28, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Add some nop instructions between recursive batch buffer start calls to
>> give the system some breathing room. Without these, especially when
>> coupled with memory pressure, false GPU hangs can be observed, caused
>> by the inability of the chip to cope.
>
> Doesn't seem to be required. And the machines most susceptible to timer
> errors due to busyspin have not shown the issue.

With the memory pressure subtest (the second patch in this series),
having the nops was make or break. Without them it was GPU hangs all
around; with them, all clean so far.

Regards,

Tvrtko
Quoting Tvrtko Ursulin (2018-11-19 15:33:56)
> On 19/11/2018 15:28, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Add some nop instructions between recursive batch buffer start calls to
> >> give the system some breathing room. Without these, especially when
> >> coupled with memory pressure, false GPU hangs can be observed, caused
> >> by the inability of the chip to cope.
> >
> > Doesn't seem to be required. And the machines most susceptible to timer
> > errors due to busyspin have not shown the issue.
>
> With the memory pressure subtest (the second patch in this series),
> having the nops was make or break. Without them it was GPU hangs all
> around; with them, all clean so far.

First machine bsw, just applying patch 2/2:

IGT-Version: 1.23-gb6b8d829 (x86_64) (Linux: 4.20.0-rc2+ x86_64)
Using Execlists submission
Ring size: 131 batches
Starting subtest: wide-all
wide: 420 cycles: 24121.034us
Subtest wide-all: SUCCESS (47.060s)
Starting subtest: wide-contexts
wide: 340 cycles: 24893.896us
Subtest wide-contexts: SUCCESS (22.265s)
Starting subtest: wide-contexts-mempressure
wide: 232 cycles: 25153.899us
Subtest wide-contexts-mempressure: SUCCESS (23.141s)

:|
-Chris
On 19/11/2018 16:18, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-11-19 15:33:56)
>> On 19/11/2018 15:28, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> Add some nop instructions between recursive batch buffer start calls
>>>> to give the system some breathing room. Without these, especially
>>>> when coupled with memory pressure, false GPU hangs can be observed,
>>>> caused by the inability of the chip to cope.
>>>
>>> Doesn't seem to be required. And the machines most susceptible to
>>> timer errors due to busyspin have not shown the issue.
>>
>> With the memory pressure subtest (the second patch in this series),
>> having the nops was make or break. Without them it was GPU hangs all
>> around; with them, all clean so far.
>
> First machine bsw, just applying patch 2/2:
>
> IGT-Version: 1.23-gb6b8d829 (x86_64) (Linux: 4.20.0-rc2+ x86_64)
> Using Execlists submission
> Ring size: 131 batches
> Starting subtest: wide-all
> wide: 420 cycles: 24121.034us
> Subtest wide-all: SUCCESS (47.060s)
> Starting subtest: wide-contexts
> wide: 340 cycles: 24893.896us
> Subtest wide-contexts: SUCCESS (22.265s)
> Starting subtest: wide-contexts-mempressure
> wide: 232 cycles: 25153.899us
> Subtest wide-contexts-mempressure: SUCCESS (23.141s)
>
> :|

Yes, I think what I had before I cleaned up the test case was more of a
copy&paste of the memory pressure thread from gem_syslatency, including
the rtprio and the multithreadedness. So I was possibly starving the
tasklets, and who knows what else, as well as applying memory pressure.
The fact remains, however, that adding nops to the spinner made even
that monster pass repeatedly. I'll play with it more tomorrow.

Regards,

Tvrtko
Quoting Tvrtko Ursulin (2018-11-19 19:18:52)
> On 19/11/2018 16:18, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-11-19 15:33:56)
> >> On 19/11/2018 15:28, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
> >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>> Add some nop instructions between recursive batch buffer start
> >>>> calls to give the system some breathing room. Without these,
> >>>> especially when coupled with memory pressure, false GPU hangs can
> >>>> be observed, caused by the inability of the chip to cope.
> >>>
> >>> Doesn't seem to be required. And the machines most susceptible to
> >>> timer errors due to busyspin have not shown the issue.
> >>
> >> With the memory pressure subtest (the second patch in this series),
> >> having the nops was make or break. Without them it was GPU hangs
> >> all around; with them, all clean so far.
> >
> > First machine bsw, just applying patch 2/2:
> >
> > IGT-Version: 1.23-gb6b8d829 (x86_64) (Linux: 4.20.0-rc2+ x86_64)
> > Using Execlists submission
> > Ring size: 131 batches
> > Starting subtest: wide-all
> > wide: 420 cycles: 24121.034us
> > Subtest wide-all: SUCCESS (47.060s)
> > Starting subtest: wide-contexts
> > wide: 340 cycles: 24893.896us
> > Subtest wide-contexts: SUCCESS (22.265s)
> > Starting subtest: wide-contexts-mempressure
> > wide: 232 cycles: 25153.899us
> > Subtest wide-contexts-mempressure: SUCCESS (23.141s)
> >
> > :|
>
> Yes, I think what I had before I cleaned up the test case was more of a
> copy&paste of the memory pressure thread from gem_syslatency, including
> the rtprio and the multithreadedness. So I was possibly starving the
> tasklets, and who knows what else, as well as applying memory pressure.
> The fact remains, however, that adding nops to the spinner made even
> that monster pass repeatedly. I'll play with it more tomorrow.

I am a bit nervous about using the noops to avoid the issue, as I presume
that there is a more realistic workload that could generate similar
system latencies, i.e. that there exists a pathological case that users
will hit for similar stalls (the gem shrinker, perchance).
-Chris
diff --git a/tests/i915/gem_exec_await.c b/tests/i915/gem_exec_await.c
index 5cfeb8ec8bfa..3ea5b5903c6b 100644
--- a/tests/i915/gem_exec_await.c
+++ b/tests/i915/gem_exec_await.c
@@ -63,6 +63,7 @@ static void xchg_obj(void *array, unsigned i, unsigned j)
 static void wide(int fd, int ring_size, int timeout, unsigned int flags)
 {
 	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	const unsigned int num_nops = 1000;
 	const int gen = intel_gen(intel_get_drm_devid(fd));
 	struct {
 		struct drm_i915_gem_exec_object2 *obj;
@@ -123,7 +124,7 @@ static void wide(int fd, int ring_size, int timeout, unsigned int flags)
 		exec[e].execbuf.buffer_count = 2;
 
 		exec[e].reloc.target_handle = 1; /* recurse */
-		exec[e].reloc.offset = sizeof(uint32_t);
+		exec[e].reloc.offset = (1 + num_nops) * sizeof(uint32_t);
 		exec[e].reloc.read_domains = I915_GEM_DOMAIN_COMMAND;
 		if (gen < 4)
 			exec[e].reloc.delta = 1;
@@ -162,7 +163,7 @@ static void wide(int fd, int ring_size, int timeout, unsigned int flags)
 		gem_set_domain(fd, exec[e].exec[1].handle,
 			       I915_GEM_DOMAIN_WC, I915_GEM_DOMAIN_WC);
 
-		i = 0;
+		i = num_nops;
 		exec[e].cmd[i] = MI_BATCH_BUFFER_START;
 		if (gen >= 8) {
 			exec[e].cmd[i] |= 1 << 8 | 1;
@@ -200,7 +201,7 @@ static void wide(int fd, int ring_size, int timeout, unsigned int flags)
 		count += nengine;
 
 		for (unsigned e = 0; e < nengine; e++)
-			exec[e].cmd[0] = MI_BATCH_BUFFER_END;
+			exec[e].cmd[num_nops] = MI_BATCH_BUFFER_END;
 
 		__sync_synchronize();
 	}