Message ID | 20181119152229.8390-1-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [i-g-t,1/2] tests/gem_exec_await: Relax the busy spinner |
Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Add some nop instructions between recursive batch buffer start calls to
> give the system some breathing room. Without these, especially when
> coupled with memory pressure, false GPU hangs can be observed, caused
> by the inability of the chip to cope.

Doesn't seem to be required. And the machines most susceptible to timer
errors due to busyspin have not shown the issue.
-Chris
On 19/11/2018 15:28, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Add some nop instructions between recursive batch buffer start calls to
>> give the system some breathing room. Without these, especially when
>> coupled with memory pressure, false GPU hangs can be observed, caused
>> by the inability of the chip to cope.
>
> Doesn't seem to be required. And the machines most susceptible to timer
> errors due to busyspin have not shown the issue.

With the memory pressure subtest (the second patch in this series),
having the nops was make or break. Without them it was GPU hangs all
around; with them, all clean so far.

Regards,

Tvrtko
Quoting Tvrtko Ursulin (2018-11-19 15:33:56)
> On 19/11/2018 15:28, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Add some nop instructions between recursive batch buffer start calls to
> >> give the system some breathing room. Without these, especially when
> >> coupled with memory pressure, false GPU hangs can be observed, caused
> >> by the inability of the chip to cope.
> >
> > Doesn't seem to be required. And the machines most susceptible to timer
> > errors due to busyspin have not shown the issue.
>
> With the memory pressure subtest (the second patch in this series),
> having the nops was make or break. Without them it was GPU hangs all
> around; with them, all clean so far.

First machine bsw, just applying patch 2/2:

IGT-Version: 1.23-gb6b8d829 (x86_64) (Linux: 4.20.0-rc2+ x86_64)
Using Execlists submission
Ring size: 131 batches
Starting subtest: wide-all
wide: 420 cycles: 24121.034us
Subtest wide-all: SUCCESS (47.060s)
Starting subtest: wide-contexts
wide: 340 cycles: 24893.896us
Subtest wide-contexts: SUCCESS (22.265s)
Starting subtest: wide-contexts-mempressure
wide: 232 cycles: 25153.899us
Subtest wide-contexts-mempressure: SUCCESS (23.141s)

:|
-Chris
On 19/11/2018 16:18, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-11-19 15:33:56)
>> On 19/11/2018 15:28, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> Add some nop instructions between recursive batch buffer start calls
>>>> to give the system some breathing room. Without these, especially
>>>> when coupled with memory pressure, false GPU hangs can be observed,
>>>> caused by the inability of the chip to cope.
>>>
>>> Doesn't seem to be required. And the machines most susceptible to
>>> timer errors due to busyspin have not shown the issue.
>>
>> With the memory pressure subtest (the second patch in this series),
>> having the nops was make or break. Without them it was GPU hangs all
>> around; with them, all clean so far.
>
> First machine bsw, just applying patch 2/2:
>
> IGT-Version: 1.23-gb6b8d829 (x86_64) (Linux: 4.20.0-rc2+ x86_64)
> Using Execlists submission
> Ring size: 131 batches
> Starting subtest: wide-all
> wide: 420 cycles: 24121.034us
> Subtest wide-all: SUCCESS (47.060s)
> Starting subtest: wide-contexts
> wide: 340 cycles: 24893.896us
> Subtest wide-contexts: SUCCESS (22.265s)
> Starting subtest: wide-contexts-mempressure
> wide: 232 cycles: 25153.899us
> Subtest wide-contexts-mempressure: SUCCESS (23.141s)
>
> :|

Yes, I think what I had before I cleaned up the test case was more of a
copy&paste of the memory pressure thread from gem_syslatency, including
the rtprio and the multithreadedness. So I was possibly starving the
tasklets, and who knows what else, as well as applying memory pressure.
The fact remains, however, that adding nops to the spinner made even
that monster pass repeatedly. I'll play with it more tomorrow.

Regards,

Tvrtko
Quoting Tvrtko Ursulin (2018-11-19 19:18:52)
> On 19/11/2018 16:18, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-11-19 15:33:56)
> >> On 19/11/2018 15:28, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-11-19 15:22:28)
> >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>> Add some nop instructions between recursive batch buffer start
> >>>> calls to give the system some breathing room. Without these,
> >>>> especially when coupled with memory pressure, false GPU hangs can
> >>>> be observed, caused by the inability of the chip to cope.
> >>>
> >>> Doesn't seem to be required. And the machines most susceptible to
> >>> timer errors due to busyspin have not shown the issue.
> >>
> >> With the memory pressure subtest (the second patch in this series),
> >> having the nops was make or break. Without them it was GPU hangs
> >> all around; with them, all clean so far.
> >
> > First machine bsw, just applying patch 2/2:
> >
> > IGT-Version: 1.23-gb6b8d829 (x86_64) (Linux: 4.20.0-rc2+ x86_64)
> > Using Execlists submission
> > Ring size: 131 batches
> > Starting subtest: wide-all
> > wide: 420 cycles: 24121.034us
> > Subtest wide-all: SUCCESS (47.060s)
> > Starting subtest: wide-contexts
> > wide: 340 cycles: 24893.896us
> > Subtest wide-contexts: SUCCESS (22.265s)
> > Starting subtest: wide-contexts-mempressure
> > wide: 232 cycles: 25153.899us
> > Subtest wide-contexts-mempressure: SUCCESS (23.141s)
> >
> > :|
>
> Yes, I think what I had before I cleaned up the test case was more of a
> copy&paste of the memory pressure thread from gem_syslatency, including
> the rtprio and the multithreadedness. So I was possibly starving the
> tasklets, and who knows what else, as well as applying memory pressure.
> The fact remains, however, that adding nops to the spinner made even
> that monster pass repeatedly. I'll play with it more tomorrow.

I am a bit nervous about using the noops to avoid the issue, as I presume
that there is a more realistic workload that could generate similar
system latencies, i.e. that there exists a pathological case that users
will hit for similar stalls (the gem shrinker, perchance).
-Chris
diff --git a/tests/i915/gem_exec_await.c b/tests/i915/gem_exec_await.c
index 5cfeb8ec8bfa..3ea5b5903c6b 100644
--- a/tests/i915/gem_exec_await.c
+++ b/tests/i915/gem_exec_await.c
@@ -63,6 +63,7 @@ static void xchg_obj(void *array, unsigned i, unsigned j)
 static void wide(int fd, int ring_size, int timeout, unsigned int flags)
 {
 	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	const unsigned int num_nops = 1000;
 	const int gen = intel_gen(intel_get_drm_devid(fd));
 	struct {
 		struct drm_i915_gem_exec_object2 *obj;
@@ -123,7 +124,7 @@ static void wide(int fd, int ring_size, int timeout, unsigned int flags)
 		exec[e].execbuf.buffer_count = 2;
 
 		exec[e].reloc.target_handle = 1; /* recurse */
-		exec[e].reloc.offset = sizeof(uint32_t);
+		exec[e].reloc.offset = (1 + num_nops) * sizeof(uint32_t);
 		exec[e].reloc.read_domains = I915_GEM_DOMAIN_COMMAND;
 		if (gen < 4)
 			exec[e].reloc.delta = 1;
@@ -162,7 +163,7 @@ static void wide(int fd, int ring_size, int timeout, unsigned int flags)
 		gem_set_domain(fd, exec[e].exec[1].handle,
 			       I915_GEM_DOMAIN_WC, I915_GEM_DOMAIN_WC);
 
-		i = 0;
+		i = num_nops;
 		exec[e].cmd[i] = MI_BATCH_BUFFER_START;
 		if (gen >= 8) {
 			exec[e].cmd[i] |= 1 << 8 | 1;
@@ -200,7 +201,7 @@ static void wide(int fd, int ring_size, int timeout, unsigned int flags)
 		count += nengine;
 
 		for (unsigned e = 0; e < nengine; e++)
-			exec[e].cmd[0] = MI_BATCH_BUFFER_END;
+			exec[e].cmd[num_nops] = MI_BATCH_BUFFER_END;
 
 		__sync_synchronize();
 	}