mbox series

[i-g-t,v2,0/1] tests/i915/perf: Add stress / race exercises

Message ID 20230209115039.34441-1-janusz.krzysztofik@linux.intel.com (mailing list archive)
Headers show
Series tests/i915/perf: Add stress / race exercises | expand

Message

Janusz Krzysztofik Feb. 9, 2023, 11:50 a.m. UTC
Users reported oopses on list corruptions when using i915 perf with a
number of concurrently running graphics applications.  That indicates we
are currently missing some important tests for such scenarios.  Cover
that gap.

v2: drop open-race subtest for now, not capable of triggering the user
    reported bug, but triggering other bugs which I can't see any fixes
    for queued yet,
  - move the other new subtest out of tests/i915/perf.c (Ashutosh).

Janusz Krzysztofik (1):
  tests/gem_ctx_exec: Exercise barrier race

 tests/i915/gem_ctx_exec.c | 123 ++++++++++++++++++++++++++++++++++++++
 tests/meson.build         |   9 ++-
 2 files changed, 131 insertions(+), 1 deletion(-)

Comments

Janusz Krzysztofik Feb. 10, 2023, 7:53 a.m. UTC | #1
Hi,

On Thursday, 9 February 2023 12:50:38 CET Janusz Krzysztofik wrote:
> Users reported oopses on list corruptions when using i915 perf with a
> number of concurrently running graphics applications.  That indicates we
> are currently missing some important tests for such scenarios.  Cover
> that gap.
> 
> v2: drop open-race subtest for now, not capable of triggering the user
>     reported bug, but triggering other bugs which I can't see any fixes
>     for queued yet,
>   - move the other new subtest out of tests/i915/perf.c (Ashutosh).
> 
> Janusz Krzysztofik (1):
>   tests/gem_ctx_exec: Exercise barrier race

While still waiting for CI results (BAT results don't cover the new subtest) 
I've collected results from a forced execution of the subtest in BAT scope on 
trybot: https://patchwork.freedesktop.org/series/113608/#rev2

While working as expected on most platforms, the test failed on some ancient 
ones instead of skipping.  I've fixed this issue and tested the fix 
successfully on trybot: https://patchwork.freedesktop.org/series/113608/#rev3

I'm still waiting for your comments, if any, before I submit the fixed 
version.

Thanks,
Janusz

> 
>  tests/i915/gem_ctx_exec.c | 123 ++++++++++++++++++++++++++++++++++++++
>  tests/meson.build         |   9 ++-
>  2 files changed, 131 insertions(+), 1 deletion(-)
> 
>
Kamil Konieczny Feb. 10, 2023, 11:21 a.m. UTC | #2
Hi,

On 2023-02-10 at 08:53:12 +0100, Janusz Krzysztofik wrote:
> Hi,
> 
> On Thursday, 9 February 2023 12:50:38 CET Janusz Krzysztofik wrote:
> > Users reported oopses on list corruptions when using i915 perf with a
> > number of concurrently running graphics applications.  That indicates we
> > are currently missing some important tests for such scenarios.  Cover
> > that gap.
> > 
> > v2: drop open-race subtest for now, not capable of triggering the user
> >     reported bug, but triggering other bugs which I can't see any fixes
> >     for queued yet,
> >   - move the other new subtest out of tests/i915/perf.c (Ashutosh).
> > 
> > Janusz Krzysztofik (1):
> >   tests/gem_ctx_exec: Exercise barrier race
> 
> While still waiting for CI results (BAT results don't cover the new subtest) 
> I've collected results from a forced execution of the subtest in BAT scope on 
> trybot: https://patchwork.freedesktop.org/series/113608/#rev2
> 
> While working as expected on most platforms, the test failed on some ancient 
> ones instead of skipping.  I've fixed this issue and tested the fix 
> successfully on trybot: https://patchwork.freedesktop.org/series/113608/#rev3
> 
> I'm still waiting for your comments, if any, before I submit the fixed 
> version.

Patch looks good but as you already noticed it is blacklisted
and do not cause noticeable fail. Proposed solution is to move
it to other test or to create new one, imho one you proposed

igt@gem_barrier_race@remote-request

looks good.

Regards,
Kamil

> 
> Thanks,
> Janusz
> 
> > 
> >  tests/i915/gem_ctx_exec.c | 123 ++++++++++++++++++++++++++++++++++++++
> >  tests/meson.build         |   9 ++-
> >  2 files changed, 131 insertions(+), 1 deletion(-)
> > 
> > 
> 
> 
> 
>
Janusz Krzysztofik Feb. 10, 2023, 11:56 a.m. UTC | #3
On Friday, 10 February 2023 12:21:58 CET Kamil Konieczny wrote:
> Hi,
> 
> On 2023-02-10 at 08:53:12 +0100, Janusz Krzysztofik wrote:
> > Hi,
> > 
> > On Thursday, 9 February 2023 12:50:38 CET Janusz Krzysztofik wrote:
> > > Users reported oopses on list corruptions when using i915 perf with a
> > > number of concurrently running graphics applications.  That indicates we
> > > are currently missing some important tests for such scenarios.  Cover
> > > that gap.
> > > 
> > > v2: drop open-race subtest for now, not capable of triggering the user
> > >     reported bug, but triggering other bugs which I can't see any fixes
> > >     for queued yet,
> > >   - move the other new subtest out of tests/i915/perf.c (Ashutosh).
> > > 
> > > Janusz Krzysztofik (1):
> > >   tests/gem_ctx_exec: Exercise barrier race
> > 
> > While still waiting for CI results (BAT results don't cover the new subtest) 
> > I've collected results from a forced execution of the subtest in BAT scope on 
> > trybot: https://patchwork.freedesktop.org/series/113608/#rev2
> > 
> > While working as expected on most platforms, the test failed on some ancient 
> > ones instead of skipping.  I've fixed this issue and tested the fix 
> > successfully on trybot: https://patchwork.freedesktop.org/series/113608/#rev3
> > 
> > I'm still waiting for your comments, if any, before I submit the fixed 
> > version.
> 
> Patch looks good but as you already noticed it is blacklisted
> and do not cause noticeable fail. Proposed solution is to move
> it to other test or to create new one, imho one you proposed
> 
> igt@gem_barrier_race@remote-request

OK, since I can't point out any better existing candidate, let's create a new 
test.  However, taking into account that we have some more variants in 
progress which differ on the barrier rather then remote-request side 
of workloads, and the remote-request workload will probably be common to all 
those variants, I propose a somehow reordered test naming:

igt@gem_remote_request@barrier-race

This way, we don't determine how remote requests are triggered, then we don't 
connect the new test with perf specifically in any way, and we have plenty of 
room for different workloads we may want to race against remote requests (be 
it perf triggered or not).

If perf specifically requires more thorough testing, that can be handled 
separately in a separate, perf dedicated test.

Thanks,
Janusz


> 
> looks good.
> 
> Regards,
> Kamil
> 
> > 
> > Thanks,
> > Janusz
> > 
> > > 
> > >  tests/i915/gem_ctx_exec.c | 123 ++++++++++++++++++++++++++++++++++++++
> > >  tests/meson.build         |   9 ++-
> > >  2 files changed, 131 insertions(+), 1 deletion(-)
> > > 
> > > 
> > 
> > 
> > 
> > 
>
Janusz Krzysztofik Feb. 13, 2023, 9:42 a.m. UTC | #4
On Friday, 10 February 2023 12:56:12 CET Janusz Krzysztofik wrote:
> On Friday, 10 February 2023 12:21:58 CET Kamil Konieczny wrote:
> > Hi,
> > 
> > On 2023-02-10 at 08:53:12 +0100, Janusz Krzysztofik wrote:
> > > Hi,
> > > 
> > > On Thursday, 9 February 2023 12:50:38 CET Janusz Krzysztofik wrote:
> > > > Users reported oopses on list corruptions when using i915 perf with a
> > > > number of concurrently running graphics applications.  That indicates we
> > > > are currently missing some important tests for such scenarios.  Cover
> > > > that gap.
> > > > 
> > > > v2: drop open-race subtest for now, not capable of triggering the user
> > > >     reported bug, but triggering other bugs which I can't see any fixes
> > > >     for queued yet,
> > > >   - move the other new subtest out of tests/i915/perf.c (Ashutosh).
> > > > 
> > > > Janusz Krzysztofik (1):
> > > >   tests/gem_ctx_exec: Exercise barrier race
> > > 
> > > While still waiting for CI results (BAT results don't cover the new subtest) 
> > > I've collected results from a forced execution of the subtest in BAT scope on 
> > > trybot: https://patchwork.freedesktop.org/series/113608/#rev2
> > > 
> > > While working as expected on most platforms, the test failed on some ancient 
> > > ones instead of skipping.  I've fixed this issue and tested the fix 
> > > successfully on trybot: https://patchwork.freedesktop.org/series/113608/#rev3
> > > 
> > > I'm still waiting for your comments, if any, before I submit the fixed 
> > > version.
> > 
> > Patch looks good but as you already noticed it is blacklisted
> > and do not cause noticeable fail. Proposed solution is to move
> > it to other test or to create new one, imho one you proposed
> > 
> > igt@gem_barrier_race@remote-request
> 
> OK, since I can't point out any better existing candidate, let's create a new 
> test.  However, taking into account that we have some more variants in 
> progress which differ on the barrier rather then remote-request side 
> of workloads, and the remote-request workload will probably be common to all 
> those variants, I propose a somehow reordered test naming:
> 
> igt@gem_remote_request@barrier-race

I've decided to keep my initial igt@gem_barrier_race@remote-request naming.
Since implementation of barrier tasks list handling is intentionally racy, 
I think that's quite reasonable to have a test focused on exercising those 
race cases, and the remote-request case seems not the only one that can be
problematic.

Thanks,
Janusz


> 
> This way, we don't determine how remote requests are triggered, then we don't 
> connect the new test with perf specifically in any way, and we have plenty of 
> room for different workloads we may want to race against remote requests (be 
> it perf triggered or not).
> 
> If perf specifically requires more thorough testing, that can be handled 
> separately in a separate, perf dedicated test.
> 
> Thanks,
> Janusz
> 
> 
> > 
> > looks good.
> > 
> > Regards,
> > Kamil
> > 
> > > 
> > > Thanks,
> > > Janusz
> > > 
> > > > 
> > > >  tests/i915/gem_ctx_exec.c | 123 ++++++++++++++++++++++++++++++++++++++
> > > >  tests/meson.build         |   9 ++-
> > > >  2 files changed, 131 insertions(+), 1 deletion(-)
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > > 
> > 
> 
> 
> 
> 
>