diff mbox series

[3/3] tests/qtest: Re-enable multifd cancel test

Message ID 20230606144551.24367-4-farosas@suse.de (mailing list archive)
State New, archived
Headers show
Series migration: Fix multifd cancel test | expand

Commit Message

Fabiano Rosas June 6, 2023, 2:45 p.m. UTC
We've found the source of flakiness in this test, so re-enable it.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration-test.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

Comments

Juan Quintela June 7, 2023, 8:27 a.m. UTC | #1
Fabiano Rosas <farosas@suse.de> wrote:
> We've found the source of flakiness in this test, so re-enable it.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  tests/qtest/migration-test.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index b0c355bbd9..800ad23b75 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -2778,14 +2778,8 @@ int main(int argc, char **argv)
>      }
>      qtest_add_func("/migration/multifd/tcp/plain/none",
>                     test_multifd_tcp_none);
> -    /*
> -     * This test is flaky and sometimes fails in CI and otherwise:
> -     * don't run unless user opts in via environment variable.
> -     */
> -    if (getenv("QEMU_TEST_FLAKY_TESTS")) {
> -        qtest_add_func("/migration/multifd/tcp/plain/cancel",
> -                       test_multifd_tcp_cancel);
> -    }
> +    qtest_add_func("/migration/multifd/tcp/plain/cancel",
> +                   test_multifd_tcp_cancel);
>      qtest_add_func("/migration/multifd/tcp/plain/zlib",
>                     test_multifd_tcp_zlib);
>  #ifdef CONFIG_ZSTD

Reviewed-by: Juan Quintela <quintela@redhat.com>


There was another failure with migration test that I will post during
the rest of the day.  It needs both to get it right.

Later, Juan.
Peter Xu Jan. 8, 2024, 6:42 a.m. UTC | #2
On Wed, Jun 07, 2023 at 10:27:15AM +0200, Juan Quintela wrote:
> Fabiano Rosas <farosas@suse.de> wrote:
> > We've found the source of flakiness in this test, so re-enable it.
> >
> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
> > ---
> >  tests/qtest/migration-test.c | 10 ++--------
> >  1 file changed, 2 insertions(+), 8 deletions(-)
> >
> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > index b0c355bbd9..800ad23b75 100644
> > --- a/tests/qtest/migration-test.c
> > +++ b/tests/qtest/migration-test.c
> > @@ -2778,14 +2778,8 @@ int main(int argc, char **argv)
> >      }
> >      qtest_add_func("/migration/multifd/tcp/plain/none",
> >                     test_multifd_tcp_none);
> > -    /*
> > -     * This test is flaky and sometimes fails in CI and otherwise:
> > -     * don't run unless user opts in via environment variable.
> > -     */
> > -    if (getenv("QEMU_TEST_FLAKY_TESTS")) {
> > -        qtest_add_func("/migration/multifd/tcp/plain/cancel",
> > -                       test_multifd_tcp_cancel);
> > -    }
> > +    qtest_add_func("/migration/multifd/tcp/plain/cancel",
> > +                   test_multifd_tcp_cancel);
> >      qtest_add_func("/migration/multifd/tcp/plain/zlib",
> >                     test_multifd_tcp_zlib);
> >  #ifdef CONFIG_ZSTD
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> 
> There was another failure with migration test that I will post during
> the rest of the day.  It needs both to get it right.

This one didn't yet land upstream.  I'm not sure, but maybe Juan was saying
about this change:

        commit d2026ee117147893f8d80f060cede6d872ecbd7f
        Author: Juan Quintela <quintela@trasno.org>
        Date:   Wed Apr 26 12:20:36 2023 +0200

        multifd: Fix the number of channels ready

Fabiano, did you try to reproduce multifd-cancel with current master?  I'm
wondering whether this test has already been completely fixed, then maybe
we can pick up this patch now.
Fabiano Rosas Jan. 8, 2024, 2:26 p.m. UTC | #3
Peter Xu <peterx@redhat.com> writes:

> On Wed, Jun 07, 2023 at 10:27:15AM +0200, Juan Quintela wrote:
>> Fabiano Rosas <farosas@suse.de> wrote:
>> > We've found the source of flakiness in this test, so re-enable it.
>> >
>> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> > ---
>> >  tests/qtest/migration-test.c | 10 ++--------
>> >  1 file changed, 2 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> > index b0c355bbd9..800ad23b75 100644
>> > --- a/tests/qtest/migration-test.c
>> > +++ b/tests/qtest/migration-test.c
>> > @@ -2778,14 +2778,8 @@ int main(int argc, char **argv)
>> >      }
>> >      qtest_add_func("/migration/multifd/tcp/plain/none",
>> >                     test_multifd_tcp_none);
>> > -    /*
>> > -     * This test is flaky and sometimes fails in CI and otherwise:
>> > -     * don't run unless user opts in via environment variable.
>> > -     */
>> > -    if (getenv("QEMU_TEST_FLAKY_TESTS")) {
>> > -        qtest_add_func("/migration/multifd/tcp/plain/cancel",
>> > -                       test_multifd_tcp_cancel);
>> > -    }
>> > +    qtest_add_func("/migration/multifd/tcp/plain/cancel",
>> > +                   test_multifd_tcp_cancel);
>> >      qtest_add_func("/migration/multifd/tcp/plain/zlib",
>> >                     test_multifd_tcp_zlib);
>> >  #ifdef CONFIG_ZSTD
>> 
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>> 
>> 
>> There was another failure with migration test that I will post during
>> the rest of the day.  It needs both to get it right.
>
> This one didn't yet land upstream.  I'm not sure, but maybe Juan was saying
> about this change:
>
>         commit d2026ee117147893f8d80f060cede6d872ecbd7f
>         Author: Juan Quintela <quintela@trasno.org>
>         Date:   Wed Apr 26 12:20:36 2023 +0200
>
>         multifd: Fix the number of channels ready

That's not it. It was something in the test itself around the fact that
we use two sets of: from/to. There was supposed to be a situation where
we'd start 'to2' while 'to' was still running and that would cause
issues (possibly with sockets).

I think what might have happened is that someone merged a fix through
another tree and Juan didn't notice. I think this is the one:

  commit f2d063e61ee2026700ab44bef967f663e976bec8
  Author: Xuzhou Cheng <xuzhou.cheng@windriver.com>
  Date:   Fri Oct 28 12:57:32 2022 +0800
  
      tests/qtest: migration-test: Make sure QEMU process "to" exited after migration is canceled
      
      Make sure QEMU process "to" exited before launching another target
      for migration in the test_multifd_tcp_cancel case.
      
      Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
      Signed-off-by: Bin Meng <bin.meng@windriver.com>
      Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Message-Id: <20221028045736.679903-8-bin.meng@windriver.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>

> Fabiano, did you try to reproduce multifd-cancel with current master?  I'm
> wondering whether this test has already been completely fixed, then maybe
> we can pick up this patch now.

Yes, let's merge it. I have kept it enabled during testing of all of the
recent race conditions we've debugged and haven't seen it fail. Current
master also looks fine.
Peter Xu Jan. 9, 2024, 2:12 a.m. UTC | #4
On Mon, Jan 08, 2024 at 11:26:04AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Jun 07, 2023 at 10:27:15AM +0200, Juan Quintela wrote:
> >> Fabiano Rosas <farosas@suse.de> wrote:
> >> > We've found the source of flakiness in this test, so re-enable it.
> >> >
> >> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> > ---
> >> >  tests/qtest/migration-test.c | 10 ++--------
> >> >  1 file changed, 2 insertions(+), 8 deletions(-)
> >> >
> >> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> >> > index b0c355bbd9..800ad23b75 100644
> >> > --- a/tests/qtest/migration-test.c
> >> > +++ b/tests/qtest/migration-test.c
> >> > @@ -2778,14 +2778,8 @@ int main(int argc, char **argv)
> >> >      }
> >> >      qtest_add_func("/migration/multifd/tcp/plain/none",
> >> >                     test_multifd_tcp_none);
> >> > -    /*
> >> > -     * This test is flaky and sometimes fails in CI and otherwise:
> >> > -     * don't run unless user opts in via environment variable.
> >> > -     */
> >> > -    if (getenv("QEMU_TEST_FLAKY_TESTS")) {
> >> > -        qtest_add_func("/migration/multifd/tcp/plain/cancel",
> >> > -                       test_multifd_tcp_cancel);
> >> > -    }
> >> > +    qtest_add_func("/migration/multifd/tcp/plain/cancel",
> >> > +                   test_multifd_tcp_cancel);
> >> >      qtest_add_func("/migration/multifd/tcp/plain/zlib",
> >> >                     test_multifd_tcp_zlib);
> >> >  #ifdef CONFIG_ZSTD
> >> 
> >> Reviewed-by: Juan Quintela <quintela@redhat.com>
> >> 
> >> 
> >> There was another failure with migration test that I will post during
> >> the rest of the day.  It needs both to get it right.
> >
> > This one didn't yet land upstream.  I'm not sure, but maybe Juan was saying
> > about this change:
> >
> >         commit d2026ee117147893f8d80f060cede6d872ecbd7f
> >         Author: Juan Quintela <quintela@trasno.org>
> >         Date:   Wed Apr 26 12:20:36 2023 +0200
> >
> >         multifd: Fix the number of channels ready
> 
> That's not it. It was something in the test itself around the fact that
> we use two sets of: from/to. There was supposed to be a situation where
> we'd start 'to2' while 'to' was still running and that would cause
> issues (possibly with sockets).
> 
> I think what might have happened is that someone merged a fix through
> another tree and Juan didn't notice. I think this is the one:
> 
>   commit f2d063e61ee2026700ab44bef967f663e976bec8
>   Author: Xuzhou Cheng <xuzhou.cheng@windriver.com>
>   Date:   Fri Oct 28 12:57:32 2022 +0800
>   
>       tests/qtest: migration-test: Make sure QEMU process "to" exited after migration is canceled
>       
>       Make sure QEMU process "to" exited before launching another target
>       for migration in the test_multifd_tcp_cancel case.
>       
>       Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
>       Signed-off-by: Bin Meng <bin.meng@windriver.com>
>       Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>       Message-Id: <20221028045736.679903-8-bin.meng@windriver.com>
>       Signed-off-by: Thomas Huth <thuth@redhat.com>

Hmm, i see.

> 
> > Fabiano, did you try to reproduce multifd-cancel with current master?  I'm
> > wondering whether this test has already been completely fixed, then maybe
> > we can pick up this patch now.
> 
> Yes, let's merge it. I have kept it enabled during testing of all of the
> recent race conditions we've debugged and haven't seen it fail. Current
> master also looks fine.

It needs a trivial touchup, but then I queued it.

Thanks,
Thomas Huth Jan. 9, 2024, 7:21 a.m. UTC | #5
On 09/01/2024 03.12, Peter Xu wrote:
> On Mon, Jan 08, 2024 at 11:26:04AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>>> On Wed, Jun 07, 2023 at 10:27:15AM +0200, Juan Quintela wrote:
>>>> Fabiano Rosas <farosas@suse.de> wrote:
>>>>> We've found the source of flakiness in this test, so re-enable it.
>>>>>
>>>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>>>> ---
>>>>>   tests/qtest/migration-test.c | 10 ++--------
>>>>>   1 file changed, 2 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>>>>> index b0c355bbd9..800ad23b75 100644
>>>>> --- a/tests/qtest/migration-test.c
>>>>> +++ b/tests/qtest/migration-test.c
>>>>> @@ -2778,14 +2778,8 @@ int main(int argc, char **argv)
>>>>>       }
>>>>>       qtest_add_func("/migration/multifd/tcp/plain/none",
>>>>>                      test_multifd_tcp_none);
>>>>> -    /*
>>>>> -     * This test is flaky and sometimes fails in CI and otherwise:
>>>>> -     * don't run unless user opts in via environment variable.
>>>>> -     */
>>>>> -    if (getenv("QEMU_TEST_FLAKY_TESTS")) {
>>>>> -        qtest_add_func("/migration/multifd/tcp/plain/cancel",
>>>>> -                       test_multifd_tcp_cancel);
>>>>> -    }
>>>>> +    qtest_add_func("/migration/multifd/tcp/plain/cancel",
>>>>> +                   test_multifd_tcp_cancel);
>>>>>       qtest_add_func("/migration/multifd/tcp/plain/zlib",
>>>>>                      test_multifd_tcp_zlib);
>>>>>   #ifdef CONFIG_ZSTD
>>>>
>>>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>>>>
>>>>
>>>> There was another failure with migration test that I will post during
>>>> the rest of the day.  It needs both to get it right.
>>>
>>> This one didn't yet land upstream.  I'm not sure, but maybe Juan was saying
>>> about this change:
>>>
>>>          commit d2026ee117147893f8d80f060cede6d872ecbd7f
>>>          Author: Juan Quintela <quintela@trasno.org>
>>>          Date:   Wed Apr 26 12:20:36 2023 +0200
>>>
>>>          multifd: Fix the number of channels ready
>>
>> That's not it. It was something in the test itself around the fact that
>> we use two sets of: from/to. There was supposed to be a situation where
>> we'd start 'to2' while 'to' was still running and that would cause
>> issues (possibly with sockets).
>>
>> I think what might have happened is that someone merged a fix through
>> another tree and Juan didn't notice. I think this is the one:
>>
>>    commit f2d063e61ee2026700ab44bef967f663e976bec8
>>    Author: Xuzhou Cheng <xuzhou.cheng@windriver.com>
>>    Date:   Fri Oct 28 12:57:32 2022 +0800
>>    
>>        tests/qtest: migration-test: Make sure QEMU process "to" exited after migration is canceled
>>        
>>        Make sure QEMU process "to" exited before launching another target
>>        for migration in the test_multifd_tcp_cancel case.
>>        
>>        Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
>>        Signed-off-by: Bin Meng <bin.meng@windriver.com>
>>        Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>>        Message-Id: <20221028045736.679903-8-bin.meng@windriver.com>
>>        Signed-off-by: Thomas Huth <thuth@redhat.com>
> 
> Hmm, i see.

Sorry for that :-( Maybe it's better if we remove the migration-test from 
the qtest section in MAINTAINERS? Since the migration test is very well 
maintained already, there's IMHO no need for picking up the patches via the 
qtest tree, so something like this should prevent these problems:

diff --git a/MAINTAINERS b/MAINTAINERS
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3269,6 +3269,7 @@ F: tests/qtest/
  F: docs/devel/qgraph.rst
  F: docs/devel/qtest.rst
  X: tests/qtest/bios-tables-test*
+X: tests/qtest/migration-*

  Device Fuzzing
  M: Alexander Bulekov <alxndr@bu.edu>

(as you can see, we're doing it in a similar way for the bios tables test 
already)

If you agree, I can send out a proper patch for this later today.

  Thomas
Peter Xu Jan. 9, 2024, 7:48 a.m. UTC | #6
Hi, Thomas,

On Tue, Jan 09, 2024 at 08:21:53AM +0100, Thomas Huth wrote:
> Sorry for that :-(

Not at all!  I actually appreciate more people looking after it.

> Maybe it's better if we remove the migration-test from
> the qtest section in MAINTAINERS? Since the migration test is very well
> maintained already, there's IMHO no need for picking up the patches via the
> qtest tree, so something like this should prevent these problems:
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3269,6 +3269,7 @@ F: tests/qtest/
>  F: docs/devel/qgraph.rst
>  F: docs/devel/qtest.rst
>  X: tests/qtest/bios-tables-test*
> +X: tests/qtest/migration-*
> 
>  Device Fuzzing
>  M: Alexander Bulekov <alxndr@bu.edu>
> 
> (as you can see, we're doing it in a similar way for the bios tables test
> already)
> 
> If you agree, I can send out a proper patch for this later today.

Currently the file is covered by both groups of people, which is the best
condition to me:

$ ./scripts/get_maintainer.pl -f tests/qtest/migration-test.c 
Peter Xu <peterx@redhat.com> (maintainer:Migration)
Fabiano Rosas <farosas@suse.de> (maintainer:Migration)
Thomas Huth <thuth@redhat.com> (maintainer:qtest)
Laurent Vivier <lvivier@redhat.com> (maintainer:qtest)
Paolo Bonzini <pbonzini@redhat.com> (reviewer:qtest)
qemu-devel@nongnu.org (open list:All patches CC here)

It makes sense to me e.g. when qtest reworks the framework, and we'd like
migration-test.c to be covered in that same reworks series and
reviewed/pulled together, for example, then those can go via qtest's tree
directly.

If patch submitter follows the MAINTAINERS file it means all of us will be
in the loop and that's the perfect condition, IMHO.  It's just that this
patch didn't have any migration people copied, which caused a very slight
confusion.

It'll be great in that case if qtest maintainers can help submitters to
copy us if the submitters forgot to do so.  I think we should do the same
when there's major changes for qtest framework for a new migration test.
Would that work the best for us?

Thanks,
Thomas Huth Jan. 9, 2024, 8:44 a.m. UTC | #7
On 09/01/2024 08.48, Peter Xu wrote:
> Hi, Thomas,
> 
> On Tue, Jan 09, 2024 at 08:21:53AM +0100, Thomas Huth wrote:
>> Sorry for that :-(
> 
> Not at all!  I actually appreciate more people looking after it.
> 
>> Maybe it's better if we remove the migration-test from
>> the qtest section in MAINTAINERS? Since the migration test is very well
>> maintained already, there's IMHO no need for picking up the patches via the
>> qtest tree, so something like this should prevent these problems:
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3269,6 +3269,7 @@ F: tests/qtest/
>>   F: docs/devel/qgraph.rst
>>   F: docs/devel/qtest.rst
>>   X: tests/qtest/bios-tables-test*
>> +X: tests/qtest/migration-*
>>
>>   Device Fuzzing
>>   M: Alexander Bulekov <alxndr@bu.edu>
>>
>> (as you can see, we're doing it in a similar way for the bios tables test
>> already)
>>
>> If you agree, I can send out a proper patch for this later today.
> 
> Currently the file is covered by both groups of people, which is the best
> condition to me:
> 
> $ ./scripts/get_maintainer.pl -f tests/qtest/migration-test.c
> Peter Xu <peterx@redhat.com> (maintainer:Migration)
> Fabiano Rosas <farosas@suse.de> (maintainer:Migration)
> Thomas Huth <thuth@redhat.com> (maintainer:qtest)
> Laurent Vivier <lvivier@redhat.com> (maintainer:qtest)
> Paolo Bonzini <pbonzini@redhat.com> (reviewer:qtest)
> qemu-devel@nongnu.org (open list:All patches CC here)
> 
> It makes sense to me e.g. when qtest reworks the framework, and we'd like
> migration-test.c to be covered in that same reworks series and
> reviewed/pulled together, for example, then those can go via qtest's tree
> directly.
> 
> If patch submitter follows the MAINTAINERS file it means all of us will be
> in the loop and that's the perfect condition, IMHO.  It's just that this
> patch didn't have any migration people copied, which caused a very slight
> confusion.
> 
> It'll be great in that case if qtest maintainers can help submitters to
> copy us if the submitters forgot to do so.  I think we should do the same
> when there's major changes for qtest framework for a new migration test.
> Would that work the best for us?

Ok, makes sense, let's try it that way!

  Thomas
diff mbox series

Patch

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b0c355bbd9..800ad23b75 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2778,14 +2778,8 @@  int main(int argc, char **argv)
     }
     qtest_add_func("/migration/multifd/tcp/plain/none",
                    test_multifd_tcp_none);
-    /*
-     * This test is flaky and sometimes fails in CI and otherwise:
-     * don't run unless user opts in via environment variable.
-     */
-    if (getenv("QEMU_TEST_FLAKY_TESTS")) {
-        qtest_add_func("/migration/multifd/tcp/plain/cancel",
-                       test_multifd_tcp_cancel);
-    }
+    qtest_add_func("/migration/multifd/tcp/plain/cancel",
+                   test_multifd_tcp_cancel);
     qtest_add_func("/migration/multifd/tcp/plain/zlib",
                    test_multifd_tcp_zlib);
 #ifdef CONFIG_ZSTD