Message ID | 20230912150611.70676-1-stefanha@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | gitlab: remove unreliable avocado CI jobs | expand |
On Tue, Sep 12, 2023 at 11:06:11AM -0400, Stefan Hajnoczi wrote: > The avocado-system-alpine, avocado-system-fedora, and > avocado-system-ubuntu jobs are unreliable. I identified them while > looking over CI failures from the past week: > https://gitlab.com/qemu-project/qemu/-/jobs/5058610614 > https://gitlab.com/qemu-project/qemu/-/jobs/5058610654 > https://gitlab.com/qemu-project/qemu/-/jobs/5030428571 > > Thomas Huth suggest on IRC today that there may be a legitimate failure > in there: > > th_huth: f4bug, yes, seems like it does not start at all correctly on > alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody > noticed this by now, this is worrying > > It crept in because the jobs were already unreliable. > > I don't know how to interpret the job output, so all I can do is to > propose removing these jobs. A useful CI job has two outcomes: pass or > fail. Timeouts and other in-between states are not useful because they > require constant triaging by someone who understands the details of the > tests and they can occur when run against pull requests that have > nothing to do with the area covered by the test. > > Hopefully test owners will be able to identify the root causes and solve > them so that these jobs can stay. In their current state the jobs are > not useful since I cannot cannot tell whether job failures are real or > just intermittent when merging qemu.git pull requests. > > If you are a test owner, please take a look. > > It is likely that other avocado-system-* CI jobs have similar failures > from time to time, but I'll leave them as long as they are passing. > > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884 > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > --- > .gitlab-ci.d/buildtest.yml | 27 --------------------------- > 1 file changed, 27 deletions(-) > > diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml > index aee9101507..83ce448c4d 100644 > --- a/.gitlab-ci.d/buildtest.yml > +++ b/.gitlab-ci.d/buildtest.yml > @@ -22,15 +22,6 @@ check-system-alpine: > IMAGE: alpine > MAKE_CHECK_ARGS: check-unit check-qtest > > -avocado-system-alpine: > - extends: .avocado_test_job_template > - needs: > - - job: build-system-alpine > - artifacts: true > - variables: > - IMAGE: alpine > - MAKE_CHECK_ARGS: check-avocado Instead of entirely deleting, I'd suggest adding # Disabled due to frequent random failures # https://gitlab.com/qemu-project/qemu/-/issues/1884 when: manual See example: https://docs.gitlab.com/ee/ci/yaml/#when This disables the job from running unless someone explicitly tells it to run > - > build-system-ubuntu: > extends: > - .native_build_job_template > @@ -53,15 +44,6 @@ check-system-ubuntu: > IMAGE: ubuntu2204 > MAKE_CHECK_ARGS: check > > -avocado-system-ubuntu: > - extends: .avocado_test_job_template > - needs: > - - job: build-system-ubuntu > - artifacts: true > - variables: > - IMAGE: ubuntu2204 > - MAKE_CHECK_ARGS: check-avocado > - > build-system-debian: > extends: > - .native_build_job_template > @@ -127,15 +109,6 @@ check-system-fedora: > IMAGE: fedora > MAKE_CHECK_ARGS: check > > -avocado-system-fedora: > - extends: .avocado_test_job_template > - needs: > - - job: build-system-fedora > - artifacts: true > - variables: > - IMAGE: fedora > - MAKE_CHECK_ARGS: check-avocado > - > crash-test-fedora: > extends: .native_test_job_template > needs: > -- > 2.41.0 > > With regards, Daniel
Daniel P. Berrangé <berrange@redhat.com> writes: > On Tue, Sep 12, 2023 at 11:06:11AM -0400, Stefan Hajnoczi wrote: >> The avocado-system-alpine, avocado-system-fedora, and >> avocado-system-ubuntu jobs are unreliable. I identified them while >> looking over CI failures from the past week: >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610614 >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610654 >> https://gitlab.com/qemu-project/qemu/-/jobs/5030428571 >> >> Thomas Huth suggest on IRC today that there may be a legitimate failure >> in there: >> >> th_huth: f4bug, yes, seems like it does not start at all correctly on >> alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody >> noticed this by now, this is worrying >> >> It crept in because the jobs were already unreliable. >> >> I don't know how to interpret the job output, so all I can do is to >> propose removing these jobs. A useful CI job has two outcomes: pass or >> fail. Timeouts and other in-between states are not useful because they >> require constant triaging by someone who understands the details of the >> tests and they can occur when run against pull requests that have >> nothing to do with the area covered by the test. >> >> Hopefully test owners will be able to identify the root causes and solve >> them so that these jobs can stay. In their current state the jobs are >> not useful since I cannot cannot tell whether job failures are real or >> just intermittent when merging qemu.git pull requests. >> >> If you are a test owner, please take a look. >> >> It is likely that other avocado-system-* CI jobs have similar failures >> from time to time, but I'll leave them as long as they are passing. >> >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884 >> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> >> --- >> .gitlab-ci.d/buildtest.yml | 27 --------------------------- >> 1 file changed, 27 deletions(-) >> >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml >> index aee9101507..83ce448c4d 100644 >> --- a/.gitlab-ci.d/buildtest.yml >> +++ b/.gitlab-ci.d/buildtest.yml >> @@ -22,15 +22,6 @@ check-system-alpine: >> IMAGE: alpine >> MAKE_CHECK_ARGS: check-unit check-qtest >> >> -avocado-system-alpine: >> - extends: .avocado_test_job_template >> - needs: >> - - job: build-system-alpine >> - artifacts: true >> - variables: >> - IMAGE: alpine >> - MAKE_CHECK_ARGS: check-avocado > > Instead of entirely deleting, I'd suggest adding > > # Disabled due to frequent random failures > # https://gitlab.com/qemu-project/qemu/-/issues/1884 > when: manual > > See example: https://docs.gitlab.com/ee/ci/yaml/#when > > This disables the job from running unless someone explicitly > tells it to run What I don't understand is why we didn't gate the release back when they first tripped. We should have noticed between: https://gitlab.com/qemu-project/qemu/-/pipelines/956543770 and https://gitlab.com/qemu-project/qemu/-/pipelines/957154381 that the system tests where regressing. Yet we merged the changes anyway. > >> - >> build-system-ubuntu: >> extends: >> - .native_build_job_template >> @@ -53,15 +44,6 @@ check-system-ubuntu: >> IMAGE: ubuntu2204 >> MAKE_CHECK_ARGS: check >> >> -avocado-system-ubuntu: >> - extends: .avocado_test_job_template >> - needs: >> - - job: build-system-ubuntu >> - artifacts: true >> - variables: >> - IMAGE: ubuntu2204 >> - MAKE_CHECK_ARGS: check-avocado >> - >> build-system-debian: >> extends: >> - .native_build_job_template >> @@ -127,15 +109,6 @@ check-system-fedora: >> IMAGE: fedora >> MAKE_CHECK_ARGS: check >> >> -avocado-system-fedora: >> - extends: .avocado_test_job_template >> - needs: >> - - job: build-system-fedora >> - artifacts: true >> - variables: >> - IMAGE: fedora >> - MAKE_CHECK_ARGS: check-avocado >> - >> crash-test-fedora: >> extends: .native_test_job_template >> needs: >> -- >> 2.41.0 >> >> > > With regards, > Daniel
On Tue, Sep 12, 2023 at 05:01:26PM +0100, Alex Bennée wrote: > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Tue, Sep 12, 2023 at 11:06:11AM -0400, Stefan Hajnoczi wrote: > >> The avocado-system-alpine, avocado-system-fedora, and > >> avocado-system-ubuntu jobs are unreliable. I identified them while > >> looking over CI failures from the past week: > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610614 > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610654 > >> https://gitlab.com/qemu-project/qemu/-/jobs/5030428571 > >> > >> Thomas Huth suggest on IRC today that there may be a legitimate failure > >> in there: > >> > >> th_huth: f4bug, yes, seems like it does not start at all correctly on > >> alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody > >> noticed this by now, this is worrying > >> > >> It crept in because the jobs were already unreliable. > >> > >> I don't know how to interpret the job output, so all I can do is to > >> propose removing these jobs. A useful CI job has two outcomes: pass or > >> fail. Timeouts and other in-between states are not useful because they > >> require constant triaging by someone who understands the details of the > >> tests and they can occur when run against pull requests that have > >> nothing to do with the area covered by the test. > >> > >> Hopefully test owners will be able to identify the root causes and solve > >> them so that these jobs can stay. In their current state the jobs are > >> not useful since I cannot cannot tell whether job failures are real or > >> just intermittent when merging qemu.git pull requests. > >> > >> If you are a test owner, please take a look. > >> > >> It is likely that other avocado-system-* CI jobs have similar failures > >> from time to time, but I'll leave them as long as they are passing. > >> > >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884 > >> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > >> --- > >> .gitlab-ci.d/buildtest.yml | 27 --------------------------- > >> 1 file changed, 27 deletions(-) > >> > >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml > >> index aee9101507..83ce448c4d 100644 > >> --- a/.gitlab-ci.d/buildtest.yml > >> +++ b/.gitlab-ci.d/buildtest.yml > >> @@ -22,15 +22,6 @@ check-system-alpine: > >> IMAGE: alpine > >> MAKE_CHECK_ARGS: check-unit check-qtest > >> > >> -avocado-system-alpine: > >> - extends: .avocado_test_job_template > >> - needs: > >> - - job: build-system-alpine > >> - artifacts: true > >> - variables: > >> - IMAGE: alpine > >> - MAKE_CHECK_ARGS: check-avocado > > > > Instead of entirely deleting, I'd suggest adding > > > > # Disabled due to frequent random failures > > # https://gitlab.com/qemu-project/qemu/-/issues/1884 > > when: manual > > > > See example: https://docs.gitlab.com/ee/ci/yaml/#when > > > > This disables the job from running unless someone explicitly > > tells it to run > > What I don't understand is why we didn't gate the release back when they > first tripped. We should have noticed between: > > https://gitlab.com/qemu-project/qemu/-/pipelines/956543770 > > and > > https://gitlab.com/qemu-project/qemu/-/pipelines/957154381 > > that the system tests where regressing. Yet we merged the changes > anyway. I think that green series is misleading, based on Richard's mail on list wrt the TCG pull series: https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg04014.html "It's some sort of timing issue, which sometimes goes away when re-run. I was re-running tests *a lot* in order to get them to go green while running the 8.1 release. " Essentially I'd put this down to the tests being soo non-deterministic that we've given up trusting them. With regards, Daniel
On Tue, Sep 12, 2023, 12:14 Daniel P. Berrangé <berrange@redhat.com> wrote: > On Tue, Sep 12, 2023 at 05:01:26PM +0100, Alex Bennée wrote: > > > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > > > On Tue, Sep 12, 2023 at 11:06:11AM -0400, Stefan Hajnoczi wrote: > > >> The avocado-system-alpine, avocado-system-fedora, and > > >> avocado-system-ubuntu jobs are unreliable. I identified them while > > >> looking over CI failures from the past week: > > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610614 > > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610654 > > >> https://gitlab.com/qemu-project/qemu/-/jobs/5030428571 > > >> > > >> Thomas Huth suggest on IRC today that there may be a legitimate > failure > > >> in there: > > >> > > >> th_huth: f4bug, yes, seems like it does not start at all correctly > on > > >> alpine anymore ... and it's broken since ~ 2 weeks already, so if > nobody > > >> noticed this by now, this is worrying > > >> > > >> It crept in because the jobs were already unreliable. > > >> > > >> I don't know how to interpret the job output, so all I can do is to > > >> propose removing these jobs. A useful CI job has two outcomes: pass or > > >> fail. Timeouts and other in-between states are not useful because they > > >> require constant triaging by someone who understands the details of > the > > >> tests and they can occur when run against pull requests that have > > >> nothing to do with the area covered by the test. > > >> > > >> Hopefully test owners will be able to identify the root causes and > solve > > >> them so that these jobs can stay. In their current state the jobs are > > >> not useful since I cannot cannot tell whether job failures are real or > > >> just intermittent when merging qemu.git pull requests. > > >> > > >> If you are a test owner, please take a look. > > >> > > >> It is likely that other avocado-system-* CI jobs have similar failures > > >> from time to time, but I'll leave them as long as they are passing. > > >> > > >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884 > > >> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > >> --- > > >> .gitlab-ci.d/buildtest.yml | 27 --------------------------- > > >> 1 file changed, 27 deletions(-) > > >> > > >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml > > >> index aee9101507..83ce448c4d 100644 > > >> --- a/.gitlab-ci.d/buildtest.yml > > >> +++ b/.gitlab-ci.d/buildtest.yml > > >> @@ -22,15 +22,6 @@ check-system-alpine: > > >> IMAGE: alpine > > >> MAKE_CHECK_ARGS: check-unit check-qtest > > >> > > >> -avocado-system-alpine: > > >> - extends: .avocado_test_job_template > > >> - needs: > > >> - - job: build-system-alpine > > >> - artifacts: true > > >> - variables: > > >> - IMAGE: alpine > > >> - MAKE_CHECK_ARGS: check-avocado > > > > > > Instead of entirely deleting, I'd suggest adding > > > > > > # Disabled due to frequent random failures > > > # https://gitlab.com/qemu-project/qemu/-/issues/1884 > > > when: manual > > > > > > See example: https://docs.gitlab.com/ee/ci/yaml/#when > > > > > > This disables the job from running unless someone explicitly > > > tells it to run > > > > What I don't understand is why we didn't gate the release back when they > > first tripped. We should have noticed between: > > > > https://gitlab.com/qemu-project/qemu/-/pipelines/956543770 > > > > and > > > > https://gitlab.com/qemu-project/qemu/-/pipelines/957154381 > > > > that the system tests where regressing. Yet we merged the changes > > anyway. > > I think that green series is misleading, based on Richard's > mail on list wrt the TCG pull series: > > https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg04014.html > > "It's some sort of timing issue, which sometimes goes away > when re-run. I was re-running tests *a lot* in order to > get them to go green while running the 8.1 release. " > > > Essentially I'd put this down to the tests being soo non-deterministic > that we've given up trusting them. > Yes. Stefan > With regards, > Daniel > -- > |: https://berrange.com -o- > https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- > https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- > https://www.instagram.com/dberrange :| > > >
Stefan Hajnoczi <stefanha@gmail.com> writes: > On Tue, Sep 12, 2023, 12:14 Daniel P. Berrangé <berrange@redhat.com> wrote: > > On Tue, Sep 12, 2023 at 05:01:26PM +0100, Alex Bennée wrote: > > > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > > > On Tue, Sep 12, 2023 at 11:06:11AM -0400, Stefan Hajnoczi wrote: > > >> The avocado-system-alpine, avocado-system-fedora, and > > >> avocado-system-ubuntu jobs are unreliable. I identified them while > > >> looking over CI failures from the past week: > > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610614 > > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610654 > > >> https://gitlab.com/qemu-project/qemu/-/jobs/5030428571 > > >> > > >> Thomas Huth suggest on IRC today that there may be a legitimate failure > > >> in there: > > >> > > >> th_huth: f4bug, yes, seems like it does not start at all correctly on > > >> alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody > > >> noticed this by now, this is worrying > > >> > > >> It crept in because the jobs were already unreliable. > > >> > > >> I don't know how to interpret the job output, so all I can do is to > > >> propose removing these jobs. A useful CI job has two outcomes: pass or > > >> fail. Timeouts and other in-between states are not useful because they > > >> require constant triaging by someone who understands the details of the > > >> tests and they can occur when run against pull requests that have > > >> nothing to do with the area covered by the test. > > >> > > >> Hopefully test owners will be able to identify the root causes and solve > > >> them so that these jobs can stay. In their current state the jobs are > > >> not useful since I cannot cannot tell whether job failures are real or > > >> just intermittent when merging qemu.git pull requests. > > >> > > >> If you are a test owner, please take a look. > > >> > > >> It is likely that other avocado-system-* CI jobs have similar failures > > >> from time to time, but I'll leave them as long as they are passing. > > >> > > >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884 > > >> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > >> --- > > >> .gitlab-ci.d/buildtest.yml | 27 --------------------------- > > >> 1 file changed, 27 deletions(-) > > >> > > >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml > > >> index aee9101507..83ce448c4d 100644 > > >> --- a/.gitlab-ci.d/buildtest.yml > > >> +++ b/.gitlab-ci.d/buildtest.yml > > >> @@ -22,15 +22,6 @@ check-system-alpine: > > >> IMAGE: alpine > > >> MAKE_CHECK_ARGS: check-unit check-qtest > > >> > > >> -avocado-system-alpine: > > >> - extends: .avocado_test_job_template > > >> - needs: > > >> - - job: build-system-alpine > > >> - artifacts: true > > >> - variables: > > >> - IMAGE: alpine > > >> - MAKE_CHECK_ARGS: check-avocado > > > > > > Instead of entirely deleting, I'd suggest adding > > > > > > # Disabled due to frequent random failures > > > # https://gitlab.com/qemu-project/qemu/-/issues/1884 > > > when: manual > > > > > > See example: https://docs.gitlab.com/ee/ci/yaml/#when > > > > > > This disables the job from running unless someone explicitly > > > tells it to run > > > > What I don't understand is why we didn't gate the release back when they > > first tripped. We should have noticed between: > > > > https://gitlab.com/qemu-project/qemu/-/pipelines/956543770 > > > > and > > > > https://gitlab.com/qemu-project/qemu/-/pipelines/957154381 > > > > that the system tests where regressing. Yet we merged the changes > > anyway. > > I think that green series is misleading, based on Richard's > mail on list wrt the TCG pull series: > > https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg04014.html > > "It's some sort of timing issue, which sometimes goes away > when re-run. I was re-running tests *a lot* in order to > get them to go green while running the 8.1 release. " But I think in that actual case a change exposed a race condition which has only recently been fixed - however we've had additional regresssions since. Rather than kill the system tests we can disable the flaky individual tests in avocado. > > Essentially I'd put this down to the tests being soo non-deterministic > that we've given up trusting them. > > Yes. > > Stefan > > With regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Tue, 12 Sept 2023 at 14:36, Alex Bennée <alex.bennee@linaro.org> wrote: > > > Stefan Hajnoczi <stefanha@gmail.com> writes: > > > On Tue, Sep 12, 2023, 12:14 Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > On Tue, Sep 12, 2023 at 05:01:26PM +0100, Alex Bennée wrote: > > > > > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > > > > > On Tue, Sep 12, 2023 at 11:06:11AM -0400, Stefan Hajnoczi wrote: > > > >> The avocado-system-alpine, avocado-system-fedora, and > > > >> avocado-system-ubuntu jobs are unreliable. I identified them while > > > >> looking over CI failures from the past week: > > > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610614 > > > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610654 > > > >> https://gitlab.com/qemu-project/qemu/-/jobs/5030428571 > > > >> > > > >> Thomas Huth suggest on IRC today that there may be a legitimate failure > > > >> in there: > > > >> > > > >> th_huth: f4bug, yes, seems like it does not start at all correctly on > > > >> alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody > > > >> noticed this by now, this is worrying > > > >> > > > >> It crept in because the jobs were already unreliable. > > > >> > > > >> I don't know how to interpret the job output, so all I can do is to > > > >> propose removing these jobs. A useful CI job has two outcomes: pass or > > > >> fail. Timeouts and other in-between states are not useful because they > > > >> require constant triaging by someone who understands the details of the > > > >> tests and they can occur when run against pull requests that have > > > >> nothing to do with the area covered by the test. > > > >> > > > >> Hopefully test owners will be able to identify the root causes and solve > > > >> them so that these jobs can stay. In their current state the jobs are > > > >> not useful since I cannot cannot tell whether job failures are real or > > > >> just intermittent when merging qemu.git pull requests. > > > >> > > > >> If you are a test owner, please take a look. > > > >> > > > >> It is likely that other avocado-system-* CI jobs have similar failures > > > >> from time to time, but I'll leave them as long as they are passing. > > > >> > > > >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884 > > > >> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > > >> --- > > > >> .gitlab-ci.d/buildtest.yml | 27 --------------------------- > > > >> 1 file changed, 27 deletions(-) > > > >> > > > >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml > > > >> index aee9101507..83ce448c4d 100644 > > > >> --- a/.gitlab-ci.d/buildtest.yml > > > >> +++ b/.gitlab-ci.d/buildtest.yml > > > >> @@ -22,15 +22,6 @@ check-system-alpine: > > > >> IMAGE: alpine > > > >> MAKE_CHECK_ARGS: check-unit check-qtest > > > >> > > > >> -avocado-system-alpine: > > > >> - extends: .avocado_test_job_template > > > >> - needs: > > > >> - - job: build-system-alpine > > > >> - artifacts: true > > > >> - variables: > > > >> - IMAGE: alpine > > > >> - MAKE_CHECK_ARGS: check-avocado > > > > > > > > Instead of entirely deleting, I'd suggest adding > > > > > > > > # Disabled due to frequent random failures > > > > # https://gitlab.com/qemu-project/qemu/-/issues/1884 > > > > when: manual > > > > > > > > See example: https://docs.gitlab.com/ee/ci/yaml/#when > > > > > > > > This disables the job from running unless someone explicitly > > > > tells it to run > > > > > > What I don't understand is why we didn't gate the release back when they > > > first tripped. We should have noticed between: > > > > > > https://gitlab.com/qemu-project/qemu/-/pipelines/956543770 > > > > > > and > > > > > > https://gitlab.com/qemu-project/qemu/-/pipelines/957154381 > > > > > > that the system tests where regressing. Yet we merged the changes > > > anyway. > > > > I think that green series is misleading, based on Richard's > > mail on list wrt the TCG pull series: > > > > https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg04014.html > > > > "It's some sort of timing issue, which sometimes goes away > > when re-run. I was re-running tests *a lot* in order to > > get them to go green while running the 8.1 release. " > > But I think in that actual case a change exposed a race condition which > has only recently been fixed - however we've had additional regresssions > since. > > Rather than kill the system tests we can disable the flaky individual > tests in avocado. That would be nice, please send an alternative patch. I can't do that myself because there are a bunch of test cases with suspicious output and I don't know which ones are legitimate failures, intermittent problems, or expected failures. Stefan > > > > > Essentially I'd put this down to the tests being soo non-deterministic > > that we've given up trusting them. > > > > Yes. > > > > Stefan > > > > With regards, > > Daniel > > -- > > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > > |: https://libvirt.org -o- https://fstop138.berrange.com :| > > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| > > > -- > Alex Bennée > Virtualisation Tech Lead @ Linaro
On 12/09/2023 17.06, Stefan Hajnoczi wrote: > The avocado-system-alpine, avocado-system-fedora, and > avocado-system-ubuntu jobs are unreliable. I identified them while > looking over CI failures from the past week: > https://gitlab.com/qemu-project/qemu/-/jobs/5058610614 > https://gitlab.com/qemu-project/qemu/-/jobs/5058610654 > https://gitlab.com/qemu-project/qemu/-/jobs/5030428571 > > Thomas Huth suggest on IRC today that there may be a legitimate failure > in there: > > th_huth: f4bug, yes, seems like it does not start at all correctly on > alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody > noticed this by now, this is worrying > > It crept in because the jobs were already unreliable. > > I don't know how to interpret the job output, so all I can do is to > propose removing these jobs. A useful CI job has two outcomes: pass or > fail. Timeouts and other in-between states are not useful because they > require constant triaging by someone who understands the details of the > tests and they can occur when run against pull requests that have > nothing to do with the area covered by the test. > > Hopefully test owners will be able to identify the root causes and solve > them so that these jobs can stay. In their current state the jobs are > not useful since I cannot cannot tell whether job failures are real or > just intermittent when merging qemu.git pull requests. > > If you are a test owner, please take a look. > > It is likely that other avocado-system-* CI jobs have similar failures > from time to time, but I'll leave them as long as they are passing. > > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884 > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > --- > .gitlab-ci.d/buildtest.yml | 27 --------------------------- > 1 file changed, 27 deletions(-) > > diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml > index aee9101507..83ce448c4d 100644 > --- a/.gitlab-ci.d/buildtest.yml > +++ b/.gitlab-ci.d/buildtest.yml > @@ -22,15 +22,6 @@ check-system-alpine: > IMAGE: alpine > MAKE_CHECK_ARGS: check-unit check-qtest > > -avocado-system-alpine: > - extends: .avocado_test_job_template > - needs: > - - job: build-system-alpine > - artifacts: true > - variables: > - IMAGE: alpine > - MAKE_CHECK_ARGS: check-avocado Please don't remove the whole job! Just disable the failing tests within the job, e.g.: diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py --- a/tests/avocado/replay_kernel.py +++ b/tests/avocado/replay_kernel.py @@ -503,6 +503,7 @@ def do_test_mips_malta32el_nanomips(self, kernel_path_xz): console_pattern = 'Kernel command line: %s' % kernel_command_line self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5) + @skipIf(os.getenv('GITLAB_CI'), 'Skipping unstable test on GitLab') def test_mips_malta32el_nanomips_4k(self): """ :avocado: tags=arch:mipsel @@ -517,6 +518,7 @@ def test_mips_malta32el_nanomips_4k(self): kernel_path_xz = self.fetch_asset(kernel_url, asset_hash=kernel_hash) self.do_test_mips_malta32el_nanomips(kernel_path_xz) + @skipIf(os.getenv('GITLAB_CI'), 'Skipping unstable test on GitLab') def test_mips_malta32el_nanomips_16k_up(self): """ :avocado: tags=arch:mipsel @@ -531,6 +533,7 @@ def test_mips_malta32el_nanomips_16k_up(self): kernel_path_xz = self.fetch_asset(kernel_url, asset_hash=kernel_hash) self.do_test_mips_malta32el_nanomips(kernel_path_xz) + @skipIf(os.getenv('GITLAB_CI'), 'Skipping unstable test on GitLab') def test_mips_malta32el_nanomips_64k_dbg(self): """ :avocado: tags=arch:mipsel
On 12/9/23 21:58, Thomas Huth wrote: > On 12/09/2023 17.06, Stefan Hajnoczi wrote: >> The avocado-system-alpine, avocado-system-fedora, and >> avocado-system-ubuntu jobs are unreliable. I identified them while >> looking over CI failures from the past week: >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610614 >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610654 >> https://gitlab.com/qemu-project/qemu/-/jobs/5030428571 >> >> Thomas Huth suggest on IRC today that there may be a legitimate failure >> in there: >> >> th_huth: f4bug, yes, seems like it does not start at all correctly on >> alpine anymore ... and it's broken since ~ 2 weeks already, so if >> nobody >> noticed this by now, this is worrying >> >> It crept in because the jobs were already unreliable. >> >> I don't know how to interpret the job output, so all I can do is to >> propose removing these jobs. A useful CI job has two outcomes: pass or >> fail. Timeouts and other in-between states are not useful because they >> require constant triaging by someone who understands the details of the >> tests and they can occur when run against pull requests that have >> nothing to do with the area covered by the test. >> >> Hopefully test owners will be able to identify the root causes and solve >> them so that these jobs can stay. In their current state the jobs are >> not useful since I cannot cannot tell whether job failures are real or >> just intermittent when merging qemu.git pull requests. >> >> If you are a test owner, please take a look. >> >> It is likely that other avocado-system-* CI jobs have similar failures >> from time to time, but I'll leave them as long as they are passing. >> >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884 >> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> >> --- >> .gitlab-ci.d/buildtest.yml | 27 --------------------------- >> 1 file changed, 27 deletions(-) >> >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml >> index aee9101507..83ce448c4d 100644 >> --- a/.gitlab-ci.d/buildtest.yml >> +++ b/.gitlab-ci.d/buildtest.yml >> @@ -22,15 +22,6 @@ check-system-alpine: >> IMAGE: alpine >> MAKE_CHECK_ARGS: check-unit check-qtest >> -avocado-system-alpine: >> - extends: .avocado_test_job_template >> - needs: >> - - job: build-system-alpine >> - artifacts: true >> - variables: >> - IMAGE: alpine >> - MAKE_CHECK_ARGS: check-avocado > > Please don't remove the whole job! Just disable the failing tests within > the job, e.g.: > > diff --git a/tests/avocado/replay_kernel.py > b/tests/avocado/replay_kernel.py > --- a/tests/avocado/replay_kernel.py > +++ b/tests/avocado/replay_kernel.py > @@ -503,6 +503,7 @@ def do_test_mips_malta32el_nanomips(self, > kernel_path_xz): > console_pattern = 'Kernel command line: %s' % kernel_command_line > self.run_rr(kernel_path, kernel_command_line, console_pattern, > shift=5) > > + @skipIf(os.getenv('GITLAB_CI'), 'Skipping unstable test on GitLab') Better: @skipUntil(gitlab_issue_resolved(1884)) (joking) I'll send a patch, smth like: @skipIf(os.getenv('GITLAB_CI'), 'Pending https://gitlab.com/qemu-project/qemu/-/issues/1884') > def test_mips_malta32el_nanomips_4k(self): > """ > :avocado: tags=arch:mipsel
On Tue, 12 Sept 2023 at 21:00, Thomas Huth <thuth@redhat.com> wrote: > Please don't remove the whole job! Just disable the failing tests within the job, e.g.: > > diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py > --- a/tests/avocado/replay_kernel.py > +++ b/tests/avocado/replay_kernel.py > @@ -503,6 +503,7 @@ def do_test_mips_malta32el_nanomips(self, kernel_path_xz): > console_pattern = 'Kernel command line: %s' % kernel_command_line > self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5) > > + @skipIf(os.getenv('GITLAB_CI'), 'Skipping unstable test on GitLab') > def test_mips_malta32el_nanomips_4k(self): > """ > :avocado: tags=arch:mipsel Please don't skip unstable tests on gitlab only. If they're unstable, then nobody wants to be running them and wondering if these are flaky tests or real issues, whether theyr'e doing it on gitlab or locally. (I know we already have a lot of these, but the effect is that instead of saying 'make check-avocado' you have to say 'GITLAB_CI=1 make check-avocado'.) thanks -- PMM
On 13/9/23 11:18, Peter Maydell wrote: > On Tue, 12 Sept 2023 at 21:00, Thomas Huth <thuth@redhat.com> wrote: >> Please don't remove the whole job! Just disable the failing tests within the job, e.g.: >> >> diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py >> --- a/tests/avocado/replay_kernel.py >> +++ b/tests/avocado/replay_kernel.py >> @@ -503,6 +503,7 @@ def do_test_mips_malta32el_nanomips(self, kernel_path_xz): >> console_pattern = 'Kernel command line: %s' % kernel_command_line >> self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5) >> >> + @skipIf(os.getenv('GITLAB_CI'), 'Skipping unstable test on GitLab') >> def test_mips_malta32el_nanomips_4k(self): >> """ >> :avocado: tags=arch:mipsel > > Please don't skip unstable tests on gitlab only. If they're > unstable, then nobody wants to be running them and wondering > if these are flaky tests or real issues, whether theyr'e doing > it on gitlab or locally. (I know we already have a lot of these, > but the effect is that instead of saying 'make check-avocado' > you have to say 'GITLAB_CI=1 make check-avocado'.) Good point, I'll simply use: @skip('Pending https://gitlab.com/qemu-project/qemu/-/issues/1884') Looking at other ones: $ git grep -w @skip tests/avocado/ tests/avocado/machine_sparc_leon3.py:17: @skip("Test currently broken") tests/avocado/netdev-ethtool.py:89: @skip("Incomplete reg 0x00178 support") tests/avocado/netdev-ethtool.py:96: @skip("Incomplete reg 0x00178 support") tests/avocado/replay_kernel.py:333: @skip("Test currently broken") # Console stuck as of 5.2-rc1 tests/avocado/replay_kernel.py:368: @skip("nios2 emulation is buggy under record/replay") tests/avocado/virtio_check_params.py:119: @skip("break multi-arch CI") Looking at the first one: commit 5baecf58ad9fb3ce24d331978526909d0beca482 Author: Philippe Mathieu-Daudé <f4bug@amsat.org> Date: Tue Mar 31 12:50:42 2020 +0200 tests/acceptance/machine_sparc_leon3: Disable HelenOS test This test was written/tested around beginning of 2019, but was extracted from a bigger series and posted end of June 2019 [*]. Unfortunately I did not notice commit 162abf1a8 was merged by then, which implements the AHB and APB plug and play devices. HelenOS 0.6 is expecting the PnP registers to be not implemented by QEMU, then forces the discovered AMBA devices (see [2]). Before 162abf1a8, the console was displaying: HelenOS bootloader, release 0.6.0 (Elastic Horse) Built on 2014-12-21 20:17:42 for sparc32 Copyright (c) 2001-2014 HelenOS project 0x4000bf20|0x4000bf20: kernel image (496640/128466 bytes) 0x4002b4f2|0x4002b4f2: ns image (154195/66444 bytes) 0x4003b87e|0x4003b87e: loader image (153182/66437 bytes) 0x4004bc03|0x4004bc03: init image (155339/66834 bytes) 0x4005c115|0x4005c115: locsrv image (162063/70267 bytes) 0x4006d390|0x4006d390: rd image (152678/65889 bytes) 0x4007d4f1|0x4007d4f1: vfs image (168480/73394 bytes) 0x4008f3a3|0x4008f3a3: logger image (158034/68368 bytes) 0x4009feb3|0x4009feb3: ext4fs image (234510/100301 bytes) 0x400b8680|0x400b8680: initrd image (8388608/1668901 bytes) ABMA devices: <1:00c> at 0x80000100 irq 3 <1:00d> at 0x80000200 <1:011> at 0x80000300 irq 8 Memory size: 64 MB As of this commit, it is now confused: ABMA devices: <1:3000> at 0x00000000 irq 0 <1:3000> at 0x00000000 irq 0 <1:3000> at 0x00000000 irq 0 <1:3000> at 0x00000000 irq 0 <1:3000> at 0x00000000 irq 0 <1:3000> at 0x00000000 irq 0 <1:3000> at 0x00000000 irq 0 ... As this test is not working as expected, simply disable it (by skipping it) for now. More than 3 years passed already, what a disappointment. Offending commit is 4 years old. commit 162abf1a83ddd06ce1618666f84f88ba4dbffe10 Author: KONRAD Frederic <frederic.konrad@adacore.com> Date: Wed May 15 14:31:32 2019 +0200 leon3: introduce the plug and play mechanism This adds the AHB and APB plug and play devices. They are scanned during the linux boot to discover the various peripheral. I'm not complaining about that particular commit, I wonder about usefulness of disabling tests from unmaintained areas. Maybe we can commit a date when disabling a test, having a disabled test failing _after_ that date, so if it isn't fixed we remove it. Smth like, @SkipBroken(date='2023-11-15', desc='Pending https://gitlab.com/qemu-project/qemu/-/issues/1884') # Will fail if run after 2023-11-15 and this test isn't fixed Thoughts?
Philippe Mathieu-Daudé <philmd@linaro.org> writes: > On 13/9/23 11:18, Peter Maydell wrote: >> On Tue, 12 Sept 2023 at 21:00, Thomas Huth <thuth@redhat.com> wrote: >>> Please don't remove the whole job! Just disable the failing tests within the job, e.g.: >>> >>> diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py >>> --- a/tests/avocado/replay_kernel.py >>> +++ b/tests/avocado/replay_kernel.py >>> @@ -503,6 +503,7 @@ def do_test_mips_malta32el_nanomips(self, kernel_path_xz): >>> console_pattern = 'Kernel command line: %s' % kernel_command_line >>> self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5) >>> >>> + @skipIf(os.getenv('GITLAB_CI'), 'Skipping unstable test on GitLab') >>> def test_mips_malta32el_nanomips_4k(self): >>> """ >>> :avocado: tags=arch:mipsel >> Please don't skip unstable tests on gitlab only. If they're >> unstable, then nobody wants to be running them and wondering >> if these are flaky tests or real issues, whether theyr'e doing >> it on gitlab or locally. (I know we already have a lot of these, >> but the effect is that instead of saying 'make check-avocado' >> you have to say 'GITLAB_CI=1 make check-avocado'.) > > Good point, I'll simply use: > > @skip('Pending https://gitlab.com/qemu-project/qemu/-/issues/1884') > > Looking at other ones: > > $ git grep -w @skip tests/avocado/ > tests/avocado/machine_sparc_leon3.py:17: @skip("Test currently broken") > tests/avocado/netdev-ethtool.py:89: @skip("Incomplete reg 0x00178 > support") > tests/avocado/netdev-ethtool.py:96: @skip("Incomplete reg 0x00178 > support") > tests/avocado/replay_kernel.py:333: @skip("Test currently broken") > # Console stuck as of 5.2-rc1 > tests/avocado/replay_kernel.py:368: @skip("nios2 emulation is buggy > under record/replay") > tests/avocado/virtio_check_params.py:119: @skip("break multi-arch CI") > > Looking at the first one: > > commit 5baecf58ad9fb3ce24d331978526909d0beca482 > Author: Philippe Mathieu-Daudé <f4bug@amsat.org> > Date: Tue Mar 31 12:50:42 2020 +0200 > > tests/acceptance/machine_sparc_leon3: Disable HelenOS test > > This test was written/tested around beginning of 2019, but was > extracted from a bigger series and posted end of June 2019 [*]. > Unfortunately I did not notice commit 162abf1a8 was merged by > then, which implements the AHB and APB plug and play devices. > > HelenOS 0.6 is expecting the PnP registers to be not implemented > by QEMU, then forces the discovered AMBA devices (see [2]). > > Before 162abf1a8, the console was displaying: > > HelenOS bootloader, release 0.6.0 (Elastic Horse) > Built on 2014-12-21 20:17:42 for sparc32 > Copyright (c) 2001-2014 HelenOS project > 0x4000bf20|0x4000bf20: kernel image (496640/128466 bytes) > 0x4002b4f2|0x4002b4f2: ns image (154195/66444 bytes) > 0x4003b87e|0x4003b87e: loader image (153182/66437 bytes) > 0x4004bc03|0x4004bc03: init image (155339/66834 bytes) > 0x4005c115|0x4005c115: locsrv image (162063/70267 bytes) > 0x4006d390|0x4006d390: rd image (152678/65889 bytes) > 0x4007d4f1|0x4007d4f1: vfs image (168480/73394 bytes) > 0x4008f3a3|0x4008f3a3: logger image (158034/68368 bytes) > 0x4009feb3|0x4009feb3: ext4fs image (234510/100301 bytes) > 0x400b8680|0x400b8680: initrd image (8388608/1668901 bytes) > ABMA devices: > <1:00c> at 0x80000100 irq 3 > <1:00d> at 0x80000200 > <1:011> at 0x80000300 irq 8 > Memory size: 64 MB > > As of this commit, it is now confused: > > ABMA devices: > <1:3000> at 0x00000000 irq 0 > <1:3000> at 0x00000000 irq 0 > <1:3000> at 0x00000000 irq 0 > <1:3000> at 0x00000000 irq 0 > <1:3000> at 0x00000000 irq 0 > <1:3000> at 0x00000000 irq 0 > <1:3000> at 0x00000000 irq 0 > ... > > As this test is not working as expected, simply disable it (by > skipping it) for now. > > More than 3 years passed already, what a disappointment. > Offending commit is 4 years old. > > commit 162abf1a83ddd06ce1618666f84f88ba4dbffe10 > Author: KONRAD Frederic <frederic.konrad@adacore.com> > Date: Wed May 15 14:31:32 2019 +0200 > > leon3: introduce the plug and play mechanism > > This adds the AHB and APB plug and play devices. > They are scanned during the linux boot to discover the various > peripheral. > > I'm not complaining about that particular commit, I wonder about > usefulness of disabling tests from unmaintained areas. > > Maybe we can commit a date when disabling a test, having a disabled > test failing _after_ that date, so if it isn't fixed we remove it. > Smth like, > > @SkipBroken(date='2023-11-15', > desc='Pending > https://gitlab.com/qemu-project/qemu/-/issues/1884') # > Will fail if run after 2023-11-15 and this test isn't > fixed > > Thoughts? I like the: @skip('Pending https://gitlab.com/qemu-project/qemu/-/issues/1884') I think trying to do anything more fancy is just going to lead to frustration later on.
On Wed, Sep 13, 2023 at 11:35:12AM +0100, Alex Bennée wrote: > > Philippe Mathieu-Daudé <philmd@linaro.org> writes: > > > On 13/9/23 11:18, Peter Maydell wrote: > >> On Tue, 12 Sept 2023 at 21:00, Thomas Huth <thuth@redhat.com> wrote: > >>> Please don't remove the whole job! Just disable the failing tests within the job, e.g.: > >>> > >>> diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py > >>> --- a/tests/avocado/replay_kernel.py > >>> +++ b/tests/avocado/replay_kernel.py > >>> @@ -503,6 +503,7 @@ def do_test_mips_malta32el_nanomips(self, kernel_path_xz): > >>> console_pattern = 'Kernel command line: %s' % kernel_command_line > >>> self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5) > >>> > >>> + @skipIf(os.getenv('GITLAB_CI'), 'Skipping unstable test on GitLab') > >>> def test_mips_malta32el_nanomips_4k(self): > >>> """ > >>> :avocado: tags=arch:mipsel > >> Please don't skip unstable tests on gitlab only. If they're > >> unstable, then nobody wants to be running them and wondering > >> if these are flaky tests or real issues, whether theyr'e doing > >> it on gitlab or locally. (I know we already have a lot of these, > >> but the effect is that instead of saying 'make check-avocado' > >> you have to say 'GITLAB_CI=1 make check-avocado'.) > > > > Good point, I'll simply use: > > > > @skip('Pending https://gitlab.com/qemu-project/qemu/-/issues/1884') > > > > Looking at other ones: > > > > $ git grep -w @skip tests/avocado/ > > tests/avocado/machine_sparc_leon3.py:17: @skip("Test currently broken") > > tests/avocado/netdev-ethtool.py:89: @skip("Incomplete reg 0x00178 > > support") > > tests/avocado/netdev-ethtool.py:96: @skip("Incomplete reg 0x00178 > > support") > > tests/avocado/replay_kernel.py:333: @skip("Test currently broken") > > # Console stuck as of 5.2-rc1 > > tests/avocado/replay_kernel.py:368: @skip("nios2 emulation is buggy > > under record/replay") > > tests/avocado/virtio_check_params.py:119: @skip("break multi-arch CI") > > > > Looking at the first one: > > > > commit 5baecf58ad9fb3ce24d331978526909d0beca482 > > Author: Philippe Mathieu-Daudé <f4bug@amsat.org> > > Date: Tue Mar 31 12:50:42 2020 +0200 > > > > tests/acceptance/machine_sparc_leon3: Disable HelenOS test > > > > This test was written/tested around beginning of 2019, but was > > extracted from a bigger series and posted end of June 2019 [*]. > > Unfortunately I did not notice commit 162abf1a8 was merged by > > then, which implements the AHB and APB plug and play devices. > > > > HelenOS 0.6 is expecting the PnP registers to be not implemented > > by QEMU, then forces the discovered AMBA devices (see [2]). > > > > Before 162abf1a8, the console was displaying: > > > > HelenOS bootloader, release 0.6.0 (Elastic Horse) > > Built on 2014-12-21 20:17:42 for sparc32 > > Copyright (c) 2001-2014 HelenOS project > > 0x4000bf20|0x4000bf20: kernel image (496640/128466 bytes) > > 0x4002b4f2|0x4002b4f2: ns image (154195/66444 bytes) > > 0x4003b87e|0x4003b87e: loader image (153182/66437 bytes) > > 0x4004bc03|0x4004bc03: init image (155339/66834 bytes) > > 0x4005c115|0x4005c115: locsrv image (162063/70267 bytes) > > 0x4006d390|0x4006d390: rd image (152678/65889 bytes) > > 0x4007d4f1|0x4007d4f1: vfs image (168480/73394 bytes) > > 0x4008f3a3|0x4008f3a3: logger image (158034/68368 bytes) > > 0x4009feb3|0x4009feb3: ext4fs image (234510/100301 bytes) > > 0x400b8680|0x400b8680: initrd image (8388608/1668901 bytes) > > ABMA devices: > > <1:00c> at 0x80000100 irq 3 > > <1:00d> at 0x80000200 > > <1:011> at 0x80000300 irq 8 > > Memory size: 64 MB > > > > As of this commit, it is now confused: > > > > ABMA devices: > > <1:3000> at 0x00000000 irq 0 > > <1:3000> at 0x00000000 irq 0 > > <1:3000> at 0x00000000 irq 0 > > <1:3000> at 0x00000000 irq 0 > > <1:3000> at 0x00000000 irq 0 > > <1:3000> at 0x00000000 irq 0 > > <1:3000> at 0x00000000 irq 0 > > ... > > > > As this test is not working as expected, simply disable it (by > > skipping it) for now. > > > > More than 3 years passed already, what a disappointment. > > Offending commit is 4 years old. > > > > commit 162abf1a83ddd06ce1618666f84f88ba4dbffe10 > > Author: KONRAD Frederic <frederic.konrad@adacore.com> > > Date: Wed May 15 14:31:32 2019 +0200 > > > > leon3: introduce the plug and play mechanism > > > > This adds the AHB and APB plug and play devices. > > They are scanned during the linux boot to discover the various > > peripheral. > > > > I'm not complaining about that particular commit, I wonder about > > usefulness of disabling tests from unmaintained areas. > > > > Maybe we can commit a date when disabling a test, having a disabled > > test failing _after_ that date, so if it isn't fixed we remove it. > > Smth like, > > > > @SkipBroken(date='2023-11-15', > > desc='Pending > > https://gitlab.com/qemu-project/qemu/-/issues/1884') # > > Will fail if run after 2023-11-15 and this test isn't > > fixed > > > > Thoughts? > > I like the: > > @skip('Pending https://gitlab.com/qemu-project/qemu/-/issues/1884') I'd remove "Pending " too - the URL alone is sufficient info including the record of dates, to guide us when to finally delete a skipped test for good. With regards, Daniel
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml index aee9101507..83ce448c4d 100644 --- a/.gitlab-ci.d/buildtest.yml +++ b/.gitlab-ci.d/buildtest.yml @@ -22,15 +22,6 @@ check-system-alpine: IMAGE: alpine MAKE_CHECK_ARGS: check-unit check-qtest -avocado-system-alpine: - extends: .avocado_test_job_template - needs: - - job: build-system-alpine - artifacts: true - variables: - IMAGE: alpine - MAKE_CHECK_ARGS: check-avocado - build-system-ubuntu: extends: - .native_build_job_template @@ -53,15 +44,6 @@ check-system-ubuntu: IMAGE: ubuntu2204 MAKE_CHECK_ARGS: check -avocado-system-ubuntu: - extends: .avocado_test_job_template - needs: - - job: build-system-ubuntu - artifacts: true - variables: - IMAGE: ubuntu2204 - MAKE_CHECK_ARGS: check-avocado - build-system-debian: extends: - .native_build_job_template @@ -127,15 +109,6 @@ check-system-fedora: IMAGE: fedora MAKE_CHECK_ARGS: check -avocado-system-fedora: - extends: .avocado_test_job_template - needs: - - job: build-system-fedora - artifacts: true - variables: - IMAGE: fedora - MAKE_CHECK_ARGS: check-avocado - crash-test-fedora: extends: .native_test_job_template needs:
The avocado-system-alpine, avocado-system-fedora, and avocado-system-ubuntu jobs are unreliable. I identified them while looking over CI failures from the past week: https://gitlab.com/qemu-project/qemu/-/jobs/5058610614 https://gitlab.com/qemu-project/qemu/-/jobs/5058610654 https://gitlab.com/qemu-project/qemu/-/jobs/5030428571 Thomas Huth suggest on IRC today that there may be a legitimate failure in there: th_huth: f4bug, yes, seems like it does not start at all correctly on alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody noticed this by now, this is worrying It crept in because the jobs were already unreliable. I don't know how to interpret the job output, so all I can do is to propose removing these jobs. A useful CI job has two outcomes: pass or fail. Timeouts and other in-between states are not useful because they require constant triaging by someone who understands the details of the tests and they can occur when run against pull requests that have nothing to do with the area covered by the test. Hopefully test owners will be able to identify the root causes and solve them so that these jobs can stay. In their current state the jobs are not useful since I cannot cannot tell whether job failures are real or just intermittent when merging qemu.git pull requests. If you are a test owner, please take a look. It is likely that other avocado-system-* CI jobs have similar failures from time to time, but I'll leave them as long as they are passing. Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> --- .gitlab-ci.d/buildtest.yml | 27 --------------------------- 1 file changed, 27 deletions(-)