| Message ID | 20241007115027.243425-1-thuth@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
On Mon, 7 Oct 2024 at 12:50, Thomas Huth <thuth@redhat.com> wrote:
>
> The following changes since commit b5ab62b3c0050612c7f9b0b4baeb44ebab42775a:
>
>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2024-10-04 19:28:37 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/thuth/qemu.git tags/pull-request-2024-10-07
>
> for you to fetch changes up to d841f720c98475c0f67695d99f27794bde69ed6e:
>
>   tests/functional: Bump timeout of some tests (2024-10-07 13:21:41 +0200)
>
> ----------------------------------------------------------------
> * Mark "gluster" support as deprecated
> * Update CI to use macOS 14 instead of 13, and add a macOS 15 job
> * Use gitlab mirror for advent calendar test images (seems more stable)
> * Bump timeouts of some tests
> * Remove CRIS disassembler
> * Some m68k and s390x cleanups with regards to load and store APIs
>
> ----------------------------------------------------------------

This suggests it's moving back to the gitlab mirror for the
advent calendar tests, but one CI test still failed trying to access
http://www.qemu-advent-calendar.org/2023/download/day13.tar.gz
and getting a 503 from it:

https://gitlab.com/qemu-project/qemu/-/jobs/8009902301

The clang-system test also hit a couple of timeouts:
https://gitlab.com/qemu-project/qemu/-/jobs/8009902206

 61/109 qemu:qtest+qtest-alpha / qtest-alpha/qmp-cmd-test   TIMEOUT  60.10s  killed by signal 15 SIGTERM
 93/109 qemu:qtest+qtest-arm / qtest-arm/qmp-cmd-test       TIMEOUT  60.04s  killed by signal 15 SIGTERM

which are presumably pre-existing intermittents, but I mention them
here just FYI. Some of the other qmp-cmd-test runs in that job also
came close to timing out:

102/109 qemu:qtest+qtest-m68k / qtest-m68k/qmp-cmd-test      OK  56.56s  65 subtests passed
105/109 qemu:qtest+qtest-mips64 / qtest-mips64/qmp-cmd-test  OK  53.74s  65 subtests passed
106/109 qemu:qtest+qtest-s390x / qtest-s390x/qmp-cmd-test    OK  45.48s  65 subtests passed

so maybe we should add it to slow_tests with a 120s timeout...

thanks
-- PMM
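The "slow_tests" suggestion above most likely refers to the slow_qtests mapping in tests/qtest/meson.build, which overrides meson's default 60s timeout for individual qtest binaries. A minimal sketch of what the proposed bump might look like; the neighbouring entry and the exact values are illustrative, not copied from the tree:

    # tests/qtest/meson.build -- sketch only, surrounding entries elided/illustrative
    slow_qtests = {
      'qom-test'     : 900,   # example of an existing slow-test override
      'qmp-cmd-test' : 120,   # proposed: raise qmp-cmd-test from the default 60s to 120s
    }

The per-target qtest registration then picks the override up with something along the lines of "timeout: slow_qtests.get(test, 60)".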
On Mon, 7 Oct 2024 at 14:43, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Mon, 7 Oct 2024 at 12:50, Thomas Huth <thuth@redhat.com> wrote:
> >
> > The following changes since commit b5ab62b3c0050612c7f9b0b4baeb44ebab42775a:
> >
> >   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2024-10-04 19:28:37 +0100)
> >
> > are available in the Git repository at:
> >
> >   https://gitlab.com/thuth/qemu.git tags/pull-request-2024-10-07
> >
> > for you to fetch changes up to d841f720c98475c0f67695d99f27794bde69ed6e:
> >
> >   tests/functional: Bump timeout of some tests (2024-10-07 13:21:41 +0200)
> >
> > ----------------------------------------------------------------
> > * Mark "gluster" support as deprecated
> > * Update CI to use macOS 14 instead of 13, and add a macOS 15 job
> > * Use gitlab mirror for advent calendar test images (seems more stable)
> > * Bump timeouts of some tests
> > * Remove CRIS disassembler
> > * Some m68k and s390x cleanups with regards to load and store APIs
> >
> > ----------------------------------------------------------------
>
> This suggests it's moving back to the gitlab mirror for the
> advent calendar tests, but one CI test still failed trying to access
> http://www.qemu-advent-calendar.org/2023/download/day13.tar.gz
> and getting a 503 from it:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/8009902301

On the rerun it managed to download:
https://gitlab.com/qemu-project/qemu/-/jobs/8011303154

> The clang-system test also hit a couple of timeouts:
> https://gitlab.com/qemu-project/qemu/-/jobs/8009902206
>
>  61/109 qemu:qtest+qtest-alpha / qtest-alpha/qmp-cmd-test   TIMEOUT  60.10s  killed by signal 15 SIGTERM
>  93/109 qemu:qtest+qtest-arm / qtest-arm/qmp-cmd-test       TIMEOUT  60.04s  killed by signal 15 SIGTERM
>
> which are presumably pre-existing intermittents, but I mention them
> here just FYI. Some of the other qmp-cmd-test runs in that job also
> came close to timing out:
>
> 102/109 qemu:qtest+qtest-m68k / qtest-m68k/qmp-cmd-test      OK  56.56s  65 subtests passed
> 105/109 qemu:qtest+qtest-mips64 / qtest-mips64/qmp-cmd-test  OK  53.74s  65 subtests passed
> 106/109 qemu:qtest+qtest-s390x / qtest-s390x/qmp-cmd-test    OK  45.48s  65 subtests passed
>
> so maybe we should add it to slow_tests with a 120s timeout...

As expected, these are all intermittents; on the passing job:

https://gitlab.com/qemu-project/qemu/-/jobs/8011303114

they completed in 19s, 20s, 19s, 19s, 19s. So we're seeing
factor-of-3 variation in job runtime on this k8s runner :-(

Anyway, I've pushed this pullreq; we can look at the above
two things as follow-on fixes.

thanks
-- PMM
On 07/10/2024 16.13, Peter Maydell wrote:
> On Mon, 7 Oct 2024 at 14:43, Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>> On Mon, 7 Oct 2024 at 12:50, Thomas Huth <thuth@redhat.com> wrote:
>>>
>>> The following changes since commit b5ab62b3c0050612c7f9b0b4baeb44ebab42775a:
>>>
>>>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2024-10-04 19:28:37 +0100)
>>>
>>> are available in the Git repository at:
>>>
>>>   https://gitlab.com/thuth/qemu.git tags/pull-request-2024-10-07
>>>
>>> for you to fetch changes up to d841f720c98475c0f67695d99f27794bde69ed6e:
>>>
>>>   tests/functional: Bump timeout of some tests (2024-10-07 13:21:41 +0200)
>>>
>>> ----------------------------------------------------------------
>>> * Mark "gluster" support as deprecated
>>> * Update CI to use macOS 14 instead of 13, and add a macOS 15 job
>>> * Use gitlab mirror for advent calendar test images (seems more stable)
>>> * Bump timeouts of some tests
>>> * Remove CRIS disassembler
>>> * Some m68k and s390x cleanups with regards to load and store APIs
>>>
>>> ----------------------------------------------------------------
>>
>> This suggests it's moving back to the gitlab mirror for the
>> advent calendar tests, but one CI test still failed trying to access
>> http://www.qemu-advent-calendar.org/2023/download/day13.tar.gz
>> and getting a 503 from it:
>>
>> https://gitlab.com/qemu-project/qemu/-/jobs/8009902301

Yes, that day13.tar.gz is from 2023 which is not included in the mirror
on gitlab (yet). If we continue to see failures with the original site,
I can have a try to put it into the mirror repository, too.

> On the rerun it managed to download:
> https://gitlab.com/qemu-project/qemu/-/jobs/8011303154
>
>> The clang-system test also hit a couple of timeouts:
>> https://gitlab.com/qemu-project/qemu/-/jobs/8009902206
>>
>>  61/109 qemu:qtest+qtest-alpha / qtest-alpha/qmp-cmd-test   TIMEOUT  60.10s  killed by signal 15 SIGTERM
>>  93/109 qemu:qtest+qtest-arm / qtest-arm/qmp-cmd-test       TIMEOUT  60.04s  killed by signal 15 SIGTERM
>>
>> which are presumably pre-existing intermittents, but I mention them
>> here just FYI.

I neither had anything related to arm/alpha nor to qtests in my pull
request, so yes, it's likely something pre-existing... maybe something
from the previous pull requests? (or did you see these in the past
already?)

>> Some of the other qmp-cmd-test runs in that job also came close to
>> timing out:
>>
>> 102/109 qemu:qtest+qtest-m68k / qtest-m68k/qmp-cmd-test      OK  56.56s  65 subtests passed
>> 105/109 qemu:qtest+qtest-mips64 / qtest-mips64/qmp-cmd-test  OK  53.74s  65 subtests passed
>> 106/109 qemu:qtest+qtest-s390x / qtest-s390x/qmp-cmd-test    OK  45.48s  65 subtests passed
>>
>> so maybe we should add it to slow_tests with a 120s timeout...

Ok, m68k and s390x have been touched by this PR ... but still, it's one
qtest (qmp-cmd-test) that is failing for multiple targets, so it rather
sounds like we've got a regression in one of the previous PRs?

> As expected, these are all intermittents; on the passing job:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/8011303114
>
> they completed in 19s, 20s, 19s, 19s, 19s. So we're seeing
> factor-of-3 variation in job runtime on this k8s runner :-(
>
> Anyway, I've pushed this pullreq; we can look at the above
> two things as follow-on fixes.

Thanks!
 Thomas
On Mon, 7 Oct 2024 at 17:41, Thomas Huth <thuth@redhat.com> wrote:
>
> On 07/10/2024 16.13, Peter Maydell wrote:
> >> Some of the other qmp-cmd-test runs in that job also came close to
> >> timing out:
> >>
> >> 102/109 qemu:qtest+qtest-m68k / qtest-m68k/qmp-cmd-test      OK  56.56s  65 subtests passed
> >> 105/109 qemu:qtest+qtest-mips64 / qtest-mips64/qmp-cmd-test  OK  53.74s  65 subtests passed
> >> 106/109 qemu:qtest+qtest-s390x / qtest-s390x/qmp-cmd-test    OK  45.48s  65 subtests passed
> >>
> >> so maybe we should add it to slow_tests with a 120s timeout...
>
> Ok, m68k and s390x have been touched by this PR ... but still, it's one
> qtest (qmp-cmd-test) that is failing for multiple targets, so it rather
> sounds like we've got a regression in one of the previous PRs?

I think it's more likely that the k8s runners are just horrifically
inconsistent about speed: they have been the flaky CI jobs in one way
or another at least since I started doing pullreq handling for this
release cycle.

If they reliably ran these jobs in 20s then there would be no issue,
we would have tons of headroom between that and the 60s timeout.
(My local dev box runs them in 13s, and it's not super high-powered.)
If they reliably took 60s then we'd have fixed up the timeouts already
(but that would imply a very slow CPU). Our other option would be to
use that meson "multiply all the timeouts by X" feature for the k8s
jobs. Of course if it does go that slowly for the whole job then we
run into the whole-job timeout...

Paolo: do you have any idea why our k8s runner jobs have such
inconsistent performance ?

-- PMM
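The "multiply all the timeouts by X" feature mentioned above is meson's test timeout multiplier, i.e. the --timeout-multiplier (short form -t) option of "meson test". A rough sketch of an invocation with scaled timeouts; the multiplier value and build directory are illustrative, and wiring this into the k8s CI jobs would depend on how those jobs actually invoke meson test:

    # Scale every per-test timeout by 3 on a slow runner (multiplier value illustrative)
    meson test -C build --timeout-multiplier 3

As noted above, this only stretches the per-test timeouts; a uniformly slow runner can still run into the overall GitLab job timeout.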