
.gitlab-ci.d/windows: Work-around timeout and OpenGL problems of the MSYS2 jobs

Message ID 20230104123559.277586-1-thuth@redhat.com (mailing list archive)
State New, archived
Series .gitlab-ci.d/windows: Work-around timeout and OpenGL problems of the MSYS2 jobs

Commit Message

Thomas Huth Jan. 4, 2023, 12:35 p.m. UTC
The Windows jobs (especially the 32-bit job) recently started to
hit the timeout limit. Bump it a little bit to ease the situation
(80 minutes is quite long already - OTOH, these jobs do not have to
wait for a job from the container stage to finish, so this should
still be OK).

Additionally, some update on the container side recently enabled
OpenGL in these jobs - but the corresponding code fails to compile.
Thus disable OpenGL here for the time being until someone has figured
out the proper fix in the shader code for this.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 Now that the timeout and OpenGL problems are gone, the 64-bit job is
 working fine for me again. However, I'm still seeing random issues
 with the 32-bit job ... I'm not sure whether it's a problem on the
 QEMU side or whether the builders are currently unstable, since
 the issues do not reproduce reliably...

 .gitlab-ci.d/windows.yml | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Philippe Mathieu-Daudé Jan. 4, 2023, 12:46 p.m. UTC | #1
On 4/1/23 13:35, Thomas Huth wrote:
> [...]

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Marc-André Lureau Jan. 4, 2023, 2:54 p.m. UTC | #2
Hi

On Wed, Jan 4, 2023 at 4:36 PM Thomas Huth <thuth@redhat.com> wrote:
>
> The windows jobs (especially the 32-bit job) recently started to
> hit the timeout limit. Bump it a little bit to ease the situation
> (80 minutes is quite long already - OTOH, these jobs do not have to
> wait for a job from the container stage to finish, so this should
> still be OK).
>
> Additionally, some update on the container side recently enabled
> OpenGL in these jobs - but the corresponding code fails to compile.
> Thus disable OpenGL here for the time being until someone figured
> out the proper fix in the shader code for this.

It seems MSYS2 recently enabled EGL support, but QEMU's EGL code has not
been tested on win32 yet.

I'll take a look. I am also adding EGL support to Fedora MinGW:
https://src.fedoraproject.org/rpms/mingw-libepoxy/pull-request/3

> [...]

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Peter Maydell Jan. 4, 2023, 10:01 p.m. UTC | #3
On Wed, 4 Jan 2023 at 12:36, Thomas Huth <thuth@redhat.com> wrote:
> [...]

Thanks; applied to master on the assumption it will improve the
CI situation. I found that the msys2-32bit job still timed out
at 1h20, though:

https://gitlab.com/qemu-project/qemu/-/jobs/3555245586

-- PMM
Thomas Huth Jan. 5, 2023, 8:34 a.m. UTC | #4
On 04/01/2023 23.01, Peter Maydell wrote:
> On Wed, 4 Jan 2023 at 12:36, Thomas Huth <thuth@redhat.com> wrote:
>> [...]
> 
> Thanks; applied to master on the assumption it will improve the
> CI situation. I found that the msys2-32bit job still timed out
> at 1h20, though:
> 
> https://gitlab.com/qemu-project/qemu/-/jobs/3555245586

I just gave it another try, too, and for me it finished within 65 minutes:

  https://gitlab.com/thuth/qemu/-/jobs/3557600268

... let's keep an eye on it for a while. Maybe it's OK in most cases now,
but if not, we'll have to consider something else.

  Thomas
Thomas Huth Jan. 5, 2023, 7:25 p.m. UTC | #5
On 05/01/2023 09.34, Thomas Huth wrote:
> On 04/01/2023 23.01, Peter Maydell wrote:
>> On Wed, 4 Jan 2023 at 12:36, Thomas Huth <thuth@redhat.com> wrote:
>>> [...]
>>
>> Thanks; applied to master on the assumption it will improve the
>> CI situation. I found that the msys2-32bit job still timed out
>> at 1h20, though:
>>
>> https://gitlab.com/qemu-project/qemu/-/jobs/3555245586
> 
> I just gave it a try again, too, and for me, it finished within 65 minutes:
> 
>   https://gitlab.com/thuth/qemu/-/jobs/3557600268
> 
> ... let's keep looking for a while, maybe it's ok in most cases now, but if 
> not, we have to consider something else.

OK, for the last two days I've been struggling with a failing msys2-32bit
job for my upcoming pull request, with problems in the test-hmp and
qom-test qtests (I thought I had a bad patch in there). I now have a new
theory, since the same tests are also failing in these CI runs from the
qemu-project staging branch:

  https://gitlab.com/qemu-project/qemu/-/jobs/3558798544
  https://gitlab.com/qemu-project/qemu/-/jobs/3560870904

That might also explain the timed-out job you saw earlier, Peter: it was
likely a hanging qom-test, since that seems to be the first test executed
during "make check" there.

So the qtests for Windows are definitely not ready for the CI yet (we only
enabled them in December). I think it's best to disable them there again
completely until the issues are understood and fixed.
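
A minimal sketch of what disabling the qtests could look like for the
msys2-32bit job. It assumes that meson's "--no-suite" option can be passed
through QEMU's MTESTARGS make variable and that the qtests are grouped in a
meson suite named "qtest"; both are assumptions, not something stated in
this thread. Only the job's script lines are shown (other keys of the real
job are omitted), and the escaping of the inner quotes for PowerShell's
argument passing should be verified in CI before relying on it:

```yaml
msys2-32bit:
  # ... other keys of the existing job omitted ...
  script:
  - $env:MSYS = 'winsymlinks:native' # Enable native Windows symlink
  - mkdir output
  - cd output
  - ..\msys64\usr\bin\bash -lc '../configure --target-list=ppc64-softmmu
        --disable-opengl'
  - ..\msys64\usr\bin\bash -lc 'make'
  # Assumption: exclude only the "qtest" suite, so the unit tests and
  # the other suites still run as part of "make check".
  - ..\msys64\usr\bin\bash -lc 'make check MTESTARGS=\"--no-suite qtest\" || { cat meson-logs/testlog.txt; exit 1; } ;'
```

Excluding a suite rather than dropping "make check" entirely keeps the
remaining test coverage on Windows while the qtest failures are
investigated.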

  Thomas
Bin Meng Jan. 6, 2023, 9:24 a.m. UTC | #6
On Fri, Jan 6, 2023 at 3:35 AM Thomas Huth <thuth@redhat.com> wrote:
>
> On 05/01/2023 09.34, Thomas Huth wrote:
> > On 04/01/2023 23.01, Peter Maydell wrote:
> >> On Wed, 4 Jan 2023 at 12:36, Thomas Huth <thuth@redhat.com> wrote:
> >>> [...]
> >>
> >> Thanks; applied to master on the assumption it will improve the
> >> CI situation. I found that the msys2-32bit job still timed out
> >> at 1h20, though:
> >>
> >> https://gitlab.com/qemu-project/qemu/-/jobs/3555245586
> >
> > I just gave it a try again, too, and for me, it finished within 65 minutes:
> >
> >   https://gitlab.com/thuth/qemu/-/jobs/3557600268
> >
> > ... let's keep looking for a while, maybe it's ok in most cases now, but if
> > not, we have to consider something else.
>
> Ok, so after I've been struggling with a failing msys2-32bit job for my new
> upcoming pull request the last two days (I thought I had a bad patch in
> there), where I had some problems with the test-hmp and qom-test qtests,
> I've come up with a new theory after looking at this CI run from the
> qemu-project staging branch and seeing that these tests are also failing there:
>
>   https://gitlab.com/qemu-project/qemu/-/jobs/3558798544
>   https://gitlab.com/qemu-project/qemu/-/jobs/3560870904
>
> That might also explain the timed-out job that you have seen earlier, Peter,
> it was likely a hanging qom-test since that seems to be the first test to be
> executed during the "make check" there.
>
> So the qtests for Windows are definitely not ready for the CI yet (after
> we've enabled them just in December). I think it's best to disable them
> there again completely until the issues are understood and fixed.
>

I cannot reproduce the failures of either test (test-hmp or qom-test)
with the w32 executables, nor with the w64 executables.

My testing repo is at commit d1852caab131ea898134fdcea8c14bc2ee75fbe9.

Regards,
Bin
Volker Rümelin Jan. 6, 2023, 5:08 p.m. UTC | #7
Am 04.01.23 um 13:35 schrieb Thomas Huth:
> The windows jobs (especially the 32-bit job) recently started to
> hit the timeout limit. Bump it a little bit to ease the situation
> (80 minutes is quite long already - OTOH, these jobs do not have to
> wait for a job from the container stage to finish, so this should
> still be OK).
>
> Additionally, some update on the container side recently enabled
> OpenGL in these jobs - but the corresponding code fails to compile.
> Thus disable OpenGL here for the time being until someone figured
> out the proper fix in the shader code for this.

This is strange. On my Windows MSYS2 system, I didn't even notice that
the OpenGL code had been silently enabled, and it compiles without issues.
Today I enabled the GtkGLArea initialization code in ui/gtk.c to test
OpenGL acceleration on Windows:

 >--- a/ui/gtk.c
 >+++ b/ui/gtk.c
 >@@ -2435,6 +2435,12 @@ static void early_gtk_display_init(DisplayOptions *opts)
 >             gtk_use_gl_area = true;
 >             gtk_gl_area_init();
 >         } else
 >+#endif
 >+#if defined(GDK_WINDOWING_WIN32)
 >+        if (GDK_IS_WIN32_DISPLAY(gdk_display_get_default())) {
 >+            gtk_use_gl_area = true;
 >+            gtk_gl_area_init();
 >+        } else
 > #endif
 >         {
 > #ifdef CONFIG_X11

Well, it's a start. On a Linux guest system the WebGL Aquarium frame 
rate increased from 6fps to 14fps while the host processor load went 
down from 100% to 65%.

QEMU was started with:
./qemu-system-x86_64.exe -accel whpx \
-machine pc,usb=off,vmport=off,kernel-irqchip=off \
-cpu Skylake-Client-v4,tsc-deadline=off,x2apic=off \
-smp 4,sockets=1,cores=4,threads=1 \
-device virtio-vga-gl,xres=1280,yres=768,bus=pci.0 \
-display gtk,zoom-to-fit=off,gl=on \
-trace "gd_gl_area_*_context" \
...

This is the start of the QEMU log file:
Windows Hypervisor Platform accelerator is operational
qemu: GtkGLArea console lacks DMABUF support.
Realize gdk gl context failed: Unable to create a GL context
Realize gdk gl context failed: Unable to create a GL context
gd_gl_area_create_context ctx=000002934703bc10, major=4, minor=4
gl_version 44 - core profile enabled
gd_gl_area_destroy_context ctx=000002934703bc10, current_ctx=000002934703bc10
gd_gl_area_create_context ctx=000002934703bc10, major=4, minor=4
gd_gl_area_create_context ctx=000002934703bba0, major=4, minor=4
gd_gl_area_create_context ctx=000002934703bb30, major=4, minor=4
GLSL feature level 440
gd_gl_area_create_context ctx=000002934703b890, major=4, minor=4
gd_gl_area_create_context ctx=000002934703bc80, major=4, minor=4
gd_gl_area_create_context ctx=000002934703bcf0, major=4, minor=4
gd_gl_area_destroy_context ctx=000002934703bcf0, current_ctx=000002934703bcf0
...

With best regards,
Volker

> Signed-off-by: Thomas Huth<thuth@redhat.com>
> ---
Thomas Huth Jan. 7, 2023, 2:32 p.m. UTC | #8
On 06/01/2023 10.24, Bin Meng wrote:
> On Fri, Jan 6, 2023 at 3:35 AM Thomas Huth <thuth@redhat.com> wrote:
>>
>> On 05/01/2023 09.34, Thomas Huth wrote:
>>> On 04/01/2023 23.01, Peter Maydell wrote:
>>>> On Wed, 4 Jan 2023 at 12:36, Thomas Huth <thuth@redhat.com> wrote:
>>>>> [...]
>>>>
>>>> Thanks; applied to master on the assumption it will improve the
>>>> CI situation. I found that the msys2-32bit job still timed out
>>>> at 1h20, though:
>>>>
>>>> https://gitlab.com/qemu-project/qemu/-/jobs/3555245586
>>>
>>> I just gave it a try again, too, and for me, it finished within 65 minutes:
>>>
>>>    https://gitlab.com/thuth/qemu/-/jobs/3557600268
>>>
>>> ... let's keep looking for a while, maybe it's ok in most cases now, but if
>>> not, we have to consider something else.
>>
>> Ok, so after I've been struggling with a failing msys2-32bit job for my new
>> upcoming pull request the last two days (I thought I had a bad patch in
>> there), where I had some problems with the test-hmp and qom-test qtests,
>> I've come up with a new theory after looking at this CI run from the
>> qemu-project staging branch and seeing that these tests are also failing there:
>>
>>    https://gitlab.com/qemu-project/qemu/-/jobs/3558798544
>>    https://gitlab.com/qemu-project/qemu/-/jobs/3560870904
>>
>> That might also explain the timed-out job that you have seen earlier, Peter,
>> it was likely a hanging qom-test since that seems to be the first test to be
>> executed during the "make check" there.
>>
>> So the qtests for Windows are definitely not ready for the CI yet (after
>> we've enabled them just in December). I think it's best to disable them
>> there again completely until the issues are understood and fixed.
>>
> 
> I cannot reproduce the test failures of both tests (test-hmp and
> qom-test) with w32 executables. Neither did the w64 executables.
> 
> My testing repo is at commit d1852caab131ea898134fdcea8c14bc2ee75fbe9.

Can you at least reproduce it in the GitLab CI? It does not always occur;
sometimes the jobs work fine. I suspect it's some kind of race or memory
problem ... is there something similar to Valgrind on Windows? If so,
could you try to run those qtests with such tooling enabled?
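
If it helps with reproducing the problem in GitLab CI, one option is a
separate manual job that reruns only the qtests several times, so that an
intermittent hang or crash is more likely to show up. This is a hypothetical
sketch: the job name is made up, and it assumes that QEMU's "check-qtest"
make target honours MTESTARGS and that meson's "--repeat" option is
available in the MSYS2 meson version:

```yaml
# Hypothetical manual job, not part of the patch: reuses the msys2-32bit
# setup via "extends", but only builds and repeatedly runs the qtests.
msys2-32bit-qtest-repeat:
  extends: msys2-32bit
  when: manual
  script:
  - $env:MSYS = 'winsymlinks:native' # Enable native Windows symlink
  - mkdir output
  - cd output
  - ..\msys64\usr\bin\bash -lc '../configure --target-list=ppc64-softmmu --disable-opengl'
  - ..\msys64\usr\bin\bash -lc 'make'
  # Run each qtest 10 times; a race may then surface as a failure
  # or as a hang.
  - ..\msys64\usr\bin\bash -lc 'make check-qtest MTESTARGS=\"--repeat 10\" || { cat meson-logs/testlog.txt; exit 1; } ;'
```

The repeat count would need to stay low enough for the build plus the
repeated runs to fit into the 80-minute job timeout.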

  Thomas

Patch

diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml
index 9b5c4bcd8a..22f794e537 100644
--- a/.gitlab-ci.d/windows.yml
+++ b/.gitlab-ci.d/windows.yml
@@ -10,7 +10,7 @@ 
       - ${CI_PROJECT_DIR}/msys64/var/cache
   needs: []
   stage: build
-  timeout: 70m
+  timeout: 80m
   before_script:
   - If ( !(Test-Path -Path msys64\var\cache ) ) {
       mkdir msys64\var\cache
@@ -71,7 +71,7 @@  msys2-64bit:
   # for the msys2 64-bit job, due to the build could not complete within
   # the project timeout.
   - ..\msys64\usr\bin\bash -lc '../configure --target-list=x86_64-softmmu
-      --without-default-devices'
+      --without-default-devices --disable-opengl'
   - ..\msys64\usr\bin\bash -lc 'make'
   # qTests don't run successfully with "--without-default-devices",
   # so let's exclude the qtests from CI for now.
@@ -113,6 +113,7 @@  msys2-32bit:
   - $env:MSYS = 'winsymlinks:native' # Enable native Windows symlink
   - mkdir output
   - cd output
-  - ..\msys64\usr\bin\bash -lc '../configure --target-list=ppc64-softmmu'
+  - ..\msys64\usr\bin\bash -lc '../configure --target-list=ppc64-softmmu
+        --disable-opengl'
   - ..\msys64\usr\bin\bash -lc 'make'
   - ..\msys64\usr\bin\bash -lc 'make check || { cat meson-logs/testlog.txt; exit 1; } ;'