Message ID: 20200312193616.438922-1-crosa@redhat.com (mailing list archive)
Series: QEMU Gating CI
On Thu, 12 Mar 2020 at 19:36, Cleber Rosa <crosa@redhat.com> wrote:
>
> The idea about a public facing Gating CI for QEMU was lastly
> summarized in an RFC[1]. Since then, it was decided that a
> simpler version should be attempted first.

OK, so my question here is:
 * what are the instructions that I have to follow to be
   able to say "ok, here's my branch, run it through these tests,
   please" ?

thanks
-- PMM
On Thu, Mar 12, 2020 at 10:00:42PM +0000, Peter Maydell wrote:
> On Thu, 12 Mar 2020 at 19:36, Cleber Rosa <crosa@redhat.com> wrote:
> >
> > The idea about a public facing Gating CI for QEMU was lastly
> > summarized in an RFC[1]. Since then, it was decided that a
> > simpler version should be attempted first.
>
> OK, so my question here is:
>  * what are the instructions that I have to follow to be
>    able to say "ok, here's my branch, run it through these tests,
>    please" ?

The quick answer is:

   $ git push git@gitlab.com:qemu-project/qemu.git my-branch:staging

The longer explanation is that these jobs are limited to a "staging"
branch, so all you'd have to do is push something to a branch called
"staging".  If that branch happens to be on the
"gitlab.com/qemu-project/qemu" repo, then the runners set up there
would be used.  The documentation and Ansible playbooks are supposed
to help with this setup.

Once that push happens, you could use:

   $ contrib/ci/scripts/gitlab-pipeline-status --verbose --wait

Before doing something like:

   $ git push git@gitlab.com:qemu-project/qemu.git my-branch:master

> thanks
> -- PMM

Let me know if that makes sense.

Cheers,
- Cleber.
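Putting those steps together, the intended gating flow looks roughly like
this (a sketch based only on the commands quoted above; it assumes the
script's exit status reflects whether the pipeline passed):

   # push the candidate branch so the gating jobs run against it
   git push git@gitlab.com:qemu-project/qemu.git my-branch:staging

   # block until the pipeline for the staging branch finishes
   if contrib/ci/scripts/gitlab-pipeline-status --verbose --wait; then
       # only promote the branch once the gating pipeline has passed
       git push git@gitlab.com:qemu-project/qemu.git my-branch:master
   fi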
On Thu, 12 Mar 2020 at 22:16, Cleber Rosa <crosa@redhat.com> wrote: > > On Thu, Mar 12, 2020 at 10:00:42PM +0000, Peter Maydell wrote: > > OK, so my question here is: > > * what are the instructions that I have to follow to be > > able to say "ok, here's my branch, run it through these tests, > > please" ? > > The quick answer is: > > $ git push git@gitlab.com:qemu-project/qemu.git my-branch:staging > > The longer explanation is that these jobs are limited to a "staging" > branch, so all you'd have to do is to push something to a branch > called "staging". If that branch happens to be from the > "gitlab.com/qemu-project/qemu" repo, than the runners setup there > would be used. The documentation an ansible playbooks are supposed > to help with this setup. Great, thanks. Could I do that for testing purposes with a staging branch that includes these patches, or would we have to wait for them to be in master before it works? -- PMM
On Fri, Mar 13, 2020 at 01:55:49PM +0000, Peter Maydell wrote: > On Thu, 12 Mar 2020 at 22:16, Cleber Rosa <crosa@redhat.com> wrote: > > > > On Thu, Mar 12, 2020 at 10:00:42PM +0000, Peter Maydell wrote: > > > OK, so my question here is: > > > * what are the instructions that I have to follow to be > > > able to say "ok, here's my branch, run it through these tests, > > > please" ? > > > > The quick answer is: > > > > $ git push git@gitlab.com:qemu-project/qemu.git my-branch:staging > > > > The longer explanation is that these jobs are limited to a "staging" > > branch, so all you'd have to do is to push something to a branch > > called "staging". If that branch happens to be from the > > "gitlab.com/qemu-project/qemu" repo, than the runners setup there > > would be used. The documentation an ansible playbooks are supposed > > to help with this setup. > > Great, thanks. Could I do that for testing purposes with a > staging branch that includes these patches, or would we have > to wait for them to be in master before it works? > You can definitely do that with a staging branch that includes these patches. And with regards to setting up machines with runners and other requirements, I tried to make it easy to replicate, but YMMV. So, please let me know if you find any issues whatsoever. > -- PMM > Thanks, - Cleber.
On Thu, 12 Mar 2020 at 22:16, Cleber Rosa <crosa@redhat.com> wrote:
> The quick answer is:
>
>   $ git push git@gitlab.com:qemu-project/qemu.git my-branch:staging

So I did this bit...

> Once that push happens, you could use:
>
>   $ contrib/ci/scripts/gitlab-pipeline-status --verbose --wait

...but this script just says:

  $ ./contrib/ci/scripts/gitlab-pipeline-status --verbose --wait
  ERROR: No pipeline found
  failure

thanks
-- PMM
----- Original Message -----
> From: "Peter Maydell" <peter.maydell@linaro.org>
> To: "Cleber Rosa" <crosa@redhat.com>
> Cc: "Alex Bennée" <alex.bennee@linaro.org>, "QEMU Developers" <qemu-devel@nongnu.org>, "Fam Zheng" <fam@euphon.net>,
> "Eduardo Habkost" <ehabkost@redhat.com>, "Beraldo Leal" <bleal@redhat.com>, "Philippe Mathieu-Daudé"
> <philmd@redhat.com>, "Thomas Huth" <thuth@redhat.com>, "Wainer dos Santos Moschetta" <wainersm@redhat.com>, "Erik
> Skultety" <eskultet@redhat.com>, "Willian Rampazzo" <wrampazz@redhat.com>, "Wainer Moschetta" <wmoschet@redhat.com>
> Sent: Monday, March 16, 2020 7:57:33 AM
> Subject: Re: [PATCH 0/5] QEMU Gating CI
>
> On Thu, 12 Mar 2020 at 22:16, Cleber Rosa <crosa@redhat.com> wrote:
> > The quick answer is:
> >
> >   $ git push git@gitlab.com:qemu-project/qemu.git my-branch:staging
>
> So I did this bit...
>
> > Once that push happens, you could use:
> >
> >   $ contrib/ci/scripts/gitlab-pipeline-status --verbose --wait
>
> ...but this script just says:
>
>   $ ./contrib/ci/scripts/gitlab-pipeline-status --verbose --wait
>   ERROR: No pipeline found
>   failure

Hi Peter,

A few possible reasons come to my mind:

1) It usually takes a few seconds after the push for the pipeline to
   be created.

2) If you've pushed to a repo different than gitlab.com/qemu-project/qemu,
   you'd have to tweak the project ID (-p|--project-id).

3) The local branch is not called "staging", so the script can not find
   the commit ID; in that case you can use -c|--commit.

> thanks
> -- PMM

Please let me know if any of these points helps.

Cheers,
- Cleber.
On Mon, 16 Mar 2020 at 12:04, Cleber Rosa <crosa@redhat.com> wrote: > A few possible reasons come to my mind: > > 1) It usually takes a few seconds after the push for the pipeline to > > 2) If you've pushed to a repo different than gitlab.com/qemu-project/qemu, > you'd have to tweak the project ID (-p|--project-id). > > 3) The local branch is not called "staging", so the script can not find the > commit ID, in that case you can use -c|--commit. Yes, the local branch is something else for the purposes of testing this series. But using --commit doesn't work either: $ ./contrib/ci/scripts/gitlab-pipeline-status --verbose --commit 81beaaab0851fe8c4db971 --wait ERROR: No pipeline found failure On the web UI: https://gitlab.com/qemu-project/qemu/pipelines the pipelines are marked "stuck" (I don't know why there are two of them for the same commit); drilling down, the build part has completed but all the test parts are pending with "This job is stuck because you don't have any active runners online with any of these tags assigned to them" type messages. thanks -- PMM
----- Original Message ----- > From: "Peter Maydell" <peter.maydell@linaro.org> > To: "Cleber Rosa" <crosa@redhat.com> > Cc: "Alex Bennée" <alex.bennee@linaro.org>, "QEMU Developers" <qemu-devel@nongnu.org>, "Fam Zheng" <fam@euphon.net>, > "Eduardo Habkost" <ehabkost@redhat.com>, "Beraldo Leal" <bleal@redhat.com>, "Philippe Mathieu-Daudé" > <philmd@redhat.com>, "Thomas Huth" <thuth@redhat.com>, "Wainer dos Santos Moschetta" <wainersm@redhat.com>, "Erik > Skultety" <eskultet@redhat.com>, "Willian Rampazzo" <wrampazz@redhat.com>, "Wainer Moschetta" <wmoschet@redhat.com> > Sent: Monday, March 16, 2020 8:12:16 AM > Subject: Re: [PATCH 0/5] QEMU Gating CI > > On Mon, 16 Mar 2020 at 12:04, Cleber Rosa <crosa@redhat.com> wrote: > > A few possible reasons come to my mind: > > > > 1) It usually takes a few seconds after the push for the pipeline to > > > > 2) If you've pushed to a repo different than gitlab.com/qemu-project/qemu, > > you'd have to tweak the project ID (-p|--project-id). > > > > 3) The local branch is not called "staging", so the script can not find the > > commit ID, in that case you can use -c|--commit. > > Yes, the local branch is something else for the purposes of > testing this series. But using --commit doesn't work either: > > $ ./contrib/ci/scripts/gitlab-pipeline-status --verbose --commit > 81beaaab0851fe8c4db971 --wait > ERROR: No pipeline found > failure > This may be a bug in the script because I was not expecting two pipelines for the same commit. Checking... > On the web UI: > https://gitlab.com/qemu-project/qemu/pipelines > the pipelines are marked "stuck" (I don't know why there > are two of them for the same commit); drilling down, > the build part has completed but all the test parts are > pending with "This job is stuck because you don't have > any active runners online with any of these tags assigned > to them" type messages. > I had also not come across the duplicate pipelines, so I'm trying to understand what's that about. About the runners and the fact that the job is stuck without them, the message seems straightforward enough, but I can't get to the project configuration to look at the registered runners with my current permissions (set as "developer"). > thanks > -- PMM > > Thanks, - Cleber.
----- Original Message ----- > From: "Cleber Rosa" <crosa@redhat.com> > To: "Peter Maydell" <peter.maydell@linaro.org> > Cc: "Alex Bennée" <alex.bennee@linaro.org>, "QEMU Developers" <qemu-devel@nongnu.org>, "Fam Zheng" <fam@euphon.net>, > "Eduardo Habkost" <ehabkost@redhat.com>, "Beraldo Leal" <bleal@redhat.com>, "Philippe Mathieu-Daudé" > <philmd@redhat.com>, "Thomas Huth" <thuth@redhat.com>, "Wainer dos Santos Moschetta" <wainersm@redhat.com>, "Erik > Skultety" <eskultet@redhat.com>, "Willian Rampazzo" <wrampazz@redhat.com>, "Wainer Moschetta" <wmoschet@redhat.com> > Sent: Monday, March 16, 2020 8:26:46 AM > Subject: Re: [PATCH 0/5] QEMU Gating CI > > > > ----- Original Message ----- > > From: "Peter Maydell" <peter.maydell@linaro.org> > > To: "Cleber Rosa" <crosa@redhat.com> > > Cc: "Alex Bennée" <alex.bennee@linaro.org>, "QEMU Developers" > > <qemu-devel@nongnu.org>, "Fam Zheng" <fam@euphon.net>, > > "Eduardo Habkost" <ehabkost@redhat.com>, "Beraldo Leal" <bleal@redhat.com>, > > "Philippe Mathieu-Daudé" > > <philmd@redhat.com>, "Thomas Huth" <thuth@redhat.com>, "Wainer dos Santos > > Moschetta" <wainersm@redhat.com>, "Erik > > Skultety" <eskultet@redhat.com>, "Willian Rampazzo" <wrampazz@redhat.com>, > > "Wainer Moschetta" <wmoschet@redhat.com> > > Sent: Monday, March 16, 2020 8:12:16 AM > > Subject: Re: [PATCH 0/5] QEMU Gating CI > > > > On Mon, 16 Mar 2020 at 12:04, Cleber Rosa <crosa@redhat.com> wrote: > > > A few possible reasons come to my mind: > > > > > > 1) It usually takes a few seconds after the push for the pipeline to > > > > > > 2) If you've pushed to a repo different than > > > gitlab.com/qemu-project/qemu, > > > you'd have to tweak the project ID (-p|--project-id). > > > > > > 3) The local branch is not called "staging", so the script can not find > > > the > > > commit ID, in that case you can use -c|--commit. > > > > Yes, the local branch is something else for the purposes of > > testing this series. But using --commit doesn't work either: > > > > $ ./contrib/ci/scripts/gitlab-pipeline-status --verbose --commit > > 81beaaab0851fe8c4db971 --wait > > ERROR: No pipeline found > > failure > > > > This may be a bug in the script because I was not expecting two > pipelines for the same commit. Checking... > Looks like the GitLab API requires a 40 char commit ID, so this should work: $ ./contrib/ci/scripts/gitlab-pipeline-status --verbose --commit 81beaaab0851fe8c4db971df555600152bb83a6c --wait It'll stay silent at first, but then will print a message every 60 seconds. Thanks, - Cleber.
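For context, the full 40-character requirement comes from the GitLab API
itself: the pipelines listing endpoint filters by an exact "sha" value.
Whether the script uses this exact endpoint is an assumption, but the
equivalent manual check against the public API would be:

   # list pipelines for a given commit; the sha filter needs the full 40-char ID
   curl --silent \
     "https://gitlab.com/api/v4/projects/qemu-project%2Fqemu/pipelines?sha=81beaaab0851fe8c4db971df555600152bb83a6c"

An abbreviated SHA would simply return an empty list rather than an error,
which is consistent with the "No pipeline found" message seen earlier.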
On Thu, Mar 12, 2020 at 03:36:11PM -0400, Cleber Rosa wrote: > The idea about a public facing Gating CI for QEMU was lastly > summarized in an RFC[1]. Since then, it was decided that a > simpler version should be attempted first. > > Changes from the RFC patches[2] accompanying the RFC document: > > - Moved gating job definitions to .gitlab-ci-gating.yml > - Added info on "--disable-libssh" build option requirement > (https://bugs.launchpad.net/qemu/+bug/1838763) to Ubuntu 18.04 jobs > - Added info on "--disable-glusterfs" build option requirement > (there's no static version of those libs in distro supplied > packages) to one > - Dropped ubuntu-18.04.3-x86_64-notools job definition, because it > doesn't fall into the general scope of gating job described by PMM > (and it did not run any test) > - Added w32 and w64 cross builds based on Fedora 30 > - Added a FreeBSD based job that builds all targets and runs `make > check` > - Added "-j`nproc`" and "-j`sysctl -n hw.ncpu`" options to make as a > simple but effective way of speeding up the builds and tests by > using a number of make jobs matching the number of CPUs > - Because the Ansible playbooks reference the content on Dockerfiles, > some fixes to some Dockerfiles caught in the process were included > - New patch with script to check or wait on a pipeline execution > > [1] - https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg00231.html > [2] - https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg00154.html > > Cleber Rosa (5): > tests/docker: add CentOS 8 Dockerfile > tests/docker: make "buildah bud" output similar to "docker build" > GitLab CI: avoid calling before_scripts on unintended jobs > GitLab Gating CI: introduce pipeline-status contrib script > GitLab Gating CI: initial set of jobs, documentation and scripts > > .gitlab-ci-gating.yml | 111 ++++++++++ > .gitlab-ci.yml | 32 ++- > contrib/ci/orgs/qemu/build-environment.yml | 208 ++++++++++++++++++ > contrib/ci/orgs/qemu/gitlab-runner.yml | 65 ++++++ > contrib/ci/orgs/qemu/inventory | 2 + > contrib/ci/orgs/qemu/vars.yml | 13 ++ > contrib/ci/scripts/gitlab-pipeline-status | 148 +++++++++++++ FYI, the contrib/ directory is generally a place for arbitrary / adhoc but interesting user contributed files/sources that are not officially supported deliverables of the project. IOW, this is not a good home for the official CI scripts. We already have a .gitlab-ci.d/ directory that looks like it would be good for this. Or if that's not suitable, then scripts/ci/ is a second choice. > docs/devel/testing.rst | 142 ++++++++++++ > tests/docker/dockerfiles/centos8.docker | 32 +++ > .../dockerfiles/debian-win32-cross.docker | 2 +- > 10 files changed, 751 insertions(+), 4 deletions(-) > create mode 100644 .gitlab-ci-gating.yml > create mode 100644 contrib/ci/orgs/qemu/build-environment.yml > create mode 100644 contrib/ci/orgs/qemu/gitlab-runner.yml > create mode 100644 contrib/ci/orgs/qemu/inventory > create mode 100644 contrib/ci/orgs/qemu/vars.yml > create mode 100755 contrib/ci/scripts/gitlab-pipeline-status > create mode 100644 tests/docker/dockerfiles/centos8.docker Regards, Daniel
----- Original Message ----- > From: "Daniel P. Berrangé" <berrange@redhat.com> > To: "Cleber Rosa" <crosa@redhat.com> > Cc: "Alex Bennée" <alex.bennee@linaro.org>, "Peter Maydell" <peter.maydell@linaro.org>, qemu-devel@nongnu.org, "Fam > Zheng" <fam@euphon.net>, "Thomas Huth" <thuth@redhat.com>, "Eduardo Habkost" <ehabkost@redhat.com>, "Erik Skultety" > <eskultet@redhat.com>, "Wainer Moschetta" <wmoschet@redhat.com>, "Wainer dos Santos Moschetta" > <wainersm@redhat.com>, "Willian Rampazzo" <wrampazz@redhat.com>, "Philippe Mathieu-Daudé" <philmd@redhat.com>, > "Beraldo Leal" <bleal@redhat.com> > Sent: Monday, March 16, 2020 8:38:07 AM > Subject: Re: [PATCH 0/5] QEMU Gating CI > > On Thu, Mar 12, 2020 at 03:36:11PM -0400, Cleber Rosa wrote: > > The idea about a public facing Gating CI for QEMU was lastly > > summarized in an RFC[1]. Since then, it was decided that a > > simpler version should be attempted first. > > > > Changes from the RFC patches[2] accompanying the RFC document: > > > > - Moved gating job definitions to .gitlab-ci-gating.yml > > - Added info on "--disable-libssh" build option requirement > > (https://bugs.launchpad.net/qemu/+bug/1838763) to Ubuntu 18.04 jobs > > - Added info on "--disable-glusterfs" build option requirement > > (there's no static version of those libs in distro supplied > > packages) to one > > - Dropped ubuntu-18.04.3-x86_64-notools job definition, because it > > doesn't fall into the general scope of gating job described by PMM > > (and it did not run any test) > > - Added w32 and w64 cross builds based on Fedora 30 > > - Added a FreeBSD based job that builds all targets and runs `make > > check` > > - Added "-j`nproc`" and "-j`sysctl -n hw.ncpu`" options to make as a > > simple but effective way of speeding up the builds and tests by > > using a number of make jobs matching the number of CPUs > > - Because the Ansible playbooks reference the content on Dockerfiles, > > some fixes to some Dockerfiles caught in the process were included > > - New patch with script to check or wait on a pipeline execution > > > > [1] - https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg00231.html > > [2] - https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg00154.html > > > > Cleber Rosa (5): > > tests/docker: add CentOS 8 Dockerfile > > tests/docker: make "buildah bud" output similar to "docker build" > > GitLab CI: avoid calling before_scripts on unintended jobs > > GitLab Gating CI: introduce pipeline-status contrib script > > GitLab Gating CI: initial set of jobs, documentation and scripts > > > > .gitlab-ci-gating.yml | 111 ++++++++++ > > .gitlab-ci.yml | 32 ++- > > contrib/ci/orgs/qemu/build-environment.yml | 208 ++++++++++++++++++ > > contrib/ci/orgs/qemu/gitlab-runner.yml | 65 ++++++ > > contrib/ci/orgs/qemu/inventory | 2 + > > contrib/ci/orgs/qemu/vars.yml | 13 ++ > > contrib/ci/scripts/gitlab-pipeline-status | 148 +++++++++++++ > > FYI, the contrib/ directory is generally a place for arbitrary / adhoc > but interesting user contributed files/sources that are not officially > supported deliverables of the project. > > IOW, this is not a good home for the official CI scripts. > Good point, reason is that I wasn't/ain't sure this script is going to be "official" and really that useful. I'm happy to move it somewhere else though. > We already have a .gitlab-ci.d/ directory that looks like it would > be good for this. Or if that's not suitable, then scripts/ci/ is > a second choice. > Ack. Thanks! - Cleber. 
Daniel P. Berrangé <berrange@redhat.com> writes: > On Thu, Mar 12, 2020 at 03:36:11PM -0400, Cleber Rosa wrote: >> The idea about a public facing Gating CI for QEMU was lastly >> summarized in an RFC[1]. Since then, it was decided that a >> simpler version should be attempted first. >> >> Changes from the RFC patches[2] accompanying the RFC document: >> >> - Moved gating job definitions to .gitlab-ci-gating.yml >> - Added info on "--disable-libssh" build option requirement >> (https://bugs.launchpad.net/qemu/+bug/1838763) to Ubuntu 18.04 jobs >> - Added info on "--disable-glusterfs" build option requirement >> (there's no static version of those libs in distro supplied >> packages) to one >> - Dropped ubuntu-18.04.3-x86_64-notools job definition, because it >> doesn't fall into the general scope of gating job described by PMM >> (and it did not run any test) >> - Added w32 and w64 cross builds based on Fedora 30 >> - Added a FreeBSD based job that builds all targets and runs `make >> check` >> - Added "-j`nproc`" and "-j`sysctl -n hw.ncpu`" options to make as a >> simple but effective way of speeding up the builds and tests by >> using a number of make jobs matching the number of CPUs >> - Because the Ansible playbooks reference the content on Dockerfiles, >> some fixes to some Dockerfiles caught in the process were included >> - New patch with script to check or wait on a pipeline execution >> >> [1] - https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg00231.html >> [2] - https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg00154.html >> >> Cleber Rosa (5): >> tests/docker: add CentOS 8 Dockerfile >> tests/docker: make "buildah bud" output similar to "docker build" >> GitLab CI: avoid calling before_scripts on unintended jobs >> GitLab Gating CI: introduce pipeline-status contrib script >> GitLab Gating CI: initial set of jobs, documentation and scripts >> >> .gitlab-ci-gating.yml | 111 ++++++++++ >> .gitlab-ci.yml | 32 ++- >> contrib/ci/orgs/qemu/build-environment.yml | 208 ++++++++++++++++++ >> contrib/ci/orgs/qemu/gitlab-runner.yml | 65 ++++++ >> contrib/ci/orgs/qemu/inventory | 2 + >> contrib/ci/orgs/qemu/vars.yml | 13 ++ >> contrib/ci/scripts/gitlab-pipeline-status | 148 +++++++++++++ > > FYI, the contrib/ directory is generally a place for arbitrary / adhoc > but interesting user contributed files/sources that are not officially > supported deliverables of the project. > > IOW, this is not a good home for the official CI scripts. > > We already have a .gitlab-ci.d/ directory that looks like it would > be good for this. Or if that's not suitable, then scripts/ci/ is > a second choice. I'd vote for scripts/ci/ or scripts/gitlab/ as the .gitlab-ci.d might be a little hidden.
On Mon, 16 Mar 2020 at 12:26, Cleber Rosa <crosa@redhat.com> wrote: > About the runners and the fact that the job is stuck without them, > the message seems straightforward enough, but I can't get to the > project configuration to look at the registered runners with my > current permissions (set as "developer"). I've moved you up to 'maintainer' status, hopefully that is sufficient to look at the relevant config ? thanks -- PMM
On Mon, Mar 16, 2020 at 4:24 PM Alex Bennée <alex.bennee@linaro.org> wrote: > > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Thu, Mar 12, 2020 at 03:36:11PM -0400, Cleber Rosa wrote: > >> The idea about a public facing Gating CI for QEMU was lastly > >> summarized in an RFC[1]. Since then, it was decided that a > >> simpler version should be attempted first. > >> > >> Changes from the RFC patches[2] accompanying the RFC document: > >> > >> - Moved gating job definitions to .gitlab-ci-gating.yml > >> - Added info on "--disable-libssh" build option requirement > >> (https://bugs.launchpad.net/qemu/+bug/1838763) to Ubuntu 18.04 jobs > >> - Added info on "--disable-glusterfs" build option requirement > >> (there's no static version of those libs in distro supplied > >> packages) to one > >> - Dropped ubuntu-18.04.3-x86_64-notools job definition, because it > >> doesn't fall into the general scope of gating job described by PMM > >> (and it did not run any test) > >> - Added w32 and w64 cross builds based on Fedora 30 > >> - Added a FreeBSD based job that builds all targets and runs `make > >> check` > >> - Added "-j`nproc`" and "-j`sysctl -n hw.ncpu`" options to make as a > >> simple but effective way of speeding up the builds and tests by > >> using a number of make jobs matching the number of CPUs > >> - Because the Ansible playbooks reference the content on Dockerfiles, > >> some fixes to some Dockerfiles caught in the process were included > >> - New patch with script to check or wait on a pipeline execution > >> > >> [1] - https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg00231.html > >> [2] - https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg00154.html > >> > >> Cleber Rosa (5): > >> tests/docker: add CentOS 8 Dockerfile > >> tests/docker: make "buildah bud" output similar to "docker build" > >> GitLab CI: avoid calling before_scripts on unintended jobs > >> GitLab Gating CI: introduce pipeline-status contrib script > >> GitLab Gating CI: initial set of jobs, documentation and scripts > >> > >> .gitlab-ci-gating.yml | 111 ++++++++++ > >> .gitlab-ci.yml | 32 ++- > >> contrib/ci/orgs/qemu/build-environment.yml | 208 ++++++++++++++++++ > >> contrib/ci/orgs/qemu/gitlab-runner.yml | 65 ++++++ > >> contrib/ci/orgs/qemu/inventory | 2 + > >> contrib/ci/orgs/qemu/vars.yml | 13 ++ > >> contrib/ci/scripts/gitlab-pipeline-status | 148 +++++++++++++ > > > > FYI, the contrib/ directory is generally a place for arbitrary / adhoc > > but interesting user contributed files/sources that are not officially > > supported deliverables of the project. > > > > IOW, this is not a good home for the official CI scripts. > > > > We already have a .gitlab-ci.d/ directory that looks like it would > > be good for this. Or if that's not suitable, then scripts/ci/ is > > a second choice. > > I'd vote for scripts/ci/ or scripts/gitlab/ as the .gitlab-ci.d might be > a little hidden. > I vote for scripts/ci/ or scripts/gitlab/ too. With a little preference to scripts/ci/. Aleksandar > -- > Alex Bennée >
----- Original Message -----
> From: "Peter Maydell" <peter.maydell@linaro.org>
> To: "Cleber Rosa" <crosa@redhat.com>
> Cc: "Fam Zheng" <fam@euphon.net>, "Thomas Huth" <thuth@redhat.com>, "Beraldo Leal" <bleal@redhat.com>, "Erik
> Skultety" <eskultet@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, "Wainer Moschetta" <wmoschet@redhat.com>,
> "QEMU Developers" <qemu-devel@nongnu.org>, "Wainer dos Santos Moschetta" <wainersm@redhat.com>, "Willian Rampazzo"
> <wrampazz@redhat.com>, "Philippe Mathieu-Daudé" <philmd@redhat.com>, "Eduardo Habkost" <ehabkost@redhat.com>
> Sent: Monday, March 16, 2020 10:57:30 AM
> Subject: Re: [PATCH 0/5] QEMU Gating CI
>
> On Mon, 16 Mar 2020 at 12:26, Cleber Rosa <crosa@redhat.com> wrote:
> > About the runners and the fact that the job is stuck without them,
> > the message seems straightforward enough, but I can't get to the
> > project configuration to look at the registered runners with my
> > current permissions (set as "developer").
>
> I've moved you up to 'maintainer' status, hopefully that is
> sufficient to look at the relevant config ?
>
> thanks
> -- PMM

Hi Peter,

Yes, that did the trick and I can now see the configuration.  What I can
*not* see is any "Specific Runner" configured.  So maybe:

1) The documentation I included is not clear enough about the fact that
   setup steps need to be done on a machine so that it becomes a "Runner"

2) The (Ansible) playbooks (especially contrib/ci/orgs/qemu/gitlab-runner.yml)
   are not working as intended

3) There is some misalignment of expectations about which machines would be
   available to run those jobs

In any case, none of those should be big problems.  Please let me know what
you did/experienced/expected up to this point, and we can continue from there.

Regards,
- Cleber.
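For readers who have not set one up before: turning a machine into a GitLab
"Specific Runner" comes down to installing the gitlab-runner agent on it and
registering it against the project with a registration token, which is the
kind of step the playbook above is meant to automate.  The executor, tags and
description below are illustrative assumptions, not necessarily what the
playbook configures:

   # on the machine that should become a runner, after installing gitlab-runner
   sudo gitlab-runner register \
       --non-interactive \
       --url "https://gitlab.com/" \
       --registration-token "<project registration token>" \
       --executor shell \
       --description "qemu-gating-ubuntu1804-x86_64" \
       --tag-list "ubuntu_18.04,x86_64"

   # confirm the registered runner can authenticate against gitlab.com
   sudo gitlab-runner verify

Jobs are only picked up by runners whose tag list matches the "tags:" entries
in the job definition, which is also why jobs show up as "stuck" when no such
runner is online.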
On Tue, 17 Mar 2020 at 04:59, Cleber Rosa <crosa@redhat.com> wrote: > Yes, that did the trick and I can now see the configuration. What I can > *not* see is any "Specific Runner" configured. So maybe: > > 1) The documentation I included is not clear enough about the fact that > setup steps need to be done on a machine so that it becomes a "Runner" > > 2) The (Ansible) playbooks (especially contrib/ci/orgs/qemu/gitlab-runner.yml) > is not working as intended > > 3) Some expectations misalignment on machines that would be available to run > those jobs > > In any case, none of those should be big problems. Please let me know what > you did/experienced/expected up to this point, and we can continue from there. Ah, I see. My assumption was that this was all stuff that you were working on, so that I would then be able to test that it worked correctly, not that I would need to do configuration of the gitlab.com setup. I thought all the stuff about "how to set up runners" was only for people who wanted to set up some 3rd-party CI for non-official forks or for when we wanted to add extra runners in future (eg for architectures not yet covered). So the only thing I did was follow your "just push to staging" instructions. thanks -- PMM
On Tue, Mar 17, 2020 at 09:29:32AM +0000, Peter Maydell wrote:
> On Tue, 17 Mar 2020 at 04:59, Cleber Rosa <crosa@redhat.com> wrote:
> > Yes, that did the trick and I can now see the configuration.  What I can
> > *not* see is any "Specific Runner" configured.  So maybe:
> >
> > 1) The documentation I included is not clear enough about the fact that
> >    setup steps need to be done on a machine so that it becomes a "Runner"
> >
> > 2) The (Ansible) playbooks (especially contrib/ci/orgs/qemu/gitlab-runner.yml)
> >    are not working as intended
> >
> > 3) There is some misalignment of expectations about which machines would be
> >    available to run those jobs
> >
> > In any case, none of those should be big problems.  Please let me know what
> > you did/experienced/expected up to this point, and we can continue from there.
>
> Ah, I see. My assumption was that this was all stuff that you were
> working on, so that I would then be able to test that it worked correctly,
> not that I would need to do configuration of the gitlab.com setup.

Hi Peter,

So, I had to use temporary hardware resources to set this up (and set
it up countless times TBH).  I had the understanding based on the list
of machines you documented[1] that at least some of them would be used
for the permanent setup.

> I thought all the stuff about "how to set up runners" was only for
> people who wanted to set up some 3rd-party CI for non-official
> forks or for when we wanted to add extra runners in future (eg for
> architectures not yet covered). So the only thing I did was follow
> your "just push to staging" instructions.

OK, I see it, now it makes more sense.  So we're "only" missing the
setup for the machines we'll use for the more permanent setup.  Would
you like to do a staged setup/migration using one or some of the
machines you documented?  I'm 100% onboard to help with this, meaning
that I can assist you with instructions, or do "pair setup" of the
machines if needed.  I think a good part of the evaluation here comes
down to how manageable/reproducible the setup is, so it'd make sense
for one to be part of the setup itself.

FYI there's also the possibility of grabbing some free VMs on GCP,
Azure, etc and setting them up as GitLab runners in a temporary way
(because of the temporarily free and VM nature).  I have a few problems
with this approach, including the fact that it doesn't yield the
complete experience wrt using hardware one owns and will have to
manage, besides the hardware limitations themselves.

Please let me know how you want to move on from here.

Cheers,
- Cleber.

> thanks
> -- PMM

[1] https://wiki.qemu.org/Requirements/GatingCI#Current_Tests
On Tue, 17 Mar 2020 at 14:13, Cleber Rosa <crosa@redhat.com> wrote: > > On Tue, Mar 17, 2020 at 09:29:32AM +0000, Peter Maydell wrote: > > Ah, I see. My assumption was that this was all stuff that you were > > working on, so that I would then be able to test that it worked correctly, > > not that I would need to do configuration of the gitlab.com setup. > So, I had to use temporary hardware resources to set this up (and set > it up countless times TBH). I had the understanding based on the list > of machines you documented[1] that at least some of them would be used > for the permanent setup. Well, some of them will be (eg the s390 box), but some of them are my personal ones that can't be reused easily. I'd assumed in any case that gitlab would have at least support for x86 hosts: we are definitely not going to continue to use my desktop machine for running CI builds! Also IIRC RedHat said they'd be able to provide some machines for runners. > OK, I see it, now it makes more sense. So we're "only" missing the > setup for the machines we'll use for the more permanent setup. Would > you like to do a staged setup/migration using one or some of the > machines you documented? I'm 100% onboard to help with this, meaning > that I can assist you with instructions, or do "pair setup" of the > machines if needed. I think a good part of the evaluation here comes > down to how manageable/reproducible the setup is, so it'd make sense > for one to be part of the setup itself. I think we should start by getting the gitlab setup working for the basic "x86 configs" first. Then we can try adding a runner for s390 (that one's logistically easiest because it is a project machine, not one owned by me personally or by Linaro) once the basic framework is working, and expand from there. But to a large degree I really don't want to have to get into the details of how gitlab works or setting up runners myself if I can avoid it. We're going through this migration because I want to be able to hand off the CI stuff to other people, not to retain control of it. thanks -- PMM
Peter Maydell <peter.maydell@linaro.org> writes: > On Tue, 17 Mar 2020 at 14:13, Cleber Rosa <crosa@redhat.com> wrote: >> >> On Tue, Mar 17, 2020 at 09:29:32AM +0000, Peter Maydell wrote: >> > Ah, I see. My assumption was that this was all stuff that you were >> > working on, so that I would then be able to test that it worked correctly, >> > not that I would need to do configuration of the gitlab.com setup. > >> So, I had to use temporary hardware resources to set this up (and set >> it up countless times TBH). I had the understanding based on the list >> of machines you documented[1] that at least some of them would be used >> for the permanent setup. > > Well, some of them will be (eg the s390 box), but some of them > are my personal ones that can't be reused easily. I'd assumed > in any case that gitlab would have at least support for x86 hosts: > we are definitely not going to continue to use my desktop machine > for running CI builds! Also IIRC RedHat said they'd be able to > provide some machines for runners. Correct! As discussed at the QEMU summit, we'll gladly chip in runners to test the stuff we care about, but to match the coverage of your private zoo of machines, others will have to chip in, too. >> OK, I see it, now it makes more sense. So we're "only" missing the >> setup for the machines we'll use for the more permanent setup. Would >> you like to do a staged setup/migration using one or some of the >> machines you documented? I'm 100% onboard to help with this, meaning >> that I can assist you with instructions, or do "pair setup" of the >> machines if needed. I think a good part of the evaluation here comes >> down to how manageable/reproducible the setup is, so it'd make sense >> for one to be part of the setup itself. > > I think we should start by getting the gitlab setup working > for the basic "x86 configs" first. Then we can try adding > a runner for s390 (that one's logistically easiest because > it is a project machine, not one owned by me personally or > by Linaro) once the basic framework is working, and expand > from there. Makes sense to me. Next steps to get this off the ground: * Red Hat provides runner(s) for x86 stuff we care about. * If that doesn't cover 'basic "x86 configs" in your judgement, we fill the gaps as described below under "Expand from there". * Add an s390 runner using the project machine you mentioned. * Expand from there: identify the remaining gaps, map them to people / organizations interested in them, and solicit contributions from these guys. A note on contributions: we need both hardware and people. By people I mean maintainers for the infrastructure, the tools and all the runners. Cleber & team are willing to serve for the infrastructure, the tools and the Red Hat runners. Does this sound workable? > But to a large degree I really don't want to have to get > into the details of how gitlab works or setting up runners > myself if I can avoid it. We're going through this migration > because I want to be able to hand off the CI stuff to other > people, not to retain control of it. Understand. We need contributions to gating CI, but the whole point of this exercise is to make people other than *you* contribute to our gating CI :) Let me use this opportunity to say thank you for all your integration work!
On Thu, Mar 19, 2020 at 05:33:01PM +0100, Markus Armbruster wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:
>
> > On Tue, 17 Mar 2020 at 14:13, Cleber Rosa <crosa@redhat.com> wrote:
> >>
> >> On Tue, Mar 17, 2020 at 09:29:32AM +0000, Peter Maydell wrote:
> >> > Ah, I see. My assumption was that this was all stuff that you were
> >> > working on, so that I would then be able to test that it worked correctly,
> >> > not that I would need to do configuration of the gitlab.com setup.
> >
> >> So, I had to use temporary hardware resources to set this up (and set
> >> it up countless times TBH).  I had the understanding based on the list
> >> of machines you documented[1] that at least some of them would be used
> >> for the permanent setup.
> >
> > Well, some of them will be (eg the s390 box), but some of them
> > are my personal ones that can't be reused easily. I'd assumed
> > in any case that gitlab would have at least support for x86 hosts:
> > we are definitely not going to continue to use my desktop machine
> > for running CI builds! Also IIRC RedHat said they'd be able to
> > provide some machines for runners.

While GitLab lets you run x86 code for free with the "Linux Shared
Runners"[1], I don't think it would be suitable for what we're trying
to achieve.  It's limited to a single OS (CoreOS), a single architecture,
and really geared towards running containers.  BTW, if it isn't clear,
this is the approach being used today for the jobs defined on
".gitlab-ci.yml".

IMO we can leverage and still expand on the use of the "Linux Shared
Runners", but to really get a grasp of how well this model can work for
QEMU, we'll need "Specific Runners", because we're validating how/if we
can depend on it for OS/architectures they don't support on shared
runners (and sometimes not even for the gitlab-runner agent).

> Correct!  As discussed at the QEMU summit, we'll gladly chip in runners
> to test the stuff we care about, but to match the coverage of your
> private zoo of machines, others will have to chip in, too.

I'm sorry I missed the original discussions, and I'm even more sorry if
that led to any misunderstandings here.

> >> OK, I see it, now it makes more sense.  So we're "only" missing the
> >> setup for the machines we'll use for the more permanent setup.  Would
> >> you like to do a staged setup/migration using one or some of the
> >> machines you documented?  I'm 100% onboard to help with this, meaning
> >> that I can assist you with instructions, or do "pair setup" of the
> >> machines if needed.  I think a good part of the evaluation here comes
> >> down to how manageable/reproducible the setup is, so it'd make sense
> >> for one to be part of the setup itself.
> >
> > I think we should start by getting the gitlab setup working
> > for the basic "x86 configs" first. Then we can try adding
> > a runner for s390 (that one's logistically easiest because
> > it is a project machine, not one owned by me personally or
> > by Linaro) once the basic framework is working, and expand
> > from there.
>
> Makes sense to me.
>
> Next steps to get this off the ground:
>
> * Red Hat provides runner(s) for x86 stuff we care about.
>
> * If that doesn't cover 'basic "x86 configs"' in your judgement, we
>   fill the gaps as described below under "Expand from there".
>
> * Add an s390 runner using the project machine you mentioned.
>
> * Expand from there: identify the remaining gaps, map them to people /
>   organizations interested in them, and solicit contributions from
>   these guys.
>
> A note on contributions: we need both hardware and people.  By people I
> mean maintainers for the infrastructure, the tools and all the runners.
> Cleber & team are willing to serve for the infrastructure, the tools and
> the Red Hat runners.

Right, while we've tried to streamline the process of setting up the
machines, there will be plenty of changes to improve the automation.

More importantly, maintaining the machines is crucial to the goal of
catching only genuine code regressions, and not hitting unrelated
infrastructure failures.  Mundane tasks such as making sure enough disk
space is always available can completely change the perception of the
usefulness of a CI environment.  And for this maintenance, we need help
from people "owning" those machines.

> Does this sound workable?
>
> > But to a large degree I really don't want to have to get
> > into the details of how gitlab works or setting up runners
> > myself if I can avoid it. We're going through this migration
> > because I want to be able to hand off the CI stuff to other
> > people, not to retain control of it.
>
> Understand.  We need contributions to gating CI, but the whole point of
> this exercise is to make people other than *you* contribute to our
> gating CI :)
>
> Let me use this opportunity to say thank you for all your integration
> work!

^ THIS.  I have to say that I'm still amazed as to how Peter has managed
to automate, integrate and run all those tests in such varied environments
for so long.  Major kudos!

- Cleber.

[1] https://docs.gitlab.com/ee/user/gitlab_com/#linux-shared-runners
On Thu, 19 Mar 2020 at 16:33, Markus Armbruster <armbru@redhat.com> wrote: > Peter Maydell <peter.maydell@linaro.org> writes: > > I think we should start by getting the gitlab setup working > > for the basic "x86 configs" first. Then we can try adding > > a runner for s390 (that one's logistically easiest because > > it is a project machine, not one owned by me personally or > > by Linaro) once the basic framework is working, and expand > > from there. > > Makes sense to me. > > Next steps to get this off the ground: > > * Red Hat provides runner(s) for x86 stuff we care about. > > * If that doesn't cover 'basic "x86 configs" in your judgement, we > fill the gaps as described below under "Expand from there". > > * Add an s390 runner using the project machine you mentioned. > > * Expand from there: identify the remaining gaps, map them to people / > organizations interested in them, and solicit contributions from these > guys. > > A note on contributions: we need both hardware and people. By people I > mean maintainers for the infrastructure, the tools and all the runners. > Cleber & team are willing to serve for the infrastructure, the tools and > the Red Hat runners. So, with 5.0 nearly out the door it seems like a good time to check in on this thread again to ask where we are progress-wise with this. My impression is that this patchset provides most of the scripting and config side of the first step, so what we need is for RH to provide an x86 runner machine and tell the gitlab CI it exists. I appreciate that the whole coronavirus and working-from-home situation will have upended everybody's plans, especially when actual hardware might be involved, but how's it going ? thanks -- PMM
----- Original Message ----- > From: "Peter Maydell" <peter.maydell@linaro.org> > To: "Markus Armbruster" <armbru@redhat.com> > Cc: "Fam Zheng" <fam@euphon.net>, "Thomas Huth" <thuth@redhat.com>, "Beraldo Leal" <bleal@redhat.com>, "Erik > Skultety" <eskultet@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, "Wainer Moschetta" <wmoschet@redhat.com>, > "QEMU Developers" <qemu-devel@nongnu.org>, "Wainer dos Santos Moschetta" <wainersm@redhat.com>, "Willian Rampazzo" > <wrampazz@redhat.com>, "Cleber Rosa" <crosa@redhat.com>, "Philippe Mathieu-Daudé" <philmd@redhat.com>, "Eduardo > Habkost" <ehabkost@redhat.com> > Sent: Tuesday, April 21, 2020 8:53:49 AM > Subject: Re: [PATCH 0/5] QEMU Gating CI > > On Thu, 19 Mar 2020 at 16:33, Markus Armbruster <armbru@redhat.com> wrote: > > Peter Maydell <peter.maydell@linaro.org> writes: > > > I think we should start by getting the gitlab setup working > > > for the basic "x86 configs" first. Then we can try adding > > > a runner for s390 (that one's logistically easiest because > > > it is a project machine, not one owned by me personally or > > > by Linaro) once the basic framework is working, and expand > > > from there. > > > > Makes sense to me. > > > > Next steps to get this off the ground: > > > > * Red Hat provides runner(s) for x86 stuff we care about. > > > > * If that doesn't cover 'basic "x86 configs" in your judgement, we > > fill the gaps as described below under "Expand from there". > > > > * Add an s390 runner using the project machine you mentioned. > > > > * Expand from there: identify the remaining gaps, map them to people / > > organizations interested in them, and solicit contributions from these > > guys. > > > > A note on contributions: we need both hardware and people. By people I > > mean maintainers for the infrastructure, the tools and all the runners. > > Cleber & team are willing to serve for the infrastructure, the tools and > > the Red Hat runners. > > So, with 5.0 nearly out the door it seems like a good time to check > in on this thread again to ask where we are progress-wise with this. > My impression is that this patchset provides most of the scripting > and config side of the first step, so what we need is for RH to provide > an x86 runner machine and tell the gitlab CI it exists. I appreciate > that the whole coronavirus and working-from-home situation will have > upended everybody's plans, especially when actual hardware might > be involved, but how's it going ? > Hi Peter, You hit the nail in the head here. We were affected indeed with our ability to move some machines from one lab to another (across the country), but we're actively working on it. From now on, I'll give you an update every time a significant event occurs on our side. > thanks > -- PMM > > Thanks for checking in! - Cleber.
On Thu, Apr 23, 2020 at 01:04:13PM -0400, Cleber Rosa wrote: > > > ----- Original Message ----- > > From: "Peter Maydell" <peter.maydell@linaro.org> > > To: "Markus Armbruster" <armbru@redhat.com> > > Cc: "Fam Zheng" <fam@euphon.net>, "Thomas Huth" <thuth@redhat.com>, "Beraldo Leal" <bleal@redhat.com>, "Erik > > Skultety" <eskultet@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, "Wainer Moschetta" <wmoschet@redhat.com>, > > "QEMU Developers" <qemu-devel@nongnu.org>, "Wainer dos Santos Moschetta" <wainersm@redhat.com>, "Willian Rampazzo" > > <wrampazz@redhat.com>, "Cleber Rosa" <crosa@redhat.com>, "Philippe Mathieu-Daudé" <philmd@redhat.com>, "Eduardo > > Habkost" <ehabkost@redhat.com> > > Sent: Tuesday, April 21, 2020 8:53:49 AM > > Subject: Re: [PATCH 0/5] QEMU Gating CI > > > > On Thu, 19 Mar 2020 at 16:33, Markus Armbruster <armbru@redhat.com> wrote: > > > Peter Maydell <peter.maydell@linaro.org> writes: > > > > I think we should start by getting the gitlab setup working > > > > for the basic "x86 configs" first. Then we can try adding > > > > a runner for s390 (that one's logistically easiest because > > > > it is a project machine, not one owned by me personally or > > > > by Linaro) once the basic framework is working, and expand > > > > from there. > > > > > > Makes sense to me. > > > > > > Next steps to get this off the ground: > > > > > > * Red Hat provides runner(s) for x86 stuff we care about. > > > > > > * If that doesn't cover 'basic "x86 configs" in your judgement, we > > > fill the gaps as described below under "Expand from there". > > > > > > * Add an s390 runner using the project machine you mentioned. > > > > > > * Expand from there: identify the remaining gaps, map them to people / > > > organizations interested in them, and solicit contributions from these > > > guys. > > > > > > A note on contributions: we need both hardware and people. By people I > > > mean maintainers for the infrastructure, the tools and all the runners. > > > Cleber & team are willing to serve for the infrastructure, the tools and > > > the Red Hat runners. > > > > So, with 5.0 nearly out the door it seems like a good time to check > > in on this thread again to ask where we are progress-wise with this. > > My impression is that this patchset provides most of the scripting > > and config side of the first step, so what we need is for RH to provide > > an x86 runner machine and tell the gitlab CI it exists. I appreciate > > that the whole coronavirus and working-from-home situation will have > > upended everybody's plans, especially when actual hardware might > > be involved, but how's it going ? > > > > Hi Peter, > > You hit the nail in the head here. We were affected indeed with our ability > to move some machines from one lab to another (across the country), but we're > actively working on it. For x86, do we really need to be using custom runners ? With GitLab if someone forks the repo to their personal namespace, they cannot use any custom runners setup by the origin project. So if we use custom runners for x86, people forking won't be able to run the GitLab CI jobs. As a sub-system maintainer I wouldn't like this, because I ideally want to be able to run the same jobs on my staging tree, that Peter will run at merge time for the PULL request I send. Thus my strong preference would be to use the GitLab runners in every scenario where they are viable to use. Only use custom runners in the cases where GitLab runners are clearly inadequate for our needs. 
Based on what we've set up in GitLab for libvirt, the shared runners
they have work fine for x86.  Just need the environments you are testing
to be provided as Docker containers (you can actually build and cache
the container images during your CI job too).  IOW, any Linux distro
build and test jobs should be able to use shared runners on x86, and
likewise mingw builds.  Custom runners should only be needed if the
jobs need to do *BSD / macOS builds, and/or have access to specific
hardware devices for some reason.

Regards,
Daniel
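To illustrate the container-based approach being described, a shared-runner
job along these lines would build and test inside a stock distribution image
(the distro, package list and configure flags here are made-up examples, not
taken from the series):

   build-test-ubuntu-1804:
     image: ubuntu:18.04          # test environment provided as a container image
     before_script:
       - apt-get update
       - apt-get install -y build-essential git python3 pkg-config libglib2.0-dev libpixman-1-dev
     script:
       - mkdir build && cd build
       - ../configure --target-list=x86_64-softmmu
       - make -j"$(nproc)"
       - make -j"$(nproc)" check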
----- Original Message ----- > From: "Daniel P. Berrangé" <berrange@redhat.com> > To: "Cleber Rosa" <crosa@redhat.com> > Cc: "Peter Maydell" <peter.maydell@linaro.org>, "Fam Zheng" <fam@euphon.net>, "Thomas Huth" <thuth@redhat.com>, > "Beraldo Leal" <bleal@redhat.com>, "Erik Skultety" <eskultet@redhat.com>, "Philippe Mathieu-Daudé" > <philmd@redhat.com>, "Wainer Moschetta" <wmoschet@redhat.com>, "Markus Armbruster" <armbru@redhat.com>, "Wainer dos > Santos Moschetta" <wainersm@redhat.com>, "QEMU Developers" <qemu-devel@nongnu.org>, "Willian Rampazzo" > <wrampazz@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, "Eduardo Habkost" <ehabkost@redhat.com> > Sent: Thursday, April 23, 2020 1:13:22 PM > Subject: Re: [PATCH 0/5] QEMU Gating CI > > On Thu, Apr 23, 2020 at 01:04:13PM -0400, Cleber Rosa wrote: > > > > > > ----- Original Message ----- > > > From: "Peter Maydell" <peter.maydell@linaro.org> > > > To: "Markus Armbruster" <armbru@redhat.com> > > > Cc: "Fam Zheng" <fam@euphon.net>, "Thomas Huth" <thuth@redhat.com>, > > > "Beraldo Leal" <bleal@redhat.com>, "Erik > > > Skultety" <eskultet@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, > > > "Wainer Moschetta" <wmoschet@redhat.com>, > > > "QEMU Developers" <qemu-devel@nongnu.org>, "Wainer dos Santos Moschetta" > > > <wainersm@redhat.com>, "Willian Rampazzo" > > > <wrampazz@redhat.com>, "Cleber Rosa" <crosa@redhat.com>, "Philippe > > > Mathieu-Daudé" <philmd@redhat.com>, "Eduardo > > > Habkost" <ehabkost@redhat.com> > > > Sent: Tuesday, April 21, 2020 8:53:49 AM > > > Subject: Re: [PATCH 0/5] QEMU Gating CI > > > > > > On Thu, 19 Mar 2020 at 16:33, Markus Armbruster <armbru@redhat.com> > > > wrote: > > > > Peter Maydell <peter.maydell@linaro.org> writes: > > > > > I think we should start by getting the gitlab setup working > > > > > for the basic "x86 configs" first. Then we can try adding > > > > > a runner for s390 (that one's logistically easiest because > > > > > it is a project machine, not one owned by me personally or > > > > > by Linaro) once the basic framework is working, and expand > > > > > from there. > > > > > > > > Makes sense to me. > > > > > > > > Next steps to get this off the ground: > > > > > > > > * Red Hat provides runner(s) for x86 stuff we care about. > > > > > > > > * If that doesn't cover 'basic "x86 configs" in your judgement, we > > > > fill the gaps as described below under "Expand from there". > > > > > > > > * Add an s390 runner using the project machine you mentioned. > > > > > > > > * Expand from there: identify the remaining gaps, map them to people / > > > > organizations interested in them, and solicit contributions from > > > > these > > > > guys. > > > > > > > > A note on contributions: we need both hardware and people. By people I > > > > mean maintainers for the infrastructure, the tools and all the runners. > > > > Cleber & team are willing to serve for the infrastructure, the tools > > > > and > > > > the Red Hat runners. > > > > > > So, with 5.0 nearly out the door it seems like a good time to check > > > in on this thread again to ask where we are progress-wise with this. > > > My impression is that this patchset provides most of the scripting > > > and config side of the first step, so what we need is for RH to provide > > > an x86 runner machine and tell the gitlab CI it exists. I appreciate > > > that the whole coronavirus and working-from-home situation will have > > > upended everybody's plans, especially when actual hardware might > > > be involved, but how's it going ? 
> >
> >
> > Hi Peter,
> >
> > You hit the nail in the head here.  We were affected indeed with our
> > ability to move some machines from one lab to another (across the
> > country), but we're actively working on it.
>
> For x86, do we really need to be using custom runners ?

Hi Daniel,

We're already using the shared x86 runners, but with a different goal.  The
goal of the "Gating CI" is indeed to expand on non-x86 environments.  We're
in a "chicken and egg" kind of situation, because we'd like to prove that
GitLab CI will allow QEMU to expand to very different runners and jobs, while
not really having all that hardware set up and publicly available at this
time.  My experiments were really around that point, I mean, confirming that
we can grow the number of architectures/runners/jobs/configurations to
provide a coverage equal to or greater than what Peter already does.

> With GitLab if someone forks the repo to their personal namespace, they
> cannot use any custom runners setup by the origin project. So if we use
> custom runners for x86, people forking won't be able to run the GitLab
> CI jobs.

They will continue to be able to use the jobs and runners already defined in
the .gitlab-ci.yml file.  This work will only affect people pushing to the/a
"staging" branch.

> As a sub-system maintainer I wouldn't like this, because I ideally want
> to be able to run the same jobs on my staging tree, that Peter will run
> at merge time for the PULL request I send.

If you're looking for symmetry between any PR and "merge time" jobs, the only
solution is to allow any PR to access all the diverse set of non-shared
machines we're hoping to have in the near future.  This may be something
we'll get to, but I doubt we can tackle it in the near future.

> Thus my strong preference would be to use the GitLab runners in every
> scenario where they are viable to use. Only use custom runners in the
> cases where GitLab runners are clearly inadequate for our needs.
>
> Based on what we've set up in GitLab for libvirt, the shared runners
> they have work fine for x86. Just need the environments you are testing
> to be provided as Docker containers (you can actually build and cache
> the container images during your CI job too). IOW, any Linux distro
> build and test jobs should be able to use shared runners on x86, and
> likewise mingw builds. Custom runners should only be needed if the
> jobs need to do *BSD / macOS builds, and/or have access to specific
> hardware devices for some reason.

We've discussed this before at the RFC time, wrt how the goal is for a wider
community to provide a wider range of jobs.  Even for x86, one may want to
require their jobs to run on a given accelerator, such as KVM, so we need to
consider that from the very beginning.  I don't see a problem with converging
jobs that are being run on custom runners back into shared runners as much as
possible.  At the RFC discussion, I actually pointed out how the build phase
could be running essentially on pre-built containers (on shared runners), but
the test phase, say testing KVM, should not be bound to that.

So in essence, right now, moving everything to containers would invalidate
the exercise of being able to care for those custom architectures/builds/jobs
we'll need in the near future.  And that's really the whole point here.

Cheers,
- Cleber.
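To make the accelerator point concrete: with specific runners, a job can be
pinned via tags to machines that actually expose a capability such as
/dev/kvm, which the shared runners cannot guarantee.  A hypothetical job
definition (the tag names and test target are illustrative, not from the
series):

   test-system-kvm-x86_64:
     stage: test
     tags:                       # only runners registered with both tags pick this up
       - x86_64
       - kvm
     script:
       - cd build
       - make check-acceptance   # e.g. tests that want a working KVM accelerator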
> > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange > |:| > |: https://libvirt.org -o- https://fstop138.berrange.com > |:| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange > |:| >
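As an illustration of how the two kinds of jobs described above can coexist in one .gitlab-ci.yml, here is a minimal sketch (job names, image and runner tag are invented for the example, not taken from the series):

  build-fedora-container:            # container job, runs on shared runners and on forks
    image: fedora:31
    script:
      - ./configure && make -j"$(nproc)"

  gating-build-custom:               # gating job, only created on the "staging" branch
    tags:
      - qemu-project-runner          # hypothetical tag of a project-owned machine
    only:
      - staging
    script:
      - ./configure && make -j"$(nproc)" && make check

Pushes to any branch other than "staging" simply never create the second job, so the existing shared-runner pipeline is unaffected.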
On Thu, 23 Apr 2020 at 18:37, Cleber Rosa <crosa@redhat.com> wrote: > We're already using the shared x86 runners, but with a different goal. The > goal of the "Gating CI" is indeed to expand on non-x86 environments. We're > in a "chicken and egg" kind of situation, because we'd like to prove that > GitLab CI will allow QEMU to expand to very different runners and jobs, while > not really having all that hardware setup and publicly available at this time. We do have the S390 machine that IBM kindly made available to the project -- that is not a personal or Linaro machine, so there are no issues with giving you a login on that so you can set it up as a CI runner. Drop me an email if you want access to it. thanks -- PMM
On 4/23/20 7:13 PM, Daniel P. Berrangé wrote: > On Thu, Apr 23, 2020 at 01:04:13PM -0400, Cleber Rosa wrote: >> ----- Original Message ----- >>> From: "Peter Maydell" <peter.maydell@linaro.org> >>> To: "Markus Armbruster" <armbru@redhat.com> >>> Cc: "Fam Zheng" <fam@euphon.net>, "Thomas Huth" <thuth@redhat.com>, "Beraldo Leal" <bleal@redhat.com>, "Erik >>> Skultety" <eskultet@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, "Wainer Moschetta" <wmoschet@redhat.com>, >>> "QEMU Developers" <qemu-devel@nongnu.org>, "Wainer dos Santos Moschetta" <wainersm@redhat.com>, "Willian Rampazzo" >>> <wrampazz@redhat.com>, "Cleber Rosa" <crosa@redhat.com>, "Philippe Mathieu-Daudé" <philmd@redhat.com>, "Eduardo >>> Habkost" <ehabkost@redhat.com> >>> Sent: Tuesday, April 21, 2020 8:53:49 AM >>> Subject: Re: [PATCH 0/5] QEMU Gating CI >>> >>> On Thu, 19 Mar 2020 at 16:33, Markus Armbruster <armbru@redhat.com> wrote: >>>> Peter Maydell <peter.maydell@linaro.org> writes: >>>>> I think we should start by getting the gitlab setup working >>>>> for the basic "x86 configs" first. Then we can try adding >>>>> a runner for s390 (that one's logistically easiest because >>>>> it is a project machine, not one owned by me personally or >>>>> by Linaro) once the basic framework is working, and expand >>>>> from there. >>>> >>>> Makes sense to me. >>>> >>>> Next steps to get this off the ground: >>>> >>>> * Red Hat provides runner(s) for x86 stuff we care about. >>>> >>>> * If that doesn't cover 'basic "x86 configs" in your judgement, we >>>> fill the gaps as described below under "Expand from there". >>>> >>>> * Add an s390 runner using the project machine you mentioned. >>>> >>>> * Expand from there: identify the remaining gaps, map them to people / >>>> organizations interested in them, and solicit contributions from these >>>> guys. >>>> >>>> A note on contributions: we need both hardware and people. By people I >>>> mean maintainers for the infrastructure, the tools and all the runners. >>>> Cleber & team are willing to serve for the infrastructure, the tools and >>>> the Red Hat runners. >>> >>> So, with 5.0 nearly out the door it seems like a good time to check >>> in on this thread again to ask where we are progress-wise with this. >>> My impression is that this patchset provides most of the scripting >>> and config side of the first step, so what we need is for RH to provide >>> an x86 runner machine and tell the gitlab CI it exists. I appreciate >>> that the whole coronavirus and working-from-home situation will have >>> upended everybody's plans, especially when actual hardware might >>> be involved, but how's it going ? >>> >> >> Hi Peter, >> >> You hit the nail in the head here. We were affected indeed with our ability >> to move some machines from one lab to another (across the country), but we're >> actively working on it. > > For x86, do we really need to be using custom runners ? > > With GitLab if someone forks the repo to their personal namespace, they > cannot use any custom runners setup by the origin project. So if we use > custom runners for x86, people forking won't be able to run the GitLab > CI jobs. > > As a sub-system maintainer I wouldn't like this, because I ideally want > to be able to run the same jobs on my staging tree, that Peter will run > at merge time for the PULL request I send. > > Thus my strong preference would be to use the GitLab runners in every > scenario where they are viable to use. 
> Only use custom runners in the > cases where GitLab runners are clearly inadequate for our needs. > > Based on what we've setup in GitLab for libvirt, the shared runners > they have work fine for x86. Just need the environments you are testing > to be provided as Docker containers (you can actually build and cache > the container images during your CI job too). IOW, any Linux distro > build and test jobs should be able to use shared runners on x86, and > likewise mingw builds. Custom runners should only be needed if the > jobs need todo *BSD / macOS builds, and/or have access to specific > hardware devices for some reason. Thanks for insisting on that point, Daniel. I'd rather see every configuration reproducible, so if we lose a hardware sponsor, we can find another one and start another runner. Also note, if it is not easy to reproduce a runner, it will be very hard to debug a reported build/test error. A non-reproducible runner cannot be used as gating, because if it fails it is not acceptable to lock the project development process. In some cases custom runners are acceptable. These runners won't be "gating" but can post informative logs and status. [*] Specific hardware that is not easily available. - Alistair at the last KVM forum talked about a RISCV board (to test host TCG) - Aleksandar said at the last KVM forum Wavecomp could plug in a CI20 MIPS (to test host TCG) - Lemote seems interested in setting up some Loongson MIPSr6 board (to test interaction with KVM) [*] To run code requiring accepting License Agreements [*] To run non Free / Open Source code Owners of these runners take the responsibility to provide enough time/information about reported bugs, or to debug them themselves. Now the problem is that the GitLab runner is not natively available on the architectures listed in this mail, so a custom setup is required. A dumb script running ssh to a machine also works (tested) but a lot of manual tuning/maintenance is expected.
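To make the "informative but not gating" idea concrete, GitLab can already express it with allow_failure; a small sketch (job name and tag invented for the example):

  test-mips-ci20:
    tags:
      - mips-ci20              # hypothetical tag of a board-specific runner
    allow_failure: true        # result is reported, but a failure never blocks the pipeline
    script:
      - make check

Such a job shows up with a warning when it fails, so the board owner still gets the signal without being able to hold up merges; promoting it to gating would just mean dropping the allow_failure line.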
On Thu, Apr 23, 2020 at 11:28:21PM +0200, Philippe Mathieu-Daudé wrote: > On 4/23/20 7:13 PM, Daniel P. Berrangé wrote: > > On Thu, Apr 23, 2020 at 01:04:13PM -0400, Cleber Rosa wrote: > > > ----- Original Message ----- > > > > From: "Peter Maydell" <peter.maydell@linaro.org> > > > > To: "Markus Armbruster" <armbru@redhat.com> > > > > Cc: "Fam Zheng" <fam@euphon.net>, "Thomas Huth" <thuth@redhat.com>, "Beraldo Leal" <bleal@redhat.com>, "Erik > > > > Skultety" <eskultet@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, "Wainer Moschetta" <wmoschet@redhat.com>, > > > > "QEMU Developers" <qemu-devel@nongnu.org>, "Wainer dos Santos Moschetta" <wainersm@redhat.com>, "Willian Rampazzo" > > > > <wrampazz@redhat.com>, "Cleber Rosa" <crosa@redhat.com>, "Philippe Mathieu-Daudé" <philmd@redhat.com>, "Eduardo > > > > Habkost" <ehabkost@redhat.com> > > > > Sent: Tuesday, April 21, 2020 8:53:49 AM > > > > Subject: Re: [PATCH 0/5] QEMU Gating CI > > > > > > > > On Thu, 19 Mar 2020 at 16:33, Markus Armbruster <armbru@redhat.com> wrote: > > > > > Peter Maydell <peter.maydell@linaro.org> writes: > > > > > > I think we should start by getting the gitlab setup working > > > > > > for the basic "x86 configs" first. Then we can try adding > > > > > > a runner for s390 (that one's logistically easiest because > > > > > > it is a project machine, not one owned by me personally or > > > > > > by Linaro) once the basic framework is working, and expand > > > > > > from there. > > > > > > > > > > Makes sense to me. > > > > > > > > > > Next steps to get this off the ground: > > > > > > > > > > * Red Hat provides runner(s) for x86 stuff we care about. > > > > > > > > > > * If that doesn't cover 'basic "x86 configs" in your judgement, we > > > > > fill the gaps as described below under "Expand from there". > > > > > > > > > > * Add an s390 runner using the project machine you mentioned. > > > > > > > > > > * Expand from there: identify the remaining gaps, map them to people / > > > > > organizations interested in them, and solicit contributions from these > > > > > guys. > > > > > > > > > > A note on contributions: we need both hardware and people. By people I > > > > > mean maintainers for the infrastructure, the tools and all the runners. > > > > > Cleber & team are willing to serve for the infrastructure, the tools and > > > > > the Red Hat runners. > > > > > > > > So, with 5.0 nearly out the door it seems like a good time to check > > > > in on this thread again to ask where we are progress-wise with this. > > > > My impression is that this patchset provides most of the scripting > > > > and config side of the first step, so what we need is for RH to provide > > > > an x86 runner machine and tell the gitlab CI it exists. I appreciate > > > > that the whole coronavirus and working-from-home situation will have > > > > upended everybody's plans, especially when actual hardware might > > > > be involved, but how's it going ? > > > > > > > > > > Hi Peter, > > > > > > You hit the nail in the head here. We were affected indeed with our ability > > > to move some machines from one lab to another (across the country), but we're > > > actively working on it. > > > > For x86, do we really need to be using custom runners ? > > > > With GitLab if someone forks the repo to their personal namespace, they > > cannot use any custom runners setup by the origin project. So if we use > > custom runners for x86, people forking won't be able to run the GitLab > > CI jobs. 
> > > > As a sub-system maintainer I wouldn't like this, because I ideally want > > to be able to run the same jobs on my staging tree, that Peter will run > > at merge time for the PULL request I send. > > > > Thus my strong preference would be to use the GitLab runners in every > > scenario where they are viable to use. Only use custom runners in the > > cases where GitLab runners are clearly inadequate for our needs. > > > > Based on what we've setup in GitLab for libvirt, the shared runners > > they have work fine for x86. Just need the environments you are testing > > to be provided as Docker containers (you can actually build and cache > > the container images during your CI job too). IOW, any Linux distro > > build and test jobs should be able to use shared runners on x86, and > > likewise mingw builds. Custom runners should only be needed if the > > jobs need todo *BSD / macOS builds, and/or have access to specific > > hardware devices for some reason. Not just ^that, you also want custom VM runners to run integration tests; e.g. in libvirt, we'd have to put systemd and a lot of other cruft into the container to be able to run the tests, at which point you must ask yourself why not go with a VM instead, in which case we're limited in terms of infrastructure... > > Thanks to insist with that point Daniel. I'd rather see every configuration > reproducible, so if we loose a hardware sponsor, we can find another one and > start another runner. > Also note, if it is not easy to reproduce a runner, it will be very hard to > debug a reported build/test error. (Thanks for bringing ^this point up Philippe) ...However, what we've been actively working on in libvirt is to extend the lcitool we have (which can spawn local test VMs) to the point where we're able to generate machines that would be reproducible. Right now I'm playing with cloud-init integration with lcitool (patches coming soon) that would allow us to use the same machines locally as we'd want to have in, say, OpenStack, and share them as compressed images, so even when updated/managed by lcitool locally, you'd get the same environment. Regards,
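For what it's worth, the cloud-init angle also makes the runner environment itself something that can be versioned and shared; a minimal user-data sketch of the kind of file such tooling could emit (user name and package list are only placeholders):

  #cloud-config
  users:
    - name: gitlab-runner          # hypothetical service account for the runner
      groups: [kvm]
      shell: /bin/bash
  packages:
    - git
    - gcc
    - ninja-build
  runcmd:
    - [ systemctl, enable, --now, gitlab-runner ]   # assumes the runner was already baked into the image

The same file can be fed to a local libvirt VM or to an OpenStack instance, which is what keeps the two environments identical.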
On Thu, Apr 23, 2020 at 01:36:48PM -0400, Cleber Rosa wrote: > > > ----- Original Message ----- > > From: "Daniel P. Berrangé" <berrange@redhat.com> > > To: "Cleber Rosa" <crosa@redhat.com> > > Cc: "Peter Maydell" <peter.maydell@linaro.org>, "Fam Zheng" <fam@euphon.net>, "Thomas Huth" <thuth@redhat.com>, > > "Beraldo Leal" <bleal@redhat.com>, "Erik Skultety" <eskultet@redhat.com>, "Philippe Mathieu-Daudé" > > <philmd@redhat.com>, "Wainer Moschetta" <wmoschet@redhat.com>, "Markus Armbruster" <armbru@redhat.com>, "Wainer dos > > Santos Moschetta" <wainersm@redhat.com>, "QEMU Developers" <qemu-devel@nongnu.org>, "Willian Rampazzo" > > <wrampazz@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, "Eduardo Habkost" <ehabkost@redhat.com> > > Sent: Thursday, April 23, 2020 1:13:22 PM > > Subject: Re: [PATCH 0/5] QEMU Gating CI > > > > On Thu, Apr 23, 2020 at 01:04:13PM -0400, Cleber Rosa wrote: > > > > > > > > > ----- Original Message ----- > > > > From: "Peter Maydell" <peter.maydell@linaro.org> > > > > To: "Markus Armbruster" <armbru@redhat.com> > > > > Cc: "Fam Zheng" <fam@euphon.net>, "Thomas Huth" <thuth@redhat.com>, > > > > "Beraldo Leal" <bleal@redhat.com>, "Erik > > > > Skultety" <eskultet@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, > > > > "Wainer Moschetta" <wmoschet@redhat.com>, > > > > "QEMU Developers" <qemu-devel@nongnu.org>, "Wainer dos Santos Moschetta" > > > > <wainersm@redhat.com>, "Willian Rampazzo" > > > > <wrampazz@redhat.com>, "Cleber Rosa" <crosa@redhat.com>, "Philippe > > > > Mathieu-Daudé" <philmd@redhat.com>, "Eduardo > > > > Habkost" <ehabkost@redhat.com> > > > > Sent: Tuesday, April 21, 2020 8:53:49 AM > > > > Subject: Re: [PATCH 0/5] QEMU Gating CI > > > > > > > > On Thu, 19 Mar 2020 at 16:33, Markus Armbruster <armbru@redhat.com> > > > > wrote: > > > > > Peter Maydell <peter.maydell@linaro.org> writes: > > > > > > I think we should start by getting the gitlab setup working > > > > > > for the basic "x86 configs" first. Then we can try adding > > > > > > a runner for s390 (that one's logistically easiest because > > > > > > it is a project machine, not one owned by me personally or > > > > > > by Linaro) once the basic framework is working, and expand > > > > > > from there. > > > > > > > > > > Makes sense to me. > > > > > > > > > > Next steps to get this off the ground: > > > > > > > > > > * Red Hat provides runner(s) for x86 stuff we care about. > > > > > > > > > > * If that doesn't cover 'basic "x86 configs" in your judgement, we > > > > > fill the gaps as described below under "Expand from there". > > > > > > > > > > * Add an s390 runner using the project machine you mentioned. > > > > > > > > > > * Expand from there: identify the remaining gaps, map them to people / > > > > > organizations interested in them, and solicit contributions from > > > > > these > > > > > guys. > > > > > > > > > > A note on contributions: we need both hardware and people. By people I > > > > > mean maintainers for the infrastructure, the tools and all the runners. > > > > > Cleber & team are willing to serve for the infrastructure, the tools > > > > > and > > > > > the Red Hat runners. > > > > > > > > So, with 5.0 nearly out the door it seems like a good time to check > > > > in on this thread again to ask where we are progress-wise with this. > > > > My impression is that this patchset provides most of the scripting > > > > and config side of the first step, so what we need is for RH to provide > > > > an x86 runner machine and tell the gitlab CI it exists. 
I appreciate > > > > that the whole coronavirus and working-from-home situation will have > > > > upended everybody's plans, especially when actual hardware might > > > > be involved, but how's it going ? > > > > > > > > > > Hi Peter, > > > > > > You hit the nail in the head here. We were affected indeed with our > > > ability > > > to move some machines from one lab to another (across the country), but > > > we're > > > actively working on it. > > > > For x86, do we really need to be using custom runners ? > > > > Hi Daniel, > > We're already using the shared x86 runners, but with a different goal. The > goal of the "Gating CI" is indeed to expand on non-x86 environments. We're > in a "chicken and egg" kind of situation, because we'd like to prove that > GitLab CI will allow QEMU to expand to very different runners and jobs, while > not really having all that hardware setup and publicly available at this time. > > My experiments were really around that point, I mean, confirming that we can grow > the number of architectures/runners/jobs/configurations to provide a coverage > equal or greater to what Peter already does. So IIUC, you're saying that for x86 gating, the intention is to use shared runners in general. Your current work that you say is blocked on access to x86 hardware, is just about demonstrating the concept of plugging in custom runners, while we wait for access to non-x86 hardware ? > > With GitLab if someone forks the repo to their personal namespace, they > > cannot use any custom runners setup by the origin project. So if we use > > custom runners for x86, people forking won't be able to run the GitLab > > CI jobs. > > > > They will continue to be able use the jobs and runners already defined in > the .gitlab-ci.yml file. This work will only affect people pushing to the/a > "staging" branch. > > > As a sub-system maintainer I wouldn't like this, because I ideally want > > to be able to run the same jobs on my staging tree, that Peter will run > > at merge time for the PULL request I send. > > > > If you're looking for symmetry between any PR and "merge time" jobs, the > only solution is to allow any PR to access all the diverse set of non-shared > machines we're hoping to have in the near future. This may be something > we'll get to, but I doubt we can tackle it in the near future now. It occurred to me that we could do this if we grant selective access to the Gitlab repos, to people who are official subsystem maintainers. GitLab has a concept of "protected branches", so you can control who is allowed to push changes on a per-branch granularity. So, for example, in the main qemu.git, we could create branches for each subsystem tree eg staging-block staging-qapi staging-crypto staging-migration .... and for each of these branches, we can grant access to relevant subsystem maintainer(s). When they're ready to send a pull request to Peter, they can push their tree to this branch. Since the branch is in the main gitlab.com/qemu/qemu project namespace, this branch can run CI using the private QEMU runners. The subsystem maintainer can thus see the full set of CI results across all platforms required by Gating, before Peter even gets the pull request. So when Peter then looks at merging the pull request to master, the only he's likely to see are the non-deterministic bugs, or issues caused by semantic conflicts with other recently merged code. It would even be possible to do the final merge into master entirely from GitLab, no need to go via email. 
So when Peter then looks at merging the pull request to master, the only things he's likely to see are the non-deterministic bugs, or issues caused by semantic conflicts with other recently merged code. It would even be possible to do the final merge into master entirely from GitLab, no need to go via email. When the source branch & target branch are within the same git repo, GitLab has the ability to run CI jobs against the resulting merge commit in a strict gating manner, before it hits master. They call this "Merge trains" in their documentation. IOW, from Peter's POV, merging pull requests could be as simple as hitting the merge button in the GitLab merge request UI. Everything wrt CI would be completely automated, and the subsystem maintainers would have the responsibility of dealing with merge conflicts & CI failures, which is more scalable for the person co-ordinating the merges into master. Regards, Daniel
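If the per-subsystem branches share a common naming scheme, the existing gating jobs would not even need to be duplicated; a rules-based sketch (the template name and pattern are only examples):

  .gating-job:
    tags:
      - qemu-gating                               # hypothetical tag of the project-owned runners
    rules:
      - if: '$CI_COMMIT_BRANCH =~ /^staging/'     # matches staging, staging-block, staging-qapi, ...

Any branch whose name starts with "staging" would then get the full gating pipeline, while GitLab's protected-branch settings control who is allowed to push to each of them.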
On Fri, Apr 24, 2020 at 11:30 AM Daniel P. Berrangé <berrange@redhat.com> wrote: > > ----- Original Message ----- > > > From: "Daniel P. Berrangé" <berrange@redhat.com> > > > To: "Cleber Rosa" <crosa@redhat.com> [...] > > Hi Daniel, > > > > We're already using the shared x86 runners, but with a different goal. The > > goal of the "Gating CI" is indeed to expand on non-x86 environments. We're > > in a "chicken and egg" kind of situation, because we'd like to prove that > > GitLab CI will allow QEMU to expand to very different runners and jobs, while > > not really having all that hardware setup and publicly available at this time. > > > > My experiments were really around that point, I mean, confirming that we can grow > > the number of architectures/runners/jobs/configurations to provide a coverage > > equal or greater to what Peter already does. > > So IIUC, you're saying that for x86 gating, the intention is to use shared > runners in general. > > Your current work that you say is blocked on access to x86 hardware, is just > about demonstrating the concept of plugging in custom runners, while we wait > for access to non-x86 hardware ? > > > > With GitLab if someone forks the repo to their personal namespace, they > > > cannot use any custom runners setup by the origin project. So if we use > > > custom runners for x86, people forking won't be able to run the GitLab > > > CI jobs. > > > > > > > They will continue to be able use the jobs and runners already defined in > > the .gitlab-ci.yml file. This work will only affect people pushing to the/a > > "staging" branch. > > > > > As a sub-system maintainer I wouldn't like this, because I ideally want > > > to be able to run the same jobs on my staging tree, that Peter will run > > > at merge time for the PULL request I send. > > > > > > > If you're looking for symmetry between any PR and "merge time" jobs, the > > only solution is to allow any PR to access all the diverse set of non-shared > > machines we're hoping to have in the near future. This may be something > > we'll get to, but I doubt we can tackle it in the near future now. > > It occurred to me that we could do this if we grant selective access to > the Gitlab repos, to people who are official subsystem maintainers. > GitLab has a concept of "protected branches", so you can control who is > allowed to push changes on a per-branch granularity. > > So, for example, in the main qemu.git, we could create branches for each > subsystem tree eg > > staging-block > staging-qapi > staging-crypto > staging-migration > .... > > and for each of these branches, we can grant access to relevant subsystem > maintainer(s). The MAINTAINERS file could help us with that, we already have scripts to parse its sections. Maintainers should keep it up-to-date, then the merge script would check, i.e.: <newline> // section separator --------------- // ignored Trivial patches // description ignored M: Michael Tokarev <mjt@tls.msk.ru> M: Laurent Vivier <laurent@vivier.eu> // must match commit author T: git git://git.corpit.ru/qemu.git trivial-patches T: git https://github.com/vivier/qemu.git trivial-patches // must match MR source > > When they're ready to send a pull request to Peter, they can push their > tree to this branch. Since the branch is in the main gitlab.com/qemu/qemu > project namespace, this branch can run CI using the private QEMU runners. > The subsystem maintainer can thus see the full set of CI results across > all platforms required by Gating, before Peter even gets the pull request. 
> > So when Peter then looks at merging the pull request to master, the only > he's likely to see are the non-deterministic bugs, or issues caused by > semantic conflicts with other recently merged code. > > It would even be possible to do the final merge into master entirely from > GitLab, no need to go via email. When the source branch & target branch are > within the same git repo, GitLab has the ability to run CI jobs against the > resulting merge commit in a strict gating manner, before it hits master. > They call this "Merge trains" in their documentation. > > IOW, from Peter's POV, merging pull requests could be as simple as hitting > the merge button in GitLab merge request UI. Everything wrt CI would be > completely automated, and the subsystem maintainers would have the > responsibility to dealing with merge conflicts & CI failures, which is > more scalable for the person co-ordinating the merges into master. > > > Regards, > Daniel
On Thu, 23 Apr 2020 18:50:47 +0100 Peter Maydell <peter.maydell@linaro.org> wrote: > On Thu, 23 Apr 2020 at 18:37, Cleber Rosa <crosa@redhat.com> wrote: > > We're already using the shared x86 runners, but with a different > > goal. The goal of the "Gating CI" is indeed to expand on non-x86 > > environments. We're in a "chicken and egg" kind of situation, > > because we'd like to prove that GitLab CI will allow QEMU to expand > > to very different runners and jobs, while not really having all > > that hardware setup and publicly available at this time. > > We do have the S390 machine that IBM kindly made available to > the project -- that is not a personal or Linaro machine, so > there are no issues with giving you a login on that so you > can set it up as a CI runner. Drop me an email if you want > access to it. > I sure do. I'll shoot you an email off-list. > thanks > -- PMM > Thanks, - Cleber.
On Thu, 23 Apr 2020 23:28:21 +0200 Philippe Mathieu-Daudé <philmd@redhat.com> wrote: > On 4/23/20 7:13 PM, Daniel P. Berrangé wrote: > > On Thu, Apr 23, 2020 at 01:04:13PM -0400, Cleber Rosa wrote: > >> ----- Original Message ----- > >>> From: "Peter Maydell" <peter.maydell@linaro.org> > >>> To: "Markus Armbruster" <armbru@redhat.com> > >>> Cc: "Fam Zheng" <fam@euphon.net>, "Thomas Huth" > >>> <thuth@redhat.com>, "Beraldo Leal" <bleal@redhat.com>, "Erik > >>> Skultety" <eskultet@redhat.com>, "Alex Bennée" > >>> <alex.bennee@linaro.org>, "Wainer Moschetta" > >>> <wmoschet@redhat.com>, "QEMU Developers" <qemu-devel@nongnu.org>, > >>> "Wainer dos Santos Moschetta" <wainersm@redhat.com>, "Willian > >>> Rampazzo" <wrampazz@redhat.com>, "Cleber Rosa" > >>> <crosa@redhat.com>, "Philippe Mathieu-Daudé" <philmd@redhat.com>, > >>> "Eduardo Habkost" <ehabkost@redhat.com> Sent: Tuesday, April 21, > >>> 2020 8:53:49 AM Subject: Re: [PATCH 0/5] QEMU Gating CI > >>> > >>> On Thu, 19 Mar 2020 at 16:33, Markus Armbruster > >>> <armbru@redhat.com> wrote: > >>>> Peter Maydell <peter.maydell@linaro.org> writes: > >>>>> I think we should start by getting the gitlab setup working > >>>>> for the basic "x86 configs" first. Then we can try adding > >>>>> a runner for s390 (that one's logistically easiest because > >>>>> it is a project machine, not one owned by me personally or > >>>>> by Linaro) once the basic framework is working, and expand > >>>>> from there. > >>>> > >>>> Makes sense to me. > >>>> > >>>> Next steps to get this off the ground: > >>>> > >>>> * Red Hat provides runner(s) for x86 stuff we care about. > >>>> > >>>> * If that doesn't cover 'basic "x86 configs" in your judgement, > >>>> we fill the gaps as described below under "Expand from there". > >>>> > >>>> * Add an s390 runner using the project machine you mentioned. > >>>> > >>>> * Expand from there: identify the remaining gaps, map them to > >>>> people / organizations interested in them, and solicit > >>>> contributions from these guys. > >>>> > >>>> A note on contributions: we need both hardware and people. By > >>>> people I mean maintainers for the infrastructure, the tools and > >>>> all the runners. Cleber & team are willing to serve for the > >>>> infrastructure, the tools and the Red Hat runners. > >>> > >>> So, with 5.0 nearly out the door it seems like a good time to > >>> check in on this thread again to ask where we are progress-wise > >>> with this. My impression is that this patchset provides most of > >>> the scripting and config side of the first step, so what we need > >>> is for RH to provide an x86 runner machine and tell the gitlab CI > >>> it exists. I appreciate that the whole coronavirus and > >>> working-from-home situation will have upended everybody's plans, > >>> especially when actual hardware might be involved, but how's it > >>> going ? > >>> > >> > >> Hi Peter, > >> > >> You hit the nail in the head here. We were affected indeed with > >> our ability to move some machines from one lab to another (across > >> the country), but we're actively working on it. > > > > For x86, do we really need to be using custom runners ? > > > > With GitLab if someone forks the repo to their personal namespace, > > they cannot use any custom runners setup by the origin project. So > > if we use custom runners for x86, people forking won't be able to > > run the GitLab CI jobs. 
> > > > As a sub-system maintainer I wouldn't like this, because I ideally > > want to be able to run the same jobs on my staging tree, that Peter > > will run at merge time for the PULL request I send. > > > > Thus my strong preference would be to use the GitLab runners in > > every scenario where they are viable to use. Only use custom > > runners in the cases where GitLab runners are clearly inadequate > > for our needs. > > > > Based on what we've setup in GitLab for libvirt, the shared runners > > they have work fine for x86. Just need the environments you are > > testing to be provided as Docker containers (you can actually build > > and cache the container images during your CI job too). IOW, any > > Linux distro build and test jobs should be able to use shared > > runners on x86, and likewise mingw builds. Custom runners should > > only be needed if the jobs need todo *BSD / macOS builds, and/or > > have access to specific hardware devices for some reason. > > Thanks to insist with that point Daniel. I'd rather see every > configuration reproducible, so if we loose a hardware sponsor, we can > find another one and start another runner. I also believe that having jobs that can be reproducible is key, but I differ when it comes to believing that the hardware *alone* should define if a job is gating or not. My point is that even with easily accessible systems and software, different jobs can easily yield different results, because of how the underlying system is configured. Sure, containers, but again, we have to consider non-container usage too. In the RFC I tried to gather feedback on a plan to promote and demote jobs from a gating status. IMO, most jobs would begin their lives being non-gating, and would prove both their stability and their maintainer's responsiveness. Even when such jobs are already gating, it should be trivial to simply demote a gating job because of (and not limited to) any of the following reasons: * job fails in a non-reproducible way * hardware is unresponsive and takes too long to produce results * maintainer is MIA Some or all of the gating runners could also pick up jobs sent to a branch other than "staging", say, a branch called "reproducer". That branch could be writable by maintainers who need to reproduce a given failure. > Also note, if it is not easy to reproduce a runner, it will be very > hard to debug a reported build/test error. > One of the goals of the patches you'll find in this series is to propose (I would say *require*) that new jobs that require new hardware (even easily accessible systems such as x86) should provide easy-to-run scripts to recreate those environments. This is in line with my previous point that it's not enough to just have the same hardware. > A non-reproducible runner can not be used as gating, because if they > fail it is not acceptable to lock the project development process. > Other people may be more familiar with this, but I do remember projects such as OpenStack deferring testing of hardware/software combinations to other entities. One specific party won't be able to reproduce all configurations unless it's decided to be kept really small. In my opinion, it's better to face it and acknowledge that fact, and have plans to be put into action in the exceptional cases where the environment to reproduce a test is now unavailable. > > In some cases custom runners are acceptable. These runners won't be > "gating" but can post informative log and status. 
> Well, I have the feeling that some people maintaining those runners will *not* want to have them as "informational" only. If they invest a good amount of time on them, I believe they'll want to reap the benefits such as other not breaking the code they rely on. If their system is not gating, they lose that and may find breakage that CI did not catch. Again, I don't think "easily accessible" hardware should be the only criteria for gating/non-gating status. For instance, would you consider, say, a "Raspberry Pi 4 Model B", running KVM jobs to be a reproducible runner? Would you blame a developer that breaks a Gating CI job on such a platform and says that he can not reproduce it? > [*] Specific hardware that is not easily available. > > - Alistair at last KVM forum talked about a RISCV board > (to test host TCG) > - Aleksandar said at last KVM forum Wavecomp could plug a CI20 MIPS > (to test host TCG) > - Lemote seems interested to setup some Loongson MIPSr6 board > (to test interaction with KVM) > > [*] To run code requiring accepting License Agreements > > [*] To run non Free / Open Source code > > > Owner of these runners take the responsibility to provide enough > time/information about reported bugs, or to debug them themselves. > I do think that the owner of such runners may *not* want to have them with Gating jobs, but I don't think the opposite should be the case, because I find it very hard to define, without some prejudice, what "easily available runner" means. > > Now the problem is GitLab runner is not natively available on the > architectures listed in this mail, so custom setup is required. A > dumb script running ssh to a machine also works (tested) but lot of > manual tuning/maintenance expected. > That's where I'm trying to help. I built and tested the gitlab-runner for a number of non-supported environments, and I expect to build further on that (say contributing code or feedback back to GitLab so they become official builds?). Cheers, - Cleber.
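To make the "reproducer" branch idea mentioned above concrete, it could be wired into the very same job definitions by just widening the branch filter, e.g. (branch names as in the proposal, everything else illustrative):

  kvm-bare-metal-test:
    tags:
      - qemu-project-runner      # hypothetical project-owned machine
    only:
      - staging
      - reproducer               # writable by maintainers chasing a reported failure
    script:
      - make check

Re-running a failing job would then amount to pushing the offending tree to the "reproducer" branch.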
On Fri, 24 Apr 2020 08:57:46 +0200 Erik Skultety <eskultet@redhat.com> wrote: > On Thu, Apr 23, 2020 at 11:28:21PM +0200, Philippe Mathieu-Daudé > wrote: > > On 4/23/20 7:13 PM, Daniel P. Berrangé wrote: > > > On Thu, Apr 23, 2020 at 01:04:13PM -0400, Cleber Rosa wrote: > > > > ----- Original Message ----- > > > > > From: "Peter Maydell" <peter.maydell@linaro.org> > > > > > To: "Markus Armbruster" <armbru@redhat.com> > > > > > Cc: "Fam Zheng" <fam@euphon.net>, "Thomas Huth" > > > > > <thuth@redhat.com>, "Beraldo Leal" <bleal@redhat.com>, "Erik > > > > > Skultety" <eskultet@redhat.com>, "Alex Bennée" > > > > > <alex.bennee@linaro.org>, "Wainer Moschetta" > > > > > <wmoschet@redhat.com>, "QEMU Developers" > > > > > <qemu-devel@nongnu.org>, "Wainer dos Santos Moschetta" > > > > > <wainersm@redhat.com>, "Willian Rampazzo" > > > > > <wrampazz@redhat.com>, "Cleber Rosa" <crosa@redhat.com>, > > > > > "Philippe Mathieu-Daudé" <philmd@redhat.com>, "Eduardo > > > > > Habkost" <ehabkost@redhat.com> Sent: Tuesday, April 21, 2020 > > > > > 8:53:49 AM Subject: Re: [PATCH 0/5] QEMU Gating CI > > > > > > > > > > On Thu, 19 Mar 2020 at 16:33, Markus Armbruster > > > > > <armbru@redhat.com> wrote: > > > > > > Peter Maydell <peter.maydell@linaro.org> writes: > > > > > > > I think we should start by getting the gitlab setup > > > > > > > working for the basic "x86 configs" first. Then we can > > > > > > > try adding a runner for s390 (that one's logistically > > > > > > > easiest because it is a project machine, not one owned by > > > > > > > me personally or by Linaro) once the basic framework is > > > > > > > working, and expand from there. > > > > > > > > > > > > Makes sense to me. > > > > > > > > > > > > Next steps to get this off the ground: > > > > > > > > > > > > * Red Hat provides runner(s) for x86 stuff we care about. > > > > > > > > > > > > * If that doesn't cover 'basic "x86 configs" in your > > > > > > judgement, we fill the gaps as described below under > > > > > > "Expand from there". > > > > > > > > > > > > * Add an s390 runner using the project machine you > > > > > > mentioned. > > > > > > > > > > > > * Expand from there: identify the remaining gaps, map them > > > > > > to people / organizations interested in them, and solicit > > > > > > contributions from these guys. > > > > > > > > > > > > A note on contributions: we need both hardware and people. > > > > > > By people I mean maintainers for the infrastructure, the > > > > > > tools and all the runners. Cleber & team are willing to > > > > > > serve for the infrastructure, the tools and the Red Hat > > > > > > runners. > > > > > > > > > > So, with 5.0 nearly out the door it seems like a good time to > > > > > check in on this thread again to ask where we are > > > > > progress-wise with this. My impression is that this patchset > > > > > provides most of the scripting and config side of the first > > > > > step, so what we need is for RH to provide an x86 runner > > > > > machine and tell the gitlab CI it exists. I appreciate that > > > > > the whole coronavirus and working-from-home situation will > > > > > have upended everybody's plans, especially when actual > > > > > hardware might be involved, but how's it going ? > > > > > > > > > > > > > Hi Peter, > > > > > > > > You hit the nail in the head here. We were affected indeed > > > > with our ability to move some machines from one lab to another > > > > (across the country), but we're actively working on it. > > > > > > For x86, do we really need to be using custom runners ? 
> > > > > > With GitLab if someone forks the repo to their personal > > > namespace, they cannot use any custom runners setup by the origin > > > project. So if we use custom runners for x86, people forking > > > won't be able to run the GitLab CI jobs. > > > > > > As a sub-system maintainer I wouldn't like this, because I > > > ideally want to be able to run the same jobs on my staging tree, > > > that Peter will run at merge time for the PULL request I send. > > > > > > Thus my strong preference would be to use the GitLab runners in > > > every scenario where they are viable to use. Only use custom > > > runners in the cases where GitLab runners are clearly inadequate > > > for our needs. > > > > > > Based on what we've setup in GitLab for libvirt, the shared > > > runners they have work fine for x86. Just need the environments > > > you are testing to be provided as Docker containers (you can > > > actually build and cache the container images during your CI job > > > too). IOW, any Linux distro build and test jobs should be able > > > to use shared runners on x86, and likewise mingw builds. Custom > > > runners should only be needed if the jobs need todo *BSD / macOS > > > builds, and/or have access to specific hardware devices for some > > > reason. > > Not just ^that, you also want custom VM runners to run integration > tests, e.g. in libvirt, we'd have to put systemd and a lof of other > cruft into the container to be able to run the tests at which point > you must ask yourself, whyt not go with a VM instead in which case > we're limited in terms of infrastructure... > I'm completely in agreement that a lot of the jobs will be suitable to be run on containers, but like you exemplified here Erik, we must take into account the ones that won't be suitable. For instance, a very real use case is testing KVM on bare metal. One could argue that "QEMU running on a container making use of KVM would suffice". It may be true, it may not. But even that won't be possible Today on a CentOS/RHEL 8 machine, because gitlab-runner knows nothing about podman, so full blown x86 physical boxes may be the "cheaper" and more practical solution, at least initially. Trying to use other architectures would surely have similar issues. > > > > Thanks to insist with that point Daniel. I'd rather see every > > configuration reproducible, so if we loose a hardware sponsor, we > > can find another one and start another runner. > > Also note, if it is not easy to reproduce a runner, it will be very > > hard to debug a reported build/test error. > > (Thanks for bringing ^this point up Philippe) > > ...However, what we've been actively working on in libvirt is to > extend the lcitool we have (which can spawn local test VMs) to the > point where we're able to generate machines that would be the > reproducible. Right now I'm playing with cloud-init integration with > lcitool (patches coming soon) that would allow us to use the same > machines locally as we'd want to have in, say, OpenStack and share > them as compressed images, so even when updated/managed by lcitool > locally, you'd get the same environment. > > Regards, > This is great, and it aligns with my previous point that reproducibility it's not *just* about the hardware, but about diligently documenting and automating the environments, be them mundane or super specialized. And IMO that should score some points when it comes to being promoted/demoted as a Gating machine/job. Thanks for the feedback Erik! - Cleber.
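As a concrete example of the bare-metal case discussed above, a job that must land on a machine with KVM available could be pinned with tags, while everything container-friendly stays on the shared runners (tags and job name are invented for illustration):

  kvm-test-x86:
    tags:
      - x86_64
      - kvm                      # hypothetical tags of a shell-executor runner on a physical box
    only:
      - staging
    script:
      - ./configure --target-list=x86_64-softmmu --enable-kvm
      - make -j"$(nproc)"
      - make check

Since the shared runners never advertise those tags, the job can only ever be picked up by the project's own hardware.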
On Fri, 24 Apr 2020 10:30:23 +0100 Daniel P. Berrangé <berrange@redhat.com> wrote: > On Thu, Apr 23, 2020 at 01:36:48PM -0400, Cleber Rosa wrote: > > > > > > ----- Original Message ----- > > > From: "Daniel P. Berrangé" <berrange@redhat.com> > > > To: "Cleber Rosa" <crosa@redhat.com> > > > Cc: "Peter Maydell" <peter.maydell@linaro.org>, "Fam Zheng" > > > <fam@euphon.net>, "Thomas Huth" <thuth@redhat.com>, "Beraldo > > > Leal" <bleal@redhat.com>, "Erik Skultety" <eskultet@redhat.com>, > > > "Philippe Mathieu-Daudé" <philmd@redhat.com>, "Wainer Moschetta" > > > <wmoschet@redhat.com>, "Markus Armbruster" <armbru@redhat.com>, > > > "Wainer dos Santos Moschetta" <wainersm@redhat.com>, "QEMU > > > Developers" <qemu-devel@nongnu.org>, "Willian Rampazzo" > > > <wrampazz@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, > > > "Eduardo Habkost" <ehabkost@redhat.com> Sent: Thursday, April 23, > > > 2020 1:13:22 PM Subject: Re: [PATCH 0/5] QEMU Gating CI > > > > > > On Thu, Apr 23, 2020 at 01:04:13PM -0400, Cleber Rosa wrote: > > > > > > > > > > > > ----- Original Message ----- > > > > > From: "Peter Maydell" <peter.maydell@linaro.org> > > > > > To: "Markus Armbruster" <armbru@redhat.com> > > > > > Cc: "Fam Zheng" <fam@euphon.net>, "Thomas Huth" > > > > > <thuth@redhat.com>, "Beraldo Leal" <bleal@redhat.com>, "Erik > > > > > Skultety" <eskultet@redhat.com>, "Alex Bennée" > > > > > <alex.bennee@linaro.org>, "Wainer Moschetta" > > > > > <wmoschet@redhat.com>, "QEMU Developers" > > > > > <qemu-devel@nongnu.org>, "Wainer dos Santos Moschetta" > > > > > <wainersm@redhat.com>, "Willian Rampazzo" > > > > > <wrampazz@redhat.com>, "Cleber Rosa" <crosa@redhat.com>, > > > > > "Philippe Mathieu-Daudé" <philmd@redhat.com>, "Eduardo > > > > > Habkost" <ehabkost@redhat.com> Sent: Tuesday, April 21, 2020 > > > > > 8:53:49 AM Subject: Re: [PATCH 0/5] QEMU Gating CI > > > > > > > > > > On Thu, 19 Mar 2020 at 16:33, Markus Armbruster > > > > > <armbru@redhat.com> wrote: > > > > > > Peter Maydell <peter.maydell@linaro.org> writes: > > > > > > > I think we should start by getting the gitlab setup > > > > > > > working for the basic "x86 configs" first. Then we can > > > > > > > try adding a runner for s390 (that one's logistically > > > > > > > easiest because it is a project machine, not one owned by > > > > > > > me personally or by Linaro) once the basic framework is > > > > > > > working, and expand from there. > > > > > > > > > > > > Makes sense to me. > > > > > > > > > > > > Next steps to get this off the ground: > > > > > > > > > > > > * Red Hat provides runner(s) for x86 stuff we care about. > > > > > > > > > > > > * If that doesn't cover 'basic "x86 configs" in your > > > > > > judgement, we fill the gaps as described below under > > > > > > "Expand from there". > > > > > > > > > > > > * Add an s390 runner using the project machine you > > > > > > mentioned. > > > > > > > > > > > > * Expand from there: identify the remaining gaps, map them > > > > > > to people / organizations interested in them, and solicit > > > > > > contributions from these > > > > > > guys. > > > > > > > > > > > > A note on contributions: we need both hardware and people. > > > > > > By people I mean maintainers for the infrastructure, the > > > > > > tools and all the runners. Cleber & team are willing to > > > > > > serve for the infrastructure, the tools and > > > > > > the Red Hat runners. 
> > > > > > > > > > So, with 5.0 nearly out the door it seems like a good time to > > > > > check in on this thread again to ask where we are > > > > > progress-wise with this. My impression is that this patchset > > > > > provides most of the scripting and config side of the first > > > > > step, so what we need is for RH to provide an x86 runner > > > > > machine and tell the gitlab CI it exists. I appreciate that > > > > > the whole coronavirus and working-from-home situation will > > > > > have upended everybody's plans, especially when actual > > > > > hardware might be involved, but how's it going ? > > > > > > > > > > > > > Hi Peter, > > > > > > > > You hit the nail in the head here. We were affected indeed > > > > with our ability > > > > to move some machines from one lab to another (across the > > > > country), but we're > > > > actively working on it. > > > > > > For x86, do we really need to be using custom runners ? > > > > > > > Hi Daniel, > > > > We're already using the shared x86 runners, but with a different > > goal. The goal of the "Gating CI" is indeed to expand on non-x86 > > environments. We're in a "chicken and egg" kind of situation, > > because we'd like to prove that GitLab CI will allow QEMU to expand > > to very different runners and jobs, while not really having all > > that hardware setup and publicly available at this time. > > > > My experiments were really around that point, I mean, confirming > > that we can grow the number of > > architectures/runners/jobs/configurations to provide a coverage > > equal or greater to what Peter already does. > > So IIUC, you're saying that for x86 gating, the intention is to use > shared runners in general. > No, I've said that whenever possible we could use containers and thus shared runners. For instance, testing QEMU running on the x86 CentOS 8 KVM is not something we could do with shared runners. > Your current work that you say is blocked on access to x86 hardware, > is just about demonstrating the concept of plugging in custom > runners, while we wait for access to non-x86 hardware ? > Short answer is no. The original scope and goal was to have the same or very similar jobs that Peter runs himself in his own machines. So it was/is not about just x86 hardware, but x86 that can run a couple of different OSs, and non-x86 hardware too. We're basically scaling down and changing the scope (for instance adding the s390 machine here) in an attempt to get this moving forward. > > > With GitLab if someone forks the repo to their personal > > > namespace, they cannot use any custom runners setup by the origin > > > project. So if we use custom runners for x86, people forking > > > won't be able to run the GitLab CI jobs. > > > > > > > They will continue to be able use the jobs and runners already > > defined in the .gitlab-ci.yml file. This work will only affect > > people pushing to the/a "staging" branch. > > > > > As a sub-system maintainer I wouldn't like this, because I > > > ideally want to be able to run the same jobs on my staging tree, > > > that Peter will run at merge time for the PULL request I send. > > > > > > > If you're looking for symmetry between any PR and "merge time" > > jobs, the only solution is to allow any PR to access all the > > diverse set of non-shared machines we're hoping to have in the near > > future. This may be something we'll get to, but I doubt we can > > tackle it in the near future now. 
> > It occurred to me that we could do this if we grant selective access > to the Gitlab repos, to people who are official subsystem maintainers. > GitLab has a concept of "protected branches", so you can control who > is allowed to push changes on a per-branch granularity. > > So, for example, in the main qemu.git, we could create branches for > each subsystem tree eg > > staging-block > staging-qapi > staging-crypto > staging-migration > .... > > and for each of these branches, we can grant access to relevant > subsystem maintainer(s). > > When they're ready to send a pull request to Peter, they can push > their tree to this branch. Since the branch is in the main > gitlab.com/qemu/qemu project namespace, this branch can run CI using > the private QEMU runners. The subsystem maintainer can thus see the > full set of CI results across all platforms required by Gating, > before Peter even gets the pull request. > Sure, this is actually an extrapolation/extension of what we're proposing to do here with the unique "staging" branch. I see no issues at all to have more than one (one per subsystem/maintainer) staging branches. > So when Peter then looks at merging the pull request to master, the > only he's likely to see are the non-deterministic bugs, or issues > caused by semantic conflicts with other recently merged code. > > It would even be possible to do the final merge into master entirely > from GitLab, no need to go via email. When the source branch & target > branch are within the same git repo, GitLab has the ability to run CI > jobs against the resulting merge commit in a strict gating manner, > before it hits master. They call this "Merge trains" in their > documentation. > > IOW, from Peter's POV, merging pull requests could be as simple as > hitting the merge button in GitLab merge request UI. Everything wrt > CI would be completely automated, and the subsystem maintainers would > have the responsibility to dealing with merge conflicts & CI > failures, which is more scalable for the person co-ordinating the > merges into master. > This is very much aligned to some previous discussions, I believe, at the RFC thread. But for practical purposes, the general direction was to scale down to the bare minimum to replicate Peter's setup and workflow, and then move from there to possibly something very similar to what you're describing here. > > Regards, > Daniel Thanks a *whole lot* for the feedback Daniel! - Cleber.
On Mon, 2020-04-27 at 01:24 -0400, Cleber Rosa wrote: > On Fri, 24 Apr 2020 08:57:46 +0200 > Erik Skultety <eskultet@redhat.com> wrote: > > On Thu, Apr 23, 2020 at 11:28:21PM +0200, Philippe Mathieu-Daudé > > wrote: > > > Thanks to insist with that point Daniel. I'd rather see every > > > configuration reproducible, so if we loose a hardware sponsor, we > > > can find another one and start another runner. > > > Also note, if it is not easy to reproduce a runner, it will be very > > > hard to debug a reported build/test error. > > > > (Thanks for bringing ^this point up Philippe) > > > > ...However, what we've been actively working on in libvirt is to > > extend the lcitool we have (which can spawn local test VMs) to the > > point where we're able to generate machines that would be the > > reproducible. Right now I'm playing with cloud-init integration with > > lcitool (patches coming soon) that would allow us to use the same > > machines locally as we'd want to have in, say, OpenStack and share > > them as compressed images, so even when updated/managed by lcitool > > locally, you'd get the same environment. > > This is great, and it aligns with my previous point that reproducibility > it's not *just* about the hardware, but about diligently documenting > and automating the environments, be them mundane or super specialized. > And IMO that should score some points when it comes to being > promoted/demoted as a Gating machine/job. I think there's room to extend and rework lcitool so that it can be used for QEMU CI as well, and we should definitely look into that. Right now it only really covers VMs and containers, but there's already one situation (FreeBSD) where the expectation is that you'd import an existing VM image and then apply CI-related customizations on top, so it's not too much of a stretch to imagine doing the same for a bare metal machine. We use Ansible, so as long as we can connect to the machine via ssh we're pretty much good to go. Installation of VMs we already perform in an unattended fashion using preseed/kickstart, and it should be relatively straightforward to adapt those configurations to also work on real hardware. This way we'd both be able to rely on having a sane OS as the base, and relieve the administrator of the duty of manually installing the machines.
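Purely as a sketch of what the Ansible side could look like for a bare-metal runner host (inventory group, package list and user are invented; the real task list would come from lcitool's per-distro mappings):

  - hosts: qemu_ci_runners        # hypothetical inventory group of bare-metal machines
    become: true
    tasks:
      - name: Install build dependencies
        package:
          name: [git, gcc, make, ninja-build, python3]
          state: present
      - name: Create the CI user
        user:
          name: gitlab-runner
          groups: kvm
          append: true

Run over plain ssh, something like this is already enough to take a freshly kickstarted machine to the point where a runner can be registered on it.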
On 4/27/20 7:12 AM, Cleber Rosa wrote: > On Thu, 23 Apr 2020 23:28:21 +0200 > Philippe Mathieu-Daudé <philmd@redhat.com> wrote: [...] >> In some cases custom runners are acceptable. These runners won't be >> "gating" but can post informative log and status. >> > > Well, I have the feeling that some people maintaining those runners > will *not* want to have them as "informational" only. If they invest a > good amount of time on them, I believe they'll want to reap the > benefits such as other not breaking the code they rely on. If their > system is not gating, they lose that and may find breakage that CI did > not catch. Again, I don't think "easily accessible" hardware should be > the only criteria for gating/non-gating status. > > For instance, would you consider, say, a "Raspberry Pi 4 Model > B", running KVM jobs to be a reproducible runner? Would you blame a > developer that breaks a Gating CI job on such a platform and says that > he can not reproduce it? I'm not sure I understood the problem, as I'd answer "yes" but I guess you expect me to say "no"? [...] >> Now the problem is GitLab runner is not natively available on the >> architectures listed in this mail, so custom setup is required. A >> dumb script running ssh to a machine also works (tested) but lot of >> manual tuning/maintenance expected. >> > > That's where I'm trying to help. I built and tested the gitlab-runner > for a number of non-supported environments, and I expect to build > further on that (say contributing code or feedback back to GitLab so > they become official builds?). Good luck with that, it took more than 2 years for GitLab to officially support AMD64: https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/725 Hopefully the first non-x86 user was the hardest one who had to do all the bad work, and the next architecture might get supported quicker... > > Cheers, > - Cleber. >
On Mon, 27 Apr 2020 12:51:36 +0200 Philippe Mathieu-Daudé <philmd@redhat.com> wrote: > On 4/27/20 7:12 AM, Cleber Rosa wrote: > > On Thu, 23 Apr 2020 23:28:21 +0200 > > Philippe Mathieu-Daudé <philmd@redhat.com> wrote: > [...] > >> In some cases custom runners are acceptable. These runners won't be > >> "gating" but can post informative log and status. > >> > > > > Well, I have the feeling that some people maintaining those runners > > will *not* want to have them as "informational" only. If they > > invest a good amount of time on them, I believe they'll want to > > reap the benefits such as other not breaking the code they rely on. > > If their system is not gating, they lose that and may find > > breakage that CI did not catch. Again, I don't think "easily > > accessible" hardware should be the only criteria for > > gating/non-gating status. > > > > For instance, would you consider, say, a "Raspberry Pi 4 Model > > B", running KVM jobs to be a reproducible runner? Would you blame a > > developer that breaks a Gating CI job on such a platform and says > > that he can not reproduce it? > > I'm not sure I understood the problem, as I'd answer "yes" but I > guess you expect me to say "no"? > What I mean is: would you blame such a developer for *not* having a machine himself/herself that he/she can try to reproduce the failure? And would you consider a "Raspberry Pi 4 Model B" an easily available hardware? > [...] > >> Now the problem is GitLab runner is not natively available on the > >> architectures listed in this mail, so custom setup is required. A > >> dumb script running ssh to a machine also works (tested) but lot of > >> manual tuning/maintenance expected. > >> > > > > That's where I'm trying to help. I built and tested the > > gitlab-runner for a number of non-supported environments, and I > > expect to build further on that (say contributing code or feedback > > back to GitLab so they become official builds?). > > Good luck with that, it took more that 2 years to GitLab to > officially support AMD64: > https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/725 > You mean aarch64, sure. I'm not holding my breath, because we can always have our own binaries/ports (or other executors such as ssh) but I'm optimistic... > Hopefully the first non-x86 user was the hardest one who had to do > all the bad work, and next architecture might get supported quicker... > ... and this point is one of the reasons. The other is competition from Travis-CI (and others). Cheers, - Cleber.
On 4/27/20 4:28 PM, Cleber Rosa wrote:
> On Mon, 27 Apr 2020 12:51:36 +0200
> Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>
>> On 4/27/20 7:12 AM, Cleber Rosa wrote:
>>> On Thu, 23 Apr 2020 23:28:21 +0200
>>> Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>> [...]
>>>> In some cases custom runners are acceptable. These runners won't be
>>>> "gating" but can post informative log and status.
>>>>
>>>
>>> Well, I have the feeling that some people maintaining those runners
>>> will *not* want to have them as "informational" only. If they
>>> invest a good amount of time on them, I believe they'll want to
>>> reap the benefits such as other not breaking the code they rely on.
>>> If their system is not gating, they lose that and may find
>>> breakage that CI did not catch. Again, I don't think "easily
>>> accessible" hardware should be the only criteria for
>>> gating/non-gating status.
>>>
>>> For instance, would you consider, say, a "Raspberry Pi 4 Model
>>> B", running KVM jobs to be a reproducible runner? Would you blame a
>>> developer that breaks a Gating CI job on such a platform and says
>>> that he can not reproduce it?
>>
>> I'm not sure I understood the problem, as I'd answer "yes", but I
>> guess you expect me to say "no"?
>>
>
> What I mean is: would you blame such a developer for *not* having a
> machine himself/herself on which he/she can try to reproduce the
> failure? And would you consider a "Raspberry Pi 4 Model B" easily
> available hardware?

My view on this is that if someone merges code into mainstream QEMU and
maintains it, and if the setup is not easy to reproduce (for a bug
reported by a CI script), then it is the responsibility of the
maintainer to resolve it: either by providing access to the particular
hardware, or by being ready to spend a long debugging session over
email and across multiple time zones.

If that is not possible, then this specific code/setup cannot qualify
for gating CI, and eventually mainstream may not be the best place for
it.

>> [...]
On Mon, 27 Apr 2020 16:41:38 +0200
Philippe Mathieu-Daudé <philmd@redhat.com> wrote:

> On 4/27/20 4:28 PM, Cleber Rosa wrote:
> >
> > What I mean is: would you blame such a developer for *not* having a
> > machine himself/herself on which he/she can try to reproduce the
> > failure? And would you consider a "Raspberry Pi 4 Model B" easily
> > available hardware?
>
> My view on this is that if someone merges code into mainstream QEMU
> and maintains it, and if the setup is not easy to reproduce (for a
> bug reported by a CI script), then it is the responsibility of the
> maintainer to resolve it: either by providing access to the
> particular hardware, or by being ready to spend a long debugging
> session over email and across multiple time zones.
>

Right, the "easy to reproduce" part has a lot to do with access to
hardware, and a lot to do with access to the same or a reproducible
setup. And yes, if I maintain a platform/job "foobar" that was once
upgraded to gating status but has since fallen behind and doesn't allow
users to easily reproduce it, it all falls on the maintainer to resolve
the issues. I'd even say that people with access to identical hardware
could proactively challenge a given job's gating status if they fail to
reproduce it with the provided documentation/scripts.

> If that is not possible, then this specific code/setup cannot qualify
> for gating CI, and eventually mainstream may not be the best place
> for it.
>
> >> [...]
>

IIUC, we're in agreement. :)

Thanks,
- Cleber.
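(As a side note on reproducing such failures: someone with matching
hardware can also try running a single job from the tree's
.gitlab-ci.yml locally, without pushing anything, using gitlab-runner's
"exec" command. This is only a sketch; the job name below is made up,
and "exec" has known limitations compared to a full CI run, for example
around artifacts and job dependencies.

  $ cd qemu
  $ gitlab-runner exec shell build-system-ubuntu-main

Running the same job this way on the same class of hardware is one
practical way to challenge, or confirm, whether a gating job's failure
is reproducible.)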
On Mon, Apr 27, 2020 at 04:41:38PM +0200, Philippe Mathieu-Daudé wrote:
> On 4/27/20 4:28 PM, Cleber Rosa wrote:
> > On Mon, 27 Apr 2020 12:51:36 +0200
> > Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> >
> > > On 4/27/20 7:12 AM, Cleber Rosa wrote:
> > > > On Thu, 23 Apr 2020 23:28:21 +0200
> > > > Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> > > [...]
> > > > > In some cases custom runners are acceptable. These runners won't be
> > > > > "gating" but can post informative log and status.
> > > > >
> > > >
> > > > Well, I have the feeling that some people maintaining those runners
> > > > will *not* want to have them as "informational" only. If they
> > > > invest a good amount of time on them, I believe they'll want to
> > > > reap the benefits such as other not breaking the code they rely on.
> > > > If their system is not gating, they lose that and may find
> > > > breakage that CI did not catch. Again, I don't think "easily
> > > > accessible" hardware should be the only criteria for
> > > > gating/non-gating status.
> > > >
> > > > For instance, would you consider, say, a "Raspberry Pi 4 Model
> > > > B", running KVM jobs to be a reproducible runner? Would you blame a
> > > > developer that breaks a Gating CI job on such a platform and says
> > > > that he can not reproduce it?
> > >
> > > I'm not sure I understood the problem, as I'd answer "yes", but I
> > > guess you expect me to say "no"?
> > >
> >
> > What I mean is: would you blame such a developer for *not* having a
> > machine himself/herself on which he/she can try to reproduce the
> > failure? And would you consider a "Raspberry Pi 4 Model B" easily
> > available hardware?
>
> My view on this is that if someone merges code into mainstream QEMU and
> maintains it, and if the setup is not easy to reproduce (for a bug
> reported by a CI script), then it is the responsibility of the
> maintainer to resolve it: either by providing access to the particular
> hardware, or by being ready to spend a long debugging session over
> email and across multiple time zones.
>
> If that is not possible, then this specific code/setup cannot qualify
> for gating CI, and eventually mainstream may not be the best place for
> it.

I'd caution against using gating CI as a big stick to hit contributors
with. The more rules we put in place which contributors have to follow
before their work gets accepted for merge, the less likely someone is
to have a positive experience contributing to the project, or even be
willing to try.

This view of gating CI requirements was a negative aspect of
contributing to the OpenStack project, which drove people away. There
was pushback against contributing work because it lacked CI, but there
was often no viable way to actually provide CI in a feasible timeframe,
especially for stuff only testable on physical hardware and not in VMs.
Even if you work for a big company, it isn't easy to magic up money to
spend on hardware & hosting to provide CI, as corporate bureaucracy &
priorities will get in your way.

I'd really encourage the more nuanced approach of thinking in terms of
tiered support levels:

 - Tier 1: features that we have gating CI tests for. Will always
   work.

 - Tier 2: features that we have non-gating CI tests for. Should work
   at the time of release, but may be broken for periods in git
   master.

 - Tier 3: features that we don't have CI tests for. Compile tested
   only, relying on end user manual testing, so may or may not work at
   any time or in a release.
Obviously Tier 1 is the gold standard that we would like everything to
achieve, but we'll never reach that reality unless we cull 90% of
QEMU's code. I don't think that's in the best interests of our users,
because clearly stuff in Tier 2 and Tier 3 is still useful for a large
portion of our end users - not least because Tier 3 is the level
everything is at right now in QEMU unless you are using a downstream
vendor's packages.

The tier levels and CI are largely around setting reasonable quality
expectations. Right now we often have a problem that people want to
re-factor code but are afraid of breaking existing functionality that
guests rely on. This causes delays in merging code, or causes people to
not even attempt the refactoring in the first place. This harms our
forward progress in QEMU.

With gating CI, we are declaring that contributors should feel free to
refactor anything as long as it passes gating CI. IOW, contributors
only have to care about Tier 1 features continuing to work. It would be
nice if refactoring does not break stuff in Tier 2 / 3, but if it does,
then that is acceptable collateral damage. We would not block the merge
on stuff that is Tier 2 / 3.

Based on what I experienced in OpenStack, the other big challenge is
deciding when something can be promoted from Tier 2 to Tier 1. They had
the official gating CI (for Tier 1) maintained by the core project
infrastructure team. Any CI provided by third party companies was
non-gating (Tier 2), at least during the time I was involved, because
they did not want code merges blocked on the ability to communicate
with third party companies who were often hard to contact when CI
broke. So the only real path from Tier 2 to Tier 1 was to give the
project direct access to the CI hardware, instead of having the
providing company self-manage it.

Regards,
Daniel
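(To make the tier split concrete, here is a rough, hypothetical
.gitlab-ci.yml fragment; the job names, tags and configure options are
invented for illustration and are not from this series. A Tier 1 job
gates the pipeline as usual, while a Tier 2 job on a custom runner
carries "allow_failure: true", so it still reports its status on every
pipeline but never blocks a merge; Tier 3 simply has no job at all.

  # Tier 1: gating job, must pass
  build-system-x86_64:
    tags:
      - x86_64
    script:
      - ./configure --target-list=x86_64-softmmu
      - make -j"$(nproc)"
      - make check

  # Tier 2: informational job on a custom runner, never blocks the merge
  build-kvm-aarch64:
    tags:
      - aarch64
      - kvm
    allow_failure: true
    script:
      - ./configure --target-list=aarch64-softmmu
      - make -j"$(nproc)"
      - make check

Promoting a job from Tier 2 to Tier 1 would then be a one-line change,
dropping the "allow_failure" flag once the runner has proven reliable
and reproducible.)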
On Thu, Apr 23, 2020 at 01:04:13PM -0400, Cleber Rosa wrote:
>
>
> ----- Original Message -----
> > From: "Peter Maydell" <peter.maydell@linaro.org>
> > To: "Markus Armbruster" <armbru@redhat.com>
> > Cc: "Fam Zheng" <fam@euphon.net>, "Thomas Huth" <thuth@redhat.com>,
> > "Beraldo Leal" <bleal@redhat.com>, "Erik Skultety" <eskultet@redhat.com>,
> > "Alex Bennée" <alex.bennee@linaro.org>, "Wainer Moschetta" <wmoschet@redhat.com>,
> > "QEMU Developers" <qemu-devel@nongnu.org>,
> > "Wainer dos Santos Moschetta" <wainersm@redhat.com>,
> > "Willian Rampazzo" <wrampazz@redhat.com>, "Cleber Rosa" <crosa@redhat.com>,
> > "Philippe Mathieu-Daudé" <philmd@redhat.com>, "Eduardo Habkost" <ehabkost@redhat.com>
> > Sent: Tuesday, April 21, 2020 8:53:49 AM
> > Subject: Re: [PATCH 0/5] QEMU Gating CI
> >
> > On Thu, 19 Mar 2020 at 16:33, Markus Armbruster <armbru@redhat.com> wrote:
> > > Peter Maydell <peter.maydell@linaro.org> writes:
> > > > I think we should start by getting the gitlab setup working
> > > > for the basic "x86 configs" first. Then we can try adding
> > > > a runner for s390 (that one's logistically easiest because
> > > > it is a project machine, not one owned by me personally or
> > > > by Linaro) once the basic framework is working, and expand
> > > > from there.
> > >
> > > Makes sense to me.
> > >
> > > Next steps to get this off the ground:
> > >
> > > * Red Hat provides runner(s) for x86 stuff we care about.
> > >
> > > * If that doesn't cover 'basic "x86 configs" in your judgement, we
> > >   fill the gaps as described below under "Expand from there".
> > >
> > > * Add an s390 runner using the project machine you mentioned.
> > >
> > > * Expand from there: identify the remaining gaps, map them to people /
> > >   organizations interested in them, and solicit contributions from these
> > >   guys.
> > >
> > > A note on contributions: we need both hardware and people. By people I
> > > mean maintainers for the infrastructure, the tools and all the runners.
> > > Cleber & team are willing to serve for the infrastructure, the tools and
> > > the Red Hat runners.
> >
> > So, with 5.0 nearly out the door it seems like a good time to check
> > in on this thread again to ask where we are progress-wise with this.
> > My impression is that this patchset provides most of the scripting
> > and config side of the first step, so what we need is for RH to provide
> > an x86 runner machine and tell the gitlab CI it exists. I appreciate
> > that the whole coronavirus and working-from-home situation will have
> > upended everybody's plans, especially when actual hardware might
> > be involved, but how's it going ?
> >
>
> Hi Peter,
>
> You hit the nail on the head here. Our ability to move some machines
> from one lab to another (across the country) was indeed affected, but
> we're actively working on it.
>
> From now on, I'll give you an update every time a significant event
> occurs on our side.
>

Hi all,

It took a while, but I finally have some updates, and it's pretty good
news.

Red Hat is sponsoring 3 x86_64 machines that will act as runners for
the QEMU CI, and together with QEMU's own s390 and aarch64 machines (1
each), we now have enough hardware for reasonable build and test
coverage.

The s390 and aarch64 machines are already up and running, and the
x86_64 machines are being racked and should be up and running in the
next few days.

I'm working on an updated version of this series that takes this new
scenario into account, along with some fixes and improvements.
And as a reminder, if you (individual or organization) would like to
sponsor hardware or people to expand the QEMU build and test coverage,
please reach out to us.

Thanks,
- Cleber.

> > thanks
> > -- PMM
> >
>
> Thanks for checking in!
> - Cleber.