
[v6,4/4] Jobs based on custom runners: add job definitions for QEMU's machines

Message ID 20210608031425.833536-5-crosa@redhat.com (mailing list archive)
State New, archived
Series GitLab Custom Runners and Jobs (was: QEMU Gating CI)

Commit Message

Cleber Rosa June 8, 2021, 3:14 a.m. UTC
The QEMU project has two machines (aarch64 and s390x) that can be used
for jobs that build QEMU and run tests.  This introduces those jobs,
which map the custom scripts previously used for the same purpose.

Signed-off-by: Cleber Rosa <crosa@redhat.com>
---
 .gitlab-ci.d/custom-runners.yml | 208 ++++++++++++++++++++++++++++++++
 1 file changed, 208 insertions(+)

Comments

Philippe Mathieu-Daudé June 8, 2021, 6:29 a.m. UTC | #1
Hi Alex, Stefan,

On 6/8/21 5:14 AM, Cleber Rosa wrote:
> The QEMU project has two machines (aarch64 and s390x) that can be used
> for jobs that do build and run tests.

AFAIK there is more hardware available to the project, so I'm wondering
what happened to the rest: is it a deliberate choice to start small?
What will happen with the rest, since otherwise we are wasting resources?
Who has access to what, and who should do what (setup)? How is this list
of hw managed, btw? Should there be some public visibility (e.g. a Wiki)?

Is there a document explaining the steps to follow for an entity to
donate / sponsor hardware? Where would it be stored? Should this hw be
shipped somewhere? What documentation should be provided for its system
administration?

In case an entity manages hosting and maintenance, can the QEMU project
share the power bill? Up to which amount?
The same questions apply if a sysadmin has to be paid.

If the QEMU project can't spend money on CI, what is expected from
resource donors? Simply hardware + power (electricity) + network
traffic? Also sysadmin work and monitoring? Do we expect some kind of
reliability for the data stored there, or can we assume disposable /
transient runners?
Should donors also provide storage? Do we have a minimum storage
requirement?

Should we provide some guideline explaining that arbitrary code might
be run by our contributors on these runners, and that some security
measures have to be taken / considered?

Sponsors usually expect some advertising to thank them, and often
regular reports on how their resources are used, else they might not
renew their offer. Who should take care of the relationship with
sponsors?

Where is it defined what belongs to the project, who is responsible, who
can request access to this hardware, and what resources can be used?

More generically, what is the process for a private / corporate entity
to register a runner with the project? (How did it work for the aarch64
and s390x ones?)

What else am I missing?

Thanks,

Phil.

> This introduces those jobs,
> which are a mapping of custom scripts used for the same purpose.
> 
> Signed-off-by: Cleber Rosa <crosa@redhat.com>
> ---
>  .gitlab-ci.d/custom-runners.yml | 208 ++++++++++++++++++++++++++++++++
>  1 file changed, 208 insertions(+)
Cleber Rosa June 8, 2021, 1:36 p.m. UTC | #2
On Tue, Jun 8, 2021 at 2:30 AM Philippe Mathieu-Daudé <f4bug@amsat.org>
wrote:

> Hi Alex, Stefan,
>
> On 6/8/21 5:14 AM, Cleber Rosa wrote:
> > The QEMU project has two machines (aarch64 and s390x) that can be used
> > for jobs that do build and run tests.
>
> AFAIK there is more hardware available to the project, so I'm wondering
> what happened to the rest, is it a deliberate choice to start small?
>

Hi Phil,

Yes, this series was deliberately focused on the first two machines owned
by and available to the QEMU project.


> What will happen with the rest, since we are wasting resources?
>

The plan is to allow all machines (currently available and to-be-available)
to be connected as custom runners.  This series hopefully gets that started.

About "more hardware available to the project", there's one VM from
fosshost which was made available not long ago, and which I set up even
more recently, which could be used as a gitlab runner too.  But, even
though some new hardware resource is available (and wasted?), the human
resources to maintain them have not been properly determined, so I believe
it's a good decision to start with the machines that have been operational
for long, and that already have to the best of my knowledge, people
maintaining them.

I also see a "Debian unstable mips64el (Debian) @ cipunited.cn" registered
as a runner, but I don't have extra information about it.

Besides that, I'll shortly send another series that builds upon this one
and adds a Red Hat-focused job on a Red Hat-managed machine.  This should
be an example of what other entities are capable of doing and allowed to do.


> Who has access to what and should do what (setup)? How is this list of
> hw managed btw? Should there be some public visibility (i.e. Wiki)?
>

These are good questions, and I believe Alex can answer them about those
two machines.  Even though I hooked them up to GitLab, AFAICT he is the
ultimate admin (maybe Peter too?).

About hardware management, it has been suggested to use either the Wiki or
a MAINTAINERS entry.  This is still unresolved and feedback would be
appreciated.  For me to propose a MAINTAINERS entry, say, on a v7, I'd need
the information on who is managing them.


> Is there a document explaining what are the steps to follow for an
> entity to donate / sponsor hardware? Where would it be stored, should
> this hw be shipped somewhere? What documentation should be provided for
> its system administration?
>
> In case an entity manages hosting and maintenance, can the QEMU project
> share the power bill? Up to which amount?
> Similar question if a sysadmin has to be paid.
>
> If the QEMU project can't spend money on CI, what is expected from
> resource donators? Simply hardware + power (electricity) + network
> traffic? Also sysadmining and monitoring? Do we expect some kind of
> reliability on the data stored here or can we assume disposable /
> transient runners?
> Should donators also provide storage? Do we have a minimum storage
> requirement?
>
> Should we provide some guideline explaining any code might be run by
> our contributors on these runners and some security measures have to
> be taken / considered?
>
> Sponsors usually expect some advertising to thanks them, and often
> regular reports on how their resources are used, else they might not
> renew their offer. Who should care to keep the relationship with
> sponsors?
>
> Where is defined what belong to the project, who is responsible, who can
> request access to this hardware, what resource can be used?
>
>
You obviously directed these questions towards Alex and Stefan (rightfully
so), so I won't attempt to answer them at this point.


> More generically, what is the process for a private / corporate entity
> to register a runner to the project? (how did it work for this aarch64
> and s390x one?)
>

The steps are listed in the documentation.  Basically, anyone with knowledge
of the "registration token" can add new machines to GitLab as runners.  For
the two aarch64 and s390x machines, it was a matter of following the
documentation steps, which basically involve:

1) providing the hostname(s) in the inventory file
2) providing the "registration token" in the vars.yml file
3) running the playbooks
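
For illustration only, the files mentioned above might look roughly like
this; the hostname and the token value are placeholders, and the variable
name and the gitlab-runner.yml playbook name are assumptions rather than a
statement of what scripts/ci/setup actually ships:

  # inventory: one runner hostname per line (placeholder host)
  s390x-ci-01.example.org

  # vars.yml: the registration token obtained from the GitLab project
  gitlab_runner_registration_token: REPLACE_WITH_REGISTRATION_TOKEN

  # then run the playbooks from scripts/ci/setup, e.g.:
  #   ansible-playbook -i inventory build-environment.yml
  #   ansible-playbook -i inventory gitlab-runner.yml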


>
> What else am I missing?
>
>
I think you're missing the answers to all your good questions :).

And I understand there are a lot of them (from everyone, including myself).
The dilemma here is: should we activate the machines already available, and
learn in practice what's missing?  I honestly believe we should.

Thanks,
- Cleber.


> Thanks,
>
> Phil.
>
> > This introduces those jobs,
> > which are a mapping of custom scripts used for the same purpose.
> >
> > Signed-off-by: Cleber Rosa <crosa@redhat.com>
> > ---
> >  .gitlab-ci.d/custom-runners.yml | 208 ++++++++++++++++++++++++++++++++
> >  1 file changed, 208 insertions(+)
>
>
Wainer dos Santos Moschetta June 8, 2021, 6:27 p.m. UTC | #3
Hi,

On 6/8/21 12:14 AM, Cleber Rosa wrote:
> The QEMU project has two machines (aarch64 and s390x) that can be used
> for jobs that do build and run tests.  This introduces those jobs,
> which are a mapping of custom scripts used for the same purpose.
>
> Signed-off-by: Cleber Rosa <crosa@redhat.com>
> ---
>   .gitlab-ci.d/custom-runners.yml | 208 ++++++++++++++++++++++++++++++++
>   1 file changed, 208 insertions(+)
>
> diff --git a/.gitlab-ci.d/custom-runners.yml b/.gitlab-ci.d/custom-runners.yml
> index a07b27384c..061d3cdfed 100644
> --- a/.gitlab-ci.d/custom-runners.yml
> +++ b/.gitlab-ci.d/custom-runners.yml
> @@ -12,3 +12,211 @@
>   # guarantees a fresh repository on each job run.
>   variables:
>     GIT_STRATEGY: clone
> +
> +# All ubuntu-18.04 jobs should run successfully in an environment
> +# setup by the scripts/ci/setup/build-environment.yml task
> +# "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
> +ubuntu-18.04-s390x-all-linux-static:
> + allow_failure: true
> + needs: []
> + stage: build
> + tags:
> + - ubuntu_18.04
> + - s390x
> + rules:
> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'

Should it restrict the job to pushes to qemu-project only? If so, then
it probably needs the statement:

'$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'

If you change that here, you will end up changing it all over the jobs. In
general, there is a lot of boilerplate in this file. I'm OK with merging it
as is, as long as it is followed by another series to refactor the code.
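
For illustration, the restricted rule would then read (a sketch, reusing
the exact condition quoted above):

  rules:
  - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'

and, as one possible way to reduce the boilerplate later, the common keys
could move into a hidden template that every job extends (again only a
sketch; the template name is made up):

  .custom_runner_job:
    allow_failure: true
    needs: []
    stage: build
    rules:
    - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'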

> + script:
> + # --disable-libssh is needed because of https://bugs.launchpad.net/qemu/+bug/1838763
> + # --disable-glusterfs is needed because there's no static version of those libs in distro supplied packages
> + - mkdir build
> + - cd build
> + - ../configure --enable-debug --static --disable-system --disable-glusterfs --disable-libssh
> + - make --output-sync -j`nproc`
> + - make --output-sync -j`nproc` check V=1
> + - make --output-sync -j`nproc` check-tcg V=1
> +
> +ubuntu-18.04-s390x-all:
> + allow_failure: true
> + needs: []
> + stage: build
> + tags:
> + - ubuntu_18.04
> + - s390x
> + rules:
> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> + script:
> + - mkdir build
> + - cd build
> + - ../configure --disable-libssh
> + - make --output-sync -j`nproc`
> + - make --output-sync -j`nproc` check V=1
> +
> +ubuntu-18.04-s390x-alldbg:
Maybe we don't need both ubuntu-18.04-s390x-all and 
ubuntu-18.04-s390x-alldbg jobs.
> + allow_failure: true
> + needs: []
> + stage: build
> + tags:
> + - ubuntu_18.04
> + - s390x
> + rules:
> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> + script:
> + - mkdir build
> + - cd build
> + - ../configure --enable-debug --disable-libssh
> + - make clean
> + - make --output-sync -j`nproc`
> + - make --output-sync -j`nproc` check V=1
> +ubuntu-18.04-s390x-clang:
> + allow_failure: true
> + needs: []
> + stage: build
> + tags:
> + - ubuntu_18.04
> + - s390x
> + rules:
> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> +   when: manual
> + script:
> + - mkdir build
> + - cd build
> + - ../configure --disable-libssh --cc=clang --cxx=clang++ --enable-sanitizers
> + - make --output-sync -j`nproc`
> + - make --output-sync -j`nproc` check V=1
> +
> +ubuntu-18.04-s390x-tci:
> + allow_failure: true
> + needs: []
> + stage: build
> + tags:
> + - ubuntu_18.04
> + - s390x
> + rules:
> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> + script:
> + - mkdir build
> + - cd build
> + - ../configure --disable-libssh --enable-tcg-interpreter
> + - make --output-sync -j`nproc`
I think it needs to `make check-tcg` at least. See "build-tci" in 
`.gitlab-ci.d/buildtest.yml` for other tests being executed on shared 
runners.
> +
> +ubuntu-18.04-s390x-notcg:
The "build-tcg-disabled" in `.gitlab-ci.d/buildtest.yml` could be 
mimic-ed here too.
> + allow_failure: true
> + needs: []
> + stage: build
> + tags:
> + - ubuntu_18.04
> + - s390x
> + rules:
> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> +   when: manual
> + script:
> + - mkdir build
> + - cd build
> + - ../configure --disable-libssh --disable-tcg
> + - make --output-sync -j`nproc`
> + - make --output-sync -j`nproc` check V=1
> +
> +# All ubuntu-20.04 jobs should run successfully in an environment
> +# setup by the scripts/ci/setup/qemu/build-environment.yml task
> +# "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
> +ubuntu-20.04-aarch64-all-linux-static:
> + allow_failure: true
> + needs: []
> + stage: build
> + tags:
> + - ubuntu_20.04
> + - aarch64
> + rules:
> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> + script:
> + # --disable-libssh is needed because of https://bugs.launchpad.net/qemu/+bug/1838763
> + # --disable-glusterfs is needed because there's no static version of those libs in distro supplied packages
> + - mkdir build
> + - cd build
> + - ../configure --enable-debug --static --disable-system --disable-glusterfs --disable-libssh
> + - make --output-sync -j`nproc`
> + - make --output-sync -j`nproc` check V=1
> + - make --output-sync -j`nproc` check-tcg V=1
> +
> +ubuntu-20.04-aarch64-all:
> + allow_failure: true
> + needs: []
> + stage: build
> + tags:
> + - ubuntu_20.04
> + - aarch64
> + rules:
> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> + script:
> + - mkdir build
> + - cd build
> + - ../configure --disable-libssh
> + - make --output-sync -j`nproc`
> + - make --output-sync -j`nproc` check V=1
> +
> +ubuntu-20.04-aarch64-alldbg:
> + allow_failure: true
> + needs: []
> + stage: build
> + tags:
> + - ubuntu_20.04
> + - aarch64
> + rules:
> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> + script:
> + - mkdir build
> + - cd build
> + - ../configure --enable-debug --disable-libssh
> + - make clean
> + - make --output-sync -j`nproc`
> + - make --output-sync -j`nproc` check V=1
> +
> +ubuntu-20.04-aarch64-clang:
> + allow_failure: true
> + needs: []
> + stage: build
> + tags:
> + - ubuntu_20.04
> + - aarch64
> + rules:
> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> +   when: manual
> + script:
> + - mkdir build
> + - cd build
> + - ../configure --disable-libssh --cc=clang-10 --cxx=clang++-10 --enable-sanitizers
> + - make --output-sync -j`nproc`
> + - make --output-sync -j`nproc` check V=1
> +
> +ubuntu-20.04-aarch64-tci:
> + allow_failure: true
> + needs: []
> + stage: build
> + tags:
> + - ubuntu_20.04
> + - aarch64
> + rules:
> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> + script:
> + - mkdir build
> + - cd build
> + - ../configure --disable-libssh --enable-tcg-interpreter
> + - make --output-sync -j`nproc`
> +
> +ubuntu-20.04-aarch64-notcg:
> + allow_failure: true
> + needs: []
> + stage: build
> + tags:
> + - ubuntu_20.04
> + - aarch64
> + rules:
> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> +   when: manual
> + script:
> + - mkdir build
> + - cd build
> + - ../configure --disable-libssh --disable-tcg
> + - make --output-sync -j`nproc`
> + - make --output-sync -j`nproc` check V=1
Wainer dos Santos Moschetta June 8, 2021, 7:07 p.m. UTC | #4
Hi,

On 6/8/21 10:36 AM, Cleber Rosa Junior wrote:
>
>
> On Tue, Jun 8, 2021 at 2:30 AM Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>
>     Hi Alex, Stefan,
>
>     On 6/8/21 5:14 AM, Cleber Rosa wrote:
>     > The QEMU project has two machines (aarch64 and s390x) that can
>     be used
>     > for jobs that do build and run tests.
>
>     AFAIK there is more hardware available to the project, so I'm
>     wondering
>     what happened to the rest, is it a deliberate choice to start small?
>
>
> Hi Phil,
>
> Yes, this series was deliberately focused on the first two machines 
> owned and available to QEMU.
>
>     What will happen with the rest, since we are wasting resources?
>
>
> The plan is to allow all machines (currently available and to-be 
> available) to be connected as custom runners. This hopefully gets that 
> started.
>
> About "more hardware available to the project", there's one VM from 
> fosshost which was made available not long ago, and which I set up 
> even more recently, which could be used as a gitlab runner too.  But, 
> even though some new hardware resource is available (and wasted?), the 
> human resources to maintain them have not been properly determined, so 
> I believe it's a good decision to start with the machines that have 
> been operational for long, and that already have to the best of my 
> knowledge, people maintaining them.
>
> I also see a "Debian unstable mips64el (Debian) @ cipunited.cn" registered
> as a runner, but I don't have extra information about it.
>
> Besides that, I'll send another series shortly, that builds upon this 
> series, and adds a Red Hat focused job, on a Red Hat managed machine.  
> This should be what other entities should be capable of doing and 
> allowed to do.
>
>     Who has access to what and should do what (setup)? How is this list of
>     hw managed btw? Should there be some public visibility (i.e. Wiki)?
>
>
> These are good questions, and I believe Alex can answer them about 
> those two machines.  Even though I hooked them up to GitLab, AFAICT he 
> is the ultimate admin (maybe Peter too?).
>
> About hardware management, it has been suggested to use either the 
> Wiki or a MAINTAINERS entry.  This is still unresolved and feedback 
> would be appreciated.  For me to propose a MAINTAINERS entry, say, on 
> a v7, I'd need the information on who is managing them.
>
>
>     Is there a document explaining what are the steps to follow for an
>     entity to donate / sponsor hardware? Where would it be stored, should
>     this hw be shipped somewhere? What documentation should be
>     provided for
>     its system administration?
>
>     In case an entity manages hosting and maintenance, can the QEMU
>     project
>     share the power bill? Up to which amount?
>     Similar question if a sysadmin has to be paid.
>
>     If the QEMU project can't spend money on CI, what is expected from
>     resource donators? Simply hardware + power (electricity) + network
>     traffic? Also sysadmining and monitoring? Do we expect some kind of
>     reliability on the data stored here or can we assume disposable /
>     transient runners?
>     Should donators also provide storage? Do we have a minimum storage
>     requirement?
>
>     Should we provide some guideline explaining any code might be run by
>     our contributors on these runners and some security measures have to
>     be taken / considered?
>
>     Sponsors usually expect some advertising to thanks them, and often
>     regular reports on how their resources are used, else they might not
>     renew their offer. Who should care to keep the relationship with
>     sponsors?
>
>     Where is defined what belong to the project, who is responsible,
>     who can
>     request access to this hardware, what resource can be used?
>
>
> You obviously directed the question towards Alex and Stefan 
> (rightfully so), so I won't attempt to answer these ones at this point.
>
>     More generically, what is the process for a private / corporate entity
>     to register a runner to the project? (how did it work for this aarch64
>     and s390x one?)
>
>
> The steps are listed on the documentation.  Basically anyone with 
> knowledge of the "registration token" can add new machines to GitLab 
> as runners.  For the two aarch64 and s390x, it was a matter of 
> following the documentation steps which basically involve:
>
> 1) providing the hostname(s) in the inventory file
> 2) provide the "registration token" in the vars.yml file
> 3) running the playbooks
>
>
>     What else am I missing?
>
>
> I think you're missing the answers to all your good questions :).
>
> And I understand that are a lot of them (from everyone, including 
> myself).  The dilemma here is: should we activate the machines already 
> available, and learn in practice, what's missing?  I honestly believe 
> we should.


IMHO we should merge the minimum possible to start using the existing 
machines, then address the questions (good questions, btw) raised by 
Philippe as needed.

Thanks!

- Wainer

>
> Thanks,
> - Clr.
>
>     Thanks,
>
>     Phil.
>
>     > This introduces those jobs,
>     > which are a mapping of custom scripts used for the same purpose.
>     >
>     > Signed-off-by: Cleber Rosa <crosa@redhat.com>
>     > ---
>     >  .gitlab-ci.d/custom-runners.yml | 208
>     ++++++++++++++++++++++++++++++++
>     >  1 file changed, 208 insertions(+)
>
Stefan Hajnoczi June 9, 2021, 2:22 p.m. UTC | #5
On Tue, Jun 08, 2021 at 08:29:53AM +0200, Philippe Mathieu-Daudé wrote:
> On 6/8/21 5:14 AM, Cleber Rosa wrote:
> Sponsors usually expect some advertising to thanks them, and often
> regular reports on how their resources are used, else they might not
> renew their offer. Who should care to keep the relationship with
> sponsors?

Most sponsors I've encountered did not ask for advertising. Either they
have an interest in test coverage because they ship QEMU to customers
or they provide resources to open source projects and leave all the
detail up to us (they don't need test reports).

But in any case it's easy to have a page on the wiki or website listing
sponsors.

There needs to be a point of contact. Someone who answers questions and
coordinates with sponsors.

Stefan
Stefan Hajnoczi June 9, 2021, 2:54 p.m. UTC | #6
On Tue, Jun 08, 2021 at 09:36:37AM -0400, Cleber Rosa Junior wrote:
> On Tue, Jun 8, 2021 at 2:30 AM Philippe Mathieu-Daudé <f4bug@amsat.org>
> wrote:

Here are my thoughts. It's just my opinion based on experience running
the old QEMU Buildbot infrastructure and interacting with our hosting
providers. I'm not planning to actively work on the CI infrastructure,
so if you have a strong opinion and want to do things differently
yourself, feel free.

> > Who has access to what and should do what (setup)? How is this list of
> > hw managed btw? Should there be some public visibility (i.e. Wiki)?
> >
> 
> These are good questions, and I believe Alex can answer them about those
> two machines.  Even though I hooked them up to GitLab, AFAICT he is the
> ultimate admin (maybe Peter too?).
> 
> About hardware management, it has been suggested to use either the Wiki or
> a MAINTAINERS entry.  This is still unresolved and feedback would be
> appreciated.  For me to propose a MAINTAINERS entry, say, on a v7, I'd need
> the information on who is managing them.

Here is the wiki page that lists project resources (machines, etc):
https://wiki.qemu.org/AdminContacts

We can continue to use this page.

> > Is there a document explaining what are the steps to follow for an
> > entity to donate / sponsor hardware? Where would it be stored, should
> > this hw be shipped somewhere? What documentation should be provided for
> > its system administration?

A document is needed that explains the process and the
roles/responsibilities of the people involved.

QEMU doesn't have a physical presence and we currently don't have a way
to host physical machines. We also probably shouldn't get involved in
that because it has a high overhead and puts the responsibility on the
project to maintain the hardware. There are hosting providers like
OSUOSL that offer non-x86 architectures, so I don't think we need to
deal with physical hardware even for other architectures. If someone
needs their special hardware covered, let them act as the sponsor and
sysadmin for that machine - they'll need to figure out where to host it.

> > In case an entity manages hosting and maintenance, can the QEMU project
> > share the power bill? Up to which amount?
> > Similar question if a sysadmin has to be paid.

No, it's too complicated. QEMU does not have regular income that could
be used for periodic expenses. We shouldn't spend time worrying about
this unless there is a real need and plenty of funding.

> > If the QEMU project can't spend money on CI, what is expected from
> > resource donators? Simply hardware + power (electricity) + network
> > traffic? Also sysadmining and monitoring? Do we expect some kind of
> > reliability on the data stored here or can we assume disposable /
> > transient runners?

Sponsors provide access to internet-connected machines. They can
designate a QEMU community volunteer to admin machines or they can admin
the machine themselves.

Sysadmins deal with keeping the machine online (security, network
traffic, monitoring, etc).

For simplicity it's best if sponsored machines are owned and paid for by
the sponsor. Billing electricity, bandwidth, etc to QEMU will make
things complicated and we don't have the admin infrastructure to support
that.

> > Should donators also provide storage? Do we have a minimum storage
> > requirement?

Sponsors provide storage. There is a minimum storage requirement that
depends on the nature of CI jobs (I don't know what the exact amount
is and it may change over time).

> > Where is defined what belong to the project, who is responsible, who can
> > request access to this hardware, what resource can be used?

Machines belong to their sponsors, not to the QEMU project. They could
go away in the future if the sponsor decides to withdraw them.

Only the sysadmin has ssh access to the machine. The CI system provides
access to logs so that ssh access to machines is not necessary for QEMU
developers. If ssh access is needed then the developer can ask the
sysadmin for help.

Regarding resource usage, that's up to the sysadmin. If they want to
apply resource limits to the CI environment they need to configure that.

Stefan
Stefan Hajnoczi June 9, 2021, 3:09 p.m. UTC | #7
On Tue, Jun 08, 2021 at 04:07:57PM -0300, Wainer dos Santos Moschetta wrote:
> > And I understand that are a lot of them (from everyone, including
> > myself).  The dilemma here is: should we activate the machines already
> > available, and learn in practice, what's missing?  I honestly believe we
> > should.
> 
> 
> IMHO we should merge the minimum possible to start using the existing
> machines, then address the questions (good questions, btw) raised by
> Philippe as needed.

That sounds good.

Does anyone want to volunteer to be the QEMU CI runners point of contact
who is responsible for defining the process? Cleber's last email left a
lot for Alex and me to define, but I would prefer it if someone who is
not me takes this on since I'm already spread thin. Alex? Philippe? Cleber?
Wainer?

Stefan
Alex Bennée June 9, 2021, 3:53 p.m. UTC | #8
Wainer dos Santos Moschetta <wainersm@redhat.com> writes:

> Hi,
>
> On 6/8/21 12:14 AM, Cleber Rosa wrote:
>> The QEMU project has two machines (aarch64 and s390x) that can be used
>> for jobs that do build and run tests.  This introduces those jobs,
>> which are a mapping of custom scripts used for the same purpose.
>>
>> Signed-off-by: Cleber Rosa <crosa@redhat.com>
>> ---
>>   .gitlab-ci.d/custom-runners.yml | 208 ++++++++++++++++++++++++++++++++
>>   1 file changed, 208 insertions(+)
>>
>> diff --git a/.gitlab-ci.d/custom-runners.yml b/.gitlab-ci.d/custom-runners.yml
>> index a07b27384c..061d3cdfed 100644
>> --- a/.gitlab-ci.d/custom-runners.yml
>> +++ b/.gitlab-ci.d/custom-runners.yml
>> @@ -12,3 +12,211 @@
>>   # guarantees a fresh repository on each job run.
>>   variables:
>>     GIT_STRATEGY: clone
>> +
>> +# All ubuntu-18.04 jobs should run successfully in an environment
>> +# setup by the scripts/ci/setup/build-environment.yml task
>> +# "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
>> +ubuntu-18.04-s390x-all-linux-static:
>> + allow_failure: true
>> + needs: []
>> + stage: build
>> + tags:
>> + - ubuntu_18.04
>> + - s390x
>> + rules:
>> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>
> Should it restrict the job for pushes to qemu-project only? If yes,
> then it probably needs the statement:
>
> '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
>
> If you change that here, you will end it changing all over the jobs.
> In general, there are many boilerplates in this file. I'm ok to merge
> it as is as long as it is followed by another series to refactor the
> code.
>
>> + script:
>> + # --disable-libssh is needed because of https://bugs.launchpad.net/qemu/+bug/1838763
>> + # --disable-glusterfs is needed because there's no static version of those libs in distro supplied packages
>> + - mkdir build
>> + - cd build
>> + - ../configure --enable-debug --static --disable-system --disable-glusterfs --disable-libssh
>> + - make --output-sync -j`nproc`
>> + - make --output-sync -j`nproc` check V=1
>> + - make --output-sync -j`nproc` check-tcg V=1
>> +
>> +ubuntu-18.04-s390x-all:
>> + allow_failure: true
>> + needs: []
>> + stage: build
>> + tags:
>> + - ubuntu_18.04
>> + - s390x
>> + rules:
>> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>> + script:
>> + - mkdir build
>> + - cd build
>> + - ../configure --disable-libssh
>> + - make --output-sync -j`nproc`
>> + - make --output-sync -j`nproc` check V=1
>> +
>> +ubuntu-18.04-s390x-alldbg:
> Maybe we don't need both ubuntu-18.04-s390x-all and
> ubuntu-18.04-s390x-alldbg jobs.
>> + allow_failure: true
>> + needs: []
>> + stage: build
>> + tags:
>> + - ubuntu_18.04
>> + - s390x
>> + rules:
>> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>> + script:
>> + - mkdir build
>> + - cd build
>> + - ../configure --enable-debug --disable-libssh
>> + - make clean
>> + - make --output-sync -j`nproc`
>> + - make --output-sync -j`nproc` check V=1
>> +ubuntu-18.04-s390x-clang:
>> + allow_failure: true
>> + needs: []
>> + stage: build
>> + tags:
>> + - ubuntu_18.04
>> + - s390x
>> + rules:
>> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>> +   when: manual
>> + script:
>> + - mkdir build
>> + - cd build
>> + - ../configure --disable-libssh --cc=clang --cxx=clang++ --enable-sanitizers
>> + - make --output-sync -j`nproc`
>> + - make --output-sync -j`nproc` check V=1
>> +
>> +ubuntu-18.04-s390x-tci:
>> + allow_failure: true
>> + needs: []
>> + stage: build
>> + tags:
>> + - ubuntu_18.04
>> + - s390x
>> + rules:
>> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>> + script:
>> + - mkdir build
>> + - cd build
>> + - ../configure --disable-libssh --enable-tcg-interpreter
>> + - make --output-sync -j`nproc`
> I think it needs to `make check-tcg` at least. See "build-tci" in
> `.gitlab-ci.d/buildtest.yml` for other tests being executed on shared
> runners.

To get anything other than the s390x-linux-user tests we will need the
cross compilers installed. Currently we don't really use docker for
anything other than x86_64 hosts (and some aarch64 which I've tested).

>> +
>> +ubuntu-18.04-s390x-notcg:
> The "build-tcg-disabled" in `.gitlab-ci.d/buildtest.yml` could be
> mimic-ed here too.
>> + allow_failure: true
>> + needs: []
>> + stage: build
>> + tags:
>> + - ubuntu_18.04
>> + - s390x
>> + rules:
>> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>> +   when: manual
>> + script:
>> + - mkdir build
>> + - cd build
>> + - ../configure --disable-libssh --disable-tcg
>> + - make --output-sync -j`nproc`
>> + - make --output-sync -j`nproc` check V=1
>> +
>> +# All ubuntu-20.04 jobs should run successfully in an environment
>> +# setup by the scripts/ci/setup/qemu/build-environment.yml task
>> +# "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
>> +ubuntu-20.04-aarch64-all-linux-static:
>> + allow_failure: true
>> + needs: []
>> + stage: build
>> + tags:
>> + - ubuntu_20.04
>> + - aarch64
>> + rules:
>> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>> + script:
>> + # --disable-libssh is needed because of https://bugs.launchpad.net/qemu/+bug/1838763
>> + # --disable-glusterfs is needed because there's no static version of those libs in distro supplied packages
>> + - mkdir build
>> + - cd build
>> + - ../configure --enable-debug --static --disable-system --disable-glusterfs --disable-libssh
>> + - make --output-sync -j`nproc`
>> + - make --output-sync -j`nproc` check V=1
>> + - make --output-sync -j`nproc` check-tcg V=1
>> +
>> +ubuntu-20.04-aarch64-all:
>> + allow_failure: true
>> + needs: []
>> + stage: build
>> + tags:
>> + - ubuntu_20.04
>> + - aarch64
>> + rules:
>> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>> + script:
>> + - mkdir build
>> + - cd build
>> + - ../configure --disable-libssh
>> + - make --output-sync -j`nproc`
>> + - make --output-sync -j`nproc` check V=1
>> +
>> +ubuntu-20.04-aarch64-alldbg:
>> + allow_failure: true
>> + needs: []
>> + stage: build
>> + tags:
>> + - ubuntu_20.04
>> + - aarch64
>> + rules:
>> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>> + script:
>> + - mkdir build
>> + - cd build
>> + - ../configure --enable-debug --disable-libssh
>> + - make clean
>> + - make --output-sync -j`nproc`
>> + - make --output-sync -j`nproc` check V=1
>> +
>> +ubuntu-20.04-aarch64-clang:
>> + allow_failure: true
>> + needs: []
>> + stage: build
>> + tags:
>> + - ubuntu_20.04
>> + - aarch64
>> + rules:
>> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>> +   when: manual
>> + script:
>> + - mkdir build
>> + - cd build
>> + - ../configure --disable-libssh --cc=clang-10 --cxx=clang++-10 --enable-sanitizers
>> + - make --output-sync -j`nproc`
>> + - make --output-sync -j`nproc` check V=1
>> +
>> +ubuntu-20.04-aarch64-tci:
>> + allow_failure: true
>> + needs: []
>> + stage: build
>> + tags:
>> + - ubuntu_20.04
>> + - aarch64
>> + rules:
>> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>> + script:
>> + - mkdir build
>> + - cd build
>> + - ../configure --disable-libssh --enable-tcg-interpreter
>> + - make --output-sync -j`nproc`
>> +
>> +ubuntu-20.04-aarch64-notcg:
>> + allow_failure: true
>> + needs: []
>> + stage: build
>> + tags:
>> + - ubuntu_20.04
>> + - aarch64
>> + rules:
>> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>> +   when: manual
>> + script:
>> + - mkdir build
>> + - cd build
>> + - ../configure --disable-libssh --disable-tcg
>> + - make --output-sync -j`nproc`
>> + - make --output-sync -j`nproc` check V=1
Willian Rampazzo June 9, 2021, 6:56 p.m. UTC | #9
On Tue, Jun 8, 2021 at 12:14 AM Cleber Rosa <crosa@redhat.com> wrote:
>
> The QEMU project has two machines (aarch64 and s390x) that can be used
> for jobs that do build and run tests.  This introduces those jobs,
> which are a mapping of custom scripts used for the same purpose.
>
> Signed-off-by: Cleber Rosa <crosa@redhat.com>
> ---
>  .gitlab-ci.d/custom-runners.yml | 208 ++++++++++++++++++++++++++++++++
>  1 file changed, 208 insertions(+)
>

Based on the comment from the cover letter that these jobs are defined
as trying to mimic what Peter runs on staging, the code looks good to
me, so:

Reviewed-by: Willian Rampazzo <willianr@redhat.com>
Thomas Huth June 10, 2021, 6:18 a.m. UTC | #10
On 08/06/2021 05.14, Cleber Rosa wrote:
> The QEMU project has two machines (aarch64 and s390x) that can be used
> for jobs that do build and run tests.  This introduces those jobs,
> which are a mapping of custom scripts used for the same purpose.
> 
> Signed-off-by: Cleber Rosa <crosa@redhat.com>
> ---
>   .gitlab-ci.d/custom-runners.yml | 208 ++++++++++++++++++++++++++++++++
>   1 file changed, 208 insertions(+)
> 
> diff --git a/.gitlab-ci.d/custom-runners.yml b/.gitlab-ci.d/custom-runners.yml
> index a07b27384c..061d3cdfed 100644
> --- a/.gitlab-ci.d/custom-runners.yml
> +++ b/.gitlab-ci.d/custom-runners.yml
> @@ -12,3 +12,211 @@
>   # guarantees a fresh repository on each job run.
>   variables:
>     GIT_STRATEGY: clone
> +
> +# All ubuntu-18.04 jobs should run successfully in an environment
> +# setup by the scripts/ci/setup/build-environment.yml task
> +# "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
> +ubuntu-18.04-s390x-all-linux-static:
> + allow_failure: true
> + needs: []
> + stage: build
> + tags:
> + - ubuntu_18.04
> + - s390x
> + rules:
> + - if: '$CI_COMMIT_BRANCH =~ /^staging/'

I don't think this will work very well... sub-maintainers might want to push 
to a "staging" branch in their forked repositories, and without the s390x 
runner, the pipeline gets stuck now:

  https://gitlab.com/thuth/qemu/-/pipelines/317812558

We had the same issue in the kvm-unit-test CI, and we solved it there by
making the job depend instead on an environment variable that has to be set
if the runner is available:

  only:
    variables:
     - $S390X_RUNNER_AVAILABLE

I think that's also nicer in case someone brings their own s390x runner and
wants to use the CI tests on branches other than staging.
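
In the `rules:` syntax used by this patch, that would be roughly equivalent
to (a sketch only):

  rules:
  - if: '$S390X_RUNNER_AVAILABLE'

which a fork owner could then enable by setting that variable in their
project's CI settings.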

Could you please change your patch accordingly?

  Thanks,
   Thomas
Alex Bennée June 11, 2021, 11 a.m. UTC | #11
Cleber Rosa Junior <crosa@redhat.com> writes:

> On Tue, Jun 8, 2021 at 2:30 AM Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>
>  Hi Alex, Stefan,
>
>  On 6/8/21 5:14 AM, Cleber Rosa wrote:
>  > The QEMU project has two machines (aarch64 and s390x) that can be used
>  > for jobs that do build and run tests.
>
<snip>
>  
>  Who has access to what and should do what (setup)? How is this list of
>  hw managed btw? Should there be some public visibility (i.e. Wiki)?
>
> These are good questions, and I believe Alex can answer them about those two machines.  Even though I hooked them up to GitLab,
> AFAICT he is the ultimate admin (maybe Peter too?).
>
> About hardware management, it has been suggested to use either the Wiki or a MAINTAINERS entry.  This is still unresolved and feedback
> would be appreciated.  For me to propose a MAINTAINERS entry, say, on
> a v7, I'd need the information on who is managing them.

I can only talk about aarch64.ci.qemu.org which is a donated Equinix
machine that comes from the WorksOnArm initiative. I applied for it on
behalf of the QEMU project and we can have it for as long as it's
useful.

I don't know if we need anything more than documenting the nominal
contacts in:

  https://wiki.qemu.org/AdminContacts

>  Is there a document explaining what are the steps to follow for an
>  entity to donate / sponsor hardware? Where would it be stored, should
>  this hw be shipped somewhere? What documentation should be provided for
>  its system administration?

I think the project can only really work with donations out of someone's
data centre, where they keep responsibility for the physical aspects of
any machines, including the ongoing hosting and running costs.
Cleber Rosa June 30, 2021, 12:30 a.m. UTC | #12
On Tue, Jun 8, 2021 at 2:27 PM Wainer dos Santos Moschetta
<wainersm@redhat.com> wrote:
>
> Hi,
>
> On 6/8/21 12:14 AM, Cleber Rosa wrote:
> > The QEMU project has two machines (aarch64 and s390x) that can be used
> > for jobs that do build and run tests.  This introduces those jobs,
> > which are a mapping of custom scripts used for the same purpose.
> >
> > Signed-off-by: Cleber Rosa <crosa@redhat.com>
> > ---
> >   .gitlab-ci.d/custom-runners.yml | 208 ++++++++++++++++++++++++++++++++
> >   1 file changed, 208 insertions(+)
> >
> > diff --git a/.gitlab-ci.d/custom-runners.yml b/.gitlab-ci.d/custom-runners.yml
> > index a07b27384c..061d3cdfed 100644
> > --- a/.gitlab-ci.d/custom-runners.yml
> > +++ b/.gitlab-ci.d/custom-runners.yml
> > @@ -12,3 +12,211 @@
> >   # guarantees a fresh repository on each job run.
> >   variables:
> >     GIT_STRATEGY: clone
> > +
> > +# All ubuntu-18.04 jobs should run successfully in an environment
> > +# setup by the scripts/ci/setup/build-environment.yml task
> > +# "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
> > +ubuntu-18.04-s390x-all-linux-static:
> > + allow_failure: true
> > + needs: []
> > + stage: build
> > + tags:
> > + - ubuntu_18.04
> > + - s390x
> > + rules:
> > + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>
> Should it restrict the job for pushes to qemu-project only? If yes, then
> it probably needs the statement:
>
> '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
>

I'm not sure we should.  In theory, if people have access to other
machines on their own accounts, they should be able to trigger the
same jobs just by using the "staging" prefix.

> If you change that here, you will end it changing all over the jobs. In
> general, there are many boilerplates in this file. I'm ok to merge it as
> is as long as it is followed by another series to refactor the code.
>

Absolutely.  As mentioned before, this is a straightforward mapping of
Peter's jobs, so I don't want to add too many indirection and
abstraction levels initially.

> > + script:
> > + # --disable-libssh is needed because of https://bugs.launchpad.net/qemu/+bug/1838763
> > + # --disable-glusterfs is needed because there's no static version of those libs in distro supplied packages
> > + - mkdir build
> > + - cd build
> > + - ../configure --enable-debug --static --disable-system --disable-glusterfs --disable-libssh
> > + - make --output-sync -j`nproc`
> > + - make --output-sync -j`nproc` check V=1
> > + - make --output-sync -j`nproc` check-tcg V=1
> > +
> > +ubuntu-18.04-s390x-all:
> > + allow_failure: true
> > + needs: []
> > + stage: build
> > + tags:
> > + - ubuntu_18.04
> > + - s390x
> > + rules:
> > + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> > + script:
> > + - mkdir build
> > + - cd build
> > + - ../configure --disable-libssh
> > + - make --output-sync -j`nproc`
> > + - make --output-sync -j`nproc` check V=1
> > +
> > +ubuntu-18.04-s390x-alldbg:
> Maybe we don't need both ubuntu-18.04-s390x-all and
> ubuntu-18.04-s390x-alldbg jobs.
> > + allow_failure: true
> > + needs: []
> > + stage: build
> > + tags:
> > + - ubuntu_18.04
> > + - s390x
> > + rules:
> > + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> > + script:
> > + - mkdir build
> > + - cd build
> > + - ../configure --enable-debug --disable-libssh
> > + - make clean
> > + - make --output-sync -j`nproc`
> > + - make --output-sync -j`nproc` check V=1
> > +ubuntu-18.04-s390x-clang:
> > + allow_failure: true
> > + needs: []
> > + stage: build
> > + tags:
> > + - ubuntu_18.04
> > + - s390x
> > + rules:
> > + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> > +   when: manual
> > + script:
> > + - mkdir build
> > + - cd build
> > + - ../configure --disable-libssh --cc=clang --cxx=clang++ --enable-sanitizers
> > + - make --output-sync -j`nproc`
> > + - make --output-sync -j`nproc` check V=1
> > +
> > +ubuntu-18.04-s390x-tci:
> > + allow_failure: true
> > + needs: []
> > + stage: build
> > + tags:
> > + - ubuntu_18.04
> > + - s390x
> > + rules:
> > + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> > + script:
> > + - mkdir build
> > + - cd build
> > + - ../configure --disable-libssh --enable-tcg-interpreter
> > + - make --output-sync -j`nproc`
> I think it needs to `make check-tcg` at least. See "build-tci" in
> `.gitlab-ci.d/buildtest.yml` for other tests being executed on shared
> runners.
> > +
> > +ubuntu-18.04-s390x-notcg:
> The "build-tcg-disabled" in `.gitlab-ci.d/buildtest.yml` could be
> mimic-ed here too.

These are straightforward mappings of Peter's jobs... I honestly don't
think we should deviate any further at this time.  Let me know if you
think I'm missing something though.

Thanks for the review!
- Cleber.
Cleber Rosa June 30, 2021, 12:40 a.m. UTC | #13
On Wed, Jun 9, 2021 at 10:55 AM Stefan Hajnoczi <stefanha@gmail.com> wrote:
>
> On Tue, Jun 08, 2021 at 09:36:37AM -0400, Cleber Rosa Junior wrote:
> > On Tue, Jun 8, 2021 at 2:30 AM Philippe Mathieu-Daudé <f4bug@amsat.org>
> > wrote:
>
> Here are my thoughts. It's just my opinion based on experience running
> the old QEMU Buildbot infrastructure and interacting with our hosting
> providers. I'm not planning to actively work on the CI infrastructure,
> so if you have a strong opinion and want to do things differently
> yourself, feel free.
>
> > > Who has access to what and should do what (setup)? How is this list of
> > > hw managed btw? Should there be some public visibility (i.e. Wiki)?
> > >
> >
> > These are good questions, and I believe Alex can answer them about those
> > two machines.  Even though I hooked them up to GitLab, AFAICT he is the
> > ultimate admin (maybe Peter too?).
> >
> > About hardware management, it has been suggested to use either the Wiki or
> > a MAINTAINERS entry.  This is still unresolved and feedback would be
> > appreciated.  For me to propose a MAINTAINERS entry, say, on a v7, I'd need
> > the information on who is managing them.
>
> Here is the wiki page that lists project resources (machines, etc):
> https://wiki.qemu.org/AdminContacts
>
> We can continue to use this page.
>

ACK.  I'm adding a note to the documentation.

Thanks,
- Cleber.
Cleber Rosa June 30, 2021, 12:47 a.m. UTC | #14
On Wed, Jun 9, 2021 at 11:09 AM Stefan Hajnoczi <stefanha@gmail.com> wrote:
>
> On Tue, Jun 08, 2021 at 04:07:57PM -0300, Wainer dos Santos Moschetta wrote:
> > > And I understand that are a lot of them (from everyone, including
> > > myself).  The dilemma here is: should we activate the machines already
> > > available, and learn in practice, what's missing?  I honestly believe we
> > > should.
> >
> >
> > IMHO we should merge the minimum possible to start using the existing
> > machines, then address the questions (good questions, btw) raised by
> > Philippe as needed.
>
> That sounds good.
>
> Does anyone want to volunteer to be the QEMU CI runners point of contact
> who is responsible for defining the process? Cleber's last email left a
> lot for Alex and me to define, but I would prefer it if someone who is
> not me takes this on since I'm already spread thin. Alex? Philippe? Cleber?
> Wainer?
>
> Stefan

Sure, I can do it.

I'll start by resurrecting the now ancient RFC that contains a lot of
proposals with regards to the process, in light of the current status
and developments:

  https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg00231.html

IMO we can start with documentation (either at ci.rst, or preferably
in the WIKI for now) for the process of onboarding new CI runners, and
track it with issues in GitLab itself.  I'll formalize those opinions.

Thanks,
- Cleber.
Cleber Rosa June 30, 2021, 1:02 a.m. UTC | #15
On Thu, Jun 10, 2021 at 2:18 AM Thomas Huth <thuth@redhat.com> wrote:
>
> On 08/06/2021 05.14, Cleber Rosa wrote:
> > The QEMU project has two machines (aarch64 and s390x) that can be used
> > for jobs that do build and run tests.  This introduces those jobs,
> > which are a mapping of custom scripts used for the same purpose.
> >
> > Signed-off-by: Cleber Rosa <crosa@redhat.com>
> > ---
> >   .gitlab-ci.d/custom-runners.yml | 208 ++++++++++++++++++++++++++++++++
> >   1 file changed, 208 insertions(+)
> >
> > diff --git a/.gitlab-ci.d/custom-runners.yml b/.gitlab-ci.d/custom-runners.yml
> > index a07b27384c..061d3cdfed 100644
> > --- a/.gitlab-ci.d/custom-runners.yml
> > +++ b/.gitlab-ci.d/custom-runners.yml
> > @@ -12,3 +12,211 @@
> >   # guarantees a fresh repository on each job run.
> >   variables:
> >     GIT_STRATEGY: clone
> > +
> > +# All ubuntu-18.04 jobs should run successfully in an environment
> > +# setup by the scripts/ci/setup/build-environment.yml task
> > +# "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
> > +ubuntu-18.04-s390x-all-linux-static:
> > + allow_failure: true
> > + needs: []
> > + stage: build
> > + tags:
> > + - ubuntu_18.04
> > + - s390x
> > + rules:
> > + - if: '$CI_COMMIT_BRANCH =~ /^staging/'
>
> I don't think this will work very well... sub-maintainers might want to push
> to a "staging" branch in their forked repositories, and without the s390x
> runner, the pipeline gets stuck now:
>
>   https://gitlab.com/thuth/qemu/-/pipelines/317812558
>

Hi Thomas,

As I put it in another response, I actually saw that as a feature, in
the sense that:

* people should indeed be allowed to push to their repos and leverage
their own hardware,
* "staging" is a pretty well scoped word and has a reasonably well
defined meaning, and
* one would want to mimic as closely as possible what will be done
before a PR is merged.

I agree that having the jobs stuck in any situation is not ideal, but
I honestly find that it would be reasonably hard to accidentally hit
that situation.  I also believe it will end up being inevitable for
entities to do a meta-analysis of the GitLab CI pipeline results,
possibly disregarding jobs that they can not run, or simply do not
care about, in their forks.

> We had the same issue in the kvm-unit-test CI, and we solved it there by
> rather making it depend on an environment variable that has to be set if the
> runner is available:
>
>   only:
>     variables:
>      - $S390X_RUNNER_AVAILABLE
>
> I think that's also nicer in case someone brings their own s390x runner and
> want to use the CI tests on other branches than staging.
>

The problem with this approach is that it would not be enough to
protect the jobs with variables for the architecture alone, as the OS
type and version also determine whether a job can run.
For instance, suppose we get s390x machines from LinuxONE running
RHEL.  We'd need variables such as, say,
S390X_RHEL_8_4_RUNNER_AVAILABLE and S390X_RHEL_7_6_RUNNER_AVAILABLE.
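
To illustrate the concern, the rules would then have to enumerate one
availability variable per OS/arch combination, e.g. (hypothetical job and
variable names, a sketch only):

  ubuntu-18.04-s390x-all:
    rules:
    - if: '$S390X_UBUNTU_18_04_RUNNER_AVAILABLE'

  rhel-8.4-s390x-all:
    rules:
    - if: '$S390X_RHEL_8_4_RUNNER_AVAILABLE'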

> Could you please change your patch accordingly?
>

If you strongly believe now is the time to attempt to handle that
problem, I can go ahead and change it.  I stand behind my original
position that we should start with a simpler, "by convention" approach
and address the more complex scenarios as/if they come up.

>   Thanks,
>    Thomas
>

Thank you!
- Cleber.
Cleber Rosa June 30, 2021, 1:08 a.m. UTC | #16
On Fri, Jun 11, 2021 at 7:04 AM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Cleber Rosa Junior <crosa@redhat.com> writes:
>
> > On Tue, Jun 8, 2021 at 2:30 AM Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
> >
> >  Hi Alex, Stefan,
> >
> >  On 6/8/21 5:14 AM, Cleber Rosa wrote:
> >  > The QEMU project has two machines (aarch64 and s390x) that can be used
> >  > for jobs that do build and run tests.
> >
> <snip>
> >
> >  Who has access to what and should do what (setup)? How is this list of
> >  hw managed btw? Should there be some public visibility (i.e. Wiki)?
> >
> > These are good questions, and I believe Alex can answer them about those two machines.  Even though I hooked them up to GitLab,
> > AFAICT he is the ultimate admin (maybe Peter too?).
> >
> > About hardware management, it has been suggested to use either the Wiki or a MAINTAINERS entry.  This is still unresolved and feedback
> > would be appreciated.  For me to propose a MAINTAINERS entry, say, on
> > a v7, I'd need the information on who is managing them.
>
> I can only talk about aarch64.ci.qemu.org which is a donated Equinix
> machine that comes from the WorksOnArm initiative. I applied for it on
> behalf of the QEMU project and we can have it for as long as it's
> useful.
>
> I don't know if we need anything more that documenting the nominal
> contacts in:
>
>   https://wiki.qemu.org/AdminContacts
>

That's enough indeed, thanks.  I'll follow up with a proposal about
the expected duties of admins, which should be nothing but common
sense.

Is there anyone that can speak for the s390x machine?

> >  Is there a document explaining what are the steps to follow for an
> >  entity to donate / sponsor hardware? Where would it be stored, should
> >  this hw be shipped somewhere? What documentation should be provided for
> >  its system administration?
>
> I think the project can only really work with donations out of someones
> data centre where they keep responsibility for the physical aspects of
> any machines including the ongoing hosting and running costs.
>

Agreed.  Anything else is beyond what can be managed atm.

> --
> Alex Bennée
>

Thanks,
- Cleber.
Willian Rampazzo June 30, 2021, 2:24 p.m. UTC | #17
On Tue, Jun 29, 2021 at 10:08 PM Cleber Rosa <crosa@redhat.com> wrote:
>
> On Fri, Jun 11, 2021 at 7:04 AM Alex Bennée <alex.bennee@linaro.org> wrote:
> >
> >
> > Cleber Rosa Junior <crosa@redhat.com> writes:
> >
> > > On Tue, Jun 8, 2021 at 2:30 AM Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
> > >
> > >  Hi Alex, Stefan,
> > >
> > >  On 6/8/21 5:14 AM, Cleber Rosa wrote:
> > >  > The QEMU project has two machines (aarch64 and s390x) that can be used
> > >  > for jobs that do build and run tests.
> > >
> > <snip>
> > >
> > >  Who has access to what and should do what (setup)? How is this list of
> > >  hw managed btw? Should there be some public visibility (i.e. Wiki)?
> > >
> > > These are good questions, and I believe Alex can answer them about those two machines.  Even though I hooked them up to GitLab,
> > > AFAICT he is the ultimate admin (maybe Peter too?).
> > >
> > > About hardware management, it has been suggested to use either the Wiki or a MAINTAINERS entry.  This is still unresolved and feedback
> > > would be appreciated.  For me to propose a MAINTAINERS entry, say, on
> > > a v7, I'd need the information on who is managing them.
> >
> > I can only talk about aarch64.ci.qemu.org which is a donated Equinix
> > machine that comes from the WorksOnArm initiative. I applied for it on
> > behalf of the QEMU project and we can have it for as long as it's
> > useful.
> >
> > I don't know if we need anything more that documenting the nominal
> > contacts in:
> >
> >   https://wiki.qemu.org/AdminContacts
> >
>
> That's enough indeed, thanks.  I'll follow up with a proposal about
> the expected duties of admins, which should be nothing but common
> sense.
>
> Is there anyone that can speak for the s390x machine?
>

I can be the backup here, if needed.

Willian

Patch

diff --git a/.gitlab-ci.d/custom-runners.yml b/.gitlab-ci.d/custom-runners.yml
index a07b27384c..061d3cdfed 100644
--- a/.gitlab-ci.d/custom-runners.yml
+++ b/.gitlab-ci.d/custom-runners.yml
@@ -12,3 +12,211 @@ 
 # guarantees a fresh repository on each job run.
 variables:
   GIT_STRATEGY: clone
+
+# All ubuntu-18.04 jobs should run successfully in an environment
+# setup by the scripts/ci/setup/build-environment.yml task
+# "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
+ubuntu-18.04-s390x-all-linux-static:
+ allow_failure: true
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_18.04
+ - s390x
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ # --disable-libssh is needed because of https://bugs.launchpad.net/qemu/+bug/1838763
+ # --disable-glusterfs is needed because there's no static version of those libs in distro supplied packages
+ - mkdir build
+ - cd build
+ - ../configure --enable-debug --static --disable-system --disable-glusterfs --disable-libssh
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+ - make --output-sync -j`nproc` check-tcg V=1
+
+ubuntu-18.04-s390x-all:
+ allow_failure: true
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_18.04
+ - s390x
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+ubuntu-18.04-s390x-alldbg:
+ allow_failure: true
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_18.04
+ - s390x
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --enable-debug --disable-libssh
+ - make clean
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+ubuntu-18.04-s390x-clang:
+ allow_failure: true
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_18.04
+ - s390x
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+   when: manual
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh --cc=clang --cxx=clang++ --enable-sanitizers
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+ubuntu-18.04-s390x-tci:
+ allow_failure: true
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_18.04
+ - s390x
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh --enable-tcg-interpreter
+ - make --output-sync -j`nproc`
+
+ubuntu-18.04-s390x-notcg:
+ allow_failure: true
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_18.04
+ - s390x
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+   when: manual
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh --disable-tcg
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+# All ubuntu-20.04 jobs should run successfully in an environment
+# setup by the scripts/ci/setup/qemu/build-environment.yml task
+# "Install basic packages to build QEMU on Ubuntu 18.04/20.04"
+ubuntu-20.04-aarch64-all-linux-static:
+ allow_failure: true
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_20.04
+ - aarch64
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ # --disable-libssh is needed because of https://bugs.launchpad.net/qemu/+bug/1838763
+ # --disable-glusterfs is needed because there's no static version of those libs in distro supplied packages
+ - mkdir build
+ - cd build
+ - ../configure --enable-debug --static --disable-system --disable-glusterfs --disable-libssh
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+ - make --output-sync -j`nproc` check-tcg V=1
+
+ubuntu-20.04-aarch64-all:
+ allow_failure: true
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_20.04
+ - aarch64
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+ubuntu-20.04-aarch64-alldbg:
+ allow_failure: true
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_20.04
+ - aarch64
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --enable-debug --disable-libssh
+ - make clean
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+ubuntu-20.04-aarch64-clang:
+ allow_failure: true
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_20.04
+ - aarch64
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+   when: manual
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh --cc=clang-10 --cxx=clang++-10 --enable-sanitizers
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1
+
+ubuntu-20.04-aarch64-tci:
+ allow_failure: true
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_20.04
+ - aarch64
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh --enable-tcg-interpreter
+ - make --output-sync -j`nproc`
+
+ubuntu-20.04-aarch64-notcg:
+ allow_failure: true
+ needs: []
+ stage: build
+ tags:
+ - ubuntu_20.04
+ - aarch64
+ rules:
+ - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+   when: manual
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --disable-libssh --disable-tcg
+ - make --output-sync -j`nproc`
+ - make --output-sync -j`nproc` check V=1