
[3/3] automation: add a QEMU based x86_64 Dom0/DomU test

Message ID 20211021230839.10794-3-sstabellini@kernel.org (mailing list archive)
State Superseded
Series automation: introduce an x86_64 Dom0/DomU test

Commit Message

Stefano Stabellini Oct. 21, 2021, 11:08 p.m. UTC
From: Stefano Stabellini <stefano.stabellini@xilinx.com>

Introduce a test based on QEMU that runs Xen and Dom0 and starts a DomU.
This is similar to the existing qemu-alpine-arm64.sh script and test.
The only differences are:
- use Debian's qemu-system-x86_64 (on ARM we build our own)
- use ipxe instead of u-boot and ImageBuilder

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
---
 automation/gitlab-ci/test.yaml           | 24 +++++++
 automation/scripts/qemu-alpine-x86_64.sh | 92 ++++++++++++++++++++++++
 2 files changed, 116 insertions(+)
 create mode 100644 automation/scripts/qemu-alpine-x86_64.sh

Comments

Anthony PERARD Oct. 22, 2021, 1:03 p.m. UTC | #1
On Thu, Oct 21, 2021 at 04:08:39PM -0700, Stefano Stabellini wrote:
> diff --git a/automation/scripts/qemu-alpine-x86_64.sh b/automation/scripts/qemu-alpine-x86_64.sh
> new file mode 100644
> index 0000000000..41b05210d6
> --- /dev/null
> +++ b/automation/scripts/qemu-alpine-x86_64.sh
> @@ -0,0 +1,92 @@
> +#!/bin/bash
> +
> +set -ex
> +
> +apt-get -qy update
> +apt-get -qy install --no-install-recommends qemu-system-x86 \
> +                                            cpio \
> +                                            curl \
> +                                            busybox-static

Please, don't install packages during the CI job. If you need new
packages, update the container.

That said, "curl" doesn't seems to be needed.

> +# DomU Busybox
> +cd binaries
> +mkdir -p initrd
> +mkdir -p initrd/bin
> +mkdir -p initrd/sbin
> +mkdir -p initrd/etc
> +mkdir -p initrd/dev
> +mkdir -p initrd/proc
> +mkdir -p initrd/sys
> +mkdir -p initrd/lib
> +mkdir -p initrd/var
> +mkdir -p initrd/mnt
> +cp /bin/busybox initrd/bin/busybox
> +initrd/bin/busybox --install initrd/bin
> +echo "#!/bin/sh
> +
> +mount -t proc proc /proc
> +mount -t sysfs sysfs /sys
> +mount -t devtmpfs devtmpfs /dev
> +/bin/sh" > initrd/init
> +chmod +x initrd/init
> +cd initrd
> +find . | cpio --create --format='newc' | gzip > ../initrd.cpio.gz

This isn't confusing at all, depending on the tool used to make an
archive, the resulting initrd has a different purpose :-).
    initrd.tar.gz -> dom0
    initrd.cpio.gz -> domU

> +cd ..

Maybe add a comment here saying that we are now preparing dom0 root
filesystem? (as there is one for domu)

> +mkdir -p rootfs
> +cd rootfs
> +tar xvzf ../initrd.tar.gz
> +mkdir proc
> +mkdir run
> +mkdir srv
> +mkdir sys
> +rm var/run
> +cp -ar ../dist/install/* .
> +mv ../initrd.cpio.gz ./root
> +cp ../bzImage ./root
> +echo "name=\"test\"
> +memory=512
> +vcpus=1
> +kernel=\"/root/bzImage\"
> +ramdisk=\"/root/initrd.cpio.gz\"
> +extra=\"console=hvc0 root=/dev/ram0 rdinit=/bin/sh\"
> +" > root/test.cfg
> +echo "#!/bin/bash
> +

Maybe add `set -x` ?

> +export LD_LIBRARY_PATH=/usr/local/lib
> +bash /etc/init.d/xencommons start
> +
> +xl list
> +
> +xl create -c /root/test.cfg
> +
> +" > etc/local.d/xen.start
> +chmod +x etc/local.d/xen.start
> +echo "rc_verbose=yes" >> etc/rc.conf
> +find . |cpio -H newc -o|gzip > ../xen-rootfs.cpio.gz
> +cd ../..
> +
> +cat >> binaries/pxelinux.0 <<- EOF

So, I've looked at what <<- means, as I had never used it before: it seems
to remove all leading tab characters from the input, yet there are no tabs
in the input. So maybe just use <<EOF instead, without the dash -.

> +#!ipxe
> +
> +kernel xen console=com1
> +module bzImage console=hvc0
> +module xen-rootfs.cpio.gz
> +boot
> +EOF
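
For anyone else unfamiliar with it, a quick illustration (not part of the
patch): <<- strips leading tab characters from the here-document body,
while plain << keeps them; leading spaces are never stripped by either
form.

    printf 'cat <<- EOF\n\tindented by a tab\nEOF\n' > dash.sh
    printf 'cat << EOF\n\tindented by a tab\nEOF\n' > nodash.sh
    bash dash.sh     # prints "indented by a tab" (tab stripped)
    bash nodash.sh   # prints the same line still prefixed by the tab

Since the script's here-document is not indented at all, << and <<- behave
identically here.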

Thanks,
Stefano Stabellini Oct. 22, 2021, 8:05 p.m. UTC | #2
On Fri, 22 Oct 2021, Anthony PERARD wrote:
> On Thu, Oct 21, 2021 at 04:08:39PM -0700, Stefano Stabellini wrote:
> > diff --git a/automation/scripts/qemu-alpine-x86_64.sh b/automation/scripts/qemu-alpine-x86_64.sh
> > new file mode 100644
> > index 0000000000..41b05210d6
> > --- /dev/null
> > +++ b/automation/scripts/qemu-alpine-x86_64.sh
> > @@ -0,0 +1,92 @@
> > +#!/bin/bash
> > +
> > +set -ex
> > +
> > +apt-get -qy update
> > +apt-get -qy install --no-install-recommends qemu-system-x86 \
> > +                                            cpio \
> > +                                            curl \
> > +                                            busybox-static
> 
> Please, don't install packages during the CI job. If you need new
> packages, update the container.

The container used to run this script is the one used for the Xen build:
automation/build/debian/unstable.dockerfile, AKA
registry.gitlab.com/xen-project/xen/debian:unstable. We don't have a
specific container for the sole purpose of running tests.

Thus, I could add qemu-system-x86 to
automation/build/debian/unstable.dockerfile, but then we increase the
size of the debian unstable build container needlessly for all related
build jobs.

Or we could add one more special container just for running tests, but
then it is one more container to maintain and keep up-to-date.

This is why I chose as a compromise to keep the underlying container as
is, and just apt-get the extra 3-4 packages here. It is the same thing
we do on ARM: automation/scripts/qemu-alpine-arm64.sh. Also keep in mind
that this job is run in the "test" step where we have far fewer jobs at
the moment and the runners are not busy. (It would be different in the
"build" step where we have many jobs.)

I am not entirely sure what is the best solution overall, but for this
series at this stage I would prefer to keep the same strategy used for
the ARM tests (i.e. reuse the debian unstable build container and
apt-get the few missing packages.) If we do change the way we do it, I
would rather change both x86 and ARM at the same time.


> That said, "curl" doesn't seem to be needed.

Yeah I'll take away curl.
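
The install step would then presumably become just (a sketch of the likely
next version, not what is in this patch):

    apt-get -qy update
    apt-get -qy install --no-install-recommends qemu-system-x86 \
                                                cpio \
                                                busybox-static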


> > +# DomU Busybox
> > +cd binaries
> > +mkdir -p initrd
> > +mkdir -p initrd/bin
> > +mkdir -p initrd/sbin
> > +mkdir -p initrd/etc
> > +mkdir -p initrd/dev
> > +mkdir -p initrd/proc
> > +mkdir -p initrd/sys
> > +mkdir -p initrd/lib
> > +mkdir -p initrd/var
> > +mkdir -p initrd/mnt
> > +cp /bin/busybox initrd/bin/busybox
> > +initrd/bin/busybox --install initrd/bin
> > +echo "#!/bin/sh
> > +
> > +mount -t proc proc /proc
> > +mount -t sysfs sysfs /sys
> > +mount -t devtmpfs devtmpfs /dev
> > +/bin/sh" > initrd/init
> > +chmod +x initrd/init
> > +cd initrd
> > +find . | cpio --create --format='newc' | gzip > ../initrd.cpio.gz
> 
> This isn't confusing at all, depending on the tool used to make an
> archive, the resulting initrd has a different purpose :-).
>     initrd.tar.gz -> dom0
>     initrd.cpio.gz -> domU
> 
> > +cd ..
> 
> Maybe add a comment here saying that we are now preparing dom0 root
> filesystem? (as there is one for domu)

Sure I can add a comment


> > +mkdir -p rootfs
> > +cd rootfs
> > +tar xvzf ../initrd.tar.gz
> > +mkdir proc
> > +mkdir run
> > +mkdir srv
> > +mkdir sys
> > +rm var/run
> > +cp -ar ../dist/install/* .
> > +mv ../initrd.cpio.gz ./root
> > +cp ../bzImage ./root
> > +echo "name=\"test\"
> > +memory=512
> > +vcpus=1
> > +kernel=\"/root/bzImage\"
> > +ramdisk=\"/root/initrd.cpio.gz\"
> > +extra=\"console=hvc0 root=/dev/ram0 rdinit=/bin/sh\"
> > +" > root/test.cfg
> > +echo "#!/bin/bash
> > +
> 
> Maybe add `set -x` ?

OK


> > +export LD_LIBRARY_PATH=/usr/local/lib
> > +bash /etc/init.d/xencommons start
> > +
> > +xl list
> > +
> > +xl create -c /root/test.cfg
> > +
> > +" > etc/local.d/xen.start
> > +chmod +x etc/local.d/xen.start
> > +echo "rc_verbose=yes" >> etc/rc.conf
> > +find . |cpio -H newc -o|gzip > ../xen-rootfs.cpio.gz
> > +cd ../..
> > +
> > +cat >> binaries/pxelinux.0 <<- EOF
> 
> So, I've looked at what <<- means, as I had never used it before: it seems
> to remove all leading tab characters from the input, yet there are no tabs
> in the input. So maybe just use <<EOF instead, without the dash -.

OK


> > +#!ipxe
> > +
> > +kernel xen console=com1
> > +module bzImage console=hvc0
> > +module xen-rootfs.cpio.gz
> > +boot
> > +EOF
Anthony PERARD Oct. 25, 2021, 4:13 p.m. UTC | #3
On Fri, Oct 22, 2021 at 01:05:35PM -0700, Stefano Stabellini wrote:
> On Fri, 22 Oct 2021, Anthony PERARD wrote:
> > On Thu, Oct 21, 2021 at 04:08:39PM -0700, Stefano Stabellini wrote:
> > > diff --git a/automation/scripts/qemu-alpine-x86_64.sh b/automation/scripts/qemu-alpine-x86_64.sh
> > > new file mode 100644
> > > index 0000000000..41b05210d6
> > > --- /dev/null
> > > +++ b/automation/scripts/qemu-alpine-x86_64.sh
> > > @@ -0,0 +1,92 @@
> > > +#!/bin/bash
> > > +
> > > +set -ex
> > > +
> > > +apt-get -qy update
> > > +apt-get -qy install --no-install-recommends qemu-system-x86 \
> > > +                                            cpio \
> > > +                                            curl \
> > > +                                            busybox-static
> > 
> > Please, don't install packages during the CI job. If you need new
> > packages, update the container.
> 
> The container used to run this script is the one used for the Xen build:
> automation/build/debian/unstable.dockerfile, AKA
> registry.gitlab.com/xen-project/xen/debian:unstable. We don't have a
> specific container for the sole purpose of running tests.

I've added qemu to our debian:stretch container recently, in order to
run the "qemu-smoke-*" tests without running apt commands. Unless more
recent software is needed, you could base the "qemu-alpine-x86*" test
on debian:stretch, which might then only be missing cpio and busybox.
Updating the stretch container should go smoothly, as it has been done
recently.

> Thus, I could add qemu-system-x86 to
> automation/build/debian/unstable.dockerfile, but then we increase the
> size of the debian unstable build container needlessly for all related
> build jobs.

There is something I'm missing, how is it a problem to have a container
that is a bit bigger? What sort of problem could we have to deal with?
On the other hand, doing more tasks, installing software, and downloading
anything from the internet makes the job much less reliable. It
increases the risk of a failure due to external factors and it takes more
time to run the test.

> Or we could add one more special container just for running tests, but
> then it is one more container to maintain and keep up-to-date.
> 
> This is why I chose as a compromise to keep the underlying container as
> is, and just apt-get the extra 3-4 packages here. It is the same thing
> we do on ARM: automation/scripts/qemu-alpine-arm64.sh. Also keep in mind
> that this job is run in the "test" step where we have far fewer jobs at
> the moment and the runners are not busy. (It would be different in the
> "build" step where we have many jobs.)

I don't really see any difference between a "test" job and a "build"
job; both kinds use the same resources/runners.

On that note, there is something I've learned recently: "test" jobs
don't even have to wait for all "build" jobs to complete; they can run in
parallel. We would just need to replace "dependencies" with "needs":
    https://docs.gitlab.com/ee/ci/yaml/index.html#needs
But that could be an improvement for another time and is only a side note
for the patch.
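
For illustration, the new x86_64 job could look roughly like this with
"needs" (an untested sketch, reusing the job names from this patch):

    qemu-alpine-x86_64-gcc:
      stage: test
      needs:
        - alpine-3.12-gcc
        - alpine-3.12-rootfs-export
        - kernel-5.10.74-export

With "needs" the job starts as soon as those three build jobs have
finished, and it still receives their artifacts.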

> I am not entirely sure what is the best solution overall, but for this
> series at this stage I would prefer to keep the same strategy used for
> the ARM tests (i.e. reuse the debian unstable build container and
> apt-get the few missing packages.) If we do change the way we do it, I
> would rather change both x86 and ARM at the same time.

I'm pretty sure the best strategy would be to do as little as possible
during a job, download as little as possible and possibly cache as much
as possible and do as much as possible ahead of time. Feel free to
change the Arm test, but I don't think it is necessary to change the Arm
test at the same time as introducing an x86 test.

Cheers,
Stefano Stabellini Oct. 26, 2021, 1:33 a.m. UTC | #4
On Mon, 25 Oct 2021, Anthony PERARD wrote:
> On Fri, Oct 22, 2021 at 01:05:35PM -0700, Stefano Stabellini wrote:
> > On Fri, 22 Oct 2021, Anthony PERARD wrote:
> > > On Thu, Oct 21, 2021 at 04:08:39PM -0700, Stefano Stabellini wrote:
> > > > diff --git a/automation/scripts/qemu-alpine-x86_64.sh b/automation/scripts/qemu-alpine-x86_64.sh
> > > > new file mode 100644
> > > > index 0000000000..41b05210d6
> > > > --- /dev/null
> > > > +++ b/automation/scripts/qemu-alpine-x86_64.sh
> > > > @@ -0,0 +1,92 @@
> > > > +#!/bin/bash
> > > > +
> > > > +set -ex
> > > > +
> > > > +apt-get -qy update
> > > > +apt-get -qy install --no-install-recommends qemu-system-x86 \
> > > > +                                            cpio \
> > > > +                                            curl \
> > > > +                                            busybox-static
> > > 
> > > Please, don't install packages during the CI job. If you need new
> > > packages, update the container.
> > 
> > The container used to run this script is the one used for the Xen build:
> > automation/build/debian/unstable.dockerfile, AKA
> > registry.gitlab.com/xen-project/xen/debian:unstable. We don't have a
> > specific container for the sole purpose of running tests.
> 
> I've added qemu to our debian:stretch container recently, in order to
> run the "qemu-smoke-*" tests without running apt commands. Unless more
> recent software is needed, you could base the "qemu-alpine-x86*" test
> on debian:stretch, which might then only be missing cpio and busybox.
> Updating the stretch container should go smoothly, as it has been done
> recently.

I am happy to use debian:stretch in this patch series. I'll make the
change. That only leaves cpio and busybox-static to apt-get which are
small. In general, I am not too happy about increasing the size of build
containers with testing binaries so I would rather not increase it any
further, see below.


> > Thus, I could add qemu-system-x86 to
> > automation/build/debian/unstable.dockerfile, but then we increase the
> > size of the debian unstable build container needlessly for all related
> > build jobs.
> 
> There is something I'm missing, how is it a problem to have a container
> that is a bit bigger? What sort of problem could we have to deal with?

It takes time to pull the container in gitlab-ci; the bigger the
container, the more time it takes. It is fetched over the network. Now we
are fetching qemu (as part of the container) 10 times during the build
although it is not needed.

There is a compromise to be made. Let's say the QEMU package is 100MB.
If we add QEMU to debian:stretch, every one of the roughly 10 build jobs
that pull that container fetches those extra 100MB, so we increase the
overall data usage by about 1GB, which means we are waiting to fetch 1GB
of additional data during the pipeline.

If we needed QEMU only once, then we could just apt-get it and we would
only fetch 100MB once. Although using apt-get will be a bit slower
because the deb needs to be unpacked, etc. I think that if we used QEMU
only in one test then it would be faster to apt-get it.

However in our case we are using QEMU 4 times during the tests, so
adding QEMU to debian:stretch probably made things a bit faster or at
least not slower.

Does it make sense?

This is why we should have specialized containers for the tests instead
of adding test packages to the build containers. However, today all
containers are updated manually so it would be best to make containers
automatically updatable first.  Until then I think it is OK to apt-get
stuff.


> On the other hand, doing more tasks, installing software, and downloading
> anything from the internet makes the job much less reliable. It
> increases the risk of a failure due to external factors and it takes more
> time to run the test.

That is fair but it is more a theoretical problem than a real issue at
the moment. In reality I haven't seen Debian's apt-get failing even a
single time in our gitlab-ci (it failed because our debian:unstable
container went out of date but that's our fault).

But we do have a severe problem at the moment with external sources: our
"git clones" keep failing during the build on x86. That is definitely
something worth improving (see my other email thread on the subject) and
it is the main problem affecting gitlab-ci at the moment, I keep having
to restart jobs almost daily to get the overall pipeline to "pass".

If you have any ideas on how to stop fetching things using "git" from
external repositories in gitlab-ci that would be fantastic :-)
The only thing I could think of to "fix it" is moving all external repos
to gitlab repositories mirrors.


> > Or we could add one more special container just for running tests, but
> > then it is one more container to maintain and keep up-to-date.
> > 
> > This is why I chose as a compromise to keep the underlying container as
> > is, and just apt-get the extra 3-4 packages here. It is the same thing
> > we do on ARM: automation/scripts/qemu-alpine-arm64.sh. Also keep in mind
> > that this job is run in the "test" step where we have far fewer jobs at
> > the moment and the runners are not busy. (It would be different in the
> > "build" step where we have many jobs.)
> 
> I don't really see any difference between a "test" job and a "build"
> job; both kinds use the same resources/runners.
> 
> On that note, there is something I've learned recently: "test" jobs
> don't even have to wait for all "build" jobs to complete; they can run in
> parallel. We would just need to replace "dependencies" with "needs":
>     https://docs.gitlab.com/ee/ci/yaml/index.html#needs
> But that could be an improvement for another time and is only a side note
> for the patch.

That is really cool! I didn't know about it. Yes, currently there is a
big distinction between build and test jobs because build jobs are the
bottleneck and test jobs don't start before all the build jobs finish.


> > I am not entirely sure what is the best solution overall, but for this
> > series at this stage I would prefer to keep the same strategy used for
> > the ARM tests (i.e. reuse the debian unstable build container and
> > apt-get the few missing packages.) If we do change the way we do it, I
> > would rather change both x86 and ARM at the same time.
> 
> I'm pretty sure the best strategy would be to do as little as possible
> during a job, download as little as possible and possibly cache as much
> as possible and do as much as possible ahead of time. Feel free to
> change the Arm test, but I don't think it is necessary to change the Arm
> test at the same time as introducing an x86 test.

I agree.

At the same time it would be nice to follow the same strategy between
x86 and ARM going forward: if one optimization is made for one, it is
also made for the other.
Anthony PERARD Oct. 27, 2021, 1:59 p.m. UTC | #5
On Mon, Oct 25, 2021 at 06:33:53PM -0700, Stefano Stabellini wrote:
> On Mon, 25 Oct 2021, Anthony PERARD wrote:
> > There is something I'm missing, how is it a problem to have a container
> > that is a bit bigger? What sort of problem could we have to deal with?
> 
> It takes time to pull the container in gitlab-ci; the bigger the
> container, the more time it takes. It is fetched over the network. Now we
> are fetching qemu (as part of the container) 10 times during the build
> although it is not needed.

I guess the issue is more that we don't do enough caching with our
gitlab runners, so package installation is just a workaround.

> But we do have a severe problem at the moment with external sources: our
> "git clones" keep failing during the build on x86. That is definitely
> something worth improving (see my other email thread on the subject) and
> it is the main problem affecting gitlab-ci at the moment, I keep having
> to restart jobs almost daily to get the overall pipeline to "pass".
> 
> If you have any ideas on how to stop fetching things using "git" from
> external repositories in gitlab-ci that would be fantastic :-)
> The only thing I could think of to "fix it" is moving all external repos
> to gitlab repositories mirrors.

I don't think that would work; I've seen the initial clone/fetch of a
job fail as well, i.e. straight from gitlab. If we could have a cache of
those external resources closer to the runners, that would be better.

> > > I am not entirely sure what is the best solution overall, but for this
> > > series at this stage I would prefer to keep the same strategy used for
> > > the ARM tests (i.e. reuse the debian unstable build container and
> > > apt-get the few missing packages.) If we do change the way we do it, I
> > > would rather change both x86 and ARM at the same time.
> > 
> > I'm pretty sure the best strategy would be to do as little as possible
> > during a job, download as little as possible and possibly cache as much
> > as possible and do as much as possible ahead of time. Feel free to
> > change the Arm test, but I don't think it is necessary to change the Arm
> > test at the same time as introducing an x86 test.
> 
> I agree.
> 
> At the same time it would be nice to follow the same strategy between
> x86 and ARM going forward: if one optimization is made for one, it is
> also made for the other.

Probably better, yes.

Thanks,
Stefano Stabellini Oct. 27, 2021, 9:43 p.m. UTC | #6
On Wed, 27 Oct 2021, Anthony PERARD wrote:
> > But we do have a severe problem at the moment with external sources: our
> > "git clones" keep failing during the build on x86. That is definitely
> > something worth improving (see my other email thread on the subject) and
> > it is the main problem affecting gitlab-ci at the moment, I keep having
> > to restart jobs almost daily to get the overall pipeline to "pass".
> > 
> > If you have any ideas on how to stop fetching things using "git" from
> > external repositories in gitlab-ci that would be fantastic :-)
> > The only thing I could think of to "fix it" is moving all external repos
> > to gitlab repositories mirrors.
> 
> I don't think that would work; I've seen the initial clone/fetch of a
> job fail as well, i.e. straight from gitlab. If we could have a cache of
> those external resources closer to the runners, that would be better.

You mean like a git repository mirror inside the Rackspace network (the
provider of the x86 runner), right? Then we would force the git client
to go to the Rackspace mirror instead of directly to the target using
"insteadOf".

Is that what you meant? Doug, do you think it would work?
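
Concretely, I am thinking of something along these lines on the runner
hosts (the mirror URL here is made up, and the xenbits URL is just one
example of a repository the build fetches):

    # redirect clones of xenbits.xen.org to a mirror close to the runners
    git config --global \
        url."https://mirror.example.rackspace.internal/xenbits/".insteadOf \
        "https://xenbits.xen.org/git-http/"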
Anthony PERARD Oct. 28, 2021, 9:48 a.m. UTC | #7
On Wed, Oct 27, 2021 at 02:43:46PM -0700, Stefano Stabellini wrote:
> On Wed, 27 Oct 2021, Anthony PERARD wrote:
> > > But we do have a severe problem at the moment with external sources: our
> > > "git clones" keep failing during the build on x86. That is definitely
> > > something worth improving (see my other email thread on the subject) and
> > > it is the main problem affecting gitlab-ci at the moment, I keep having
> > > to restart jobs almost daily to get the overall pipeline to "pass".
> > > 
> > > If you have any ideas on how to stop fetching things using "git" from
> > > external repositories in gitlab-ci that would be fantastic :-)
> > > The only thing I could think of to "fix it" is moving all external repos
> > > to gitlab repositories mirrors.
> > 
> > I don't think that would work; I've seen the initial clone/fetch of a
> > job fail as well, i.e. straight from gitlab. If we could have a cache of
> > those external resources closer to the runners, that would be better.
> 
> You mean like a git repository mirror inside the Rackspace network (the
> provider of the x86 runner), right? Then we would force the git client
> to go to the Rackspace mirror instead of directly to the target using
> "insteadOf".

That would seem the best to me. If we could install Ian's
git-cache-proxy that is used in osstest, that would be good I think.
Having a mirror instead might work too, but that would mean figuring out
which repos we would need a mirror of.

I did try a different alternative a while back: I tried to use gitlab's
caching capability:
    automation: Cache sub-project git tree in build jobs
    https://lore.kernel.org/xen-devel/20191219144217.305851-3-anthony.perard@citrix.com/
It mostly works, but I'm not sure how useful it is, as it seems there are
10 computers that would each maintain a different cache, most of them
only for a short while.
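
Roughly, that approach adds something like this to each build job (the
cached paths below are only examples of the sub-project trees the build
clones):

    cache:
      key: subproject-git-trees
      paths:
        - tools/qemu-xen-dir-remote/
        - tools/firmware/ovmf-dir-remote/
        - tools/firmware/seabios-dir-remote/

The catch is that the cache lives on whichever runner happens to execute
the job, hence the separate caches mentioned above.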

> Is that what you meant? Doug, do you think it would work?
Stefano Stabellini Oct. 28, 2021, 2:19 p.m. UTC | #8
On Thu, 28 Oct 2021, Anthony PERARD wrote:
> On Wed, Oct 27, 2021 at 02:43:46PM -0700, Stefano Stabellini wrote:
> > On Wed, 27 Oct 2021, Anthony PERARD wrote:
> > > > But we do have a severe problem at the moment with external sources: our
> > > > "git clones" keep failing during the build on x86. That is definitely
> > > > something worth improving (see my other email thread on the subject) and
> > > > it is the main problem affecting gitlab-ci at the moment, I keep having
> > > > to restart jobs almost daily to get the overall pipeline to "pass".
> > > > 
> > > > If you have any ideas on how to stop fetching things using "git" from
> > > > external repositories in gitlab-ci that would be fantastic :-)
> > > > The only thing I could think of to "fix it" is moving all external repos
> > > > to gitlab repositories mirrors.
> > > 
> > > I don't think that would work; I've seen the initial clone/fetch of a
> > > job fail as well, i.e. straight from gitlab. If we could have a cache of
> > > those external resources closer to the runners, that would be better.
> > 
> > You mean like a git repository mirror inside the Rackspace network (the
> > provider of the x86 runner), right? Then we would force the git client
> > to go to the Rackspace mirror instead of directly to the target using
> > "insteadOf".
> 
> That would seem the best to me. If we could install Ian's
> git-cache-proxy that is used in osstest, that would be good I think.
> Having a mirror instead might work too, but that would mean figuring out
> which repos we would need a mirror of.
> 
> I did try a different alternative a while back: I tried to use gitlab's
> caching capability:
>     automation: Cache sub-project git tree in build jobs
>     https://lore.kernel.org/xen-devel/20191219144217.305851-3-anthony.perard@citrix.com/
> It mostly works, but I'm not sure how useful it is, as it seems there are
> 10 computers that would each maintain a different cache, most of them
> only for a short while.

Interesting!

With the current runners being short-lived, it only makes sense to have
git-cache-proxy on a different, always-on host. But even after the
runners are changed to be always running, instead of spawned on demand,
it might be better to have git-cache-proxy on a separate host so that
all runners can use it. It should be more efficient, shouldn't it?

Patch

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index 91a10febbf..c1d67ec4b5 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -47,6 +47,30 @@  qemu-alpine-arm64-gcc:
     - /^coverity-tested\/.*/
     - /^stable-.*/
 
+qemu-alpine-x86_64-gcc:
+  stage: test
+  image: registry.gitlab.com/xen-project/xen/${CONTAINER}
+  variables:
+    CONTAINER: debian:unstable
+  script:
    - ./automation/scripts/qemu-alpine-x86_64.sh 2>&1 | tee qemu-alpine-x86_64.log
+  dependencies:
+    - alpine-3.12-gcc
+    - alpine-3.12-rootfs-export
+    - kernel-5.10.74-export
+  artifacts:
+    paths:
+      - smoke.serial
+      - '*.log'
+    when: always
+  tags:
+    - x86_64
+  except:
+    - master
+    - smoke
+    - /^coverity-tested\/.*/
+    - /^stable-.*/
+
 qemu-smoke-arm64-gcc:
   stage: test
   image: registry.gitlab.com/xen-project/xen/${CONTAINER}
diff --git a/automation/scripts/qemu-alpine-x86_64.sh b/automation/scripts/qemu-alpine-x86_64.sh
new file mode 100644
index 0000000000..41b05210d6
--- /dev/null
+++ b/automation/scripts/qemu-alpine-x86_64.sh
@@ -0,0 +1,92 @@ 
+#!/bin/bash
+
+set -ex
+
+apt-get -qy update
+apt-get -qy install --no-install-recommends qemu-system-x86 \
+                                            cpio \
+                                            curl \
+                                            busybox-static
+
+# DomU Busybox
+cd binaries
+mkdir -p initrd
+mkdir -p initrd/bin
+mkdir -p initrd/sbin
+mkdir -p initrd/etc
+mkdir -p initrd/dev
+mkdir -p initrd/proc
+mkdir -p initrd/sys
+mkdir -p initrd/lib
+mkdir -p initrd/var
+mkdir -p initrd/mnt
+cp /bin/busybox initrd/bin/busybox
+initrd/bin/busybox --install initrd/bin
+echo "#!/bin/sh
+
+mount -t proc proc /proc
+mount -t sysfs sysfs /sys
+mount -t devtmpfs devtmpfs /dev
+/bin/sh" > initrd/init
+chmod +x initrd/init
+cd initrd
+find . | cpio --create --format='newc' | gzip > ../initrd.cpio.gz
+cd ..
+
+mkdir -p rootfs
+cd rootfs
+tar xvzf ../initrd.tar.gz
+mkdir proc
+mkdir run
+mkdir srv
+mkdir sys
+rm var/run
+cp -ar ../dist/install/* .
+mv ../initrd.cpio.gz ./root
+cp ../bzImage ./root
+echo "name=\"test\"
+memory=512
+vcpus=1
+kernel=\"/root/bzImage\"
+ramdisk=\"/root/initrd.cpio.gz\"
+extra=\"console=hvc0 root=/dev/ram0 rdinit=/bin/sh\"
+" > root/test.cfg
+echo "#!/bin/bash
+
+export LD_LIBRARY_PATH=/usr/local/lib
+bash /etc/init.d/xencommons start
+
+xl list
+
+xl create -c /root/test.cfg
+
+" > etc/local.d/xen.start
+chmod +x etc/local.d/xen.start
+echo "rc_verbose=yes" >> etc/rc.conf
+find . |cpio -H newc -o|gzip > ../xen-rootfs.cpio.gz
+cd ../..
+
+cat >> binaries/pxelinux.0 <<- EOF
+#!ipxe
+
+kernel xen console=com1
+module bzImage console=hvc0
+module xen-rootfs.cpio.gz
+boot
+EOF
+
+# Run the test
+rm -f smoke.serial
+set +e
+timeout -k 1 720 \
+qemu-system-x86_64 \
+    -cpu qemu64,+svm \
+    -m 2G -smp 2 \
+    -monitor none -serial stdio \
+    -nographic \
+    -device virtio-net-pci,netdev=n0 \
+    -netdev user,id=n0,tftp=binaries,bootfile=/pxelinux.0 |& tee smoke.serial
+
+set -e
+(grep -q "Domain-0" smoke.serial && grep -q "BusyBox" smoke.serial) || exit 1
+exit 0