[1/3] gitlab: always build container images

Message ID	20210208163339.1159514-2-berrange@redhat.com (mailing list archive)
State	New, archived
From: Daniel P. Berrangé <berrange@redhat.com>
To: qemu-devel@nongnu.org
Cc: Alex Bennée <alex.bennee@linaro.org>, Thomas Huth <thuth@redhat.com>, Philippe Mathieu-Daudé <philmd@redhat.com>, Daniel P. Berrangé <berrange@redhat.com>, Wainer dos Santos Moschetta <wainersm@redhat.com>
Subject: [PATCH 1/3] gitlab: always build container images
Date: Mon, 8 Feb 2021 16:33:37 +0000
Message-Id: <20210208163339.1159514-2-berrange@redhat.com>
In-Reply-To: <20210208163339.1159514-1-berrange@redhat.com>
References: <20210208163339.1159514-1-berrange@redhat.com>
Series	fix build failures from incorrectly skipped container build jobs
  [0/3] fix build failures from incorrectly skipped container build jobs
  [1/3] gitlab: always build container images
  [2/3] gitlab: add fine grained job deps for all build jobs
  [3/3] gitlab: fix inconsistent indentation

Daniel P. Berrangé Feb. 8, 2021, 4:33 p.m. UTC

Currently we attempt to skip building container images if the commits do
not involve changes to the dockerfiles or gitlab CI definitions.

Conceptually this makes sense, but there is a challenge in the real
world implementation of this in gitlab.

In the case of a CI pipeline triggered from a merge request, GitLab
knows the common ancestor of the merge request and the main git repo,
so it can trivially determine if any of the commits associated with
the MR change the dockerfiles.

In the case of a CI pipeline triggered from a push to a branch, it is
much more difficult. There is no concept of a common ancestor in this
case. Instead GitLab looks at the set of commits in the git push event.

On the surface this may sound reasonable, but it doesn't take into
account that a push event does not always contain the full set of
patches from a branch.

For example, consider pushing 5 commits, one of which contains a
dockerfile change. This will trigger a CI pipeline for the
containers. Now consider you do some more work on the branch and push 3
further commits, so you now have a branch of 8 commits. For the second
push GitLab will only look at the 3 most recent commits, the other 5
were already present. Thus GitLab will not realize that the branch has
dockerfile changes that need to trigger the container build.
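
As a rough illustration of the paragraph above, here is a toy Python model
of per-push change detection. It is a sketch of the behaviour described in
this message, not GitLab's actual implementation; the Commit type, the file
names and the ".docker" suffix check are all made up for the example.

```python
from typing import List, NamedTuple


class Commit(NamedTuple):
    sha: str
    files: List[str]


def changes_dockerfiles(commits: List[Commit]) -> bool:
    """Return True if any commit in the given list touches a dockerfile."""
    return any(f.endswith(".docker") for c in commits for f in c.files)


# First push: 5 commits, one of which changes a (hypothetical) dockerfile.
first_push = [Commit(f"c{i}", ["hw/foo.c"]) for i in range(1, 5)]
first_push.append(Commit("c5", ["dockerfiles/fedora.docker"]))

# Second push: 3 further commits, none of which touch dockerfiles.
second_push = [Commit(f"c{i}", ["hw/bar.c"]) for i in range(6, 9)]

# The branch as a whole now holds all 8 commits.
branch = first_push + second_push

# A push-triggered pipeline only inspects the commits in that push event.
first_push_triggers = changes_dockerfiles(first_push)
second_push_triggers = changes_dockerfiles(second_push)
branch_has_docker_change = changes_dockerfiles(branch)

print(first_push_triggers, second_push_triggers, branch_has_docker_change)
```

In this model the second push does not trigger a container build even
though the branch as a whole still carries a dockerfile change.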

This can cause real world problems:

 - Push 5 commits to branch "foo", including a dockerfile change

    => rebuilds the container images with content from "foo"
    => build jobs run against containers from "foo"

 - Refresh your master branch with latest upstream master

    => rebuilds the container images with content from "master"
    => build jobs run against containers from "master"

 - Push 3 more commits to branch "foo", with no dockerfile change

    => no container rebuild triggers
    => build jobs run against containers from "master"
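
The sequence above can be sketched with a toy registry that holds a single
mutable "latest" tag shared by every branch. This is only an illustration
of the failure mode; run_pipeline and the registry dict are hypothetical
names, not part of any real CI API.

```python
# Toy registry: one "latest" tag, shared by all branches of the fork.
registry = {}


def run_pipeline(branch: str, rebuild_containers: bool) -> str:
    """Run a pipeline and return which branch's image the build jobs used."""
    if rebuild_containers:
        # The image content now comes from this branch's dockerfiles.
        registry["latest"] = branch
    return registry["latest"]


used = [
    run_pipeline("foo", rebuild_containers=True),     # push 5 commits to "foo"
    run_pipeline("master", rebuild_containers=True),  # refresh master
    run_pipeline("foo", rebuild_containers=False),    # push 3 more, rebuild skipped
]
print(used)
```

The third pipeline runs on "foo" but picks up the image last written by
"master", which is the mismatch described in the list.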

The "changes" conditional in gitlab is OK, *provided* your build
jobs are not relying on any external state from previous builds.

This is NOT the case in QEMU, because we are building container
images and these are cached. This is a scenario in which the
"changes" conditional is not usable.

The only other way to avoid this problem would be to use the git
branch name as the container image tag, instead of always using
"latest". The downside of this approach is that the user's gitlab
registry will grow significantly until it starts to trigger
GitLab's automatic deletion policy.  Every time the user starts
a new branch they will have to trigger a rebuild of the container
images. Given this, we might as well just drop the conditional
and always build the container images. Most of the time docker
will be able to use the layer cache to avoid the most expensive
part of the rebuild process (installing all the RPMs/debs/etc).

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 .gitlab-ci.d/containers.yml | 7 -------
 1 file changed, 7 deletions(-)

Thomas Huth Feb. 9, 2021, 6:37 a.m. UTC | #1

On 08/02/2021 17.33, Daniel P. Berrangé wrote:
[...]
> The only other way to avoid this problem would be to use the git
> branch name as the container image tag, instead of always using
> "latest".
I'm basically fine with your patch, but let me ask one more thing: Won't we
still have the problem if the user pushes to different branches
simultaneously? E.g. the user pushes to "foo" with changes to dockerfiles,
containers start to get rebuilt, then pushes to master without waiting for
the previous CI to finish, then the containers get rebuilt from the "master"
job without the local changes to the dockerfiles. Then in the "foo" CI
pipelines the following jobs might run with the containers that have been
built by the "master" job...

So if we really want to get it bulletproof, do we have to use the git branch 
name as the container image tag?

  Thomas

Daniel P. Berrangé Feb. 9, 2021, 9:58 a.m. UTC | #2

On Tue, Feb 09, 2021 at 07:37:51AM +0100, Thomas Huth wrote:
> On 08/02/2021 17.33, Daniel P. Berrangé wrote:
> [...]
> I'm basically fine with your patch, but let me ask one more thing: Won't we
> still have the problem if the user pushes to different branches
> simultaneously? E.g. the user pushes to "foo" with changes to dockerfiles,
> containers start to get rebuilt, then pushes to master without waiting for
> the previous CI to finish, then the containers get rebuilt from the "master"
> job without the local changes to the dockerfiles. Then in the "foo" CI
> pipelines the following jobs might run with the containers that have been
> built by the "master" job...

Yes, this is the issue I describe in the cover letter.

> So if we really want to get it bulletproof, do we have to use the git branch
> name as the container image tag?

That is possible, but I'm somewhat loath to do that, as it means the
container registry in developers' forks will accumulate a growing list
of image tags. I know GitLab will force-expire tags once it gets beyond a
certain number of them, but it still felt pretty wasteful of space
to create so many tags.

Having said that, maybe this is not actually wasteful: if we always
use "master" as a cache for docker, then the "new" images we
build on each branch will just re-use existing docker layers and
thus not add to disk usage. We'd only see extra usage if the branch
contained changes to dockerfiles.

Regards,
Daniel

Daniel P. Berrangé Feb. 10, 2021, 11:17 a.m. UTC | #3

On Tue, Feb 09, 2021 at 09:58:29AM +0000, Daniel P. Berrangé wrote:
> Having said that, maybe this is not actually wasteful if we always
> use the "master" as a cache for docker, then the "new" images we
> build on each branch will just re-use existing docker layers and
> thus not add to disk usage. We'd only see extra usage if the branch
> contained changes to dockerfiles.

The challenge here is that I need the docker tag name to be in an env
variable in the gitlab-ci.yml file.

I can directly use $CI_COMMIT_REF_NAME to get the branch name, but
the list of valid characters for a git branch is way more permissive
than the valid characters for a docker tag.

So we need to filter the git branch name to form a valid docker tag,
and AFAICT, there's no way to do that when setting a global env variable
in the gitlab-ci.yml. I can only do the filtering once we reach the
before_script: stage, and that's too late to use it in the image name
for the job.

We could ignore the problem and hope people always have sane branch
names?

   https://docs.docker.com/engine/reference/commandline/tag/

  "A tag name must be valid ASCII and may contain lowercase and 
   uppercase letters, digits, underscores, periods and dashes. 
   A tag name may not start with a period or a dash and may 
   contain a maximum of 128 characters."

That rule would cover all my git branch names, but then ASCII covers
most common English needs. I worry that we might have contributors
who genuinely use non-ASCII chars in their git branch names, especially
speakers of non-English/European languages, e.g. Persian, Chinese or
Japanese. Git is very permissive, allowing everything except a short
list:

   https://www.spinics.net/lists/git/msg133704.html

  "A branch name can not:
        - Have a path component that begins with "."
        - Have a double dot ".."
        - Have an ASCII control character, "~", "^", ":" or SP, anywhere
        - End with a "/"
        - End with ".lock"
        - Contain a "\" (backslash)"

The result will be if someone names their git branch "
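
To make the mismatch between the two rule sets concrete, here is a small
Python sketch that encodes the Docker tag rule quoted above as a regular
expression and checks a few branch names against it. The helper name
is_valid_docker_tag and the sample branch names are made up for
illustration.

```python
import re

# Docker tag rule as quoted above: ASCII letters, digits, underscores,
# periods and dashes; must not start with a period or a dash; at most
# 128 characters in total.
DOCKER_TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def is_valid_docker_tag(name: str) -> bool:
    """Return True if the whole name satisfies the quoted tag rule."""
    return DOCKER_TAG_RE.fullmatch(name) is not None


# Branch names that git would accept, but not all of which docker accepts.
for branch in ["master", "ci-fixes", "feature/containers", "wip@v2", "分支"]:
    print(f"{branch!r}: {is_valid_docker_tag(branch)}")
```

Names containing "/" or non-ASCII characters are accepted by git but fail
this check, so they could not be used directly as an image tag.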

Daniel P. Berrangé Feb. 16, 2021, 12:43 p.m. UTC | #4

On Wed, Feb 10, 2021 at 11:17:00AM +0000, Daniel P. Berrangé wrote:
> The challenge here is that I need the docker tag name to be in an env
> variable in the gitlab-ci.yml file.
> 
> I can directly use $CI_COMMIT_REF_NAME  to get the branch name but
> the list of valid characters for a git branch is way more permissive
> than valid characters for a docker tag.
> 
> So we need to filter the git branch name to form a valid docker tag,
> and AFAICT, there's no way to do that when setting a global env variable
> in the gitlab-ci.yml. I can only do filtering once in the before_script:
> stage, and that's too late to use it in the image name for the job.

I've thought of a solution here.

We can tag the images with $CI_COMMIT_SHORT_SHA, and the build jobs
can reference them with:

  image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:$CI_COMMIT_SHORT_SHA

In the container build script, we then *also* tag them with a sanitized
version of $CI_COMMIT_REF_NAME, and also use this as the cache to pull
from when building the image.

The main downside here is that we'll end up creating a lot of tags, but
most will have the same content so shouldn't be too bad.
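
A minimal Python sketch of this tagging scheme follows. It only computes
the two image references; the actual build script would be shell. The
function names, the example registry path and the "unnamed" fallback are
assumptions for illustration, not what the eventual patch does.

```python
import re


def sanitize_ref_name(ref_name: str) -> str:
    """Map an arbitrary git ref name onto the Docker tag character set."""
    tag = re.sub(r"[^A-Za-z0-9_.-]", "-", ref_name)  # replace disallowed chars
    tag = tag.lstrip(".-")                            # no leading period or dash
    return tag[:128] or "unnamed"                     # length limit, never empty


def image_tags(registry_image: str, image: str, short_sha: str, ref_name: str):
    """Return the per-commit tag and the per-branch tag for one image."""
    base = f"{registry_image}/qemu/{image}"
    return {
        # Referenced by build jobs, unique to this pipeline's commit.
        "build_jobs_use": f"{base}:{short_sha}",
        # Extra tag pushed by the container job and used as the build cache.
        "also_tag_and_cache_from": f"{base}:{sanitize_ref_name(ref_name)}",
    }


tags = image_tags(
    "registry.example.com/user/qemu",  # stand-in for $CI_REGISTRY_IMAGE
    "fedora",                          # stand-in for $IMAGE
    "1a2b3c4d",                        # stand-in for $CI_COMMIT_SHORT_SHA
    "feature/ci fixes",                # stand-in for $CI_COMMIT_REF_NAME
)
print(tags)
```

Note that a sanitizer like this can map distinct branch names to the same
tag (for example "feature/x" and "feature-x"), which only affects cache
sharing since build jobs use the commit-based tag. GitLab also predefines
CI_COMMIT_REF_SLUG, which applies a similar but stricter normalisation.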

Regards,
Daniel

Philippe Mathieu-Daudé Feb. 16, 2021, 1:02 p.m. UTC | #5

On 2/16/21 1:43 PM, Daniel P. Berrangé wrote:
> I've thought of a solution here.
> 
> We can tag the images with $CI_COMMIT_SHORT_SHA , and the build jobs
> can reference them with 
> 
>   image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:$CI_COMMIT_SHORT_SHA
> 
> In the container build script, we then *also* tag them with a sanitized
> version of $CI_COMMIT_REF_NAME, and also use this as the cache to pull
> from when building the image.
> 
> The main downside here is that we'll end up creating a lot of tags, but
> most will have the same content so shouldn't be too bad.

This could be automated (for forks):

https://docs.gitlab.com/ee/user/packages/container_registry/#delete-images-by-using-a-cleanup-policy

This is not yet possible for the qemu-project registry, because:

  Cleanup policies can be run on all projects, with these exceptions:

    For GitLab.com, the project must have been created after 2020-02-22.

Regards,

Phil.

Daniel P. Berrangé Feb. 16, 2021, 1:15 p.m. UTC | #6

On Tue, Feb 16, 2021 at 02:02:31PM +0100, Philippe Mathieu-Daudé wrote:
> On 2/16/21 1:43 PM, Daniel P. Berrangé wrote:
> > The main downside here is that we'll end up creating a lot of tags, but
> > most will have the same content so shouldn't be too bad.
> 
> This could be automated (for forks):
> 
> https://docs.gitlab.com/ee/user/packages/container_registry/#delete-images-by-using-a-cleanup-policy
> 
> This is not yet possible for the qemu-project registry, because:
> 
>   Cleanup policies can be run on all projects, with these exceptions:
> 
>     For GitLab.com, the project must have been created after 2020-02-22.

NB, when they say "project" here it appears to refer to the top level
namespace, i.e. your personal account namespace, not the individual repos.

None of my repos allow expiration to be turned on, even repos I only
created last week :-(

It appears they are getting close to removing this restriction though:

  https://gitlab.com/gitlab-org/gitlab/-/issues/196124#note_492157369

so perhaps in the next 6 months expiration will be active.


Regards,
Daniel
