[v2,1/2] CI: limit GitHub Actions to designated branches

Message ID	73de97dfebfccabe9f1bf32ea41aea5008a949cd.1588607262.git.congdanhqx@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=JCNZ=6S=vger.kernel.org=git-owner@kernel.org> From: =?utf-8?b?xJBvw6BuIFRy4bqnbiBDw7RuZyBEYW5o?= <congdanhqx@gmail.com> To: git@vger.kernel.org Cc: =?utf-8?b?xJBvw6BuIFRy4bqnbiBDw7RuZyBEYW5o?= <congdanhqx@gmail.com>, Jeff Hostetler <jeffhost@microsoft.com>, Johannes Schindelin <johannes.schindelin@gmx.de>, Jeff King <peff@peff.net> Subject: [PATCH v2 1/2] CI: limit GitHub Actions to designated branches Date: Mon, 4 May 2020 22:49:31 +0700 Message-Id: <73de97dfebfccabe9f1bf32ea41aea5008a949cd.1588607262.git.congdanhqx@gmail.com> In-Reply-To: <cover.1588607262.git.congdanhqx@gmail.com> References: <20200504150105.GB11373@coredump.intra.peff.net> <cover.1588607262.git.congdanhqx@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: git-owner@vger.kernel.org Precedence: bulk
Series	Limit GitHub Actions to designated branches \| expand [v2,0/2] Limit GitHub Actions to designated branches [v2,1/2] CI: limit GitHub Actions to designated branches [v2,2/2] SubmittingPatches: advertise GitHub Actions CI

Đoàn Trần Công Danh May 4, 2020, 3:49 p.m. UTC

Git's maintainer usually don't have enough time to debug the failure of
invidual feature branch, they usually want to look at intergration
branches.

Contributors now can have GitHub Actions as an opt-in option,
should they want to check their code, they will push into designated
branch.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 .github/workflows/main.yml | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Jeff King May 4, 2020, 4:23 p.m. UTC | #1

On Mon, May 04, 2020 at 10:49:31PM +0700, Đoàn Trần Công Danh wrote:

> diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
> index fd4df939b5..ea43b03092 100644
> --- a/.github/workflows/main.yml
> +++ b/.github/workflows/main.yml
> @@ -1,6 +1,18 @@
>  name: CI/PR
>  
> -on: [push, pull_request]
> +on:
> +  pull_request:
> +    branches:
> +      - '**'

Doing "**" here makes sense to catch everything (it would be even better
if we could just say "everything with a pull request" by omitting the
branch filter entirely, but maybe that's not possible).

> +    tags:
> +      - '*'

Would we want that here, too? I guess nobody is likely to push
"foo/v1.2.3".

Or on the flip side, would we want to tighten this? If I push a tag
"wip", I probably don't want it built. Probably the right rule is
"annotated tags only", but I suspect that's not possible.

> +  push:
> +    branches:
> +      - maint
> +      - master
> +      - next
> +      - jch
> +      - pu

What happened to "for-ci" (presumably "for-ci/**")?

-Peff

Taylor Blau May 4, 2020, 9:58 p.m. UTC | #2

On Mon, May 04, 2020 at 12:23:11PM -0400, Jeff King wrote:
> On Mon, May 04, 2020 at 10:49:31PM +0700, Đoàn Trần Công Danh wrote:
>
> > diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
> > index fd4df939b5..ea43b03092 100644
> > --- a/.github/workflows/main.yml
> > +++ b/.github/workflows/main.yml
> > @@ -1,6 +1,18 @@
> >  name: CI/PR
> >
> > -on: [push, pull_request]
> > +on:
> > +  pull_request:
> > +    branches:
> > +      - '**'
>
> Doing "**" here makes sense to catch everything (it would be even better
> if we could just say "everything with a pull request" by omitting the
> branch filter entirely, but maybe that's not possible).
>
> > +    tags:
> > +      - '*'
>
> Would we want that here, too? I guess nobody is likely to push
> "foo/v1.2.3".
>
> Or on the flip side, would we want to tighten this? If I push a tag
> "wip", I probably don't want it built. Probably the right rule is
> "annotated tags only", but I suspect that's not possible.
>
> > +  push:
> > +    branches:
> > +      - maint
> > +      - master
> > +      - next
> > +      - jch
> > +      - pu
>
> What happened to "for-ci" (presumably "for-ci/**")?

Huh; I'm not sure that I'm sold on the idea of a 'for-ci' namespace
here. In addition to running 'make test' on patches locally before I
send them, I find it tremendously convenient for GitHub to run them for
me when I push 'tb/' branches up to 'ttaylorr/git'.

So, while the above is more-or-less what I'd expect the monitored list
of branches to look like (at least, ignoring the missing 'for-ci/**'
bits), I wish that I could also build every branch that I push up to my
fork.

Of course, I don't want to maintain a one-patch difference between
ttaylorr/git@master and git/git@master, so I wonder if we could get a
little more creative with these rules and actually run Actions on
*every* branch, but introduce a new first step which stops the rest of
the actions run (so that in practice we're not running CI on
non-integration branches in Junio's tree).

I figure that we need something more flexible than the 'push.branches'
list, but I'd be very curious to hear if something like what I'm
describing is possible.

> -Peff

Thanks,
Taylor

Junio C Hamano May 4, 2020, 10:52 p.m. UTC | #3

Taylor Blau <me@ttaylorr.com> writes:

> Huh; I'm not sure that I'm sold on the idea of a 'for-ci' namespace
> here. In addition to running 'make test' on patches locally before I
> send them, I find it tremendously convenient for GitHub to run them for
> me when I push 'tb/' branches up to 'ttaylorr/git'.
>
> So, while the above is more-or-less what I'd expect the monitored list
> of branches to look like (at least, ignoring the missing 'for-ci/**'
> bits), I wish that I could also build every branch that I push up to my
> fork.
>
> Of course, I don't want to maintain a one-patch difference between
> ttaylorr/git@master and git/git@master, so I wonder if we could get a
> little more creative with these rules and actually run Actions on
> *every* branch, but introduce a new first step which stops the rest of
> the actions run (so that in practice we're not running CI on
> non-integration branches in Junio's tree).

Hmph, what are we trying to avoid by using the for-ci/ convention?

If this is only a reaction to what I said earlier (i.e. "building
everything in github.com/gitster/git/ has no value to me"), then I
suspect it may be an over-engineered solution to a problem that does
not exist, and harms people like you.  I could just go there and
turn off GitHub Actions for that repository instead.

Or are there more issues being addressed with the "testing branches
are opt-in, unless a pull request against git/git explicitly says it
is ready to be tested" approach?

Taylor Blau May 4, 2020, 11:15 p.m. UTC | #4

On Mon, May 04, 2020 at 03:52:44PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Huh; I'm not sure that I'm sold on the idea of a 'for-ci' namespace
> > here. In addition to running 'make test' on patches locally before I
> > send them, I find it tremendously convenient for GitHub to run them for
> > me when I push 'tb/' branches up to 'ttaylorr/git'.
> >
> > So, while the above is more-or-less what I'd expect the monitored list
> > of branches to look like (at least, ignoring the missing 'for-ci/**'
> > bits), I wish that I could also build every branch that I push up to my
> > fork.
> >
> > Of course, I don't want to maintain a one-patch difference between
> > ttaylorr/git@master and git/git@master, so I wonder if we could get a
> > little more creative with these rules and actually run Actions on
> > *every* branch, but introduce a new first step which stops the rest of
> > the actions run (so that in practice we're not running CI on
> > non-integration branches in Junio's tree).
>
> Hmph, what are we trying to avoid by using the for-ci/ convention?
>
> If this is only a reaction to what I said earlier (i.e. "building
> everything in github.com/gitster/git/ has no value to me"), then I
> suspect it may be an over-engineered solution to a problem that does
> not exist, and harms people like you.  I could just go there and
> turn off GitHub Actions for that repository instead.

It is a reaction to that, yes. It would be nice to have a middle-ground
where you can run Actions on the official integration branches, but
contributors such as myself run Actions on *every* branch of their fork.

It does feel over-engineered, yes. I would not be surprised if Actions
supports something like this more directly, and I just don't know about
it.

> Or are there more issues being addressed with the "testing branches
> are opt-in, unless a pull request against git/git explicitly says it
> is ready to be tested" approach?

Thanks,
Taylor

Jeff King May 4, 2020, 11:35 p.m. UTC | #5

On Mon, May 04, 2020 at 05:15:44PM -0600, Taylor Blau wrote:

> > If this is only a reaction to what I said earlier (i.e. "building
> > everything in github.com/gitster/git/ has no value to me"), then I
> > suspect it may be an over-engineered solution to a problem that does
> > not exist, and harms people like you.  I could just go there and
> > turn off GitHub Actions for that repository instead.
> 
> It is a reaction to that, yes. It would be nice to have a middle-ground
> where you can run Actions on the official integration branches, but
> contributors such as myself run Actions on *every* branch of their fork.

The problem is that there is no way for people to _not_ run on every
branch of their fork, then. It is either every branch or no branches.

By choosing a prefix, it gives people the option to choose. But it does
place the burden on them of adding an extra refspec:

  refs/heads/*:refs/heads/for-ci/*

or similar. I don't like it, but I don't see any other way to let people
choose some branches but not others (that doesn't involve a one-line
yaml change at the beginning of every branch, which is an even higher
burden).

-Peff

Jeff King May 4, 2020, 11:36 p.m. UTC | #6

On Mon, May 04, 2020 at 03:58:24PM -0600, Taylor Blau wrote:

> Huh; I'm not sure that I'm sold on the idea of a 'for-ci' namespace
> here. In addition to running 'make test' on patches locally before I
> send them, I find it tremendously convenient for GitHub to run them for
> me when I push 'tb/' branches up to 'ttaylorr/git'.
> 
> So, while the above is more-or-less what I'd expect the monitored list
> of branches to look like (at least, ignoring the missing 'for-ci/**'
> bits), I wish that I could also build every branch that I push up to my
> fork.
> 
> Of course, I don't want to maintain a one-patch difference between
> ttaylorr/git@master and git/git@master, so I wonder if we could get a
> little more creative with these rules and actually run Actions on
> *every* branch, but introduce a new first step which stops the rest of
> the actions run (so that in practice we're not running CI on
> non-integration branches in Junio's tree).

I don't understand what that would accomplish. If we ran the actions on
every branch but stopped the run, then you wouldn't get the CI results
you want. What am I missing?

-Peff

Taylor Blau May 5, 2020, 12:20 a.m. UTC | #7

On Mon, May 04, 2020 at 07:36:34PM -0400, Jeff King wrote:
> On Mon, May 04, 2020 at 03:58:24PM -0600, Taylor Blau wrote:
>
> > Huh; I'm not sure that I'm sold on the idea of a 'for-ci' namespace
> > here. In addition to running 'make test' on patches locally before I
> > send them, I find it tremendously convenient for GitHub to run them for
> > me when I push 'tb/' branches up to 'ttaylorr/git'.
> >
> > So, while the above is more-or-less what I'd expect the monitored list
> > of branches to look like (at least, ignoring the missing 'for-ci/**'
> > bits), I wish that I could also build every branch that I push up to my
> > fork.
> >
> > Of course, I don't want to maintain a one-patch difference between
> > ttaylorr/git@master and git/git@master, so I wonder if we could get a
> > little more creative with these rules and actually run Actions on
> > *every* branch, but introduce a new first step which stops the rest of
> > the actions run (so that in practice we're not running CI on
> > non-integration branches in Junio's tree).
>
> I don't understand what that would accomplish. If we ran the actions on
> every branch but stopped the run, then you wouldn't get the CI results
> you want. What am I missing?

That on forks of git/git we *would't* stop the run for non-integration
branches, i.e., that we'd have something like:

  * Actions is running on all branches, of all forks, all the time.

  * If a branch is pushed to git/git, and it's not an integration
    branch, the run is aborted early.

  * If a branch is pushed to a fork of git/git, it's allowed to run no
    matter what.

Maybe it's an over-engineered solution to a problem that we don't
have, though. Junio seems happy to simply disable Actions on git/git.
OTOH, I think it would be OK to skip the second bullet point, and
have Junio disable Actions notifications.

/shrug

> -Peff

Thanks,
Taylor

Junio C Hamano May 5, 2020, 12:24 a.m. UTC | #8

Jeff King <peff@peff.net> writes:

> On Mon, May 04, 2020 at 05:15:44PM -0600, Taylor Blau wrote:
>
>> > If this is only a reaction to what I said earlier (i.e. "building
>> > everything in github.com/gitster/git/ has no value to me"), then I
>> > suspect it may be an over-engineered solution to a problem that does
>> > not exist, and harms people like you.  I could just go there and
>> > turn off GitHub Actions for that repository instead.
>> 
>> It is a reaction to that, yes. It would be nice to have a middle-ground
>> where you can run Actions on the official integration branches, but
>> contributors such as myself run Actions on *every* branch of their fork.
>
> The problem is that there is no way for people to _not_ run on every
> branch of their fork, then. It is either every branch or no branches.

OK.

> .... I don't like it, but I don't see any other way to let people
> choose some branches but not others (that doesn't involve a one-line
> yaml change at the beginning of every branch, which is an even higher
> burden).

True, and having to mark a commit object with "do not run CI on this",
which is what I heard Travis does, is even worse.

Đoàn Trần Công Danh May 5, 2020, 12:34 a.m. UTC | #9

On 2020-05-04 12:23:11-0400, Jeff King <peff@peff.net> wrote:
> On Mon, May 04, 2020 at 10:49:31PM +0700, Đoàn Trần Công Danh wrote:
> Doing "**" here makes sense to catch everything (it would be even better
> if we could just say "everything with a pull request" by omitting the
> branch filter entirely, but maybe that's not possible).

When I was doing this, I couldn't create a fork of my fork to check
the syntax for GitHub PR. So, I pick a safe step.

Turn out, I can create PR against my own fork.
And, it's possible to omit the branch filter entirely.

> > +    tags:
> > +      - '*'
> 
> Would we want that here, too? I guess nobody is likely to push
> "foo/v1.2.3".
> 
> Or on the flip side, would we want to tighten this? If I push a tag
> "wip", I probably don't want it built. Probably the right rule is
> "annotated tags only", but I suspect that's not possible.

From my reading, GitHub Actions only accepts filter by refname.

From GitHub manual, we can limit tags selection by: 
-------------8<------
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index adf8824af1..9bba0ce068 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -11,7 +11,8 @@ on:
       - pu
       - 'for-ci**'
     tags:
-      - '*'
+      - '**'
+      - '!**wip**'
 
 env:
   DEVELOPER: 1
---------------->8-------------

But, I'm running into GitHub internal error with this snippet.
I'll look into it later.

> 
> > +  push:
> > +    branches:
> > +      - maint
> > +      - master
> > +      - next
> > +      - jch
> > +      - pu
> 
> What happened to "for-ci" (presumably "for-ci/**")?

Hm, it was lost when I created a lot of branch for testing and I send the incorrect one out.
It'll be correct next time.

Jeff King May 5, 2020, 4:43 p.m. UTC | #10

On Mon, May 04, 2020 at 06:20:55PM -0600, Taylor Blau wrote:

> > > Of course, I don't want to maintain a one-patch difference between
> > > ttaylorr/git@master and git/git@master, so I wonder if we could get a
> > > little more creative with these rules and actually run Actions on
> > > *every* branch, but introduce a new first step which stops the rest of
> > > the actions run (so that in practice we're not running CI on
> > > non-integration branches in Junio's tree).
> >
> > I don't understand what that would accomplish. If we ran the actions on
> > every branch but stopped the run, then you wouldn't get the CI results
> > you want. What am I missing?
> 
> That on forks of git/git we *would't* stop the run for non-integration
> branches, i.e., that we'd have something like:

Ah, so the first step is "if we are gitster/git, then do not run", not
"do not run".

But that is missing the point of the exercise, no? The question of what
gitster/git should do was a side conversation. The purpose of Dscho's
original patch and Danh's followup was to allow anybody to choose which
branches in their own fork. I.e.:

>   * Actions is running on all branches, of all forks, all the time.

This is how it works now, and is the exact thing we are trying to fix.

-Peff

Junio C Hamano May 5, 2020, 5:57 p.m. UTC | #11

Jeff King <peff@peff.net> writes:

> But that is missing the point of the exercise, no? The question of what
> gitster/git should do was a side conversation. The purpose of Dscho's
> original patch and Danh's followup was to allow anybody to choose which
> branches in their own fork. I.e.:
>
>>   * Actions is running on all branches, of all forks, all the time.
>
> This is how it works now, and is the exact thing we are trying to fix.

Thanks for clarifying and refocusing the discussion.  I am onboard.

It seems to me that there are only two and half approaches, then:

 - Branch-build is opt-in; only branches that match selected, known,
   and fixed patterns will be built (e.g. 'maint', 'maint-*',
   'master', 'next', 'pu', and 'for-ci/*').

 - Branch-build is opt-out; branches that match selected, known, and
   fixed patterns will be excluded (e.g., '*-wip').

 - If you do not want your branches want to be skipped, you need to
   tweak the commit at the tip (e.g. mark with '[skip ci]' log
   message, munging .github/workflows/ in the tree).

The last one is only there for completeness.  I do not think mucking
with the objects recorded in the history, whether it is a tweaked
log message or tweaked tree contents, is a good way to do this.

More random ideas...

Would it be too much hassle to use notes for a thing like this?
Perhaps push out with refs/notes/skip-ci note attached to a commit
you do not want to be built?  I have a feeling that it gives way
overkill flexibility with little gain (probably too cumbersome to
manage).

Does push into GitHub repository offer an ability to pass arbitrary
push option, to which actions that trigger "on: push" event can
react?

Jeff King May 5, 2020, 6:24 p.m. UTC | #12

On Tue, May 05, 2020 at 10:57:39AM -0700, Junio C Hamano wrote:

> Would it be too much hassle to use notes for a thing like this?
> Perhaps push out with refs/notes/skip-ci note attached to a commit
> you do not want to be built?  I have a feeling that it gives way
> overkill flexibility with little gain (probably too cumbersome to
> manage).

I think using notes would be a hassle. This config is really associated
with a branch, not a particular config (so you'd have to make sure they
propagate across rebases, etc).

But _if_ we can read from other refs in the repository, I would be very
happy if we parsed config out of refs/ci/branches or something. It feels
like that's something that ought to be possible, but I haven't quite
figured out a way to do it.

Really all we want is some kind of per-repo variable storage where the
values aren't baked into the tree. There is a "secrets" system that can
be used for this, though it kind of feels like an abuse of the concept.

> Does push into GitHub repository offer an ability to pass arbitrary
> push option, to which actions that trigger "on: push" event can
> react?

No, I don't think so.

-Peff

Jeff King May 5, 2020, 9:04 p.m. UTC | #13

On Tue, May 05, 2020 at 02:24:18PM -0400, Jeff King wrote:

> But _if_ we can read from other refs in the repository, I would be very
> happy if we parsed config out of refs/ci/branches or something. It feels
> like that's something that ought to be possible, but I haven't quite
> figured out a way to do it.

OK, I finally figured this out. The result is the patch below, which I
think should make everybody happy. Or at least has the ability to do so
if they're willing to push a config ref. ;)

-- >8 --
Subject: [PATCH] ci: allow per-branch config for GitHub Actions

Depending on the workflows of individual developers, it can either be
convenient or annoying that our GitHub Actions CI jobs are run on every
branch. As an example of annoying: if you carry many half-finished
work-in-progress branches and rebase them frequently against master,
you'd get tons of failure reports that aren't interesting (not to
mention the wasted CPU).

This commit adds a new job which checks a special ref within the
repository for CI config, and (optionally) causes all of the other jobs
to be skipped.

There have been a few alternatives discussed:

One option is to carry information in the commit itself about whether it
should be tested, either in the tree itself (changing the workflow YAML
file) or in the commit message (a "[skip ci]" flag or similar). But
these are frustrating and error-prone to use:

  - you have to manually apply them to each branch that you want to mark

  - it's easy for them to leak into other workflows, like emailing patches

We could likewise try to get some information from the branch name. But
that leads to debates about whether the default should be "off" or "on",
and overriding still ends up somewhat awkward. If we default to "on",
you have to remember to name your branches appropriately to skip CI. And
if "off", you end up having to contort your branch names or duplicate
your pushes with an extra refspec.

By comparison, this commit's solution lets you specify your config once
and forget about it, and all of the data is off in its own ref, where it
can be changed by individual forks without touching the main tree.

I used refs/ci/config as the config ref, which should be a commit whose
tree contains various config files (right now the only one is
"ref-whitelist"). It was intentional to avoid refs/heads/ here so we
don't conflict with any branch workflows. But it does make it a little
awkward to edit, since you can't check it out directly.

Right now the logic is to run CI for all branches by default, unless a
whitelist exists, in which case the branch must be mentioned there
(using its fully qualified ref name). We could easily add in a
blacklist, as well. Or since we're running a shell in a VM, we really
could just run "./allow-ref $refname" and let individual forks specify
whatever shell code they like.

Signed-off-by: Jeff King <peff@peff.net>
---
After writing that, I think we probably ought to just do the allow-ref
thing from the start, and skip this whitelist logic. Then we should
never need to change this workflow file again. People can implement
whatever weird custom logic they want to.

 .github/workflows/main.yml | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index fd4df939b5..51f4ff6e89 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -6,7 +6,29 @@ env:
   DEVELOPER: 1

 jobs:
+  check-ci:
+      runs-on: ubuntu-latest
+      outputs:
+        enabled: ${{ steps.check-ref.outputs.enabled }}
+      steps:
+        - uses: actions/checkout@v2
+          continue-on-error: true
+          with:
+            ref: refs/ci/config
+        - id: check-ref
+          name: check whether CI is enabled for ref
+          run: |
+            enabled=yes
+            if test -e ref-whitelist &&
+               ! grep '^${{ github.ref }}$' ref-whitelist
+            then
+              enabled=no
+            fi
+            echo "::set-output name=enabled::$enabled"
+
   windows-build:
+    needs: check-ci
+    if: needs.check-ci.outputs.enabled == 'yes'
     runs-on: windows-latest
     steps:
     - uses: actions/checkout@v1
@@ -70,6 +92,8 @@ jobs:
         name: failed-tests-windows
         path: ${{env.FAILED_TEST_ARTIFACTS}}
   vs-build:
+    needs: check-ci
+    if: needs.check-ci.outputs.enabled == 'yes'
     env:
       MSYSTEM: MINGW64
       NO_PERL: 1
@@ -154,6 +178,8 @@ jobs:
                           ${{matrix.nr}} 10 t[0-9]*.sh)
         "@
   regular:
+    needs: check-ci
+    if: needs.check-ci.outputs.enabled == 'yes'
     strategy:
       matrix:
         vector:
@@ -189,6 +215,8 @@ jobs:
         name: failed-tests-${{matrix.vector.jobname}}
         path: ${{env.FAILED_TEST_ARTIFACTS}}
   dockerized:
+    needs: check-ci
+    if: needs.check-ci.outputs.enabled == 'yes'
     strategy:
       matrix:
         vector:
@@ -213,6 +241,8 @@ jobs:
         name: failed-tests-${{matrix.vector.jobname}}
         path: ${{env.FAILED_TEST_ARTIFACTS}}
   static-analysis:
+    needs: check-ci
+    if: needs.check-ci.outputs.enabled == 'yes'
     env:
       jobname: StaticAnalysis
     runs-on: ubuntu-latest
@@ -221,6 +251,8 @@ jobs:
     - run: ci/install-dependencies.sh
     - run: ci/run-static-analysis.sh
   documentation:
+    needs: check-ci
+    if: needs.check-ci.outputs.enabled == 'yes'
     env:
       jobname: Documentation
     runs-on: ubuntu-latest

Junio C Hamano May 5, 2020, 9:29 p.m. UTC | #14

Jeff King <peff@peff.net> writes:

> On Tue, May 05, 2020 at 02:24:18PM -0400, Jeff King wrote:
>
>> But _if_ we can read from other refs in the repository, I would be very
>> happy if we parsed config out of refs/ci/branches or something. It feels
>> like that's something that ought to be possible, but I haven't quite
>> figured out a way to do it.
>
> OK, I finally figured this out. The result is the patch below, which I
> think should make everybody happy. Or at least has the ability to do so
> if they're willing to push a config ref. ;)

That sounds good.

> Subject: [PATCH] ci: allow per-branch config for GitHub Actions
>
> Depending on the workflows of individual developers, it can either be
> convenient or annoying that our GitHub Actions CI jobs are run on every
> branch. As an example of annoying: if you carry many half-finished
> work-in-progress branches and rebase them frequently against master,
> you'd get tons of failure reports that aren't interesting (not to
> mention the wasted CPU).

OK.

> This commit adds a new job which checks a special ref within the
> repository for CI config, and (optionally) causes all of the other jobs
> to be skipped.

Nice---that way, all existing jobs do not even need to know about
the special controlling ref.

> Right now the logic is to run CI for all branches by default, unless a
> whitelist exists, in which case the branch must be mentioned there
> (using its fully qualified ref name).

So there is no wildcard?  Not really complaining, but am wondering.

> We could easily add in a
> blacklist, as well.

OK.

> Or since we're running a shell in a VM, we really
> could just run "./allow-ref $refname" and let individual forks specify
> whatever shell code they like.

I presume that you are saying "checking out the tree of refs/ci/config
and there is a program allow-ref that can tell which one to run ci on"?

> After writing that, I think we probably ought to just do the allow-ref
> thing from the start, and skip this whitelist logic. Then we should
> never need to change this workflow file again. People can implement
> whatever weird custom logic they want to.

Probably.

>  jobs:
> +  check-ci:
> +      runs-on: ubuntu-latest
> +      outputs:
> +        enabled: ${{ steps.check-ref.outputs.enabled }}
> +      steps:
> +        - uses: actions/checkout@v2
> +          continue-on-error: true
> +          with:
> +            ref: refs/ci/config
> +        - id: check-ref
> +          name: check whether CI is enabled for ref
> +          run: |
> +            enabled=yes
> +            if test -e ref-whitelist &&
> +               ! grep '^${{ github.ref }}$' ref-whitelist
> +            then
> +              enabled=no
> +            fi
> +            echo "::set-output name=enabled::$enabled"
> +
>    windows-build:
> +    needs: check-ci
> +    if: needs.check-ci.outputs.enabled == 'yes'
>      runs-on: windows-latest
>      steps:
>      - uses: actions/checkout@v1

Oh, quite nice.  Simple and clean.

Jeff King May 5, 2020, 9:58 p.m. UTC | #15

On Tue, May 05, 2020 at 02:29:09PM -0700, Junio C Hamano wrote:

> > This commit adds a new job which checks a special ref within the
> > repository for CI config, and (optionally) causes all of the other jobs
> > to be skipped.
> 
> Nice---that way, all existing jobs do not even need to know about
> the special controlling ref.

Yep. Also, it only has to run once. It's not too expensive, but it does
seem to take a few seconds to run.

I think the VM setup is usually pretty quick due to the way Actions is
implemented. I made sure we avoid cloning the whole normal git.git tree,
but we still have to clone the individual config branch. It seems to
take ~3-5s to run that job. I'd have liked to make it even cheaper for
branches which don't want CI (less for latency, but more just because I
hate to waste GitHub's resources), but I think that's the best we can
do. And for cases where we actually run CI, 3s is a drop in the bucket. ;)

> > Right now the logic is to run CI for all branches by default, unless a
> > whitelist exists, in which case the branch must be mentioned there
> > (using its fully qualified ref name).
> 
> So there is no wildcard?  Not really complaining, but am wondering.

Correct, there's no wildcard. It would be easy to add one, now that this
gets us to a working shell environment with the refname available. And
probably worth doing if we don't just go the "execute a program from the
config tree" route.

> > Or since we're running a shell in a VM, we really
> > could just run "./allow-ref $refname" and let individual forks specify
> > whatever shell code they like.
> 
> I presume that you are saying "checking out the tree of refs/ci/config
> and there is a program allow-ref that can tell which one to run ci on"?

Yes, exactly.

-Peff

Junio C Hamano May 5, 2020, 10:28 p.m. UTC | #16

Jeff King <peff@peff.net> writes:

>> > Or since we're running a shell in a VM, we really
>> > could just run "./allow-ref $refname" and let individual forks specify
>> > whatever shell code they like.
>> 
>> I presume that you are saying "checking out the tree of refs/ci/config
>> and there is a program allow-ref that can tell which one to run ci on"?
>
> Yes, exactly.

I guess a simple example implementation of allow-ref script, with a
bit of instruction to tell people how to initialize a separate and
otherwise empty repository with just a (possibly customized) copy of
the script in it, and push its 'master' branch to refs/ci/config of
their Git fork, would be a way to go, then?

Sounds simple enough.  Cycles to spin up a VM that adds 3 seconds
latency, only to know that a branch won't need to be built, does
sound like a bit unfortunate, though.

Đoàn Trần Công Danh May 6, 2020, 12:46 a.m. UTC | #17

On 2020-05-05 17:04:51-0400, Jeff King <peff@peff.net> wrote:
> We could likewise try to get some information from the branch name. But
> that leads to debates about whether the default should be "off" or "on",
> and overriding still ends up somewhat awkward. If we default to "on",
> you have to remember to name your branches appropriately to skip CI. And
> if "off", you end up having to contort your branch names or duplicate
> your pushes with an extra refspec.
> 
> By comparison, this commit's solution lets you specify your config once
> and forget about it, and all of the data is off in its own ref, where it
> can be changed by individual forks without touching the main tree.

How about supports the best of both worlds.

Let's say support wildcard 'wip/**' for opt-out.
And use "./allow-ref" to filter everything that passed the wildcard.

> I used refs/ci/config as the config ref, which should be a commit whose
> tree contains various config files (right now the only one is
> "ref-whitelist"). It was intentional to avoid refs/heads/ here so we
> don't conflict with any branch workflows. But it does make it a little
> awkward to edit, since you can't check it out directly.
> 
> Right now the logic is to run CI for all branches by default, unless a
> whitelist exists, in which case the branch must be mentioned there
> (using its fully qualified ref name). We could easily add in a
> blacklist, as well. Or since we're running a shell in a VM, we really
> could just run "./allow-ref $refname" and let individual forks specify
> whatever shell code they like.

Should we go with this route, here is a fix-up patch for your,
(after cherry-pick my [1/3])
--------------------8<-------------------
Subject: [PATCH] fixup! ci: allow per-branch config for GitHub Actions

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 .github/workflows/main.yml      |  3 +--
 Documentation/SubmittingPatches | 12 ++++++++++++
 contrib/ci-config-allow-ref     |  9 +++++++++
 3 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100755 contrib/ci-config-allow-ref

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 51f4ff6e89..08217c5ed8 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -19,8 +19,7 @@ jobs:
           name: check whether CI is enabled for ref
           run: |
             enabled=yes
-            if test -e ref-whitelist &&
-               ! grep '^${{ github.ref }}$' ref-whitelist
+            if test -x allow-ref && ! ./allow-ref '${{ github.ref }}'
             then
               enabled=no
             fi
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 8686318550..8175424929 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -82,6 +82,18 @@ Alternately, you can use GitHub Actions (which supports testing your changes
 on Linux, macOS, and Windows) by pushing into a branch in your fork
 or opening a GitHub's Pull Request against
 https://github.com/git/git.git or a fork of that repository.
+In the event that you only want to trigger GitHub Actions for specific
+refname, you can create an executable file named `allow-ref` in
+`refs/ci/config`. Those below steps may help you:
+--------------
+$ git checkout --orphan ci-config
+$ cp contrib/ci-config-allow-ref allow-ref
+$ $EDITOR allow-ref
+$ git rm -rf .
+$ git commit allow-ref
+$ git push <your-fork> HEAD:refs/ci/config
+--------------
+
 
 Do not forget to update the documentation to describe the updated
 behavior and make sure that the resulting documentation set formats
diff --git a/contrib/ci-config-allow-ref b/contrib/ci-config-allow-ref
new file mode 100755
index 0000000000..b53e9ddbd0
--- /dev/null
+++ b/contrib/ci-config-allow-ref
@@ -0,0 +1,9 @@
+#!/bin/sh
+# Sample filter for GitHub Actions
+# GitHub Actions will run if and only if this script exit with zero status
+
+REFNAME="$1"
+
+case "$REFNAME" in
+refs/heads/no-ci*) exit 1 ;;
+esac

Junio C Hamano May 6, 2020, 3:56 a.m. UTC | #18

Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:

> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 8686318550..8175424929 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -82,6 +82,18 @@ Alternately, you can u
>  on Linux, macOS, and Windows) by pushing into a branch in your fork
>  or opening a GitHub's Pull Request against
>  https://github.com/git/git.git or a fork of that repository.
> +In the event that you only want to trigger GitHub Actions for specific
> +refname, you can create an executable file named `allow-ref` in
> +`refs/ci/config`. Those below steps may help you:

"These steps below" or "The follwoing steps", perhaps.

> +--------------
> +$ git checkout --orphan ci-config
> +$ cp contrib/ci-config-allow-ref allow-ref
> +$ $EDITOR allow-ref
> +$ git rm -rf .

This sounds horrible.  You just nuked the entire files in the
working tree you use for your everyday Git hacking to edit a
single file.

> +$ git commit allow-ref

If allow-ref were added to the index before this "git commit", the
previous "git rm -rf ." would have removed it.  Since the surviving
allow-ref file must have been untracked, this "git commit" would not
commit anything.  Forgot to "git add"?

> +$ git push <your-fork> HEAD:refs/ci/config
> +--------------

>  Do not forget to update the documentation to describe the updated
>  behavior and make sure that the resulting documentation set formats
> diff --git a/contrib/ci-config-allow-ref b/contrib/ci-config-allow-ref
> new file mode 100755
> index 0000000000..b53e9ddbd0
> --- /dev/null
> +++ b/contrib/ci-config-allow-ref
> @@ -0,0 +1,9 @@
> +#!/bin/sh
> +# Sample filter for GitHub Actions
> +# GitHub Actions will run if and only if this script exit with zero status

s/exit/exits/;

As the instruction above says, we should set the example and
describe the behaviour we implemented initially.  Something as basic
like ...

    # Build any branch other than those whose name begins with "no-ci"

> +
> +REFNAME="$1"
> +
> +case "$REFNAME" in
> +refs/heads/no-ci*) exit 1 ;;
> +esac

Đoàn Trần Công Danh May 6, 2020, 2:25 p.m. UTC | #19

On 2020-05-05 20:56:58-0700, Junio C Hamano <gitster@pobox.com> wrote:
> Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:
> > +--------------
> > +$ git checkout --orphan ci-config
> > +$ cp contrib/ci-config-allow-ref allow-ref
> > +$ $EDITOR allow-ref
> > +$ git rm -rf .
> 
> This sounds horrible.  You just nuked the entire files in the
> working tree you use for your everyday Git hacking to edit a
> single file.

It isn't that horrible as it sounds. It only removes the files that are
currently added in index, which is the same with tracked files in old
branch, and we can get it back by switching back to old branch.

I decided to make an orphanage branch because I would like to save
time and network bandwidth for the "check-ci" jobs. Since GitHub will
fetch only single branch in GitHub Actions:

	/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --progress --no-recurse-submodules --depth=1 origin +refs/ci/config:refs/ci/config

I wonder whether the "git rm -rf ." makes that block sounds horrible?
If that is the case, we can use the experimental git-switch(1)
instead, it's doing more-or-less the same (or is it the same?) with
"git checkout --orphan" and "git rm -rf ."

-----------------
$ cp contrib/ci-config-allow-ref.sample allow-ref
$ git switch --orphan ci-config
$ edit allow-ref
$ git add allow-ref
$ git commit
-----------------

Note: I changed `$EDITOR` to `edit` to match other example of `edit`.

> As the instruction above says, we should set the example and
> describe the behaviour we implemented initially.  Something as basic
> like ...
> 
>     # Build any branch other than those whose name begins with "no-ci"

That's definitely better.
And I think we should add `.sample` suffix to this file, too

If all of that's sensible enough, here is the replacement fix-up patch.
--------------------8<------------------
Subject: [PATCH] fixup! ci: allow per-branch config for GitHub Actions

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 .github/workflows/main.yml         |  3 +--
 Documentation/SubmittingPatches    | 12 ++++++++++++
 contrib/ci-config-allow-ref.sample | 12 ++++++++++++
 3 files changed, 25 insertions(+), 2 deletions(-)
 create mode 100755 contrib/ci-config-allow-ref.sample

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 51f4ff6e89..08217c5ed8 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -19,8 +19,7 @@ jobs:
           name: check whether CI is enabled for ref
           run: |
             enabled=yes
-            if test -e ref-whitelist &&
-               ! grep '^${{ github.ref }}$' ref-whitelist
+            if test -x allow-ref && ! ./allow-ref '${{ github.ref }}'
             then
               enabled=no
             fi
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index b916f07f2c..bf06284f29 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -82,6 +82,18 @@ Alternately, you can use GitHub Actions (which supports testing your changes
 on Linux, macOS, and Windows) by pushing into a branch in your fork
 or opening a GitHub Pull Request against
 https://github.com/git/git.git or a fork of that repository.
+In the event that you only want to trigger GitHub Actions for specific
+refname, you can create an executable file named `allow-ref` in
+`refs/ci/config`. The following steps may help you:
+--------------
+$ cp contrib/ci-config-allow-ref.sample allow-ref
+$ git switch --orphan <a-branch-name-of-your-choice>
+$ edit allow-ref
+$ git add allow-ref
+$ git commit allow-ref
+$ git push <your-fork> HEAD:refs/ci/config
+--------------
+
 
 Do not forget to update the documentation to describe the updated
 behavior and make sure that the resulting documentation set formats
diff --git a/contrib/ci-config-allow-ref.sample b/contrib/ci-config-allow-ref.sample
new file mode 100755
index 0000000000..1973e88912
--- /dev/null
+++ b/contrib/ci-config-allow-ref.sample
@@ -0,0 +1,12 @@
+#!/bin/sh
+# Sample filter for GitHub Actions
+# GitHub Actions will run if and only if this script exits with zero status
+
+# This sample filter will ask GitHub Actions to build
+# any branches whose name doesn't start with "no-ci"
+
+REFNAME="$1"
+
+case "$REFNAME" in
+refs/heads/no-ci*) exit 1 ;;
+esac

Johannes Schindelin May 6, 2020, 3:09 p.m. UTC | #20

Hi,

On Tue, 5 May 2020, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> >  jobs:
> > +  check-ci:
> > +      runs-on: ubuntu-latest
> > +      outputs:
> > +        enabled: ${{ steps.check-ref.outputs.enabled }}
> > +      steps:
> > +        - uses: actions/checkout@v2
> > +          continue-on-error: true
> > +          with:
> > +            ref: refs/ci/config
> > +        - id: check-ref
> > +          name: check whether CI is enabled for ref
> > +          run: |
> > +            enabled=yes
> > +            if test -e ref-whitelist &&
> > +               ! grep '^${{ github.ref }}$' ref-whitelist
> > +            then
> > +              enabled=no
> > +            fi
> > +            echo "::set-output name=enabled::$enabled"
> > +
> >    windows-build:
> > +    needs: check-ci
> > +    if: needs.check-ci.outputs.enabled == 'yes'
> >      runs-on: windows-latest
> >      steps:
> >      - uses: actions/checkout@v1
>
> Oh, quite nice.  Simple and clean.

The idea is indeed very neat. I think we can do a bit better with resource
usage by not even bothering to check this branch out. Something along
those lines (sorry, I really would love to have the time to test this...):

      - id: check-ref
        name: check whether CI is enabled for ref
        uses: actions/github-script@0.9.0
        with:
          script: |
            const req = {
              owner: context.repo.owner,
              repo: context.repo.repo,
              ref: "ci/config"
            };

            try {
              req.tree_sha = (await github.git.getRef(req)).data.object.sha;
              (await github.git.getTree(req))
              .tree.filter(e => e.path == 'ref-whitelist').map(e => {
                req.file_sha = e.sha;
              });
              const list = Buffer.from((await github.git.getBlob(req)).data.content, 'base64').toString('UTF-8');
              core.setOutput('enabled', `\n${list}`.indexOf(`\n${{github.ref}}\n`) < 0 ? 'no' : 'yes');
            } catch (e) {
              core.setOutput('enabled', 'yes');
            }

Ciao,
Dscho

Junio C Hamano May 6, 2020, 4:26 p.m. UTC | #21

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> The idea is indeed very neat. I think we can do a bit better with resource
> usage by not even bothering to check this branch out. Something along
> those lines (sorry, I really would love to have the time to test this...):

Nice.  I view the latest round of Peff's idea "allow-ref" as
"because we are spinning a VM anyway, why not just run an end-user
supplied script and let it decide?" and not as "we must have a
Turing complete flexibility so let's run a script in a VM".  In
other words, it may be overkill and we may strike a better tradeoff
by living with reduced flexibility if there is a way to avoid the
associated cost.

But doesn't this (i.e. uses: actions/github-script) still pay the
cost of spinning up a VM?  How expensive is it to check out a small
tree with a single file, whether it is ref-whitelist or allow-ref?

If the cost to check out a tree is dwarfed, it probably is not worth
pursuing the approach to use a canned "don't customize, just use"
logic in the workflow file to read a configuration file that can
only list full names of allowed refs.

Thanks.

>       - id: check-ref
>         name: check whether CI is enabled for ref
>         uses: actions/github-script@0.9.0
>         with:
>           script: |
>             const req = {
>               owner: context.repo.owner,
>               repo: context.repo.repo,
>               ref: "ci/config"
>             };
>
>             try {
>               req.tree_sha = (await github.git.getRef(req)).data.object.sha;
>               (await github.git.getTree(req))
>               .tree.filter(e => e.path == 'ref-whitelist').map(e => {
>                 req.file_sha = e.sha;
>               });
>               const list = Buffer.from((await github.git.getBlob(req)).data.content, 'base64').toString('UTF-8');
>               core.setOutput('enabled', `\n${list}`.indexOf(`\n${{github.ref}}\n`) < 0 ? 'no' : 'yes');
>             } catch (e) {
>               core.setOutput('enabled', 'yes');
>             }
>
> Ciao,
> Dscho

Junio C Hamano May 6, 2020, 4:31 p.m. UTC | #22

Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:

> On 2020-05-05 20:56:58-0700, Junio C Hamano <gitster@pobox.com> wrote:
>> Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:
>> > +--------------
>> > +$ git checkout --orphan ci-config
>> > +$ cp contrib/ci-config-allow-ref allow-ref
>> > +$ $EDITOR allow-ref
>> > +$ git rm -rf .
>> 
>> This sounds horrible.  You just nuked the entire files in the
>> working tree you use for your everyday Git hacking to edit a
>> single file.
>
> It isn't that horrible as it sounds. It only removes the files that are
> currently added in index, which is the same with tracked files in old
> branch, and we can get it back by switching back to old branch.
>
> I decided to make an orphanage branch because I would like to save
> time and network bandwidth for the "check-ci" jobs.

I didn't say it is wrong to record a tree with a single file
(allow-ref) in a commit that is pointed by the ci-config ref.

I would have expected you to create such a commit in an otherwise
empty repository, and push into your fork of Git at GitHub.  That
way, you won't have to checkout and/or refresh the index all of the
3800+ files.

> I wonder whether the "git rm -rf ." makes that block sounds horrible?
> If that is the case, we can use the experimental git-switch(1)
> instead, it's doing more-or-less the same (or is it the same?) with
> "git checkout --orphan" and "git rm -rf ."

That does not change anything, as abuse of your primary repository
is what offends me.

Đoàn Trần Công Danh May 7, 2020, 12:01 p.m. UTC | #23

On 2020-05-06 17:09:39+0200, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> The idea is indeed very neat. I think we can do a bit better with resource
> usage by not even bothering to check this branch out. Something along
> those lines (sorry, I really would love to have the time to test this...):

While this can avoid the cost of checking out a whole branch (which
can be mitigated by using an orphan branch with single file),

This still spins up an VM, and actions/github-script run (I think)
nodejs, which is more resource intensive than git and sh script.
Above statement maybe wrong, I'm not interacting much with nodejs.

>       - id: check-ref
>         name: check whether CI is enabled for ref
>         uses: actions/github-script@0.9.0
>         with:
>           script: |
>             const req = {
>               owner: context.repo.owner,
>               repo: context.repo.repo,
>               ref: "ci/config"
>             };
> 
>             try {
>               req.tree_sha = (await github.git.getRef(req)).data.object.sha;
>               (await github.git.getTree(req))
>               .tree.filter(e => e.path == 'ref-whitelist').map(e => {
>                 req.file_sha = e.sha;
>               });
>               const list = Buffer.from((await github.git.getBlob(req)).data.content, 'base64').toString('UTF-8');
>               core.setOutput('enabled', `\n${list}`.indexOf(`\n${{github.ref}}\n`) < 0 ? 'no' : 'yes');

And this `indexOf` will check if our ref (exact) matchs (full line)
with some white-list list, which is very limited.
So people couldn't match by some pattern (grep can work).

I haven't tested, but we may use part of above script to read a single
file from a ref, and add another steps for "grep"/"sh"
I'm not sure if that script will cost more resources than git-checkout
or not. And is that solutions over-engineered?

Jeff King May 7, 2020, 12:17 p.m. UTC | #24

On Wed, May 06, 2020 at 09:26:53AM -0700, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > The idea is indeed very neat. I think we can do a bit better with resource
> > usage by not even bothering to check this branch out. Something along
> > those lines (sorry, I really would love to have the time to test this...):
> 
> Nice.  I view the latest round of Peff's idea "allow-ref" as
> "because we are spinning a VM anyway, why not just run an end-user
> supplied script and let it decide?" and not as "we must have a
> Turing complete flexibility so let's run a script in a VM".  In
> other words, it may be overkill and we may strike a better tradeoff
> by living with reduced flexibility if there is a way to avoid the
> associated cost.

The script is just javascript, which has an eval(). We could probably
let refs/ci/config:allow-ref be a snippet of javascript that we run.

I do prefer the VM environment to snippets of javascript for a few
reasons:

  - shell scripts match the rest of our Git toolbox (though that's a
    personal preference, and for the amount of code we're talking about
    even _I_ can do a little javascript)

  - it's easy to test your code locally by running "./allow-ref foo".
    Whereas emulating the environment in which those scriptlets are run
    is much trickier (e.g., you need a working github-api object)

That said, if this is more lightweight, I think it's worth exploring.

> But doesn't this (i.e. uses: actions/github-script) still pay the
> cost of spinning up a VM?  How expensive is it to check out a small
> tree with a single file, whether it is ref-whitelist or allow-ref?

I suspect this script mechanism may be much cheaper. I don't know the
implementation details, but spinning up a nodejs container to run a
javascript snippet should be much cheaper than a full ubuntu VM running
"git clone" (the clone itself should be super cheap because it's a
shallow single-branch clone of a tree with one file in it, but getting
there is relatively heavy-weight).

But I could be wrong.  It's clear that they're not spinning up a full
ubuntu vm from scratch to serve the requests (since it happens in 3s).
I'll see if I can get something working and do some timings.

The latency isn't incredibly important. This is all happening async, and
either we'll skip CI (in which case you don't care how long it takes,
since you're not waiting on the result) or CI will take many minutes, in
which case 3s is nothing. But I would like to be respectful of the CPU
spent running Actions and be as lightweight as possible.

-Peff

Jeff King May 7, 2020, 12:25 p.m. UTC | #25

On Wed, May 06, 2020 at 09:31:25AM -0700, Junio C Hamano wrote:

> Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:
> 
> > On 2020-05-05 20:56:58-0700, Junio C Hamano <gitster@pobox.com> wrote:
> >> Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:
> >> > +--------------
> >> > +$ git checkout --orphan ci-config
> >> > +$ cp contrib/ci-config-allow-ref allow-ref
> >> > +$ $EDITOR allow-ref
> >> > +$ git rm -rf .
> >> 
> >> This sounds horrible.  You just nuked the entire files in the
> >> working tree you use for your everyday Git hacking to edit a
> >> single file.
> >
> > It isn't that horrible as it sounds. It only removes the files that are
> > currently added in index, which is the same with tracked files in old
> > branch, and we can get it back by switching back to old branch.
> >
> > I decided to make an orphanage branch because I would like to save
> > time and network bandwidth for the "check-ci" jobs.
> 
> I didn't say it is wrong to record a tree with a single file
> (allow-ref) in a commit that is pointed by the ci-config ref.
> 
> I would have expected you to create such a commit in an otherwise
> empty repository, and push into your fork of Git at GitHub.  That
> way, you won't have to checkout and/or refresh the index all of the
> 3800+ files.

Yeah, I agree that all of the mechanisms for dealing with the unrelated
history are somewhat awkward. Another issue is that you can't just:

  git clone --single-branch -b refs/ci/config my-config

to work on it, because "-b" wants only heads or tags (we could address
that by putting it in refs/heads/ci-config or similar).

If we do go the javascript route, perhaps it would make sense for
refs/ci/config to be a single blob containing a snippet of javascript
with several functions. And then we could just eval() that and call the
appropriate functions (if defined).

Then a sample can live in the main repo with something like:

  # This file contains functions which will be run by the GitHub
  # Actions CI script.
  #
  # You may customize it for your own fork by modifying it on any branch
  # you like, and installing with:
  #
  #  git push <remote> $(git rev-parse HEAD:ci/config):refs/ci/config
  #
  # [allow_ref() sample definition and documentation....]

That sidesteps most of those issues.

-Peff

Đoàn Trần Công Danh May 7, 2020, 12:47 p.m. UTC | #26

On 2020-05-07 19:01:02+0700, Đoàn Trần Công Danh <congdanhqx@gmail.com> wrote:
> On 2020-05-06 17:09:39+0200, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > The idea is indeed very neat. I think we can do a bit better with resource
> > usage by not even bothering to check this branch out. Something along
> > those lines (sorry, I really would love to have the time to test this...):
> 
> While this can avoid the cost of checking out a whole branch (which
> can be mitigated by using an orphan branch with single file),
> 
> This still spins up an VM, and actions/github-script run (I think)
> nodejs, which is more resource intensive than git and sh script.
> Above statement maybe wrong, I'm not interacting much with nodejs.

I was wrong, actions/checkout is also using nodejs,
So, this actions/github-script actual reduces the total time for
fetching the file ref-whitelist/ref-blacklist/allow-ref

> >       - id: check-ref
> >         name: check whether CI is enabled for ref
> >         uses: actions/github-script@0.9.0
> >         with:
> >           script: |
> >             const req = {
> >               owner: context.repo.owner,
> >               repo: context.repo.repo,
> >               ref: "ci/config"
> >             };
> > 
> >             try {
> >               req.tree_sha = (await github.git.getRef(req)).data.object.sha;
> >               (await github.git.getTree(req))
> >               .tree.filter(e => e.path == 'ref-whitelist').map(e => {
> >                 req.file_sha = e.sha;
> >               });
> >               const list = Buffer.from((await github.git.getBlob(req)).data.content, 'base64').toString('UTF-8');
> >               core.setOutput('enabled', `\n${list}`.indexOf(`\n${{github.ref}}\n`) < 0 ? 'no' : 'yes');
> 
> And this `indexOf` will check if our ref (exact) matchs (full line)
> with some white-list list, which is very limited.
> So people couldn't match by some pattern (grep can work).
> 
> I haven't tested, but we may use part of above script to read a single
> file from a ref, and add another steps for "grep"/"sh"
> I'm not sure if that script will cost more resources than git-checkout
> or not. And is that solutions over-engineered?

But this point still hold, now, I think using part of above script to
read the file, and allow more custom logic in a separated steps maybe
better solutions.

Jeff King May 7, 2020, 2:02 p.m. UTC | #27

On Thu, May 07, 2020 at 08:17:27AM -0400, Jeff King wrote:

> > But doesn't this (i.e. uses: actions/github-script) still pay the
> > cost of spinning up a VM?  How expensive is it to check out a small
> > tree with a single file, whether it is ref-whitelist or allow-ref?
> 
> I suspect this script mechanism may be much cheaper. I don't know the
> implementation details, but spinning up a nodejs container to run a
> javascript snippet should be much cheaper than a full ubuntu VM running
> "git clone" (the clone itself should be super cheap because it's a
> shallow single-branch clone of a tree with one file in it, but getting
> there is relatively heavy-weight).

Sorry, this is all complete nonsense. There is no magical nodejs
container in Actions. You still have to say "runs-on: ubuntu-latest". So
it's still spinning up that VM and then running inside there.

I just did a timing with three jobs:

  noop:
    runs-on: ubuntu-latest
    steps:
    - run: exit 0

  script:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/github-script@0.9.0
      with:
        script: |
            const req = {
              owner: context.repo.owner,
              repo: context.repo.repo,
              ref: "refs/ci/config"
            };

            try {
              req.tree_sha = (await github.git.getRef(req)).data.object.sha;
              (await github.git.getTree(req))
              .tree.filter(e => e.path == 'ref-whitelist').map(e => {
                req.file_sha = e.sha;
              });
              const list = Buffer.from((await github.git.getBlob(req)).data.content, 'base64').toString('UTF-8');
              core.setOutput('enabled', `\n${list}`.indexOf(`\n${{github.ref}}\n`) < 0 ? 'no' : 'yes');
            } catch (e) {
              core.setOutput('enabled', 'yes');
            }

  checkout:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
      continue-on-error: true
      with:
        ref: refs/ci/config
    - run: ./allow-ref ${{ github.ref }}

and they took 1, 2, and 3 seconds respectively. They spend 2s getting
the environment set up and the actions loaded. So the API one spent less
than 1s on the network, but the single-file checkout spent slightly
more. Given the timing variations I've seen, I wouldn't be surprised if
it sometimes goes the other way. But even if those numbers are accurate,
I don't think the cost difference is enough to force our hand either
way.

-Peff

Junio C Hamano May 7, 2020, 6:17 p.m. UTC | #28

Jeff King <peff@peff.net> writes:

> On Thu, May 07, 2020 at 08:17:27AM -0400, Jeff King wrote:
>
>> > But doesn't this (i.e. uses: actions/github-script) still pay the
>> > cost of spinning up a VM?  How expensive is it to check out a small
>> > tree with a single file, whether it is ref-whitelist or allow-ref?
>> 
>> I suspect this script mechanism may be much cheaper. I don't know the
>> implementation details, but spinning up a nodejs container to run a
>> javascript snippet should be much cheaper than a full ubuntu VM running
>> "git clone" (the clone itself should be super cheap because it's a
>> shallow single-branch clone of a tree with one file in it, but getting
>> there is relatively heavy-weight).
>
> Sorry, this is all complete nonsense. There is no magical nodejs
> container in Actions. You still have to say "runs-on: ubuntu-latest". So
> it's still spinning up that VM and then running inside there.

Ah, I did see "runs-on: ubuntu-latest" in the tutorial for the node
thing, and was very much dissapointed, before I sent that "don't you
still spin up a VM anyway?" response.  Glad to know that I wasn't
totally misreading the documentation, and unhappy that there wasn't
a magic bullet after all X-<.

> and they took 1, 2, and 3 seconds respectively. They spend 2s getting
> the environment set up and the actions loaded. So the API one spent less
> than 1s on the network, but the single-file checkout spent slightly
> more. Given the timing variations I've seen, I wouldn't be surprised if
> it sometimes goes the other way. But even if those numbers are accurate,
> I don't think the cost difference is enough to force our hand either
> way.

Yup, the above tempts me to say "because we are spinning a VM
anyway, why not just run an end-user supplied script and let it
decide?" would be the best approach.

Unless somebody finds a magic bullet, that is, but unfortunately the
nodejs one does not seem to be one.

Thanks.

Junio C Hamano May 7, 2020, 6:29 p.m. UTC | #29

Jeff King <peff@peff.net> writes:

> Yeah, I agree that all of the mechanisms for dealing with the unrelated
> history are somewhat awkward. Another issue is that you can't just:
>
>   git clone --single-branch -b refs/ci/config my-config
>
> to work on it, because "-b" wants only heads or tags (we could address
> that by putting it in refs/heads/ci-config or similar).

I somehow don't think that it is such a huge issue, because I expect
anybody who has legitimate interest in refs/ci/config to have a full
clone of git.git anyway.  So it is more like defining another
remote.origin.fetch like this:

	 [remote "origin"]
	 	url = https://git.kernel.org/pub/scm/git/git.git/
	 	fetch = +refs/heads/*:refs/remotes/origin/*
	+	fetch = +refs/ci/config:refs/remotes/origin/ci-config

         [remote "publish"]
		url = https://github.com/user/git/


and then do something like:
    
	$ git worktree add -b ci-config ../git-ci-config origin/ci-config
	$ cd ../git-ci-config
	... hack hack hack ...
	$ git push publish ci-config:refs/ci/config

> If we do go the javascript route, perhaps it would make sense for
> refs/ci/config to be a single blob containing a snippet of javascript
> with several functions. And then we could just eval() that and call the
> appropriate functions (if defined).

Yup.

> Then a sample can live in the main repo with something like:
>
>   # This file contains functions which will be run by the GitHub
>   # Actions CI script.
>   #
>   # You may customize it for your own fork by modifying it on any branch
>   # you like, and installing with:
>   #
>   #  git push <remote> $(git rev-parse HEAD:ci/config):refs/ci/config
>   #
>   # [allow_ref() sample definition and documentation....]
>
> That sidesteps most of those issues.

Perhaps.

Jeff King May 7, 2020, 6:54 p.m. UTC | #30

On Thu, May 07, 2020 at 11:29:09AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Yeah, I agree that all of the mechanisms for dealing with the unrelated
> > history are somewhat awkward. Another issue is that you can't just:
> >
> >   git clone --single-branch -b refs/ci/config my-config
> >
> > to work on it, because "-b" wants only heads or tags (we could address
> > that by putting it in refs/heads/ci-config or similar).
> 
> I somehow don't think that it is such a huge issue, because I expect
> anybody who has legitimate interest in refs/ci/config to have a full
> clone of git.git anyway.  So it is more like defining another
> remote.origin.fetch like this:
> 
> 	 [remote "origin"]
> 	 	url = https://git.kernel.org/pub/scm/git/git.git/
> 	 	fetch = +refs/heads/*:refs/remotes/origin/*
> 	+	fetch = +refs/ci/config:refs/remotes/origin/ci-config
> 
>          [remote "publish"]
> 		url = https://github.com/user/git/
> 
> 
> and then do something like:
>     
> 	$ git worktree add -b ci-config ../git-ci-config origin/ci-config
> 	$ cd ../git-ci-config
> 	... hack hack hack ...
> 	$ git push publish ci-config:refs/ci/config

I dunno. The duality of refs/ci/config and refs/heads/ci-config is
weird. Your two refspecs overlap on their RHS, and once you push up the
refs/heads/ci-config you created with "git worktree add", they'll start
to conflict (which maybe is OK as long as they're the same).

In the patch I sent out a few hours ago I just caved and made
refs/heads/ci-config the magic ref, in order to avoid complexity.

But it also wouldn't be too bad to say "look, you can store this however
you like in refs/heads/ or not at all, but install it into place with
git push <remote> HEAD:refs/ci/config". The audience is git.git
developers (and even advanced ones whose workflows involve tweaking
their CI config), so I'd like to think they could figure it out.

I have a feeling that fewer than 5 people in the world will end up using
this feature either way. ;)

-Peff

Junio C Hamano May 7, 2020, 7:33 p.m. UTC | #31

Jeff King <peff@peff.net> writes:

> In the patch I sent out a few hours ago I just caved and made
> refs/heads/ci-config the magic ref, in order to avoid complexity.
>
> But it also wouldn't be too bad to say "look, you can store this however
> you like in refs/heads/ or not at all, but install it into place with
> git push <remote> HEAD:refs/ci/config". The audience is git.git
> developers (and even advanced ones whose workflows involve tweaking
> their CI config), so I'd like to think they could figure it out.
>
> I have a feeling that fewer than 5 people in the world will end up using
> this feature either way. ;)

Yeah, let's not spend too much time and too many cycles to
overengineer it.

Thanks.

[v2,1/2] CI: limit GitHub Actions to designated branches

Commit Message

Comments

Patch