mbox series

[0/7] New execute-commands hook for centralized workflow

Message ID 20200304113312.34229-1-zhiyou.jx@alibaba-inc.com (mailing list archive)
Headers show
Series New execute-commands hook for centralized workflow | expand

Message

Jiang Xin March 4, 2020, 11:33 a.m. UTC
In my planned speech at the Git Merge 2020 conference, this series of
patches is a core part of AGit-Flow (a centralized workflow like
Gerrit). Due to a coronavirus outbreak in China, I was unable to attend
the meeting. I wrote a blog "AGit-Flow and git-repo" based on my planned
speech, see:

    https://git-repo.info/en/2020/03/agit-flow-and-git-repo/ 

Git calls an internal `execute_commands` function to handle commands
sent from the client to `git-receive-pack`.  Regardless of what
references the user pushes, git creates or updates the corresponding
references if the user has write permission.  A contributor who has
no write permission cannot push to repository directly.  So, the
contributor has to write commits to an alternate location and sends
pull request by emails or by other ways.  We call this distributed
workflow.

It would be more convenient to work in a centralized workflow like what
Gerrit provided for some cases.  For example, a read-only user may run
the following `git push` command to push commits to a special reference
to create a code review, instead of updating a reference directly.

    git push -o reviewers=user1,user2 \
        -o oldoid=89c082363ac950d224a7259bfba3ccfbf4c560c4 \
        origin \
        HEAD:refs/for/<branch-name>/<session>

The `<branch-name>` in the above example can be as simple as "master",
or a more complicated branch name like "foo/bar".  The `<session>` in
the above example command can be the local branch name of the clien-
side, such as "my/topic".

To support this kind of workflow in CGit, add a filter and a new
handler.  The filter will check the prefix of the reference name, and
if the command has a special reference name, the filter will add a
specific tag (`exec_by_hook`) to the command.  Commands with this
specific tag will be executed by a new handler (an external hook named
"execute-commands") instead of the internal `execute_commands` function.

We can use the external "execute-commands" hook to create pull requests
or send emails.  The centralized workflow is not a replacement for
the distributed workflow, but a supplement. Especially for lightweight
code contribution or for working on a project with multiple repositories.

We also implement a command line tool for this kind of workflow, see:

    https://github.com/aliyun/git-repo-go


Jiang Xin (7):
  receive-pack: new external execute-commands hook
  receive-pack: feed all commands to post-receive
  receive-pack: try `execute-commands --pre-receive`
  receive-pack: read env from execute-commands output
  refs.c: refactor to reuse ref_is_hidden()
  receive-pack: new config receive.executeCommandsHookRefs
  hook: add document and example for "execute-commands" hook

 Documentation/config/receive.txt         |  11 +
 Documentation/githooks.txt               |  43 ++
 builtin/receive-pack.c                   | 228 +++++++-
 refs.c                                   |  11 +-
 refs.h                                   |   1 +
 t/t5411-execute-commands-hook.sh         | 698 +++++++++++++++++++++++
 templates/hooks--execute-commands.sample | 131 +++++
 7 files changed, 1093 insertions(+), 30 deletions(-)
 create mode 100755 t/t5411-execute-commands-hook.sh
 create mode 100755 templates/hooks--execute-commands.sample

Comments

Junio C Hamano March 4, 2020, 8:39 p.m. UTC | #1
Jiang Xin <worldhello.net@gmail.com> writes:

> It would be more convenient to work in a centralized workflow like what
> Gerrit provided for some cases.  For example, a read-only user may run
> the following `git push` command to push commits to a special reference
> to create a code review, instead of updating a reference directly.
>
>     git push -o reviewers=user1,user2 \
>         -o oldoid=89c082363ac950d224a7259bfba3ccfbf4c560c4 \
>         origin \
>         HEAD:refs/for/<branch-name>/<session>
>
> The `<branch-name>` in the above example can be as simple as "master",
> or a more complicated branch name like "foo/bar".  The `<session>` in
> the above example command can be the local branch name of the clien-
> side, such as "my/topic".
>
> To support this kind of workflow in CGit, add a filter and a new
> handler.  The filter will check the prefix of the reference name, and
> if the command has a special reference name, the filter will add a
> specific tag (`exec_by_hook`) to the command.  Commands with this
> specific tag will be executed by a new handler (an external hook named
> "execute-commands") instead of the internal `execute_commands` function.

I do not claim to be great at naming, but you are worse ;-)

 - Any hook is about executing command(s), so "execute-commands"
   hook does not give any information to users.

 - IIUC, this is only about what happens when accepting a push and
   is not called at any other time.  Naming your hook without
   "receive" anywhere in its name would mean other people won't be
   able to add hook that "executes" commands upon cues other than
   receiving a push.

I can guess why you chose that name, because I know there is a
function called execute_commands() in "git receive-pack", but that
is not somethhing you can expect your end users, who are not
intimate to our codebase, to know.

> We can use the external "execute-commands" hook to create pull requests
> or send emails.

You can create pull requests or send emails out of the post-receive
hook, so that is not a convincing justification why we want a new
hook.

Now, I understand that Gerrit-style "notice a push to for/<target>,
take over the whole operation that happens after receiving the pack
data and do something entirely different, such as attempting to run
a merge with refs/heads/<target> and update refs/heads/<target>
instead, or fail the push if automerge fails" is not easy to arrange
within the current "pre-receive" + "post-receive" framework (by the
way, we should start considering to deprecate "update", and
"post-update" hooks as these "*-receive" hooks were added to replace
them, perhaps we should leave a #leftoverbits mark here).  And I
think it is reasonable to add a new hook that takes over the whole
flow in "git receive-pack" to do so.

I just do not think "the execute-commands hook" is a good name for
it.  Perhaps "divert-receive" (as it diverts large portion of what
receive does) or something?  I dunno.

How do Gerrit folks deal with the "we pushed to the server, so let's
pretend to have turned around and fetched from the same server
immediately after doing so" client-side hack, by the way?  

A vanilla "git push" on the client side does not know a push to
refs/for/master would result in an update to refs/heads/master on
the server side, and it would not know the result of the munging
done on the server side (whether it is to rebase what is received on
top of 'master' or to merge it to 'master') anyway, the
remote-tracking branch refs/remotes/origin/master on the client side
would be left stale.  If we wanted to help them pretend to have
fetched immediately after, I think we need to extend the protocol.
Right now, after accepting "git push", the server end will say, for
each proposed update for a ref, if the push updated successfully or
not, but to support the "push to for/<target>, get heads/<target>
updated" interaction, the reporting of the result (done in the
report() function in builtin/receive-pack.c) needs to be able to say
what ref (it may be a ref that "git push" did not think it pushed
to) got updated to what value (it may be an object the client does
not yet have---and we may have to actually turn around and fetch
from them internally if we want to keep the illusion).
Jiang Xin March 5, 2020, 4:51 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> 于2020年3月5日周四 上午4:39写道:
>
> I do not claim to be great at naming, but you are worse ;-)

I totally agree that I am not good at naming, for example my daughter's name.

>  - Any hook is about executing command(s), so "execute-commands"
>    hook does not give any information to users.
>
>  - IIUC, this is only about what happens when accepting a push and
>    is not called at any other time.  Naming your hook without
>    "receive" anywhere in its name would mean other people won't be
>    able to add hook that "executes" commands upon cues other than
>    receiving a push.
>
> I can guess why you chose that name, because I know there is a
> function called execute_commands() in "git receive-pack", but that
> is not somethhing you can expect your end users, who are not
> intimate to our codebase, to know.

Yes, it's better to name the hook "* -receive", because the hooks are
for different commands, such as "commit-msg" is for `git commit`.

> > We can use the external "execute-commands" hook to create pull requests
> > or send emails.
>
> You can create pull requests or send emails out of the post-receive
> hook, so that is not a convincing justification why we want a new
> hook.

Another solution is using "pre-receive" + "post-receive" to handle a
push to "refs/for/master".  The "post-receive" hook is used to create
a pull requst and delete the special reference "refs/for/master"
created between these two hooks.  But having a temporary reference
created is not safe for concurrent pushes.

> Now, I understand that Gerrit-style "notice a push to for/<target>,
> take over the whole operation that happens after receiving the pack
> data and do something entirely different, such as attempting to run
> a merge with refs/heads/<target> and update refs/heads/<target>
> instead, or fail the push if automerge fails" is not easy to arrange
> within the current "pre-receive" + "post-receive" framework (by the
> way, we should start considering to deprecate "update", and
> "post-update" hooks as these "*-receive" hooks were added to replace
> them, perhaps we should leave a #leftoverbits mark here).  And I
> think it is reasonable to add a new hook that takes over the whole
> flow in "git receive-pack" to do so.
>
> I just do not think "the execute-commands hook" is a good name for
> it.  Perhaps "divert-receive" (as it diverts large portion of what
> receive does) or something?  I dunno.

I suggest naming the hook as "process-receive", which is executed
between the other two "p*-receive" hooks, and no need to create a
special "pre-receive" for "process-receive".

> How do Gerrit folks deal with the "we pushed to the server, so let's
> pretend to have turned around and fetched from the same server
> immediately after doing so" client-side hack, by the way?

In the following example, I push a local commit to a special reference
(refs/for/master) of the remote Gerrit server.  The "report()"
function (if Gerrit has one) says a new reference "refs/for/master"
has been created.  But in deed, there is no such reference created in
the remote repository, Gerrit will create another reference instead,
such as "refs/changes/71/623871/1", for user to download the code
review .  Because the local repository only has normal
"remote.<name>.fetch" config variables for remote tracking, so git
will not create a tracking reference for "refs/for/master".  Command
line tool, such as Android "repo" (or the reimplemented git-repo in
Golang), will create a special reference
(refs/published/<local/branch>) for tracking, and these tools are
responsible for banch tracking.

    $ git push --receive-pack="gerrit receive-pack" origin
refs/heads/master:refs/for/master
    Enumerating objects: 13, done.
    Counting objects: 100% (13/13), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (11/11), done.
    Writing objects: 100% (12/12), 1.34 KiB | 171.00 KiB/s, done.
    Total 12 (delta 2), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (2/2)
    remote: Processing changes: refs: 1, new: 1, done
    remote:
    remote: SUCCESS
    remote:
    remote: New Changes:
    remote:   http://gerrit.example.com/c/my/repo/+/623889 Test commit
    To ssh://gerrit.example.com:29418/my/repo
     * [new branch]      master -> refs/for/master


> A vanilla "git push" on the client side does not know a push to
> refs/for/master would result in an update to refs/heads/master on
> the server side, and it would not know the result of the munging
> done on the server side (whether it is to rebase what is received on
> top of 'master' or to merge it to 'master') anyway, the

Neither Gerrit nor our AGit-Flow server will update the master branch.
Our AGit-Flow server will create a special reference (like GitHub's
"refs/pull/<number>/head") for reviewers to download commits.

> remote-tracking branch refs/remotes/origin/master on the client side
> would be left stale.  If we wanted to help them pretend to have
> fetched immediately after, I think we need to extend the protocol.
> Right now, after accepting "git push", the server end will say, for
> each proposed update for a ref, if the push updated successfully or
> not, but to support the "push to for/<target>, get heads/<target>
> updated" interaction, the reporting of the result (done in the
> report() function in builtin/receive-pack.c) needs to be able to say
> what ref (it may be a ref that "git push" did not think it pushed
> to) got updated to what value (it may be an object the client does
> not yet have---and we may have to actually turn around and fetch
> from them internally if we want to keep the illusion).

I have no idea now how to make a simple patch to give an accurate report.