[RFC,0/2] MVP implementation of remote-suggested hooks

Message ID	cover.1623881977.git.jonathantanmy@google.com (mailing list archive)
Headers	show Return-Path: <git-owner@kernel.org> Date: Wed, 16 Jun 2021 16:31:47 -0700 Message-Id: <cover.1623881977.git.jonathantanmy@google.com> Mime-Version: 1.0 Subject: [RFC PATCH 0/2] MVP implementation of remote-suggested hooks From: Jonathan Tan <jonathantanmy@google.com> To: git@vger.kernel.org Cc: Jonathan Tan <jonathantanmy@google.com> Content-Type: text/plain; charset="UTF-8" Precedence: bulk
Series	MVP implementation of remote-suggested hooks \| expand [RFC,0/2] MVP implementation of remote-suggested hooks [RFC,1/2] hook: move list of hooks [RFC,2/2] clone,fetch: remote-suggested auto-updating hooks

Jonathan Tan June 16, 2021, 11:31 p.m. UTC

This is a continuation of the work from [1]. That work described the
reasons, possible features, and possible workflows, but not the
implementation in detail. This patch set has an MVP implementation, and
my hope is that having a concrete implementation to look at makes it
easier to discuss matters of implementation.

This but does not use any features from es/config-based-hooks but is
implemented on that branch anyway because firstly, I need an existing
command to attach the "autoupdate" subcommand (and "git hook" works),
and secondly, when we test this at $DAYJOB, we will be testing it
together with the aforementioned branch.

I have had to make several design choices (which I will discuss later),
but now with this implementation, the following workflow is possible:

 1. The remote repo administrator creates a new branch
    "refs/heads/suggested-hooks" pointing to a commit that has all the
    hooks that the administrator wants to suggest. The hooks are
    directly referenced by the commit tree (i.e. they are in the "/"
    directory).

 2. When a user clones, Git notices that
    "refs/remotes/origin/suggested-hooks" is present and prints out a
    message about a command that can be run.

 3. If the user runs that command, Git will install the hooks pointed to
    by that ref, and set hook.autoupdate to true. This config variable
    is checked whenever "git fetch" is run: whenever it notices that
    "refs/remotes/origin/suggested-hooks" changes, it will reinstall the
    hooks.

 4. To turn off autoupdate, set hook.autoupdate to false. Existing hooks
    will remain.

Design choices:

 1. Where should the suggested hooks be in the remote repo? A branch,
    a non-branch ref, a config? I think that a branch is best - it is
    relatively well-understood and any hooks there can be
    version-controlled (and its history is independent of the other
    branches).

 2. When and how should the local repo update its record of the remote's
    suggested hooks? If we go with storing the hooks in a branch of a
    remote side, this would automatically mean (with the default
    refspec) that it would be in refs/remotes/<remote>/<name>. This
    incurs network and hard disk cost even if the local repo does not
    want to use the suggested hooks, but I think that typically they
    would want to use it if they're going to do any work on the repo
    (they would either have to trust or inspect Makefiles etc. anyway,
    so they can do the same for the hooks), and even if they don't want
    to use the remote's hooks, they probably still want to know what the
    remote suggests.

    So using a branch provides a well-understood way of storing the
    hooks on the remote, transmitting it to the local repo, and storing
    the hooks in the local repo.

    So: what should be the default name of this branch? Presumably, "git
    clone" would need to be able to override this.

 3. How should the local repo detect when the remote has updated its
    suggested hooks? I'm thinking when a certain local ref is updated.
    Right now it's hardcoded, but perhaps "git clone" could detect what
    "refs/heads/suggested-hooks" would correspond to, and then set it in
    a config accordingly. Other options include remembering what the
    remote's "refs/heads/suggested-hooks" used to be and always
    comparing it upon every "ls-refs" call, but I think that the local
    ref method is more straightforward.

 4. Should the local repo try to notice if the hooks have been changed
    locally before overwriting upon autoupdate? This would be nice to
    have, but I don't know how practical it would be. In particular, I
    don't know if we can trust that
    "refs/remotes/origin/suggested-hooks" has not been clobbered.

 5. Should we have a command that manually updates the hooks with what's
    in "refs/heads/suggested-hooks"? This is not in this patch set, but
    it sounds like a good idea.

There are perhaps other points that I haven't thought of, of course.

[1] https://lore.kernel.org/git/pull.908.git.1616105016055.gitgitgadget@gmail.com/

Jonathan Tan (2):
  hook: move list of hooks
  clone,fetch: remote-suggested auto-updating hooks

 builtin/bugreport.c |  38 ++------------
 builtin/clone.c     |  10 ++++
 builtin/fetch.c     |  21 ++++++++
 builtin/hook.c      |  13 +++--
 hook.c              | 118 ++++++++++++++++++++++++++++++++++++++++++++
 hook.h              |   5 ++
 t/t5601-clone.sh    |  36 ++++++++++++++
 7 files changed, 202 insertions(+), 39 deletions(-)

Junio C Hamano June 17, 2021, 1:30 a.m. UTC | #1

Jonathan Tan <jonathantanmy@google.com> writes:

>  1. The remote repo administrator creates a new branch
>     "refs/heads/suggested-hooks" pointing to a commit that has all the
>     hooks that the administrator wants to suggest. The hooks are
>     directly referenced by the commit tree (i.e. they are in the "/"
>     directory).

wants to suggest?  They simply suggest ;-)

>  2. When a user clones, Git notices that
>     "refs/remotes/origin/suggested-hooks" is present and prints out a
>     message about a command that can be run.

Can be run to install?  Or can be run to first inspect?  Or both?

>  3. If the user runs that command, Git will install the hooks pointed to
>     by that ref, and set hook.autoupdate to true. This config variable
>     is checked whenever "git fetch" is run: whenever it notices that
>     "refs/remotes/origin/suggested-hooks" changes, it will reinstall the
>     hooks.
>
>  4. To turn off autoupdate, set hook.autoupdate to false. Existing hooks
>     will remain.

OK, so "verify even if you implicitly trust" is actively discouraged.

> Design choices:
>
>  1. Where should the suggested hooks be in the remote repo? A branch,
>     a non-branch ref, a config? I think that a branch is best - it is
>     relatively well-understood and any hooks there can be
>     version-controlled (and its history is independent of the other
>     branches).

As people mentioned in the previous discussions, "independent of the
other branches" has advantages and disadvantages.  The most recent
set of hooks may have some that would not work well with older
codebase, so care must be taken to ensure any hook works on across
versions of the main codebase.  Which may not be a huge downside,
but something users must be aware of.

>  2. When and how should the local repo update its record of the remote's
>     suggested hooks? If we go with storing the hooks in a branch of a
>     remote side, this would automatically mean (with the default
>     refspec) that it would be in refs/remotes/<remote>/<name>. This
>     incurs network and hard disk cost even if the local repo does not
>     want to use the suggested hooks, but I think that typically they
>     would want to use it if they're going to do any work on the repo
>     (they would either have to trust or inspect Makefiles etc. anyway,
>     so they can do the same for the hooks), and even if they don't want
>     to use the remote's hooks, they probably still want to know what the
>     remote suggests.

A way to see what changes are made to recommendation would be
useful, and a branch that mostly linearly progresses is a good way
to give it to the users.

Of course, that can be done with suggested hooks inline with the
rest of the main codebase, too.

>  4. Should the local repo try to notice if the hooks have been changed
>     locally before overwriting upon autoupdate? This would be nice to
>     have, but I don't know how practical it would be. In particular, I
>     don't know if we can trust that
>     "refs/remotes/origin/suggested-hooks" has not been clobbered.

Meaning clobbered by malicious parties?  Then the whole thing is a
no-go from security point of view.  Presumably you trust the main
content enough to work on top of it, so as long as you can trust
that refs/remotes/origin/hooks to the same degree that you would
trust refs/remotes/origin/master, I do not think it is a problem.

Whatever mechanism you use to materialize an updated version of the
hooks can and should record which commit on the suggested-hooks
branch the .git/hooks/* file is based on.  Then when the user wants
to update to a different version of suggested-hooks (either because
you auto-detected, or the user issued a command to update), you have

 - The current version in .git/hooks/*

 - The old pristine version (you recorded the commit when you
   updated the .git/hooks/* copy the last time)

 - The new pristine version (you have a remote-tracking branch).

and it is not a brain surgery to perform three-way merge to update
the first using the difference between the second and the third.

>  5. Should we have a command that manually updates the hooks with what's
>     in "refs/heads/suggested-hooks"? This is not in this patch set, but
>     it sounds like a good idea.

I wonder if having it bound as a submodule to a known location in
the main contents tree makes it easier to manage and more flexible.
Just like you can update the working tree of a submodule to a
version that is bound to the superproject tree, or to a more recent
version of the "branch" in the submodule, you can support workflows
that allow suggested hooks to advance independent of the main
contents version and that uses a specific version of suggested hooks
tied to the main contents.

> There are perhaps other points that I haven't thought of, of course.
>
> [1] https://lore.kernel.org/git/pull.908.git.1616105016055.gitgitgadget@gmail.com/
>
> Jonathan Tan (2):
>   hook: move list of hooks
>   clone,fetch: remote-suggested auto-updating hooks
>
>  builtin/bugreport.c |  38 ++------------
>  builtin/clone.c     |  10 ++++
>  builtin/fetch.c     |  21 ++++++++
>  builtin/hook.c      |  13 +++--
>  hook.c              | 118 ++++++++++++++++++++++++++++++++++++++++++++
>  hook.h              |   5 ++
>  t/t5601-clone.sh    |  36 ++++++++++++++
>  7 files changed, 202 insertions(+), 39 deletions(-)

Emily Shaffer June 18, 2021, 8:57 p.m. UTC | #2

On Wed, Jun 16, 2021 at 04:31:47PM -0700, Jonathan Tan wrote:
> 
> I have had to make several design choices (which I will discuss later),
> but now with this implementation, the following workflow is possible:
> 
>  1. The remote repo administrator creates a new branch
>     "refs/heads/suggested-hooks" pointing to a commit that has all the
>     hooks that the administrator wants to suggest. The hooks are
>     directly referenced by the commit tree (i.e. they are in the "/"
>     directory).

I don't really like that this is in the same namespace as branches users
could create themselves. Hm, I think for 'git maintenance' prefetching
we put those refs in some special namespace, right? Can we do something
similar in this case? Would that prevent us from treating that ref like
a normal branch?

> 
>  2. When a user clones, Git notices that
>     "refs/remotes/origin/suggested-hooks" is present and prints out a
>     message about a command that can be run.
> 
>  3. If the user runs that command, Git will install the hooks pointed to
>     by that ref, and set hook.autoupdate to true. This config variable
>     is checked whenever "git fetch" is run: whenever it notices that
>     "refs/remotes/origin/suggested-hooks" changes, it will reinstall the
>     hooks.

I think this is where most people will have a lot to say from security
perspective. It will be important to make sure that:
 - "git fetch" only automatically updates these scripts when fetching
   from the same remote they were initially fetched from, over some
   secure protocol
 - hook.autoupdate setting maybe should not be default, or should at
   least be able to override during the install command. (One suggestion
   was to also set some experimental.suggestedHooks setting, or
   something, to turn on "automatically set up autoupdate" behavior?)
 - we should notify the user that their hooks were updated/reinstalled
   when that happens later on. (Maybe you did this, I didn't look at the
   diff quite yet)

>  4. To turn off autoupdate, set hook.autoupdate to false. Existing hooks
>     will remain.
> 
> Design choices:
> 
>  1. Where should the suggested hooks be in the remote repo? A branch,
>     a non-branch ref, a config? I think that a branch is best - it is
>     relatively well-understood and any hooks there can be
>     version-controlled (and its history is independent of the other
>     branches).

I agree - a branch without any prior parent sounds ideal to me, that is,
the repo owner setting up the hooks can say 'git checkout --orphan
suggested-hooks'. I guess if they want the branch to have history
shared with the rest of the project there's no reason not to do that,
either.

Do we want to provide helpful tooling for the administrator to create
this magic branch? At the very least I guess we want some documentation,
but I wonder if it's possible to make a helper (maybe a 'git hook'
subcommand) without being more prescriptive than Git project usually is.

>  3. How should the local repo detect when the remote has updated its
>     suggested hooks? I'm thinking when a certain local ref is updated.
>     Right now it's hardcoded, but perhaps "git clone" could detect what
>     "refs/heads/suggested-hooks" would correspond to, and then set it in
>     a config accordingly. Other options include remembering what the
>     remote's "refs/heads/suggested-hooks" used to be and always
>     comparing it upon every "ls-refs" call, but I think that the local
>     ref method is more straightforward.

Hm, I like that idea as it's analogous to remote tracking branch + local
tracking branch (which is a pretty common pattern for people to use).

> 
>  4. Should the local repo try to notice if the hooks have been changed
>     locally before overwriting upon autoupdate? This would be nice to
>     have, but I don't know how practical it would be. In particular, I
>     don't know if we can trust that
>     "refs/remotes/origin/suggested-hooks" has not been clobbered.

I think it would be nice to have, yeah. It seems important to notify the
users that they are executing something different from yesterday, so
noticing when we're making changes to the hook sounds useful.

> 
>  5. Should we have a command that manually updates the hooks with what's
>     in "refs/heads/suggested-hooks"? This is not in this patch set, but
>     it sounds like a good idea.

Yeah, I think so - that lets a user say "no, I don't need those hooks
(or this update)" at first, but later change their mind.

One thing that sounds useful to me, even at this RFC stage, is
documentation showing what this feature looks like to use - for both
the administrator setting up the magic hook branch, and the developer
cloning and using hooks. I think we want that in some Git manpage
eventually anyways, and it would help to see the workflow you're trying
to achieve.

Thanks for sending - will review the diffs today too.

 - Emily

> 
> There are perhaps other points that I haven't thought of, of course.
> 
> [1] https://lore.kernel.org/git/pull.908.git.1616105016055.gitgitgadget@gmail.com/
> 
> Jonathan Tan (2):
>   hook: move list of hooks
>   clone,fetch: remote-suggested auto-updating hooks
> 
>  builtin/bugreport.c |  38 ++------------
>  builtin/clone.c     |  10 ++++
>  builtin/fetch.c     |  21 ++++++++
>  builtin/hook.c      |  13 +++--
>  hook.c              | 118 ++++++++++++++++++++++++++++++++++++++++++++
>  hook.h              |   5 ++
>  t/t5601-clone.sh    |  36 ++++++++++++++
>  7 files changed, 202 insertions(+), 39 deletions(-)
> 
> -- 
> 2.32.0.272.g935e593368-goog
>

Jonathan Tan June 18, 2021, 9:46 p.m. UTC | #3

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >  1. The remote repo administrator creates a new branch
> >     "refs/heads/suggested-hooks" pointing to a commit that has all the
> >     hooks that the administrator wants to suggest. The hooks are
> >     directly referenced by the commit tree (i.e. they are in the "/"
> >     directory).
> 
> wants to suggest?  They simply suggest ;-)

Ah yes :-)

> 
> >  2. When a user clones, Git notices that
> >     "refs/remotes/origin/suggested-hooks" is present and prints out a
> >     message about a command that can be run.
> 
> Can be run to install?  Or can be run to first inspect?  Or both?

Right now I only have a command that installs, but I can provide the
appropriate "cat-file" invocations to inspect them as well.

> >  3. If the user runs that command, Git will install the hooks pointed to
> >     by that ref, and set hook.autoupdate to true. This config variable
> >     is checked whenever "git fetch" is run: whenever it notices that
> >     "refs/remotes/origin/suggested-hooks" changes, it will reinstall the
> >     hooks.
> >
> >  4. To turn off autoupdate, set hook.autoupdate to false. Existing hooks
> >     will remain.
> 
> OK, so "verify even if you implicitly trust" is actively discouraged.

Yes I was thinking of the model in which we already trust upstream, but
I agree that verification can be useful. I think we can print the
"cat-file" commands needed to verify before installing, and add a mode
in which we tell the user that the hooks have been updated (but not
automatically install them).

> > Design choices:
> >
> >  1. Where should the suggested hooks be in the remote repo? A branch,
> >     a non-branch ref, a config? I think that a branch is best - it is
> >     relatively well-understood and any hooks there can be
> >     version-controlled (and its history is independent of the other
> >     branches).
> 
> As people mentioned in the previous discussions, "independent of the
> other branches" has advantages and disadvantages.  The most recent
> set of hooks may have some that would not work well with older
> codebase, so care must be taken to ensure any hook works on across
> versions of the main codebase.  Which may not be a huge downside,
> but something users must be aware of.

That's true - and on the flip side, I would presume that the
hook-introducing admin would usually want those hooks to apply
retroactively too (say, to someone updating a "maint" branch). I think
it's more flexible if hooks are in an independent branch, though - (if
independent branch) a hook can be written to tolerate an old codebase,
but if embedded in the code branch, such a hook cannot apply
retroactively to old code.

> >  2. When and how should the local repo update its record of the remote's
> >     suggested hooks? If we go with storing the hooks in a branch of a
> >     remote side, this would automatically mean (with the default
> >     refspec) that it would be in refs/remotes/<remote>/<name>. This
> >     incurs network and hard disk cost even if the local repo does not
> >     want to use the suggested hooks, but I think that typically they
> >     would want to use it if they're going to do any work on the repo
> >     (they would either have to trust or inspect Makefiles etc. anyway,
> >     so they can do the same for the hooks), and even if they don't want
> >     to use the remote's hooks, they probably still want to know what the
> >     remote suggests.
> 
> A way to see what changes are made to recommendation would be
> useful, and a branch that mostly linearly progresses is a good way
> to give it to the users.
> 
> Of course, that can be done with suggested hooks inline with the
> rest of the main codebase, too.

That's true.

> >  4. Should the local repo try to notice if the hooks have been changed
> >     locally before overwriting upon autoupdate? This would be nice to
> >     have, but I don't know how practical it would be. In particular, I
> >     don't know if we can trust that
> >     "refs/remotes/origin/suggested-hooks" has not been clobbered.
> 
> Meaning clobbered by malicious parties?  Then the whole thing is a
> no-go from security point of view.  Presumably you trust the main
> content enough to work on top of it, so as long as you can trust
> that refs/remotes/origin/hooks to the same degree that you would
> trust refs/remotes/origin/master, I do not think it is a problem.

I meant clobbered by the user - should have made that more clear. To
elaborate...

> Whatever mechanism you use to materialize an updated version of the
> hooks can and should record which commit on the suggested-hooks
> branch the .git/hooks/* file is based on.  Then when the user wants
> to update to a different version of suggested-hooks (either because
> you auto-detected, or the user issued a command to update), you have
> 
>  - The current version in .git/hooks/*
> 
>  - The old pristine version (you recorded the commit when you
>    updated the .git/hooks/* copy the last time)
> 
>  - The new pristine version (you have a remote-tracking branch).
> 
> and it is not a brain surgery to perform three-way merge to update
> the first using the difference between the second and the third.

...I was having a difficult time figuring out where to store such
information (ref? config? I wouldn't be surprised if a user saw a value
there and thought that they could change it). Perhaps it could be a
separate file like .git/shallow.

I'm not fully convinced that maintaining the ability to retain local
hook modifications by supporting three-way merges is important, but if
it is, it might be brain surgery to figure out the UX when merge
conflicts occur. Normally these conflicts can be written to the worktree
and index, but we don't have those in the case of hooks.

> >  5. Should we have a command that manually updates the hooks with what's
> >     in "refs/heads/suggested-hooks"? This is not in this patch set, but
> >     it sounds like a good idea.
> 
> I wonder if having it bound as a submodule to a known location in
> the main contents tree makes it easier to manage and more flexible.
> Just like you can update the working tree of a submodule to a
> version that is bound to the superproject tree, or to a more recent
> version of the "branch" in the submodule, you can support workflows
> that allow suggested hooks to advance independent of the main
> contents version and that uses a specific version of suggested hooks
> tied to the main contents.

This would make the hooks dependent on the commit checked out (with
perhaps a bit more leeway in that our implementation could be flexible
and use a commit later than what's in the gitlink), with its own pros
and cons (as you said earlier).

Jonathan Tan June 18, 2021, 9:58 p.m. UTC | #4

> On Wed, Jun 16, 2021 at 04:31:47PM -0700, Jonathan Tan wrote:
> > 
> > I have had to make several design choices (which I will discuss later),
> > but now with this implementation, the following workflow is possible:
> > 
> >  1. The remote repo administrator creates a new branch
> >     "refs/heads/suggested-hooks" pointing to a commit that has all the
> >     hooks that the administrator wants to suggest. The hooks are
> >     directly referenced by the commit tree (i.e. they are in the "/"
> >     directory).
> 
> I don't really like that this is in the same namespace as branches users
> could create themselves. Hm, I think for 'git maintenance' prefetching
> we put those refs in some special namespace, right? Can we do something
> similar in this case? Would that prevent us from treating that ref like
> a normal branch?

Do you mean that the server should put it in a different place, the
client should put it in a different place, or both?

> >  2. When a user clones, Git notices that
> >     "refs/remotes/origin/suggested-hooks" is present and prints out a
> >     message about a command that can be run.
> > 
> >  3. If the user runs that command, Git will install the hooks pointed to
> >     by that ref, and set hook.autoupdate to true. This config variable
> >     is checked whenever "git fetch" is run: whenever it notices that
> >     "refs/remotes/origin/suggested-hooks" changes, it will reinstall the
> >     hooks.
> 
> I think this is where most people will have a lot to say from security
> perspective. It will be important to make sure that:
>  - "git fetch" only automatically updates these scripts when fetching
>    from the same remote they were initially fetched from, over some
>    secure protocol

Putting the hooks in a branch does mean that "git fetch" gets these
scripts from the same remote (as opposed to, say, putting them in a
submodule or referring to them by URL), and I think it's easier to
understand (from the user point of view) that branch equals same remote
(rather than say that a submodule or URL is considered to be from the
same remote if $CRITERIA).

>  - hook.autoupdate setting maybe should not be default, or should at
>    least be able to override during the install command. (One suggestion
>    was to also set some experimental.suggestedHooks setting, or
>    something, to turn on "automatically set up autoupdate" behavior?)

OK, this makes sense.

>  - we should notify the user that their hooks were updated/reinstalled
>    when that happens later on. (Maybe you did this, I didn't look at the
>    diff quite yet)

I didn't do this, but this is a good idea.

> > Design choices:
> > 
> >  1. Where should the suggested hooks be in the remote repo? A branch,
> >     a non-branch ref, a config? I think that a branch is best - it is
> >     relatively well-understood and any hooks there can be
> >     version-controlled (and its history is independent of the other
> >     branches).
> 
> I agree - a branch without any prior parent sounds ideal to me, that is,
> the repo owner setting up the hooks can say 'git checkout --orphan
> suggested-hooks'. I guess if they want the branch to have history
> shared with the rest of the project there's no reason not to do that,
> either.

How would the branch share history with the rest of the project? If you
mean that we should allow the users to store hooks as part of the main
codebase (within a configurable path, say, /suggested_hooks) and then
point both "main" (or whatever the default branch is) and
"refs/remotes/origin/suggested-hooks" to the same place, then that
makes sense (although I would say that it still sounds better to have
separate histories).

> Do we want to provide helpful tooling for the administrator to create
> this magic branch? At the very least I guess we want some documentation,
> but I wonder if it's possible to make a helper (maybe a 'git hook'
> subcommand) without being more prescriptive than Git project usually is.

This sounds like the equivalent of "git checkout --orphan", as you
suggested above. We could just write it out in the documentation. I'm
not completely opposed to a special command, though.

> >  3. How should the local repo detect when the remote has updated its
> >     suggested hooks? I'm thinking when a certain local ref is updated.
> >     Right now it's hardcoded, but perhaps "git clone" could detect what
> >     "refs/heads/suggested-hooks" would correspond to, and then set it in
> >     a config accordingly. Other options include remembering what the
> >     remote's "refs/heads/suggested-hooks" used to be and always
> >     comparing it upon every "ls-refs" call, but I think that the local
> >     ref method is more straightforward.
> 
> Hm, I like that idea as it's analogous to remote tracking branch + local
> tracking branch (which is a pretty common pattern for people to use).

Yeah - that is partly why I like the branch idea (something already
generally understood by people).

> >  4. Should the local repo try to notice if the hooks have been changed
> >     locally before overwriting upon autoupdate? This would be nice to
> >     have, but I don't know how practical it would be. In particular, I
> >     don't know if we can trust that
> >     "refs/remotes/origin/suggested-hooks" has not been clobbered.
> 
> I think it would be nice to have, yeah. It seems important to notify the
> users that they are executing something different from yesterday, so
> noticing when we're making changes to the hook sounds useful.

Junio suggested [1] that we store more information about what's going
on, so this might be practical.

[1] https://lore.kernel.org/git/xmqq35thnuqp.fsf@gitster.g/

> >  5. Should we have a command that manually updates the hooks with what's
> >     in "refs/heads/suggested-hooks"? This is not in this patch set, but
> >     it sounds like a good idea.
> 
> Yeah, I think so - that lets a user say "no, I don't need those hooks
> (or this update)" at first, but later change their mind.

OK.

> One thing that sounds useful to me, even at this RFC stage, is
> documentation showing what this feature looks like to use - for both
> the administrator setting up the magic hook branch, and the developer
> cloning and using hooks. I think we want that in some Git manpage
> eventually anyways, and it would help to see the workflow you're trying
> to achieve.
> 
> Thanks for sending - will review the diffs today too.
> 
>  - Emily

Agreed - at the very least, we would need to write this workflow as a
test, and we can just reuse the same workflow in documentation. We just
need to discuss what the workflow should be :-) (as it is, one thing
that is becoming obvious is that currently people prefer to have
verify-before-update instead of auto-update by default).

Randall S. Becker June 18, 2021, 10:32 p.m. UTC | #5

On June 18, 2021 5:59 PM, Jonathan Tan wrote:
>> On Wed, Jun 16, 2021 at 04:31:47PM -0700, Jonathan Tan wrote:
>> >
>> > I have had to make several design choices (which I will discuss
>> > later), but now with this implementation, the following workflow is possible:
>> >
>> >  1. The remote repo administrator creates a new branch
>> >     "refs/heads/suggested-hooks" pointing to a commit that has all the
>> >     hooks that the administrator wants to suggest. The hooks are
>> >     directly referenced by the commit tree (i.e. they are in the "/"
>> >     directory).
>>
>> I don't really like that this is in the same namespace as branches
>> users could create themselves. Hm, I think for 'git maintenance'
>> prefetching we put those refs in some special namespace, right? Can we
>> do something similar in this case? Would that prevent us from treating
>> that ref like a normal branch?
>
>Do you mean that the server should put it in a different place, the client should put it in a different place, or both?

This brings up a very awkward question: How are enterprise git servers going to deal with this? I do not see the standard Pull Request mechanism available in GitHub handing placing hooks in different places during a merge operation. Or will this entire concept be omitted from PR?

It seems like changes to hooks have to be managed in a similar way to standard managed files rather than as exceptions.

-Randall

Matt Rogers June 19, 2021, 7:58 a.m. UTC | #6

On Sat, Jun 19, 2021 at 3:32 AM Randall S. Becker
<rsbecker@nexbridge.com> wrote:
>
> On June 18, 2021 5:59 PM, Jonathan Tan wrote:
> This brings up a very awkward question: How are enterprise git servers going to deal with this? I do not see the standard Pull Request mechanism available in GitHub handing placing hooks in different places during a merge operation. Or will this entire concept be omitted from PR?
>

Related question, if a project had a collection of suggested hooks,
and a contributor wanted
to update them in relation to a new feature or code change, would they
then have to create two
separate patches/PRs?  That feels like it would decentralize discussions/review?

For example, if I maintained a project on GitHub and a contributor
wanted to add a clang-tidy
invocation as a suggested hook, as well as a config file for it.  What
would the suggested workflow be?

"Open 2 Pull Requests one that updates both branches and do reviews
independently"?

If it was a mailing list would I have to send two separate patches?

I'm not familiar with any workflows or tools that allow you to review
and accept changes to
two branches as part of the same series of changes.

I also think that the history wouldn't be very clear in this case,
since there won't be any obvious
connection between updates to the suggested-hooks and updates to the
rest of the history in
this case (other than timestamps I guess, but I think that's a
relatively weak form of association)

> It seems like changes to hooks have to be managed in a similar way to standard managed files rather than as exceptions.
>
> -Randall
>

Thanks,
Matthew Rogers

Ævar Arnfjörð Bjarmason June 20, 2021, 7:51 p.m. UTC | #7

On Wed, Jun 16 2021, Jonathan Tan wrote:

> This is a continuation of the work from [1]. That work described the
> reasons, possible features, and possible workflows, but not the
> implementation in detail. This patch set has an MVP implementation, and
> my hope is that having a concrete implementation to look at makes it
> easier to discuss matters of implementation.

My C on this RFC is:

1) A request that someone reply (there or here would do) to my comments
   on the last iteration of this at:
   https://lore.kernel.org/git/874kghk906.fsf@evledraar.gmail.com/

2) I think you'd get better feedback if you CC'd the people who've been
  actively discussing this in previous rounds.

To briefly summarize the main gist of that 1):

> This but does not use any features from es/config-based-hooks but is
> implemented on that branch anyway because firstly, I need an existing
> command to attach the "autoupdate" subcommand (and "git hook" works),
> and secondly, when we test this at $DAYJOB, we will be testing it
> together with the aforementioned branch.
>
> I have had to make several design choices (which I will discuss later),
> but now with this implementation, the following workflow is possible:
>
>  1. The remote repo administrator creates a new branch
>     "refs/heads/suggested-hooks" pointing to a commit that has all the
>     hooks that the administrator wants to suggest. The hooks are
>     directly referenced by the commit tree (i.e. they are in the "/"
>     directory).
>
>  2. When a user clones, Git notices that
>     "refs/remotes/origin/suggested-hooks" is present and prints out a
>     message about a command that can be run.
>
>  3. If the user runs that command, Git will install the hooks pointed to
>     by that ref, and set hook.autoupdate to true. This config variable
>     is checked whenever "git fetch" is run: whenever it notices that
>     "refs/remotes/origin/suggested-hooks" changes, it will reinstall the
>     hooks.
>
>  4. To turn off autoupdate, set hook.autoupdate to false. Existing hooks
>     will remain.
>
> Design choices:
>
>  1. Where should the suggested hooks be in the remote repo? A branch,
>     a non-branch ref, a config? I think that a branch is best - it is
>     relatively well-understood and any hooks there can be
>     version-controlled (and its history is independent of the other
>     branches).

First, unlike brian I don't (I hope I'm fairly summarizing his view
here) disagree mostly or entirely with the existence of such a feature
at all. I mean, I get the viewpoint that git shouldn't bless what
amounts to an active RCE from the remote.

I just think that we could probably do a better job of it than what
people are doing in practice, and I've seen people do stuff like have
build systems setup permanent symlinks to git-hooks/<some-name> in the
tracked dir. We could at least envision a git-native implementation
asking the user "do you want this hook update? <shows diff>".

I just find this design approach completely bizarre as noted (probably
in less blunt words) in the linked E-Mail.

We have Emily's series to convert hooks to be config driven that we hope
to land in some form, at that point they won't be any more of a special
snowflake than any other config.

And then, instead of doing what I'd think would be the natural result of
that: Simply supporting an in-repo top-level ".gitconfig" file. We're
still going to seemingly forever have them be an even more special
snowflake with this facility, and the reason seems to be mostly/entirely
to do with working around some aspect or restriction of Google's
internal infrastructure.

I think it's just un-git-y to have a meta-branch that in some way drives
not only all other branches, but all other revisions of all branches,
ever.

It breaks expectations around git in lots of different ways, you can't
fetch a single branch and get its hooks, you can't locally alter, commit
and update your hooks while e.g. renaming a "t/" directory to "test/";
your hooks and code can't be atomically changed).

I think I get why you want to do it that way, I just don't get why, as
mostly noted in those earlier rounds why it wouldn't be a better
approach / more straightforward / more git-y to:

1. Work on getting hooks driven by config <this is happening with
   Emily's series / my split-out "base" topic>
2. Have a facility to read an in-repo '.gitconfig'; have lots of safety
   valves etc. around this, I suggested starting with a whitelist of the
   N least dangerous config options, e.g. some diff viewing options, or
   a suggested sendemail.to or whatever.

3. Work our way up to trusting that for more dangerous stuff, eventually
   hooks. Most of the legitimate concerns from others with this is
   having some UX where our users won't be trained to just blindly say
   "yes" to an alias/hook config that "rm -rf's /" or whatever.

   If we start experimenting with that with aliases or hooks that can
   run arbitrary code it's like handing a toddlder a shotgun, let's at
   least start with a sharp fork or something (less dangerous config) :)

4. People who want this "I want my hooks to apply to all revisions ever"
   could probably get 99% or 100% of what they want if their hook is
   just a stub that does the equivalent of:

       sh `curl https://git.google.com/$reponame/hooks/$hookname`

   You'd then simply forbid on your servers any changes to a .gitconfig
   that did anything with the hook.* namespace.

With such an implementation you don't need a magic
"refs/remotes/origin/suggested-hooks" refs, just some state machine (I
suggested e.g. GPG signing chains as an eventual end-state, but "show a
diff every time" would also do) that keeps track of what config (and
hooks are just one such case) has been OK'd, and which has not.

I'd think it would even work better in the Googleplex, you could clone a
co-worker's branch and execute their hooks, since they're the same as
what you've pre-approved, you could even clone some random person's fork
of a "blessed" project, because the hooks would be the same `sh $(curl
<url I already trust>)`. That validation could even be a system-level
in-config hook on your laptop, thus bringing the whole thing full
circle...

Jonathan Tan June 21, 2021, 6:37 p.m. UTC | #8

> Related question, if a project had a collection of suggested hooks,
> and a contributor wanted
> to update them in relation to a new feature or code change, would they
> then have to create two
> separate patches/PRs?  That feels like it would decentralize discussions/review?
> 
> For example, if I maintained a project on GitHub and a contributor
> wanted to add a clang-tidy
> invocation as a suggested hook, as well as a config file for it.  What
> would the suggested workflow be?
> 
> "Open 2 Pull Requests one that updates both branches and do reviews
> independently"?
> 
> If it was a mailing list would I have to send two separate patches?

With the current design, the answer is "yes" to both. Having said that,
for this particular case, I think that the clang-tidy config file can be
reviewed and merged first, and then the clang-tidy hook. (The hook needs
to be written to also work in the absence of the config file, since the
user may have checked out an older version, so as far as I can tell,
there is no need for precise timing as to when the hook is merged.)

Jonathan Tan June 21, 2021, 6:58 p.m. UTC | #9

> On Wed, Jun 16 2021, Jonathan Tan wrote:
> 
> > This is a continuation of the work from [1]. That work described the
> > reasons, possible features, and possible workflows, but not the
> > implementation in detail. This patch set has an MVP implementation, and
> > my hope is that having a concrete implementation to look at makes it
> > easier to discuss matters of implementation.
> 
> My C on this RFC is:
> 
> 1) A request that someone reply (there or here would do) to my comments
>    on the last iteration of this at:
>    https://lore.kernel.org/git/874kghk906.fsf@evledraar.gmail.com/

OK - I'll take a look at that.

> 2) I think you'd get better feedback if you CC'd the people who've been
>   actively discussing this in previous rounds.

Good point.

> > Design choices:
> >
> >  1. Where should the suggested hooks be in the remote repo? A branch,
> >     a non-branch ref, a config? I think that a branch is best - it is
> >     relatively well-understood and any hooks there can be
> >     version-controlled (and its history is independent of the other
> >     branches).
> 
> First, unlike brian I don't (I hope I'm fairly summarizing his view
> here) disagree mostly or entirely with the existence of such a feature
> at all. I mean, I get the viewpoint that git shouldn't bless what
> amounts to an active RCE from the remote.
> 
> I just think that we could probably do a better job of it than what
> people are doing in practice, and I've seen people do stuff like have
> build systems setup permanent symlinks to git-hooks/<some-name> in the
> tracked dir. We could at least envision a git-native implementation
> asking the user "do you want this hook update? <shows diff>".
> 
> I just find this design approach completely bizarre as noted (probably
> in less blunt words) in the linked E-Mail.

That's fair. You suggest an alternative below (and maybe more in the
linked e-mail) - let's look at your suggestion...

> We have Emily's series to convert hooks to be config driven that we hope
> to land in some form, at that point they won't be any more of a special
> snowflake than any other config.
> 
> And then, instead of doing what I'd think would be the natural result of
> that: Simply supporting an in-repo top-level ".gitconfig" file. We're
> still going to seemingly forever have them be an even more special
> snowflake with this facility, and the reason seems to be mostly/entirely
> to do with working around some aspect or restriction of Google's
> internal infrastructure.

I don't think that this is "natural". In particular, I still don't think
that hooks should be tied to code revision. E.g. if we make commits
based on an old revision and push them, we still want them to follow the
latest requirements.

> I think it's just un-git-y to have a meta-branch that in some way drives
> not only all other branches, but all other revisions of all branches,
> ever.
> 
> It breaks expectations around git in lots of different ways, you can't
> fetch a single branch and get its hooks,

Are you saying that each branch should have its own hooks? That might be
reasonable in certain projects, but I don't see how that is a Git
expectation.

> you can't locally alter, commit
> and update your hooks while e.g. renaming a "t/" directory to "test/";
> your hooks and code can't be atomically changed).

I still think that hooks should work independent of code versions, so I
wouldn't think that atomicity here is important.

> I think I get why you want to do it that way, I just don't get why, as
> mostly noted in those earlier rounds why it wouldn't be a better
> approach / more straightforward / more git-y to:
> 
> 1. Work on getting hooks driven by config <this is happening with
>    Emily's series / my split-out "base" topic>
> 2. Have a facility to read an in-repo '.gitconfig'; have lots of safety
>    valves etc. around this, I suggested starting with a whitelist of the
>    N least dangerous config options, e.g. some diff viewing options, or
>    a suggested sendemail.to or whatever.

I've replied to this above.

> 3. Work our way up to trusting that for more dangerous stuff, eventually
>    hooks. Most of the legitimate concerns from others with this is
>    having some UX where our users won't be trained to just blindly say
>    "yes" to an alias/hook config that "rm -rf's /" or whatever.
> 
>    If we start experimenting with that with aliases or hooks that can
>    run arbitrary code it's like handing a toddlder a shotgun, let's at
>    least start with a sharp fork or something (less dangerous config) :)
> 
> 4. People who want this "I want my hooks to apply to all revisions ever"
>    could probably get 99% or 100% of what they want if their hook is
>    just a stub that does the equivalent of:
> 
>        sh `curl https://git.google.com/$reponame/hooks/$hookname`
> 
>    You'd then simply forbid on your servers any changes to a .gitconfig
>    that did anything with the hook.* namespace.

This would work if set in .git/config (not version controlled), but not
.gitconfig (version controlled).

> With such an implementation you don't need a magic
> "refs/remotes/origin/suggested-hooks" refs, just some state machine (I
> suggested e.g. GPG signing chains as an eventual end-state, but "show a
> diff every time" would also do) that keeps track of what config (and
> hooks are just one such case) has been OK'd, and which has not.

This sounds complicated.

> I'd think it would even work better in the Googleplex, you could clone a
> co-worker's branch and execute their hooks, since they're the same as
> what you've pre-approved,

In the presence of .gitconfig, how would you know?

> you could even clone some random person's fork
> of a "blessed" project, because the hooks would be the same `sh $(curl
> <url I already trust>)`. That validation could even be a system-level
> in-config hook on your laptop, thus bringing the whole thing full
> circle...

Same here.

In summary, I think your point of using hook configs + remote-suggested
configs instead of remote-suggested hooks is a reasonable one, but I
disagree with your reasons (or, at least, your reasons as I understand
them).

Ævar Arnfjörð Bjarmason June 21, 2021, 7:35 p.m. UTC | #10

On Mon, Jun 21 2021, Jonathan Tan wrote:

>> On Wed, Jun 16 2021, Jonathan Tan wrote:
>> 
>> > This is a continuation of the work from [1]. That work described the
>> > reasons, possible features, and possible workflows, but not the
>> > implementation in detail. This patch set has an MVP implementation, and
>> > my hope is that having a concrete implementation to look at makes it
>> > easier to discuss matters of implementation.
>> 
>> My C on this RFC is:
>> 
>> 1) A request that someone reply (there or here would do) to my comments
>>    on the last iteration of this at:
>>    https://lore.kernel.org/git/874kghk906.fsf@evledraar.gmail.com/
>
> OK - I'll take a look at that.
>
>> 2) I think you'd get better feedback if you CC'd the people who've been
>>   actively discussing this in previous rounds.
>
> Good point.
>
>> > Design choices:
>> >
>> >  1. Where should the suggested hooks be in the remote repo? A branch,
>> >     a non-branch ref, a config? I think that a branch is best - it is
>> >     relatively well-understood and any hooks there can be
>> >     version-controlled (and its history is independent of the other
>> >     branches).
>> 
>> First, unlike brian I don't (I hope I'm fairly summarizing his view
>> here) disagree mostly or entirely with the existence of such a feature
>> at all. I mean, I get the viewpoint that git shouldn't bless what
>> amounts to an active RCE from the remote.
>> 
>> I just think that we could probably do a better job of it than what
>> people are doing in practice, and I've seen people do stuff like have
>> build systems setup permanent symlinks to git-hooks/<some-name> in the
>> tracked dir. We could at least envision a git-native implementation
>> asking the user "do you want this hook update? <shows diff>".
>> 
>> I just find this design approach completely bizarre as noted (probably
>> in less blunt words) in the linked E-Mail.
>
> That's fair. You suggest an alternative below (and maybe more in the
> linked e-mail) - let's look at your suggestion...
>
>> We have Emily's series to convert hooks to be config driven that we hope
>> to land in some form, at that point they won't be any more of a special
>> snowflake than any other config.
>> 
>> And then, instead of doing what I'd think would be the natural result of
>> that: Simply supporting an in-repo top-level ".gitconfig" file. We're
>> still going to seemingly forever have them be an even more special
>> snowflake with this facility, and the reason seems to be mostly/entirely
>> to do with working around some aspect or restriction of Google's
>> internal infrastructure.
>
> I don't think that this is "natural". In particular, I still don't think
> that hooks should be tied to code revision. E.g. if we make commits
> based on an old revision and push them, we still want them to follow the
> latest requirements.

Even for real-world centralized workflow situations where I've seen
people think they want that, and the end of the day they almost never
actually want that.

Even something like code linting is a good example, to make it
Google-specific: say for a Go project: Are you going to pin your linting
tool/version to whatever understood your YAML format for the linter as
it was specced 10 years ago when the project started?  It's simply a
giant hassle to have a piece of code operate on every version of your
project ever in a way that doesn't break.

I think in practice the designers of this feature don't actually have
that in mind, but a "close to trunk" workflow, where you'd expect a hook
to only need to operate on revisions for the last few weeks or months,
because that'll be the oldest think people create new topics from.

But I think the burden of proof is really on the other side here,
something that works entirely differently than the rest of git needs to
have a good reason. Our in-repo .gitattributes don't work like this, nor
.gitignore, .mailmap etc.

There's also real world uses of git where the "branches" are wildly
divergent, e.g. I've worked on a system automation repo where the
"master" was just a stub template, and every team had their own almost
entirely different "repo-like" branch. Probably a bad idea for various
reasons, but Git supports it just fine.

For the centralized use-case what's the problem with just having the
hook do a 'for-each-ref --format=' invocation or "cat-file --batch" on
the "origin", and eval what it finds there? I'd think that gives you
what you want for the more centarlized workflow, while leaving git's
implementation working like the rest of our in-repo files.

>> I think it's just un-git-y to have a meta-branch that in some way drives
>> not only all other branches, but all other revisions of all branches,
>> ever.
>> 
>> It breaks expectations around git in lots of different ways, you can't
>> fetch a single branch and get its hooks,
>
> Are you saying that each branch should have its own hooks? That might be
> reasonable in certain projects, but I don't see how that is a Git
> expectation.

It's a git expectation now that I can add git.git as a remote, also
chromium.git, and linux.git, fetch them all, and happily switch in the
same repo between entirely different codebases that don't share a
history.

>> you can't locally alter, commit
>> and update your hooks while e.g. renaming a "t/" directory to "test/";
>> your hooks and code can't be atomically changed).
>
> I still think that hooks should work independent of code versions, so I
> wouldn't think that atomicity here is important.

Covered above.

>> I think I get why you want to do it that way, I just don't get why, as
>> mostly noted in those earlier rounds why it wouldn't be a better
>> approach / more straightforward / more git-y to:
>> 
>> 1. Work on getting hooks driven by config <this is happening with
>>    Emily's series / my split-out "base" topic>
>> 2. Have a facility to read an in-repo '.gitconfig'; have lots of safety
>>    valves etc. around this, I suggested starting with a whitelist of the
>>    N least dangerous config options, e.g. some diff viewing options, or
>>    a suggested sendemail.to or whatever.
>
> I've replied to this above.

Not really, even if we went for this one-HEAD-version-to-rule-them-all
plan wouldn't it make more sense to generalize it as a
refs/remotes/origin/magic-config, and we'd discover a ".gitconfig" file
under that commit/tree.

I.e. whether we generalize this to config in general is orthagonal to
whether such config lives in HEAD or in a magic ref.

With hooks as config I don't see how you'd make any of this
hook-specific, there's other config where the "every revision ever"
applies much more strongly, e.g. sendemail.to. If that changed for this
project tomorrow you wouldn't want a patch based on "maint" to send
things to a different ML.

>> 3. Work our way up to trusting that for more dangerous stuff, eventually
>>    hooks. Most of the legitimate concerns from others with this is
>>    having some UX where our users won't be trained to just blindly say
>>    "yes" to an alias/hook config that "rm -rf's /" or whatever.
>> 
>>    If we start experimenting with that with aliases or hooks that can
>>    run arbitrary code it's like handing a toddlder a shotgun, let's at
>>    least start with a sharp fork or something (less dangerous config) :)
>> 
>> 4. People who want this "I want my hooks to apply to all revisions ever"
>>    could probably get 99% or 100% of what they want if their hook is
>>    just a stub that does the equivalent of:
>> 
>>        sh `curl https://git.google.com/$reponame/hooks/$hookname`
>> 
>>    You'd then simply forbid on your servers any changes to a .gitconfig
>>    that did anything with the hook.* namespace.
>
> This would work if set in .git/config (not version controlled), but not
> .gitconfig (version controlled).

Sorry, what wouldn't work? I meant you'd forbid pushes to your in-repo
.gitconfig in your "master" branch or whatever, just like you're
presumably planning some stronger ACLs for this magic hook branch.

>> With such an implementation you don't need a magic
>> "refs/remotes/origin/suggested-hooks" refs, just some state machine (I
>> suggested e.g. GPG signing chains as an eventual end-state, but "show a
>> diff every time" would also do) that keeps track of what config (and
>> hooks are just one such case) has been OK'd, and which has not.
>
> This sounds complicated.

On the contrary I think anything that leans into git's
content-addressable security model is way less complicated. You don't
care who you fetched Junio's v2.32.0 tag from, what matters is that the
signing chain validates.

The plan of having this magic branch means a whole new trust model for
git, you trust magical authorized remotes. If you trust signed content
chains you can trust hooks if their last modification can be traced to a
signing authority you trust.

    It's really just:

        if (hook_content_changed() && hook_content_same_as_in_ok'd_revision_from_upsteam())
            trust_hooks();

But while we're on the subject, it seems like a very generous assumption
to think that just because you trust hooks at a given revision (or
always trust the latest), that you implicitly trust them when *combined
with* all past and future revisions from the same repository.

Even without a malicious actor that seems like it'll inevitably break in
all sorts of data-destroying ways. E.g. people commit stuff
accidentally. A hook run under a "git bisect" that naïvely does an "rm
*" will eat your data if you land on a revision that an in-tree "-rf"
file.

But once you get to a malicious actor who can say push a topic branch
but not hook updates, will your hooks deal with files with whitespace in
them, arbitrary crafted content etc?

So I'd think that's an even better reason to prefer the in-repo
per-revision atomically committed plan, and only trust hooks for the
revision they're shipped with, at least as a default git security model.

>> I'd think it would even work better in the Googleplex, you could clone a
>> co-worker's branch and execute their hooks, since they're the same as
>> what you've pre-approved,
>
> In the presence of .gitconfig, how would you know?

If it's the same config, or you can automatically OK it. So "same" was
discussed above, or you could trust any hook that's only doing a wget of
some trusted domain and piping that to "sh".

>> you could even clone some random person's fork
>> of a "blessed" project, because the hooks would be the same `sh $(curl
>> <url I already trust>)`. That validation could even be a system-level
>> in-config hook on your laptop, thus bringing the whole thing full
>> circle...
>
> Same here.
>
> In summary, I think your point of using hook configs + remote-suggested
> configs instead of remote-suggested hooks is a reasonable one, but I
> disagree with your reasons (or, at least, your reasons as I understand
> them).

You trust e.g. chromium.git's hooks, but I clone it, patch it, and
re-push it to somegithost.com URL. If you go with trusting content it
becomes easy to install those trusted hooks for the common case, but not
if your entire trust model relies on what URL you git clone'd from.

brian m. carlson June 22, 2021, 12:40 a.m. UTC | #11

On 2021-06-20 at 19:51:04, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Jun 16 2021, Jonathan Tan wrote:
> 
> > This but does not use any features from es/config-based-hooks but is
> > implemented on that branch anyway because firstly, I need an existing
> > command to attach the "autoupdate" subcommand (and "git hook" works),
> > and secondly, when we test this at $DAYJOB, we will be testing it
> > together with the aforementioned branch.
> >
> > I have had to make several design choices (which I will discuss later),
> > but now with this implementation, the following workflow is possible:
> >
> >  1. The remote repo administrator creates a new branch
> >     "refs/heads/suggested-hooks" pointing to a commit that has all the
> >     hooks that the administrator wants to suggest. The hooks are
> >     directly referenced by the commit tree (i.e. they are in the "/"
> >     directory).
> >
> >  2. When a user clones, Git notices that
> >     "refs/remotes/origin/suggested-hooks" is present and prints out a
> >     message about a command that can be run.
> >
> >  3. If the user runs that command, Git will install the hooks pointed to
> >     by that ref, and set hook.autoupdate to true. This config variable
> >     is checked whenever "git fetch" is run: whenever it notices that
> >     "refs/remotes/origin/suggested-hooks" changes, it will reinstall the
> >     hooks.
> >
> >  4. To turn off autoupdate, set hook.autoupdate to false. Existing hooks
> >     will remain.
> >
> > Design choices:
> >
> >  1. Where should the suggested hooks be in the remote repo? A branch,
> >     a non-branch ref, a config? I think that a branch is best - it is
> >     relatively well-understood and any hooks there can be
> >     version-controlled (and its history is independent of the other
> >     branches).
> 
> First, unlike brian I don't (I hope I'm fairly summarizing his view
> here) disagree mostly or entirely with the existence of such a feature
> at all. I mean, I get the viewpoint that git shouldn't bless what
> amounts to an active RCE from the remote.

It's accurate that I'm generally opposed to such a feature.  I feel that
suggesting people install hooks is likely to lead to social engineering
attacks, and it's also likely to lead to bad practices such as the
expectation that all developers will install hooks or the use of hooks
instead of CI or other effective controls.

If we do add this feature (which, as I said, I'm opposed to) and we
decide to store it in a ref, that ref should not be a normal branch by
default (it should be a special one-level ref, like refs/stash or such),
and the ref name should be configurable.  Not all developers use English
as their working language and we should respect that.

In addition, there should be an advice.* option that allows people to
turn this off once and for all, and it should be clearly documented.
Ideally it should be off by default.

> I think I get why you want to do it that way, I just don't get why, as
> mostly noted in those earlier rounds why it wouldn't be a better
> approach / more straightforward / more git-y to:
> 
> 1. Work on getting hooks driven by config <this is happening with
>    Emily's series / my split-out "base" topic>
> 2. Have a facility to read an in-repo '.gitconfig'; have lots of safety
>    valves etc. around this, I suggested starting with a whitelist of the
>    N least dangerous config options, e.g. some diff viewing options, or
>    a suggested sendemail.to or whatever.

This also makes me deeply nervous for much of the same reasons.  There
are situations where e.g. ignoring whitespace can lead to security
problems in code review (think Python), and in general it's hard to
reason about all the ways people can do malicious things.  Typically
adding untrusted config ends poorly (think of all the modeline
vulnerabilities in Vim).

I'd definitely want support for this to be off with no prompting by
default.

Jonathan Tan June 22, 2021, 1:27 a.m. UTC | #12

> >> And then, instead of doing what I'd think would be the natural result of
> >> that: Simply supporting an in-repo top-level ".gitconfig" file. We're
> >> still going to seemingly forever have them be an even more special
> >> snowflake with this facility, and the reason seems to be mostly/entirely
> >> to do with working around some aspect or restriction of Google's
> >> internal infrastructure.
> >
> > I don't think that this is "natural". In particular, I still don't think
> > that hooks should be tied to code revision. E.g. if we make commits
> > based on an old revision and push them, we still want them to follow the
> > latest requirements.
> 
> Even for real-world centralized workflow situations where I've seen
> people think they want that, and the end of the day they almost never
> actually want that.
> 
> Even something like code linting is a good example, to make it
> Google-specific: say for a Go project: Are you going to pin your linting
> tool/version to whatever understood your YAML format for the linter as
> it was specced 10 years ago when the project started?  It's simply a
> giant hassle to have a piece of code operate on every version of your
> project ever in a way that doesn't break.
> 
> I think in practice the designers of this feature don't actually have
> that in mind, but a "close to trunk" workflow, where you'd expect a hook
> to only need to operate on revisions for the last few weeks or months,
> because that'll be the oldest think people create new topics from.

This is probably true, but it still means that people would want any
hook changes to work retroactively, even if it's just on code for the
last few months. So they still need some independence between the hook
and the main code.

> But I think the burden of proof is really on the other side here,
> something that works entirely differently than the rest of git needs to
> have a good reason. Our in-repo .gitattributes don't work like this, nor
> .gitignore, .mailmap etc.

Not all of Git works this way though - e.g. hooks themselves, and
config.

> There's also real world uses of git where the "branches" are wildly
> divergent, e.g. I've worked on a system automation repo where the
> "master" was just a stub template, and every team had their own almost
> entirely different "repo-like" branch. Probably a bad idea for various
> reasons, but Git supports it just fine.

I presume this repo either does not use hooks, or its hooks are built
for such a branch setup? I envision that the admins of the remote would
only suggest hooks that work with their branch setup.

> For the centralized use-case what's the problem with just having the
> hook do a 'for-each-ref --format=' invocation or "cat-file --batch" on
> the "origin", and eval what it finds there? I'd think that gives you
> what you want for the more centarlized workflow, while leaving git's
> implementation working like the rest of our in-repo files.

This would future-proof such a hook, but not make it work retroactively
in the past. (The repo admin could just include no-op hook for all hooks
to future-proof, I guess, but this wouldn't work if we ever introduced
new hooks.)

> >> I think it's just un-git-y to have a meta-branch that in some way drives
> >> not only all other branches, but all other revisions of all branches,
> >> ever.
> >>
> >> It breaks expectations around git in lots of different ways, you can't
> >> fetch a single branch and get its hooks,
> >
> > Are you saying that each branch should have its own hooks? That might be
> > reasonable in certain projects, but I don't see how that is a Git
> > expectation.
> 
> It's a git expectation now that I can add git.git as a remote, also
> chromium.git, and linux.git, fetch them all, and happily switch in the
> same repo between entirely different codebases that don't share a
> history.

This expectation holds only if you know that your hooks can tolerate
such a scenario. If you're using remote-suggested hooks, I don't think
it's a matter of "before, I could have many remotes but now I cannot"
but "before, I had to install these hooks myself and now it's automatic"
- whether this workflow is supported or not would be the same before and
now.

> >> I think I get why you want to do it that way, I just don't get why, as
> >> mostly noted in those earlier rounds why it wouldn't be a better
> >> approach / more straightforward / more git-y to:
> >>
> >> 1. Work on getting hooks driven by config <this is happening with
> >>    Emily's series / my split-out "base" topic>
> >> 2. Have a facility to read an in-repo '.gitconfig'; have lots of safety
> >>    valves etc. around this, I suggested starting with a whitelist of the
> >>    N least dangerous config options, e.g. some diff viewing options, or
> >>    a suggested sendemail.to or whatever.
> >
> > I've replied to this above.
> 
> Not really, even if we went for this one-HEAD-version-to-rule-them-all
> plan wouldn't it make more sense to generalize it as a
> refs/remotes/origin/magic-config, and we'd discover a ".gitconfig" file
> under that commit/tree.
> 
> I.e. whether we generalize this to config in general is orthagonal to
> whether such config lives in HEAD or in a magic ref.
> 
> With hooks as config I don't see how you'd make any of this
> hook-specific, there's other config where the "every revision ever"
> applies much more strongly, e.g. sendemail.to. If that changed for this
> project tomorrow you wouldn't want a patch based on "maint" to send
> things to a different ML.

My opposition to .gitconfig was that it is per-commit, but here it seems
that you're saying that there are reasons for it not to be per-commit
(e.g. the sendemail example).

> >> 4. People who want this "I want my hooks to apply to all revisions ever"
> >>    could probably get 99% or 100% of what they want if their hook is
> >>    just a stub that does the equivalent of:
> >>
> >>        sh `curl https://git.google.com/$reponame/hooks/$hookname`
> >>
> >>    You'd then simply forbid on your servers any changes to a .gitconfig
> >>    that did anything with the hook.* namespace.
> >
> > This would work if set in .git/config (not version controlled), but not
> > .gitconfig (version controlled).
> 
> Sorry, what wouldn't work? I meant you'd forbid pushes to your in-repo
> .gitconfig in your "master" branch or whatever, just like you're
> presumably planning some stronger ACLs for this magic hook branch.

The "I want my hooks to apply to all revisions ever" wouldn't work if it
was per-commit .gitconfig. (It would work if it was in the magic
branch, but at that time, what I understood by ".gitconfig" is the
per-commit version.)

> >> With such an implementation you don't need a magic
> >> "refs/remotes/origin/suggested-hooks" refs, just some state machine (I
> >> suggested e.g. GPG signing chains as an eventual end-state, but "show a
> >> diff every time" would also do) that keeps track of what config (and
> >> hooks are just one such case) has been OK'd, and which has not.
> >
> > This sounds complicated.
> 
> On the contrary I think anything that leans into git's
> content-addressable security model is way less complicated. You don't
> care who you fetched Junio's v2.32.0 tag from, what matters is that the
> signing chain validates.
> 
> The plan of having this magic branch means a whole new trust model for
> git, you trust magical authorized remotes. If you trust signed content
> chains you can trust hooks if their last modification can be traced to a
> signing authority you trust.

The security model might be preexisting, but the things needing to be
built around it make it more complicated. Take this case - I presume
this means that the client would need to tell Git that a certain public
key is considered trusted (perhaps prompted by the server upon clone, so
the client would only need to copy and paste "git trust-key $HTTPS_URL"
or something like that). Whenever a user wants to update a hook, they
will make a commit, and when the PR is merged, the merge commit (or the
tip commit if fast-forwarded) must be signed with the same key.

Compare this with trusting that commits coming from a certain HTTPS URL
are fine. The workflows are the same as for a regular code commit.

>     It's really just:
> 
>         if (hook_content_changed() && hook_content_same_as_in_ok'd_revision_from_upsteam())
>             trust_hooks();
> 
> But while we're on the subject, it seems like a very generous assumption
> to think that just because you trust hooks at a given revision (or
> always trust the latest), that you implicitly trust them when *combined
> with* all past and future revisions from the same repository.
> 
> Even without a malicious actor that seems like it'll inevitably break in
> all sorts of data-destroying ways. E.g. people commit stuff
> accidentally. A hook run under a "git bisect" that navely does an "rm
> *" will eat your data if you land on a revision that an in-tree "-rf"
> file.

I think we have the same problem with current hooks and other config,
and we have been dealing with them relatively well (as far as I know).

> But once you get to a malicious actor who can say push a topic branch
> but not hook updates, will your hooks deal with files with whitespace in
> them, arbitrary crafted content etc?
> 
> So I'd think that's an even better reason to prefer the in-repo
> per-revision atomically committed plan, and only trust hooks for the
> revision they're shipped with, at least as a default git security model.

The malicious actor could include the same hook in their topic branch,
so I don't see how the per-revision hook is better than the all-revision
hook.

> >> I'd think it would even work better in the Googleplex, you could clone a
> >> co-worker's branch and execute their hooks, since they're the same as
> >> what you've pre-approved,
> >
> > In the presence of .gitconfig, how would you know?
> 
> If it's the same config, or you can automatically OK it. So "same" was
> discussed above, or you could trust any hook that's only doing a wget of
> some trusted domain and piping that to "sh".

Extending trust to hooks that are exactly the same makes sense.

> You trust e.g. chromium.git's hooks, but I clone it, patch it, and
> re-push it to somegithost.com URL. If you go with trusting content it
> becomes easy to install those trusted hooks for the common case, but not
> if your entire trust model relies on what URL you git clone'd from.

If the repo administrator provides a key and signs their hooks, then the
repo itself is indeed more portable and does not need to come from a
single source. This may be useful, but I'm not sure how useful this is,
since hooks are typically provided by projects. And as far as I know
projects generally have one source of truth, so the portability is not a
big plus.

Jonathan Tan June 23, 2021, 10:58 p.m. UTC | #13

> > First, unlike brian I don't (I hope I'm fairly summarizing his view
> > here) disagree mostly or entirely with the existence of such a feature
> > at all. I mean, I get the viewpoint that git shouldn't bless what
> > amounts to an active RCE from the remote.
> 
> It's accurate that I'm generally opposed to such a feature.

From emails like [1], I think that you have understood both the pros and
cons, and decided that the cons outweigh the pros, which is fair. I'll
reply so that others can know what I think about these points.

[1] https://lore.kernel.org/git/YGzrfaSC4xd75j2U@camp.crustytoothpaste.net/

> I feel that
> suggesting people install hooks is likely to lead to social engineering
> attacks,

I think that this vector of social engineering attack already exists, as
there are projects that without malice recommend hook installation (so
an attacker could masquerade as such). It is true that Git itself
printing a message recommending hook installation perhaps could lend a
false sense of security, but at least we can control what that message
says (as opposed to a project recommending a third-party tool, all with
messaging out of our control).

> and it's also likely to lead to bad practices such as the
> expectation that all developers will install hooks or the use of hooks
> instead of CI or other effective controls.

I agree that hooks can be overused or misused, but there are still
legitimate uses of it that a project might want to use.

> If we do add this feature (which, as I said, I'm opposed to) and we
> decide to store it in a ref, that ref should not be a normal branch by
> default (it should be a special one-level ref, like refs/stash or such),

Any particular reason not to expose it as a branch (besides following
from your general idea that a user should seek out such a feature and
not have it presented to them up-front)?

> and the ref name should be configurable.  Not all developers use English
> as their working language and we should respect that.

That makes sense.

> In addition, there should be an advice.* option that allows people to
> turn this off once and for all, and it should be clearly documented.
> Ideally it should be off by default.

I don't think this would be considered "advice" like the other options,
but having an option to turn this off once and for all makes sense.
Making it off by default would probably mean that projects that use such
hooks would recommend cloning with "git -c my-config=1 clone $URL", but
perhaps that's OK.

> > I think I get why you want to do it that way, I just don't get why, as
> > mostly noted in those earlier rounds why it wouldn't be a better
> > approach / more straightforward / more git-y to:
> > 
> > 1. Work on getting hooks driven by config <this is happening with
> >    Emily's series / my split-out "base" topic>
> > 2. Have a facility to read an in-repo '.gitconfig'; have lots of safety
> >    valves etc. around this, I suggested starting with a whitelist of the
> >    N least dangerous config options, e.g. some diff viewing options, or
> >    a suggested sendemail.to or whatever.
> 
> This also makes me deeply nervous for much of the same reasons.  There
> are situations where e.g. ignoring whitespace can lead to security
> problems in code review (think Python), and in general it's hard to
> reason about all the ways people can do malicious things.  Typically
> adding untrusted config ends poorly (think of all the modeline
> vulnerabilities in Vim).
> 
> I'd definitely want support for this to be off with no prompting by
> default.

To use your example, the model we're proposing is more of only using the
modelines from sources we trust - as opposed to ensuring that all
possible options set by modelines are benign. Admittedly, the
administrator of the source may have difficulty ensuring that bad code
doesn't slip through code review, for example, but that is a problem
they already deal with (at least for projects with any form of
executable code in them, e.g. production code or a build script).

brian m. carlson June 24, 2021, 11:11 p.m. UTC | #14

On 2021-06-23 at 22:58:09, Jonathan Tan wrote:
> > If we do add this feature (which, as I said, I'm opposed to) and we
> > decide to store it in a ref, that ref should not be a normal branch by
> > default (it should be a special one-level ref, like refs/stash or such),
> 
> Any particular reason not to expose it as a branch (besides following
> from your general idea that a user should seek out such a feature and
> not have it presented to them up-front)?

Branches are for the main code of the project.  While it's possible to
have orphan branches that do other things, I think that's in general an
anti-pattern, and using a special ref for things which are separate and
independent from the main code of the repository would be a more elegant
solution.

> > In addition, there should be an advice.* option that allows people to
> > turn this off once and for all, and it should be clearly documented.
> > Ideally it should be off by default.
> 
> I don't think this would be considered "advice" like the other options,
> but having an option to turn this off once and for all makes sense.
> Making it off by default would probably mean that projects that use such
> hooks would recommend cloning with "git -c my-config=1 clone $URL", but
> perhaps that's OK.

Sure, I'm not picky about what it looks like in "advice" vs something
else.  I think forcing projects to explicitly opt-in to this behavior
means that the social engineering and security problems are much
reduced, and while I'm still not wild about the idea, I would feel much
better about it.

> > This also makes me deeply nervous for much of the same reasons.  There
> > are situations where e.g. ignoring whitespace can lead to security
> > problems in code review (think Python), and in general it's hard to
> > reason about all the ways people can do malicious things.  Typically
> > adding untrusted config ends poorly (think of all the modeline
> > vulnerabilities in Vim).
> > 
> > I'd definitely want support for this to be off with no prompting by
> > default.
> 
> To use your example, the model we're proposing is more of only using the
> modelines from sources we trust - as opposed to ensuring that all
> possible options set by modelines are benign. Admittedly, the
> administrator of the source may have difficulty ensuring that bad code
> doesn't slip through code review, for example, but that is a problem
> they already deal with (at least for projects with any form of
> executable code in them, e.g. production code or a build script).

As I think I've previously mentioned, I don't want to receive
configuration of my development environment from sources I trust.  Even
at work, I don't want the repositories I work with to modify my
development environment in this way.  I tend to have a highly customized
configuration that breaks many people's expectations about tooling, so
the cases that this isn't a security problem (in repositories I trust)
can still result in a functionality problem.

Also, since we don't know what repositories the user trusts, the only
safe assumption is that the user trusts nothing unless they explicitly
tell us.

Junio C Hamano June 28, 2021, 11:12 p.m. UTC | #15

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> If we do add this feature (which, as I said, I'm opposed to) and we
> decide to store it in a ref, that ref should not be a normal branch by
> default (it should be a special one-level ref, like refs/stash or such),
> and the ref name should be configurable.  Not all developers use English
> as their working language and we should respect that.

Just this part.

I am not sure what you are trying to achieve by requiring it to be
configurable.  After all, even for names of branches that are used
to store the code, which is of more significance than this "unusual"
ref, we've lived with a hardcoded 'master' and with the server-end
advertisability of the configured values (i.e. "git clone" was
designed to figure out if the origin used a custom name for the
primary line of history and use the same name from the beginning),
the end-user sitting on the receiving end did not have to do
anything special even when the project wanted to use a custom name.

But this "unusual" ref would not have a natural equivalent of "the
origin side can point the primary branch with its HEAD", so by
allowing it to be configurable, you are pretty much closing the door
for those who blindly honor whatever the origin tells them to use
when running "git clone" from doing so.  I agree that it is a good
security measure (and I am not sold to the "remote suggested hooks"
idea in the first place), but is that the real reason why you
suggested configurability?

[RFC,0/2] MVP implementation of remote-suggested hooks

Message

Comments