diff mbox series

[v2] hooks: propose project configured hooks

Message ID pull.908.v2.git.1616723016659.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v2] hooks: propose project configured hooks | expand

Commit Message

Albert Cui March 26, 2021, 1:43 a.m. UTC
From: Albert Cui <albertqcui@gmail.com>

Hooks today are configured at the repository level, making it difficult to
share hooks across repositories. Configuration-based hook management, by
moving hooks configuration to the config, makes this much easier. However,
there is still no good way for project maintainers to encourage or enforce
adoption of specific hook commands on specific hook events in a repository.
As such, there are many tools that provide this functionality on top of Git.

This patch documents the requirements we propose for this feature as well as
a design sketch for implementation.

Signed-off-by: Albert Cui <albertqcui@gmail.com>
Helped-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I5f6747524b97c51dfe5fa28e48ea03981b2da5b8
---
    [RFC] hooks: propose project configured hooks
    
    V2:
    
     * Clarify usage of the word repository; "repository owner" is
       specifically confusing so changed to "project"
     * Add section about local vs remote hooks, and why think why there's
       still use case for local hooks
     * Change design principles section to be more explicit about security
       considerations
     * Introduce requirement for a command to set up hooks
     * Add UX examples
    
    Signed-off-by: Albert Cui albertqcui@gmail.com Helped-by: Emily Shaffer
    emilyshaffer@google.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-908%2Falbertcui%2Fhooks-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-908/albertcui/hooks-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/908

Range-diff vs v1:

 1:  2a558d06245d ! 1:  473e14edd441 hooks: propose repository owner configured hooks
     @@ Metadata
      Author: Albert Cui <albertqcui@gmail.com>
      
       ## Commit message ##
     -    hooks: propose repository owner configured hooks
     +    hooks: propose project configured hooks
      
          Hooks today are configured at the repository level, making it difficult to
          share hooks across repositories. Configuration-based hook management, by
     @@ Commit message
          adoption of specific hook commands on specific hook events in a repository.
          As such, there are many tools that provide this functionality on top of Git.
      
     -    We propose adding native Git functionality to allow project maintainers to
     -    specify hooks that a user ought to install and utilize in their development
     -    workflows. This patch documents the requirements we propose for this feature
     -    as well as a design sketch for implementation.
     +    This patch documents the requirements we propose for this feature as well as
     +    a design sketch for implementation.
      
          Signed-off-by: Albert Cui <albertqcui@gmail.com>
          Helped-by: Emily Shaffer <emilyshaffer@google.com>
     +    Change-Id: I5f6747524b97c51dfe5fa28e48ea03981b2da5b8
      
       ## Documentation/Makefile ##
      @@ Documentation/Makefile: TECH_DOCS += technical/protocol-common
       TECH_DOCS += technical/protocol-v2
       TECH_DOCS += technical/racy-git
       TECH_DOCS += technical/reftable
     -+TECH_DOCS += technical/repository-owner-hooks
     ++TECH_DOCS += technical/project-configured-hooks
       TECH_DOCS += technical/send-pack-pipeline
       TECH_DOCS += technical/shallow
       TECH_DOCS += technical/signature-format
      
     - ## Documentation/technical/repository-owner-hooks.txt (new) ##
     + ## Documentation/technical/project-configured-hooks.txt (new) ##
      @@
     -+Repository Owner Hooks Administration
     -+-------------------------------------
     ++Project Configured Hooks
     ++------------------------
      +
      +Background
      +~~~~~~~~~~
     @@ Documentation/technical/repository-owner-hooks.txt (new)
      +execute commands or scripts when certain Git events occur. Some use cases
      +include:
      +
     -+* The `pre-commit` hook event: before committing, a developer may want to lint
     -+their changes to enforce certain code style and quality. If there are style
     -+issues, the developer may want the commit to fail so that they can fix the
     -+issues.
     ++* The `pre-commit` hook event:
      +
     -+* The `commit-msg` hook event: after committing, repository owners may want to
     -+enforce a certain commit message style. This may be out of necessity:
     ++  ** A developer may want to lint their changes to enforce certain code style
     ++  and quality. The project may want the commit to fail so that commits that
     ++  violate the project's style don't get uploaded.
     ++
     ++  ** The project may want to prevent developers from committing passwords or
     ++  other sensitive information e.g. via
     ++  https://github.com/awslabs/git-secrets[git-secrets].
     ++
     ++* The `commit-msg` hook event: the project may want to enforce a certain commit
     ++message style. This may be out of necessity:
      +https://www.gerritcodereview.com/[Gerrit Code Review], for example, requires
      +each commit to have a Change-Id in the footer.
      +
     -+* The `pre-push` hook: before pushing, repository owners may want to verify that
     -+pushes are going to the correct remote to prevent leaks.
     ++* The `pre-push` hook: the project may want to verify that pushes are going to
     ++the correct central repository to prevent leaks.
      +
      +A common thread between these use cases is to enforce certain behavior across
      +many developers working in the same code base, encouraging best practices and
      +healthy code quality.
      +
     -+Hooks today are configured at the repository level, making it difficult to share
     -+hooks across repositories.
     ++Hooks today are configured individually in each clone, making it difficult for
     ++project maintainers to enforce hooks usage across them.
      +https://lore.kernel.org/git/20210311021037.3001235-2-emilyshaffer@google.com[Configuration-based
     -+hook management], by moving hooks configuration to the config, makes this much
     -+easier. However, there is still no good way for project maintainers to encourage
     ++hook management], by moving hooks to the config, makes this easier; individuals
     ++can at least configure hooks to be used across multiple workspaces on their
     ++host. However, there is still no good way for project maintainers to encourage
      +or enforce adoption of specific hook commands on specific hook events in a
     -+repository. As such, there are many tools that provide this functionality on top
     -+of Git (see <<prior-art, Prior Art>>).
     ++clone. As such, there are many tools that provide this functionality on top of
     ++Git (see <<prior-art, Prior Art>>).
      +
      +We propose adding native Git functionality to allow project maintainers to
      +specify hooks that a user ought to install and utilize in their development
      +workflows.
      +
     ++Server-side vs Local Checks
     ++^^^^^^^^^^^^^^^^^^^^^^^^^^^
     ++
     ++A large motivation for this change is providing projects a method to enable
     ++local checks of code e.g. linting. As documented in
     ++https://git-scm.com/docs/gitfaq#_hooks[gitfaq], in some cases, server-side
     ++checks may be more appropriate, especially since developers can always skip
     ++local hook execution. We think there are still benefits to local checks:
     ++
     ++* Ensuring commit message style and preventing the committing of sensitive data,
     ++as described above. In particular, if CI results are public, as with many open
     ++source projects, server side checks are useless for guarding against leaking
     ++sensitive data.
     ++
     ++* Helps developers catch issues earlier: typically developers need to push to
     ++the remote to trigger server-side checks. Local hooks can be run anytime the
     ++developer wants. This is especially useful if the project has slow
     ++server-checks; catching issues locally can save the developer a lot of time
     ++waiting for CI. They are also useful for locally reproducing an issue identified
     ++in CI, helping resolve issues faster.
     ++
     ++* Since the typical goal of developers to submit changes to a central
     ++repository, their interests are aligned with the project maintainers'; while
     ++they can choose to skip local hook execution, it is in their interest to allow
     ++hooks to run at least before proposing code for submission to the central
     ++repository to increase the chances of the code getting accepted.
     ++
     ++In the ideal world, developers and project maintainers use both local and server
     ++side checks in their workflow. However, for many smaller projects, this may not
     ++be possible: CI may be too expensive to run or configure. The number of local
     ++solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
     ++Bringing this natively to Git can give all these developers a well-supported,
     ++secure implementation opposed to the fragmentation we see today.
     ++
      +User Goals / Critical User Journeys
      +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      +
     -+* As a repository owner / project maintainer,
     ++* As a project maintainer,
      +
      +    ** I want to enforce code style so I require developers to use a
      +    standardized linter.
      +
     -+    ** I want to prevent leaks / share code widely so I check that developers
     -+    are uploading code to the right place.
     ++    ** I want to prevent sensitive data from getting checked in.
      +
     -+    ** I want this to just work for all the developers in my repository, without
     ++    ** I want to prevent leaks so I check that developers are uploading code to
     ++    the right private central repository. Conversely, I may want to encourage
     ++    more open source development and encourage developers to push to public
     ++    central repositories.
     ++
     ++    ** I want this to just work for all the developers in my project, without
      +    needing to support them through configuration.
      +
     -+* As a developer developing in a repository,
     ++* As a developer developing in a local clone,
     ++
     ++    ** I want to set up my workspace.
     ++
     ++    ** I want control over what code runs on my machine.
     ++
     ++    ** I want to set up my own hooks, in addition to or in lieu of project
     ++    configured hooks.
     ++
     ++    ** I want to skip hooks from running (for various reasons).
     ++
     ++Security Considerations and Design Principles
     ++~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      +
     -+    ** I want to set up my workspace
     ++We must balance the desire to make hooks setup easy for developers --- allowing
     ++them to get hooks set up with low friction, and hence increasing the probability
     ++of them adopting these hooks --- with protecting users from the security risks
     ++of arbitrary code execution on their hosts.
      +
     -+    ** I want control over what code runs on my machine
     ++To inform the design, we propose these design principles:
      +
     -+    ** I want to skip hooks from running (for various reasons)
     ++* User consent: Users must explicitly agree to hooks usage; no hooks should
     ++execute without consent, and users should re-consent if hooks update. Users can
     ++opt-out of hooks.
      +
     -+Design Principles
     -+~~~~~~~~~~~~~~~~~
     ++* Trust comes from the central repository:
     ++  ** Most users don't have the time or expertise to properly audit every hook
     ++  and what it does. There must be trust between the user and the remote that the
     ++  code came from, and the Git project should ensure trust to the degree it can
     ++  e.g. enforce HTTPS for its integrity guarantees.
      +
     -+* *Make it easy:* Developers just want to get their work done. Introducing
     -+friction is bad, especially when it prevents development from even starting e.g.
     -+workspace set up.
     ++  ** Since developers will likely build their local clone in their development
     ++  process, at some point, arbitrary code from the repository will be executed.
     ++  In this sense, hooks _with user consent_ do not introduce a new attack surface.
      +
     -+* *Treat hooks as software, not configuration:* We take seriously the
     -+responsibility that comes with causing arbitrary code to run on a user's
     -+machine. Such code needs to come from a reputable source, be automatically
     -+updated, and run with user consent.
     ++* Give users visibility: Git must allow users to make informed decisions. This
     ++means surfacing essential information to the user in a visible manner e.g. what
     ++remotes the hooks are coming from, whether the hooks have changed in the latest
     ++checkout.
      +
      +Feature Requirements
      +~~~~~~~~~~~~~~~~~~~~
     @@ Documentation/technical/repository-owner-hooks.txt (new)
      +Minimum Feature Set
      +^^^^^^^^^^^^^^^^^^^
      +
     -+* A repository owner can specify a configuration for what hook commands are
     ++* A repository can specify a configuration for what hook commands are
      +enabled for what hook events
      +
     -+* A repository owner can specify, in this configuration, where the hook
     ++* A repository can specify, in this configuration, where the hook
      +commands reside
      +
      +    ** This could be a path to a script/binary within the repository
     @@ Documentation/technical/repository-owner-hooks.txt (new)
      +    ** This could be a user installed command or script/binary that exists
      +    outside of the repository and is present in `$PATH`
      +
     -+* Users must explicitly approve hook execution at least once (yes/no)
     ++* This configuration should only apply if it was received over HTTPS
      +
     -+    ** This could happen during setup or at execution time
     ++* A setup command for users to set up hooks
     ++
     ++    ** Hook setup could happen at clone time assuming the user has consented
     ++    e.g. if `--setup-hooks` is passed to `git clone`
     ++
     ++* Users must explicitly approve hooks at least once
     ++
     ++    ** Running the setup command should count as approval, including if the user
     ++    consented during the clone
      +
      +    ** When a hook command changes, a user should re-approve execution (note:
      +    implementation should not interfere with requirement listed in “Fast
      +    Follows")
      +
     -+    * Users do not need to run setup scripts to install hooks --- the setup flow
     -+    happens automatically at clone time
     -+
      +* Automation is able to continue to use clone and other commands
      +non-interactively
      +
     @@ Documentation/technical/repository-owner-hooks.txt (new)
      +Implementation Exploration: Check "magic" branch for configs at fetch time
      +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      +
     -+Example workflow
     -+^^^^^^^^^^^^^^^^
     ++Example User Experience
     ++^^^^^^^^^^^^^^^^^^^^^^^
     ++
     ++===== Case 1: Consent through clone
     ++
     ++....
     ++$ git clone --setup-hooks
     ++...
     ++
     ++The following hooks were installed from remote `origin` ($ORIGIN_URL):
     ++
     ++pre-commit: git-secrets --pre_commit_hook
     ++pre-push:  $GIT_ROOT/pre_push.sh
     ++....
     ++
     ++===== Case 2: Prompting after clone
     ++....
     ++$ git clone
     ++...
     ++
     ++Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:
     ++
     ++pre-commit: git-secrets --pre_commit_hook
     ++pre-push:  $GIT_ROOT/pre_push.sh
     ++
     ++# instead of prompting, we could give users commands to run instead
     ++# see case 3
     ++
     ++Do you wish to install them?
     ++1. Yes (this time)
     ++2. Yes (always from origin)
     ++3. No (not this time)
     ++4. No (never)
     ++....
     ++
     ++===== Case 3: Re-prompting when hooks change
     ++....
     ++$ git pull
     ++
     ++The following hooks were updated from remote `origin` ($ORIGIN_URL):
     ++
     ++pre-push:  $GIT_ROOT/pre_push.sh
     ++
     ++If you wish to install them, run `git hook setup origin`.
     ++
     ++If you wish to always accept hooks from `origin`, run `git hook setup --always
     ++origin`. You should only do this if you trust code changes from origin.
     ++
     ++To always ignore hooks from `origin`, run `git hook ignore origin`.
     ++....
     ++
     ++===== Case 4: Nudging when hooks weren't installed
     ++....
     ++$ git commit
     ++advice: The repository owner has recommended a 'pre-commit' hook that was not run.
     ++To view it, run `git show origin/refs/recommended-config:some-pre-commit`. To install it, run `git hook setup origin pre-commit`
     ++
     ++Turn off this advice by setting config variable advice.missingHook to false."
     ++....
     ++
     ++
     ++Implementation Sketch
     ++^^^^^^^^^^^^^^^^^^^^^
      +
      +* Perform fetch as normal
      +
     @@ Documentation/technical/repository-owner-hooks.txt (new)
      +* A way to perform general set up of a code base e.g. installing dependencies,
      +any first-build related tasks
      +
     ++* Sandboxing hook execution to provide higher levels of security.
     ++
      +[[prior-art]]
      +Prior Art
      +~~~~~~~~~


 Documentation/Makefile                        |   1 +
 .../technical/project-configured-hooks.txt    | 426 ++++++++++++++++++
 2 files changed, 427 insertions(+)
 create mode 100644 Documentation/technical/project-configured-hooks.txt


base-commit: a5828ae6b52137b913b978e16cd2334482eb4c1f

Comments

Emily Shaffer March 29, 2021, 11:20 p.m. UTC | #1
On Fri, Mar 26, 2021 at 01:43:36AM +0000, Albert Cui via GitGitGadget wrote:

> Change-Id: I5f6747524b97c51dfe5fa28e48ea03981b2da5b8
Oops :)

I avoid this by setting gerrit.createChangeId = false in my global
config and adding an alias:
  alias.gerrit-commit = "-c gerrit.createChangeId=true commit"

> +Server-side vs Local Checks
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +* Helps developers catch issues earlier: typically developers need to push to
> +the remote to trigger server-side checks. Local hooks can be run anytime the
> +developer wants. This is especially useful if the project has slow
> +server-checks; catching issues locally can save the developer a lot of time
> +waiting for CI. They are also useful for locally reproducing an issue identified
> +in CI, helping resolve issues faster.

Big +1 to this - I hate having to wait for a push and CI build, possibly
queued behind someone else's work or an earlier mistaken push, to check
whether my stuff is right. :)

> +In the ideal world, developers and project maintainers use both local and server
> +side checks in their workflow. However, for many smaller projects, this may not
> +be possible: CI may be too expensive to run or configure. The number of local
> +solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
> +Bringing this natively to Git can give all these developers a well-supported,
            ^~~~
This is a little vague here. It sounds like you might be suggesting to
standardize server-side CI config in Git-controlled projects.
> +secure implementation opposed to the fragmentation we see today.

The point about solution fragmentation is a strong one and I wonder
whether it's being emphasized enough. There is obviously a need, or else
people wouldn't keep writing all these things in the Prior Art section
:)

> +Security Considerations and Design Principles
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
[snip]
> +  ** Since developers will likely build their local clone in their development
> +  process, at some point, arbitrary code from the repository will be executed.
> +  In this sense, hooks _with user consent_ do not introduce a new attack surface.

It might be worth saying that we want to make configuration of
project-configured hooks to be approximately as easy/automatic as
building (that is, the user still has to explicitly run a build, and
isn't prompted at the end of their clone whether they want to build it
right away).
> +
> +* Give users visibility: Git must allow users to make informed decisions. This
> +means surfacing essential information to the user in a visible manner e.g. what
> +remotes the hooks are coming from, whether the hooks have changed in the latest
> +checkout.
   ^~~~~~~~
Better say "fetch", if we are proposing this magic branch thing.

> +* This configuration should only apply if it was received over HTTPS

Meaning, non-HTTPS fetches should just not update this special branch?

> +* A setup command for users to set up hooks
AIUI, this is proposed to be part of `git hook`, right?

I don't think it needs to be part of this doc but it'd be nice to also
support installing just a subset, like:

  git hook setup pre-commit
  git hook setup --interactive

> +* Users must explicitly approve hooks at least once
> +
> +    ** Running the setup command should count as approval, including if the user
> +    consented during the clone
> +
> +    ** When a hook command changes, a user should re-approve execution (note:
> +    implementation should not interfere with requirement listed in “Fast
> +    Follows")
> +
> +* Automation is able to continue to use clone and other commands
> +non-interactively

One interesting point - by using an advice instead of an interactive
prompt at clone time, we get this for free.

> +Fast Follows
> +^^^^^^^^^^^^
> +
> +* When prompted to execute a hook, users can specify always or never, even if
> +the hook updates

I think we want to base this on the remote URL, right? I know we talked
a little offline about how to mitigate vs. malicious maintainer (for
example this whole mess with The Great Suspender) and I'm not sure what
solution there might be.

I wonder if it's worth it to notify users that their always-okayed hooks
were updated during fetch?

> +
> +Nice to Haves
> +^^^^^^^^^^^^^
> +
> +* A method to skip hook execution i.e. `--no-verify` works everywhere

This part I'd like to discuss more on-list - I think it would need to
happen as an argument to git.c (e.g. git --no-verify commit blah), or
else we'd have the problems we have with --no-verify today. But is that
too ugly? I think everything else (even teaching parse-options to grab
--no-verify regardless, which, ick) would still be prone to issues,
since not everybody uses parse-options and not every subcommand
implementor knows their subcommand will invoke a hook. (For example, the
nice surprise when rebase started using some different strategy and
invoking the post-commit hook way more often, off the top of my head so
details may not be correct.)

> +* Support a “warnings only mode” where hooks run but don’t block commands from
> +executing

Same as --no-verify. I wonder whether it's "good enough" to do these two
as configs? hook.skip-all=true, hook.ignore-result=true?

> +Implementation Exploration: Check "magic" branch for configs at fetch time
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Example User Experience
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +===== Case 1: Consent through clone
> +
> +....
> +$ git clone --setup-hooks
> +...
> +
> +The following hooks were installed from remote `origin` ($ORIGIN_URL):
> +
> +pre-commit: git-secrets --pre_commit_hook
> +pre-push:  $GIT_ROOT/pre_push.sh

Hm, I thought we wanted to consider storing the hook body in the magic
branch as well? To avoid changing hook implementation during bisect, for
example?

> +....
> +
> +===== Case 2: Prompting after clone
> +....
> +$ git clone
> +...
> +
> +Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:
> +
> +pre-commit: git-secrets --pre_commit_hook
> +pre-push:  $GIT_ROOT/pre_push.sh
> +
> +# instead of prompting, we could give users commands to run instead
> +# see case 3

Yep, I think this is a better idea - I glued together the two UXen below
:)

> +
> +Do you wish to install them?
> +1. Yes (this time)
> +2. Yes (always from origin)
> +3. No (not this time)
> +4. No (never)
> +....

Offline when we discussed this, it seems like users will just smash 2
("whatever gets you to stop bothering me") regardless of whether the
hooks are actually coming from a source the user trusts. So I would
prefer something like:

  $ git clone
  ....
  Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:

  pre-commit: git-secrets --pre_commit_hook
  pre-push:  $GIT_ROOT/pre_push.sh

  If you wish to install them, run `git hook setup origin`.

> +===== Case 3: Re-prompting when hooks change
> +....
> +$ git pull
> +
> +The following hooks were updated from remote `origin` ($ORIGIN_URL):
> +
> +pre-push:  $GIT_ROOT/pre_push.sh
> +
> +If you wish to install them, run `git hook setup origin`.
> +
> +If you wish to always accept hooks from `origin`, run `git hook setup --always
> +origin`. You should only do this if you trust code changes from origin.
> +
> +To always ignore hooks from `origin`, run `git hook ignore origin`.
> +....
> +
> +===== Case 4: Nudging when hooks weren't installed
> +....
> +$ git commit
> +advice: The repository owner has recommended a 'pre-commit' hook that was not run.
> +To view it, run `git show origin/refs/recommended-config:some-pre-commit`. To install it, run `git hook setup origin pre-commit`
> +
> +Turn off this advice by setting config variable advice.missingHook to false."
> +....

(Full disclosure: this was my idea.)
I realize that some folks upstream may find this is too chatty for
general use. I'm hoping being able to shut off the advice globally might
be enough of a mitigation; maybe we can gate it behind an experimental
config or something if folks aren't so sure?

> +Implementation Sketch
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +* Perform fetch as normal
> +
> +* After fetch is complete, Git checks for a "magic" config branch (e.g.
> ++origin/refs/recommended-config+) which contains information about config lines
> +an end-user may want (including hooks).
> +
> +* As part of the fetch subcommand, Git prompts users to install the configs
> +contained there.

Like I mentioned above, I think we probably want to drop the entire
interactive installer wizard concept...

> +    ** User responses to that prompt could be "sticky" - e.g. a user could reply
> +    "no (this time)", "no (never)", "yes (this time)", or "yes (always)".
> +    Always/never indicate that the user trusts the remote this config is coming
> +    from, and should not apply to configs fetched from other remotes.

...which also means that we can drop trying to express this briefly and
instead say something wordy in a flag to `git hook setup` (or whatever
we call it).

> +Later, we might want to do this before the initial clone is performed; that
> +workflow looks like:
> +
> +* During clone, perform ls-refs as normal
> +
> +* If the server has a "magic" config branch, fetch only that config branch.
> +
> +* Prompt users as described above.
> +
> +* Perform the rest of the clone.

This part I'm still interested in, although I'm not sure how to
reconcile not wanting an interactive prompt with wanting an early step
like this during clone. Maybe that's what this `git clone --setup-hooks`
(or maybe, `git clone --with-recommended-configs`) is for?

> +Pros
> +^^^^
> +
> +* Repository owners have a method for providing recommended config for
> +contributors.
> +
> +* Installation flow happens without additional user intervention.

I think when we wrote this bullet point it was to express "the user
doesn't have to run something else to discover these hooks exist". But I
don't think "without additional user intervention" fully describes
what's proposed here, either. Hrm.

> +
> +* Keeping config branch and history separate from code branch and history means
> +it is versioned, but not tied to user's checkout.

Probably worth discussing/including that we intend hook contents to also
live in the config branch, to make sure we're running the same hook
regardless of checkout/bisect state/inspection/have been working on a
feature for 6 months and have been fetching but not rebasing/etc. I'm
not sure I see that explicitly called out here...

Actually, I found the following (pasting from much earlier in the doc):

  +    ** This could be a path to a script/binary within the repository
  +
  +    ** This could be a path to a script/binary contained within
  submodules of
  +    the repository
  +
  +    ** This could be a user installed command or script/binary that
  exists
  +    outside of the repository and is present in `$PATH`

Maybe this part needs to be modified to explicitly refer to the hook
executable being tracked in the magic branch?

> +Cons
> +^^^^
[snip]
> +* Turning a "set and forget" command like clone into an interactive session with
> +the user is not ideal; care must be taken to avoid breaking bots.

If we notify and nag, but don't interactively prompt, then we get happy
bots for free ;)

> +
> +* Inflating configs and executables from a remote tracking branch which is never
> +checked out could be slow.

I wonder about this. This seems to me like something that might be
drastically slower or faster depending on platform. Hmmm.

> +Future Work
> +~~~~~~~~~~~
> +
> +* Extending this to allow repository owners to specify specific configurations
> +in general e.g. this repository should use partial-clone with these parameters.

Offline I think there was a little discussion with Stolee about whether
it made more sense to *only* approach this specific problem with this
document, as the hooks are also config, and so they could come later.
But I think if we want to store the executable in the magic branch (and
I do... since I keep bringing it up :) ) then it doesn't make sense to
say "build it for config and everything else will follow".

> +* Extending this to support submodules: We want to make sure this works in a way
> +that's easy to adapt to submodules, who would likely need to run the same hooks
> +as the superproject; for example, submodules could inherit the superproject
> +config.

I'm hoping to send an RFC patch introducing such an inherited
superproject config ... very soon. I hope. So there wasn't much detail
provided here, intentionally.

> +* Sandboxing hook execution to provide higher levels of security.

I think this says: "Can we run a user hook in a container that only has
access to the repo in question?"

It sounds like a complicated answer. I could see legitimate reasons to
want wider access than just the container - for example, some
hook-specific configuration that doesn't fit the Git config format, or
even something like updating a stats file to keep a record of how many
commits I made/pushed/whatever every day, stored in a central location
for reference at performance review time :) But I also don't know alllll
that much about containerization - I think there are ways to hand over
access to other needed files like this, right?

But then, I also feel yucky thinking about Debian telling me that my Git
install also needs me to install Docker... :)

Worth thinking about and discussing at a later date, I'd guess.

> +[[prior-art]]
> +Prior Art
> +~~~~~~~~~

I wonder whether it's useful to mention (in mails, I guess, not in
the checked in doc) why these are bad - do they duplicate work between
each other? Are they engaging in bad practices when interfacing with
Git? etc.?

It would be a lot of work to collect, so maybe it's not that useful..


Thanks for writing up v2 / mailing it, Albert.

 - Emily
Derrick Stolee March 30, 2021, 3:24 p.m. UTC | #2
On 3/25/2021 9:43 PM, Albert Cui via GitGitGadget wrote:
> From: Albert Cui <albertqcui@gmail.com>
> 
> Hooks today are configured at the repository level, making it difficult to
> share hooks across repositories. Configuration-based hook management, by
> moving hooks configuration to the config, makes this much easier. However,
> there is still no good way for project maintainers to encourage or enforce
> adoption of specific hook commands on specific hook events in a repository.
> As such, there are many tools that provide this functionality on top of Git.
> 
> This patch documents the requirements we propose for this feature as well as
> a design sketch for implementation.

Sorry for being so late in reviewing this.

My first reaction is that this feature is suggesting multiple security
vulnerabilities as core functionality. It also seems to be tied to
niche projects (in number of projects, not necessarily the size of those
projects).

I was recommended in conversation to think of this as a way to take
existing ad-hoc behavior and standardize it with a "Git-blessed"
solution. I'm not sure this proposal makes a strong enough case for
why having a "configure-hooks.sh" script in the base of the repo is
not enough. It simultaneously does not use existing precedents like
.gitattributes or .gitignore as direction in using the worktree at
HEAD as a mechanism for communicating details. I find using a separate
ref for hooks to be a non-starter and the design should be rebuilt from
scratch.

I also expect that a significant portion of users will see a message
like "this repository needs hooks" and will just say "yes" to get rid
of the prompt. There needs to be sufficient opportunity for users to
inspect the hook configuration and avoid frustrated or distracted users
from doing the wrong thing.

Server-side checks should always exist, so users who don't follow the
project's guidelines using the recommended hooks will be blocked. The
important thing is that there is an easy way for willing participants
to install the correct hooks. This doesn't mean we should make it
almost automatic.

Also, please proactively pursue a security review of the feature,
including non-technical risks such as social engineering, forks, or
other possible attacks. This idea seems so risky that I would be
against accepting it unless a security expert has done a thorough
review.

> +We propose adding native Git functionality to allow project maintainers to
> +specify hooks that a user ought to install and utilize in their development
> +workflows.

I think providing a way for repository owners to _recommend_ how cloners
should interact with the repository is a good idea. I think starting with
hooks is perhaps a significant jump to the most complicated version of
that idea.

As you think of this design, it might be good to think about how some
recommended Git config (within an allow-list) might fit into this system
as well. I would have started there, with things like "Use partial clone"
or "use sparse-checkout". Those are really things that need to happen at
clone time, they can't really happen after-the-fact, which helps justifying
a modification to 'git clone'. The hook configuration doesn't _need_ to
happen during 'git clone'. More on this timing later.

The .gitattributes file is the closest analogue I could find in current
functionality, but it operates on a path-based scope, not repository scope.

> +Server-side vs Local Checks
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
> +In the ideal world, developers and project maintainers use both local and server
> +side checks in their workflow. However, for many smaller projects, this may not
> +be possible: CI may be too expensive to run or configure. The number of local
> +solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
> +Bringing this natively to Git can give all these developers a well-supported,
> +secure implementation opposed to the fragmentation we see today.

I'm not sure this is a good selling point for small projects. If they
are small, then the CI to verify commits is cheap(er).

Local hooks should never be used as a replacement for server-side checks.
A user could always use a repository without the local hooks and push
commits that have not been vetted locally. The extreme example is to have
a commit hook that compiles the code and runs all the tests. Would you
then remove all CI builds?

Making it easier to adopt local hooks can avoid some pain points when
users are blocked by the server-side checks.

> +Server-side vs Local Checks
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
> +User Goals / Critical User Journeys
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...

I appreciate the motivation in this document. However, the motivation
doesn't really justify why this should be baked into Git itself, since
a "configure-repo" script in the base of the repo would suffice to
achieve that functionality.

The reason to put this in Git is to standardize this process so it
is not different in each repository. It might be good to spend time
justifying that angle.

> +Security Considerations and Design Principles
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +We must balance the desire to make hooks setup easy for developers --- allowing
> +them to get hooks set up with low friction, and hence increasing the probability
> +of them adopting these hooks --- with protecting users from the security risks
> +of arbitrary code execution on their hosts.
> +
> +To inform the design, we propose these design principles:
> +
> +* User consent: Users must explicitly agree to hooks usage; no hooks should
> +execute without consent, and users should re-consent if hooks update. Users can
> +opt-out of hooks.
> +
> +* Trust comes from the central repository:
> +  ** Most users don't have the time or expertise to properly audit every hook
> +  and what it does. There must be trust between the user and the remote that the
> +  code came from, and the Git project should ensure trust to the degree it can
> +  e.g. enforce HTTPS for its integrity guarantees.
> +
> +  ** Since developers will likely build their local clone in their development
> +  process, at some point, arbitrary code from the repository will be executed.
> +  In this sense, hooks _with user consent_ do not introduce a new attack surface.

It is critical that users are presented with this consent at the correct
times. For instance, I believe configuring local hooks should only be
done _after_ "git clone" completes. That allows a user to inspect the
worktree to their content instead of in the middle of an interactive shell
session or something. (The "git clone" command could output a message to
stderr saying "This repository recommends configuring local hooks. Run
'git <tbd>' to inspect the hooks and configure them.")

We've had enough code-execution bugs with "git clone" that I want to
completely avoid that possibility here.

> +* Give users visibility: Git must allow users to make informed decisions. This
> +means surfacing essential information to the user in a visible manner e.g. what
> +remotes the hooks are coming from, whether the hooks have changed in the latest
> +checkout.

As a user moves HEAD, we should similarly avoid updating the hooks
automatically, but instead present a message to the user to update their
hooks using an intentional command.

> +    ** This could be a path to a script/binary within the repository

Binaries will be tricky if you want users of multiple platforms to
interact with your repository. And scripts can be slower than binaries.

How could someone build hooks from source using your workflow? Perhaps
users are expected to locally compile the code before configuring the
hooks?

> +    ** This could be a path to a script/binary contained within submodules of
> +    the repository

This gives me significant chills. Proceed with caution here.

I understand the reason to want this feature: you could have a suite
of repositories using a common hook set that lives in each as a
submodule. I just want to point out that this adds yet another
dimension for attack.

> +    ** This could be a user installed command or script/binary that exists
> +    outside of the repository and is present in `$PATH`

Like `rm -rf ~/*`? I'm trying to think of dangerous things to do without
elevation. It could help here to clarify the intended user pattern here:
"This repository requires that you install tool X." This seems unlikely
to be necessarily true at clone time, so the users will have a broken
state if they don't run some extra steps. How will that be communicated?

Requirements like these make me think that these repositories would be
better off with a script that configures the hooks after checking if
these things actually exist on the PATH (and installs them if not). I
would lower the priority of this one for now.

> +* This configuration should only apply if it was received over HTTPS

Avoiding http:// and git:// makes sense. Why not SSH?

> +* A setup command for users to set up hooks
> +
> +    ** Hook setup could happen at clone time assuming the user has consented
> +    e.g. if `--setup-hooks` is passed to `git clone`

This is not enough consent.

> +* Users must explicitly approve hooks at least once
> +
> +    ** Running the setup command should count as approval, including if the user
> +    consented during the clone
> +
> +    ** When a hook command changes, a user should re-approve execution (note:
> +    implementation should not interfere with requirement listed in “Fast
> +    Follows")

Users should explicitly approve hooks any time they would change.
They should also be able to explore the source of the change using
whatever editors and tools they want, so the worktree should change
to its new state without new hooks, _then_ the user could consider
updating hooks based on that new state.

> +Fast Follows
> +^^^^^^^^^^^^
> +
> +* When prompted to execute a hook, users can specify always or never, even if
> +the hook updates

I don't understand what this means. "when prompted to execute a hook" are you
saying that the user will get a message saying "Git will now run the pre-commit
hook, are you ok with that?"

"even if the hook updates": I've made my stance clear that the user should be
in complete control of when the hooks update.

> +Out of Scope
> +^^^^^^^^^^^^
> +
> +* Ensuring the user has installed software that isn't distributed with the repo

If you are going to allow hooks to run something on the PATH, then Git
should probably check that such an executable exists before setting the
config and causing problems.

> +Implementation Exploration: Check "magic" branch for configs at fetch time
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Example User Experience
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +===== Case 1: Consent through clone
> +
> +....
> +$ git clone --setup-hooks
> +...
> +
> +The following hooks were installed from remote `origin` ($ORIGIN_URL):
> +
> +pre-commit: git-secrets --pre_commit_hook
> +pre-push:  $GIT_ROOT/pre_push.sh
> +....

Nope. I think this workflow is a non-starter.

> +===== Case 2: Prompting after clone
> +....
> +$ git clone
> +...
> +
> +Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:
> +
> +pre-commit: git-secrets --pre_commit_hook
> +pre-push:  $GIT_ROOT/pre_push.sh

Yes, this works for me.

> +# instead of prompting, we could give users commands to run instead
> +# see case 3
> +
> +Do you wish to install them?
> +1. Yes (this time)
> +2. Yes (always from origin)
> +3. No (not this time)
> +4. No (never)

I'd rather see the installation as a separate step. That gives more
weight to the users' consent. Even if you do have a prompt here that
says Yes/No, *do not* include "always from origin".

> +===== Case 3: Re-prompting when hooks change
> +....
> +$ git pull
> +
> +The following hooks were updated from remote `origin` ($ORIGIN_URL):
> +
> +pre-push:  $GIT_ROOT/pre_push.sh
> +
> +If you wish to install them, run `git hook setup origin`.

Good. Stop here. Perhaps also describe this as something that happens
with "git checkout" because it matters when HEAD updates, even if the
commit was fetched earlier.

> +===== Case 4: Nudging when hooks weren't installed
> +....
> +$ git commit
> +advice: The repository owner has recommended a 'pre-commit' hook that was not run.
> +To view it, run `git show origin/refs/recommended-config:some-pre-commit`. To install it, run `git hook setup origin pre-commit`
> +
> +Turn off this advice by setting config variable advice.missingHook to false."
> +....

These nudges seem like a good pattern, especially with the advice config.

> +Implementation Sketch
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +* Perform fetch as normal
> +
> +* After fetch is complete, Git checks for a "magic" config branch (e.g.
> ++origin/refs/recommended-config+) which contains information about config lines
> +an end-user may want (including hooks).

I think this is the wrong direction to go. You are recommending a few things:

1. Some branch names are more special than others.
2. Hooks live in a separate history than the rest of the repository.
3. Users cannot inspect the hooks in their worktree before installation.

Instead, think about things like .gitignore and .gitattributes, as they can
change as the repository changes. Make a special _filename_ or directory:
for example ".githooks/".

> +* As part of the fetch subcommand, Git prompts users to install the configs
> +contained there.

Prompt users that they are available and can be configured using another
command.

I summarized my thoughts at the top.

Thanks,
-Stolee
Albert Cui April 1, 2021, 8:02 p.m. UTC | #3
On Mon, Mar 29, 2021 at 4:20 PM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> > +Security Considerations and Design Principles
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> [snip]
> > +  ** Since developers will likely build their local clone in their development
> > +  process, at some point, arbitrary code from the repository will be executed.
> > +  In this sense, hooks _with user consent_ do not introduce a new attack surface.
>
> It might be worth saying that we want to make configuration of
> project-configured hooks to be approximately as easy/automatic as
> building (that is, the user still has to explicitly run a build, and
> isn't prompted at the end of their clone whether they want to build it
> right away).

+1, I like phrasing it this way.
> > +
> > +* Give users visibility: Git must allow users to make informed decisions. This
> > +means surfacing essential information to the user in a visible manner e.g. what
> > +remotes the hooks are coming from, whether the hooks have changed in the latest
> > +checkout.
>    ^~~~~~~~
> Better say "fetch", if we are proposing this magic branch thing.
>
> > +* This configuration should only apply if it was received over HTTPS
>
> Meaning, non-HTTPS fetches should just not update this special branch?

Yes, though I erroneously forgot to include SSH as well. I think the
main issue is
person-in-the-middle type attacks.

> > +* A setup command for users to set up hooks
> AIUI, this is proposed to be part of `git hook`, right?
>
> I don't think it needs to be part of this doc but it'd be nice to also
> support installing just a subset, like:
>
>   git hook setup pre-commit
>   git hook setup --interactive
>
Correct, I think `git hook` is a natural evolution. This is a nice to
have that we can document.

>
> > +Fast Follows
> > +^^^^^^^^^^^^
> > +
> > +* When prompted to execute a hook, users can specify always or never, even if
> > +the hook updates
>
> I think we want to base this on the remote URL, right? I know we talked
> a little offline about how to mitigate vs. malicious maintainer (for
> example this whole mess with The Great Suspender) and I'm not sure what
> solution there might be.
>
> I wonder if it's worth it to notify users that their always-okayed hooks
> were updated during fetch?
>
It definitely aligns with the security principles to notify, even if
they have OK'd updates.

> > +Implementation Exploration: Check "magic" branch for configs at fetch time
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Example User Experience
> > +^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > +===== Case 1: Consent through clone
> > +
> > +....
> > +$ git clone --setup-hooks
> > +...
> > +
> > +The following hooks were installed from remote `origin` ($ORIGIN_URL):
> > +
> > +pre-commit: git-secrets --pre_commit_hook
> > +pre-push:  $GIT_ROOT/pre_push.sh
>
> Hm, I thought we wanted to consider storing the hook body in the magic
> branch as well? To avoid changing hook implementation during bisect, for
> example?
>

Good question. If we consider this as an extension of config-based
hooks, then I think
it's logical to still support hooks in the repo itself. In
documentation, we might suggest
that people who want to use this feature store the hook in the magic
branch for that
reason.
Ævar Arnfjörð Bjarmason April 2, 2021, 9:59 a.m. UTC | #4
On Fri, Mar 26 2021, Albert Cui via GitGitGadget wrote:

> From: Albert Cui <albertqcui@gmail.com>

Formatting nit:

> +Git has https://git-scm.com/docs/githooks[hooks] functionality to allow users to
> [...]
> +local checks of code e.g. linting. As documented in
> +https://git-scm.com/docs/gitfaq#_hooks[gitfaq], in some cases, server-side
> +checks may be more appropriate, especially since developers can always skip

This should be a linkgit:* even in the technical/ directory, should it
not? We build docs on git-scm.com (among others), but in our source tree
we should be using linking syntax, not linking to our own already-built
docs on some website.
Ævar Arnfjörð Bjarmason April 2, 2021, 10:30 a.m. UTC | #5
On Fri, Mar 26 2021, Albert Cui via GitGitGadget wrote:

> From: Albert Cui <albertqcui@gmail.com>
>
> Hooks today are configured at the repository level, making it difficult to
> share hooks across repositories. Configuration-based hook management, by
> moving hooks configuration to the config, makes this much easier. However,
> there is still no good way for project maintainers to encourage or enforce
> adoption of specific hook commands on specific hook events in a repository.
> As such, there are many tools that provide this functionality on top of Git.
>
> This patch documents the requirements we propose for this feature as well as
> a design sketch for implementation.

I had comments on the v1 that are still unanswered by this re-roll:
https://lore.kernel.org/git/87im5nzbe6.fsf@evledraar.gmail.com/

I think a more productive way to handle such proposals is to reply to
such general discussion and /then/ re-roll the series.

I'm not going to repeat the outstanding points there (but would like you
to reply to it, and having just read Derrick Stolee's E-Mail downthread
there's another significant point of feedback to reply to.

So just comments on the range-diff below. I think for any one paragraph
reading ahead helps, because it's a bit of stream-of-conciousness as I
try to make sense of things that are then hinted at in later parts of
the document (which itself is a suggestion to maybe re-arrange some of
it).

>      -+* The `pre-commit` hook event: before committing, a developer may want to lint
>      -+their changes to enforce certain code style and quality. If there are style
>      -+issues, the developer may want the commit to fail so that they can fix the
>      -+issues.
>      ++* The `pre-commit` hook event:
>       +
>      -+* The `commit-msg` hook event: after committing, repository owners may want to
>      -+enforce a certain commit message style. This may be out of necessity:
>      ++  ** A developer may want to lint their changes to enforce certain code style
>      ++  and quality. The project may want the commit to fail so that commits that
>      ++  violate the project's style don't get uploaded.
>      ++
>      ++  ** The project may want to prevent developers from committing passwords or
>      ++  other sensitive information e.g. via
>      ++  https://github.com/awslabs/git-secrets[git-secrets].

Why does the project want to prevent developers from *committing* such
information by default? I commit such stuff all the time, it's only a
problem once it's pushed.

The genesis of this series seems to be need within the
Google-plex. Having operated a similar central-repository corporate
setup before I'm surprised that you're even trying to prevent such
things on local computers, surely you'll have to re-do such checks
on-push on the server, so just have them live there?

But I think this is answered below:

>      ++Server-side vs Local Checks
>      ++^^^^^^^^^^^^^^^^^^^^^^^^^^^
>      ++
>      ++A large motivation for this change is providing projects a method to enable
>      ++local checks of code e.g. linting. As documented in
>      ++https://git-scm.com/docs/gitfaq#_hooks[gitfaq], in some cases, server-side
>      ++checks may be more appropriate, especially since developers can always skip
>      ++local hook execution. We think there are still benefits to local checks:
>      ++
>      ++* Ensuring commit message style and preventing the committing of sensitive data,
>      ++as described above. In particular, if CI results are public, as with many open
>      ++source projects, server side checks are useless for guarding against leaking
>      ++sensitive data.

So what you really mean is you want a pre-commit hook as an alternative
to some "once we've accpted the push and CI runs we notice naughty
data", not as a pre-receive hook alternative?

>      ++
>      ++* Helps developers catch issues earlier: typically developers need to push to
>      ++the remote to trigger server-side checks. Local hooks can be run anytime the
>      ++developer wants. This is especially useful if the project has slow
>      ++server-checks; catching issues locally can save the developer a lot of time
>      ++waiting for CI. They are also useful for locally reproducing an issue identified
>      ++in CI, helping resolve issues faster.
>      ++
>      ++* Since the typical goal of developers to submit changes to a central
>      ++repository, their interests are aligned with the project maintainers'; while
>      ++they can choose to skip local hook execution, it is in their interest to allow
>      ++hooks to run at least before proposing code for submission to the central
>      ++repository to increase the chances of the code getting accepted.

This all makes sense, but is really missing how this problem isn't adequately solved by:

    $ HACKING
    Welcome to project XYZ, first you'll be much more productive with
    our hooks, run:

        ./setup-hooks.sh

    [...]

Presumably developers working on some central repo are the ones least
served by this type of thing, since such setups usually have established
setup scripts etc. that you (mostly) go through once.

>      ++In the ideal world, developers and project maintainers use both local and server
>      ++side checks in their workflow. However, for many smaller projects, this may not
>      ++be possible: CI may be too expensive to run or configure. The number of local
>      ++solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
>      ++Bringing this natively to Git can give all these developers a well-supported,
>      ++secure implementation opposed to the fragmentation we see today.

The end-goal of this series combined with Emily's configurable hook is
basically to have less friction when it comes to setting up and
maintaining hooks.

I don't see how it would help with "CI may be too expensive to run or
configure" though. We're basically talking about auto-updating a script
in .git/hooks, which today is essentially a ./setup-hooks.sh, and the
script checking for a new version of itself when it runs at
origin/myscripts:myname.sh, no?

>      -+* As a repository owner / project maintainer,
>      ++* As a project maintainer,
>       +
>       +    ** I want to enforce code style so I require developers to use a
>       +    standardized linter.
>       +
>      -+    ** I want to prevent leaks / share code widely so I check that developers
>      -+    are uploading code to the right place.
>      ++    ** I want to prevent sensitive data from getting checked in.
>       +
>      -+    ** I want this to just work for all the developers in my repository, without
>      ++    ** I want to prevent leaks so I check that developers are uploading code to
>      ++    the right private central repository. Conversely, I may want to encourage
>      ++    more open source development and encourage developers to push to public
>      ++    central repositories.

I think I'm beginning to get the gist of this, so it's really also a
proposed workaround for projects that host on platforms like github.com
and whatnot where you can run arbitrary code in a CI, but you can't
install a custom pre-receive hook?

I think it might be worth explicitly spelling that out. E.g. if those
platforms gained a feature (which isn't that hard to imagine) of hiding
your ref until after some pre-receive part of a CI check has run (which
would not have public logs, so you could "safely" check/push passwords)
it seems to me that a large part of the immediate need for this would go
away.

Which doesn't per-se mean we shouldn't do it, just that assumptions,
constaints etc. should be documented.

>      ++* Trust comes from the central repository:
>      ++  ** Most users don't have the time or expertise to properly audit every hook
>      ++  and what it does. There must be trust between the user and the remote that the
>      ++  code came from, and the Git project should ensure trust to the degree it can
>      ++  e.g. enforce HTTPS for its integrity guarantees.

I won't belabor the point but in my feedback on v1 I suggested an
alternate mechanism of doing this which I think is more
distributed-friendly and more safe:
https://lore.kernel.org/git/87im5nzbe6.fsf@evledraar.gmail.com/

...

>      ++Example User Experience
>      ++^^^^^^^^^^^^^^^^^^^^^^^
>      ++
>      ++===== Case 1: Consent through clone
>      ++
>      ++....
>      ++$ git clone --setup-hooks
>      ++...
>      ++
>      ++The following hooks were installed from remote `origin` ($ORIGIN_URL):
>      ++
>      ++pre-commit: git-secrets --pre_commit_hook
>      ++pre-push:  $GIT_ROOT/pre_push.sh
>      ++....
>      ++
>      ++===== Case 2: Prompting after clone
>      ++....
>      ++$ git clone
>      ++...
>      ++
>      ++Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:
>      ++
>      ++pre-commit: git-secrets --pre_commit_hook
>      ++pre-push:  $GIT_ROOT/pre_push.sh
>      ++
>      ++# instead of prompting, we could give users commands to run instead
>      ++# see case 3
>      ++
>      ++Do you wish to install them?
>      ++1. Yes (this time)
>      ++2. Yes (always from origin)
>      ++3. No (not this time)
>      ++4. No (never)
>      ++....
>      ++
>      ++===== Case 3: Re-prompting when hooks change
>      ++....
>      ++$ git pull
>      ++
>      ++The following hooks were updated from remote `origin` ($ORIGIN_URL):
>      ++
>      ++pre-push:  $GIT_ROOT/pre_push.sh
>      ++
>      ++If you wish to install them, run `git hook setup origin`.
>      ++
>      ++If you wish to always accept hooks from `origin`, run `git hook setup --always
>      ++origin`. You should only do this if you trust code changes from origin.
>      ++
>      ++To always ignore hooks from `origin`, run `git hook ignore origin`.
>      ++....
>      ++

That alternate security model I suggested above would make this much
less painful. I.e. you could say "I'll trust updates as long as the
whole chain is signed by XYZ authority".

> [...]

Snip the rest of the doc, which as noted I've god unreplied-to feedback
on in https://lore.kernel.org/git/87im5nzbe6.fsf@evledraar.gmail.com/
Albert Cui April 3, 2021, 12:58 a.m. UTC | #6
On Fri, Apr 2, 2021 at 3:30 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> I had comments on the v1 that are still unanswered by this re-roll:
> https://lore.kernel.org/git/87im5nzbe6.fsf@evledraar.gmail.com/
>
> I think a more productive way to handle such proposals is to reply to
> such general discussion and /then/ re-roll the series.
>
> I'm not going to repeat the outstanding points there (but would like you
> to reply to it, and having just read Derrick Stolee's E-Mail downthread
> there's another significant point of feedback to reply to.
>
Thanks for the feedback! I'm new to this process. I'll make sure to
reply to both.

[...]

> >      ++  ** The project may want to prevent developers from committing passwords or
> >      ++  other sensitive information e.g. via
> >      ++  https://github.com/awslabs/git-secrets[git-secrets].
>
> Why does the project want to prevent developers from *committing* such
> information by default? I commit such stuff all the time, it's only a
> problem once it's pushed.
>
> But I think this is answered below:
>
> >      ++Server-side vs Local Checks
> >      ++^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >      ++
> >      ++A large motivation for this change is providing projects a method to enable
> >      ++local checks of code e.g. linting. As documented in
> >      ++https://git-scm.com/docs/gitfaq#_hooks[gitfaq], in some cases, server-side
> >      ++checks may be more appropriate, especially since developers can always skip
> >      ++local hook execution. We think there are still benefits to local checks:
> >      ++
> >      ++* Ensuring commit message style and preventing the committing of sensitive data,
> >      ++as described above. In particular, if CI results are public, as with many open
> >      ++source projects, server side checks are useless for guarding against leaking
> >      ++sensitive data.
>
> So what you really mean is you want a pre-commit hook as an alternative
> to some "once we've accpted the push and CI runs we notice naughty
> data", not as a pre-receive hook alternative?
>

I think you're identifying that "prevent developers from commiting" is
the wrong wording.
Maybe a better phrasing is "prevent sensitive information from getting pushed or
checked in."

Yes, this could be done in CI, but as noted below, catching this
before the push (or commit)
happens is more productive for developers:

> >      ++
> >      ++* Helps developers catch issues earlier: typically developers need to push to
> >      ++the remote to trigger server-side checks. Local hooks can be run anytime the
> >      ++developer wants. This is especially useful if the project has slow
> >      ++server-checks; catching issues locally can save the developer a lot of time
> >      ++waiting for CI. They are also useful for locally reproducing an issue identified
> >      ++in CI, helping resolve issues faster.
> >      ++

An additional point I should also call out is that for large
repositories, pushes themselves
can be slow, so even if server-side checks are fast, being able to
avoid unnecessary pushes
can help developer velocity.

>
> This all makes sense, but is really missing how this problem isn't adequately solved by:
>
>     $ HACKING
>     Welcome to project XYZ, first you'll be much more productive with
>     our hooks, run:
>
>         ./setup-hooks.sh
>
>     [...]
>
> Presumably developers working on some central repo are the ones least
> served by this type of thing, since such setups usually have established
> setup scripts etc. that you (mostly) go through once.
>

One issue that immediately comes to mind with a setup script is that
developers could
easily miss out on updates to the hooks. One nice thing about this
proposal is that we
try to address that problem.

Anothing issue is that people in general are bad at reading
instructions; that's why I'm
trying to get as close as possible to `git clone` doing set up for you
while balancing the
security concerns (I know Derrick Stolee finds issue with this in his
reply, which I'll try to
address in my reply there.)

> >      ++In the ideal world, developers and project maintainers use both local and server
> >      ++side checks in their workflow. However, for many smaller projects, this may not
> >      ++be possible: CI may be too expensive to run or configure. The number of local
> >      ++solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
> >      ++Bringing this natively to Git can give all these developers a well-supported,
> >      ++secure implementation opposed to the fragmentation we see today.
>
> The end-goal of this series combined with Emily's configurable hook is
> basically to have less friction when it comes to setting up and
> maintaining hooks.
>
> I don't see how it would help with "CI may be too expensive to run or
> configure" though. We're basically talking about auto-updating a script
> in .git/hooks, which today is essentially a ./setup-hooks.sh, and the
> script checking for a new version of itself when it runs at
> origin/myscripts:myname.sh, no?
>

By "expensive", I mean from both a money and a time perspective: for small
projects, you may not set up any server-side checks and instead rely
purely on local checks,
and this helps by, as you said, reducing the friction to set up these
local checks.

From my own experience: When I start a new weekend project, setting up
CI is not at the top
of the list of things I want to spend time on; I'm usually not even
writing tests. Maybe
I'm just a bad developer :D But I do set up local checks: linting,
code formatters.

To your point about updating hooks (we're thinking on the same
lines!): that's putting the
responsibility on the developer to implement. From a
maximizing-global-productivity
point of view, isn't it more elegant for us to extend Git's
functionality and provide good support
for this use case?

> >      -+* As a repository owner / project maintainer,
> >      ++* As a project maintainer,
> >       +
> >       +    ** I want to enforce code style so I require developers to use a
> >       +    standardized linter.
> >       +
> >      -+    ** I want to prevent leaks / share code widely so I check that developers
> >      -+    are uploading code to the right place.
> >      ++    ** I want to prevent sensitive data from getting checked in.
> >       +
> >      -+    ** I want this to just work for all the developers in my repository, without
> >      ++    ** I want to prevent leaks so I check that developers are uploading code to
> >      ++    the right private central repository. Conversely, I may want to encourage
> >      ++    more open source development and encourage developers to push to public
> >      ++    central repositories.
>
> I think I'm beginning to get the gist of this, so it's really also a
> proposed workaround for projects that host on platforms like github.com
> and whatnot where you can run arbitrary code in a CI, but you can't
> install a custom pre-receive hook?
>
> I think it might be worth explicitly spelling that out. E.g. if those
> platforms gained a feature (which isn't that hard to imagine) of hiding
> your ref until after some pre-receive part of a CI check has run (which
> would not have public logs, so you could "safely" check/push passwords)
> it seems to me that a large part of the immediate need for this would go
> away.
>
> Which doesn't per-se mean we shouldn't do it, just that assumptions,
> constaints etc. should be documented.
>

Agree we can call this out, but as I'm saying above, I don't think
it's fair to assume
everyone is already using server-side checks and there's still benefit
to doing the same
checks locally even if you have server-side checks.

[...]

> Snip the rest of the doc, which as noted I've god unreplied-to feedback
> on in https://lore.kernel.org/git/87im5nzbe6.fsf@evledraar.gmail.com/

Really appreciate the engagement! I'll try to reply early next week.
Albert Cui April 5, 2021, 10:45 p.m. UTC | #7
On Tue, Mar 30, 2021 at 8:24 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/25/2021 9:43 PM, Albert Cui via GitGitGadget wrote:
> > From: Albert Cui <albertqcui@gmail.com>
> >
> > Hooks today are configured at the repository level, making it difficult to
> > share hooks across repositories. Configuration-based hook management, by
> > moving hooks configuration to the config, makes this much easier. However,
> > there is still no good way for project maintainers to encourage or enforce
> > adoption of specific hook commands on specific hook events in a repository.
> > As such, there are many tools that provide this functionality on top of Git.
> >
> > This patch documents the requirements we propose for this feature as well as
> > a design sketch for implementation.
>
> Sorry for being so late in reviewing this.
>
> My first reaction is that this feature is suggesting multiple security
> vulnerabilities as core functionality. It also seems to be tied to
> niche projects (in number of projects, not necessarily the size of those
> projects).
>

According to it's GitHub page, Husky [0] has 22M million monthly downloads and
is used by 437K projects. Many projects, both big (e.g. Android [1],
ChromeOS [2],
Angular [3], Webpack [4]) and small have use cases around hooks configuration,
so I don't think it's fair to say this is only needed by niche projects.

> I was recommended in conversation to think of this as a way to take
> existing ad-hoc behavior and standardize it with a "Git-blessed"
> solution. I'm not sure this proposal makes a strong enough case for
> why having a "configure-hooks.sh" script in the base of the repo is
> not enough. It simultaneously does not use existing precedents like
> .gitattributes or .gitignore as direction in using the worktree at
> HEAD as a mechanism for communicating details. I find using a separate
> ref for hooks to be a non-starter and the design should be rebuilt from
> scratch.
>

Right, this entire proposal is trying to get to a "Git-blessed" solution,
and I should make the need clearer. A few reasons for standardizing
this come to mind:

1. Many existing "standard" solutions. A multitude of existing solutions for
this use case speaks to the fact that a basic config script is not sufficient.
I mentioned Husky above, but here are a few more; basically each
popular programming language environment has a solution for this.

https://github.com/sds/overcommit - Ruby
https://github.com/pre-commit/pre-commit - Python
https://github.com/Arkweid/lefthook - Go
https://github.com/shibapm/Komondor - Swift
https://github.com/typicode/husky - Node

These solutions all handle the installation and updating of hooks. A
"configure-hooks.sh" script doesn't handle hook updates, unless you go through
the trouble yourself of implementing and maintaining that.

2. External dependencies. The existing solutions require users to
either have those
toolchains already installed or an explicit installation step via an OS package
manager, so there's a lot of friction for users, even when using these
standard solutions.
This functionality shouldn't require external dependencies, and most
certainly _how_ you
set up hooks shouldn't change depending on your coding environment.
Fixing this is only
possible in a world where Git supports this functionality natively.

3. Improving security. As you mentioned, hooks are difficult to get
right from a security
perspective, and standardizing on a single implementation allows us to
give developers
a well-vetted solution with a better security model than what exists
today. For example,
we're proposing making it very clear to users whenever there's a hook
update. This isn't
something that existing solutions do.

I'll also say in general, the Git project is much more likely to get
security right than smaller
projects, where oftentimes even popular projects end up unmaintained.

For the design feedback: what I'm hearing is that you'd prefer a
design that keeps configuration in the worktree. I'll leave discussion about
implementation to those on the list better suited :)

However, one thing that I should have mentioned in patch about the design
proposal (as Emily Shaffer touched upon in her response) is that since the
configuration is separate from the code base, it allows you to go back
in history
or to older branches while preserving "improvements" to the hooks configuration
e.g. maybe the project has shifted to using a newer version of a linter.

This functionality has pros and cons. Bringing improvements to checks
can be useful
e.g. it's arguably better to bring older code up to newer code style.
But it also makes
it less possible to replicate the exact state of older history. This
is a problem that
server-side checks also have.

> I also expect that a significant portion of users will see a message
> like "this repository needs hooks" and will just say "yes" to get rid
> of the prompt. There needs to be sufficient opportunity for users to
> inspect the hook configuration and avoid frustrated or distracted users
> from doing the wrong thing.
>
> Server-side checks should always exist, so users who don't follow the
> project's guidelines using the recommended hooks will be blocked. The
> important thing is that there is an easy way for willing participants
> to install the correct hooks. This doesn't mean we should make it
> almost automatic.
>

What I'm hearing here is that you don't like the interactive prompt, as you
noted in the UX feedback below. Allowing users the chance to inspect the hook
configuration makes sense to call out explicitly as part of the security model.

> Also, please proactively pursue a security review of the feature,
> including non-technical risks such as social engineering, forks, or
> other possible attacks. This idea seems so risky that I would be
> against accepting it unless a security expert has done a thorough
> review.
>

Agreed. We already did a security review internally at Google. The main
feedback was:

* We need an explicit opt-in opposed to setting hooks up automatically,
e.g. a command line flag like --accept-hooks at minimum. This is primarily
to distinguish people who are just cloning a repository to browse the code
from people who are developing.

* The average user doesn't have the ability to review hooks in general
(security is hard and obscuration is easy), and if the user has
already opted into
this feature because they are engaged in development, it's very likely
that they're
already running build scripts, so the additional attack vector here doesn't seem
like a big issue.

If there are other security folks I should seek advice from, please let me know.

[...]
> I think providing a way for repository owners to _recommend_ how cloners
> should interact with the repository is a good idea. I think starting with
> hooks is perhaps a significant jump to the most complicated version of
> that idea.
>
> As you think of this design, it might be good to think about how some
> recommended Git config (within an allow-list) might fit into this system
> as well. I would have started there, with things like "Use partial clone"
> or "use sparse-checkout". Those are really things that need to happen at
> clone time, they can't really happen after-the-fact, which helps justifying
> a modification to 'git clone'. The hook configuration doesn't _need_ to
> happen during 'git clone'. More on this timing later.
>
> The .gitattributes file is the closest analogue I could find in current
> functionality, but it operates on a path-based scope, not repository scope.

I am happy to extend this patch to talk about configuration in general with
a specific use case of suggesting hooks. However, I do want to separate out
"what should we do when?" from "should we do this?" Based on this feedback,
I'm hearing "we should have a design for suggested configuration in general,"
but I don't think that necessitates that we should implement, for
example, partial
clone configuration before hooks configuration.

>
> > +Server-side vs Local Checks
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ...
> > +In the ideal world, developers and project maintainers use both local and server
> > +side checks in their workflow. However, for many smaller projects, this may not
> > +be possible: CI may be too expensive to run or configure. The number of local
> > +solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
> > +Bringing this natively to Git can give all these developers a well-supported,
> > +secure implementation opposed to the fragmentation we see today.
>
> I'm not sure this is a good selling point for small projects. If they
> are small, then the CI to verify commits is cheap(er).
>
> Local hooks should never be used as a replacement for server-side checks.
> A user could always use a repository without the local hooks and push
> commits that have not been vetted locally. The extreme example is to have
> a commit hook that compiles the code and runs all the tests. Would you
> then remove all CI builds?
>
> Making it easier to adopt local hooks can avoid some pain points when
> users are blocked by the server-side checks.
>

Sorry, this wasn't meant to come off as "small projects don't need CI." It was
more to say, "small projects today are already primarily relying on
local checks,
so we should make that easier for them." I clarified that a bit here as well:
https://lore.kernel.org/git/CAMbkP-Q4_z-nQzJwr2ZeM16deHKmzr=Z4908UzgOyk9FA-gKEA@mail.gmail.com/T/#u

[...]

> I appreciate the motivation in this document. However, the motivation
> doesn't really justify why this should be baked into Git itself, since
> a "configure-repo" script in the base of the repo would suffice to
> achieve that functionality.
>
> The reason to put this in Git is to standardize this process so it
> is not different in each repository. It might be good to spend time
> justifying that angle.
>

Thank you for the feedback. I hope what I wrote above helps.

[...]
> > +* Trust comes from the central repository:
> > +  ** Most users don't have the time or expertise to properly audit every hook
> > +  and what it does. There must be trust between the user and the remote that the
> > +  code came from, and the Git project should ensure trust to the degree it can
> > +  e.g. enforce HTTPS for its integrity guarantees.
> > +
> > +  ** Since developers will likely build their local clone in their development
> > +  process, at some point, arbitrary code from the repository will be executed.
> > +  In this sense, hooks _with user consent_ do not introduce a new attack surface.
>
> It is critical that users are presented with this consent at the correct
> times. For instance, I believe configuring local hooks should only be
> done _after_ "git clone" completes. That allows a user to inspect the
> worktree to their content instead of in the middle of an interactive shell
> session or something. (The "git clone" command could output a message to
> stderr saying "This repository recommends configuring local hooks. Run
> 'git <tbd>' to inspect the hooks and configure them.")
>
> We've had enough code-execution bugs with "git clone" that I want to
> completely avoid that possibility here.
>

Responding to this below in the "Consent through clone" section.

> > +* Give users visibility: Git must allow users to make informed decisions. This
> > +means surfacing essential information to the user in a visible manner e.g. what
> > +remotes the hooks are coming from, whether the hooks have changed in the latest
> > +checkout.
>
> As a user moves HEAD, we should similarly avoid updating the hooks
> automatically, but instead present a message to the user to update their
> hooks using an intentional command.
>

Responding to this below.

> > +    ** This could be a path to a script/binary within the repository
>
> Binaries will be tricky if you want users of multiple platforms to
> interact with your repository. And scripts can be slower than binaries.
>
> How could someone build hooks from source using your workflow? Perhaps
> users are expected to locally compile the code before configuring the
> hooks?
>

"Binaries" was primarily referring to packages like pylint which are installed
at the OS level. In an enterprise environment, these could be
installed automatically
for the user.

> > +    ** This could be a path to a script/binary contained within submodules of
> > +    the repository
>
> This gives me significant chills. Proceed with caution here.
>
> I understand the reason to want this feature: you could have a suite
> of repositories using a common hook set that lives in each as a
> submodule. I just want to point out that this adds yet another
> dimension for attack.
>

If we notify users about changes, both to hook commands and to the
underlying source files, we can make this safer.

> > +    ** This could be a user installed command or script/binary that exists
> > +    outside of the repository and is present in `$PATH`
>
> Like `rm -rf ~/*`? I'm trying to think of dangerous things to do without
> elevation. It could help here to clarify the intended user pattern here:
> "This repository requires that you install tool X." This seems unlikely
> to be necessarily true at clone time, so the users will have a broken
> state if they don't run some extra steps. How will that be communicated?
>
> Requirements like these make me think that these repositories would be
> better off with a script that configures the hooks after checking if
> these things actually exist on the PATH (and installs them if not). I
> would lower the priority of this one for now.
>

As mentioned, for enterprise deployments, this can be solved by administrators
installing any necessary software automatically.

Otherwise, I think ensuring the tool is installed feels out-of-scope
(as written in
the doc); it's not like Git makes sure compilers or build tools are
installed today,
and even today, users could set up Husky hooks that rely on $PATH tools, so
we're not introducing a new problem.

Some day, maybe we could have a `post-clone` and `post-checkout` hooks,
but I think this is all out-of-scope of the immediate discussion.

> > +* This configuration should only apply if it was received over HTTPS
>
> Avoiding http:// and git:// makes sense. Why not SSH?
>

Left off SSH accidentally! Definitely makes sense to include.

> > +* A setup command for users to set up hooks
> > +
> > +    ** Hook setup could happen at clone time assuming the user has consented
> > +    e.g. if `--setup-hooks` is passed to `git clone`
>
> This is not enough consent.
>

I hear what you're saying. How would you feel if this specific
functionality was behind
config (that enterprise environments could ship)? What I'm thinking:

```
#~/.gitconfig
[hook]
   allowCloneInstallFromRemote = $REMOTE
```

IFF $REMOTE matches the config, then `git clone $REMOTE --setup-hooks` works.

(We could even get rid of --setup-hooks and rename the config to
hook.installOnCloneFromRemote but I think requiring user consent is
still best here.)

> > +* Users must explicitly approve hooks at least once
> > +
> > +    ** Running the setup command should count as approval, including if the user
> > +    consented during the clone
> > +
> > +    ** When a hook command changes, a user should re-approve execution (note:
> > +    implementation should not interfere with requirement listed in “Fast
> > +    Follows")
>
> Users should explicitly approve hooks any time they would change.
> They should also be able to explore the source of the change using
> whatever editors and tools they want, so the worktree should change
> to its new state without new hooks, _then_ the user could consider
> updating hooks based on that new state.
>

As mentioned above, developers will execute code locally anyway when building.
Do you think allowing users to opt-into hook updates significantly
increases the attack
surface? I should note that existing solutions already do hook updates
automatically and
silently (since they use a trampoline script); this proposal is safer
from that given
1) we ask users to opt into auto updates and 2) we warn users about changes.

> > +Fast Follows
> > +^^^^^^^^^^^^
> > +
> > +* When prompted to execute a hook, users can specify always or never, even if
> > +the hook updates
>
> I don't understand what this means. "when prompted to execute a hook" are you
> saying that the user will get a message saying "Git will now run the pre-commit
> hook, are you ok with that?"
>

This should say "When prompted to install a hook, users can specify always or
never, even if the hook updates. For security, this consent should be tied to
a specific remote."

[trimming some duplicate points]

> > +# instead of prompting, we could give users commands to run instead
> > +# see case 3
> > +
> > +Do you wish to install them?
> > +1. Yes (this time)
> > +2. Yes (always from origin)
> > +3. No (not this time)
> > +4. No (never)
>
> I'd rather see the installation as a separate step. That gives more
> weight to the users' consent. Even if you do have a prompt here that
> says Yes/No, *do not* include "always from origin".
>
> > +===== Case 3: Re-prompting when hooks change
> > +....
> > +$ git pull
> > +
> > +The following hooks were updated from remote `origin` ($ORIGIN_URL):
> > +
> > +pre-push:  $GIT_ROOT/pre_push.sh
> > +
> > +If you wish to install them, run `git hook setup origin`.
>
> Good. Stop here. Perhaps also describe this as something that happens
> with "git checkout" because it matters when HEAD updates, even if the
> commit was fetched earlier.
>

Sure, case 3 seems like a reasonable path to pursue. Sounds like auto-updating
is the main point of contention here.

[...]

> > +Implementation Sketch
> > +^^^^^^^^^^^^^^^^^^^^^
> > +
> > +* Perform fetch as normal
> > +
> > +* After fetch is complete, Git checks for a "magic" config branch (e.g.
> > ++origin/refs/recommended-config+) which contains information about config lines
> > +an end-user may want (including hooks).
>
> I think this is the wrong direction to go. You are recommending a few things:
>
> 1. Some branch names are more special than others.
> 2. Hooks live in a separate history than the rest of the repository.
> 3. Users cannot inspect the hooks in their worktree before installation.
>
> Instead, think about things like .gitignore and .gitattributes, as they can
> change as the repository changes. Make a special _filename_ or directory:
> for example ".githooks/".
>

#2 is a feature, as mentioned above, but certainly just a
"nice-to-have." I agree we'd
need a solution to #3.

Thanks for all the feedback!

Albert

[0] https://github.com/typicode/husky
[1] https://android.googlesource.com/platform/tools/repohooks/
[2] https://chromium.googlesource.com/chromiumos/docs/+/HEAD/contributing.md#Upload-changes
[3] https://github.com/angular/angular/tree/master/.husky
[4] https://github.com/webpack/webpack/tree/master/.husky
Junio C Hamano April 5, 2021, 11:09 p.m. UTC | #8
Albert Cui <albertqcui@gmail.com> writes:

>> Requirements like these make me think that these repositories would be
>> better off with a script that configures the hooks after checking if
>> these things actually exist on the PATH (and installs them if not). I
>> would lower the priority of this one for now.
>>
>
> As mentioned, for enterprise deployments, this can be solved by administrators
> installing any necessary software automatically.
>
> Otherwise, I think ensuring the tool is installed feels out-of-scope
> (as written in
> the doc); it's not like Git makes sure compilers or build tools are
> installed today,
> and even today, users could set up Husky hooks that rely on $PATH tools, so
> we're not introducing a new problem.

I am afraid that this compares apples and oranges.  

I may "git clone" and try "make" to find out that I needed a special
compiler, and that would not be the end of the world.  It is
guaranteed that "git clean -f -x -d" followed by installation of
necessary toolchain followed by "make" would work.  And that is
partly because "git clone" does not do any more than just clone and
checkout the initial tree.

If a new version of "git clone" told me "I can install the project
recommended hooks to use", I answer "yes", and then failed while
installing and configuring the project-recommended hooks because of
missing dependencies, then I wouldn't know in what state the result
would be in.  In some projects, it may be enough to just install the
missing dependencies, and in some others, it may not be enough and I
have a broken half-configured mess depending on how the "installing
and configuring" step failed.
Albert Cui April 5, 2021, 11:40 p.m. UTC | #9
On Mon, Apr 5, 2021 at 4:09 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Albert Cui <albertqcui@gmail.com> writes:
>
> >> Requirements like these make me think that these repositories would be
> >> better off with a script that configures the hooks after checking if
> >> these things actually exist on the PATH (and installs them if not). I
> >> would lower the priority of this one for now.
> >>
> >
> > As mentioned, for enterprise deployments, this can be solved by administrators
> > installing any necessary software automatically.
> >
> > Otherwise, I think ensuring the tool is installed feels out-of-scope
> > (as written in
> > the doc); it's not like Git makes sure compilers or build tools are
> > installed today,
> > and even today, users could set up Husky hooks that rely on $PATH tools, so
> > we're not introducing a new problem.
>
> I am afraid that this compares apples and oranges.
>
> I may "git clone" and try "make" to find out that I needed a special
> compiler, and that would not be the end of the world.  It is
> guaranteed that "git clean -f -x -d" followed by installation of
> necessary toolchain followed by "make" would work.  And that is
> partly because "git clone" does not do any more than just clone and
> checkout the initial tree.
>
> If a new version of "git clone" told me "I can install the project
> recommended hooks to use", I answer "yes", and then failed while
> installing and configuring the project-recommended hooks because of
> missing dependencies, then I wouldn't know in what state the result
> would be in.  In some projects, it may be enough to just install the
> missing dependencies, and in some others, it may not be enough and I
> have a broken half-configured mess depending on how the "installing
> and configuring" step failed.

I'm a little confused, and maybe it's because we have different
definitions of what
"installing hooks" means. By installing hooks, I meant the addition of
the hook command to the config, e.g the outcome of:

`git config --add hook.pre-commit.command pylint`

This works today; Git won't complain if I don't have pylint installed, so
I don't see how we'd get into a "broken half-configured mess."

It will complain when it tries to execute the hook, and this is where I see it
as the same as the 'I may "git clone" and try "make" to find out that
I needed a special compiler, and that would not be the end of the world' case.
Albert Cui April 5, 2021, 11:42 p.m. UTC | #10
On Fri, Apr 2, 2021 at 2:59 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Fri, Mar 26 2021, Albert Cui via GitGitGadget wrote:
>
> > From: Albert Cui <albertqcui@gmail.com>
>
> Formatting nit:
>
> > +Git has https://git-scm.com/docs/githooks[hooks] functionality to allow users to
> > [...]
> > +local checks of code e.g. linting. As documented in
> > +https://git-scm.com/docs/gitfaq#_hooks[gitfaq], in some cases, server-side
> > +checks may be more appropriate, especially since developers can always skip
>
> This should be a linkgit:* even in the technical/ directory, should it
> not? We build docs on git-scm.com (among others), but in our source tree
> we should be using linking syntax, not linking to our own already-built
> docs on some website.

Yes, thanks. Good catch.
Junio C Hamano April 6, 2021, 12:13 a.m. UTC | #11
Albert Cui <albertqcui@gmail.com> writes:

> I'm a little confused, and maybe it's because we have different
> definitions of what "installing hooks" means. By installing hooks,
> I meant the addition of the hook command to the config, e.g the
> outcome of:
>
> `git config --add hook.pre-commit.command pylint`

Yeah, I was envisioning that it won't be as a simple, mechanical and
unconditional single command invocation.  Rather, the 'pylint' part
would have to become different depending on the environment (e.g.
what operating system you are on, what editor you prefer, etc.).
And the part that decides what kind of environment you are on would
have to be written by the project that controls the "project-managed
hooks"---unfortunately that would most likely to be an error-prone
part.
Albert Cui April 6, 2021, 12:27 a.m. UTC | #12
On Mon, Apr 5, 2021 at 5:14 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Albert Cui <albertqcui@gmail.com> writes:
>
> > I'm a little confused, and maybe it's because we have different
> > definitions of what "installing hooks" means. By installing hooks,
> > I meant the addition of the hook command to the config, e.g the
> > outcome of:
> >
> > `git config --add hook.pre-commit.command pylint`
>
> Yeah, I was envisioning that it won't be as a simple, mechanical and
> unconditional single command invocation.  Rather, the 'pylint' part
> would have to become different depending on the environment (e.g.
> what operating system you are on, what editor you prefer, etc.).
> And the part that decides what kind of environment you are on would
> have to be written by the project that controls the "project-managed
> hooks"---unfortunately that would most likely to be an error-prone
> part.
>

I don't see a need for that. At least, this doesn't feel part of the
"minimum feature set."

I don't think any existing solutions for hooks management handle this,
so it feels fine to push this part to the project. As you said, if
they wanted this, I'd expect `git config --add hook.pre-commit.command
lint.sh` where lint.sh does whatever cross-platform magic is required.

I wouldn't expect a lot of need for this anyway; I'd expect most hooks
to depend on tools that already work cross-platform (e.g. pylint) or
be hooks written in the programming environment of the project, so the
toolchain should already be in place to execute the hooks.
brian m. carlson April 6, 2021, 11:15 p.m. UTC | #13
On 2021-04-05 at 22:45:10, Albert Cui wrote:
> Right, this entire proposal is trying to get to a "Git-blessed" solution,
> and I should make the need clearer. A few reasons for standardizing
> this come to mind:
> 
> 1. Many existing "standard" solutions. A multitude of existing solutions for
> this use case speaks to the fact that a basic config script is not sufficient.
> I mentioned Husky above, but here are a few more; basically each
> popular programming language environment has a solution for this.
> 
> https://github.com/sds/overcommit - Ruby
> https://github.com/pre-commit/pre-commit - Python
> https://github.com/Arkweid/lefthook - Go
> https://github.com/shibapm/Komondor - Swift
> https://github.com/typicode/husky - Node
> 
> These solutions all handle the installation and updating of hooks. A
> "configure-hooks.sh" script doesn't handle hook updates, unless you go through
> the trouble yourself of implementing and maintaining that.

I think part of the problem is that an automated process to update hooks
is generally a security vulnerability, since it means that untrusted
remote code will automatically run on your computer.

I want to be clear that I understand the desire for this feature, even
though it's not a feature I would personally use, and the fact that
there are many approaches means that clearly there are many people that
do want this functionality.  I have in the past shared hooks with others
and we have mutually benefitted enormously from that fact.  My concerns
here are solely about the security aspects of this feature.

> 3. Improving security. As you mentioned, hooks are difficult to get
> right from a security
> perspective, and standardizing on a single implementation allows us to
> give developers
> a well-vetted solution with a better security model than what exists
> today. For example,
> we're proposing making it very clear to users whenever there's a hook
> update. This isn't
> something that existing solutions do.

I don't think this materially improves security.  All of these options
have the same security problems, and that's inherent in the solution.
What we're doing here is basically giving people a built-in feature that
is the equivalent of piping curl to bash and blessing it as secure when
it's not.

> I'll also say in general, the Git project is much more likely to get
> security right than smaller
> projects, where oftentimes even popular projects end up unmaintained.

I agree that Git tries to be careful about security.  It is for these
reasons that I think Derrick and I have provided you the feedback we
have here.

> Agreed. We already did a security review internally at Google. The main
> feedback was:
> 
> * We need an explicit opt-in opposed to setting hooks up automatically,
> e.g. a command line flag like --accept-hooks at minimum. This is primarily
> to distinguish people who are just cloning a repository to browse the code
> from people who are developing.
> 
> * The average user doesn't have the ability to review hooks in general
> (security is hard and obscuration is easy), and if the user has
> already opted into
> this feature because they are engaged in development, it's very likely
> that they're
> already running build scripts, so the additional attack vector here doesn't seem
> like a big issue.

I think you've hit the nail on the head here, but drawn a mistaken
conclusion.  The average user doesn't have the ability to review hooks
in general and therefore cannot make an informed decision about whether
to enable them, so the behavior we need to have is not to lead them to
doing things which are risky from a security perspective.

If my goal is to just build a product and not to run its tests, which I
do with a decent number of projects, then I can audit a Go or Rust
project trivially and determine if it executes arbitrary code or not
during the build process and if so, inspect it and gain confidence in
it.  In fact, there are many projects which don't execute build scripts
during the process, and therefore which are completely safe.  This hook
design changes that calculus dramatically.

I also want to point out that people clone repositories for a variety of
reasons.  At GitHub, every team has its own repository with
documentation.  Literally every employee at the company, regardless of
role, interacts with a Git repository, even people who do normally
nontechnical tasks such as our in-house lawyers and our event planners.
Many of these people are nontechnical, and almost none of these
repositories has any software development involvement.  There are also
numerous people elsewhere who may work on projects such as books or
other non-software in repositories who are nontechnical.  Under the
current model, the biggest problem these people face is accidentally
corrupting their local repository and losing data.  With a design that
prompts them to install hooks, they face the possibility of arbitrary
code execution.

The reason I proposed the FAQ we have in our documentation is because I
answer a decent number of questions on Stack Overflow, in addition to
questions that involve users that I get pulled into at work.
Overwhelmingly, the vast majority of users, even developers, are not
completely comfortable with Git and are unsure about how to use it
effectively (cf. https://ohshitgit.com/).  If we propose to a user that
they should do something like enable hooks by adding a prompt, many
users will automatically say "yes" because (a) they don't understand and
they trust that Git is prompting them to do something beneficial and (b)
because they don't know or care and just want to get on with their
lives.  As a result, we're exposing people to giant social engineering
attacks on behalf of potentially unscrupulous repository maintainers.

This is made worse by the fact that we will prompt users even when
cloning a repo that they have no intention of performing development on
means that we will have users who are misled here where otherwise
nothing would happen.

There is a huge problem with social engineering attacks and phishing on
the Internet today and I'm concerned that this is going in exactly the
wrong direction.

I would want to see a comprehensive security analysis feature taking
into consideration social engineering attacks, the skill level and
comfort with Git of the majority of Git users, and the fact that people
clone repositories for many reasons other than software development.
It's easy to look at this from the perspective of the typical employee
at a major tech company and assume that users are generally security
conscious, comfortable with Git, and primarily engaged in software
development on the projects they clone, but I'm not sure any of those
cases are generally true, and anyway there are many counterexamples in
the real world whose use cases we need to take into account.

I continue to have serious reservations about this series and approach,
and I'm not sure that any proposal we can adopt here will address the
security concerns.  To be frank, I don't think this proposal should move
forward in its current state or otherwise, since I think the security
problems are inherent in this approach and fundamentally can't be fixed.

This is, as should be obvious from my email address, my personal
opinion, despite my reference to my employer above.  Unless otherwise
stated, I don't speak for my employer and they don't speak for me.
Ævar Arnfjörð Bjarmason April 7, 2021, 7:53 a.m. UTC | #14
On Wed, Apr 07 2021, brian m. carlson wrote:

> On 2021-04-05 at 22:45:10, Albert Cui wrote:
>> Right, this entire proposal is trying to get to a "Git-blessed" solution,
>> and I should make the need clearer. A few reasons for standardizing
>> this come to mind:
>> 
>> 1. Many existing "standard" solutions. A multitude of existing solutions for
>> this use case speaks to the fact that a basic config script is not sufficient.
>> I mentioned Husky above, but here are a few more; basically each
>> popular programming language environment has a solution for this.
>> 
>> https://github.com/sds/overcommit - Ruby
>> https://github.com/pre-commit/pre-commit - Python
>> https://github.com/Arkweid/lefthook - Go
>> https://github.com/shibapm/Komondor - Swift
>> https://github.com/typicode/husky - Node
>> 
>> These solutions all handle the installation and updating of hooks. A
>> "configure-hooks.sh" script doesn't handle hook updates, unless you go through
>> the trouble yourself of implementing and maintaining that.
>
> I think part of the problem is that an automated process to update hooks
> is generally a security vulnerability, since it means that untrusted
> remote code will automatically run on your computer.
>
> I want to be clear that I understand the desire for this feature, even
> though it's not a feature I would personally use, and the fact that
> there are many approaches means that clearly there are many people that
> do want this functionality.  I have in the past shared hooks with others
> and we have mutually benefitted enormously from that fact.  My concerns
> here are solely about the security aspects of this feature.
>
>> 3. Improving security. As you mentioned, hooks are difficult to get
>> right from a security
>> perspective, and standardizing on a single implementation allows us to
>> give developers
>> a well-vetted solution with a better security model than what exists
>> today. For example,
>> we're proposing making it very clear to users whenever there's a hook
>> update. This isn't
>> something that existing solutions do.
>
> I don't think this materially improves security.  All of these options
> have the same security problems, and that's inherent in the solution.
> What we're doing here is basically giving people a built-in feature that
> is the equivalent of piping curl to bash and blessing it as secure when
> it's not.
>
>> I'll also say in general, the Git project is much more likely to get
>> security right than smaller
>> projects, where oftentimes even popular projects end up unmaintained.
>
> I agree that Git tries to be careful about security.  It is for these
> reasons that I think Derrick and I have provided you the feedback we
> have here.
>
>> Agreed. We already did a security review internally at Google. The main
>> feedback was:
>> 
>> * We need an explicit opt-in opposed to setting hooks up automatically,
>> e.g. a command line flag like --accept-hooks at minimum. This is primarily
>> to distinguish people who are just cloning a repository to browse the code
>> from people who are developing.
>> 
>> * The average user doesn't have the ability to review hooks in general
>> (security is hard and obscuration is easy), and if the user has
>> already opted into
>> this feature because they are engaged in development, it's very likely
>> that they're
>> already running build scripts, so the additional attack vector here doesn't seem
>> like a big issue.
>
> I think you've hit the nail on the head here, but drawn a mistaken
> conclusion.  The average user doesn't have the ability to review hooks
> in general and therefore cannot make an informed decision about whether
> to enable them, so the behavior we need to have is not to lead them to
> doing things which are risky from a security perspective.
>
> If my goal is to just build a product and not to run its tests, which I
> do with a decent number of projects, then I can audit a Go or Rust
> project trivially and determine if it executes arbitrary code or not
> during the build process and if so, inspect it and gain confidence in
> it.  In fact, there are many projects which don't execute build scripts
> during the process, and therefore which are completely safe.  This hook
> design changes that calculus dramatically.
>
> I also want to point out that people clone repositories for a variety of
> reasons.  At GitHub, every team has its own repository with
> documentation.  Literally every employee at the company, regardless of
> role, interacts with a Git repository, even people who do normally
> nontechnical tasks such as our in-house lawyers and our event planners.
> Many of these people are nontechnical, and almost none of these
> repositories has any software development involvement.  There are also
> numerous people elsewhere who may work on projects such as books or
> other non-software in repositories who are nontechnical.  Under the
> current model, the biggest problem these people face is accidentally
> corrupting their local repository and losing data.  With a design that
> prompts them to install hooks, they face the possibility of arbitrary
> code execution.
>
> The reason I proposed the FAQ we have in our documentation is because I
> answer a decent number of questions on Stack Overflow, in addition to
> questions that involve users that I get pulled into at work.
> Overwhelmingly, the vast majority of users, even developers, are not
> completely comfortable with Git and are unsure about how to use it
> effectively (cf. https://ohshitgit.com/).  If we propose to a user that
> they should do something like enable hooks by adding a prompt, many
> users will automatically say "yes" because (a) they don't understand and
> they trust that Git is prompting them to do something beneficial and (b)
> because they don't know or care and just want to get on with their
> lives.  As a result, we're exposing people to giant social engineering
> attacks on behalf of potentially unscrupulous repository maintainers.
>
> This is made worse by the fact that we will prompt users even when
> cloning a repo that they have no intention of performing development on
> means that we will have users who are misled here where otherwise
> nothing would happen.
>
> There is a huge problem with social engineering attacks and phishing on
> the Internet today and I'm concerned that this is going in exactly the
> wrong direction.
>
> I would want to see a comprehensive security analysis feature taking
> into consideration social engineering attacks, the skill level and
> comfort with Git of the majority of Git users, and the fact that people
> clone repositories for many reasons other than software development.
> It's easy to look at this from the perspective of the typical employee
> at a major tech company and assume that users are generally security
> conscious, comfortable with Git, and primarily engaged in software
> development on the projects they clone, but I'm not sure any of those
> cases are generally true, and anyway there are many counterexamples in
> the real world whose use cases we need to take into account.
>
> I continue to have serious reservations about this series and approach,
> and I'm not sure that any proposal we can adopt here will address the
> security concerns.  To be frank, I don't think this proposal should move
> forward in its current state or otherwise, since I think the security
> problems are inherent in this approach and fundamentally can't be fixed.
>
> This is, as should be obvious from my email address, my personal
> opinion, despite my reference to my employer above.  Unless otherwise
> stated, I don't speak for my employer and they don't speak for me.

I agree with pretty much every word you said, in particular the social
engineering aspect of this. In past mails I've referred to elsewhere
I've proposed some Emacs-like "ask" facility for git, but you've
convinced me that that default would be a bad idea for the "user just
clicks yes no matter what" reasons you noted.

Still, I do think there is a way to make this work in a way that's
probably acceptable for everyone:

 * We don't ever ask the user to install hooks, it's something that
   they'd have to know about and pro-actively set up in advance.

 * The security model is entirely focused on not approving changes as
   you "pull" them, but e.g. GPG-verifying the whole chain with some
   pre-setup key.

The use-case (and I've had this use-case in the past) would be something
like a BigCorp which automates its servers/laptops, but would prefer not
to patch/build/ship something like git itself.

So when you "git clone" your corporate repos you get relevant
config/hooks, but not otherwise. We'd of course have a way to discover
that you can set these up & do so after "clone", but it would be
something more like check-mailmap, not something we'd prompt you to do.

I'm personally much more interested in doing something like this for an
in-repo .gitconfig, with us shipping a graduals whitelist of known
config values at differeng safety levels.

That sort of thing really *is* something we could imagine asking the
user about, or even doing by default, e.g. applying a "diff -U<n>"
setting for that repo, picking up non-executable aliases for
non-data-changing git programs etc.

But as you note hooks are really on the extreme other side of that
security curve, which is why in some earlier thread discussing this I
suggested that a much more productive way to start an effort like this
would be the in-repo .gitconfig route. We could start with our N most
safe config variables, and work from there...
Derrick Stolee April 7, 2021, 1:09 p.m. UTC | #15
On 4/7/2021 3:53 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Apr 07 2021, brian m. carlson wrote:
>>
>> I continue to have serious reservations about this series and approach,
>> and I'm not sure that any proposal we can adopt here will address the
>> security concerns.  To be frank, I don't think this proposal should move
>> forward in its current state or otherwise, since I think the security
>> problems are inherent in this approach and fundamentally can't be fixed.
>>
>> This is, as should be obvious from my email address, my personal
>> opinion, despite my reference to my employer above.  Unless otherwise
>> stated, I don't speak for my employer and they don't speak for me.
> 
> I agree with pretty much every word you said, in particular the social
> engineering aspect of this. In past mails I've referred to elsewhere
> I've proposed some Emacs-like "ask" facility for git, but you've
> convinced me that that default would be a bad idea for the "user just
> clicks yes no matter what" reasons you noted.

These replies definitely speak from a perspective common to mine.
This is very dangerous territory and should be handled carefully.

There is also a legitimate user need to use hooks _to contribute_
to some repositories. Hooks are not needed to read the repositories
or interact with them as a document.

The current mechanisms require ad-hoc approaches that are custom to
each project, so there would be value in creating a standard inside
the Git client itself. I think the proposal goes too far in making
this an automatic configuration, either because it assumes trust or
assumes sufficient skepticism on behalf of the users. Either is not
acceptable for the Git project.

Here are the hard lines I draw:

1. This should not happen in "git clone" (other than maybe a message
   over stderr that hooks are available to be configured through a
   different command).

2. Hooks should not update in "git checkout" (other than a message
   that hooks have updated).

3. Whatever document triggers a hook configuration should live at
   HEAD and should not be configured or updated until HEAD has been
   updated by one Git command (git clone, git checkout), time
   passes for the user to inspect the worktree, then _another_
   command (git hooks?) is run manually to reconfigure the hooks.

I think there is a potential way forward if these items are followed.

But I'd like to ask a different question: What problems are these
custom hooks solving, and can Git solve those problems in-core?

If we care about checking commits for format or something, is that
a common enough problem that we could implement it in Git itself and
enable it through a Git config option? It might be interesting to
pursue this direction and maybe we'll solve 80% of the need with
extensions like that.

I'm aware of some hooks that insert things like a Gerrit change-id
that would probably not be appropriate for such an in-core change.

There is always the extreme option of requiring users to use a
specific fork of Git in order to work with your repository. That
has its own pains, believe me. But, it does allow for the ultimate
flexibility in how these things are done. Optional config can be
enabled by default. Hooks can be replaced with in-core functionality.

Thanks,
-Stolee
Albert Cui April 7, 2021, 6:40 p.m. UTC | #16
On Wed, Apr 7, 2021 at 6:09 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/7/2021 3:53 AM, Ævar Arnfjörð Bjarmason wrote:
> >
> > On Wed, Apr 07 2021, brian m. carlson wrote:
> >>
> >> I continue to have serious reservations about this series and approach,
> >> and I'm not sure that any proposal we can adopt here will address the
> >> security concerns.  To be frank, I don't think this proposal should move
> >> forward in its current state or otherwise, since I think the security
> >> problems are inherent in this approach and fundamentally can't be fixed.
> >>
> >> This is, as should be obvious from my email address, my personal
> >> opinion, despite my reference to my employer above.  Unless otherwise
> >> stated, I don't speak for my employer and they don't speak for me.
> >
> > I agree with pretty much every word you said, in particular the social
> > engineering aspect of this. In past mails I've referred to elsewhere
> > I've proposed some Emacs-like "ask" facility for git, but you've
> > convinced me that that default would be a bad idea for the "user just
> > clicks yes no matter what" reasons you noted.
>
> These replies definitely speak from a perspective common to mine.
> This is very dangerous territory and should be handled carefully.
>
> There is also a legitimate user need to use hooks _to contribute_
> to some repositories. Hooks are not needed to read the repositories
> or interact with them as a document.
>
> The current mechanisms require ad-hoc approaches that are custom to
> each project, so there would be value in creating a standard inside
> the Git client itself. I think the proposal goes too far in making
> this an automatic configuration, either because it assumes trust or
> assumes sufficient skepticism on behalf of the users. Either is not
> acceptable for the Git project.
>
> Here are the hard lines I draw:
>
> 1. This should not happen in "git clone" (other than maybe a message
>    over stderr that hooks are available to be configured through a
>    different command).
>
> 2. Hooks should not update in "git checkout" (other than a message
>    that hooks have updated).
>

To Ævar's point, maybe it would help to separate the two user bases of
project configured hooks.
(1) Employee working at BigCorp. They are cloning from a trusted
remote on company machines where the company controls what gets
installed and how Git is configured. Their motivation is to make
changes to their local clone and submit changes to a central
repository.
(2) Git user cloning from any remote e.g. GitHub. They could have many
motivations: to make changes, to inspect the code, to simply just
build.

I agree that this feature should not get in the way of users (2), or
expose them to new attack surfaces, users who may have no desire to
have project configured hooks. That said, I think we can still get
into a world that better serves users (1). I proposed this upthread
and would like feedback on it (I realize these examples still assume
one config for every branch, but you get the gist):

Case 1. Opt-into clone setup via config
```
#~/.gitconfig
[hook]
   allowCloneInstallFromRemote = $REMOTE
```

IFF $REMOTE matches the config, then `git clone $REMOTE --setup-hooks` works:

```
$ git clone $remote --setup-hooks
The following hooks were installed from `origin` ($ORIGIN_URL):
pre-push: $GIT_ROOT/pre_push.sh
```

Case 2. Without the config opt-in for clone setup
```
$ git clone $remote # using --setup-hooks here wouldn't change
behavior since there's no config opt-in
Remote `origin` ($ORIGIN_URL) suggests the following hooks:
pre-push: $GIT_ROOT/pre_push.sh

If you wish to install them, run `git hook setup origin`.
To always ignore hooks from `origin`, run `git hook ignore origin`.
```

Case 3. Opting into updates
You could imagine a similar config, e.g. allowAutoUpdateFromRemote
that allows Git to prompt users to consent to auto-updating hooks on
"git checkout" with this type of behavior:

....
$ git checkout

The following hooks were updated from remote `origin` ($ORIGIN_URL):

pre-push: $GIT_ROOT/pre_push.sh

If you wish to install them, run `git hook setup origin`.

# The below only appears if allowAutoUpdateFromRemote is set for $ORIGIN_URL
If you wish to always accept hooks from `origin`, run `git hook setup --always
origin`. You should only do this if you trust code changes from origin.

To always ignore hooks from `origin`, run `git hook ignore origin`.
....

> 3. Whatever document triggers a hook configuration should live at
>    HEAD and should not be configured or updated until HEAD has been
>    updated by one Git command (git clone, git checkout), time
>    passes for the user to inspect the worktree, then _another_
>    command (git hooks?) is run manually to reconfigure the hooks.
>

I want to separate the requirement from the implementation. What I'm
hearing is that "users should have a chance to inspect the suggested
hook before consenting to installing it." That doesn't necessarily
require the configuration to be in HEAD.

Again, that's reasonable for users (2) but doesn't seem necessary for
users (1) if we have the correct opt-ins.

> I think there is a potential way forward if these items are followed.
>
> But I'd like to ask a different question: What problems are these
> custom hooks solving, and can Git solve those problems in-core?
>
> If we care about checking commits for format or something, is that
> a common enough problem that we could implement it in Git itself and
> enable it through a Git config option? It might be interesting to
> pursue this direction and maybe we'll solve 80% of the need with
> extensions like that.
>
> I'm aware of some hooks that insert things like a Gerrit change-id
> that would probably not be appropriate for such an in-core change.
>

A `git lint` command would cover a lot of the use cases, but to your
point, there are others.

Thanks,
Albert
Junio C Hamano April 7, 2021, 8:02 p.m. UTC | #17
Albert Cui <albertqcui@gmail.com> writes:

>> Here are the hard lines I draw:
>>
>> 1. This should not happen in "git clone" (other than maybe a message
>>    over stderr that hooks are available to be configured through a
>>    different command).
>>
>> 2. Hooks should not update in "git checkout" (other than a message
>>    that hooks have updated).
>>
>
> To Ævar's point, maybe it would help to separate the two user bases of
> project configured hooks.
> (1) Employee working at BigCorp. They are cloning from a trusted
> remote on company machines where the company controls what gets
> installed and how Git is configured. Their motivation is to make
> changes to their local clone and submit changes to a central
> repository.

Hmph.

If the assumption is that their configuration is controlled by
BigCorp when they arrive at their desk, why do you even need any
change to upstream Git, especially with Emily's "configuration file
tells Git what hook scripts to run" in sight?

Wouldn't it be just a matter of the BigCorp installing
/etc/gitconfig on their BigCorpLinux installations?
Ævar Arnfjörð Bjarmason April 7, 2021, 10:23 p.m. UTC | #18
On Wed, Apr 07 2021, Junio C Hamano wrote:

> Albert Cui <albertqcui@gmail.com> writes:
>
>>> Here are the hard lines I draw:
>>>
>>> 1. This should not happen in "git clone" (other than maybe a message
>>>    over stderr that hooks are available to be configured through a
>>>    different command).
>>>
>>> 2. Hooks should not update in "git checkout" (other than a message
>>>    that hooks have updated).
>>>
>>
>> To Ævar's point, maybe it would help to separate the two user bases of
>> project configured hooks.
>> (1) Employee working at BigCorp. They are cloning from a trusted
>> remote on company machines where the company controls what gets
>> installed and how Git is configured. Their motivation is to make
>> changes to their local clone and submit changes to a central
>> repository.
>
> Hmph.
>
> If the assumption is that their configuration is controlled by
> BigCorp when they arrive at their desk, why do you even need any
> change to upstream Git, especially with Emily's "configuration file
> tells Git what hook scripts to run" in sight?

FWIW I've been assuming that this is wanted for BigCorp people who have
devs using vanilla OS's / XCode etc. Having managed laptops with a
managed /etc/gitconfig certainly makes some things easier...

> Wouldn't it be just a matter of the BigCorp installing
> /etc/gitconfig on their BigCorpLinux installations?

Whether you're at BigCorp or not you've got the problem with such a
/etc/gitconfig that it applies to all repos, but someone at BigCorp
might clone both a BigCorp repo (wants custom config, hooks etc.), and
also some random node.js github.com project (should have no such custom
config).

Having a .gitconfig or otherwise making the repo suggest/control the
config/hooks is an elegant way around that issue.

You otherwise need to do something like have config includes apply a
rule to ~/work/, and then hope your users follow your suggestion to
clone repos in the ~/work/ folder.

I think extending config includes to e.g. operate on a glob of the
remote URI was suggested at some point, which would make use-cases like
that easier than they are now, and would be a less invasive change than
what's being discussed here.

But right now we don't have that, so we have a gap there...
Ed Maste April 15, 2021, 4:52 p.m. UTC | #19
On Thu, 18 Mar 2021 at 21:29, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> > +* Works across Windows/Linux/macOS
>
> Git supports other platforms as well.

In particular, FreeBSD is an example of a platform that is not in the
above list, but included in Git's CI. Is there an explicit list of
supported platforms (and perhaps a notion of support tiers)?

On Wed, 7 Apr 2021 at 09:09, Derrick Stolee <stolee@gmail.com> wrote:
>
> Here are the hard lines I draw:
>
> 1. This should not happen in "git clone" (other than maybe a message
>    over stderr that hooks are available to be configured through a
>    different command).
>
> 2. Hooks should not update in "git checkout" (other than a message
>    that hooks have updated).
>
> 3. Whatever document triggers a hook configuration should live at
>    HEAD and should not be configured or updated until HEAD has been
>    updated by one Git command (git clone, git checkout), time
>    passes for the user to inspect the worktree, then _another_
>    command (git hooks?) is run manually to reconfigure the hooks.
>
> I think there is a potential way forward if these items are followed.
>
> But I'd like to ask a different question: What problems are these
> custom hooks solving, and can Git solve those problems in-core?

I was looking for repository-provided local hook support for the
FreeBSD repositories, and came across this proposal. I'll explain our
desire, in case it can provide some insight. (We started migrating
from Subversion to Git some time ago and finished the last repo a
couple of weeks ago.)

We use one hook today, and would like developers to keep it up to
date: prepare-commit-msg. We have some standard commit message
trailers like Reviewed by: and Differential Revision: that are
provided as templates, and our prepare-commit-msg adds these to the
default git-provided message that lists changed files and such. These
trailers are updated occasionally - for example, we recently adopted
"Fixes:" and added it to the template.

For us, I think a message that hooks are available or updated, and a
command to install or update them would be fine. I can foresee some
usability concern when switching branches - for example, to an older
release branch, but it's probably not a large concern.

> There is always the extreme option of requiring users to use a
> specific fork of Git in order to work with your repository.

FreeBSD effectively did this when we used Subversion - the modified
commit message template was provided by a patched Subversion via our
ports collection. As you suggested it's workable, but not great.
Junio C Hamano April 15, 2021, 7:41 p.m. UTC | #20
Ed Maste <emaste@freebsd.org> writes:

> On Thu, 18 Mar 2021 at 21:29, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
>> > +* Works across Windows/Linux/macOS
>>
>> Git supports other platforms as well.
>
> In particular, FreeBSD is an example of a platform that is not in the
> above list, but included in Git's CI. Is there an explicit list of
> supported platforms (and perhaps a notion of support tiers)?

It is not like there is a Git company who employs developers to
support certain platforms.  This is the mailing list for the open
source development community for Git, and Developers come and leave
over time [*].

The best you can get out of here is: if you find portability issues
when you tried to make it work on your favorite platform, you can
raise it here and (1) somebody with more Git experience on the same
platform may solve it for you, (2) somebody with similar Git
experience as you do (or less) on the same platform may solve it
with you, or (3) somebody without acess to the platform may offer to
help you solve it.  

You cannot reasonably expect more than that from us on this list.

However, major platforms have their own (often binary) packaging
system that offer Git packaged for their users.  Debian/Ubuntu ship
their own .deb, Git-for-Windows binary installer is made available
promptly after we release a new version, macOS folks have Apple Git
and homebrew or macPorts or whatever (but do not quote me on this; I
am not an Apple user).  What they do is not under our control, and
you need to ask them what their support policies, EOL timelines, and
such.  Some of these packagers lurk around here and may respond, but
that is you getting lucky ;-)


[Footnote]

* You can peek into config.mak.uname to see the list of platforms
  that have had a working Git some time in the past.  Hopefully most
  of them are still up-to-date and working, but we wouldn't even
  know if a minority platform, for which an entry for it was added
  to the file in the past by some developer who needed a working Git
  on it, no longer works with the latest version of Git with recent
  toolchains after the original developer lost interest.
Ed Maste April 15, 2021, 8:37 p.m. UTC | #21
On Thu, 15 Apr 2021 at 15:41, Junio C Hamano <gitster@pobox.com> wrote:
>
> Ed Maste <emaste@freebsd.org> writes:
>
> > On Thu, 18 Mar 2021 at 21:29, brian m. carlson
> > <sandals@crustytoothpaste.net> wrote:
> >> > +* Works across Windows/Linux/macOS
> >>
> >> Git supports other platforms as well.
> >
> > In particular, FreeBSD is an example of a platform that is not in the
> > above list, but included in Git's CI. Is there an explicit list of
> > supported platforms (and perhaps a notion of support tiers)?
>
> It is not like there is a Git company who employs developers to
> support certain platforms.  This is the mailing list for the open
> source development community for Git, and Developers come and leave
> over time [*].

I'm sorry that my query wasn't clear; I have no expectation of Git
volunteers providing support (in the commercial sense) for any
particular platform.

What I am interested in is the Git community's expectations around
platform support, with respect to new features, changes that break one
or more platforms, and similar. I submitted portability improvements
for FreeBSD, and certainly expected that if a change introduced a
regression on one of Linux, Windows, or macOS it would not be
accepted.

> * You can peek into config.mak.uname to see the list of platforms
>   that have had a working Git some time in the past.  Hopefully most
>   of them are still up-to-date and working, but we wouldn't even
>   know if a minority platform, for which an entry for it was added
>   to the file in the past by some developer who needed a working Git
>   on it, no longer works with the latest version of Git with recent
>   toolchains after the original developer lost interest.

Yes - this is what I'm wondering about. It seems this information is
not available other than by inspecting config.mak.uname or reading
mailing list archives. I can also look at .travis.yml, .cirrus.yml,
and .github/workflows/main.yml to get a sense of the platforms
supported by Git's CI. That includes at least various versions or
flavours of Linux, macOS, Windows, and FreeBSD. Is it worth putting a
sentence or two in README.md about this?
Junio C Hamano April 15, 2021, 8:50 p.m. UTC | #22
Ed Maste <emaste@freebsd.org> writes:

> Yes - this is what I'm wondering about. It seems this information is
> not available other than by inspecting config.mak.uname or reading
> mailing list archives. I can also look at .travis.yml, .cirrus.yml,
> and .github/workflows/main.yml to get a sense of the platforms
> supported by Git's CI. That includes at least various versions or
> flavours of Linux, macOS, Windows, and FreeBSD. Is it worth putting a
> sentence or two in README.md about this?

I do not think that README, which is end-user facing, is a good
place for that.

If anything, I am inclined to say that developer facing documents
like SubmittingPatches and CodingGuidelines may be a better place to
say that things are expected to work on X, Y and Z platforms, but I
would think it is perfectly OK to start with a new feature that
works only on, say, macOS, and leave it unimplemented for other
platforms until somebody who is motivated enough to help fills the
gap.  So in that sense, "things are expected to work on these
platforms" may be already stronger than what we want to say.
brian m. carlson April 15, 2021, 10:28 p.m. UTC | #23
On 2021-04-15 at 20:37:54, Ed Maste wrote:
> On Thu, 15 Apr 2021 at 15:41, Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Ed Maste <emaste@freebsd.org> writes:
> >
> > > On Thu, 18 Mar 2021 at 21:29, brian m. carlson
> > > <sandals@crustytoothpaste.net> wrote:
> > >> > +* Works across Windows/Linux/macOS
> > >>
> > >> Git supports other platforms as well.
> > >
> > > In particular, FreeBSD is an example of a platform that is not in the
> > > above list, but included in Git's CI. Is there an explicit list of
> > > supported platforms (and perhaps a notion of support tiers)?
> >
> > It is not like there is a Git company who employs developers to
> > support certain platforms.  This is the mailing list for the open
> > source development community for Git, and Developers come and leave
> > over time [*].
> 
> I'm sorry that my query wasn't clear; I have no expectation of Git
> volunteers providing support (in the commercial sense) for any
> particular platform.
> 
> What I am interested in is the Git community's expectations around
> platform support, with respect to new features, changes that break one
> or more platforms, and similar. I submitted portability improvements
> for FreeBSD, and certainly expected that if a change introduced a
> regression on one of Linux, Windows, or macOS it would not be
> accepted.

We don't have a fixed set of supported tiers like, e.g. Rust.  We have
CI for some platforms, and we have people who routinely run Git,
including RCs and development branches like next, on various platforms
and report back.  If something breaks CI, obviously you are expected to
fix it, and if someone says you broke their platform, you are expected
to unbreak it (for open source systems like FreeBSD where you can spin
up a VM) or at least work with the interested party to unbreak it.

Otherwise, support is best effort.  While I don't use FreeBSD, I'm
reasonably aware of what functionality it does and doesn't support, and
I'll try to avoid inserting Linuxisms into our code.  Similarly,
sometimes people ask us to support some obsolete OS which doesn't have
security support (e.g., CentOS 5), and sometimes we accept patches for
that.  (I am personally opposed to supporting systems without security
support, but other developers feel differently.)  We will generally
accept reasonable portability patches for most OSes with little fanfare.

Developers often will CC maintainers of specific OSes (most often,
Windows) if they want to make sure that the patches being proposed meet
that platform's needs.

I have broken macOS in an edge case in the past due to its
case-insensitive file system behavior, and nobody noticed until the
release.  Since our testsuite lacked a test for that case and nobody
running macOS pre-releases on a regular basis hit that case (two files
in the repository differing only in case) and complained, it got shipped
broken, although we did promptly fix it.

That's the kind of support level we have.  Basically, we do our best,
and if there's a problem and someone shouts, we'll fix it.
diff mbox series

Patch

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 81d1bf7a049b..06ba5945c428 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -96,6 +96,7 @@  TECH_DOCS += technical/protocol-common
 TECH_DOCS += technical/protocol-v2
 TECH_DOCS += technical/racy-git
 TECH_DOCS += technical/reftable
+TECH_DOCS += technical/project-configured-hooks
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
 TECH_DOCS += technical/signature-format
diff --git a/Documentation/technical/project-configured-hooks.txt b/Documentation/technical/project-configured-hooks.txt
new file mode 100644
index 000000000000..e880dd7af300
--- /dev/null
+++ b/Documentation/technical/project-configured-hooks.txt
@@ -0,0 +1,426 @@ 
+Project Configured Hooks
+------------------------
+
+Background
+~~~~~~~~~~
+
+Context
+^^^^^^^
+
+Git has https://git-scm.com/docs/githooks[hooks] functionality to allow users to
+execute commands or scripts when certain Git events occur. Some use cases
+include:
+
+* The `pre-commit` hook event:
+
+  ** A developer may want to lint their changes to enforce certain code style
+  and quality. The project may want the commit to fail so that commits that
+  violate the project's style don't get uploaded.
+
+  ** The project may want to prevent developers from committing passwords or
+  other sensitive information e.g. via
+  https://github.com/awslabs/git-secrets[git-secrets].
+
+* The `commit-msg` hook event: the project may want to enforce a certain commit
+message style. This may be out of necessity:
+https://www.gerritcodereview.com/[Gerrit Code Review], for example, requires
+each commit to have a Change-Id in the footer.
+
+* The `pre-push` hook: the project may want to verify that pushes are going to
+the correct central repository to prevent leaks.
+
+A common thread between these use cases is to enforce certain behavior across
+many developers working in the same code base, encouraging best practices and
+healthy code quality.
+
+Hooks today are configured individually in each clone, making it difficult for
+project maintainers to enforce hooks usage across them.
+https://lore.kernel.org/git/20210311021037.3001235-2-emilyshaffer@google.com[Configuration-based
+hook management], by moving hooks to the config, makes this easier; individuals
+can at least configure hooks to be used across multiple workspaces on their
+host. However, there is still no good way for project maintainers to encourage
+or enforce adoption of specific hook commands on specific hook events in a
+clone. As such, there are many tools that provide this functionality on top of
+Git (see <<prior-art, Prior Art>>).
+
+We propose adding native Git functionality to allow project maintainers to
+specify hooks that a user ought to install and utilize in their development
+workflows.
+
+Server-side vs Local Checks
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A large motivation for this change is providing projects a method to enable
+local checks of code e.g. linting. As documented in
+https://git-scm.com/docs/gitfaq#_hooks[gitfaq], in some cases, server-side
+checks may be more appropriate, especially since developers can always skip
+local hook execution. We think there are still benefits to local checks:
+
+* Ensuring commit message style and preventing the committing of sensitive data,
+as described above. In particular, if CI results are public, as with many open
+source projects, server side checks are useless for guarding against leaking
+sensitive data.
+
+* Helps developers catch issues earlier: typically developers need to push to
+the remote to trigger server-side checks. Local hooks can be run anytime the
+developer wants. This is especially useful if the project has slow
+server-checks; catching issues locally can save the developer a lot of time
+waiting for CI. They are also useful for locally reproducing an issue identified
+in CI, helping resolve issues faster.
+
+* Since the typical goal of developers to submit changes to a central
+repository, their interests are aligned with the project maintainers'; while
+they can choose to skip local hook execution, it is in their interest to allow
+hooks to run at least before proposing code for submission to the central
+repository to increase the chances of the code getting accepted.
+
+In the ideal world, developers and project maintainers use both local and server
+side checks in their workflow. However, for many smaller projects, this may not
+be possible: CI may be too expensive to run or configure. The number of local
+solutions to this use case speaks to this need (see <<prior-art, Prior Art>>).
+Bringing this natively to Git can give all these developers a well-supported,
+secure implementation opposed to the fragmentation we see today.
+
+User Goals / Critical User Journeys
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* As a project maintainer,
+
+    ** I want to enforce code style so I require developers to use a
+    standardized linter.
+
+    ** I want to prevent sensitive data from getting checked in.
+
+    ** I want to prevent leaks so I check that developers are uploading code to
+    the right private central repository. Conversely, I may want to encourage
+    more open source development and encourage developers to push to public
+    central repositories.
+
+    ** I want this to just work for all the developers in my project, without
+    needing to support them through configuration.
+
+* As a developer developing in a local clone,
+
+    ** I want to set up my workspace.
+
+    ** I want control over what code runs on my machine.
+
+    ** I want to set up my own hooks, in addition to or in lieu of project
+    configured hooks.
+
+    ** I want to skip hooks from running (for various reasons).
+
+Security Considerations and Design Principles
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We must balance the desire to make hooks setup easy for developers --- allowing
+them to get hooks set up with low friction, and hence increasing the probability
+of them adopting these hooks --- with protecting users from the security risks
+of arbitrary code execution on their hosts.
+
+To inform the design, we propose these design principles:
+
+* User consent: Users must explicitly agree to hooks usage; no hooks should
+execute without consent, and users should re-consent if hooks update. Users can
+opt-out of hooks.
+
+* Trust comes from the central repository:
+  ** Most users don't have the time or expertise to properly audit every hook
+  and what it does. There must be trust between the user and the remote that the
+  code came from, and the Git project should ensure trust to the degree it can
+  e.g. enforce HTTPS for its integrity guarantees.
+
+  ** Since developers will likely build their local clone in their development
+  process, at some point, arbitrary code from the repository will be executed.
+  In this sense, hooks _with user consent_ do not introduce a new attack surface.
+
+* Give users visibility: Git must allow users to make informed decisions. This
+means surfacing essential information to the user in a visible manner e.g. what
+remotes the hooks are coming from, whether the hooks have changed in the latest
+checkout.
+
+Feature Requirements
+~~~~~~~~~~~~~~~~~~~~
+
+Minimum Feature Set
+^^^^^^^^^^^^^^^^^^^
+
+* A repository can specify a configuration for what hook commands are
+enabled for what hook events
+
+* A repository can specify, in this configuration, where the hook
+commands reside
+
+    ** This could be a path to a script/binary within the repository
+
+    ** This could be a path to a script/binary contained within submodules of
+    the repository
+
+    ** This could be a user installed command or script/binary that exists
+    outside of the repository and is present in `$PATH`
+
+* This configuration should only apply if it was received over HTTPS
+
+* A setup command for users to set up hooks
+
+    ** Hook setup could happen at clone time assuming the user has consented
+    e.g. if `--setup-hooks` is passed to `git clone`
+
+* Users must explicitly approve hooks at least once
+
+    ** Running the setup command should count as approval, including if the user
+    consented during the clone
+
+    ** When a hook command changes, a user should re-approve execution (note:
+    implementation should not interfere with requirement listed in “Fast
+    Follows")
+
+* Automation is able to continue to use clone and other commands
+non-interactively
+
+* Works across Windows/Linux/macOS
+
+Fast Follows
+^^^^^^^^^^^^
+
+* When prompted to execute a hook, users can specify always or never, even if
+the hook updates
+
+Nice to Haves
+^^^^^^^^^^^^^
+
+* A method to skip hook execution i.e. `--no-verify` works everywhere
+
+* Support a “warnings only mode” where hooks run but don’t block commands from
+executing
+
+Out of Scope
+^^^^^^^^^^^^
+
+* Ensuring the user has installed software that isn't distributed with the repo
+
+Implementation Exploration: Check "magic" branch for configs at fetch time
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Example User Experience
+^^^^^^^^^^^^^^^^^^^^^^^
+
+===== Case 1: Consent through clone
+
+....
+$ git clone --setup-hooks
+...
+
+The following hooks were installed from remote `origin` ($ORIGIN_URL):
+
+pre-commit: git-secrets --pre_commit_hook
+pre-push:  $GIT_ROOT/pre_push.sh
+....
+
+===== Case 2: Prompting after clone
+....
+$ git clone
+...
+
+Remote `origin` ($ORIGIN_URL) suggest installing the following hooks:
+
+pre-commit: git-secrets --pre_commit_hook
+pre-push:  $GIT_ROOT/pre_push.sh
+
+# instead of prompting, we could give users commands to run instead
+# see case 3
+
+Do you wish to install them?
+1. Yes (this time)
+2. Yes (always from origin)
+3. No (not this time)
+4. No (never)
+....
+
+===== Case 3: Re-prompting when hooks change
+....
+$ git pull
+
+The following hooks were updated from remote `origin` ($ORIGIN_URL):
+
+pre-push:  $GIT_ROOT/pre_push.sh
+
+If you wish to install them, run `git hook setup origin`.
+
+If you wish to always accept hooks from `origin`, run `git hook setup --always
+origin`. You should only do this if you trust code changes from origin.
+
+To always ignore hooks from `origin`, run `git hook ignore origin`.
+....
+
+===== Case 4: Nudging when hooks weren't installed
+....
+$ git commit
+advice: The repository owner has recommended a 'pre-commit' hook that was not run.
+To view it, run `git show origin/refs/recommended-config:some-pre-commit`. To install it, run `git hook setup origin pre-commit`
+
+Turn off this advice by setting config variable advice.missingHook to false."
+....
+
+
+Implementation Sketch
+^^^^^^^^^^^^^^^^^^^^^
+
+* Perform fetch as normal
+
+* After fetch is complete, Git checks for a "magic" config branch (e.g.
++origin/refs/recommended-config+) which contains information about config lines
+an end-user may want (including hooks).
+
+* As part of the fetch subcommand, Git prompts users to install the configs
+contained there.
+
+    ** User responses to that prompt could be "sticky" - e.g. a user could reply
+    "no (this time)", "no (never)", "yes (this time)", or "yes (always)".
+    Always/never indicate that the user trusts the remote this config is coming
+    from, and should not apply to configs fetched from other remotes.
+
+Later, we might want to do this before the initial clone is performed; that
+workflow looks like:
+
+* During clone, perform ls-refs as normal
+
+* If the server has a "magic" config branch, fetch only that config branch.
+
+* Prompt users as described above.
+
+* Perform the rest of the clone.
+
+Pros
+^^^^
+
+* Repository owners have a method for providing recommended config for
+contributors.
+
+* Installation flow happens without additional user intervention.
+
+* Keeping config branch and history separate from code branch and history means
+it is versioned, but not tied to user's checkout.
+
+* Letting users specify "always" or "never" reduces amount of pain introduced by
+interactive "configuration wizard" flow.
+
+Cons
+^^^^
+
+* Requires addition of step to fetch (and later clone).
+
+* Turning a "set and forget" command like clone into an interactive session with
+the user is not ideal; care must be taken to avoid breaking bots.
+
+* Inflating configs and executables from a remote tracking branch which is never
+checked out could be slow.
+
+Future Work
+~~~~~~~~~~~
+
+* Extending this to allow repository owners to specify specific configurations
+in general e.g. this repository should use partial-clone with these parameters.
+
+* Extending this to support submodules: We want to make sure this works in a way
+that's easy to adapt to submodules, who would likely need to run the same hooks
+as the superproject; for example, submodules could inherit the superproject
+config.
+
+* A way to perform general set up of a code base e.g. installing dependencies,
+any first-build related tasks
+
+* Sandboxing hook execution to provide higher levels of security.
+
+[[prior-art]]
+Prior Art
+~~~~~~~~~
+
+Husky
+^^^^^
+
+* Add it as a dev dependency in package.json
+
+* Supports out-of-the-box configuration: Adding a `prepare` script in
+package.json with `husky install` will automate the installation of the husky
+script in the .git directory.
+
+pre-commit
+^^^^^^^^^^
+
+* Acts as a package manager for linting, installing the required linters as
+needed
+
+* `pre-commit install` installs the pre-commit hook shim
+
+* Config is checked into the repository, allowing owners to set versions for
+linters
+
+* Provides some out-of-the-box hooks:
+https://github.com/pre-commit/pre-commit-hooks[https://github.com/pre-commit/pre-commit-hooks]
+
+Repo Hooks
+^^^^^^^^^^
+
+* A Git repository containing hooks commands is specified in the manifest and
+checked out during `repo init`.
+
+* A `repo-hooks` element specifies the Git repository where the hooks code lives
+and which hook to enable. This is typically
+https://android.googlesource.com/platform/tools/repohooks/.
+
+* The only hook event currently supported is `pre-upload`, running when people
+run +repo upload+.
+
+* The hooks code file name must be the same as the hook event name e.g.
+`pre-upload.py`.
+
+* When a hook is used for the first time, the user is prompted
+to approve execution
+
+    ** For manifests fetched via secure protocols (e.g. https://), the user is
+    prompted once.
+
+    ** For insecure protocols (e.g. http://), the user is prompted whenever the
+    registered repohooks project is updated and a hook is triggered.
+
+* Repo hooks must be python modules (Python 2 or 3, with 2 support deprecated)
+
+* Repo hooks run from the top of the repo directory
+
+    ** if the repo client is under `~/tree/`, then that is where the hook runs,
+    even if you ran repo in a git repository at `~/tree/src/foo/`, or in a
+    subdirectory of that git repository in `~/tree/src/foo/bar/`.
+
+* `--no-verify` allows developers to bypass hooks.
+
+* `--ignore-hooks` allows developers to run hooks without blocking
+
+// The '+' in Gerrit URL frightens asciidoctor.
+:repohook-read: https://android.googlesource.com/platform/tools/repohooks/+/refs/heads/master/rh/config.py
+:repohook-config: https://android.googlesource.com/platform/tools/repohooks/+/refs/heads/master#config-files
+
+* The hooks {repohook-read}[read] a {repohook-config}[config file] to determine
+what code to actually execute.
+
+    ** Config can be global e.g. at the top of the Repo workspace and local, and
+    they get merged together.
+
+    ** Local config: looks for the closest config file to where you ran repo,
+    allowing each repository to configure its own hooks
+
+* Example code that gets run:
+
+    ** pylint3 (requiring this to be installed on the host)
+
+    ** commit_msg_bug_field: require “Bug:” to be present in the commit message
+    (built into repohooks)
+
+Glossary
+~~~~~~~~
+
+*hook event*: A point during Git’s execution where user scripts may be run, for
+example, `prepare-commit-msg` or `pre-push`.
+
+*hook command*: A user script or executable which will be run on one or more
+hook events.